
Databricks Delta Lake

Build Delta Lake tables with ACID transactions, time travel, and optimization

You are a Databricks Delta Lake specialist. The user wants to build Delta Lake tables with ACID transactions, time travel, and optimization capabilities.

What to check first

  • Verify a Databricks cluster is running and your notebook is attached to it (Compute page in the Databricks UI)
  • Confirm the delta package is available: run import delta in a Python cell
  • Check your data location: dbutils.fs.ls("/user/hive/warehouse") to list existing databases and tables
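
The delta-import check can also be run as a non-fatal probe. A minimal stdlib-only sketch (works inside or outside a Databricks notebook; outside Databricks the module comes from the delta-spark package):

```python
import importlib.util

# Probe for the delta module without raising ImportError.
# On a Databricks cluster it ships with the runtime; locally it
# is provided by the delta-spark package.
spec = importlib.util.find_spec("delta")
print("delta available" if spec else
      "delta not found: install delta-spark or attach a Databricks cluster")
```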

Steps

  1. Create a Delta Lake table using CREATE TABLE ... USING DELTA or DataFrame .write.format("delta") with mode("overwrite")
  2. Configure partitioning by including PARTITIONED BY (column_name) in SQL or .partitionBy("column_name") in PySpark to optimize query performance
  3. Write data with df.write.format("delta").mode("append").save("/path/to/delta/table") or use INSERT INTO for SQL operations
  4. Rely on ACID transactions, which Delta enables by default; verify with DESCRIBE EXTENDED table_name, which should show Provider: delta
  5. Query table history with DESCRIBE HISTORY table_name to see versions, timestamps, and operations
  6. Use time travel syntax SELECT * FROM table_name TIMESTAMP AS OF '2024-01-15 10:30:00' or VERSION AS OF 5 to access previous versions
  7. Optimize table storage by running OPTIMIZE table_name to compact small files, then VACUUM table_name RETAIN 168 HOURS (the 7-day default retention) to remove data files no longer referenced by recent versions
  8. Monitor table statistics: run ANALYZE TABLE table_name COMPUTE STATISTICS and DESCRIBE FORMATTED table_name
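
The SQL path through these steps can be sketched end to end in a notebook cell. Table and column names here (employees_delta, department, etc.) follow the PySpark example below and are placeholders for your own schema:

```sql
-- Steps 1-2: create a partitioned Delta table (ACID is on by default)
CREATE TABLE IF NOT EXISTS employees_delta (
  id INT,
  name STRING,
  salary INT,
  department STRING,
  created_at TIMESTAMP
) USING DELTA
PARTITIONED BY (department);

-- Step 3: write data
INSERT INTO employees_delta VALUES
  (1, 'Alice', 95000, 'Engineering', TIMESTAMP '2024-01-10 00:00:00');

-- Step 4: confirm the table is Delta-backed (look for Provider: delta)
DESCRIBE EXTENDED employees_delta;

-- Steps 5-6: inspect history and time travel to an earlier version
DESCRIBE HISTORY employees_delta;
SELECT * FROM employees_delta VERSION AS OF 0;

-- Steps 7-8: compact files, clean up old versions, refresh statistics
OPTIMIZE employees_delta;
VACUUM employees_delta RETAIN 168 HOURS;
ANALYZE TABLE employees_delta COMPUTE STATISTICS;
```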

Code

from pyspark.sql.types import StructType, StructField, StringType, IntegerType, TimestampType
from pyspark.sql.functions import current_timestamp
import datetime

# Create sample data
schema = StructType([
    StructField("id", IntegerType(), False),
    StructField("name", StringType(), True),
    StructField("salary", IntegerType(), True),
    StructField("department", StringType(), True),
    StructField("created_at", TimestampType(), True)
])

data = [
    (1, "Alice", 95000, "Engineering", datetime.datetime(2024, 1, 10)),
    (2, "Bob", 88000, "Sales", datetime.datetime(2024, 1, 11)),
    (3, "Carol", 105000, "Engineering", datetime.datetime(2024, 1, 12))
]

df = spark.createDataFrame(data, schema=schema)

# Write as Delta Lake table (creates ACID-compliant table)
delta_path = "/user/hive/warehouse/employees_delta"
df.write.format("delta").mode("overwrite").partitionBy("department").save(delta_path)


Note: this example was truncated in the source. See the GitHub repo for the latest full version.
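Time-travel timestamps (step 6) are passed as literals such as '2024-01-15 10:30:00'. A small stdlib-only sketch of building that query from a Python datetime (the employees_delta table name is just the example above):

```python
import datetime

# Format a Python datetime into the 'YYYY-MM-DD HH:MM:SS' literal
# that Delta's TIMESTAMP AS OF clause expects
as_of = datetime.datetime(2024, 1, 15, 10, 30).strftime("%Y-%m-%d %H:%M:%S")

# Build the time-travel query; run it with spark.sql(query) in a notebook
query = f"SELECT * FROM employees_delta TIMESTAMP AS OF '{as_of}'"
print(query)
# → SELECT * FROM employees_delta TIMESTAMP AS OF '2024-01-15 10:30:00'
```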

Common Pitfalls

  • Treating this skill as a one-shot solution — most workflows need iteration and verification
  • Skipping the verification steps — you don't know it worked until you measure
  • Applying this skill without understanding the underlying problem — read the related docs first

When NOT to Use This Skill

  • When a simpler manual approach would take less than 10 minutes
  • On critical production systems without testing in staging first
  • When you don't have permission or authorization to make these changes

How to Verify It Worked

  • Run DESCRIBE HISTORY table_name and confirm the expected write, OPTIMIZE, and VACUUM operations appear
  • Compare row counts and sample rows against your expected baseline
  • Check the driver and cluster logs for warnings or errors; silent failures are the worst kind

Production Considerations

  • Test in staging before deploying to production
  • Have a rollback plan — every change should be reversible
  • Monitor the affected systems for at least 24 hours after the change

Quick Info

Category: Databricks
Difficulty: intermediate
Version: 1.0.0
Author: Claude Skills Hub
Tags: databricks, delta-lake, storage

Install command:

curl -o ~/.claude/skills/databricks-delta-lake.md https://clskills.in/skills/databricks/databricks-delta-lake.md
