Build Delta Lake tables with ACID transactions, time travel, and optimization
You are a Databricks Delta Lake specialist. The user wants to build Delta Lake tables with ACID transactions, time travel, and optimization capabilities.
What to check first
- Verify the Databricks cluster is running and a notebook is attached (`/Clusters` in the Databricks UI)
- Confirm the Delta package is available: run `%python import delta` in a cell
- Check your data location: `dbutils.fs.ls("/user/hive/warehouse")` to see existing databases (a combined preflight snippet follows this list)
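A minimal preflight cell, assuming the standard notebook globals (`spark`, `dbutils`, `display`) that Databricks provides:

```python
# Preflight checks, assuming the standard Databricks notebook globals
import delta                     # raises ImportError if Delta Lake is missing

print(spark.version)             # confirms the attached cluster is live

# Inspect the default warehouse location for existing databases/tables
display(dbutils.fs.ls("/user/hive/warehouse"))
```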
Steps
- Create a Delta Lake table using `CREATE TABLE ... USING DELTA` in SQL, or `DataFrame.write.format("delta")` with `mode("overwrite")` in PySpark
- Configure partitioning with `PARTITIONED BY (column_name)` in SQL or `.partitionBy("column_name")` in PySpark to improve query performance
- Write data with `df.write.format("delta").mode("append").save("/path/to/delta/table")`, or use `INSERT INTO` for SQL operations
- ACID transactions are enabled by default (Delta handles this); verify with `DESCRIBE EXTENDED table_name`, which should show `Provider: delta`
- Query table history with `DESCRIBE HISTORY table_name` to see versions, timestamps, and operations
- Use time travel syntax `SELECT * FROM table_name TIMESTAMP AS OF '2024-01-15 10:30:00'` or `VERSION AS OF 5` to read previous versions
- Optimize storage by running `OPTIMIZE table_name` to compact small files, and `VACUUM table_name RETAIN 168 HOURS` to remove files from old versions
- Monitor table statistics: run `ANALYZE TABLE table_name COMPUTE STATISTICS` and `DESCRIBE FORMATTED table_name` (the history, time-travel, and maintenance commands are sketched after this list)
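A sketch of the history, time-travel, and maintenance steps run from PySpark via `spark.sql`; the table name `employees_delta` is an assumption carried over from the example below:

```python
# A named Delta table is assumed here; for a path-based table, use
# the form delta.`/path/to/delta/table` in place of the table name.
tbl = "employees_delta"

# Every committed write appends an entry to the transaction log
spark.sql(f"DESCRIBE HISTORY {tbl}") \
    .select("version", "timestamp", "operation").show(truncate=False)

# Time travel: query an earlier version of the table
spark.sql(f"SELECT * FROM {tbl} VERSION AS OF 0").show()

# Compact small files, then remove files no longer referenced by any
# version inside the retention window (168 hours = 7 days, the default;
# shorter windows require disabling the retention safety check).
spark.sql(f"OPTIMIZE {tbl}")
spark.sql(f"VACUUM {tbl} RETAIN 168 HOURS")
```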
Code
```python
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, TimestampType
from pyspark.sql.functions import current_timestamp
import datetime

# Create sample data
schema = StructType([
    StructField("id", IntegerType(), False),
    StructField("name", StringType(), True),
    StructField("salary", IntegerType(), True),
    StructField("department", StringType(), True),
    StructField("created_at", TimestampType(), True)
])

data = [
    (1, "Alice", 95000, "Engineering", datetime.datetime(2024, 1, 10)),
    (2, "Bob", 88000, "Sales", datetime.datetime(2024, 1, 11)),
    (3, "Carol", 105000, "Engineering", datetime.datetime(2024, 1, 12))
]
df = spark.createDataFrame(data, schema=schema)

# Write as a Delta Lake table (creates an ACID-compliant table)
delta_path = "/user/hive/warehouse/employees_delta"
df.write.format("delta").mode("overwrite").partitionBy("department").save(delta_path)
```

Note: this example was truncated in the source; see the GitHub repo for the latest full version.
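The truncated portion presumably went on to exercise the table. A minimal sketch of one plausible continuation, reusing `schema` and `delta_path` from above (the appended row is illustrative, not from the source):

```python
# Append one more row; Delta commits this atomically as version 1
more = [(4, "Dave", 91000, "Sales", datetime.datetime(2024, 1, 13))]
spark.createDataFrame(more, schema=schema) \
    .write.format("delta").mode("append").save(delta_path)

# Time travel through the DataFrame API: read the table as of version 0
v0 = spark.read.format("delta").option("versionAsOf", 0).load(delta_path)
print(v0.count())                                           # 3 rows before the append
print(spark.read.format("delta").load(delta_path).count())  # 4 rows now
```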
Common Pitfalls
- Treating this skill as a one-shot solution — most workflows need iteration and verification
- Skipping the verification steps — you don't know it worked until you measure
- Applying this skill without understanding the underlying problem — read the related docs first
When NOT to Use This Skill
- When a simpler manual approach would take less than 10 minutes
- On critical production systems without testing in staging first
- When you don't have permission or authorization to make these changes
How to Verify It Worked
- Run the verification steps documented above
- Compare the output against your expected baseline
- Check logs for any warnings or errors — silent failures are the worst kind
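For the path-based example table, one concrete check is that the newest history entry describes the operation you just ran (a sketch reusing `delta_path`; the expected operation names are an assumption for this workflow):

```python
from delta.tables import DeltaTable

# Newest history entry should match the operation just performed
last = DeltaTable.forPath(spark, delta_path).history(1) \
    .select("version", "operation", "operationMetrics").first()
print(last)
assert last["operation"] in ("WRITE", "OPTIMIZE", "VACUUM END"), last["operation"]
```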
Production Considerations
- Test in staging before deploying to production
- Have a rollback plan — every change should be reversible
- Monitor the affected systems for at least 24 hours after the change
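For Delta tables, the rollback plan can be `RESTORE`: roll the table back to a known-good version found via `DESCRIBE HISTORY` (a sketch; the version number is illustrative):

```python
# Roll back to a known-good version; RESTORE is itself logged as a new
# version, so the rollback stays auditable. It fails if the target
# version's files were already removed by VACUUM.
spark.sql("RESTORE TABLE delta.`/user/hive/warehouse/employees_delta` TO VERSION AS OF 1")
```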
Related Databricks Skills
Other Claude Code skills in the same category:
- Databricks Notebook: Write PySpark and SQL notebooks with widgets and visualizations
- Databricks ETL Pipeline: Build medallion architecture ETL pipelines (bronze/silver/gold)
- Databricks Unity Catalog: Configure Unity Catalog for data governance, lineage, and access control
- Databricks MLflow: Track experiments, register models, and deploy with MLflow
- Databricks Auto Loader: Ingest data incrementally with Auto Loader and cloud storage
- Databricks SQL Warehouse: Query and visualize data with Databricks SQL warehouses and dashboards
- Databricks Workflows: Orchestrate multi-task jobs with Databricks Workflows