Build Delta Lake tables with ACID transactions, time travel, and optimization
You are a Databricks Delta Lake specialist. The user wants to build Delta Lake tables with ACID transactions, time travel, and optimization capabilities.
What to check first
- Verify the Databricks cluster is running and a notebook is attached (`/Clusters` in the Databricks UI)
- Confirm the Delta package is available: run `%python import delta` in a cell
- Check your data location: `dbutils.fs.ls("/user/hive/warehouse")` to see existing databases (a combined preflight snippet follows this list)
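A minimal preflight cell, assuming the standard notebook globals (`spark`, `dbutils`, `display`) that Databricks provides:

```python
# Preflight checks, assuming the standard Databricks notebook globals
import delta                     # raises ImportError if Delta Lake is missing

print(spark.version)             # confirms the attached cluster is live

# Inspect the default warehouse location for existing databases/tables
display(dbutils.fs.ls("/user/hive/warehouse"))
```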
Steps
- Create a Delta Lake table using `CREATE TABLE ... USING DELTA` in SQL, or `DataFrame.write.format("delta")` with `mode("overwrite")` in PySpark
- Configure partitioning with `PARTITIONED BY (column_name)` in SQL or `.partitionBy("column_name")` in PySpark to improve query performance
- Write data with `df.write.format("delta").mode("append").save("/path/to/delta/table")`, or use `INSERT INTO` for SQL operations
- ACID transactions are enabled by default (Delta handles this); verify with `DESCRIBE EXTENDED table_name`, which should show `Provider: delta`
- Query table history with `DESCRIBE HISTORY table_name` to see versions, timestamps, and operations
- Use time travel syntax `SELECT * FROM table_name TIMESTAMP AS OF '2024-01-15 10:30:00'` or `VERSION AS OF 5` to read previous versions
- Optimize storage by running `OPTIMIZE table_name` to compact small files, and `VACUUM table_name RETAIN 168 HOURS` to remove files from old versions
- Monitor table statistics: run `ANALYZE TABLE table_name COMPUTE STATISTICS` and `DESCRIBE FORMATTED table_name` (the history, time-travel, and maintenance commands are sketched after this list)
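A sketch of the history, time-travel, and maintenance steps run from PySpark via `spark.sql`; the table name `employees_delta` is an assumption carried over from the example below:

```python
# A named Delta table is assumed here; for a path-based table, use
# the form delta.`/path/to/delta/table` in place of the table name.
tbl = "employees_delta"

# Every committed write appends an entry to the transaction log
spark.sql(f"DESCRIBE HISTORY {tbl}") \
    .select("version", "timestamp", "operation").show(truncate=False)

# Time travel: query an earlier version of the table
spark.sql(f"SELECT * FROM {tbl} VERSION AS OF 0").show()

# Compact small files, then remove files no longer referenced by any
# version inside the retention window (168 hours = 7 days, the default;
# shorter windows require disabling the retention safety check).
spark.sql(f"OPTIMIZE {tbl}")
spark.sql(f"VACUUM {tbl} RETAIN 168 HOURS")
```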
Code
```python
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, TimestampType
from pyspark.sql.functions import current_timestamp
import datetime

# Create sample data
schema = StructType([
    StructField("id", IntegerType(), False),
    StructField("name", StringType(), True),
    StructField("salary", IntegerType(), True),
    StructField("department", StringType(), True),
    StructField("created_at", TimestampType(), True)
])

data = [
    (1, "Alice", 95000, "Engineering", datetime.datetime(2024, 1, 10)),
    (2, "Bob", 88000, "Sales", datetime.datetime(2024, 1, 11)),
    (3, "Carol", 105000, "Engineering", datetime.datetime(2024, 1, 12))
]
df = spark.createDataFrame(data, schema=schema)

# Write as a Delta Lake table (creates an ACID-compliant table)
delta_path = "/user/hive/warehouse/employees_delta"
df.write.format("delta").mode("overwrite").partitionBy("department").save(delta_path)
```

Note: this example was truncated in the source; see the GitHub repo for the latest full version.
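The truncated portion presumably went on to exercise the table. A minimal sketch of one plausible continuation, reusing `schema` and `delta_path` from above (the appended row is illustrative, not from the source):

```python
# Append one more row; Delta commits this atomically as version 1
more = [(4, "Dave", 91000, "Sales", datetime.datetime(2024, 1, 13))]
spark.createDataFrame(more, schema=schema) \
    .write.format("delta").mode("append").save(delta_path)

# Time travel through the DataFrame API: read the table as of version 0
v0 = spark.read.format("delta").option("versionAsOf", 0).load(delta_path)
print(v0.count())                                           # 3 rows before the append
print(spark.read.format("delta").load(delta_path).count())  # 4 rows now
```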
Common Pitfalls
- Treating this skill as a one-shot solution — most workflows need iteration and verification
- Skipping the verification steps — you don't know it worked until you measure
- Applying this skill without understanding the underlying problem — read the related docs first
When NOT to Use This Skill
- When a simpler manual approach would take less than 10 minutes
- On critical production systems without testing in staging first
- When you don't have permission or authorization to make these changes
How to Verify It Worked
- Run the verification steps documented above
- Compare the output against your expected baseline
- Check logs for any warnings or errors — silent failures are the worst kind
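For the path-based example table, one concrete check is that the newest history entry describes the operation you just ran (a sketch reusing `delta_path`; the expected operation names are an assumption for this workflow):

```python
from delta.tables import DeltaTable

# Newest history entry should match the operation just performed
last = DeltaTable.forPath(spark, delta_path).history(1) \
    .select("version", "operation", "operationMetrics").first()
print(last)
assert last["operation"] in ("WRITE", "OPTIMIZE", "VACUUM END"), last["operation"]
```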
Production Considerations
- Test in staging before deploying to production
- Have a rollback plan — every change should be reversible
- Monitor the affected systems for at least 24 hours after the change
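For Delta tables, the rollback plan can be `RESTORE`: roll the table back to a known-good version found via `DESCRIBE HISTORY` (a sketch; the version number is illustrative):

```python
# Roll back to a known-good version; RESTORE is itself logged as a new
# version, so the rollback stays auditable. It fails if the target
# version's files were already removed by VACUUM.
spark.sql("RESTORE TABLE delta.`/user/hive/warehouse/employees_delta` TO VERSION AS OF 1")
```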
Related Databricks Skills
Other Claude Code skills in the same category:
- Databricks Notebook: Write PySpark and SQL notebooks with widgets and visualizations
- Databricks ETL Pipeline: Build medallion architecture ETL pipelines (bronze/silver/gold)
- Databricks Unity Catalog: Configure Unity Catalog for data governance, lineage, and access control
- Databricks MLflow: Track experiments, register models, and deploy with MLflow
- Databricks Auto Loader: Ingest data incrementally with Auto Loader and cloud storage
- Databricks SQL Warehouse: Query and visualize data with Databricks SQL warehouses and dashboards
- Databricks Workflows: Orchestrate multi-task jobs with Databricks Workflows