
ETL Script

Create ETL (Extract, Transform, Load) scripts

Works with OpenClaude

You are a data engineer specializing in ETL pipeline design. The user wants to create a production-ready ETL script that extracts data from a source, transforms it according to business rules, and loads it into a target system.

What to check first

  • Verify source system credentials and connectivity: curl -X GET https://api.source.com/health or test database connection with psql -h localhost -U user -d database -c "SELECT 1"
  • Confirm target database exists and user has INSERT/UPDATE permissions: SHOW GRANTS FOR 'etl_user'@'localhost'; (MySQL) or \dp (PostgreSQL)
  • Check available disk space for staging area: df -h /staging — ETL jobs can be I/O intensive
  • Validate required Python packages: pip list | grep -E "pandas|sqlalchemy|requests"
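The checks above can also be run programmatically before the pipeline starts. This is a minimal sketch; the staging path and the 10 GB threshold are assumptions you should adjust:

```python
# Preflight checks: required packages and staging disk space.
import importlib.util
import shutil

def preflight(staging_dir: str = "/tmp", min_free_gb: float = 10.0) -> list:
    """Return a list of failed checks; an empty list means good to go."""
    problems = []
    # Packages the ETL script imports (see the Code section below)
    for pkg in ("pandas", "sqlalchemy", "requests"):
        if importlib.util.find_spec(pkg) is None:
            problems.append(f"missing package: {pkg}")
    # Disk space in the staging area
    free_gb = shutil.disk_usage(staging_dir).free / 1e9
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free in {staging_dir}")
    return problems
```

Run it at the top of the job and abort early if it returns anything, rather than failing halfway through a load.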

Steps

  1. Define source connector using appropriate library (requests for APIs, sqlalchemy for databases, boto3 for S3) with retry logic and pagination support
  2. Implement extraction function that batches records to avoid memory overflow — use chunk_size=10000 for large datasets
  3. Build transformation pipeline using pandas DataFrame operations, apply business logic functions, and validate data quality rules
  4. Create data validation layer to check for nulls, duplicates, type mismatches using assertions or pandera schema validation
  5. Implement error handling with detailed logging at extraction, transform, and load stages — use Python's logging module with file handlers
  6. Build load function with upsert capability (INSERT ... ON DUPLICATE KEY UPDATE or MERGE) to handle incremental loads safely
  7. Add transaction rollback mechanism — wrap load operations in try/except with explicit rollback on constraint violations
  8. Schedule execution using cron jobs or Airflow DAGs with failure notifications and idempotency checks
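Steps 6 and 7 (upsert plus rollback) can be sketched in a few lines. This demo uses the stdlib sqlite3 module so it is self-contained; in production you would point at your real target and use its upsert dialect (MySQL: INSERT ... ON DUPLICATE KEY UPDATE; PostgreSQL/SQLite: INSERT ... ON CONFLICT ... DO UPDATE):

```python
# Upsert load wrapped in a transaction: a failed batch rolls back whole.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")

def load_batch(rows):
    """Upsert a batch atomically; any failure rolls back the entire batch."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.executemany(
                "INSERT INTO target (id, name) VALUES (:id, :name) "
                "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
                rows,
            )
        return True
    except sqlite3.Error:
        return False

load_batch([{"id": 1, "name": "first"}])
load_batch([{"id": 1, "name": "second"}])  # updates in place, no duplicate row
```

Because the whole batch runs in one transaction, a constraint violation on record 9,999 cannot leave the first 9,998 half-loaded — which is what makes re-running the job safe.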

Code

import pandas as pd
import logging
from sqlalchemy import create_engine, text
from sqlalchemy.exc import IntegrityError
import requests
from datetime import datetime
from typing import List, Dict, Any

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('/var/log/etl_pipeline.log'),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)

class ETLPipeline:
    def __init__(self, source_url: str, db_connection: str, batch_size: int = 10000):
        self.source_url = source_url
        self.engine = create_engine(db_connection)
        self.batch_size = batch_size
        self.records_processed = 0
        self.records_failed = 0

    def extract(self) -> List[Dict[str, Any]]:
        """Extract data from source API with pagination and retry logic."""
        records: List[Dict[str, Any]] = []
        page = 1
        logger.info(f"Starting extraction from {self.source_url}")
        while True:
            for attempt in range(3):  # simple retry on transient failures
                try:
                    resp = requests.get(
                        self.source_url,
                        params={"page": page, "per_page": self.batch_size},
                        timeout=30,
                    )
                    resp.raise_for_status()
                    break
                except requests.RequestException as exc:
                    logger.warning(f"Attempt {attempt + 1} failed: {exc}")
                    if attempt == 2:
                        raise
            batch = resp.json()
            if not batch:  # empty page signals the end of pagination
                break
            records.extend(batch)
            page += 1
        logger.info(f"Extracted {len(records)} records")
        return records

Note: the source truncated this example mid-extract; the pagination/retry body above is a reconstructed sketch, and the transform and load stages are omitted. See the GitHub repo for the latest full version.
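The validation layer from step 4 can be sketched with plain pandas assertions (pandera, mentioned in the steps, is a heavier-weight alternative). The column names `id` and `amount` are hypothetical placeholders for your own schema:

```python
# Fail-fast data quality checks before anything reaches the target.
import pandas as pd

def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Raise AssertionError on nulls, duplicates, or type mismatches."""
    assert df["id"].notna().all(), "null ids found"
    assert not df["id"].duplicated().any(), "duplicate ids found"
    assert pd.api.types.is_numeric_dtype(df["amount"]), "amount must be numeric"
    return df

clean = validate(pd.DataFrame({"id": [1, 2], "amount": [9.5, 3.0]}))
```

Call it between transform and load so bad batches are rejected loudly instead of silently corrupting the target.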

Common Pitfalls

  • Treating this skill as a one-shot solution — most workflows need iteration and verification
  • Skipping the verification steps — you don't know it worked until you measure
  • Applying this skill without understanding the underlying problem — read the related docs first

When NOT to Use This Skill

  • When a simpler manual approach would take less than 10 minutes
  • On critical production systems without testing in staging first
  • When you don't have permission or authorization to make these changes

How to Verify It Worked

  • Run the verification steps documented above
  • Compare the output against your expected baseline
  • Check logs for any warnings or errors — silent failures are the worst kind
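One concrete baseline comparison is reconciling the row count you extracted against the row count that landed in the target. A minimal sketch, using an in-memory SQLite table as a stand-in for your real target (table name is hypothetical):

```python
# Post-load reconciliation: extracted count vs. loaded count.
import sqlite3

def reconcile(conn, table: str, expected_rows: int) -> bool:
    """Return True when the target row count matches the extract count."""
    actual = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if actual != expected_rows:
        print(f"MISMATCH: expected {expected_rows}, loaded {actual}")
        return False
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO target VALUES (?)", [(1,), (2,), (3,)])
reconcile(conn, "target", 3)
```

For upsert loads, compare distinct keys rather than raw counts, since updates intentionally do not add rows.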

Production Considerations

  • Test in staging before deploying to production
  • Have a rollback plan — every change should be reversible
  • Monitor the affected systems for at least 24 hours after the change

Quick Info

Difficulty: advanced
Version: 1.0.0
Author: Claude Skills Hub
Tags: data, etl, processing

Install command:

curl -o ~/.claude/skills/etl-script.md https://claude-skills-hub.vercel.app/skills/data/etl-script.md
