Parse and process CSV files
You are a data processing engineer. The user wants to parse and process CSV files efficiently with proper error handling and data transformation.
What to check first
- Verify the CSV file exists and is readable: ls -la your_file.csv
- Check if you need to handle special cases: quoted fields, escaped commas, different delimiters, or multiline values
- Determine if the CSV has headers in the first row or if you need to specify column names manually (a sniffing sketch for both checks follows this list)
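If you are unsure about the delimiter or whether the file starts with a header row, the standard-library csv.Sniffer can make a reasonable guess from a sample of the file. A minimal sketch, where data.csv is just a placeholder filename:

import csv

# Read a sample of the file and let Sniffer guess the dialect and header presence.
with open('data.csv', 'r', encoding='utf-8', newline='') as f:
    sample = f.read(4096)
    dialect = csv.Sniffer().sniff(sample)         # detects delimiter, quoting, etc.
    has_header = csv.Sniffer().has_header(sample)

print(f"Delimiter: {dialect.delimiter!r}, header row: {has_header}")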
Steps
- Import the csv module (built-in for Python) or install a CSV library like papaparse for JavaScript
- Open the CSV file using a file stream to handle large files efficiently
- Create a CSV reader object specifying the delimiter (comma by default, but could be semicolon, tab, or pipe)
- Configure dialect settings for quote characters, escape characters, and newline handling
- Read rows one at a time or load all rows into memory based on file size (a streaming sketch follows this list)
- Convert each row into a dictionary with headers as keys for easy field access
- Apply data transformations: type conversion, filtering, cleaning whitespace, or calculating derived fields
- Write processed data to an output CSV or return as structured data
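For files too large to load at once, the reading, transformation, and writing steps above can be combined into a single streaming pass: read one row at a time with DictReader, clean and convert values, add a derived field, and write straight to the output file. A minimal sketch; the file names and the price and quantity columns are chosen purely for illustration:

import csv

# Stream rows from input to output without holding the whole file in memory.
with open('input.csv', 'r', encoding='utf-8', newline='') as src, \
     open('output.csv', 'w', encoding='utf-8', newline='') as dst:
    reader = csv.DictReader(src)
    fieldnames = reader.fieldnames + ['total']    # add a derived column
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row = {k: v.strip() for k, v in row.items()}                # clean whitespace
        row['total'] = float(row['price']) * int(row['quantity'])   # derived field
        writer.writerow(row)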
Code
import csv
from pathlib import Path
from typing import Any, Dict, List, Optional


class CSVParser:
    def __init__(self, filepath: str, delimiter: str = ',', encoding: str = 'utf-8'):
        self.filepath = Path(filepath)
        self.delimiter = delimiter
        self.encoding = encoding
        self.data: List[Dict[str, Any]] = []
        self.headers: List[str] = []

    def parse(self) -> List[Dict[str, Any]]:
        """Parse the CSV file and return a list of dictionaries keyed by header."""
        try:
            with open(self.filepath, 'r', encoding=self.encoding, newline='') as file:
                reader = csv.DictReader(
                    file,
                    delimiter=self.delimiter,
                    quotechar='"',
                    skipinitialspace=True
                )
                self.headers = list(reader.fieldnames or [])
                self.data = [row for row in reader]
            return self.data
        except FileNotFoundError:
            print(f"Error: File {self.filepath} not found")
            return []
        except Exception as e:
            print(f"Error parsing CSV: {e}")
            return []

    def filter_rows(self, column: str, value: Any) -> List[Dict[str, Any]]:
        """Return the rows where the given column equals the given value."""
        return [row for row in self.data if row.get(column) == value]

    def transform_column(self, column: str, func) -> None:
        """Apply a transformation function to every value in a column."""
        for row in self.data:
            if column in row:
                row[column] = func(row[column])

    def write_csv(self, output_path: str, fieldnames: Optional[List[str]] = None) -> None:
        """Write the (possibly transformed) rows to an output CSV file."""
        # The original example was truncated at this point; the body below is a
        # plausible reconstruction using csv.DictWriter.
        fieldnames = fieldnames or self.headers
        with open(output_path, 'w', encoding=self.encoding, newline='') as file:
            writer = csv.DictWriter(file, fieldnames=fieldnames, delimiter=self.delimiter)
            writer.writeheader()
            writer.writerows(self.data)

Note: this example was truncated in the source; the write_csv body above is a reconstruction. See the GitHub repo for the latest full version.
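A typical way to use the class above, assuming a hypothetical sales.csv with amount and region columns (names chosen only for illustration):

parser = CSVParser('sales.csv')
rows = parser.parse()
print(f"Parsed {len(rows)} rows with columns: {parser.headers}")

# Convert the amount column to float, then pull out rows for one region.
parser.transform_column('amount', float)
west_rows = parser.filter_rows('region', 'West')
print(f"{len(west_rows)} rows for region 'West'")

# Write the transformed data back out.
parser.write_csv('sales_clean.csv')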
Common Pitfalls
- Treating this skill as a one-shot solution — most workflows need iteration and verification
- Skipping the verification steps — you don't know it worked until you measure
- Applying this skill without understanding the underlying problem — read the related docs first
When NOT to Use This Skill
- When a simpler manual approach would take less than 10 minutes
- On critical production systems without testing in staging first
- When you don't have permission or authorization to make these changes
How to Verify It Worked
- Run the verification steps documented above
- Compare the output against your expected baseline (for CSV work, a row-count check like the one sketched below is a simple baseline)
- Check logs for any warnings or errors — silent failures are the worst kind
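One concrete check for CSV processing, reusing the input and output paths from the usage example above (those file names are assumptions): confirm the output has the expected number of data rows.

import csv

def count_rows(path: str) -> int:
    """Count data rows (excluding the header) in a CSV file."""
    with open(path, 'r', encoding='utf-8', newline='') as f:
        return sum(1 for _ in csv.DictReader(f))

assert count_rows('sales_clean.csv') == count_rows('sales.csv'), "row count changed unexpectedly"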
Production Considerations
- Test in staging before deploying to production
- Have a rollback plan — every change should be reversible
- Monitor the affected systems for at least 24 hours after the change
Related Data & Analytics Skills
Other Claude Code skills in the same category — free to download.
Data Transformer
Transform data between formats (JSON, XML, CSV)
Analytics Setup
Set up analytics tracking (GA4, Mixpanel, PostHog)
Data Pipeline
Create data processing pipeline
Report Generator
Generate reports from data
Chart Creator
Create charts and visualizations (Chart.js, D3)
Data Exporter
Export data in multiple formats
ETL Script
Create ETL (Extract, Transform, Load) scripts
Data Validator
Validate data integrity and format
Want a Data & Analytics skill personalized to YOUR project?
This is a generic skill that works for everyone. Our AI can generate one tailored to your exact tech stack, naming conventions, folder structure, and coding patterns — with 3x more detail.