senior-computer-vision

Production computer vision engineering skill for object detection, image segmentation, and visual AI system deployment.

Quick Start

# Generate training configuration for YOLO or Faster R-CNN
python scripts/vision_model_trainer.py models/ --task detection --arch yolov8

# Analyze model for optimization opportunities (quantization, pruning)
python scripts/inference_optimizer.py model.pt --target onnx --benchmark

# Build dataset pipeline with augmentations
python scripts/dataset_pipeline_builder.py images/ --format coco --augment

Core Expertise

This skill provides guidance on:

  • Object Detection: YOLO family (v5-v11), Faster R-CNN, DETR, RT-DETR
  • Instance Segmentation: Mask R-CNN, YOLACT, SOLOv2
  • Semantic Segmentation: DeepLabV3+, SegFormer
  • Promptable Segmentation: SAM (Segment Anything)
  • Image Classification: ResNet, EfficientNet, Vision Transformers (ViT, DeiT)
  • Video Analysis: Object tracking (ByteTrack, SORT), action recognition
  • 3D Vision: Depth estimation, point cloud processing, NeRF
  • Production Deployment: ONNX, TensorRT, OpenVINO, CoreML

Tech Stack

| Category | Technologies |
| --- | --- |
| Frameworks | PyTorch, torchvision, timm |
| Detection | Ultralytics (YOLO), Detectron2, MMDetection |
| Segmentation | segment-anything, mmsegmentation |
| Optimization | ONNX, TensorRT, OpenVINO, torch.compile |
| Image Processing | OpenCV, Pillow, albumentations |
| Annotation | CVAT, Label Studio, Roboflow |
| Experiment Tracking | MLflow, Weights & Biases |
| Serving | Triton Inference Server, TorchServe |

Workflow 1: Object Detection Pipeline

Use this workflow when building an object detection system from scratch.

Step 1: Define Detection Requirements

Analyze the detection task requirements:

Detection Requirements Analysis:
- Target objects: [list specific classes to detect]
- Real-time requirement: [yes/no, target FPS]
- Accuracy priority: [speed vs accuracy trade-off]
- Deployment target: [cloud GPU, edge device, mobile]
- Dataset size: [number of images, annotations per class]

Step 2: Select Detection Architecture

Choose architecture based on requirements:

| Requirement | Recommended Architecture | Why |
| --- | --- | --- |
| Real-time (>30 FPS) | YOLOv8/v11, RT-DETR | Single-stage, optimized for speed |
| High accuracy | Faster R-CNN, DINO | Two-stage (Faster R-CNN) or transformer (DINO), better localization |
| Small objects | YOLO + SAHI, Faster R-CNN + FPN | Multi-scale detection |
| Edge deployment | YOLOv8n, MobileNetV3-SSD | Lightweight architectures |
| Transformer-based | DETR, DINO, RT-DETR | End-to-end, no NMS required |
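
The "no NMS required" entry refers to non-maximum suppression, the post-processing step that most CNN detectors run to drop overlapping duplicate boxes. A minimal pure-Python sketch of greedy NMS follows; the function names and the 0.5 threshold are illustrative assumptions, and production code would normally use a vectorized library routine instead.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corner format, pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.5) -> List[int]:
    """Greedy NMS: keep a box only if it does not overlap an already kept,
    higher-scoring box by more than iou_thresh. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep
```

DETR-style models avoid this step because they are trained to emit one prediction per object.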

Step 3: Prepare Dataset

Convert annotations to required format:

# COCO format (recommended)
python scripts/dataset_pipeline_builder.py data/images/ \
    --annotations data/labels/ \
    --format coco \
    --split 0.8 0.1 0.1 \
    --output data/coco/

# Verify dataset
python -c "from pycocotools.coco import COCO; coco = COCO('data/coco/train.json'); print(f'Images: {len(coco.imgs)}, Categories: {len(coco.cats)}')"
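
Counting images and categories does not catch broken references. As a rough sketch (standard library only, with hypothetical function names), the following checks that every annotation points at a known image and category and that its `[x, y, width, height]` box lies inside the image:

```python
import json
from typing import Any, Dict, List


def check_coco(coco: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a COCO-style dict."""
    problems: List[str] = []
    images = {img["id"]: img for img in coco.get("images", [])}
    cat_ids = {cat["id"] for cat in coco.get("categories", [])}
    for ann in coco.get("annotations", []):
        img = images.get(ann["image_id"])
        if img is None:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
            continue
        if ann["category_id"] not in cat_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann['id']}: non-positive box size")
        elif x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
            problems.append(f"annotation {ann['id']}: box extends outside the image")
    return problems


def check_coco_file(path: str) -> List[str]:
    """Load a COCO JSON file and run the same checks."""
    with open(path, "r", encoding="utf-8") as fh:
        return check_coco(json.load(fh))
```

For example, `check_coco_file('data/coco/train.json')` returns an empty list when no problems are found.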

Step 4: Configure Training

Generate training configuration:

# For Ultralytics YOLO
python scripts/vision_model_trainer.py data/coco/ \
    --task detection \
    --arch yolov8m \
    --epochs 100 \
    --batch 16 \
    --imgsz 640 \
    --output configs/

# For Detectron2
python scripts/vision_model_trainer.py data/coco/ \
    --task detection \
    --arch faster_rcnn_R_50_FPN \
    --framework detectron2 \
    --output configs/

Step 5: Train and Validate

# Ultralytics training
yolo detect train data=data.yaml model=yolov8m.pt epochs=100 imgsz=640

# Detectron2 training
python train_net.py --config-file configs/faster_rcnn.yaml --num-gpus 1

# Validate on test set
yolo detect val model=runs/detect/train/weights/best.pt data=data.yaml split=test

Step 6: Evaluate Results

Key metrics to analyze:

| Metric | Target | Description |
| --- | --- | --- |
| mAP@50 | >0.7 | Mean Average Precision at IoU 0.5 |
| mAP@50:95 | >0.5 | COCO primary metric |
| Precision | >0.8 | Low false positives |
| Recall | >0.8 | Low missed detections |
| Inference time | <33ms | For 30 FPS real-time |
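
To make the targets concrete, here is a small sketch of how precision, recall, and the per-frame time budget follow from raw counts. The helper names are hypothetical; the counts are assumed to come from matching predictions to ground truth at a fixed IoU threshold.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted boxes that are correct (few false positives -> high)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of ground-truth objects that were found (few misses -> high)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def frame_budget_ms(target_fps: float) -> float:
    """Maximum end-to-end time per frame for a given frame rate."""
    return 1000.0 / target_fps
```

At 30 FPS the budget is 1000 / 30, roughly 33.3 ms per frame, which is where the `<33ms` target comes from.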

Workflow 2: Model Optimization and Deployment

Use this workflow when preparing a trained model for production deployment.

Step 1: Benchmark Baseline Performance

# Measure current model performance
python scripts/inference_optimizer.py model.pt \
    --benchmark \
    --input-size 640 640 \
    --batch-sizes 1 4 8 16 \
    --warmup 10 \
    --iterations 100

Expected output:

Baseline Performance (PyTorch FP32):
- Batch 1: 45.2ms (22.1 FPS)
- Batch 4: 89.4ms (44.7 FPS)
- Batch 8: 165.3ms (48.4 FPS)
- Memory: 2.1 GB
- Parameters: 25.9M
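
The FPS figures are derived from batch latency as `batch_size * 1000 / latency_ms`. The sketch below shows that arithmetic alongside a generic warmup-then-measure loop. It is not the implementation of `inference_optimizer.py`; the names are assumptions, and for GPU models the callable would also need to synchronize the device (for example with `torch.cuda.synchronize()`) before returning.

```python
import time
from statistics import mean
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 10, iterations: int = 100) -> float:
    """Mean wall-clock latency of fn() in milliseconds, measured after warmup calls."""
    for _ in range(warmup):
        fn()  # warmup runs are discarded (caches, lazy initialization)
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


def throughput_fps(latency_ms: float, batch_size: int) -> float:
    """Images per second given the latency of one batch."""
    return batch_size * 1000.0 / latency_ms
```

With the numbers above, `throughput_fps(89.4, 4)` gives about 44.7 FPS: larger batches raise throughput even though each batch takes longer.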

Step 2: Select Optimization Strategy

| Deployment Target | Optimization Path |
| --- | --- |
| NVIDIA GPU (cloud) | PyTorch → ONNX → TensorRT FP16 |
| NVIDIA GPU (edge) | PyTorch → TensorRT INT8 |
| Intel CPU | PyTorch → ONNX → OpenVINO |
| Apple Silicon | PyTorch → CoreML |
| Generic CPU | PyTorch → ONNX Runtime |
| Mobile | PyTorch → TFLite or ONNX Runtime Mobile |

Step 3: Export to ONNX

# Export with dynamic batch size
python scripts/inference_optimizer.py model.pt \
    --export onnx \
    --input-size 640 640 \
    --dynamic-batch \
    --simplify \
    --output model.onnx

# Verify ONNX model
python -c "import onnx; model = onnx.load('model.onnx'); onnx.checker.check_model(model); print('ONNX model valid')"

Step 4: Apply Quantization (Optional)

For INT8 quantization with calibration:

# Quantize to INT8 using a calibration dataset
python scripts/inference_optimizer.py model.onnx \
    --quantize int8 \
    --calibration-data data/calibration/ \
    --calibration-samples 500 \
    --output model_int8.onnx

Quantization impact analysis:

| Precision | Size | Speed | Accuracy Drop |
| --- | --- | --- | --- |
| FP32 | 100% | 1x | 0% |
| FP16 | 50% | 1.5-2x | <0.5% |
| INT8 | 25% | 2-4x | 1-3% |
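
The accuracy drop comes from rounding values onto a coarse integer grid whose scale is chosen from calibration data. The sketch below illustrates symmetric per-tensor INT8 quantization in plain Python. It is a simplified assumption about the scheme: real toolchains typically use per-channel scales and more robust range estimates than the raw maximum.

```python
from typing import List


def calibrate_scale(samples: List[float]) -> float:
    """Symmetric scale: map the largest absolute calibration value to 127."""
    max_abs = max(abs(v) for v in samples)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values: List[float], scale: float) -> List[int]:
    """Round to the nearest integer step and clip to the INT8 range."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integers back to approximate float values."""
    return [v * scale for v in q]
```

Values inside the calibrated range come back with an error of at most half a step (`scale / 2`); values outside it are clipped, which is why the calibration set should resemble production inputs.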

Step 5: Convert to Target Runtime

# TensorRT (NVIDIA GPU)
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16

# OpenVINO (Intel); `mo` is the legacy Model Optimizer CLI, newer OpenVINO releases provide `ovc` instead
mo --input_model model.onnx --output_dir openvino/

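# CoreML (Apple); recent coremltools releases no longer accept ONNX input, so convert from TorchScript.
# Assumes a hypothetical model.torchscript traced with a 1x3x640x640 input.
python -c "import torch, coremltools as ct; ts = torch.jit.load('model.torchscript').eval(); ct.convert(ts, inputs=[ct.TensorType(shape=(1, 3, 640, 640))], convert_to='mlprogram').save('model.mlpackage')"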

Step 6: Benchmark Optimized Model

python scripts/inference_optimizer.py model.engine \
    --benchmark \
    --runtime tensorrt \
    --compare model.pt

Expected speedup:

Optimization Results:
- Original (PyTorch FP32): 45.2ms
- Optimized (TensorRT FP16): 12.8ms
- Speedup: 3.5x
- Accuracy change: -0.3% mAP

Workflow 3: Custom Dataset Preparation

Use this workflow when preparing a computer vision dataset for training.

Step 1: Audit Raw Data

# Analyze image dataset
python scripts/dataset_pipeline_builder.py data/raw/ \
    --analyze \
    --output analysis/

Analysis report includes:

Dataset Analysis:
- Total images: 5,234
- Image sizes: 640x480 to 4096x3072 (variable)
- Formats: JPEG (4,891), PNG (343)
- Corrupted: 12 files
- Duplicates: 45 pairs

Annotation Analysis:
- Format detected: Pascal VOC XML
- Total annotations: 28,456
- Classes: 5 (car, person, bicycle, dog, cat)
- Distribution: car (12,340), person (8,234), bicycle (3,456), dog (2,890), cat (1,536)
- Empty images: 234

Step 2: Clean and Validate

# Remove corrupted and duplicate images
python scripts/dataset_pipeline_builder.py data/raw/ \
    --clean \
    --remove-corrupted \
    --remove-duplicates \
    --output data/cleaned/
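
For intuition, exact duplicates can be found by hashing file contents. The following is a minimal sketch, not the script's actual logic; `find_exact_duplicates` is a hypothetical helper. It catches only byte-identical files: resized or re-encoded copies would need a perceptual hash.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def find_exact_duplicates(root: Path) -> List[List[Path]]:
    """Group files under root whose bytes are identical (same SHA-256 digest)."""
    by_digest: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Only digests seen more than once are duplicates
    return [group for group in by_digest.values() if len(group) > 1]
```

Each returned group lists the paths sharing one digest, so keeping the first entry and deleting the rest removes the duplicates.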

Step 3: Convert Annotation Format

# Convert VOC to COCO format
python scripts/dataset_pipeline_builder.py data/cleaned/ \
    --annotations data/annotations/ \
    --input-format voc \
    --output-format coco \
    --output data/coco/

Supported format conversions:

| From | To |
| --- | --- |
| Pascal VOC XML | COCO JSON |
| YOLO TXT | COCO JSON |
| COCO JSON | YOLO TXT |
| LabelMe JSON | COCO JSON |
| CVAT XML | COCO JSON |
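
The formats differ mainly in how a box is encoded: VOC stores corners `(xmin, ymin, xmax, ymax)`, COCO stores `(x, y, width, height)` in pixels, and YOLO stores a normalized center and size. A sketch of the box arithmetic, with hypothetical function names and ignoring the 1-pixel offset used by some VOC tools:

```python
from typing import Tuple

Box4 = Tuple[float, float, float, float]


def voc_to_coco(xmin: float, ymin: float, xmax: float, ymax: float) -> Box4:
    """Corners -> top-left corner plus width and height (pixels)."""
    return (xmin, ymin, xmax - xmin, ymax - ymin)


def coco_to_yolo(box: Box4, img_w: int, img_h: int) -> Box4:
    """Pixel (x, y, w, h) -> normalized (cx, cy, w, h) in [0, 1]."""
    x, y, w, h = box
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


def yolo_to_coco(box: Box4, img_w: int, img_h: int) -> Box4:
    """Normalized (cx, cy, w, h) -> pixel (x, y, w, h)."""
    cx, cy, w, h = box
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h, w * img_w, h * img_h)
```

Converting to or from YOLO needs the image dimensions, which is why the image files (or their recorded sizes) must be available during conversion.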

Step 4: Apply Augmentations

# Apply the augmentations defined in the config
python scripts/dataset_pipeline_builder.py data/coco/ \
    --augment \
    --aug-config configs/augmentation.yaml \
    --output data/augmented/

Recommended augmentations for detection:

# configs/augmentation.yaml
augmentations:
  geometric:
    - horizontal_flip: { p: 0.5 }
    - vertical_flip: { p: 0.1 }  # Only if orientation invariant
    - rotate: { limit: 15, p: 0.3 }
    - scale: { scale_limit: 0.2, p: 0.5 }

  color:
    - brightness_contrast: { brightness_limit: 0.2, contrast_limit: 0.2, p: 0.5 }
    - hue_saturation: { hue_shift_limit: 20, sat_shift_limit: 30, p: 0.3 }
    - blur: { blur_limit: 3, p: 0.1 }

  advanced:
    - mosaic: { p: 0.5 }  # YOLO-style mosaic
    - mixup: { p: 0.1 }   # Image mixing
    - cutout: { num_holes: 8, max_h_size: 32, max_w_size: 32, p: 0.3 }
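
Geometric augmentations have to transform the bounding boxes together with the pixels. The sketch below shows this for the horizontal flip, using COCO-style `(x, y, w, h)` boxes. The helper names are hypothetical, and libraries such as albumentations handle this bookkeeping automatically.

```python
import random
from typing import List, Tuple

BoxXYWH = Tuple[float, float, float, float]


def hflip_box(box: BoxXYWH, img_w: float) -> BoxXYWH:
    """Mirror a box around the vertical center line of an image of width img_w."""
    x, y, w, h = box
    return (img_w - x - w, y, w, h)


def maybe_hflip(
    boxes: List[BoxXYWH], img_w: float, p: float, rng: random.Random
) -> Tuple[List[BoxXYWH], bool]:
    """Flip all boxes of one image with probability p.
    Returns the boxes and whether a flip happened, so the caller can
    apply the same flip to the image pixels."""
    if rng.random() < p:
        return [hflip_box(b, img_w) for b in boxes], True
    return list(boxes), False
```

The `p` value plays the same role as `p: 0.5` in the config above: the chance that the transform is applied to a given sample.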

Step 5: Create Train/Val/Test Splits

python scripts/dataset_pipeline_builder.py data/augmented/ \
    --split 0.8 0.1 0.1 \
    --stratify \
    --seed 42 \
    --output data/final/

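Split strategy guidelines (if augmented copies were written to disk in Step 4, keep every copy of an image in the same split so near-identical images do not leak into validation or test):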

| Dataset Size | Train | Val | Test |
| --- | --- | --- | --- |
| <1,000 images | 70% | 15% | 15% |
| 1,000-10,000 | 80% | 10% | 10% |
| >10,000 | 90% | 5% | 5% |
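
A stratified, seeded split can be sketched as follows. This assumes each image has one primary class label, which is a simplification for detection data where images contain several classes. It illustrates the idea behind `--stratify` and `--seed`, not the script's actual code.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_split(
    labels: Dict[str, str],
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 42,
) -> Tuple[List[str], List[str], List[str]]:
    """Split image ids into train/val/test so each class keeps roughly the same ratios."""
    rng = random.Random(seed)  # fixed seed -> reproducible split
    by_class: Dict[str, List[str]] = defaultdict(list)
    for image_id, label in sorted(labels.items()):
        by_class[label].append(image_id)
    train: List[str] = []
    val: List[str] = []
    test: List[str] = []
    for label in sorted(by_class):
        ids = by_class[label]
        rng.shuffle(ids)
        n_train = int(len(ids) * ratios[0])
        n_val = int(len(ids) * ratios[1])
        train += ids[:n_train]
        val += ids[n_train:n_train + n_val]
        test += ids[n_train + n_val:]  # remainder goes to test
    return train, val, test
```

Splitting per class keeps rare classes (such as `cat` in the analysis above) represented in every split instead of leaving their placement to chance.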

Step 6: Generate Dataset Configuration

# For Ultralytics YOLO
python scripts/dataset_pipeline_builder.py data/final/ \
    --generate-config yolo \
    --output data.yaml

# For Detectron2
python scripts/dataset_pipeline_builder.py data/final/ \
    --generate-config detectron2 \
    --output detectron2_config.py

Architecture Selection Guide

Object Detection Architectures

| Architecture | Speed | Accuracy | Best For |
| --- | --- | --- | --- |
| YOLOv8n | 1.2ms | 37.3 mAP | Edge, mobile, real-time |
| YOLOv8s | 2.1ms | 44.9 mAP | Balanced speed/accuracy |
| YOLOv8m | 4.2ms | 50.2 mAP | General purpose |
| YOLOv8l | 6.8ms | 52.9 mAP | High accuracy |
| YOLOv8x | 10.1ms | 53.9 mAP | Maximum accuracy |
| RT-DETR-L | 5.3ms | 53.0 mAP | Transformer, no NMS |
| Faster R-CNN R50 | 46ms | 40.2 mAP | Two-stage, high quality |
| DINO-4scale | 85ms | 49.0 mAP | SOTA transformer |
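
One way to use this table is to pick the most accurate model that fits a latency budget. A small sketch using the figures above; the numbers are copied from the table as-is and depend on hardware, so treat the result as a starting point rather than a measurement.

```python
from typing import List, Optional, Tuple

# (name, latency in ms, mAP) copied from the table above
MODELS: List[Tuple[str, float, float]] = [
    ("YOLOv8n", 1.2, 37.3),
    ("YOLOv8s", 2.1, 44.9),
    ("YOLOv8m", 4.2, 50.2),
    ("YOLOv8l", 6.8, 52.9),
    ("YOLOv8x", 10.1, 53.9),
    ("RT-DETR-L", 5.3, 53.0),
    ("Faster R-CNN R50", 46.0, 40.2),
    ("DINO-4scale", 85.0, 49.0),
]


def best_under_budget(budget_ms: float) -> Optional[str]:
    """Name of the highest-mAP model whose latency fits the budget, or None."""
    candidates = [m for m in MODELS if m[1] <= budget_ms]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m[2])[0]
```

For example, a 5 ms budget selects YOLOv8m, while 6 ms admits RT-DETR-L.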

Segmentation Architectures

| Architecture | Type | Speed | Best For |
| --- | --- | --- | --- |
| YOLOv8-seg | Instance | 4.5ms | Real-time instance seg |
| Mask R-CNN | Instance | 67ms | High-quality masks |
| SAM | Promptable | 50ms | Zero-shot segmentation |
| DeepLabV3+ | Semantic | 25ms | Scene parsing |
| SegFormer | Semantic | 15ms | Efficient semantic seg |

CNN vs Vision Transformer Trade-offs

| Aspect | CNN (YOLO, R-CNN) | ViT (DETR, DINO) |
| --- | --- | --- |
| Training data needed | 1K-10K images | 10K-100K+ images |
| Training time | Fast | Slow (needs more epochs) |
| Inference speed | Faster | Slower |
| Small objects | Good with FPN | Needs multi-scale |
| Global context | Limited | Excellent |
| Positional encoding | Implicit | Explicit |

Reference Documentation

→ See references/reference-docs-and-commands.md for details

Performance Targets

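| Metric | Real-time | High Accuracy | Edge |
| --- | --- | --- | --- |
| FPS | >30 | >10 | >15 |
| mAP@50 | >0.6 | >0.8 | >0.5 |
| Latency P99 | <50ms | <150ms | <100ms |
| GPU Memory | <4GB | <8GB | <2GB |
| Model Size | <50MB | <200MB | <20MB |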
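
"Latency P99" is the value that 99% of requests stay at or below, so it reflects tail behaviour rather than the average. A minimal sketch using the nearest-rank definition (one of several percentile conventions; the function name is an assumption):

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

A real-time deployment would meet the target above when `percentile(latencies_ms, 99) < 50`.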

Resources

  • Architecture Guide: references/computer_vision_architectures.md
  • Optimization Guide: references/object_detection_optimization.md
  • Deployment Guide: references/production_vision_systems.md
  • Scripts: scripts/ directory for automation tools

Quick Info

Category: Database
Difficulty: intermediate
Version: 1.0.0
Author: alirezarezvani
Community: alirezarezvani
