Set up a new autoresearch experiment interactively. Collects domain, target file, eval command, metric, direction, and evaluator.
## Usage

```
/ar:setup                       # Interactive mode
/ar:setup engineering api-speed src/api.py "pytest bench.py" p50_ms lower
/ar:setup --list                # Show existing experiments
/ar:setup --list-evaluators     # Show available evaluators
```
## What It Does

### If arguments provided
Pass them directly to the setup script:
```
python {skill_path}/scripts/setup_experiment.py \
  --domain {domain} --name {name} \
  --target {target} --eval "{eval_cmd}" \
  --metric {metric} --direction {direction} \
  [--evaluator {evaluator}] [--scope {scope}]
```
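For example, the second Usage invocation above would translate into a call like the following. The `--evaluator` and `--scope` values are illustrative: both flags are optional, and `project` assumes the scope names match the interactive prompt below.

```
python {skill_path}/scripts/setup_experiment.py \
  --domain engineering --name api-speed \
  --target src/api.py --eval "pytest bench.py" \
  --metric p50_ms --direction lower \
  --evaluator benchmark_speed --scope project
```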
### If no arguments (interactive mode)
Collect each parameter one at a time:
- Domain — Ask: "What domain? (engineering, marketing, content, prompts, custom)"
- Name — Ask: "Experiment name? (e.g., api-speed, blog-titles)"
- Target file — Ask: "Which file to optimize?" Verify it exists.
- Eval command — Ask: "How to measure it? (e.g., pytest bench.py, python evaluate.py)"
- Metric — Ask: "What metric does the eval output? (e.g., p50_ms, ctr_score)" (see the eval-command sketch below)
- Direction — Ask: "Is lower or higher better?"
- Evaluator (optional) — Show built-in evaluators. Ask: "Use a built-in evaluator, or your own?"
- Scope — Ask: "Store in project (.autoresearch/) or user (~/.autoresearch/)?"
Then run setup_experiment.py with the collected parameters.
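Whatever eval command the user supplies should print the metric named above. As a minimal sketch, assuming the metric is reported as a `name: value` line on stdout (the exact format setup_experiment.py parses may differ), a timing eval could look like this. The script name, target invocation, and sample count are hypothetical:

```
# bench.sh (hypothetical): time the target 21 times and report the median as p50_ms
samples=()
for i in $(seq 21); do
  start=$(date +%s%3N)                      # milliseconds (GNU date)
  python src/api.py --smoke >/dev/null      # hypothetical target invocation
  end=$(date +%s%3N)
  samples+=($((end - start)))
done
# sort the samples and print the middle (11th) one as the metric
sorted=($(printf '%s\n' "${samples[@]}" | sort -n))
echo "p50_ms: ${sorted[10]}"
```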
## Listing

```
# Show existing experiments
python {skill_path}/scripts/setup_experiment.py --list

# Show available evaluators
python {skill_path}/scripts/setup_experiment.py --list-evaluators
```
## Built-in Evaluators

| Name | Metric | Use Case |
|---|---|---|
| benchmark_speed | p50_ms (lower) | Function/API execution time |
| benchmark_size | size_bytes (lower) | File, bundle, Docker image size |
| test_pass_rate | pass_rate (higher) | Test suite pass percentage |
| build_speed | build_seconds (lower) | Build/compile/Docker build time |
| memory_usage | peak_mb (lower) | Peak memory during execution |
| llm_judge_content | ctr_score (higher) | Headlines, titles, descriptions |
| llm_judge_prompt | quality_score (higher) | System prompts, agent instructions |
| llm_judge_copy | engagement_score (higher) | Social posts, ad copy, emails |
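Picking a built-in evaluator effectively fixes the metric and direction, so a content experiment might be configured like this (the target path is a placeholder):

```
python {skill_path}/scripts/setup_experiment.py \
  --domain marketing --name blog-titles \
  --target content/blog-titles.md --eval "python evaluate.py" \
  --metric ctr_score --direction higher \
  --evaluator llm_judge_content
```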
## After Setup
Report to the user:
- Experiment path and branch name
- Whether the eval command worked and the baseline metric
- Suggest: "Run `/ar:run {domain}/{name}` to start iterating, or `/ar:loop {domain}/{name}` for autonomous mode."
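A report along these lines covers all three points; every value below is hypothetical, including the experiment path and branch naming:

```
Experiment: .autoresearch/engineering/api-speed
Branch: autoresearch/engineering-api-speed
Eval command: pytest bench.py — ran successfully
Baseline: p50_ms = 142.0 (lower is better)

Run /ar:run engineering/api-speed to start iterating,
or /ar:loop engineering/api-speed for autonomous mode.
```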