Implement vector embedding search
You are an ML engineer implementing vector embedding search. The user wants to set up semantic search using embeddings to find relevant documents/items based on similarity rather than keyword matching.
What to check first
- Verify you have an embedding model available (OpenAI API key, local model like sentence-transformers, or Hugging Face access)
- Check your vector database setup: pip list | grep -E "(pinecone|weaviate|milvus|faiss|chromadb)"
- Confirm you have a dataset or document collection ready to embed
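These checks can also be done programmatically; a small sketch (the helper name is made up, and the checks only look for installed packages and an API key, not working credentials):

```python
import importlib.util
import os

def available_embedding_backends():
    """Report which embedding options look usable in this environment.
    Illustrative helper, not part of any library."""
    return {
        "openai_api_key": bool(os.environ.get("OPENAI_API_KEY")),
        "sentence_transformers": importlib.util.find_spec("sentence_transformers") is not None,
        "faiss": importlib.util.find_spec("faiss") is not None,
        "chromadb": importlib.util.find_spec("chromadb") is not None,
    }
```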
Steps
- Install required libraries: pip install openai numpy scikit-learn (or sentence-transformers for local embeddings)
- Choose an embedding source: OpenAI embeddings, sentence-transformers for CPU/GPU, or the Hugging Face API
- Create embeddings for your document corpus — batch process to avoid rate limits or memory issues
- Store embeddings in a vector database (FAISS for local, Pinecone/Weaviate for cloud, or ChromaDB for simplicity)
- Implement query embedding function using the same model/API as corpus embeddings
- Execute similarity search using cosine distance or dot product on stored embeddings
- Rank results by similarity score and return top-k matches
- Add optional metadata filtering before or after similarity ranking
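The steps above can be sketched end to end with a stand-in embedder (fake_embed below is a deterministic dummy, not a real model; with a real backend you would call the embedding API instead):

```python
import numpy as np

def fake_embed(text, dim=16):
    """Stand-in for a real embedding model: a deterministic unit vector per text."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

docs = ["red apples", "green pears", "city traffic"]   # document corpus
corpus = np.vstack([fake_embed(d) for d in docs])      # embed and store the corpus
query = fake_embed("red apples")                       # embed the query with the SAME model
scores = corpus @ query                                # dot product = cosine on unit vectors
top_k = np.argsort(scores)[::-1][:2]                   # rank by similarity, take top-k
```

The key invariant is that the query and corpus must go through the same model: vectors from different models live in different spaces and their similarities are meaningless.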
Code
import numpy as np
from openai import OpenAI
from sklearn.metrics.pairwise import cosine_similarity
import json

class EmbeddingSearch:
    def __init__(self, api_key=None, embedding_model="text-embedding-3-small"):
        """Initialize with OpenAI embeddings or local model"""
        self.embedding_model = embedding_model
        if api_key:
            self.client = OpenAI(api_key=api_key)
            self.use_openai = True
        else:
            from sentence_transformers import SentenceTransformer
            self.model = SentenceTransformer("all-MiniLM-L6-v2")
            self.use_openai = False
        self.documents = []
        self.embeddings = None
        self.metadata = []

    def embed_text(self, text):
        """Generate embedding for a single text"""
        if self.use_openai:
            response = self.client.embeddings.create(
                input=text,
                model=self.embedding_model
            )
            return np.array(response.data[0].embedding)
        else:
            return self.model.encode(text, convert_to_numpy=True)

    def add_documents(self, docs_with_metadata):
        """Add documents and create embeddings
        Expected format: [{"text": "...", "id": "...", "category": "..."}]
        """
        texts = [doc["text"] for doc in docs_with_metadata]
        embeddings_list = []
        for text in texts:  # the source example was truncated here; the rest is a minimal reconstruction
            embeddings_list.append(self.embed_text(text))
        new_embeddings = np.vstack(embeddings_list)
        self.embeddings = new_embeddings if self.embeddings is None else np.vstack(
            [self.embeddings, new_embeddings])
        self.documents.extend(texts)
        self.metadata.extend(docs_with_metadata)

Note: this example was truncated in the source. See the GitHub repo for the latest full version.
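The example above stops before the retrieval step; below is a standalone sketch of the similarity search it builds toward, with an optional metadata pre-filter. The `where` dict convention is an assumption for this sketch, not a standard API:

```python
import numpy as np

def search(query_emb, corpus_embs, metadata, top_k=5, where=None):
    """Cosine top-k over stored embeddings, with optional metadata pre-filtering.
    `where` is a dict like {"category": "fruit"} (made-up convention for this sketch)."""
    keep = [i for i, m in enumerate(metadata)
            if where is None or all(m.get(k) == v for k, v in where.items())]
    if not keep:
        return []
    sub = corpus_embs[keep]
    q = query_emb / np.linalg.norm(query_emb)
    sims = (sub / np.linalg.norm(sub, axis=1, keepdims=True)) @ q
    order = np.argsort(sims)[::-1][:top_k]
    return [(keep[i], float(sims[i])) for i in order]
```

Pre-filtering shrinks the candidate set before scoring; with a managed vector database, the equivalent is usually a metadata filter passed directly to the query call.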
Common Pitfalls
- Forgetting to handle rate limits — embedding APIs (OpenAI included) return 429 errors that need exponential backoff
- Hardcoding the model name in 50 places — use a single config so you can swap models in one place
- Not setting a timeout on API calls — a hanging request can lock your worker indefinitely
- Logging API responses with sensitive data — PII can end up in your logs without realizing
- Assuming embeddings are bit-for-bit deterministic — the same text can yield slightly different vectors across model versions or hardware, so pin the model version and compare with tolerances
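For the rate-limit pitfall, the usual fix is exponential backoff with jitter. A generic sketch; in real code, catch your client's specific rate-limit exception rather than bare Exception:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry a zero-argument callable, sleeping exponentially longer after each failure."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay)
```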
When NOT to Use This Skill
- For exact-match or deterministic lookups where regex, SQL, or keyword search would work — embeddings add cost and latency for no benefit
- When you need 100% precision on a known schema or identifier — use exact indexing rather than similarity scores
- For hard real-time paths with single-digit-millisecond budgets — a remote embedding call for the query alone can take tens to hundreds of milliseconds
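The first bullet deserves a concrete illustration: when the target has a fixed shape, a regex beats semantic search on cost, latency, and reliability. The ORD-XXXX id format below is invented for the example:

```python
import re

def find_order_ids(text):
    """Deterministic extraction of a fixed-format id; no embeddings needed."""
    return re.findall(r"ORD-\d{4}", text)
```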
How to Verify It Worked
- Test with malformed inputs, empty strings, and edge cases — APIs often behave differently than docs suggest
- Verify your error handling on all 4xx and 5xx responses — most code only handles the happy path
- Run a load test with 10x your expected traffic — rate limits hit fast
- Check token usage matches your estimate — surprises here become surprises on your bill
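The edge-case checks can start as a thin guard around whatever embedding function you use. A sketch; the wrapper name and limits are illustrative, so substitute your provider's documented constraints:

```python
def safe_embed(embed_fn, text, max_chars=8000):
    """Reject inputs an embedding API may silently mishandle before spending a call.
    max_chars is a placeholder; use your provider's documented token limit."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("empty or non-string input")
    if len(text) > max_chars:
        raise ValueError("input likely exceeds the model's context limit")
    return embed_fn(text)
```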
Production Considerations
- Set a daily spend cap in your API provider's billing console — prevents runaway costs from bugs or attacks
- Cache embeddings for texts you have already embedded — re-embedding identical input is pure wasted spend
- Precompute and persist corpus embeddings offline — at query time only the query text needs an embedding call
- Have a fallback embedding model ready — but remember that switching models means re-embedding the corpus, since vectors from different models are not comparable
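The single-config and fallback points fit in one small settings object; all names below are illustrative:

```python
from dataclasses import dataclass

@dataclass
class EmbeddingConfig:
    """One place to change models, so nothing is hardcoded across the codebase."""
    primary_model: str = "text-embedding-3-small"
    fallback_model: str = "all-MiniLM-L6-v2"
    timeout_s: float = 10.0

def pick_model(cfg: EmbeddingConfig, primary_healthy: bool) -> str:
    # A fallback keeps the service up, but vectors from a different model are
    # not comparable to the old index: plan to re-embed before trusting results.
    return cfg.primary_model if primary_healthy else cfg.fallback_model
```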
Related AI/ML Integration Skills
Other Claude Code skills in the same category.
- OpenAI Integration: Integrate OpenAI API with best practices
- Claude API Setup: Set up Claude/Anthropic API integration
- RAG Pipeline: Build a Retrieval-Augmented Generation pipeline
- Prompt Template: Create reusable prompt templates with variables
- AI Streaming: Implement streaming AI responses
- LangChain Setup: Set up LangChain for AI workflows
- Model Comparison: Compare responses from multiple AI models
- AI Rate Limiter: Implement rate limiting for AI API calls