Tutorial · February 2, 2026 · 16 min

ChromaDB Tutorial: Vector Database for AI Apps

Learn to use ChromaDB for semantic search and RAG applications. Complete beginner-friendly tutorial.

chromadb · vector database · rag · embeddings

Molted Team

Molted.cloud

Vector databases are the backbone of modern AI applications. Every time you see "chat with your documents" or "semantic search," there's a vector database storing embeddings and finding similarities. ChromaDB emerged as the go-to choice for developers who want something that works locally, requires minimal setup, and scales when needed.

This tutorial walks through ChromaDB from installation to production-ready RAG (Retrieval-Augmented Generation) pipelines. We will cover the theory behind vector databases, compare ChromaDB to alternatives, and build real code that you can adapt to your projects.

Why vector databases matter for AI

Large language models have a fundamental limitation: they only know what was in their training data. Ask Claude about your company's internal documentation, and it has no idea. This is where vector databases come in.

The concept is straightforward:

  1. Convert your text into numerical vectors (embeddings) that capture semantic meaning
  2. Store these vectors in a specialized database optimized for similarity search
  3. When a user asks a question, convert it to a vector and find the most similar stored documents
  4. Feed those relevant documents to the LLM as context

This is RAG in a nutshell. The vector database makes step 3 fast and accurate, even with millions of documents.
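The four steps above can be sketched end to end in plain Python. The `embed()` stub below is a stand-in (a toy bag-of-words vector, not a real embedding model), just to make the data flow concrete:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a toy bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Steps 1-2: embed and store the documents
docs = ["reset your password in settings", "invoices are emailed monthly"]
store = [(doc, embed(doc)) for doc in docs]

# Step 3: embed the query and rank stored documents by similarity
query_vec = embed("how do I reset my password")
best_doc, _ = max(store, key=lambda item: cosine(query_vec, item[1]))

# Step 4: best_doc would now be passed to the LLM as context
print(best_doc)  # the password document ranks highest
```

A real system swaps `embed()` for a model that captures meaning rather than word overlap; the shape of the pipeline stays the same.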

Traditional databases use exact matching. You search for "customer support" and find documents containing those exact words. Vector databases understand that "customer support," "help desk," and "user assistance" are semantically similar. This semantic understanding is what makes AI applications actually useful.

ChromaDB vs the competition

The vector database market exploded in 2023-2024. Here is how ChromaDB compares to the main alternatives:

| Feature | ChromaDB | Pinecone | Weaviate | Milvus |
| --- | --- | --- | --- | --- |
| Hosting | Local or cloud | Cloud only | Self-host or cloud | Self-host or cloud |
| Setup complexity | pip install | API key | Docker required | Docker/K8s |
| Free tier | Unlimited local | Limited vectors | Limited local | Self-host free |
| Scaling | Good (millions) | Excellent | Excellent | Excellent |
| Best for | Prototypes to mid-scale | Production at scale | Complex queries | Enterprise scale |

Pinecone is fully managed, which means zero ops burden but also vendor lock-in and costs that scale with usage. Great for teams that want to ship fast and have budget.

Weaviate offers powerful hybrid search (vector + keyword) and GraphQL queries. More features than ChromaDB, but more complexity too.

Milvus is enterprise-grade, designed for billions of vectors. Overkill for most projects, but necessary for large-scale production systems.

ChromaDB wins on developer experience. You install it with pip, run it locally during development, and can deploy it as a server when ready. No Docker containers for local dev, no cloud accounts for testing. This simplicity is why it became the default choice for LangChain and LlamaIndex tutorials.

Installation

ChromaDB requires Python 3.8 or higher. Install it with pip:

pip install chromadb

For a clean environment:

python -m venv chroma-env
source chroma-env/bin/activate  # Linux/Mac
# or: chroma-env\Scripts\activate  # Windows
pip install chromadb

Verify the installation:

python -c "import chromadb; print(chromadb.__version__)"

You should see version 0.4.x or higher. ChromaDB uses SQLite for storage and hnswlib (an approximate nearest neighbor library) for indexing; both come along automatically with the install.

Core concepts

Before writing code, let us understand the key concepts:

Embeddings

An embedding is a vector (array of numbers) that represents text in a way that captures semantic meaning. Similar texts produce similar vectors. The distance between vectors indicates semantic similarity.

Embedding models convert text to vectors. Popular choices:

  • OpenAI text-embedding-3-small - 1536 dimensions, costs money, excellent quality
  • sentence-transformers/all-MiniLM-L6-v2 - 384 dimensions, free, runs locally, good quality
  • Cohere embed-v3 - Competitive with OpenAI, different pricing

Collections

A collection is like a table in a traditional database. It holds documents with their embeddings and metadata. You typically create one collection per use case: one for product descriptions, another for support tickets, etc.

Documents

Documents are the text chunks you store. ChromaDB stores the original text alongside its embedding, so you can retrieve the actual content without a separate database.

Metadata

Each document can have metadata: key-value pairs for filtering. Store the source URL, creation date, category, or any other attribute you might filter on later.

Distance metrics

ChromaDB supports three distance metrics:

  • L2 (Euclidean) - Default, works well for most cases
  • IP (Inner Product) - Use when embeddings are normalized
  • Cosine - Similar to IP, commonly used for text embeddings

For text search, cosine similarity is typically best because it focuses on direction rather than magnitude.
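The difference is easy to see with two small hand-made vectors. This sketch (plain Python, toy 3-dimensional vectors) computes L2 distance and cosine similarity, and shows that cosine ignores magnitude: scaling a vector does not change its cosine score:

```python
import math

def l2(a, b):
    """Euclidean distance: sensitive to both direction and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    """Inner product: larger means more similar (for normalized vectors)."""
    return sum(x * y for x, y in zip(a, b))

def cosine_sim(a, b):
    """Cosine similarity: direction only, magnitude cancels out."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return inner_product(a, b) / (na * nb)

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice the length

print(l2(a, b))          # nonzero: L2 sees the magnitude difference
print(cosine_sim(a, b))  # ~1.0: cosine sees an identical direction
```

This is why cosine suits text embeddings: two documents about the same topic should match regardless of how "long" their vectors happen to be.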

First example: store and search

Let us build a minimal example. We will store some documents and search them semantically.

import chromadb

# Create a client (in-memory by default)
client = chromadb.Client()

# Create a collection
collection = client.create_collection(
    name="my_documents",
    metadata={"hnsw:space": "cosine"}  # Use cosine similarity
)

# Add documents
collection.add(
    documents=[
        "Python is a programming language known for its simplicity.",
        "JavaScript runs in web browsers and on servers with Node.js.",
        "Machine learning models learn patterns from data.",
        "Vector databases store embeddings for semantic search.",
        "ChromaDB is an open-source embedding database."
    ],
    ids=["doc1", "doc2", "doc3", "doc4", "doc5"]
)

# Search
results = collection.query(
    query_texts=["What is Python used for?"],
    n_results=2
)

print("Query: What is Python used for?")
print("Results:")
for doc, distance in zip(results["documents"][0], results["distances"][0]):
    print(f"  - {doc} (distance: {distance:.4f})")

Output:

Query: What is Python used for?
Results:
  - Python is a programming language known for its simplicity. (distance: 0.3214)
  - Machine learning models learn patterns from data. (distance: 0.5891)

Notice that we never specified an embedding model. ChromaDB uses a default model (all-MiniLM-L6-v2) when you do not provide one. This is great for prototyping but you will want more control in production.

Persistent storage

The in-memory client loses data when your script ends. For persistence:

import chromadb

# Persistent client stores data to disk
client = chromadb.PersistentClient(path="./chroma_db")

# Get or create collection (won't fail if it exists)
collection = client.get_or_create_collection(name="my_documents")

Data is saved to ./chroma_db and survives restarts. Use this for development and small production deployments.

Adding metadata

collection.add(
    documents=[
        "The quick brown fox jumps over the lazy dog.",
        "Pack my box with five dozen liquor jugs."
    ],
    metadatas=[
        {"source": "pangram", "year": 1885},
        {"source": "pangram", "year": 1900}
    ],
    ids=["pangram1", "pangram2"]
)

# Query with metadata filter
results = collection.query(
    query_texts=["animal sentences"],
    n_results=5,
    where={"source": "pangram"}  # Only search pangrams
)

Metadata filters use a MongoDB-like syntax. You can combine conditions:

where={
    "$and": [
        {"source": "pangram"},
        {"year": {"$gte": 1890}}
    ]
}
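Beyond $and, ChromaDB's filter syntax also includes $or and membership operators such as $in (operator support has evolved across versions, so treat this as a sketch and check the docs for the release you run):

```python
# Hypothetical filter: match pangrams OR anything from a given set of years
where = {
    "$or": [
        {"source": "pangram"},
        {"year": {"$in": [1885, 1900]}},
    ]
}
print(where["$or"][0])  # first branch of the $or clause
```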

Integrating OpenAI embeddings

OpenAI's embedding models produce higher-quality vectors than local alternatives, especially for nuanced semantic matching. Here is how to use them with ChromaDB.

First, install the OpenAI package:

pip install openai

Create a custom embedding function:

import chromadb
from chromadb.utils import embedding_functions

# Create OpenAI embedding function
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-openai-api-key",
    model_name="text-embedding-3-small"
)

# Create client and collection with custom embedding function
client = chromadb.PersistentClient(path="./chroma_openai")
collection = client.get_or_create_collection(
    name="openai_docs",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"}
)

Now when you add documents or query, ChromaDB calls OpenAI's API to generate embeddings:

# Add documents (OpenAI generates embeddings automatically)
collection.add(
    documents=[
        "The quarterly report shows 15% revenue growth.",
        "Customer satisfaction scores improved across all segments.",
        "New product launch exceeded initial projections."
    ],
    metadatas=[
        {"type": "financial", "quarter": "Q3"},
        {"type": "customer", "quarter": "Q3"},
        {"type": "product", "quarter": "Q3"}
    ],
    ids=["report1", "report2", "report3"]
)

# Query
results = collection.query(
    query_texts=["How is the company performing financially?"],
    n_results=2
)

for doc in results["documents"][0]:
    print(f"- {doc}")

Cost considerations

OpenAI charges per token for embeddings. The text-embedding-3-small model costs $0.02 per million tokens. For reference:

  • 1000 documents at 500 tokens each = 500K tokens = $0.01
  • 1 million documents = roughly $10-20 for initial embedding
  • Each query also costs tokens (usually negligible)

For prototypes and small datasets, OpenAI embeddings are cheap enough to ignore. At scale, the cost adds up and local models become attractive.
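To sanity-check those numbers, here is a small helper (hypothetical, using the $0.02 per million tokens price quoted above):

```python
def embedding_cost(num_docs: int, tokens_per_doc: int, price_per_million: float = 0.02) -> float:
    """Estimated one-time cost to embed a corpus at a per-million-token price."""
    total_tokens = num_docs * tokens_per_doc
    return total_tokens / 1_000_000 * price_per_million

print(embedding_cost(1_000, 500))      # 0.01 -> one cent for 1000 short docs
print(embedding_cost(1_000_000, 500))  # about $10 for a million docs
```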

Free alternative: sentence-transformers

If you want high-quality embeddings without API costs, sentence-transformers is the answer. Models run locally on your CPU or GPU.

pip install sentence-transformers

ChromaDB has built-in support:

import chromadb
from chromadb.utils import embedding_functions

# Use a sentence-transformers model
sentence_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"  # Fast, 384 dimensions
)

client = chromadb.PersistentClient(path="./chroma_local")
collection = client.get_or_create_collection(
    name="local_docs",
    embedding_function=sentence_ef
)

The first time you use a model, it downloads automatically (80-400MB depending on model). Subsequent runs load from cache.

Choosing a model

Popular sentence-transformers models:

  • all-MiniLM-L6-v2 - Best balance of speed and quality, 384 dimensions
  • all-mpnet-base-v2 - Higher quality, slower, 768 dimensions
  • paraphrase-multilingual-MiniLM-L12-v2 - Good for non-English text
  • multi-qa-MiniLM-L6-cos-v1 - Optimized for question-answering

For most use cases, all-MiniLM-L6-v2 is the right choice. It processes thousands of documents per second on a modern CPU.

GPU acceleration

If you have an NVIDIA GPU, install a CUDA-enabled build of PyTorch; sentence-transformers does not ship a separate GPU extra and simply uses whichever PyTorch build is present:

pip install torch --index-url https://download.pytorch.org/whl/cu121

(Check the PyTorch installation selector for the command matching your CUDA version.) The model then uses CUDA automatically when it is available. Embedding speed increases 10-50x depending on batch size and GPU.

Building a complete RAG pipeline

Let us build a real RAG system: a documentation Q&A bot. We will chunk documents, store them in ChromaDB, and use retrieved context to answer questions.

Step 1: Document chunking

Large documents need to be split into smaller chunks. This improves retrieval accuracy (relevant passages, not entire documents) and fits context windows.

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks, breaking at sentence boundaries when possible."""
    chunks = []
    start = 0
    text_length = len(text)

    while start < text_length:
        end = start + chunk_size

        # Try to break at a sentence boundary
        if end < text_length:
            # Look for sentence endings
            for sep in [". ", "! ", "? ", "\n\n", "\n"]:
                last_sep = text.rfind(sep, start, end)
                if last_sep > start:
                    end = last_sep + len(sep)
                    break

        chunk = text[start:end].strip()
        if chunk:  # skip whitespace-only chunks
            chunks.append(chunk)

        # Step forward while guaranteeing progress: if a boundary lands
        # close to `start`, end - overlap could otherwise loop forever
        start = max(end - overlap, start + 1)

    return chunks

# Example usage
document = """
ChromaDB is an open-source embedding database designed for AI applications.
It provides efficient storage and retrieval of vector embeddings alongside
their associated metadata and documents.

The database supports multiple embedding functions including OpenAI and
sentence-transformers models. This flexibility allows developers to choose
the right embedding model for their use case.

ChromaDB can run in-memory for development or persist data to disk for
production deployments. For larger scale applications, it also supports
client-server mode.
"""

chunks = chunk_text(document, chunk_size=200, overlap=30)
for i, chunk in enumerate(chunks):
    print(f"Chunk {i}: {chunk[:50]}...")

Step 2: Ingestion pipeline

import chromadb
from chromadb.utils import embedding_functions
import hashlib

def ingest_documents(documents: list[dict], collection_name: str = "docs"):
    """
    Ingest documents into ChromaDB.

    Each document dict should have:
    - content: str (the text)
    - source: str (filename, URL, etc.)
    - metadata: dict (optional additional metadata)
    """
    # Initialize ChromaDB with sentence-transformers
    client = chromadb.PersistentClient(path="./rag_db")
    embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
        model_name="all-MiniLM-L6-v2"
    )
    collection = client.get_or_create_collection(
        name=collection_name,
        embedding_function=embedding_fn,
        metadata={"hnsw:space": "cosine"}
    )

    all_chunks = []
    all_metadatas = []
    all_ids = []

    for doc in documents:
        chunks = chunk_text(doc["content"])

        for i, chunk in enumerate(chunks):
            # Generate deterministic ID from content hash
            chunk_id = hashlib.md5(f"{doc['source']}_{i}_{chunk[:50]}".encode()).hexdigest()

            all_chunks.append(chunk)
            all_metadatas.append({
                "source": doc["source"],
                "chunk_index": i,
                **doc.get("metadata", {})
            })
            all_ids.append(chunk_id)

    # Add in batches (ChromaDB handles batching internally, but explicit batches help with progress)
    batch_size = 100
    for i in range(0, len(all_chunks), batch_size):
        collection.add(
            documents=all_chunks[i:i + batch_size],
            metadatas=all_metadatas[i:i + batch_size],
            ids=all_ids[i:i + batch_size]
        )
        print(f"Ingested {min(i + batch_size, len(all_chunks))}/{len(all_chunks)} chunks")

    return collection

# Example: ingest some documents
documents = [
    {
        "content": "Your long document text here...",
        "source": "docs/getting-started.md",
        "metadata": {"category": "tutorial"}
    },
    {
        "content": "Another document...",
        "source": "docs/api-reference.md",
        "metadata": {"category": "reference"}
    }
]

# collection = ingest_documents(documents)

Step 3: Retrieval function

def retrieve_context(query: str, collection, n_results: int = 5, where: dict | None = None) -> list[dict]:
    """
    Retrieve relevant chunks for a query.

    Returns list of dicts with 'content', 'source', 'distance'.
    """
    results = collection.query(
        query_texts=[query],
        n_results=n_results,
        where=where
    )

    contexts = []
    for i in range(len(results["documents"][0])):
        contexts.append({
            "content": results["documents"][0][i],
            "source": results["metadatas"][0][i].get("source", "unknown"),
            "distance": results["distances"][0][i]
        })

    return contexts

# Example usage
# contexts = retrieve_context("How do I install ChromaDB?", collection)
# for ctx in contexts:
#     print(f"[{ctx['source']}] (dist: {ctx['distance']:.3f})")
#     print(ctx['content'][:100] + "...")
#     print()

Step 4: Generate answer with LLM

from openai import OpenAI

def answer_question(question: str, collection, model: str = "gpt-4o-mini") -> str:
    """
    Answer a question using RAG.
    """
    # Retrieve relevant context
    contexts = retrieve_context(question, collection, n_results=5)

    # Format context for the prompt
    context_text = "\n\n".join([
        f"[Source: {ctx['source']}]\n{ctx['content']}"
        for ctx in contexts
    ])

    # Build prompt
    system_prompt = """You are a helpful assistant that answers questions based on the provided context.
Only use information from the context to answer. If the context doesn't contain enough information,
say so clearly. Cite sources when possible."""

    user_prompt = f"""Context:
{context_text}

Question: {question}

Answer based on the context above:"""

    # Call LLM
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0.3
    )

    return response.choices[0].message.content

# Example
# answer = answer_question("How do I create a collection in ChromaDB?", collection)
# print(answer)

Putting it together

class RAGPipeline:
    """Complete RAG pipeline with ChromaDB."""

    def __init__(self, db_path: str = "./rag_db", collection_name: str = "docs"):
        self.client = chromadb.PersistentClient(path=db_path)
        self.embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
            model_name="all-MiniLM-L6-v2"
        )
        self.collection = self.client.get_or_create_collection(
            name=collection_name,
            embedding_function=self.embedding_fn
        )
        self.openai_client = OpenAI()

    def ingest(self, text: str, source: str, metadata: dict | None = None):
        """Add a document to the knowledge base."""
        chunks = chunk_text(text)
        if not chunks:
            return
        # Batch the add: one call embeds every chunk at once instead of
        # invoking the embedding model once per chunk
        self.collection.add(
            documents=chunks,
            metadatas=[{"source": source, "chunk": i, **(metadata or {})} for i in range(len(chunks))],
            ids=[hashlib.md5(f"{source}_{i}".encode()).hexdigest() for i in range(len(chunks))]
        )

    def query(self, question: str) -> str:
        """Answer a question using RAG."""
        return answer_question(question, self.collection)

    def search(self, query: str, n: int = 5) -> list[dict]:
        """Search without generating an answer."""
        return retrieve_context(query, self.collection, n)

# Usage
# rag = RAGPipeline()
# rag.ingest(open("docs/guide.md").read(), "docs/guide.md")
# answer = rag.query("How do I get started?")

Performance and scaling

ChromaDB performance depends on your data size and query patterns. Here are benchmarks and optimization strategies.

Query performance

ChromaDB uses HNSW (Hierarchical Navigable Small World) for approximate nearest neighbor search. This means:

  • Queries are O(log n) complexity, not O(n)
  • Results are approximate (99%+ accuracy with default settings)
  • Memory usage scales linearly with vector count

Typical query times on a standard laptop:

  • 10,000 vectors: 1-5ms
  • 100,000 vectors: 5-15ms
  • 1,000,000 vectors: 15-50ms

Memory usage

Each vector consumes approximately:

Memory = dimensions * 4 bytes + overhead

For all-MiniLM-L6-v2 (384 dims):
- 100K vectors ≈ 150MB + index overhead
- 1M vectors ≈ 1.5GB + index overhead

For OpenAI text-embedding-3-small (1536 dims):
- 100K vectors ≈ 600MB + index overhead
- 1M vectors ≈ 6GB + index overhead

Smaller embedding dimensions mean lower memory usage and faster queries, at the cost of some semantic precision.
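The rule of thumb above can be wrapped in a quick estimator (float32 vectors, raw storage only; the real HNSW index adds overhead on top):

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_float: int = 4) -> int:
    """Raw float32 storage for the vectors alone, excluding index overhead."""
    return num_vectors * dimensions * bytes_per_float

# all-MiniLM-L6-v2 (384 dims) vs text-embedding-3-small (1536 dims)
for dims in (384, 1536):
    mb = raw_vector_bytes(1_000_000, dims) / 1_000_000
    print(f"{dims} dims, 1M vectors: {mb:.0f} MB")
```

The output lines up with the figures above: roughly 1.5GB for 1M vectors at 384 dimensions and about 6GB at 1536.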

Tuning HNSW parameters

For large collections, tune the HNSW index:

collection = client.create_collection(
    name="optimized",
    metadata={
        "hnsw:space": "cosine",
        "hnsw:construction_ef": 200,  # Higher = better index, slower build
        "hnsw:search_ef": 100,        # Higher = better recall, slower search
        "hnsw:M": 32                  # Connections per node, affects memory/speed
    }
)

Default values work well for most cases. Only tune if you have specific latency or recall requirements.

Client-server mode

For production deployments with multiple processes or machines:

# Start the server (terminal or Docker)
# chroma run --path /db --port 8000

# Connect from Python
import chromadb
client = chromadb.HttpClient(host="localhost", port=8000)

# Use exactly like the local client
collection = client.get_or_create_collection("docs")

Server mode enables:

  • Multiple Python processes sharing one database
  • Deployment on separate infrastructure
  • Docker-based production setups

When ChromaDB is not enough

Consider alternatives when:

  • More than 10 million vectors - Milvus or Qdrant handle billion-scale better
  • Sub-millisecond latency required - Dedicated vector search services
  • Complex filtering at scale - Weaviate has more advanced hybrid search
  • Managed service preferred - Pinecone removes operational burden

For most AI applications (chatbots, documentation search, semantic matching), ChromaDB handles the load comfortably. The teams running into limits are processing millions of users or massive document corpora.

Conclusion

ChromaDB hits a sweet spot: simple enough for a weekend project, capable enough for production. The Python-native experience eliminates the infrastructure friction that plagues other vector databases.

Key takeaways:

  • Start with the default embedding model, switch to OpenAI or custom models when needed
  • Use persistent storage from the start, even in development
  • Chunk documents intelligently, around 200-500 characters with overlap
  • Add metadata for filtering, reduces search space and improves relevance
  • Monitor memory usage as your collection grows

The RAG pattern transforms LLMs from general-purpose text generators into domain-specific assistants. Your company docs, product catalog, support history, all become instantly searchable and queryable through natural language. ChromaDB makes this accessible to any Python developer.

That said, building production RAG systems involves more than just the vector database. You need chunking strategies, embedding model selection, prompt engineering, response evaluation, and ongoing maintenance. It is a full engineering project, not a one-day integration.
