Version: 0.2.3

Storage Layer

Sercha uses a hybrid storage architecture with three specialized stores, connected via CGO.

Local-First

Sercha is local-first—no server calls, no telemetry, no cloud dependencies. All data lives on your machine and never leaves it. This is a core architectural guarantee.

Storage Architecture

Store Responsibilities

| Store | Technology | Purpose | Required? |
| --- | --- | --- | --- |
| Metadata Store | SQLite | Structured data (sources, docs, chunks) | Yes |
| Full-Text Index | Xapian | Keyword search (BM25 ranking) | Yes |
| Vector Index | HNSWlib | Semantic search (embeddings) | No - only when an embedding service is configured |
Required vs Optional
  • SQLite and Xapian are always required - they provide core functionality
  • HNSWlib is optional - only created when an embedding service is configured. Without embeddings, Sercha uses pure keyword search.
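
The exact wiring is internal to Sercha, but its shape can be sketched. The Go snippet below is a minimal illustration using hypothetical types and constructors (`MetadataStore`, `openFullTextIndex`, and so on are assumptions, not Sercha's real API); only the vector index is skipped when no embedding service is configured.

```go
import "path/filepath"

// Illustrative placeholder types; Sercha's real wrappers are not shown here.
type MetadataStore struct{ path string }
type FullTextIndex struct{ path string }
type VectorIndex struct{ path string }

func openMetadataStore(p string) (*MetadataStore, error) { return &MetadataStore{path: p}, nil }
func openFullTextIndex(p string) (*FullTextIndex, error) { return &FullTextIndex{path: p}, nil }
func openVectorIndex(p string) (*VectorIndex, error)     { return &VectorIndex{path: p}, nil }

// Stores groups the three backends. Vectors stays nil when no embedding
// service is configured.
type Stores struct {
	Metadata *MetadataStore // SQLite: always required
	FullText *FullTextIndex // Xapian: always required
	Vectors  *VectorIndex   // HNSWlib: optional
}

func openStores(dataDir string, embeddingConfigured bool) (*Stores, error) {
	meta, err := openMetadataStore(filepath.Join(dataDir, "metadata.db"))
	if err != nil {
		return nil, err
	}
	fts, err := openFullTextIndex(filepath.Join(dataDir, "xapian"))
	if err != nil {
		return nil, err
	}
	s := &Stores{Metadata: meta, FullText: fts}
	// The vector index is the only optional piece.
	if embeddingConfigured {
		vec, err := openVectorIndex(filepath.Join(dataDir, "vectors"))
		if err != nil {
			return nil, err
		}
		s.Vectors = vec
	}
	return s, nil
}
```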

Why Multiple Stores?

Each store is optimised for its specific purpose:

| Concern | Single Store | Specialised Stores |
| --- | --- | --- |
| Keyword search | Slow scans | Xapian BM25 |
| Semantic search | Not possible | HNSWlib ANN (when configured) |
| Metadata queries | OK | SQLite optimised |
| Disk usage | Duplicated | Specialised per concern |

Graceful Degradation

When embedding services are not configured, Sercha works with just SQLite + Xapian. When embeddings are configured, hybrid search combines keyword and semantic results, as sketched below.
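
Reusing the illustrative `Stores` type from the sketch above, the degradation decision amounts to a nil check on the vector index:

```go
// searchBackends reports which stores a query will touch. Illustrative
// only; it builds on the hypothetical Stores type sketched earlier.
func (s *Stores) searchBackends() []string {
	if s.Vectors == nil {
		return []string{"sqlite", "xapian"} // keyword-only mode
	}
	return []string{"sqlite", "xapian", "hnswlib"} // hybrid mode
}
```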

SQLite: Metadata Store

Stores:

  • Source configurations
  • Document metadata (title, URI, timestamps)
  • Chunk references
  • Sync state (cursors, last sync time)

Schema Overview:
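
The actual schema is not reproduced on this page; the DDL below is an illustrative sketch of the tables that the list above implies. Table and column names are assumptions, not Sercha's real schema.

```go
// Illustrative DDL only; names and types are assumptions.
const schemaSketch = `
CREATE TABLE sources (
    id     TEXT PRIMARY KEY,
    kind   TEXT NOT NULL,   -- e.g. filesystem, web
    config TEXT NOT NULL    -- source configuration as JSON
);

CREATE TABLE documents (
    id         TEXT PRIMARY KEY,
    source_id  TEXT NOT NULL REFERENCES sources(id),
    title      TEXT,
    uri        TEXT NOT NULL,
    created_at INTEGER,      -- unix timestamps
    updated_at INTEGER
);

CREATE TABLE chunks (
    id          TEXT PRIMARY KEY,
    document_id TEXT NOT NULL REFERENCES documents(id),
    position    INTEGER NOT NULL  -- order within the document
);

CREATE TABLE sync_state (
    source_id    TEXT PRIMARY KEY REFERENCES sources(id),
    cursor       TEXT,       -- provider-specific sync cursor
    last_sync_at INTEGER
);
`
```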

Xapian: Full-Text Index

Purpose: Fast keyword search with relevance ranking

Features:

  • BM25 ranking algorithm
  • Stemming (search "running" finds "run")
  • Boolean operators (AND, OR, NOT)
  • Phrase matching
  • Prefix search
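
A sketch of how these features surface to callers, assuming a hypothetical `KeywordIndex` interface for the Xapian wrapper (Sercha's real method names may differ):

```go
// Result is shared by the examples on this page; fields are illustrative.
type Result struct {
	DocID string
	Score float64 // BM25 relevance for keyword hits
}

// KeywordIndex stands in for the Xapian wrapper; the method name and
// signature are assumptions.
type KeywordIndex interface {
	Search(query string, limit int) ([]Result, error)
}

// The query below exercises the features listed above: boolean operators,
// a quoted phrase, and a prefix (wildcard) term. Stemming means a query
// for "running" also matches documents containing "run".
func keywordExample(idx KeywordIndex) ([]Result, error) {
	return idx.Search(`running AND ("storage layer" OR embed*) AND NOT draft`, 10)
}
```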

HNSWlib: Vector Index (Optional)

Purpose: Semantic similarity search via embeddings

Note: This store is only created when an embedding service is configured. Without it, Sercha uses pure keyword search via Xapian.

Features:

  • Approximate Nearest Neighbor (ANN)
  • Cosine similarity
  • Sub-linear search time
  • Memory-mapped for large indexes
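
For intuition, cosine similarity over two embeddings looks like the function below; HNSWlib evaluates it approximately over a small graph of candidate neighbours rather than scanning every stored vector.

```go
import "math"

// cosineSimilarity returns a value in [-1, 1]; higher means the two
// embeddings point in more similar directions. Assumes equal-length vectors.
func cosineSimilarity(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}
```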

CGO Integration

CGO Considerations:

| Aspect | Approach |
| --- | --- |
| Memory | Explicit allocation/deallocation |
| Threading | Goroutines ↔ C++ thread safety |
| Errors | C++ exceptions → Go errors |
| Build | Requires C++ toolchain |
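
A minimal CGO sketch of two of these patterns: explicit C memory management and error translation. The C shim and function names are illustrative, not Sercha's actual bindings; in the real wrapper the shim would call into the C++ library and convert any thrown exception into an error code.

```go
/*
#include <stdlib.h>

// Illustrative C shim: stands in for a call into the C++ library that
// catches exceptions and reports them as error codes.
static int open_index(const char *path) {
    if (path == NULL) return 1;
    return 0;
}
*/
import "C"

import (
	"errors"
	"unsafe"
)

// openIndex converts Go data into C memory, frees it explicitly, and
// turns the C error code into a normal Go error.
func openIndex(path string) error {
	cPath := C.CString(path)            // explicit allocation in C memory
	defer C.free(unsafe.Pointer(cPath)) // explicit deallocation
	if rc := C.open_index(cPath); rc != 0 {
		return errors.New("vector index: open failed")
	}
	return nil
}
```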

Search Flow

When embeddings are configured, hybrid search queries both indexes in parallel and merges their results. Without embeddings, only the Xapian path is used (pure keyword search).
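
A sketch of that fan-out, reusing the illustrative `Result` type from the Xapian example. The searcher functions and the naive merge are assumptions; the actual ranking is not described on this page.

```go
import "sync"

// hybridSearch runs the keyword and semantic paths concurrently and
// combines their hits. The searcher functions stand in for the Xapian
// and HNSWlib wrappers.
func hybridSearch(keyword, semantic func(query string) []Result, query string) []Result {
	if semantic == nil {
		return keyword(query) // no embeddings: pure keyword search
	}
	var kwHits, vecHits []Result
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); kwHits = keyword(query) }()
	go func() { defer wg.Done(); vecHits = semantic(query) }()
	wg.Wait()
	// Naive merge: concatenate and leave re-scoring to a later stage.
	return append(kwHits, vecHits...)
}
```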

Data Locality

All stores live in one directory:

```
~/.sercha/
├── data/
│   ├── metadata.db        # SQLite (always present)
│   ├── xapian/            # Xapian index (always present)
│   │   └── ...
│   └── vectors/           # HNSWlib index (only when embeddings configured)
│       └── ...
└── config.toml            # Application configuration
```

Benefits:

  • Single backup location
  • Portable across machines
  • No network dependencies

Atomic Indexing

Indexing operations are atomic to prevent partial updates:

| Phase | Action |
| --- | --- |
| Buffer | Documents accumulated in memory |
| Commit | All stores updated together |
| Rollback | On failure, no partial writes |

This ensures the index never contains half-synced data from a crashed operation.
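
A sketch of that flow under an assumed three-method store interface (`Prepare`/`Commit`/`Rollback`); the interface and types are illustrative, not Sercha's actual API.

```go
// Document and txStore are illustrative; Sercha's real types differ.
type Document struct {
	ID   string
	Text string
}

type txStore interface {
	Prepare(docs []Document) error // stage writes without publishing them
	Commit() error                 // make staged writes visible
	Rollback() error               // discard staged writes
}

// indexBatch applies the buffered documents to every store, and only
// publishes once all of them have staged the batch successfully.
func indexBatch(stores []txStore, buffered []Document) error {
	for _, s := range stores {
		if err := s.Prepare(buffered); err != nil {
			// Nothing has been published yet, so rolling back leaves the
			// index exactly as it was before the batch.
			for _, r := range stores {
				_ = r.Rollback()
			}
			return err
		}
	}
	for _, s := range stores {
		if err := s.Commit(); err != nil {
			return err
		}
	}
	return nil
}
```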
