Version: 0.2.3

Connectors Overview

Connectors are the data ingestion layer of Sercha. They fetch content from external sources and produce a uniform stream of raw documents for the sync pipeline.

What Connectors Do

Each connector handles the specifics of its source (authentication, pagination, rate limiting) and produces a uniform RawDocument stream. The rest of the pipeline does not need to know where data originated.

Core Responsibilities

| Responsibility | Description |
| --- | --- |
| Authentication | Handle OAuth, PAT, or other auth methods for the source |
| Data fetching | Retrieve content via APIs, filesystem access, or other protocols |
| Rate limiting | Respect API quotas and avoid service disruption |
| Cursor management | Track sync state for incremental updates |
| Error handling | Gracefully handle network failures and API errors |
| MIME detection | Assign appropriate content types to fetched documents |

Connector Interface

Every connector implements a common interface that the sync orchestrator understands. The interface defines the contract between connectors and the core system.

Required Operations

| Operation | Purpose |
| --- | --- |
| FullSync | Fetch all documents from the source |
| IncrementalSync | Fetch only documents changed since the last sync |
| Watch | Subscribe to real-time change notifications |
| Validate | Verify configuration and connectivity |
| Close | Release resources cleanly |

Although every connector exposes these operations, not every connector supports all of them. Each connector declares what it supports through the capability system.

Capability System

Connectors declare their capabilities so the sync orchestrator can choose the optimal sync strategy.

Capability Flags

| Capability | Description | Example Connectors |
| --- | --- | --- |
| SupportsIncremental | Can fetch only changes since a cursor | GitHub, Gmail |
| SupportsWatch | Can push real-time change events | Filesystem |
| SupportsHierarchy | Documents have parent-child relationships | GitHub (repos/files), Filesystem (directories) |
| SupportsBinary | Can handle binary content | All |
| RequiresAuth | Needs authentication credentials | GitHub, Gmail, Notion |
| SupportsValidation | Can verify configuration before sync | All |
| SupportsCursorReturn | Returns a cursor for incremental sync | GitHub, Filesystem |
| SupportsRateLimiting | Has built-in rate limit handling | GitHub |
| SupportsPagination | Handles paginated API responses internally | GitHub, Gmail |

Capability-Driven Behaviour

The sync orchestrator automatically adapts based on capabilities:

| Scenario | Orchestrator Behaviour |
| --- | --- |
| First sync, no cursor | Always runs FullSync |
| Has cursor, supports incremental | Runs IncrementalSync |
| Supports watch | Can start long-running watch mode |
| Does not support incremental | Runs FullSync every time |
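The full-versus-incremental decision in the table above can be sketched as a small function. The `Capabilities` struct, the `Strategy` type, and `chooseStrategy` are hypothetical names for this example, and representing the cursor as a string is an assumption.

```go
package main

import "fmt"

// Capabilities holds two of the flags from the capability table.
// Modelling them as struct fields is an assumption of this sketch.
type Capabilities struct {
	SupportsIncremental bool
	SupportsWatch       bool
}

// Strategy names the sync operation the orchestrator would run.
type Strategy string

const (
	StrategyFull        Strategy = "FullSync"
	StrategyIncremental Strategy = "IncrementalSync"
)

// chooseStrategy picks IncrementalSync only when the connector supports it
// and a cursor from a previous sync exists; otherwise it falls back to FullSync.
func chooseStrategy(caps Capabilities, cursor string) Strategy {
	if caps.SupportsIncremental && cursor != "" {
		return StrategyIncremental
	}
	return StrategyFull
}

func main() {
	incremental := Capabilities{SupportsIncremental: true}
	fullOnly := Capabilities{}

	fmt.Println(chooseStrategy(incremental, ""))       // first sync, no cursor
	fmt.Println(chooseStrategy(incremental, "abc123")) // cursor available
	fmt.Println(chooseStrategy(fullOnly, "abc123"))    // incremental unsupported
}
```

Watch mode is left out of the function because it is a long-running mode started in addition to, not instead of, a regular sync.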

Connector Factory

The connector factory creates connector instances from source configuration. This decouples the core system from specific connector implementations.

Factory Responsibilities

| Responsibility | Description |
| --- | --- |
| Builder registration | Stores builder functions for each connector type |
| Type dispatch | Creates the correct connector based on source type |
| Token resolution | Resolves authorization IDs to token providers |
| Configuration parsing | Validates and parses source-specific configuration |

Registration Pattern

Connectors register themselves at application startup. The factory maintains a registry of connector types and their corresponding builder functions.

| Connector Type | Builder Function |
| --- | --- |
| filesystem | Creates filesystem connector with path configuration |
| github | Creates GitHub connector with repository patterns and content filters |

New connectors are added by registering a builder function. No changes to core services are required.
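A minimal sketch of the registration pattern follows. `Factory`, `Builder`, `SourceConfig`, and the one-method `Connector` interface are hypothetical names chosen for the example, and the `path` setting key is an assumption; token resolution is omitted to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
)

// Connector is reduced to one method so the example stays short.
type Connector interface{ Type() string }

// SourceConfig is a hypothetical source configuration.
type SourceConfig struct {
	Type     string
	Settings map[string]string
}

// Builder creates a connector from source configuration.
type Builder func(cfg SourceConfig) (Connector, error)

// Factory keeps a registry of builders keyed by connector type.
type Factory struct{ builders map[string]Builder }

func NewFactory() *Factory { return &Factory{builders: map[string]Builder{}} }

// Register stores the builder for a connector type.
func (f *Factory) Register(typ string, b Builder) { f.builders[typ] = b }

// Create dispatches on the source type and delegates to the registered builder.
func (f *Factory) Create(cfg SourceConfig) (Connector, error) {
	b, ok := f.builders[cfg.Type]
	if !ok {
		return nil, fmt.Errorf("unknown connector type %q", cfg.Type)
	}
	return b(cfg)
}

type filesystemConnector struct{ path string }

func (c *filesystemConnector) Type() string { return "filesystem" }

// buildFilesystem validates the source-specific configuration before building.
func buildFilesystem(cfg SourceConfig) (Connector, error) {
	path := cfg.Settings["path"]
	if path == "" {
		return nil, errors.New("filesystem: path is required")
	}
	return &filesystemConnector{path: path}, nil
}

func main() {
	f := NewFactory()
	f.Register("filesystem", buildFilesystem) // done once at startup

	c, err := f.Create(SourceConfig{
		Type:     "filesystem",
		Settings: map[string]string{"path": "/home/user/docs"},
	})
	if err != nil {
		fmt.Println("create failed:", err)
		return
	}
	fmt.Println("created connector:", c.Type())
}
```

Because `Create` only looks up a builder by type, adding a connector means one more `Register` call and no change to the factory itself.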

Authentication

Connectors that access external services require authentication. The token provider abstraction handles this uniformly.

Token Provider Types

| Provider | Use Case | Token Refresh |
| --- | --- | --- |
| NullTokenProvider | Local sources (filesystem) | Not applicable |
| PATTokenProvider | Personal access tokens | No refresh needed |
| OAuthTokenProvider | OAuth-based services | Automatic refresh before expiry |

Authentication Flow

  1. Source configuration references an authorization ID
  2. Factory resolves the authorization ID to a token provider
  3. Connector receives the token provider at creation time
  4. Connector calls token provider when making authenticated requests
  5. Token provider handles refresh transparently

RawDocument Output

All connectors produce RawDocument objects with a common structure.

RawDocument Fields

| Field | Description |
| --- | --- |
| SourceID | Identifies which source this document belongs to |
| URI | Unique identifier for the document within the source |
| MIMEType | Content type for normaliser dispatch |
| Content | Raw bytes of the document |
| ParentURI | Optional parent reference for hierarchical sources |
| Metadata | Source-specific metadata (timestamps, authors, etc.) |

URI Patterns

Each connector defines URI patterns appropriate for its source:

| Connector | URI Pattern | Example |
| --- | --- | --- |
| Filesystem | Absolute file path | /home/user/docs/readme.md |
| GitHub | github://{owner}/{repo}/{type}/{path} | github://acme/api/file/src/main.go |
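To make the document structure and the GitHub URI pattern concrete, here is a sketch of a `RawDocument` struct together with helpers that build and parse `github://` URIs. The field types (in particular `Metadata` as a string map) and the helper names `githubURI` and `parseGitHubURI` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// RawDocument carries the fields every connector is described as producing.
// Field types are assumed for this sketch.
type RawDocument struct {
	SourceID  string
	URI       string
	MIMEType  string
	Content   []byte
	ParentURI string            // empty when the document has no parent
	Metadata  map[string]string // source-specific key/value pairs
}

// githubURI fills in the pattern github://{owner}/{repo}/{type}/{path}.
func githubURI(owner, repo, kind, path string) string {
	return fmt.Sprintf("github://%s/%s/%s/%s", owner, repo, kind, path)
}

// parseGitHubURI splits a URI back into its four components.
// The path component may itself contain slashes.
func parseGitHubURI(uri string) (owner, repo, kind, path string, err error) {
	const scheme = "github://"
	if !strings.HasPrefix(uri, scheme) {
		return "", "", "", "", fmt.Errorf("not a github URI: %q", uri)
	}
	parts := strings.SplitN(strings.TrimPrefix(uri, scheme), "/", 4)
	if len(parts) != 4 {
		return "", "", "", "", fmt.Errorf("malformed github URI: %q", uri)
	}
	return parts[0], parts[1], parts[2], parts[3], nil
}

func main() {
	doc := RawDocument{
		SourceID: "src-1",
		URI:      githubURI("acme", "api", "file", "src/main.go"),
		MIMEType: "text/x-go",
		Content:  []byte("package main"),
		Metadata: map[string]string{"author": "example"},
	}
	fmt.Println(doc.URI)

	owner, repo, kind, path, err := parseGitHubURI(doc.URI)
	fmt.Println(owner, repo, kind, path, err)
}
```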

Sync Strategies

Full Sync

Full sync retrieves all documents from the source. The orchestrator uses it on the first sync, when no cursor exists yet, and on every sync for connectors that do not support incremental sync.

Incremental Sync

Incremental sync uses a cursor to fetch only documents changed since the last sync. This significantly reduces sync time for large sources.

Watch Mode

Watch mode provides real-time updates by subscribing to change events. It is available only to connectors that declare SupportsWatch, which in practice means connectors with local access, such as Filesystem.

Change Types

Incremental sync and watch mode report changes with a type indicator:

| Change Type | Description |
| --- | --- |
| Created | New document added to source |
| Updated | Existing document modified |
| Deleted | Document removed from source |
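One way a consumer might apply these three change types to a local index is sketched below. The `Change` struct, the `ChangeType` constants, and the map-based index are hypothetical; how the pipeline actually handles changes is not described on this page.

```go
package main

import "fmt"

// ChangeType mirrors the three change types listed above.
type ChangeType int

const (
	Created ChangeType = iota
	Updated
	Deleted
)

// Change pairs a change type with the affected document.
type Change struct {
	Type    ChangeType
	URI     string
	Content []byte // unused for Deleted
}

// applyChanges updates an in-memory index keyed by document URI.
func applyChanges(index map[string][]byte, changes []Change) {
	for _, c := range changes {
		switch c.Type {
		case Created, Updated:
			index[c.URI] = c.Content // insert or overwrite
		case Deleted:
			delete(index, c.URI)
		}
	}
}

func main() {
	index := map[string][]byte{"/docs/old.md": []byte("old")}
	applyChanges(index, []Change{
		{Type: Created, URI: "/docs/new.md", Content: []byte("new")},
		{Type: Updated, URI: "/docs/old.md", Content: []byte("edited")},
		{Type: Deleted, URI: "/docs/new.md"},
	})
	for uri, content := range index {
		fmt.Printf("%s = %s\n", uri, content)
	}
}
```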

Error Handling

Connectors classify errors so that a recoverable failure does not abort the whole sync, while an unrecoverable one stops it and is reported.

Error Categories

| Category | Handling |
| --- | --- |
| Transient (network timeout) | Retry with backoff |
| Rate limit | Wait and retry |
| Authentication failure | Report and stop |
| Document read failure | Skip document, continue sync |
| Configuration error | Report and stop |

SyncComplete Signal

When a sync completes successfully, connectors that support cursors send a SyncComplete signal containing the new cursor. The sync orchestrator persists this for the next incremental sync.
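The hand-off of the new cursor might look like the sketch below. `SyncComplete` is named in the text, but its `NewCursor` field, the `CursorStore` interface, and `onSyncFinished` are assumptions made for illustration.

```go
package main

import "fmt"

// SyncComplete carries the cursor produced by a successful sync.
type SyncComplete struct{ NewCursor string }

// CursorStore is a hypothetical persistence interface for cursors.
type CursorStore interface {
	SaveCursor(sourceID, cursor string) error
	LoadCursor(sourceID string) (string, bool)
}

// memoryCursorStore keeps cursors in a map; a real store would persist them.
type memoryCursorStore struct{ cursors map[string]string }

func (s *memoryCursorStore) SaveCursor(sourceID, cursor string) error {
	s.cursors[sourceID] = cursor
	return nil
}

func (s *memoryCursorStore) LoadCursor(sourceID string) (string, bool) {
	c, ok := s.cursors[sourceID]
	return c, ok
}

// onSyncFinished persists the cursor when the connector supplied one.
// A nil signal means the connector does not return cursors.
func onSyncFinished(store CursorStore, sourceID string, done *SyncComplete) error {
	if done == nil {
		return nil
	}
	return store.SaveCursor(sourceID, done.NewCursor)
}

func main() {
	store := &memoryCursorStore{cursors: map[string]string{}}

	if err := onSyncFinished(store, "github-1", &SyncComplete{NewCursor: "cursor-42"}); err != nil {
		fmt.Println("persist failed:", err)
		return
	}
	cursor, ok := store.LoadCursor("github-1")
	fmt.Println(cursor, ok) // available for the next incremental sync
}
```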

Built-in Connectors

Sercha includes the following connectors:

| Connector | Source Type | Authentication |
| --- | --- | --- |
| Filesystem | Local files and directories | None required |
| GitHub | GitHub repositories (files, issues, PRs, wikis) | OAuth or PAT |

Adding New Connectors

New connectors can be added without modifying core services. See Extensibility for the process.

The connector interface is designed for stability. Changes to the interface are rare and follow semantic versioning.
