
Connector Specifications

Choose the specification that matches your connector type. Each specification provides detailed guidelines and requirements for building robust, production-ready connectors.

API Connector Specification

This specification defines the requirements for implementing a robust, production‑ready API connector. The connector must be language‑agnostic. Any illustrative snippets must be treated as pseudocode, not tied to a specific language or framework.

Scope and Principles

  • Language‑agnostic: The spec describes behaviors, contracts, and data shapes, not language constructs.
  • Separation of concerns: Request execution, authentication, retries, rate limits, and pagination are composable, swappable modules.
  • Deterministic, observable, testable: Deterministic defaults, structured logs/metrics/traces, and clear test surfaces.
  • Secure by default: Credentials are redacted, transport is encrypted where applicable, and inputs/outputs are validated.
  • Resilient: Backoff with jitter, circuit breaking, idempotency, and graceful degradation built in.
  • Extensible: Hooks/middleware enable customization without forking core.

Core Modules and Methods

Every API connector must implement the following core functionality and structure:

Resource Abstraction

  • Organize code by API resources rather than ETL stages.
  • Canonical layout for resources:
    • Single-file per resource under src/resources/{resource}.ts.
    • Barrel export at src/resources/index.ts that re-exports per-resource factories.
    • Each resource module must expose:
      • createResource(send) factory that binds a base path (e.g., /{resource}) and returns a CRUD surface
      • A minimal, consistent operation: getAll(params) that returns an async generator of arrays (pages)
      • Optional operations based on upstream capability: getById(id) (or get), and mutation methods when applicable
      • A typed Model describing the item shape for the resource
  • Cross-cutting helpers should live under src/lib:
    • paginate iterator supporting cursor pagination (and extensible for other strategies)
    • make-resource (or equivalent) to build the CRUD surface with pagination
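This layout can be sketched in TypeScript for a hypothetical contacts resource; the send signature, the Contact model, and its fields are illustrative assumptions, not part of the spec:

```typescript
// Generic request function injected by the connector core (shape assumed).
type Send = (options: {
  method: string;
  path: string;
  query?: Record<string, unknown>;
  body?: unknown;
}) => Promise<{ data: unknown; status: number }>;

// Typed Model describing the item shape for this resource (fields invented).
interface Contact {
  id: string;
  email: string;
}

// createResource-style factory: binds the base path and returns a CRUD surface.
function createContacts(send: Send) {
  const base = "/contacts";
  return {
    // getAll yields pages (arrays) of items as an async generator.
    async *getAll(params: Record<string, unknown> = {}): AsyncGenerator<Contact[]> {
      const res = await send({ method: "GET", path: base, query: params });
      yield res.data as Contact[];
    },
    // Optional operation, present because the upstream supports it.
    async getById(id: string): Promise<Contact> {
      const res = await send({ method: "GET", path: `${base}/${id}` });
      return res.data as Contact;
    },
  };
}
```

The barrel file at src/resources/index.ts would then re-export createContacts alongside the other per-resource factories.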

Initialization and Lifecycle

  • initialize(configuration)
    Sets up the connector with the provided configuration. Should validate the configuration and prepare any internal state.

  • connect()
    Establishes connection to the API service. May include authentication, session creation, or connection pooling.

  • disconnect()
    Gracefully closes the connection and cleans up resources. Should complete any pending requests before disconnecting.

  • isConnected()
    Returns true if the connector is currently connected and ready to make requests, false otherwise.
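One possible TypeScript shape for this lifecycle, with a minimal in-memory implementation; the config fields shown are a subset, and the example class is purely illustrative:

```typescript
interface ConnectorConfig {
  baseUrl: string;
  timeout?: number;
}

interface ConnectorLifecycle {
  initialize(configuration: ConnectorConfig): void; // validate config, prepare state
  connect(): Promise<void>;                         // auth / session / pooling
  disconnect(): Promise<void>;                      // drain pending work, then close
  isConnected(): boolean;
}

// Minimal in-memory implementation for illustration only.
class ExampleConnector implements ConnectorLifecycle {
  private connected = false;
  private config?: ConnectorConfig;

  initialize(configuration: ConnectorConfig): void {
    if (!configuration.baseUrl) throw new Error("baseUrl is required");
    // Apply defaults on top of validated input.
    this.config = { timeout: 30000, ...configuration };
  }
  async connect(): Promise<void> {
    this.connected = true;
  }
  async disconnect(): Promise<void> {
    this.connected = false;
  }
  isConnected(): boolean {
    return this.connected;
  }
}
```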

Request Methods

  • request(options)
    Core method for making HTTP requests. All other HTTP methods should internally use this method.
    Options should include: method, path, headers, query parameters, body, timeout, and any method-specific settings.

  • get(path, options)
    Performs an HTTP GET request to the specified path.

  • Optional sugar methods (post, put, patch, delete) may be provided for ergonomics.
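A sketch of the request primitive in TypeScript, assuming an injected transport function; the option names follow this section, everything else is illustrative:

```typescript
interface RequestOptions {
  method?: string;
  path: string;
  headers?: Record<string, string>;
  query?: Record<string, string>;
  body?: unknown;
  timeout?: number;
}

type Transport = (options: RequestOptions) => Promise<{ status: number; data: unknown }>;

function makeClient(transport: Transport) {
  // Core method: all other HTTP methods delegate to this one.
  async function request(options: RequestOptions) {
    return transport({ method: "GET", ...options });
  }
  return {
    request,
    // Sugar methods are thin wrappers over the core request method.
    get: (path: string, options: Omit<RequestOptions, "method" | "path"> = {}) =>
      request({ ...options, method: "GET", path }),
    post: (path: string, body: unknown, options: Omit<RequestOptions, "method" | "path" | "body"> = {}) =>
      request({ ...options, method: "POST", path, body }),
  };
}
```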

Advanced Operations

  • batch(requests)
    Executes multiple requests in a single operation where supported by the API. Should handle partial failures gracefully.

  • paginate(options)
    Returns an iterator that automatically handles pagination, fetching subsequent pages as needed. Should support different pagination strategies. Resource modules should consume this via the shared lib.

Optional Operations (if applicable)

  • stream(options)
    Reads streaming responses (e.g., chunked, SSE) with backpressure and cancellation.

Configuration Structure

The connector configuration should support the following settings:

Base Configuration

  • baseUrl - The base URL for all API requests
  • timeout - Request timeout in milliseconds (default: 30000)
  • userAgent - Identifier for outbound requests (include app version/commit when available)
  • proxy - Optional proxy configuration (host, port, protocol, credentials)
  • tls - TLS options (verify, min version, CA bundle, mTLS certificates) where applicable
  • pooling - Connection pooling/keep‑alive settings

Authentication Configuration

Support for multiple authentication types:

  • type - One of: api_key, bearer, basic, oauth2, or custom
  • credentials - Authentication credentials specific to the chosen type

Retry Configuration

  • maxAttempts - Maximum number of retry attempts (default: 3)
  • initialDelay - Initial retry delay in milliseconds (default: 1000)
  • maxDelay - Maximum retry delay in milliseconds (default: 30000)
  • backoffMultiplier - Multiplier for exponential backoff (default: 2)
  • retryableStatusCodes - HTTP status codes that trigger retries (default: [429, 500, 502, 503, 504])
  • retryableErrors - Error types/codes that should trigger retries
  • retryBudgetMs - Hard cap on total time spent retrying a single logical operation
  • respectRetryAfter - Whether to honor server Retry‑After hints (default: true)
  • idempotency - Enable idempotency key strategy for unsafe methods (default: enabled)
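The defaults above can be captured as a typed constant; this TypeScript sketch is one possible shape, not a required interface:

```typescript
interface RetryConfig {
  maxAttempts: number;
  initialDelay: number;         // milliseconds
  maxDelay: number;             // milliseconds
  backoffMultiplier: number;
  retryableStatusCodes: number[];
  respectRetryAfter: boolean;
  retryBudgetMs?: number;       // optional hard cap per logical operation
}

// Defaults taken from the list above.
const defaultRetryConfig: RetryConfig = {
  maxAttempts: 3,
  initialDelay: 1000,
  maxDelay: 30000,
  backoffMultiplier: 2,
  retryableStatusCodes: [429, 500, 502, 503, 504],
  respectRetryAfter: true,
};
```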

Rate Limiting Configuration

  • requestsPerSecond - Maximum requests per second
  • requestsPerMinute - Maximum requests per minute
  • requestsPerHour - Maximum requests per hour
  • concurrentRequests - Maximum concurrent requests (default: 10)
  • burstCapacity - Allowed burst above steady rate (token bucket)
  • adaptiveFromHeaders - Update limits from response headers when available (default: true)

Default Settings

  • defaultHeaders - Headers to include with every request
  • defaultQueryParams - Query parameters to include with every request

Hooks Configuration

Arrays of hooks to execute at different stages:

  • beforeRequest - Executed before sending a request
  • afterResponse - Executed after receiving a response
  • onError - Executed when an error occurs
  • onRetry - Executed before retrying a request

Canonical Hook Event Semantics

  • Hooks must accept a discriminated union context with type in { beforeRequest, afterResponse, onError, onRetry }.
  • beforeRequest emits http_request; afterResponse emits http_response.
  • Optional fields controlled by logging options:
    • includeQueryParams → query present on both events when parseable
    • includeHeaders → headers present on request and response events
    • includeBody → response body present on http_response; request body present when relevant
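The discriminated union might look like this in TypeScript; only the type discriminants come from the spec, the payload shapes are assumptions:

```typescript
type HookContext =
  | { type: "beforeRequest"; request: { method: string; path: string } }
  | { type: "afterResponse"; request: { method: string; path: string }; response: { status: number } }
  | { type: "onError"; error: { code: string; message: string } }
  | { type: "onRetry"; error: { code: string }; attemptNumber: number };

// Narrowing on `type` gives each branch access to its own payload.
function describeEvent(ctx: HookContext): string {
  switch (ctx.type) {
    case "beforeRequest":
      return `http_request ${ctx.request.method} ${ctx.request.path}`;
    case "afterResponse":
      return `http_response ${ctx.response.status}`;
    case "onError":
      return `error ${ctx.error.code}`;
    case "onRetry":
      return `retry #${ctx.attemptNumber}`;
  }
}
```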

Retry Mechanism

The connector must implement a robust retry strategy with the following requirements:

Retry Strategy Methods

  • shouldRetry(error, attemptNumber)
    Determines whether a request should be retried based on the error and current attempt count.

  • calculateDelay(attemptNumber)
    Calculates the delay before the next retry attempt.

  • onRetry(error, attemptNumber)
    Hook called before each retry attempt for logging or state updates.

Implementation Requirements

  1. Exponential Backoff
    Calculate delay as: minimum(initialDelay × (backoffMultiplier ^ attemptNumber), maxDelay)

  2. Jitter
    Add randomization to prevent thundering herd: actualDelay = delay × (0.5 + random(0 to 0.5))

  3. Respect Server Hints
    Honor "Retry-After" headers when present

  4. Circuit Breaker
    Implement circuit breaker pattern to prevent cascading failures

  5. Retry Budget
    Abort retries once the per‑operation retry budget is exhausted, even if maxAttempts has not been reached.
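Requirements 1–3 can be combined into a single delay calculation, sketched here in TypeScript with an injectable random source so the jitter can be pinned in tests:

```typescript
function calculateDelay(
  attemptNumber: number,
  opts: { initialDelay: number; backoffMultiplier: number; maxDelay: number; retryAfterMs?: number },
  random: () => number = Math.random,
): number {
  // 3. Server hints win over computed backoff.
  if (opts.retryAfterMs !== undefined) return opts.retryAfterMs;
  // 1. Exponential backoff, capped at maxDelay.
  const base = Math.min(opts.initialDelay * opts.backoffMultiplier ** attemptNumber, opts.maxDelay);
  // 2. Jitter: scale into [0.5, 1.0) of the base delay.
  return base * (0.5 + random() * 0.5);
}
```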

Hook System

Hooks provide extension points for customizing connector behavior without modifying core logic:

Hook Structure

  • name - Unique identifier for the hook
  • priority - Execution order (lower numbers execute first)
  • execute(context) - The hook's main function

Hook Context

Each hook receives a context object containing:

  • type - The hook type: beforeRequest, afterResponse, onError, or onRetry
  • request - The request options (when applicable)
  • response - The response object (when applicable)
  • error - The error object (when applicable)
  • metadata - Additional context data

Context Methods

  • modifyRequest(updates) - Modify the outgoing request
  • modifyResponse(updates) - Modify the incoming response
  • abort(reason) - Cancel the request with a reason

Middleware Pipeline (conceptual)

Hooks/middleware execute in a well‑defined order around the core request execution:

PSEUDOCODE pipeline:
1. Build request (defaults → per‑call options → auth → user hooks)
2. Rate limiter: waitForSlot()
3. beforeRequest hooks (ordered by priority)
4. Execute (with timeout + cancellation token)
5. afterResponse hooks (transform/validate)
6. onError hooks (map/enrich), possibly shouldRetry → backoff
7. Metrics/logging at each stage

Common Hook Use Cases

  • Adding authentication headers
  • Request/response logging
  • Metrics collection
  • Request signing
  • Response transformation
  • Error enrichment

Type and Data Model Management

Response Structure

All responses should be wrapped in a consistent structure containing:

  • data - The actual response payload
  • status - HTTP status code
  • headers - Response headers as key-value pairs
  • meta - Optional metadata including:
    • timestamp - When the response was received
    • duration - Request duration in milliseconds
    • retryCount - Number of retry attempts made
    • rateLimit - Current rate limit status
    • requestId - Correlation identifier echoed by server or generated by client
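A possible TypeScript wrapper type plus a helper that fills in meta; the field names follow this section, the helper itself is illustrative:

```typescript
interface ConnectorResponse<T> {
  data: T;
  status: number;
  headers: Record<string, string>;
  meta?: {
    timestamp: string;   // when the response was received (ISO 8601)
    duration: number;    // request duration in milliseconds
    retryCount: number;  // retry attempts made
    requestId?: string;  // echoed by server or generated by client
  };
}

function wrapResponse<T>(
  data: T,
  status: number,
  headers: Record<string, string>,
  startedAt: number,
  retryCount = 0,
): ConnectorResponse<T> {
  return {
    data,
    status,
    headers,
    meta: {
      timestamp: new Date().toISOString(),
      duration: Date.now() - startedAt,
      retryCount,
      requestId: headers["x-request-id"],
    },
  };
}
```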

Data Transformation

The connector should provide methods for data transformation:

  • deserialize(data, schema)
    Transform API response data into internal application models

  • serialize(data, schema)
    Transform internal models into API-compatible format

  • validate(data, schema)
    Validate data against a schema definition

Schema Definition

Schemas should support:

  • type - Data type: object, array, string, number, or boolean
  • properties - For objects, defines nested properties
  • items - For arrays, defines the schema of array elements
  • required - List of required property names
  • format - Specific format constraints (e.g., date-time, email, uri)
  • transform - Custom transformation function

Type Safety and API Contracts

  • Strong typing is required wherever the implementation language supports it (e.g., TypeScript). Public APIs must not be untyped or use any.
  • Prefer named types, generics with constraints, discriminated unions, and exact object shapes over loose records.

OpenAPI Integration

  • Use a single generator: hey-api for TypeScript.
  • Canonical input path: schemas/raw/files/openapi.json.
  • Canonical output directory: src/generated.
  • Resource files import generated types directly to avoid drift.

Analytical Mode

Provide analytics‑friendly, single‑level objects while preserving arrays.

  • Deterministic pipeline
    • Raw types: Generated by hey-api into src/generated.
    • Flat types: Generated by a codegen step into src/generated.
    • Config: schemas/flatten.config.json controls delimiter, depth, per‑field aliases, and skips.
  • Naming
    • Keep Raw model names unchanged (e.g., Foo).
    • Emit sibling Flat models with Flat suffix (e.g., FooFlat).
    • Emit mappers: mapFooToFooFlat(raw: Foo): FooFlat in src/generated.
  • Flattening rules
    • Flatten nested object properties to first level using a delimiter (default: _).
    • Arrays remain arrays.
      • Arrays of primitives stay the same.
      • Arrays of objects become arrays of flattened element objects (shape flattened, array preserved).
    • Top‑level primitives remain unchanged.
    • Field name collisions must be resolved deterministically via config (alias/skip). Auto‑suffixing is allowed only as a fallback.
    • Depth is unlimited by default; may be bounded via config.
  • Hook integration
    • Use an afterResponse hook to transform returned data from Raw → Flat
    • Behavior: if response.data is an array of objects, map each element; if a single object, map once; otherwise pass through.
    • If no mapper is registered for the operation, pass through Raw unchanged.
    • A similar approach can be used for other analytical needs (e.g., nullability)
  • Resource API
    • Public resource methods should return Flat types.
    • Optionally expose Raw variants (e.g., getAllRaw) when needed for low‑level use.
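A hand-written illustration of the generated Raw/Flat pair and mapper; Foo, FooFlat, and mapFooToFooFlat follow the spec's naming convention, while the fields themselves are invented:

```typescript
// Raw model as generated from the OpenAPI schema (fields invented).
interface Foo {
  id: string;
  owner: { name: string; address: { city: string } };
  tags: string[];
}

// Flat sibling: nested objects flatten to first level with the default "_"
// delimiter; arrays are preserved as arrays.
interface FooFlat {
  id: string;
  owner_name: string;
  owner_address_city: string;
  tags: string[];
}

function mapFooToFooFlat(raw: Foo): FooFlat {
  return {
    id: raw.id,                               // top-level primitive unchanged
    owner_name: raw.owner.name,               // nested object flattened
    owner_address_city: raw.owner.address.city, // unlimited depth by default
    tags: raw.tags,                           // array of primitives unchanged
  };
}
```

In practice both types and the mapper would be emitted into src/generated by the codegen step rather than written by hand.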

Error Handling

Error Structure

All connector errors should include:

  • message - Human-readable error description
  • code - Machine-readable error code
  • statusCode - HTTP status code (if applicable)
  • details - Additional error context or data
  • retryable - Boolean indicating if the request can be retried
  • requestId - Correlation identifier if available
  • source - Subsystem where the error occurred (transport, auth, rateLimit, deserialize, userHook, unknown)

Standard Error Codes

Connectors should use these standardized error codes:

  • NETWORK_ERROR - Network connectivity issues
  • TIMEOUT - Request exceeded timeout limit
  • AUTH_FAILED - Authentication or authorization failure
  • RATE_LIMIT - Rate limit exceeded
  • INVALID_REQUEST - Malformed or invalid request
  • SERVER_ERROR - Server-side error (5xx status codes)
  • PARSING_ERROR - Failed to parse response
  • VALIDATION_ERROR - Data validation failed
  • CANCELLED - Request was cancelled by caller
  • UNSUPPORTED - Operation not supported by target API

Error Handling Best Practices

  • Preserve original error information for debugging
  • Provide actionable error messages
  • Include request context in error details
  • Differentiate between retryable and non-retryable errors
  • Log errors with appropriate severity levels
PSEUDOCODE error enrichment:
IF transport error THEN code = NETWORK_ERROR, retryable = true
ELSE IF status in [408, 425, 429, 5xx] THEN retryable = true
ELSE retryable = false
Attach requestId, endpoint, method, attemptNumber, duration
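The enrichment rules above, sketched as a TypeScript classifier; the function name and messages are illustrative:

```typescript
interface ConnectorError {
  message: string;
  code: string;
  statusCode?: number;
  retryable: boolean;
}

function classifyError(statusCode: number | undefined, transportFailure: boolean): ConnectorError {
  // Transport-level failures are always retryable network errors.
  if (transportFailure) {
    return { message: "network failure", code: "NETWORK_ERROR", retryable: true };
  }
  // 408/425/429 and all 5xx responses are retryable.
  const retryable =
    statusCode !== undefined && ([408, 425, 429].includes(statusCode) || statusCode >= 500);
  const code =
    statusCode === 429 ? "RATE_LIMIT"
    : statusCode !== undefined && statusCode >= 500 ? "SERVER_ERROR"
    : "INVALID_REQUEST";
  return { message: `request failed with status ${statusCode}`, code, statusCode, retryable };
}
```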

Pagination Support

Pagination Configuration

The paginate method should accept options including:

  • pageSize - Number of items per page
  • startCursor - Initial cursor for cursor-based pagination
  • startPage - Initial page number for page-based pagination
  • strategy - Pagination type: cursor, offset, page, or link-header
  • params - Strategy‑specific parameter names (e.g., pageParam, perPageParam, cursorParam, offsetParam, limitParam)

Custom Extraction Functions

Allow customization of pagination logic through:

  • extractNextCursor(response) - Extract the next page cursor from response
  • extractItems(response) - Extract items array from response
  • hasNextPage(response) - Determine if more pages exist

Pagination Implementation

The paginate method should:

  1. Return an iterator for memory-efficient processing
  2. Automatically fetch subsequent pages as needed
  3. Handle different pagination strategies transparently
  4. Yield arrays of items for each page
  5. Stop when no more pages are available
PSEUDOCODE for paginate method:
1. Initialize cursor/page from options
2. Set hasMore = true
3. WHILE hasMore:
   a. Make request with current cursor/page
   b. Extract items from response
   c. Yield items to caller
   d. Extract next cursor/page
   e. Check if more pages exist
   f. Update hasMore flag
4. End iteration when no more pages
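The loop above, sketched for the cursor strategy as a TypeScript async generator; fetchPage and the Page shape are stand-ins for the connector's request layer:

```typescript
interface Page<T> {
  items: T[];
  nextCursor?: string;
}

async function* paginate<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
  startCursor?: string,
): AsyncGenerator<T[]> {
  let cursor = startCursor;          // 1. initialize cursor from options
  let hasMore = true;                // 2.
  while (hasMore) {                  // 3.
    const page = await fetchPage(cursor); // a. request with current cursor
    yield page.items;                // c. yield this page's items
    cursor = page.nextCursor;        // d. advance the cursor
    hasMore = cursor !== undefined;  // e/f. stop when exhausted
  }
}                                    // 4. iteration ends naturally
```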

Pragmatic Defaults and Starter Pattern

Many vendor APIs either return full lists or have inconsistent pagination. A productive default is:

  • Provide getAll(params) per resource that:
    • Performs a single GET and yields client‑side chunks using pageSize, with maxItems to cap total items
    • Supports buildListQuery(params) to map typed filters to query
  • When real pagination is required, add a paginate helper and implement getAll on top to keep a consistent surface.
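The single-GET-plus-chunking default can be sketched as follows; fetchList stands in for the one upstream call, and pageSize/maxItems match the options above:

```typescript
async function* getAllChunked<T>(
  fetchList: () => Promise<T[]>,
  pageSize = 100,
  maxItems?: number,
): AsyncGenerator<T[]> {
  // One GET up front, then client-side chunking.
  let items = await fetchList();
  if (maxItems !== undefined) items = items.slice(0, maxItems); // cap total items
  for (let i = 0; i < items.length; i += pageSize) {
    yield items.slice(i, i + pageSize);
  }
}
```

Keeping the generator signature identical to a real paginate-backed getAll means callers don't change when genuine pagination is added later.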

Observability should be on by default at sensible levels. Logging options should include:

  • includeQueryParams – include parsed query params in request/response URL logs
  • includeHeaders – include request and response headers
  • includeBody – include response body (and request body when relevant)

Always redact secrets.

Concurrency, Cancellation, and Timeouts

  • Cancellation token: All operations accept a caller‑provided token to cancel in‑flight work.
  • Per‑call timeout: Enforced at the transport layer; must trigger cancellation and error with TIMEOUT.
  • Global shutdown: The connector supports graceful shutdown, draining in‑flight requests.
  • Max concurrency: Enforced independent of rate limits; bounded work queue to avoid unbounded memory growth.
PSEUDOCODE request with cancellation and timeout:
1. IF !canProceed() THEN waitForSlot()
2. START timer(timeout)
3. TRY execute
4. IF cancelled OR timer expired → abort transport → raise TIMEOUT/CANCELLED
5. ALWAYS release slot

Streaming and Large Payloads

  • Support reading streaming responses (SSE/chunked) with backpressure.
  • Support large uploads/downloads with chunking, multi‑part, or resumable mechanisms when available.
  • Apply checksum/ETag validation when provided by the server.
  • Surface progress events via hooks or callbacks where relevant.
PSEUDOCODE streaming read:
open stream
FOR EACH chunk IN stream:
  emit chunk to caller
ON error → map to NETWORK_ERROR (retryable if partial/transient)

Rate Limiting

Rate Limiter Methods

The rate limiter should implement:

  • canProceed()
    Returns true if a request can be made immediately without exceeding rate limits

  • waitForSlot()
    Blocks/waits until a request slot becomes available

  • updateFromResponse(headers)
    Updates rate limit state based on response headers (e.g., X-RateLimit-Remaining)

  • getStatus()
    Returns current rate limit status information

Rate Limit Status

Status information should include:

  • limit - Maximum requests allowed in the window
  • remaining - Requests remaining in current window
  • reset - Timestamp when the limit resets
  • retryAfter - Seconds to wait before retrying (if provided)

Implementation Strategies

  • Token Bucket - Smooth rate limiting with burst capacity
  • Sliding Window - Precise rate limiting over time windows
  • Fixed Window - Simple reset at specific intervals
  • Adaptive - Adjust based on server feedback
PSEUDOCODE adaptive update:
IF headers contain rate-limit info THEN update limiter state
IF Retry-After present THEN sleep per hint
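A minimal token bucket illustrating canProceed with burst capacity; the clock is injectable so refill can be tested deterministically:

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly ratePerSecond: number,
    private readonly burstCapacity: number,
    private readonly now: () => number = Date.now,
  ) {
    this.tokens = burstCapacity; // start full: burst allowed immediately
    this.lastRefill = now();
  }

  private refill(): void {
    const elapsedSeconds = (this.now() - this.lastRefill) / 1000;
    this.tokens = Math.min(this.burstCapacity, this.tokens + elapsedSeconds * this.ratePerSecond);
    this.lastRefill = this.now();
  }

  // Consumes a token if one is available; otherwise the caller should wait.
  canProceed(): boolean {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A full limiter would add waitForSlot, updateFromResponse, and getStatus on top of this core.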

Authentication Strategies

Authentication Methods

Each authentication strategy should implement:

  • authenticate(request)
    Apply authentication credentials to the outgoing request

  • refresh()
    Refresh expired credentials (optional, for token-based auth)

  • isValid()
    Check if current authentication credentials are still valid

Required Authentication Types

  • API Key
    Support for API keys in headers, query parameters, or custom locations

  • Bearer Token
    JWT or opaque tokens with optional refresh mechanism

  • Basic Authentication
    Username and password encoded in Authorization header

  • OAuth 2.0
    Full OAuth flow with token refresh support

  • Custom Authentication
    Signature-based auth, HMAC, or other custom schemes

Authentication Best Practices

  • Store credentials securely (never in plain text)
  • Implement automatic token refresh before expiration
  • Handle authentication failures gracefully
  • Support multiple authentication methods per connector
  • Allow authentication method switching at runtime
PSEUDOCODE auth application:
credentials = load from secure store
IF credentials expiring → refresh()
add auth to request (header/query/signature)

Idempotency

  • For unsafe methods (e.g., POST), support idempotency keys when the API allows, to safely retry.
  • Generate a stable key per logical operation; store it in a header or agreed field.
  • Avoid silent replays when idempotency is not supported (surface clear warnings).
PSEUDOCODE idempotency key:
key = hash(operationName + stableInputs)
set header "Idempotency-Key" = key
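A TypeScript sketch of the key derivation, sorting input entries so the hash is independent of property order (the helper name is illustrative):

```typescript
import { createHash } from "node:crypto";

function idempotencyKey(operationName: string, inputs: Record<string, unknown>): string {
  // Canonicalize: sorted key=value pairs make the key stable across call sites.
  const stable = Object.entries(inputs)
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([k, v]) => `${k}=${JSON.stringify(v)}`)
    .join("&");
  return createHash("sha256").update(`${operationName}:${stable}`).digest("hex");
}
```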

Webhooks and Async Jobs (if applicable)

  • Verify webhook signatures and timestamps; reject stale or invalid deliveries.
  • Support async job polling patterns (create → poll status → fetch result), with backoff.
  • De‑duplicate webhook events using delivery IDs or replay IDs.
PSEUDOCODE async job:
jobId = POST /jobs
REPEAT until done:
  status = GET /jobs/{jobId}
  IF status == done → break
  sleep(backoff)
result = GET /jobs/{jobId}/result
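The same pattern in TypeScript, with linear backoff and a poll cap; runAsyncJob and the three job functions are stand-ins for real endpoints:

```typescript
async function runAsyncJob<T>(
  createJob: () => Promise<string>,
  getStatus: (jobId: string) => Promise<"pending" | "done" | "failed">,
  getResult: (jobId: string) => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxPolls = 30,
): Promise<T> {
  const jobId = await createJob();                 // POST /jobs
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const status = await getStatus(jobId);         // GET /jobs/{jobId}
    if (status === "done") return getResult(jobId); // GET /jobs/{jobId}/result
    if (status === "failed") throw new Error(`job ${jobId} failed`);
    await sleep(Math.min(1000 * (attempt + 1), 10000)); // backoff between polls
  }
  throw new Error(`job ${jobId} did not finish within ${maxPolls} polls`);
}
```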

Best Practices

  • Connection Pooling: Reuse connections when possible
  • Request Deduplication: Prevent duplicate requests for the same resource
  • Caching: Respect cache headers (ETag, Last-Modified)
  • Compression: Support gzip/deflate compression
  • Logging: Structured logging with request IDs for tracing
  • Metrics: Track request count, latency, error rates
  • Graceful Shutdown: Complete in-flight requests before disconnecting
  • Resource Cleanup: Properly clean up timers, connections, and listeners

Observability

  • Logging: Structured logs with correlation requestId, redaction of secrets, and consistent fields.
  • Metrics: Counters (requests, errors, retries), distributions (latency, payload sizes), gauges (in‑flight, rate limits).
  • Tracing: Span per request with attributes for method, path, status, retryCount, rateLimit.

Security and Compliance

  • Redact secrets in logs, metrics, and errors.
  • Validate inputs and outputs; reject malformed data early.
  • Use TLS by default; support custom CA bundles and optional mTLS where required.
  • Clock‑skew aware signature validation when needed.
  • Respect data residency and minimization; avoid storing payloads unless explicitly enabled.

Versioning and Compatibility

  • Use the upstream/source version identifiers for organizing connector variants (e.g., v4, dates, API versions). SemVer is not required for registry entries.
  • Prefer backward‑compatible changes; document breaking changes clearly.
  • Feature flags or capability negotiation for optional features (e.g., streaming, webhooks).

Testing Requirements

Connectors must include:

  • Unit tests for all public methods
  • Integration tests with mock servers
  • Retry logic testing with various failure scenarios
  • Rate limit testing
  • Authentication flow testing
  • Error handling and recovery testing
  • Performance benchmarks

Conformance Checklist

  • Implements lifecycle: initialize, connect, disconnect, isConnected
  • Provides request primitives, optional stream/upload/download when applicable
  • Config supports baseUrl, timeouts, proxy/tls, auth, retry, rate limit, defaults, hooks
  • Retry with backoff + jitter, honors Retry‑After, has circuit breaker and retry budget
  • Hook pipeline before/after/error/retry; deterministic order and cancellation
  • Response wrapper with data/status/headers/meta including requestId and rateLimit
  • Structured errors with code/status/retryable/details and correlation id
  • Pagination supports cursor/offset/page/link‑header with pluggable extractors
  • Concurrency limits, cancellation, graceful shutdown
  • Observability: logs/metrics/traces with redaction
  • Security controls for credentials, TLS, validation, and redaction

Common Requirements

Analytical Connectors Common Specification

Purpose-built integrations that extract data from a source system and let you interact with it programmatically for further processing. They prioritize correctness, incremental delivery, and schema stability.

Data Model

  • Leverage analytical data modeling best practices
    • Extract raw data models with types and relationships
    • Extract events with a timestamp and a primary key
  • Don't over-process the data; just extract

Sync Semantics

  • Support both initial full sync and ongoing incremental syncs
  • Use a deterministic cursor (e.g., updated_at, event_timestamp, or CDC offset); CDC is preferred when available
  • Chunk and paginate reads; stream writes to avoid unbounded memory

Schema Evolution

  • Each version must have a deterministic schema that does not change. If you change the schema, create a new version.
  • Use stable, documented naming conventions (snake_case; UTC timestamps)
  • Emit clear migration notes when columns are added or semantics change

Data Quality and Deletes

  • Deduplicate data when the source doesn't guarantee uniqueness
  • Validate basic types and required fields; surface warnings and errors to the user

Performance and Limits

  • Respect source rate limits; use concurrency controls and adaptive backoff with jitter
  • Use incremental checkpoints after each page/batch so jobs can resume safely
  • Prefer server-side filtering and projection to minimize transfer size
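Checkpointed incremental reads can be sketched like this; every parameter is a stand-in for real storage and source APIs, and updated_at is one example of a deterministic cursor:

```typescript
async function incrementalSync<T extends { updated_at: string }>(
  loadCursor: () => Promise<string | undefined>,   // persisted checkpoint, if any
  saveCursor: (cursor: string) => Promise<void>,   // checkpoint store
  fetchSince: (cursor?: string) => Promise<T[]>,   // server-side filtered read
  write: (rows: T[]) => Promise<void>,             // streamed write to target
): Promise<number> {
  let cursor = await loadCursor(); // resume from the last checkpoint
  let total = 0;
  let batch = await fetchSince(cursor);
  while (batch.length > 0) {
    await write(batch);
    total += batch.length;
    cursor = batch[batch.length - 1].updated_at; // deterministic cursor
    await saveCursor(cursor);                    // checkpoint: safe to resume here
    batch = await fetchSince(cursor);
  }
  return total;
}
```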

Structure and Modules

  • Prefer resource-oriented modules over ETL-phase folders.
    • Organize code by API resource (e.g., /contacts, /companies), not extract/transform/load.
    • Each resource exposes thin CRUD-like operations and streaming helpers: list, get, streamAll, getAll.
    • Define a clear model per resource to capture fields and semantics.
    • Share cross-cutting utilities (pagination, request helpers) in a common lib module.

Observability

  • Logs: structured, with job/run IDs and page/batch numbers; never log secrets
  • Metrics: rows_read, rows_written, lag_seconds, duplicate_rows, retries, and duration_seconds
  • Optional tracing spans around request execution, pagination, and resource processing

Security

  • TLS by default; least-privilege access to sources and targets
  • PII handling: configurable field redaction/masking; scrub sensitive data from logs and metrics

Documentation

  • List covered entities and their cursors, limitations/quotas, and expected sync cadences
  • Provide example schemas, sample queries, and recovery steps for common failures
  • Documentation should have at least:
    • A getting started page (/getting-started)
    • A configuration page (/configuration)
    • A schema overview page (/schema-overview)
    • A limits page (/limits; if no limits are known, clearly state that)
    • A changelog page (/changelog)
    • An FAQ page (/faq)

Developer Experience and Local Testing

  • Provide convenient local scripts/CLI to exercise core operations without additional setup beyond environment variables.
    • Include an .env.example listing all required variables; do not hard-code secrets.
    • Offer npm scripts (or Python equivalents) for: auth check, list, get, streamAll/getAll, initial and incremental sync, and (if applicable) webhook signature verification with sample payloads.
    • Support JSON output for easy piping into tools (jq) and deterministic exit codes (0 success, non-zero on failure).
    • Accept configuration via env vars and flags; default to non-interactive execution suitable for CI.