
Connector Specifications

Choose the specification that matches your connector type. Each specification provides detailed guidelines and requirements for building robust, production-ready connectors.

API Connector Specification

This specification defines the requirements for implementing a robust, production‑ready API connector. The connector must be language‑agnostic. Any illustrative snippets must be treated as pseudocode, not tied to a specific language or framework.

Scope and Principles

  • Language‑agnostic: The spec describes behaviors, contracts, and data shapes, not language constructs.
  • Separation of concerns: Request execution, authentication, retries, rate limits, and pagination are composable, swappable modules.
  • Deterministic, observable, testable: Deterministic defaults, structured logs/metrics/traces, and clear test surfaces.
  • Secure by default: Credentials are redacted, transport is encrypted where applicable, and inputs/outputs are validated.
  • Resilient: Backoff with jitter, circuit breaking, idempotency, and graceful degradation built in.
  • Extensible: Hooks/middleware enable customization without forking core.

Core Modules and Methods

Every API connector must implement the following core functionality and structure:

Resource Abstraction

  • Organize code by API resources rather than ETL stages.
  • Canonical layout for resources:
    • Single-file per resource under src/resources/{resource}.ts.
    • Barrel export at src/resources/index.ts that re-exports per-resource factories.
    • Each resource module must expose:
      • createResource(send) factory that binds a base path (e.g., /{resource}) and returns a CRUD surface
      • A minimal, consistent operation: getAll(params) that returns an async generator of arrays (pages)
      • Optional operations based on upstream capability: getById(id) (or get), and mutation methods when applicable
      • A typed Model describing the item shape for the resource
  • Cross-cutting helpers should live under src/lib:
    • paginate iterator supporting cursor pagination (and extensible for other strategies)
    • make-resource (or equivalent) to build the CRUD surface with pagination
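This layout can be sketched in TypeScript for a hypothetical contacts resource; the send signature, the Contact model, and its fields are illustrative assumptions, not part of the spec:

```typescript
// Generic request function injected by the connector core (shape assumed).
type Send = (options: {
  method: string;
  path: string;
  query?: Record<string, unknown>;
  body?: unknown;
}) => Promise<{ data: unknown; status: number }>;

// Typed Model describing the item shape for this resource (fields invented).
interface Contact {
  id: string;
  email: string;
}

// createResource-style factory: binds the base path and returns a CRUD surface.
function createContacts(send: Send) {
  const base = "/contacts";
  return {
    // getAll yields pages (arrays) of items as an async generator.
    async *getAll(params: Record<string, unknown> = {}): AsyncGenerator<Contact[]> {
      const res = await send({ method: "GET", path: base, query: params });
      yield res.data as Contact[];
    },
    // Optional operation, present because the upstream supports it.
    async getById(id: string): Promise<Contact> {
      const res = await send({ method: "GET", path: `${base}/${id}` });
      return res.data as Contact;
    },
  };
}
```

The barrel file at src/resources/index.ts would then re-export createContacts alongside the other per-resource factories.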

Initialization and Lifecycle

  • initialize(configuration)
    Sets up the connector with the provided configuration. Should validate the configuration and prepare any internal state.

  • connect()
    Establishes connection to the API service. May include authentication, session creation, or connection pooling.

  • disconnect()
    Gracefully closes the connection and cleans up resources. Should complete any pending requests before disconnecting.

  • isConnected()
    Returns true if the connector is currently connected and ready to make requests, false otherwise.
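One possible TypeScript shape for this lifecycle, with a minimal in-memory implementation; the config fields shown are a subset, and the example class is purely illustrative:

```typescript
interface ConnectorConfig {
  baseUrl: string;
  timeout?: number;
}

interface ConnectorLifecycle {
  initialize(configuration: ConnectorConfig): void; // validate config, prepare state
  connect(): Promise<void>;                         // auth / session / pooling
  disconnect(): Promise<void>;                      // drain pending work, then close
  isConnected(): boolean;
}

// Minimal in-memory implementation for illustration only.
class ExampleConnector implements ConnectorLifecycle {
  private connected = false;
  private config?: ConnectorConfig;

  initialize(configuration: ConnectorConfig): void {
    if (!configuration.baseUrl) throw new Error("baseUrl is required");
    // Apply defaults on top of validated input.
    this.config = { timeout: 30000, ...configuration };
  }
  async connect(): Promise<void> {
    this.connected = true;
  }
  async disconnect(): Promise<void> {
    this.connected = false;
  }
  isConnected(): boolean {
    return this.connected;
  }
}
```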

Request Methods

  • request(options)
    Core method for making HTTP requests. All other HTTP methods should internally use this method.
    Options should include: method, path, headers, query parameters, body, timeout, and any method-specific settings.

  • get(path, options)
    Performs an HTTP GET request to the specified path.

  • Optional sugar methods (post, put, patch, delete) may be provided for ergonomics.
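A sketch of the request primitive in TypeScript, assuming an injected transport function; the option names follow this section, everything else is illustrative:

```typescript
interface RequestOptions {
  method?: string;
  path: string;
  headers?: Record<string, string>;
  query?: Record<string, string>;
  body?: unknown;
  timeout?: number;
}

type Transport = (options: RequestOptions) => Promise<{ status: number; data: unknown }>;

function makeClient(transport: Transport) {
  // Core method: all other HTTP methods delegate to this one.
  async function request(options: RequestOptions) {
    return transport({ method: "GET", ...options });
  }
  return {
    request,
    // Sugar methods are thin wrappers over the core request method.
    get: (path: string, options: Omit<RequestOptions, "method" | "path"> = {}) =>
      request({ ...options, method: "GET", path }),
    post: (path: string, body: unknown, options: Omit<RequestOptions, "method" | "path" | "body"> = {}) =>
      request({ ...options, method: "POST", path, body }),
  };
}
```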

Advanced Operations

  • batch(requests)
    Executes multiple requests in a single operation where supported by the API. Should handle partial failures gracefully.

  • paginate(options)
    Returns an iterator that automatically handles pagination, fetching subsequent pages as needed. Should support different pagination strategies. Resource modules should consume this via the shared lib.

Optional Operations (if applicable)

  • stream(options)
    Reads streaming responses (e.g., chunked, SSE) with backpressure and cancellation.

Configuration Structure

The connector configuration should support the following settings:

Base Configuration

  • baseUrl - The base URL for all API requests
  • timeout - Request timeout in milliseconds (default: 30000)
  • userAgent - Identifier for outbound requests (include app version/commit when available)
  • proxy - Optional proxy configuration (host, port, protocol, credentials)
  • tls - TLS options (verify, min version, CA bundle, mTLS certificates) where applicable
  • pooling - Connection pooling/keep‑alive settings

Authentication Configuration

Support for multiple authentication types:

  • type - One of: api_key, bearer, basic, oauth2, or custom
  • credentials - Authentication credentials specific to the chosen type

Retry Configuration

  • maxAttempts - Maximum number of retry attempts (default: 3)
  • initialDelay - Initial retry delay in milliseconds (default: 1000)
  • maxDelay - Maximum retry delay in milliseconds (default: 30000)
  • backoffMultiplier - Multiplier for exponential backoff (default: 2)
  • retryableStatusCodes - HTTP status codes that trigger retries (default: [429, 500, 502, 503, 504])
  • retryableErrors - Error types/codes that should trigger retries
  • retryBudgetMs - Hard cap on total time spent retrying a single logical operation
  • respectRetryAfter - Whether to honor server Retry‑After hints (default: true)
  • idempotency - Enable idempotency key strategy for unsafe methods (default: enabled)
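The defaults above can be captured as a typed constant; this TypeScript sketch is one possible shape, not a required interface:

```typescript
interface RetryConfig {
  maxAttempts: number;
  initialDelay: number;         // milliseconds
  maxDelay: number;             // milliseconds
  backoffMultiplier: number;
  retryableStatusCodes: number[];
  respectRetryAfter: boolean;
  retryBudgetMs?: number;       // optional hard cap per logical operation
}

// Defaults taken from the list above.
const defaultRetryConfig: RetryConfig = {
  maxAttempts: 3,
  initialDelay: 1000,
  maxDelay: 30000,
  backoffMultiplier: 2,
  retryableStatusCodes: [429, 500, 502, 503, 504],
  respectRetryAfter: true,
};
```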

Rate Limiting Configuration

  • requestsPerSecond - Maximum requests per second
  • requestsPerMinute - Maximum requests per minute
  • requestsPerHour - Maximum requests per hour
  • concurrentRequests - Maximum concurrent requests (default: 10)
  • burstCapacity - Allowed burst above steady rate (token bucket)
  • adaptiveFromHeaders - Update limits from response headers when available (default: true)

Default Settings

  • defaultHeaders - Headers to include with every request
  • defaultQueryParams - Query parameters to include with every request

Hooks Configuration

Arrays of hooks to execute at different stages:

  • beforeRequest - Executed before sending a request
  • afterResponse - Executed after receiving a response
  • onError - Executed when an error occurs
  • onRetry - Executed before retrying a request

Canonical Hook Event Semantics

  • Hooks must accept a discriminated union context with type in { beforeRequest, afterResponse, onError, onRetry }.
  • beforeRequest emits http_request; afterResponse emits http_response.
  • Optional fields controlled by logging options:
    • includeQueryParams → query present on both events when parseable
    • includeHeaders → headers present on request and response events
    • includeBody → response body present on http_response; request body present when relevant
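The discriminated union might look like this in TypeScript; only the type discriminants come from the spec, the payload shapes are assumptions:

```typescript
type HookContext =
  | { type: "beforeRequest"; request: { method: string; path: string } }
  | { type: "afterResponse"; request: { method: string; path: string }; response: { status: number } }
  | { type: "onError"; error: { code: string; message: string } }
  | { type: "onRetry"; error: { code: string }; attemptNumber: number };

// Narrowing on `type` gives each branch access to its own payload.
function describeEvent(ctx: HookContext): string {
  switch (ctx.type) {
    case "beforeRequest":
      return `http_request ${ctx.request.method} ${ctx.request.path}`;
    case "afterResponse":
      return `http_response ${ctx.response.status}`;
    case "onError":
      return `error ${ctx.error.code}`;
    case "onRetry":
      return `retry #${ctx.attemptNumber}`;
  }
}
```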

Retry Mechanism

The connector must implement a robust retry strategy with the following requirements:

Retry Strategy Methods

  • shouldRetry(error, attemptNumber)
    Determines whether a request should be retried based on the error and current attempt count.

  • calculateDelay(attemptNumber)
    Calculates the delay before the next retry attempt.

  • onRetry(error, attemptNumber)
    Hook called before each retry attempt for logging or state updates.

Implementation Requirements

  1. Exponential Backoff
    Calculate delay as: minimum(initialDelay × (backoffMultiplier ^ attemptNumber), maxDelay)

  2. Jitter
    Add randomization to prevent thundering herd: actualDelay = delay × (0.5 + random(0 to 0.5))

  3. Respect Server Hints
    Honor "Retry-After" headers when present

  4. Circuit Breaker
    Implement circuit breaker pattern to prevent cascading failures

  5. Retry Budget
    Abort retries once the per‑operation retry budget is exhausted, even if maxAttempts has not been reached.
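Requirements 1–3 can be combined into a single delay calculation, sketched here in TypeScript with an injectable random source so the jitter can be pinned in tests:

```typescript
function calculateDelay(
  attemptNumber: number,
  opts: { initialDelay: number; backoffMultiplier: number; maxDelay: number; retryAfterMs?: number },
  random: () => number = Math.random,
): number {
  // 3. Server hints win over computed backoff.
  if (opts.retryAfterMs !== undefined) return opts.retryAfterMs;
  // 1. Exponential backoff, capped at maxDelay.
  const base = Math.min(opts.initialDelay * opts.backoffMultiplier ** attemptNumber, opts.maxDelay);
  // 2. Jitter: scale into [0.5, 1.0) of the base delay.
  return base * (0.5 + random() * 0.5);
}
```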

Hook System

Hooks provide extension points for customizing connector behavior without modifying core logic:

Hook Structure

  • name - Unique identifier for the hook
  • priority - Execution order (lower numbers execute first)
  • execute(context) - The hook's main function

Hook Context

Each hook receives a context object containing:

  • type - The hook type: beforeRequest, afterResponse, onError, or onRetry
  • request - The request options (when applicable)
  • response - The response object (when applicable)
  • error - The error object (when applicable)
  • metadata - Additional context data

Context Methods

  • modifyRequest(updates) - Modify the outgoing request
  • modifyResponse(updates) - Modify the incoming response
  • abort(reason) - Cancel the request with a reason

Middleware Pipeline (conceptual)

Hooks/middleware execute in a well‑defined order around the core request execution:

PSEUDOCODE pipeline:
1. Build request (defaults → per‑call options → auth → user hooks)
2. Rate limiter: waitForSlot()
3. beforeRequest hooks (ordered by priority)
4. Execute (with timeout + cancellation token)
5. afterResponse hooks (transform/validate)
6. onError hooks (map/enrich), possibly shouldRetry → backoff
7. Metrics/logging at each stage

Common Hook Use Cases

  • Adding authentication headers
  • Request/response logging
  • Metrics collection
  • Request signing
  • Response transformation
  • Error enrichment

Type and Data Model Management

Response Structure

All responses should be wrapped in a consistent structure containing:

  • data - The actual response payload
  • status - HTTP status code
  • headers - Response headers as key-value pairs
  • meta - Optional metadata including:
    • timestamp - When the response was received
    • duration - Request duration in milliseconds
    • retryCount - Number of retry attempts made
    • rateLimit - Current rate limit status
    • requestId - Correlation identifier echoed by server or generated by client
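A possible TypeScript wrapper type plus a helper that fills in meta; the field names follow this section, the helper itself is illustrative:

```typescript
interface ConnectorResponse<T> {
  data: T;
  status: number;
  headers: Record<string, string>;
  meta?: {
    timestamp: string;   // when the response was received (ISO 8601)
    duration: number;    // request duration in milliseconds
    retryCount: number;  // retry attempts made
    requestId?: string;  // echoed by server or generated by client
  };
}

function wrapResponse<T>(
  data: T,
  status: number,
  headers: Record<string, string>,
  startedAt: number,
  retryCount = 0,
): ConnectorResponse<T> {
  return {
    data,
    status,
    headers,
    meta: {
      timestamp: new Date().toISOString(),
      duration: Date.now() - startedAt,
      retryCount,
      requestId: headers["x-request-id"],
    },
  };
}
```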

Data Transformation

The connector should provide methods for data transformation:

  • deserialize(data, schema)
    Transform API response data into internal application models

  • serialize(data, schema)
    Transform internal models into API-compatible format

  • validate(data, schema)
    Validate data against a schema definition

Schema Definition

Schemas should support:

  • type - Data type: object, array, string, number, or boolean
  • properties - For objects, defines nested properties
  • items - For arrays, defines the schema of array elements
  • required - List of required property names
  • format - Specific format constraints (e.g., date-time, email, uri)
  • transform - Custom transformation function

Type Safety and API Contracts

  • Strong typing is required wherever the implementation language supports it (e.g., TypeScript). Public APIs must not be untyped or use any.
  • Prefer named types, generics with constraints, discriminated unions, and exact object shapes over loose records.

OpenAPI Integration

  • Use a single generator: hey-api for TypeScript.
  • Canonical input path: schemas/raw/files/openapi.json.
  • Canonical output directory: src/generated.
  • Resource files import generated types directly to avoid drift.

Analytical Mode

Provide analytics‑friendly, single‑level objects while preserving arrays.

  • Deterministic pipeline
    • Raw types: Generated by hey-api into src/generated.
    • Flat types: Generated by a codegen step into src/generated.
    • Config: schemas/flatten.config.json controls delimiter, depth, per‑field aliases, and skips.
  • Naming
    • Keep Raw model names unchanged (e.g., Foo).
    • Emit sibling Flat models with Flat suffix (e.g., FooFlat).
    • Emit mappers: mapFooToFooFlat(raw: Foo): FooFlat in src/generated.
  • Flattening rules
    • Flatten nested object properties to first level using a delimiter (default: _).
    • Arrays remain arrays.
      • Arrays of primitives stay the same.
      • Arrays of objects become arrays of flattened element objects (shape flattened, array preserved).
    • Top‑level primitives remain unchanged.
    • Field name collisions must be resolved deterministically via config (alias/skip). Auto‑suffixing is allowed only as a fallback.
    • Depth is unlimited by default; may be bounded via config.
  • Hook integration
    • Use an afterResponse hook to transform returned data from Raw → Flat
    • Behavior: if response.data is an array of objects, map each element; if a single object, map once; otherwise pass through.
    • If no mapper is registered for the operation, pass through Raw unchanged.
    • A similar approach can be used for other analytical needs (e.g., nullability)
  • Resource API
    • Public resource methods should return Flat types.
    • Optionally expose Raw variants (e.g., getAllRaw) when needed for low‑level use.
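A hand-written illustration of the generated Raw/Flat pair and mapper; Foo, FooFlat, and mapFooToFooFlat follow the spec's naming convention, while the fields themselves are invented:

```typescript
// Raw model as generated from the OpenAPI schema (fields invented).
interface Foo {
  id: string;
  owner: { name: string; address: { city: string } };
  tags: string[];
}

// Flat sibling: nested objects flatten to first level with the default "_"
// delimiter; arrays are preserved as arrays.
interface FooFlat {
  id: string;
  owner_name: string;
  owner_address_city: string;
  tags: string[];
}

function mapFooToFooFlat(raw: Foo): FooFlat {
  return {
    id: raw.id,                               // top-level primitive unchanged
    owner_name: raw.owner.name,               // nested object flattened
    owner_address_city: raw.owner.address.city, // unlimited depth by default
    tags: raw.tags,                           // array of primitives unchanged
  };
}
```

In practice both types and the mapper would be emitted into src/generated by the codegen step rather than written by hand.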

Error Handling

Error Structure

All connector errors should include:

  • message - Human-readable error description
  • code - Machine-readable error code
  • statusCode - HTTP status code (if applicable)
  • details - Additional error context or data
  • retryable - Boolean indicating if the request can be retried
  • requestId - Correlation identifier if available
  • source - Subsystem where the error occurred (transport, auth, rateLimit, deserialize, userHook, unknown)

Standard Error Codes

Connectors should use these standardized error codes:

  • NETWORK_ERROR - Network connectivity issues
  • TIMEOUT - Request exceeded timeout limit
  • AUTH_FAILED - Authentication or authorization failure
  • RATE_LIMIT - Rate limit exceeded
  • INVALID_REQUEST - Malformed or invalid request
  • SERVER_ERROR - Server-side error (5xx status codes)
  • PARSING_ERROR - Failed to parse response
  • VALIDATION_ERROR - Data validation failed
  • CANCELLED - Request was cancelled by caller
  • UNSUPPORTED - Operation not supported by target API

Error Handling Best Practices

  • Preserve original error information for debugging
  • Provide actionable error messages
  • Include request context in error details
  • Differentiate between retryable and non-retryable errors
  • Log errors with appropriate severity levels
PSEUDOCODE error enrichment:
IF transport error THEN code = NETWORK_ERROR, retryable = true
ELSE IF status in [408, 425, 429, 5xx] THEN retryable = true
ELSE retryable = false
Attach requestId, endpoint, method, attemptNumber, duration
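The enrichment rules above, sketched as a TypeScript classifier; the function name and messages are illustrative:

```typescript
interface ConnectorError {
  message: string;
  code: string;
  statusCode?: number;
  retryable: boolean;
}

function classifyError(statusCode: number | undefined, transportFailure: boolean): ConnectorError {
  // Transport-level failures are always retryable network errors.
  if (transportFailure) {
    return { message: "network failure", code: "NETWORK_ERROR", retryable: true };
  }
  // 408/425/429 and all 5xx responses are retryable.
  const retryable =
    statusCode !== undefined && ([408, 425, 429].includes(statusCode) || statusCode >= 500);
  const code =
    statusCode === 429 ? "RATE_LIMIT"
    : statusCode !== undefined && statusCode >= 500 ? "SERVER_ERROR"
    : "INVALID_REQUEST";
  return { message: `request failed with status ${statusCode}`, code, statusCode, retryable };
}
```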

Pagination Support

Pagination Configuration

The paginate method should accept options including:

  • pageSize - Number of items per page
  • startCursor - Initial cursor for cursor-based pagination
  • startPage - Initial page number for page-based pagination
  • strategy - Pagination type: cursor, offset, page, or link-header
  • params - Strategy‑specific parameter names (e.g., pageParam, perPageParam, cursorParam, offsetParam, limitParam)

Custom Extraction Functions

Allow customization of pagination logic through:

  • extractNextCursor(response) - Extract the next page cursor from response
  • extractItems(response) - Extract items array from response
  • hasNextPage(response) - Determine if more pages exist

Pagination Implementation

The paginate method should:

  1. Return an iterator for memory-efficient processing
  2. Automatically fetch subsequent pages as needed
  3. Handle different pagination strategies transparently
  4. Yield arrays of items for each page
  5. Stop when no more pages are available
PSEUDOCODE for paginate method:
1. Initialize cursor/page from options
2. Set hasMore = true
3. WHILE hasMore:
   a. Make request with current cursor/page
   b. Extract items from response
   c. Yield items to caller
   d. Extract next cursor/page
   e. Check if more pages exist
   f. Update hasMore flag
4. End iteration when no more pages
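The loop above, sketched for the cursor strategy as a TypeScript async generator; fetchPage and the Page shape are stand-ins for the connector's request layer:

```typescript
interface Page<T> {
  items: T[];
  nextCursor?: string;
}

async function* paginate<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
  startCursor?: string,
): AsyncGenerator<T[]> {
  let cursor = startCursor;          // 1. initialize cursor from options
  let hasMore = true;                // 2.
  while (hasMore) {                  // 3.
    const page = await fetchPage(cursor); // a. request with current cursor
    yield page.items;                // c. yield this page's items
    cursor = page.nextCursor;        // d. advance the cursor
    hasMore = cursor !== undefined;  // e/f. stop when exhausted
  }
}                                    // 4. iteration ends naturally
```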

Pragmatic Defaults and Starter Pattern

Many vendor APIs either return full lists or have inconsistent pagination. A productive default is:

  • Provide getAll(params) per resource that:
    • Performs a single GET and yields client‑side chunks using pageSize, with maxItems to cap total items
    • Supports buildListQuery(params) to map typed filters to query
  • When real pagination is required, add a paginate helper and implement getAll on top to keep a consistent surface.
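The single-GET-plus-chunking default can be sketched as follows; fetchList stands in for the one upstream call, and pageSize/maxItems match the options above:

```typescript
async function* getAllChunked<T>(
  fetchList: () => Promise<T[]>,
  pageSize = 100,
  maxItems?: number,
): AsyncGenerator<T[]> {
  // One GET up front, then client-side chunking.
  let items = await fetchList();
  if (maxItems !== undefined) items = items.slice(0, maxItems); // cap total items
  for (let i = 0; i < items.length; i += pageSize) {
    yield items.slice(i, i + pageSize);
  }
}
```

Keeping the generator signature identical to a real paginate-backed getAll means callers don't change when genuine pagination is added later.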

Observability should be on by default at sensible levels. Logging options should include:

  • includeQueryParams – include parsed query params in request/response URL logs
  • includeHeaders – include request and response headers
  • includeBody – include response body (and request body when relevant)

Always redact secrets.

Concurrency, Cancellation, and Timeouts

  • Cancellation token: All operations accept a caller‑provided token to cancel in‑flight work.
  • Per‑call timeout: Enforced at the transport layer; must trigger cancellation and error with TIMEOUT.
  • Global shutdown: The connector supports graceful shutdown, draining in‑flight requests.
  • Max concurrency: Enforced independent of rate limits; bounded work queue to avoid unbounded memory growth.
PSEUDOCODE request with cancellation and timeout:
1. IF !canProceed() THEN waitForSlot()
2. START timer(timeout)
3. TRY execute
4. IF cancelled OR timer expired → abort transport → raise TIMEOUT/CANCELLED
5. ALWAYS release slot

Streaming and Large Payloads

  • Support reading streaming responses (SSE/chunked) with backpressure.
  • Support large uploads/downloads with chunking, multi‑part, or resumable mechanisms when available.
  • Apply checksum/ETag validation when provided by the server.
  • Surface progress events via hooks or callbacks where relevant.
PSEUDOCODE streaming read:
open stream
FOR EACH chunk IN stream:
  emit chunk to caller
ON error → map to NETWORK_ERROR (retryable if partial/transient)

Rate Limiting

Rate Limiter Methods

The rate limiter should implement:

  • canProceed()
    Returns true if a request can be made immediately without exceeding rate limits

  • waitForSlot()
    Blocks/waits until a request slot becomes available

  • updateFromResponse(headers)
    Updates rate limit state based on response headers (e.g., X-RateLimit-Remaining)

  • getStatus()
    Returns current rate limit status information

Rate Limit Status

Status information should include:

  • limit - Maximum requests allowed in the window
  • remaining - Requests remaining in current window
  • reset - Timestamp when the limit resets
  • retryAfter - Seconds to wait before retrying (if provided)

Implementation Strategies

  • Token Bucket - Smooth rate limiting with burst capacity
  • Sliding Window - Precise rate limiting over time windows
  • Fixed Window - Simple reset at specific intervals
  • Adaptive - Adjust based on server feedback
PSEUDOCODE adaptive update:
IF headers contain rate-limit info THEN update limiter state
IF Retry-After present THEN sleep per hint
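A minimal token bucket illustrating canProceed with burst capacity; the clock is injectable so refill can be tested deterministically:

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly ratePerSecond: number,
    private readonly burstCapacity: number,
    private readonly now: () => number = Date.now,
  ) {
    this.tokens = burstCapacity; // start full: burst allowed immediately
    this.lastRefill = now();
  }

  private refill(): void {
    const elapsedSeconds = (this.now() - this.lastRefill) / 1000;
    this.tokens = Math.min(this.burstCapacity, this.tokens + elapsedSeconds * this.ratePerSecond);
    this.lastRefill = this.now();
  }

  // Consumes a token if one is available; otherwise the caller should wait.
  canProceed(): boolean {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A full limiter would add waitForSlot, updateFromResponse, and getStatus on top of this core.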

Authentication Strategies

Authentication Methods

Each authentication strategy should implement:

  • authenticate(request)
    Apply authentication credentials to the outgoing request

  • refresh()
    Refresh expired credentials (optional, for token-based auth)

  • isValid()
    Check if current authentication credentials are still valid

Required Authentication Types

  • API Key
    Support for API keys in headers, query parameters, or custom locations

  • Bearer Token
    JWT or opaque tokens with optional refresh mechanism

  • Basic Authentication
    Username and password encoded in Authorization header

  • OAuth 2.0
    Full OAuth flow with token refresh support

  • Custom Authentication
    Signature-based auth, HMAC, or other custom schemes

Authentication Best Practices

  • Store credentials securely (never in plain text)
  • Implement automatic token refresh before expiration
  • Handle authentication failures gracefully
  • Support multiple authentication methods per connector
  • Allow authentication method switching at runtime
PSEUDOCODE auth application:
credentials = load from secure store
IF credentials expiring → refresh()
add auth to request (header/query/signature)

Idempotency

  • For unsafe methods (e.g., POST), support idempotency keys when the API allows, to safely retry.
  • Generate a stable key per logical operation; store it in a header or agreed field.
  • Avoid silent replays when idempotency is not supported (surface clear warnings).
PSEUDOCODE idempotency key:
key = hash(operationName + stableInputs)
set header "Idempotency-Key" = key
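A TypeScript sketch of the key derivation, sorting input entries so the hash is independent of property order (the helper name is illustrative):

```typescript
import { createHash } from "node:crypto";

function idempotencyKey(operationName: string, inputs: Record<string, unknown>): string {
  // Canonicalize: sorted key=value pairs make the key stable across call sites.
  const stable = Object.entries(inputs)
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([k, v]) => `${k}=${JSON.stringify(v)}`)
    .join("&");
  return createHash("sha256").update(`${operationName}:${stable}`).digest("hex");
}
```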

Webhooks and Async Jobs (if applicable)

  • Verify webhook signatures and timestamps; reject stale or invalid deliveries.
  • Support async job polling patterns (create → poll status → fetch result), with backoff.
  • De‑duplicate webhook events using delivery IDs or replay IDs.
PSEUDOCODE async job:
jobId = POST /jobs
REPEAT until done:
  status = GET /jobs/{jobId}
  IF status == done → break
  sleep(backoff)
result = GET /jobs/{jobId}/result
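The same pattern in TypeScript, with linear backoff and a poll cap; runAsyncJob and the three job functions are stand-ins for real endpoints:

```typescript
async function runAsyncJob<T>(
  createJob: () => Promise<string>,
  getStatus: (jobId: string) => Promise<"pending" | "done" | "failed">,
  getResult: (jobId: string) => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxPolls = 30,
): Promise<T> {
  const jobId = await createJob();                 // POST /jobs
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const status = await getStatus(jobId);         // GET /jobs/{jobId}
    if (status === "done") return getResult(jobId); // GET /jobs/{jobId}/result
    if (status === "failed") throw new Error(`job ${jobId} failed`);
    await sleep(Math.min(1000 * (attempt + 1), 10000)); // backoff between polls
  }
  throw new Error(`job ${jobId} did not finish within ${maxPolls} polls`);
}
```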

Best Practices

  • Connection Pooling: Reuse connections when possible
  • Request Deduplication: Prevent duplicate requests for the same resource
  • Caching: Respect cache headers (ETag, Last-Modified)
  • Compression: Support gzip/deflate compression
  • Logging: Structured logging with request IDs for tracing
  • Metrics: Track request count, latency, error rates
  • Graceful Shutdown: Complete in-flight requests before disconnecting
  • Resource Cleanup: Properly clean up timers, connections, and listeners

Observability

  • Logging: Structured logs with correlation requestId, redaction of secrets, and consistent fields.
  • Metrics: Counters (requests, errors, retries), distributions (latency, payload sizes), gauges (in‑flight, rate limits).
  • Tracing: Span per request with attributes for method, path, status, retryCount, rateLimit.

Security and Compliance

  • Redact secrets in logs, metrics, and errors.
  • Validate inputs and outputs; reject malformed data early.
  • Use TLS by default; support custom CA bundles and optional mTLS where required.
  • Clock‑skew aware signature validation when needed.
  • Respect data residency and minimization; avoid storing payloads unless explicitly enabled.

Versioning and Compatibility

  • Use the upstream/source version identifiers for organizing connector variants (e.g., v4, dates, API versions). SemVer is not required for registry entries.
  • Prefer backward‑compatible changes; document breaking changes clearly.
  • Feature flags or capability negotiation for optional features (e.g., streaming, webhooks).

Testing Requirements

Connectors must include:

  • Unit tests for all public methods
  • Integration tests with mock servers
  • Retry logic testing with various failure scenarios
  • Rate limit testing
  • Authentication flow testing
  • Error handling and recovery testing
  • Performance benchmarks

Conformance Checklist

  • Implements lifecycle: initialize, connect, disconnect, isConnected
  • Provides request primitives, optional stream/upload/download when applicable
  • Config supports baseUrl, timeouts, proxy/tls, auth, retry, rate limit, defaults, hooks
  • Retry with backoff + jitter, honors Retry‑After, has circuit breaker and retry budget
  • Hook pipeline before/after/error/retry; deterministic order and cancellation
  • Response wrapper with data/status/headers/meta including requestId and rateLimit
  • Structured errors with code/status/retryable/details and correlation id
  • Pagination supports cursor/offset/page/link‑header with pluggable extractors
  • Concurrency limits, cancellation, graceful shutdown
  • Observability: logs/metrics/traces with redaction
  • Security controls for credentials, TLS, validation, and redaction

Common Requirements

Analytical Connectors Common Specification

Purpose-built integrations that extract data from a source system and let you interact with it programmatically for further processing. They prioritize correctness, incremental delivery, and schema stability.

Data Model

  • Leverage analytical data modeling best practices
    • Extract raw data models with types and relationships
    • Extract events with a timestamp and a primary key
  • Don't over-process the data; just extract

Sync Semantics

  • Support both initial full sync and ongoing incremental syncs
  • Use a deterministic cursor (e.g., updated_at, event_timestamp, or CDC offset); CDC is preferred when available
  • Chunk and paginate reads; stream writes to avoid unbounded memory

Schema Evolution

  • Each version must have a deterministic schema that does not change. If you change the schema, create a new version.
  • Use stable, documented naming conventions (snake_case; UTC timestamps)
  • Emit clear migration notes when columns are added or semantics change

Data Quality and Deletes

  • Deduplicate data when the source doesn't guarantee uniqueness
  • Validate basic types and required fields; surface warnings and errors to the user

Performance and Limits

  • Respect source rate limits; use concurrency controls and adaptive backoff with jitter
  • Use incremental checkpoints after each page/batch so jobs can resume safely
  • Prefer server-side filtering and projection to minimize transfer size
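Checkpointed incremental reads can be sketched like this; every parameter is a stand-in for real storage and source APIs, and updated_at is one example of a deterministic cursor:

```typescript
async function incrementalSync<T extends { updated_at: string }>(
  loadCursor: () => Promise<string | undefined>,   // persisted checkpoint, if any
  saveCursor: (cursor: string) => Promise<void>,   // checkpoint store
  fetchSince: (cursor?: string) => Promise<T[]>,   // server-side filtered read
  write: (rows: T[]) => Promise<void>,             // streamed write to target
): Promise<number> {
  let cursor = await loadCursor(); // resume from the last checkpoint
  let total = 0;
  let batch = await fetchSince(cursor);
  while (batch.length > 0) {
    await write(batch);
    total += batch.length;
    cursor = batch[batch.length - 1].updated_at; // deterministic cursor
    await saveCursor(cursor);                    // checkpoint: safe to resume here
    batch = await fetchSince(cursor);
  }
  return total;
}
```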

Structure and Modules

  • Prefer resource-oriented modules over ETL-phase folders.
    • Organize code by API resource (e.g., /contacts, /companies), not extract/transform/load.
    • Each resource exposes thin CRUD-like operations and streaming helpers: list, get, streamAll, getAll.
    • Define a clear model per resource to capture fields and semantics.
    • Share cross-cutting utilities (pagination, request helpers) in a common lib module.

Observability

  • Logs: structured, with job/run IDs and page/batch numbers; never log secrets
  • Metrics: rows_read, rows_written, lag_seconds, duplicate_rows, retries, and duration_seconds
  • Optional tracing spans around request execution, pagination, and resource processing

Security

  • TLS by default; least-privilege access to sources and targets
  • PII handling: configurable field redaction/masking; scrub sensitive data from logs and metrics

Documentation

  • List covered entities and their cursors, limitations/quotas, and expected sync cadences
  • Provide example schemas, sample queries, and recovery steps for common failures
  • Documentation should have at least:
    • A getting started page (/getting-started)
    • A configuration page (/configuration)
    • A schema overview page (/schema-overview)
    • A limits page (/limits; if no limits are known, clearly state that)
    • A changelog page (/changelog)
    • An FAQ page (/faq)

Developer Experience and Local Testing

  • Provide convenient local scripts/CLI to exercise core operations without additional setup beyond environment variables.
    • Include an .env.example listing all required variables; do not hard-code secrets.
    • Offer npm scripts (or Python equivalents) for: auth check, list, get, streamAll/getAll, initial and incremental sync, and (if applicable) webhook signature verification with sample payloads.
    • Support JSON output for easy piping into tools (jq) and deterministic exit codes (0 success, non-zero on failure).
    • Accept configuration via env vars and flags; default to non-interactive execution suitable for CI.