Pipeline Lineage
Moose Lineage Manifest
The lineage manifest is a static description of a pipeline's systems and data flow. It lives under the implementation directory at moose/lineage.manifest.json (preferred) or lineage/manifest.json.
For full details on entity and relationship types, see the sections below.
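The lookup order described above (preferred path first, then the fallback) can be sketched as a small resolver. This is a minimal illustration, not the project's loader: the function name `resolveManifestPath` and the injected `exists` predicate are hypothetical, chosen so the sketch has no filesystem dependency.

```typescript
// Candidate manifest locations, in the order of preference stated above.
const MANIFEST_CANDIDATES = [
  "moose/lineage.manifest.json", // preferred
  "lineage/manifest.json", // fallback
] as const;

// Hypothetical helper: returns the first candidate that exists under
// implDir, or undefined when neither file is present.
// `exists` is injected by the caller (an assumption of this sketch).
function resolveManifestPath(
  implDir: string,
  exists: (path: string) => boolean,
): string | undefined {
  // Drop trailing slashes so the joined path has exactly one separator.
  const base = implDir.replace(/\/+$/, "");
  for (const candidate of MANIFEST_CANDIDATES) {
    const full = `${base}/${candidate}`;
    if (exists(full)) return full;
  }
  return undefined;
}
```

In a Node environment, `fs.existsSync` would be a natural value to pass as `exists`.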
Entities (Nodes)
Each node has: id, type, name, namespace, version, attrs.
- Allowed types: connector, ingest_api, stream, dlq, transform, sync, table, materialized_view, external_table, consumption_api, openapi_spec, client, workflow
- Required attrs per type (minimal):
  - connector: { mode: "webhook"|"etl"|"cdc", schema_hash }
  - ingest_api: { route, version, auth: { method: "jwt"|"api_key"|"none", audience?: string } }
  - stream: { partitions, retention_seconds }
  - dlq: { backing: "stream"|"table" }
  - transform: { code_ref: { repo, path, commit, line? }, dlq?: nodeId }
  - sync: { semantics: "at_least_once", flush: { rows?, interval_ms? }, offset_tracking: true }
  - table: { physical_name, engine, order_by, deduplicate?: boolean }
  - materialized_view: { target_table, select_from: string[] }
  - external_table: { provider: "clickpipes"|"debezium"|"aws_dms", lifecycle: "externally_managed" }
  - consumption_api: { route, query_spec: { params_schema_ref, tables_referenced: string[] }, auth }
  - openapi_spec: { path: ".moose/openapi.yaml" }
  - client: { kind: "dashboard"|"service"|"agent", sdk?: { language, version } }
  - workflow: { kind: "workflow"|"task", schedule?: string }
Recommended extras for connector nodes:
- connector: { name, version?, author?, language?, implementation? }
- identifier (e.g., GA4 property properties/1234)
- schema_path (repo-relative path to a relevant schema file)
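The required attrs listed above lend themselves to a simple presence check. The sketch below is an assumption-laden illustration, not the runtime types from packages/models/src/lineage.ts: the names `NodeType`, `LineageNode`, `REQUIRED_ATTRS`, and `missingAttrs` are hypothetical, and only top-level keys are checked (attrs marked optional with `?` are left out).

```typescript
// Node types copied from the "Allowed types" list above.
type NodeType =
  | "connector" | "ingest_api" | "stream" | "dlq" | "transform" | "sync"
  | "table" | "materialized_view" | "external_table" | "consumption_api"
  | "openapi_spec" | "client" | "workflow";

// Hypothetical node shape, following the field list above.
interface LineageNode {
  id: string;
  type: NodeType;
  name: string;
  namespace: string;
  version: string;
  attrs: Record<string, unknown>;
}

// Top-level required attr keys per type (optional attrs omitted).
const REQUIRED_ATTRS: Record<NodeType, readonly string[]> = {
  connector: ["mode", "schema_hash"],
  ingest_api: ["route", "version", "auth"],
  stream: ["partitions", "retention_seconds"],
  dlq: ["backing"],
  transform: ["code_ref"],
  sync: ["semantics", "flush", "offset_tracking"],
  table: ["physical_name", "engine", "order_by"],
  materialized_view: ["target_table", "select_from"],
  external_table: ["provider", "lifecycle"],
  consumption_api: ["route", "query_spec", "auth"],
  openapi_spec: ["path"],
  client: ["kind"],
  workflow: ["kind"],
};

// Returns the required attr keys that are absent from node.attrs.
function missingAttrs(node: LineageNode): string[] {
  return REQUIRED_ATTRS[node.type].filter((key) => !(key in node.attrs));
}
```

A check like this only confirms that keys are present; validating value shapes (for example, that `mode` is one of the three allowed strings) would need a fuller schema.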
Relationships (Edges)
Each edge has: from, to, type, attrs.
- Allowed types: produces, publishes, dead_letters_to, transforms, emits, syncs_to, writes, derives, reads, queries, serves, documents, triggers, backfills, retries_from
- Common edge attrs (optional):
  - schema_from_hash, schema_to_hash
  - privacy_tags: string[] (e.g., ["pii_email","pii_phone"])
  - policy: { retention_days?, encryption?: "at_rest"|"none" }
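Because every edge names two node ids and a type from the allowed list, a manifest can be sanity-checked before it is used. The following is a minimal sketch under stated assumptions: `EdgeLike` and `validateEdges` are hypothetical names, and the function only checks endpoint existence and edge type membership.

```typescript
// Edge types copied from the "Allowed types" list above.
const EDGE_TYPES = [
  "produces", "publishes", "dead_letters_to", "transforms", "emits",
  "syncs_to", "writes", "derives", "reads", "queries", "serves",
  "documents", "triggers", "backfills", "retries_from",
];

// Hypothetical edge shape, following the field list above.
interface EdgeLike {
  from: string;
  to: string;
  type: string;
  attrs?: Record<string, unknown>;
}

// Returns human-readable problems; an empty array means no issues found.
function validateEdges(nodeIds: string[], edges: EdgeLike[]): string[] {
  const known = new Set(nodeIds);
  const allowed = new Set(EDGE_TYPES);
  const problems: string[] = [];
  edges.forEach((edge, i) => {
    if (!known.has(edge.from)) {
      problems.push(`edge ${i}: unknown 'from' node "${edge.from}"`);
    }
    if (!known.has(edge.to)) {
      problems.push(`edge ${i}: unknown 'to' node "${edge.to}"`);
    }
    if (!allowed.has(edge.type)) {
      problems.push(`edge ${i}: type "${edge.type}" is not an allowed edge type`);
    }
  });
  return problems;
}
```

Collecting all problems instead of throwing on the first one makes it easier to fix a manifest in a single pass.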
Source of truth for types
Runtime types live in packages/models/src/lineage.ts. Keep docs and scaffolds in sync with these types.
Understanding Lineage
Data lineage tracks the flow of data through your pipeline, providing visibility into:
- Where data originates (source systems)
- How data is transformed
- Where data lands (destination systems)
- Dependencies between data elements
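Dependencies of the kind listed above can be read off the manifest's edges with a graph traversal. The sketch below assumes edges carry `from` and `to` node ids, as in the manifest format; the names `FlowEdge` and `downstreamOf` are hypothetical. It uses a breadth-first search and tolerates cycles.

```typescript
// Minimal edge shape needed for traversal (an assumption of this sketch).
interface FlowEdge {
  from: string;
  to: string;
}

// Returns every node id reachable from startId by following edges forward,
// in breadth-first order, excluding startId itself.
function downstreamOf(startId: string, edges: FlowEdge[]): string[] {
  // Build an adjacency list: node id -> ids it points to.
  const next = new Map<string, string[]>();
  for (const { from, to } of edges) {
    const list = next.get(from) ?? [];
    list.push(to);
    next.set(from, list);
  }

  const seen = new Set<string>([startId]);
  const queue: string[] = [startId];
  const result: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const id of next.get(current) ?? []) {
      if (!seen.has(id)) {
        seen.add(id); // mark before enqueueing so cycles terminate
        result.push(id);
        queue.push(id);
      }
    }
  }
  return result;
}
```

Swapping `from` and `to` when building the adjacency list gives the upstream direction, which is the traversal needed to trace a data issue back to its source.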
Lineage Manifest
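Each pipeline includes a lineage manifest that describes the data flow. The simplified example below uses generic type values (source, transformation, destination, data_flow) rather than the allowed types listed under Entities and Relationships: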
{
  "nodes": [
    {
      "id": "ga-source",
      "type": "source",
      "name": "Google Analytics",
      "namespace": "google-analytics",
      "version": "v4"
    },
    {
      "id": "transform-1",
      "type": "transformation",
      "name": "Normalize Events",
      "namespace": "pipeline"
    },
    {
      "id": "clickhouse-dest",
      "type": "destination",
      "name": "ClickHouse Analytics",
      "namespace": "clickhouse"
    }
  ],
  "edges": [
    {
      "from": "ga-source",
      "to": "transform-1",
      "type": "data_flow"
    },
    {
      "from": "transform-1",
      "to": "clickhouse-dest",
      "type": "data_flow"
    }
  ]
}
Generating Lineage Diagrams
Use the provided scripts to generate visual lineage diagrams:
# Generate Mermaid diagram
pnpm run generate:lineage:mermaid
# Generate SVG diagram
pnpm run generate:lineage:svg
# Generate interactive visualization
pnpm run generate:lineage:interactive
Lineage Schema References
Pipelines can reference connector schemas to build complete lineage:
{
  "datasets": [
    {
      "kind": "pointer",
      "name": "GA Events",
      "connector": {
        "name": "google-analytics",
        "version": "v4",
        "author": "514-labs",
        "language": "typescript"
      }
    }
  ]
}
Benefits of Lineage Tracking
- Impact analysis - Understand downstream effects of changes
- Debugging - Trace data issues to their source
- Compliance - Document data flows for regulatory requirements
- Documentation - Auto-generated visual documentation
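To make the auto-generated documentation benefit concrete, here is a rough sketch of how a manifest's nodes and edges could be turned into Mermaid flowchart text. It is not the implementation behind the `generate:lineage:mermaid` script, which this page does not describe; `DiagramNode`, `DiagramEdge`, and `toMermaid` are hypothetical names.

```typescript
// Minimal shapes needed for rendering (assumptions of this sketch).
interface DiagramNode {
  id: string;
  name: string;
}
interface DiagramEdge {
  from: string;
  to: string;
  type: string;
}

// Renders nodes and edges as a left-to-right Mermaid flowchart.
function toMermaid(nodes: DiagramNode[], edges: DiagramEdge[]): string {
  // Replace characters outside [A-Za-z0-9_] so ids are plain tokens.
  // Note: distinct ids could collide after this replacement.
  const safe = (id: string): string => id.replace(/[^A-Za-z0-9_]/g, "_");

  const lines: string[] = ["flowchart LR"];
  for (const node of nodes) {
    // Double quotes inside a label would end it early, so swap them out.
    const label = node.name.replace(/"/g, "'");
    lines.push(`  ${safe(node.id)}["${label}"]`);
  }
  for (const edge of edges) {
    lines.push(`  ${safe(edge.from)} -->|${edge.type}| ${safe(edge.to)}`);
  }
  return lines.join("\n");
}
```

The returned string can be pasted into any Mermaid renderer; each edge is labeled with its relationship type so the diagram shows how nodes are connected, not just that they are.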