Pipeline Scaffold
Below are the folder layouts that the pipeline registry scaffold generates for each language, followed by an explanation of what the key folders are for.
Python implementation scaffold
- python
  - {implementation}
    - _meta
      - pipeline.json
      - README.md
      - CHANGELOG.md
      - LICENSE
      - assets
        - from
          - README.md
        - to
          - README.md
    - .gitignore
    - .env.example
    - README.md
    - install.config.toml
    - pyproject.toml
    - docs
      - getting-started.md
      - configuration.md
      - outputs.md
    - schemas
      - index.json
    - lineage
      - schemas
        - index.json
        - relational
          - tables.json
          - README.md
        - files
          - manifest.json
          - README.md
    - moose
      - lineage.manifest.json
    - src
      - {packageName}
        - __init__.py
        - runner.py
        - config.py
        - lib
          - paginate.py
          - make_resource.py
          - hooks.py
          - send.py
        - {resource}
          - __init__.py
          - model.py
    - tests
      - test_runner.py
    - examples
      - basic_usage.py
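To make the Python layout concrete, here is a minimal sketch that recreates it as empty placeholder files. It is not the registry's own scaffold tool: `PYTHON_SCAFFOLD` and `scaffold_python` are hypothetical names. The path list also assumes that `relational/` and `files/` sit under `lineage/schemas/` and that `moose/` is a sibling of `lineage/`.

```python
from pathlib import Path

# Hypothetical: relative paths mirroring the Python scaffold layout.
# "{packageName}" and "{resource}" are placeholders substituted by the caller.
# Assumes relational/ and files/ are nested under lineage/schemas/.
PYTHON_SCAFFOLD = [
    "_meta/pipeline.json",
    "_meta/README.md",
    "_meta/CHANGELOG.md",
    "_meta/LICENSE",
    "_meta/assets/from/README.md",
    "_meta/assets/to/README.md",
    ".gitignore",
    ".env.example",
    "README.md",
    "install.config.toml",
    "pyproject.toml",
    "docs/getting-started.md",
    "docs/configuration.md",
    "docs/outputs.md",
    "schemas/index.json",
    "lineage/schemas/index.json",
    "lineage/schemas/relational/tables.json",
    "lineage/schemas/relational/README.md",
    "lineage/schemas/files/manifest.json",
    "lineage/schemas/files/README.md",
    "moose/lineage.manifest.json",
    "src/{packageName}/__init__.py",
    "src/{packageName}/runner.py",
    "src/{packageName}/config.py",
    "src/{packageName}/lib/paginate.py",
    "src/{packageName}/lib/make_resource.py",
    "src/{packageName}/lib/hooks.py",
    "src/{packageName}/lib/send.py",
    "src/{packageName}/{resource}/__init__.py",
    "src/{packageName}/{resource}/model.py",
    "tests/test_runner.py",
    "examples/basic_usage.py",
]


def scaffold_python(impl_dir: Path, package_name: str, resource: str) -> list[Path]:
    """Create empty placeholder files for the layout and return their paths."""
    created = []
    for template in PYTHON_SCAFFOLD:
        rel = template.replace("{packageName}", package_name).replace("{resource}", resource)
        path = impl_dir / rel
        # Create intermediate folders, then an empty file (existing files are left alone).
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)
        created.append(path)
    return created
```

For example, `scaffold_python(Path("python/default"), "my_pipeline", "orders")` would create `python/default/src/my_pipeline/orders/model.py` along with the other files. Storing the layout as a flat list of relative paths makes it easy to compare against the TypeScript layout or to assert on in tests.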
TypeScript implementation scaffold
- typescript
  - {implementation}
    - _meta
      - pipeline.json
      - README.md
      - CHANGELOG.md
      - LICENSE
      - assets
        - from
          - README.md
        - to
          - README.md
    - .gitignore
    - .env.example
    - README.md
    - install.config.toml
    - package.json
    - tsconfig.json
    - vitest.config.ts
    - docs
      - getting-started.md
      - configuration.md
      - outputs.md
    - schemas
      - index.json
    - moose
      - lineage.manifest.json
    - lineage
      - schemas
        - index.json
        - relational
          - tables.json
          - README.md
        - files
          - manifest.json
          - README.md
    - src
      - index.ts
      - runner.ts
      - config.ts
      - lib
        - paginate.ts
        - make-resource.ts
        - hooks.ts
        - send.ts
      - {resource}
        - index.ts
        - model.ts
    - tests
      - runner.test.ts
    - examples
      - basic-usage.ts
What the folders mean
{pipeline}/{version}/{author}/{language}/{implementation}/_meta
- Holds all pipeline metadata at the implementation level.
- Files: `pipeline.json` (identifier, name, author, version, source/destination config, schedule, etc.), `README.md`, `CHANGELOG.md`, `LICENSE`, and `assets/` for logos and lineage diagrams.
- The `assets/` folder contains:
  - a `from/` subdirectory for source system logos
  - a `to/` subdirectory for destination system logos
  - lineage diagrams (e.g., `lineage.mmd`, `lineage.svg`)
- Each implementation has its own `_meta` folder, allowing different implementations to have different configurations and schedules.
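A runner or registry check would typically read the `pipeline.json` of one specific implementation. The sketch below assumes the fields listed above appear as top-level keys named `identifier`, `name`, `author`, `version`, `source`, and `destination`. The real key names and nesting may differ, and `load_pipeline_meta` is a hypothetical helper.

```python
import json
from pathlib import Path

# Assumed top-level keys; the registry's real pipeline.json layout may differ.
# "schedule" is treated as optional and is not checked here.
REQUIRED_KEYS = ("identifier", "name", "author", "version", "source", "destination")


def load_pipeline_meta(impl_dir: Path) -> dict:
    """Hypothetical helper: read <impl_dir>/_meta/pipeline.json and check required keys."""
    meta_path = impl_dir / "_meta" / "pipeline.json"
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    if not isinstance(meta, dict):
        raise ValueError(f"{meta_path}: expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in meta]
    if missing:
        raise ValueError(f"{meta_path}: missing keys {missing}")
    return meta
```

Because each implementation has its own `_meta` folder, the function takes an implementation directory rather than a pipeline directory. This lets two implementations of the same pipeline carry different schedules.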
Language implementations under {pipeline}/{version}/{author}/{language}/{implementation}
- `python/{implementation}/` and `typescript/{implementation}/` contain helpers and runnable code.
- Keep docs, schemas, and code together inside each implementation:
  - `docs/` for human-facing guides (getting started, configuration, outputs)
  - `schemas/` at the top level of the implementation directory for machine-readable dataset schemas and their `index.json`
  - `src/` for code; the scaffold creates a `lib/` folder of helpers and a `{resource}/` folder, and you can add subfolders such as `extract/`, `transform/`, and `load/`
  - `tests/` for unit tests
  - `scripts/` for automation such as lineage generation (not created by the scaffold)
  - `lineage/` for lineage-specific schemas and manifests
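The path convention `{pipeline}/{version}/{author}/{language}/{implementation}` can be captured in a small value type. `ImplementationRef` and its methods are hypothetical. They are shown only to make the five path segments and their order explicit.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class ImplementationRef(NamedTuple):
    """Hypothetical value type for one implementation in the registry tree."""

    pipeline: str
    version: str
    author: str
    language: str  # e.g. "python" or "typescript"
    implementation: str

    def path(self) -> PurePosixPath:
        # {pipeline}/{version}/{author}/{language}/{implementation}
        return PurePosixPath(
            self.pipeline, self.version, self.author, self.language, self.implementation
        )

    @classmethod
    def parse(cls, path: str) -> "ImplementationRef":
        # Expect exactly five segments, in the order above.
        parts = PurePosixPath(path).parts
        if len(parts) != 5:
            raise ValueError(f"expected 5 path segments, got {len(parts)}: {path!r}")
        return cls(*parts)
```

With made-up values, `ImplementationRef("orders-sync", "v1", "acme", "python", "default").path() / "schemas"` gives `orders-sync/v1/acme/python/default/schemas`, the top-level `schemas/` folder of that implementation.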
Key Differences from Connectors
- Lineage tracking: pipelines include lineage diagrams and manifests.
- Source/destination config: pipeline metadata (`pipeline.json`) specifies both the source and the destination.
- Transformation focus: more emphasis on transformation logic and data flow.
- Scheduling: built-in support for cron schedules and timezone configuration.
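Lineage diagrams such as `lineage.mmd` are Mermaid files. The following sketch assumes a simple linear flow from one source, through named transformation steps, to one destination. It renders that flow as Mermaid flowchart text. `lineage_mermaid` is hypothetical, and the registry's actual lineage generation may instead read `pipeline.json` and the lineage manifests.

```python
def lineage_mermaid(source: str, steps: list[str], destination: str) -> str:
    """Hypothetical: render source -> steps -> destination as a Mermaid flowchart."""
    labels = [source, *steps, destination]
    lines = ["flowchart LR"]
    # One node per stage with generated ids n0, n1, ...; double quotes in labels
    # are escaped with Mermaid's #quot; entity code.
    for i, label in enumerate(labels):
        escaped = label.replace('"', "#quot;")
        lines.append(f'    n{i}["{escaped}"]')
    # Connect consecutive stages left to right.
    for i in range(len(labels) - 1):
        lines.append(f"    n{i} --> n{i + 1}")
    return "\n".join(lines) + "\n"
```

Calling `lineage_mermaid("orders_api", ["dedupe"], "warehouse")` returns a flowchart with nodes `n0` through `n2` joined by `n0 --> n1` and `n1 --> n2`. The result could be saved as `_meta/assets/lineage.mmd` and rendered to `lineage.svg` with a Mermaid renderer.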
Notes
- The `_meta` folder lives at the implementation level and contains all metadata for that specific pipeline implementation.
- Documentation goes in the `docs/` folder within each implementation, not in the `_meta` folder.
- Place schemas in the `schemas/` folder at the top level of each implementation (not under `src/`).
- Lineage diagrams and definitions are stored within the implementation:
  - `lineage/` for lineage schemas and manifests
  - `moose/` for Moose-specific lineage manifests
  - `_meta/assets/` for generated lineage diagrams and system logos
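These placement rules can be checked automatically. The hypothetical `check_layout` lint is a heuristic sketch. It flags a missing `_meta/pipeline.json`, any Markdown file in `_meta/` other than `README.md` and `CHANGELOG.md`, any `schemas` directory under `src/`, and a missing top-level `schemas/`.

```python
from pathlib import Path


def check_layout(impl_dir: Path) -> list[str]:
    """Hypothetical heuristic lint for the placement rules above; returns problems found."""
    problems = []
    meta_dir = impl_dir / "_meta"
    if not (meta_dir / "pipeline.json").is_file():
        problems.append("missing _meta/pipeline.json at the implementation level")
    # Docs belong in docs/; only README.md and CHANGELOG.md are expected in _meta/.
    if meta_dir.is_dir():
        for md in sorted(meta_dir.glob("*.md")):
            if md.name not in {"README.md", "CHANGELOG.md"}:
                problems.append(f"move {md.relative_to(impl_dir).as_posix()} into docs/")
    # Schemas live at the top level of the implementation, never under src/.
    src_dir = impl_dir / "src"
    if src_dir.is_dir():
        for found in sorted(src_dir.rglob("schemas")):
            if found.is_dir():
                problems.append(f"move {found.relative_to(impl_dir).as_posix()} to schemas/")
    if not (impl_dir / "schemas").is_dir():
        problems.append("missing top-level schemas/")
    return problems
```

An empty result means the implementation follows these notes. The check is intentionally shallow, so it can run quickly over every implementation in the registry.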