
Pipeline Scaffold

The trees below show the folder layout that the pipeline registry scaffold generates for each language. The sections after them explain the intent of the key folders.

Python implementation scaffold

  • python
    • {implementation}
      • _meta
        • pipeline.json
        • README.md
        • CHANGELOG.md
        • LICENSE
        • assets
          • from
            • README.md
          • to
            • README.md
      • .gitignore
      • .env.example
      • README.md
      • install.config.toml
      • pyproject.toml
      • docs
        • getting-started.md
        • configuration.md
        • outputs.md
      • schemas
        • index.json
      • lineage
        • schemas
          • index.json
          • relational
            • tables.json
            • README.md
          • files
            • manifest.json
            • README.md
      • moose
        • lineage.manifest.json
      • src
        • {packageName}
          • __init__.py
          • runner.py
          • config.py
          • lib
            • paginate.py
            • make_resource.py
            • hooks.py
            • send.py
          • {resource}
            • __init__.py
            • model.py
      • tests
        • test_runner.py
      • examples
        • basic_usage.py

TypeScript implementation scaffold

  • typescript
    • {implementation}
      • _meta
        • pipeline.json
        • README.md
        • CHANGELOG.md
        • LICENSE
        • assets
          • from
            • README.md
          • to
            • README.md
      • .gitignore
      • .env.example
      • README.md
      • install.config.toml
      • package.json
      • tsconfig.json
      • vitest.config.ts
      • docs
        • getting-started.md
        • configuration.md
        • outputs.md
      • schemas
        • index.json
      • moose
        • lineage.manifest.json
      • lineage
        • schemas
          • index.json
          • relational
            • tables.json
            • README.md
          • files
            • manifest.json
            • README.md
      • src
        • index.ts
        • runner.ts
        • config.ts
        • lib
          • paginate.ts
          • make-resource.ts
          • hooks.ts
          • send.ts
        • {resource}
          • index.ts
          • model.ts
      • tests
        • runner.test.ts
      • examples
        • basic-usage.ts

What the folders mean

{pipeline}/{version}/{author}/{language}/{implementation}/_meta

  • Holds all pipeline metadata at the implementation level.
  • Files: `pipeline.json` (identifier, name, author, version, source/destination config, schedule, etc.), `README.md`, `CHANGELOG.md`, `LICENSE`, and `assets/` for logos and lineage diagrams.
  • The `assets/` folder contains:
    • `from/` subdirectory for source system logos
    • `to/` subdirectory for destination system logos
    • Lineage diagrams (e.g., `lineage.mmd`, `lineage.svg`)
  • Each implementation has its own `_meta` folder, allowing different implementations to have different configurations and schedules.
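
As a rough illustration of the path template and the metadata file described above, the sketch below builds the `_meta` path and prints a sample `pipeline.json` body. The helper `meta_dir`, every key name in `example_pipeline`, and all sample values are assumptions made for this example; the scaffold's actual `pipeline.json` schema may differ.

```python
import json
from pathlib import PurePosixPath


def meta_dir(pipeline, version, author, language, implementation):
    """Build the _meta path from the registry path template (hypothetical helper)."""
    return PurePosixPath(pipeline, version, author, language, implementation, "_meta")


# Hypothetical pipeline.json content. The key names are illustrative only and
# are NOT confirmed by the scaffold; they mirror the fields listed in the text.
example_pipeline = {
    "identifier": "example-pipeline",
    "name": "Example Pipeline",
    "author": "example-author",
    "version": "v1",
    "source": {"type": "example-source"},
    "destination": {"type": "example-destination"},
    "schedule": {"cron": "0 * * * *", "timezone": "UTC"},
}

if __name__ == "__main__":
    path = meta_dir("example-pipeline", "v1", "example-author", "python", "default")
    print(path / "pipeline.json")
    print(json.dumps(example_pipeline, indent=2))
```

Because each implementation carries its own `_meta` folder, a Python and a TypeScript implementation of the same pipeline could each hold a file like this with a different schedule.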

Language implementations under {pipeline}/{version}/{author}/{language}/{implementation}

  • `python/{implementation}/` and `typescript/{implementation}/` contain helpers and runnable code.
  • Keep documentation, schemas, code, and tests inside the implementation directory:
    • `docs/` for human-facing guides (getting started, config, outputs)
    • `schemas/` at the top level of the language implementation for machine-readable dataset schemas and their index
    • `src/` for code (with subfolders like `extract/`, `transform/`, `load/`)
    • `tests/` for unit tests
    • `scripts/` for automation like lineage generation
    • `lineage/` for lineage-specific schemas and manifests
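
One way to check a generated implementation against this layout is a small script that reports missing folders. This is a minimal sketch: the folder names come from the scaffold trees above, but treating all of them as required, and the function name `missing_dirs`, are assumptions made for this example.

```python
import tempfile
from pathlib import Path

# Folder names taken from the scaffold trees; treating every one of them
# as required is an assumption made for this sketch.
EXPECTED_DIRS = ["_meta", "docs", "schemas", "lineage", "src", "tests"]


def missing_dirs(implementation_root: Path) -> list[str]:
    """Return the expected folder names that are absent under the given root."""
    return [
        name for name in EXPECTED_DIRS
        if not (implementation_root / name).is_dir()
    ]


if __name__ == "__main__":
    # Build a partial layout in a temporary directory and report the gaps.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "python" / "default"
        for name in ("_meta", "docs", "src"):
            (root / name).mkdir(parents=True)
        print(missing_dirs(root))  # prints ['schemas', 'lineage', 'tests']
```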

Key Differences from Connectors

  • Lineage tracking: Pipelines include lineage diagrams and manifests.
  • Source/Destination config: Pipeline metadata includes source and destination specifications.
  • Transformation focus: More emphasis on transformation logic and data flow.
  • Scheduling: Built-in support for cron schedules and timezone configuration.

Notes

  • The `_meta` folder is now at the implementation level, containing all metadata for that specific pipeline implementation.
  • Documentation goes in the `docs/` folder within each implementation, not in the `_meta` folder.
  • Place schemas at the top level of each language implementation in the `schemas/` folder (not under `src`).
  • Lineage diagrams and definitions are stored within the implementation:
    • `lineage/` folder for lineage schemas and manifests
    • `moose/` folder for Moose-specific lineage manifests
    • `_meta/assets/` for generated lineage diagrams and system logos
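
The three lineage locations listed above can be summarized as a small lookup. This is a sketch only: the function name `lineage_locations` and the dictionary keys are hypothetical, while the paths mirror the folders named in the notes.

```python
from pathlib import PurePosixPath


def lineage_locations(implementation_root: str) -> dict[str, PurePosixPath]:
    """Map each kind of lineage artifact to its folder (hypothetical helper)."""
    root = PurePosixPath(implementation_root)
    return {
        # Lineage schemas and manifests
        "schemas_and_manifests": root / "lineage",
        # Moose-specific lineage manifest
        "moose_manifest": root / "moose" / "lineage.manifest.json",
        # Generated lineage diagrams and system logos
        "diagrams_and_logos": root / "_meta" / "assets",
    }


if __name__ == "__main__":
    for kind, path in lineage_locations("python/default").items():
        print(f"{kind}: {path}")
```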