Lineage

Data lineage is the ability to trace data as it flows through a system, from origin to output, including every transformation and operation applied along the way. Lineage is essential for debugging, compliance, and verifying data accuracy in complex pipelines.

Datorios provides comprehensive lineage capabilities tailored for Apache Flink, supporting three modes to cater to diverse user needs:

  • Upstream Lineage: Tracing records back to their origin.
  • Downstream Lineage: Following records as they move through the pipeline.
  • Lineage from Any Point: Investigating data flows starting from intermediate pipeline stages.
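
To make the three modes concrete, here is a minimal sketch that models record-level lineage as a directed graph and walks it in either direction. It is illustrative only and does not use the Datorios or Flink APIs: the `LineageGraph` class, its method names, and the record identifiers such as `source:order-1` are all hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy record-level lineage graph (hypothetical, not the Datorios API)."""

    def __init__(self):
        self.parents = defaultdict(set)   # record -> records it was derived from
        self.children = defaultdict(set)  # record -> records derived from it

    def add_edge(self, source, derived):
        """Record that `derived` was produced from `source`."""
        self.parents[derived].add(source)
        self.children[source].add(derived)

    def _walk(self, start, neighbors):
        """Breadth-first traversal returning every record reachable from `start`."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, record):
        """Upstream lineage: every record that contributed to `record`."""
        return self._walk(record, self.parents)

    def downstream(self, record):
        """Downstream lineage: every record derived from `record`."""
        return self._walk(record, self.children)

    def from_any_point(self, record):
        """Lineage from an intermediate record: both directions at once."""
        return self.upstream(record), self.downstream(record)


# Hypothetical pipeline: two source records are mapped, aggregated, then written.
graph = LineageGraph()
graph.add_edge("source:order-1", "map:order-1")
graph.add_edge("source:order-2", "map:order-2")
graph.add_edge("map:order-1", "window:total")
graph.add_edge("map:order-2", "window:total")
graph.add_edge("window:total", "sink:total")

print(sorted(graph.upstream("sink:total")))
print(sorted(graph.downstream("source:order-1")))
print(graph.from_any_point("window:total"))
```

In this sketch, upstream and downstream lineage are the same traversal over opposite edge directions, and lineage from an intermediate point is simply both traversals started from the same record.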

Together, these modes show where a record came from, where it went, and which transformations were applied along the way, supporting efficient debugging, optimization, and governance.