How to Use the Lineage Features

User Flow

  1. Selecting a Job:
    Choose the Flink job you want to investigate. The job's Operator Map opens, showing how data flows through the pipeline; the sinks appear on the far right.

  2. Choosing an Operator:
    Select the operator that handles the data you want to investigate. This displays all the records that have reached that point in the pipeline.

  3. Identifying the Record:
    Use the Filter Bar to narrow the list to records that match your criteria, then select the specific record whose lineage you want to trace.

  4. Tracing the Lineage of a Single Record:
    The Job Graph displays the selected record's lineage, showing the path it took through the pipeline's operators.

  5. Diving Deeper into Data:
    Click the number buttons on the graph to view the records entering or leaving a specific operator.

  6. Verifying Data Validity:
    Check whether the data values stay within policy thresholds (e.g., flag speed violations), and look for anomalies such as unrealistic values (e.g., 14,000 km/h). An anomaly indicates a problem, but not yet whether it lies in the source data or in the pipeline logic.

  7. Pinpointing the Source of Issues:
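    Navigate upstream in the Job Graph to check whether the anomaly originates in incorrect external data or in a miscalculation within the pipeline. If the value is already wrong at the source, the external data is at fault; if it first appears in the output of a particular operator, that operator's logic is the likely cause.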

  8. Taking Action:
    Once you have identified the root cause, take corrective action, such as fixing the upstream data or correcting the faulty operator logic.
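The distinction in step 6 between a policy violation and an impossible value can be sketched in Python. This is a minimal illustration, not part of the lineage tooling: the function `classify_speed`, the `Verdict` enum, and both thresholds (a 120 km/h policy limit and a 400 km/h plausibility bound) are hypothetical choices made for the example.

```python
from enum import Enum

# Hypothetical thresholds for illustration only; real limits come from
# your own pipeline's business rules.
SPEED_LIMIT_KMH = 120.0      # policy threshold: above this is a violation
MAX_PLAUSIBLE_KMH = 400.0    # sanity bound: above this the value cannot be real


class Verdict(Enum):
    OK = "ok"
    VIOLATION = "violation"  # believable value that breaks the policy
    ANOMALY = "anomaly"      # impossible value: suspect the data or the logic


def classify_speed(speed_kmh: float) -> Verdict:
    """Classify one speed reading against the policy and a plausibility bound."""
    if speed_kmh < 0 or speed_kmh > MAX_PLAUSIBLE_KMH:
        return Verdict.ANOMALY
    if speed_kmh > SPEED_LIMIT_KMH:
        return Verdict.VIOLATION
    return Verdict.OK


if __name__ == "__main__":
    for value in (95.0, 142.0, 14_000.0):
        print(value, classify_speed(value).value)
```

A reading of 142 km/h is a legitimate finding for the policy to act on, while 14,000 km/h is an anomaly worth tracing upstream rather than reporting as a violation.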
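The upstream search in step 7 follows a simple rule: walk back from the selected operator while the value is still wrong; the last operator where it is still wrong is where the bad value was introduced. The Python sketch below applies that rule to a single linear lineage path. The `Hop` type, the operator names, the record fields, and the 400 km/h bound are all hypothetical, and a real job graph may contain joins with several upstream branches, which this simplification ignores.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Hop:
    """Hypothetical lineage step: an operator and the record it emitted."""
    operator: str
    record: dict
    is_source: bool = False


def trace_upstream(lineage: List[Hop], is_bad: Callable[[dict], bool]) -> Optional[Hop]:
    """Walk from the selected operator (lineage[0]) toward the source (lineage[-1]).

    Returns the most upstream hop whose record is still bad, i.e. the point
    where the bad value first appeared, or None if the selected record is fine.
    """
    origin = None
    for hop in lineage:
        if not is_bad(hop.record):
            break  # value was still correct here, so the fault lies downstream of this hop
        origin = hop  # still bad: keep moving upstream
    return origin


def diagnose(lineage: List[Hop], is_bad: Callable[[dict], bool]) -> str:
    """Turn the upstream search into a source-data vs. pipeline-logic verdict."""
    origin = trace_upstream(lineage, is_bad)
    if origin is None:
        return "selected record looks valid"
    if origin.is_source:
        return f"bad value already present at source {origin.operator!r}: check the external data"
    return f"bad value introduced by operator {origin.operator!r}: check the pipeline logic"


def too_fast(record: dict) -> bool:
    """Hypothetical plausibility check on a record's speed field."""
    return record.get("speed_kmh", 0.0) > 400.0


if __name__ == "__main__":
    # Hypothetical lineage, listed from the selected operator back to the source.
    lineage = [
        Hop("speed-alerts", {"vehicle": "v42", "speed_kmh": 14_000.0}),
        Hop("compute-speed", {"vehicle": "v42", "speed_kmh": 14_000.0}),
        Hop("gps-source", {"vehicle": "v42", "distance_km": 1.75, "duration_s": 90}, is_source=True),
    ]
    print(diagnose(lineage, too_fast))
```

Here the raw GPS record (1.75 km in 90 s, about 70 km/h) is plausible, so the sketch blames `compute-speed`; had the source record already carried 14,000 km/h, it would point at the external data instead.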