How to Use the Lineage Features
User Flow
- Selecting a Job:
  Choose the Flink job you wish to investigate. The job's Operator Map is displayed, visually representing the data flow through the pipeline. The sinks appear on the far right.
- Choosing an Operator:
  Select the operator associated with the data you want to investigate. This action presents all the records that have reached this point in the pipeline.
- Identifying the Record:
  Use the Filter Bar to locate records that match your criteria, then select a specific record to trace its lineage.
- Tracing the Lineage of a Single Record:
  The Job Graph displays the lineage, showing how the record has moved through the pipeline.
- Diving Deeper into Data:
  Click the number buttons at points in the graph to view the records entering or exiting specific operators.
- Verifying Data Validity:
  Inspect whether the data values adhere to policy thresholds (e.g., speed violations). Identify anomalies (e.g., unrealistic values like 14,000 km/h) to determine whether the issue lies in the source data or pipeline logic.
- Pinpointing the Source of Issues:
  Navigate upstream in the Job Graph to check whether the anomaly originates from incorrect external data or from a miscalculation within the pipeline.
- Taking Action:
  After identifying the root cause, take corrective actions to resolve the issue.
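
The validity check and the upstream walk in the steps above can also be pictured in code. The following is a minimal sketch only, not part of the lineage tool: it assumes a record's lineage has been copied out by hand as an ordered list of hops from source to sink, and the names Hop, classify, and locate_anomaly, the operator names, and both thresholds are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical limits: neither value comes from the lineage tool.
SPEED_LIMIT_KMH = 120.0      # assumed policy threshold for a speed violation
MAX_PLAUSIBLE_KMH = 400.0    # assumed upper bound for a realistic reading


@dataclass
class Hop:
    """One step in a record's lineage: an operator and the speed it emitted."""
    operator: str
    speed_kmh: float


def classify(speed_kmh: float) -> str:
    """Separate genuine violations from physically unrealistic values."""
    if speed_kmh < 0 or speed_kmh > MAX_PLAUSIBLE_KMH:
        return "anomaly"
    if speed_kmh > SPEED_LIMIT_KMH:
        return "violation"
    return "ok"


def locate_anomaly(lineage: List[Hop]) -> Optional[str]:
    """Walk from source to sink and report where an anomaly first appears."""
    for index, hop in enumerate(lineage):
        if classify(hop.speed_kmh) == "anomaly":
            if index == 0:
                # The value was already wrong when it entered the pipeline.
                return f"source data: '{hop.operator}' already carries {hop.speed_kmh} km/h"
            # The value was plausible upstream, so this operator changed it.
            return f"pipeline logic: '{hop.operator}' introduced {hop.speed_kmh} km/h"
    return None


# Example lineage with made-up operator names.
lineage = [
    Hop("gps-source", 95.0),
    Hop("unit-conversion", 14000.0),
    Hop("violation-sink", 14000.0),
]
print(locate_anomaly(lineage))
```

In this example the source reading is plausible and the value first becomes unrealistic at the second hop, so the sketch points at pipeline logic. If the first hop already carried the unrealistic value, it would point at the source data instead.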