State Investigation
Value and Problem Solved
The "State Insights" feature provides users with comprehensive insights into the state and checkpoints of their Flink jobs. It helps users:
- Identify bottlenecks, data skew, and resource utilization issues.
- Understand how state size, checkpoint status, and operator throughput evolve over time.
- Correlate performance metrics with checkpoint status and logs during pipeline execution.
This feature simplifies debugging and optimization by presenting detailed state and checkpoint information, enabling users to react to issues and optimize their pipelines effectively.
Functionality and How It Works
Overview Boxes
Summarizes checkpoint-related details and metrics, such as:
- The number of checkpoints, successful and failed
- The number of restarts
- CPU and memory usage of the job
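As a rough illustration of how the overview counters could be derived, the sketch below aggregates a checkpoint history into successful/failed counts and attaches a restart count. The `Checkpoint` record and `overview_counts` function are hypothetical; the status strings mirror the values Flink uses in its checkpoint statistics, and the restart count is assumed to come from a source such as Flink's job-level `numRestarts` metric.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Checkpoint:
    # Hypothetical record type; status strings mirror Flink's checkpoint
    # statistics ("COMPLETED", "FAILED", "IN_PROGRESS").
    checkpoint_id: int
    status: str


def overview_counts(checkpoints: Iterable[Checkpoint], num_restarts: int) -> Dict[str, int]:
    """Aggregate the checkpoint counters shown in the overview boxes."""
    by_status = Counter(cp.status for cp in checkpoints)
    return {
        "successful": by_status["COMPLETED"],  # Counter returns 0 for missing keys
        "failed": by_status["FAILED"],
        "restarts": num_restarts,  # assumed to be read from e.g. the numRestarts metric
    }


if __name__ == "__main__":
    history = [Checkpoint(1, "COMPLETED"), Checkpoint(2, "FAILED"), Checkpoint(3, "COMPLETED")]
    print(overview_counts(history, num_restarts=1))
    # prints {'successful': 2, 'failed': 1, 'restarts': 1}
```

In-progress checkpoints are deliberately left out of both counters, since they have not yet succeeded or failed.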
State Size
- Tracks the job's state size over time to detect and monitor its status and issues.
- Shows the current state size.
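A minimal sketch of state-size tracking, assuming state size is sampled as `(timestamp_ms, size_bytes)` pairs. The "current size" is the latest sample, and a simple, hypothetical growth rule flags state that grew by more than a chosen fraction within a recent window. Function names and the 50% default threshold are illustrative assumptions, not part of the feature.

```python
from typing import Sequence, Tuple

Sample = Tuple[int, int]  # (timestamp in ms, state size in bytes)


def current_state_size(samples: Sequence[Sample]) -> int:
    """Return the most recent state size (the 'current size' value)."""
    if not samples:
        raise ValueError("no state size samples")
    return max(samples, key=lambda s: s[0])[1]


def is_growing(samples: Sequence[Sample], window_ms: int, max_growth: float = 0.5) -> bool:
    """Flag state whose size grew by more than `max_growth` (0.5 = 50%)
    within the last `window_ms` milliseconds. The threshold is hypothetical."""
    if not samples:
        return False
    ordered = sorted(samples)  # order by timestamp
    end = ordered[-1][0]
    recent = [size for ts, size in ordered if ts >= end - window_ms]
    first, last = recent[0], recent[-1]
    if first == 0:
        return last > 0
    return (last - first) / first > max_growth


if __name__ == "__main__":
    history = [(0, 100), (1_000, 120), (2_000, 200)]
    print(current_state_size(history))               # prints 200
    print(is_growing(history, window_ms=2_000))      # prints True
```

Comparing the first and last sample in the window is the simplest possible trend check; a production version might fit a slope instead so that a single noisy sample does not trigger the flag.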
Operator Throughput
Enables the user to view the throughput of any operator in order to detect and inspect spikes during checkpoints.
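One way to surface throughput spikes that coincide with checkpoints is sketched below, under the assumption that operator throughput is sampled as `(timestamp_ms, records_per_second)` and each checkpoint is known by its start and end time. The "spike" rule (more than `factor` times the operator's median throughput) is a hypothetical heuristic chosen for illustration.

```python
from statistics import median
from typing import List, Sequence, Tuple


def spikes_during_checkpoints(
    throughput: Sequence[Tuple[int, float]],  # (time ms, records/s) for one operator
    checkpoints: Sequence[Tuple[int, int]],   # (start ms, end ms) of each checkpoint
    factor: float = 2.0,                      # hypothetical threshold: 2x the median
) -> List[Tuple[int, float]]:
    """Return throughput samples that fall inside a checkpoint window and
    exceed `factor` times the operator's median throughput."""
    if not throughput:
        return []
    baseline = median(rate for _, rate in throughput)

    def during_checkpoint(ts: int) -> bool:
        return any(start <= ts <= end for start, end in checkpoints)

    return [
        (ts, rate)
        for ts, rate in throughput
        if during_checkpoint(ts) and rate > factor * baseline
    ]


if __name__ == "__main__":
    rates = [(0, 100.0), (1_000, 100.0), (2_000, 500.0), (3_000, 100.0), (4_000, 400.0)]
    print(spikes_during_checkpoints(rates, checkpoints=[(1_500, 2_500)]))
    # prints [(2000, 500.0)]
```

The median is used as the baseline because, unlike the mean, it is not pulled upward by the spikes being searched for. The sample at 4000 ms is above the threshold but is not reported because it lies outside every checkpoint window.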
Checkpoints and Job Performance Over Time
A correlated graph that shows the job's CPU and memory usage alongside its checkpoints. Every bar is a bucket of checkpoints, where:
- successful checkpoints are colored green
- failed checkpoints are colored red
Hovering over any bucket displays additional information, such as:
- Time frame
- Checkpoint status
- CPU and memory
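The bucketed view can be sketched as grouping checkpoints and resource samples into fixed-width time windows. This assumes checkpoints arrive as `(trigger_time_ms, status)` and resource samples as `(time_ms, cpu_load, memory_mb)`; the `Bucket` type, `bucketize` function, and field names are hypothetical. Each bucket keeps success and failure counts (the green and red parts of the bar) plus the data needed for the hover tooltip.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Bucket:
    start_ms: int
    end_ms: int
    completed: int = 0  # drawn as the green part of the bar
    failed: int = 0     # drawn as the red part of the bar
    cpu: List[float] = field(default_factory=list)
    memory_mb: List[float] = field(default_factory=list)

    def tooltip(self) -> Dict[str, object]:
        """Values shown when hovering over the bar."""
        def avg(xs: List[float]) -> Optional[float]:
            return sum(xs) / len(xs) if xs else None

        return {
            "time_frame": (self.start_ms, self.end_ms),
            "completed": self.completed,
            "failed": self.failed,
            "avg_cpu": avg(self.cpu),
            "avg_memory_mb": avg(self.memory_mb),
        }


def bucketize(
    checkpoints: Sequence[Tuple[int, str]],         # (trigger time ms, status)
    resources: Sequence[Tuple[int, float, float]],  # (time ms, CPU load, memory MB)
    bucket_ms: int,
) -> List[Bucket]:
    buckets: Dict[int, Bucket] = {}

    def bucket_for(ts: int) -> Bucket:
        start = ts - ts % bucket_ms  # align to the start of the fixed-width window
        if start not in buckets:
            buckets[start] = Bucket(start, start + bucket_ms)
        return buckets[start]

    for ts, status in checkpoints:
        b = bucket_for(ts)
        if status == "COMPLETED":
            b.completed += 1
        elif status == "FAILED":
            b.failed += 1
    for ts, cpu, mem in resources:
        b = bucket_for(ts)
        b.cpu.append(cpu)
        b.memory_mb.append(mem)
    return [buckets[start] for start in sorted(buckets)]  # chronological order
```

Averaging CPU and memory per bucket is one possible aggregation; peak values might be more useful when looking for pressure during checkpoints.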
Checkpoints Table
- A detailed list of all checkpoints with their duration, size, and status.
- The user can sort and filter by any column to find issues faster.
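A minimal sketch of the table's sort-and-filter behavior, assuming each row carries the checkpoint ID, duration, size, and status. The `CheckpointRow` type and `query_table` function are hypothetical; filtering applies one predicate per column and sorting uses a single column.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Sequence


@dataclass
class CheckpointRow:
    checkpoint_id: int
    duration_ms: int
    size_bytes: int
    status: str


def query_table(
    rows: Sequence[CheckpointRow],
    sort_by: str = "checkpoint_id",
    descending: bool = False,
    filters: Optional[Dict[str, Callable[[Any], bool]]] = None,
) -> List[CheckpointRow]:
    """Keep rows that satisfy every column predicate, then sort by one column."""
    filters = filters or {}
    kept = [r for r in rows if all(pred(getattr(r, col)) for col, pred in filters.items())]
    return sorted(kept, key=lambda r: getattr(r, sort_by), reverse=descending)


if __name__ == "__main__":
    rows = [
        CheckpointRow(1, 1_200, 50_000_000, "COMPLETED"),
        CheckpointRow(2, 60_000, 0, "FAILED"),
        CheckpointRow(3, 1_500, 52_000_000, "COMPLETED"),
        CheckpointRow(4, 45_000, 0, "FAILED"),
    ]
    # Failed checkpoints, slowest first.
    slow_failures = query_table(rows, sort_by="duration_ms", descending=True,
                                filters={"status": lambda s: s == "FAILED"})
    print([r.checkpoint_id for r in slow_failures])  # prints [2, 4]
```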
State-Related Logs
- A detailed list of all state-related logs, sorted by time.
- The user can sort and filter by any column.
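How logs are classified as "state-related" is not specified here. The sketch below assumes a simple keyword match on the message text, which is a hypothetical heuristic (the keyword list is illustrative, and the sample messages are made up). Matching entries are optionally filtered by log level and returned in time order.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical keyword heuristic; a real classifier might also use logger names.
STATE_PATTERN = re.compile(r"checkpoint|savepoint|state ?backend|rocksdb|restor", re.IGNORECASE)


@dataclass
class LogEntry:
    timestamp_ms: int
    level: str
    message: str


def state_logs(entries: Iterable[LogEntry], level: Optional[str] = None) -> List[LogEntry]:
    """Return state-related entries (optionally of one level), oldest first."""
    selected = [
        e for e in entries
        if STATE_PATTERN.search(e.message) and (level is None or e.level == level)
    ]
    return sorted(selected, key=lambda e: e.timestamp_ms)


if __name__ == "__main__":
    logs = [
        LogEntry(300, "WARN", "Checkpoint 7 expired before completing."),
        LogEntry(100, "INFO", "Triggering checkpoint 6"),
        LogEntry(200, "INFO", "Job switched to RUNNING"),
        LogEntry(400, "ERROR", "Failed to restore state from RocksDB"),
    ]
    print([e.timestamp_ms for e in state_logs(logs)])  # prints [100, 300, 400]
```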
Interaction with the Timeline
- Users can drag a brush along the timeline to select a time range and navigate to specific checkpoints.
- All graphs and logs update accordingly.
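Keeping every view in sync with the timeline brush amounts to applying the same time-range filter to each timestamped series. The sketch below assumes each graph or log list is exposed as a named sequence of `(timestamp_ms, value)` pairs; the `apply_brush` function and the series names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def apply_brush(
    series: Dict[str, Sequence[Tuple[int, T]]], start_ms: int, end_ms: int
) -> Dict[str, List[Tuple[int, T]]]:
    """Restrict every timestamped series to the brushed [start_ms, end_ms] range."""
    if start_ms > end_ms:
        start_ms, end_ms = end_ms, start_ms  # allow dragging the brush right-to-left
    return {
        name: [(ts, value) for ts, value in points if start_ms <= ts <= end_ms]
        for name, points in series.items()
    }


if __name__ == "__main__":
    views = {
        "cpu": [(0, 0.1), (1_000, 0.2), (2_000, 0.3)],
        "logs": [(500, "Triggering checkpoint 1"), (2_500, "Checkpoint 1 completed")],
    }
    print(apply_brush(views, 400, 2_000))
    # prints {'cpu': [(1000, 0.2), (2000, 0.3)], 'logs': [(500, 'Triggering checkpoint 1')]}
```

Filtering from a single brush range, rather than letting each view track its own, is what keeps the graphs, table, and logs consistent with one another.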