State investigation

Value and Problem Solved

The "State Insights" feature provides users with comprehensive insights into the state and checkpoints of their Flink jobs. It helps users:

  • Identify bottlenecks, data skew, and resource utilization issues.

  • Understand the flow and behavior of state size, checkpoint status, and operator throughput.

  • Correlate performance metrics with checkpoint status and logs during pipeline execution.

This feature simplifies debugging and optimization by presenting detailed state and checkpoint information, enabling users to react to issues and optimize their pipelines effectively.


Functionality and How It Works

Overview Boxes

Summarize checkpoint-related details and metrics such as:

  • The number of checkpoints, successful and failed
  • The number of restarts
  • CPU and Memory size of

State Size

  • Tracks job state size over time to detect and monitor status and issues.
  • Shows the current state size.

Operator throughput

Enables the user to view the throughput of any operator in order to detect and inspect spikes during checkpoints.


Checkpoints and Job Performance Over Time

A correlated graph that shows a job's CPU and memory usage alongside its checkpoints. Every bar is a bucket of checkpoints, where:

  • A successful checkpoint is colored green
  • A failed checkpoint is colored red
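
The bucketing described above can be illustrated with a minimal sketch. It assumes each checkpoint has a trigger timestamp and a status. The `CheckpointSample` type, its field names, the `COMPLETED`/`FAILED` status strings, and the fixed bucket width are all hypothetical and are not taken from the product.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CheckpointSample:
    # Hypothetical record; field names are assumptions for illustration only.
    checkpoint_id: int
    trigger_ts: int  # epoch seconds
    status: str      # assumed to be "COMPLETED" or "FAILED"


def bucket_checkpoints(checkpoints, bucket_seconds):
    """Group checkpoints into fixed-width time buckets and count each status."""
    buckets = {}
    for cp in checkpoints:
        # Align the timestamp to the start of its bucket.
        start = cp.trigger_ts - cp.trigger_ts % bucket_seconds
        buckets.setdefault(start, Counter())[cp.status] += 1
    # Return buckets ordered by start time, as a chart would draw them.
    return dict(sorted(buckets.items()))


samples = [
    CheckpointSample(1, 0, "COMPLETED"),
    CheckpointSample(2, 30, "FAILED"),
    CheckpointSample(3, 70, "COMPLETED"),
]
for start, counts in bucket_checkpoints(samples, 60).items():
    print(start, counts["COMPLETED"], counts["FAILED"])
```

Each bucket keeps separate counts for successful and failed checkpoints, so a bar could be drawn with a green and a red portion. How the graph actually sizes its buckets is not stated here.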

Hovering over any bucket displays additional information such as:

  • Time frame
  • Checkpoint status
  • CPU and memory

Checkpoints Table

A detailed list of all checkpoints with their duration, size, and status.

  • Users can sort and filter by any column to find issues faster
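
As a rough illustration of the sort-and-filter workflow, the sketch below keeps only failed checkpoints and orders them by duration, longest first. The `CheckpointRow` type and its field names are hypothetical stand-ins for the table's columns, not the product's data model.

```python
from dataclasses import dataclass


@dataclass
class CheckpointRow:
    # Hypothetical row mirroring the table's columns (duration, size, status).
    checkpoint_id: int
    duration_ms: int
    size_bytes: int
    status: str  # assumed to be "COMPLETED" or "FAILED"


def slowest_failed(rows, limit=3):
    """Filter to failed checkpoints, then sort by duration, longest first."""
    failed = [r for r in rows if r.status == "FAILED"]
    return sorted(failed, key=lambda r: r.duration_ms, reverse=True)[:limit]


rows = [
    CheckpointRow(1, 1200, 5_000_000, "COMPLETED"),
    CheckpointRow(2, 9000, 7_500_000, "FAILED"),
    CheckpointRow(3, 15000, 8_000_000, "FAILED"),
]
for row in slowest_failed(rows):
    print(row.checkpoint_id, row.duration_ms)
```

Combining a status filter with a duration sort is one way to bring the checkpoints most likely to explain an issue to the top of the list.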

A list of all state-related logs, detailed and sorted by time.

  • Users can sort and filter by any column


Interaction with the Timeline

  • Users can drag the timeline brush to navigate to checkpoints
  • All graphs and logs update accordingly.
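
One way to picture the brush interaction is as a time-window filter applied to every view. The sketch below assumes a half-open window and a generic timestamped item. `TimedItem` and `in_window` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TimedItem:
    # Hypothetical stand-in for a checkpoint or log entry on the timeline.
    label: str
    ts: int  # epoch seconds


def in_window(items, start_ts, end_ts):
    """Keep items with start_ts <= ts < end_ts, ordered by time."""
    selected = (i for i in items if start_ts <= i.ts < end_ts)
    return sorted(selected, key=lambda i: i.ts)


events = [
    TimedItem("checkpoint 7 failed", 130),
    TimedItem("checkpoint 6 completed", 70),
    TimedItem("checkpoint 8 completed", 190),
]
for item in in_window(events, 60, 180):
    print(item.ts, item.label)
```

Applying the same window to checkpoints, metrics, and logs is what would keep the graphs and tables aligned on the selected period.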