
Metrics

Srujun Thanmay Gupta edited this page Jan 19, 2018 · 2 revisions

What metrics to collect?

Static metrics:

  • Cluster configuration:
    • Number of containers running
    • Per-container memory and CPU allocation
  • Number of partitions per stream and tasks per job

Dynamic metrics:

  • Cluster configuration:
    • Failed-container metrics
    • Container memory and CPU utilization (YARN metrics)
  • Per job and/or per container:
    • Process calls
    • Time per process call
    • Message rate and throughput
    • Input/output message size
  • Dependent on job type:
    • Window calls
    • Memory-related metrics for stateful operators
    • Memory store get/set calls, data access/storage rate
  • JVM metrics:
    • JVM heap metrics
    • Thread metrics
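
Some of the dynamic metrics above, such as message rate, are usually derived rather than measured directly. The sketch below shows one way to turn successive samples of a cumulative counter (for example, process calls) into a per-second rate. It assumes the counter only increases while a container is alive and resets to zero on restart; the function name and sample layout are hypothetical and not part of Samza or Stream-Bench.

```python
def rates_from_counter(samples):
    """Convert cumulative counter samples into per-second rates.

    samples: list of (timestamp_seconds, cumulative_count), sorted by time.
    Returns a list of (timestamp_seconds, rate_per_second), one entry per
    valid interval between consecutive samples.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        # Skip intervals with no elapsed time, and intervals where the
        # counter went backwards (assumed here to mean a container restart).
        if dt <= 0 or c1 < c0:
            continue
        rates.append((t1, (c1 - c0) / dt))
    return rates


if __name__ == "__main__":
    # Three samples taken 10 seconds apart.
    print(rates_from_counter([(0, 0), (10, 500), (20, 1500)]))
```

Skipping an interval in which the counter decreases avoids reporting a large negative rate when a failed container is restarted.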

Where are metrics collected and viewed?

Metrics are reported to a time-series database (Graphite or Prometheus) every 10 seconds and can optionally be monitored live through Graphite-Web or Grafana.
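
If Graphite is the chosen database, one simple way to push a sample is Carbon's plaintext protocol, in which each line has the form "<metric path> <value> <Unix timestamp>". The sketch below is a minimal illustration of that format, not the reporter Stream-Bench actually uses. The metric path, host, and helper names are hypothetical; port 2003 is Carbon's default plaintext port.

```python
import socket
import time


def format_graphite_line(path, value, timestamp=None):
    # One line of Carbon's plaintext protocol: "<path> <value> <timestamp>"
    # terminated by a newline. The timestamp is in whole seconds.
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host="localhost", port=2003):
    # Open a TCP connection to Carbon and send all lines in one payload.
    # The host is an assumption; point it at the real Graphite/Carbon server.
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # Hypothetical metric name, formatted but not sent.
    print(format_graphite_line("streambench.job1.process_calls", 42, 1516320000), end="")
```

A reporter would call send_metrics once per reporting interval with one line per metric.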

Stream-Bench reports can be generated by querying the database offline. Analysis will primarily focus on understanding bottlenecks in the cluster configuration.
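
As an illustration of an offline query, the sketch below builds a request URL for Graphite's render API with JSON output and summarizes the datapoints it returns. It assumes Graphite is the database in use. The base URL, metric name, and function names are hypothetical. The render API returns each series as a list of [value, timestamp] pairs, where the value is null for intervals with no data.

```python
from urllib.parse import urlencode


def build_render_url(base_url, target, start="-1h", until="now"):
    # Build a Graphite render API URL that returns JSON for one target.
    query = urlencode(
        {"target": target, "from": start, "until": until, "format": "json"}
    )
    return f"{base_url.rstrip('/')}/render?{query}"


def summarize(datapoints):
    # datapoints: list of [value, timestamp] pairs as returned by Graphite.
    # Null (None) values mark intervals with no data and are ignored.
    values = [v for v, _ts in datapoints if v is not None]
    if not values:
        return None
    return {
        "count": len(values),
        "mean": sum(values) / len(values),
        "max": max(values),
    }


if __name__ == "__main__":
    # Hypothetical host and metric name; the sample datapoints are made up.
    print(build_render_url("http://graphite.example", "streambench.job1.process_ms"))
    print(summarize([[1.0, 10], [None, 20], [3.0, 30]]))
```

The URL can be fetched with any HTTP client; each element of the JSON response has a "target" name and a "datapoints" list that can be passed to summarize.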

Reference

LinkedIn Engineering Blog - Operating Samza at Scale: the metrics LinkedIn collects on Samza are listed towards the end of the post.
