Background
Pixie provides deep, eBPF-based visibility into Kubernetes clusters, automatically capturing
network and application telemetry without manual instrumentation. However, while Pixie
offers powerful query and visualization capabilities (via PxL and Vizier), it currently lacks
a built-in mechanism for automated anomaly detection or OpenTelemetry-native export
of detected network irregularities.
This limits the ability of operators to detect and correlate real-time operational anomalies
(such as unexpected service-to-service communication, latency spikes, or throughput drops)
directly within Pixie’s observability workflow or external telemetry pipelines.
Problem Statement
Existing open-source tools like Zeek or Suricata perform deep packet inspection but are not
optimized for the dynamic, container-based nature of cloud-native microservices. Pixie already
solves visibility at scale but does not yet provide AI-assisted detection, nor a way to export
detected anomalies into the OpenTelemetry ecosystem.
Proposed Solution
Introduce a lightweight, optional plugin for Pixie that performs operational anomaly detection
on network traffic metrics and exports the results through OpenTelemetry.
AI-driven Anomaly Detection Layer
- Implement a Pixie plugin or PxL script extension that computes simple streaming anomaly scores on traffic metrics (latency, request rate, error rate, byte count).
- Techniques: EWMA, robust z-scores, Isolation Forest, or simple autoencoders, depending on available library support and compute limits.
- Tag anomalies with metadata such as service_a, service_b, namespace, and anomaly.score.
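To make the first two techniques concrete, here is a minimal Python sketch. Python is used only for illustration, since the proposal leaves open whether scoring runs in PxL or in an external plugin. It scores one metric series (for example, latency between one service pair) with an exponentially weighted mean and variance, and adds a median/MAD robust z-score for windowed data. The names EwmaScorer and robust_z, the smoothing factor, and the warm-up length are assumptions, not part of Pixie.

```python
import math
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EwmaScorer:
    """Streaming z-score for one metric series (hypothetical helper).

    Keeps an exponentially weighted mean and variance; each new sample is
    scored against the state *before* it is absorbed.
    """
    alpha: float = 0.1   # smoothing factor (assumed default; tune per metric)
    warmup: int = 5      # samples to absorb before emitting non-zero scores
    mean: float = 0.0
    var: float = 0.0
    count: int = 0

    def update(self, x: float) -> float:
        if self.count == 0:
            # The first sample only seeds the mean.
            self.mean, self.count = x, 1
            return 0.0
        std = math.sqrt(self.var)
        score = (x - self.mean) / std if self.count >= self.warmup and std > 0 else 0.0
        # Incremental EWMA update of mean and variance.
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        self.count += 1
        return score


def robust_z(window: Sequence[float], x: float) -> float:
    """Robust z-score of x against a window, using median and MAD.

    0.6745 rescales the MAD so scores are comparable to ordinary z-scores
    when the data are roughly normal.
    """
    med = statistics.median(window)
    mad = statistics.median([abs(v - med) for v in window])
    if mad == 0:
        return 0.0
    return 0.6745 * (x - med) / mad


# Usage: latency samples (ms) for one service pair, ending in a spike.
latencies = [20.0, 21.0, 19.5, 20.5, 20.0, 21.5, 19.0, 20.0, 120.0]
scorer = EwmaScorer(alpha=0.2)
scores = [scorer.update(v) for v in latencies]
print(f"EWMA score of last sample: {scores[-1]:.1f}")
print(f"robust z of last sample: {robust_z(latencies[:-1], latencies[-1]):.1f}")
```

The EWMA scorer needs only constant state per series, which suits per-service-pair streams; the robust z-score is less distorted by earlier outliers but needs a window of recent values.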
OpenTelemetry Export Integration
- Extend Pixie’s existing OpenTelemetry export capabilities to include these anomaly events as metrics or logs.
- Allow configuration of anomaly thresholds and export frequency via Pixie’s plugin interface.
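The threshold and export-frequency settings could behave roughly as below. This is a sketch under stated assumptions: AnomalyExportConfig, AnomalyExporter, and the generic sink callable are hypothetical stand-ins for a plugin setting and an OpenTelemetry log exporter; the code does not use Pixie's actual plugin API or the OpenTelemetry SDK. Events scoring below the threshold are dropped, and buffered events are exported at most once per interval.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List


@dataclass
class AnomalyExportConfig:
    """Hypothetical plugin settings; not an existing Pixie schema."""
    score_threshold: float = 0.9     # events scoring below this are dropped
    export_interval_s: float = 60.0  # minimum seconds between exports


@dataclass
class AnomalyExporter:
    config: AnomalyExportConfig
    # Receives one batch of events; a real plugin would wrap an OTLP exporter here.
    sink: Callable[[List[dict]], None]
    _buffer: List[dict] = field(default_factory=list, init=False)
    _last_export: float = field(default=float("-inf"), init=False)

    def observe(self, name: str, score: float, attributes: Dict[str, str],
                timestamp: datetime) -> None:
        """Buffer one anomaly event if its score clears the threshold."""
        if score < self.config.score_threshold:
            return
        self._buffer.append({
            "name": name,
            "attributes": {**attributes, "anomaly.score": round(score, 2)},
            "timestamp": timestamp.isoformat(timespec="seconds"),
        })

    def tick(self, now: float) -> None:
        """Call periodically with a monotonic time in seconds; exports the
        buffered events at most once per export_interval_s."""
        if now - self._last_export < self.config.export_interval_s:
            return
        if self._buffer:
            self.sink(self._buffer)
            self._buffer = []
        self._last_export = now


# Usage: build an event shaped like the proposal's example output.
batches: List[List[dict]] = []
exporter = AnomalyExporter(AnomalyExportConfig(), sink=batches.append)
attrs = {"src_service": "checkout", "dst_service": "payment", "namespace": "production"}
ts = datetime(2025, 11, 11, 12, 0, 0)
exporter.observe("px.anomaly.network.latency_spike", 0.94, attrs, ts)
exporter.observe("px.anomaly.network.latency_spike", 0.42, attrs, ts)  # below threshold, dropped
exporter.tick(now=0.0)  # the first tick exports immediately
print(batches)
```

Batching by interval keeps export volume bounded during a burst of anomalies, at the cost of up to one interval of added latency before an event reaches the collector.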
Example Output

    - name: px.anomaly.network.latency_spike
      attributes:
        src_service: checkout
        dst_service: payment
        namespace: production
        anomaly.score: 0.94
      timestamp: 2025-11-11T12:00:00
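The proposal does not specify how anomaly.score is scaled, but a value of 0.94 suggests a score bounded to [0, 1]. One possible mapping, assuming roughly Gaussian residuals, turns a z-score into its two-sided coverage erf(|z| / sqrt(2)) = P(|Z| <= |z|); under that assumption a score of 0.94 corresponds to |z| of about 1.88. The helper name bounded_score is hypothetical.

```python
import math


def bounded_score(z: float) -> float:
    """Map a z-score to [0, 1) as the two-sided Gaussian coverage
    P(|Z| <= |z|) = erf(|z| / sqrt(2)). Hypothetical helper; assumes
    roughly normal residuals."""
    return math.erf(abs(z) / math.sqrt(2))


for z in (0.0, 1.0, 1.88, 3.0):
    print(f"z={z:>4}: anomaly.score={bounded_score(z):.3f}")
```

This mapping saturates quickly (|z| = 3 already gives roughly 0.997), so a threshold near 0.9 would flag fairly mild deviations; thresholds would likely need tuning per metric.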
Benefits
- Enables real-time operational anomaly detection without additional instrumentation.
- Bridges Pixie’s in-cluster visibility with the broader OpenTelemetry and AIOps ecosystem.
- Provides actionable alerts and insights directly in the Pixie UI and external dashboards (Grafana, Datadog, etc.).
Scope & Alignment
- Keeps focus on observability and performance analysis, not security or intrusion detection.
- Aligns with the goal of improving AI-driven insights in Pixie’s roadmap.
- Can be developed as an independent plugin, avoiding changes to Pixie’s core.