Backend Analysis of ML in SIEM Tools #5
Replies: 2 comments
Sounds good. I’ll dig into how different SIEMs are actually applying ML: what types of models they use (like anomaly detection, classification, or time-series analysis) and the algorithms behind them. I’ll also look at how they handle large volumes of log data, whether through streaming, batch jobs, or distributed setups. Once I’ve gone through this, I’ll put together a short report that covers the workflows, the strengths/weaknesses I notice, and what we can borrow for our own pipeline. This should give us a clearer picture before we start shaping our hybrid ML design.
Insights from Prototype 1:
1) The model found 4 main patterns (topics) in the data:
   - Firewall traffic (allow/deny and port scans)
   - File access on endpoint systems
   - Web requests on the server
2) Some log entries looked suspicious, for example:
   - Many failed logins before a success → possible brute-force attempt
   - Privilege escalation by admin → unusual system behavior
   - File delete/write actions on confidential files → insider risk
   - Access to /../etc/passwd → web attack attempt
3) Perplexity score: 22.13 (lower is better, so this suggests the model fits the data fairly well).
4) Most logs were normal, but a few had high anomaly scores (>0.8), indicating possible threats.
Visualizations:
   - How logs are grouped into different topics
   - Which entries have higher anomaly scores
This project helped me understand how machine learning can detect security threats automatically.
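For anyone reading the perplexity number above, here is a minimal sketch of how perplexity is usually defined: the exponential of the negative mean log-probability the model assigns to each held-out token. The `perplexity` helper and the sample probabilities are hypothetical, made up for illustration; they are not taken from the prototype or its data.

```python
import math


def perplexity(token_probs):
    """Return exp(-mean(log p)) over the per-token probabilities."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-avg_log_prob)


# Hypothetical probabilities a topic model might assign to held-out log tokens.
held_out_probs = [0.05, 0.04, 0.05, 0.04]
score = perplexity(held_out_probs)
print(f"perplexity = {score:.2f}")
```

Read this way, a perplexity near 22 means the model is, on average, about as uncertain as if it were choosing uniformly among roughly 22 tokens. The number is most useful when compared against other models or topic counts evaluated on the same held-out logs.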
Team, as part of our project, we need a clear understanding of how existing SIEM platforms implement their ML pipelines. Please focus on:
1. Analyzing ML Models in Other SIEMs
Identify the types of ML models used (anomaly detection, classification, time-series/sequence analysis).
Understand the algorithms behind these models (e.g., statistical methods, clustering, regression, neural networks).
2. Understanding Data Handling and Processing
Examine how these SIEM tools handle large-scale log and metric data.
Investigate techniques like batch processing, streaming pipelines, sliding windows, and distributed computation.
3. Documenting Insights
Prepare a concise report describing each ML model’s backend workflow.
Highlight strengths, weaknesses, and scalability approaches.
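To make the "statistical methods" and "sliding windows" items above concrete, here is a minimal sketch of a streaming detector that flags a value when its z-score against the last few observations exceeds a threshold. It is an illustration only, not how any particular SIEM implements it; the class name `SlidingWindowDetector`, the window size, the threshold, and the per-minute event counts are all assumptions.

```python
from collections import deque
import math


class SlidingWindowDetector:
    """Flag values that deviate strongly from a fixed-size window of recent values."""

    def __init__(self, window_size=5, threshold=3.0):
        self.window = deque(maxlen=window_size)  # oldest value drops out automatically
        self.threshold = threshold

    def update(self, value):
        """Score value against the current window, then add it to the window."""
        is_anomaly = False
        if len(self.window) == self.window.maxlen:  # wait until the window is full
            mean = sum(self.window) / len(self.window)
            var = sum((x - mean) ** 2 for x in self.window) / len(self.window)
            std = math.sqrt(var)
            if std > 0:
                is_anomaly = abs(value - mean) / std > self.threshold
            else:
                # A perfectly flat window: any change counts as a deviation.
                is_anomaly = value != mean
        self.window.append(value)
        return is_anomaly


# Hypothetical failed-login counts per minute, processed one at a time as a stream.
counts = [10, 12, 11, 9, 10, 11, 95, 10]
detector = SlidingWindowDetector(window_size=5, threshold=3.0)
flags = [detector.update(c) for c in counts]
print(flags)
```

One trade-off worth noting in the report: in this sketch the spike itself enters the window after being flagged, which inflates the mean and standard deviation for the next few observations. A more robust design might exclude flagged values or use the median instead.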
Objective: This analysis will inform the design of our own from-scratch hybrid ML pipeline, ensuring we adopt best practices and optimize for performance.
GOOD LUCK