
@MadLittleMods commented Jan 2, 2026

Refactor Grafana dashboard to use server_name label

Previously, the recommendation was to use the `instance` label to group all of a server's processes (the main process and its workers) under the same server:

## Monitoring workers
To monitor a Synapse installation using [workers](workers.md),
every worker needs to be monitored independently, in addition to
the main homeserver process. This is because workers don't send
their metrics to the main homeserver process, but expose them
directly (if they are configured to do so).
To allow collecting metrics from a worker, you need to add a
`metrics` listener to its configuration, by adding the following
under `worker_listeners`:
```yaml
- type: metrics
  bind_address: ''
  port: 9101
```
The `bind_address` and `port` parameters should be set so that
the resulting listener can be reached by Prometheus, and they
don't clash with an existing worker.
With this example, the worker's metrics would then be available
on `http://127.0.0.1:9101`.
Example Prometheus target for Synapse with workers:
```yaml
- job_name: "synapse"
  scrape_interval: 15s
  metrics_path: "/_synapse/metrics"
  static_configs:
    - targets: ["my.server.here:port"]
      labels:
        instance: "my.server"
        job: "master"
        index: 1
    - targets: ["my.workerserver.here:port"]
      labels:
        instance: "my.server"
        job: "generic_worker"
        index: 1
    - targets: ["my.workerserver.here:port"]
      labels:
        instance: "my.server"
        job: "generic_worker"
        index: 2
    - targets: ["my.workerserver.here:port"]
      labels:
        instance: "my.server"
        job: "media_repository"
        index: 1
```
Labels (`instance`, `job`, `index`) can be defined as anything.
The labels are used to group graphs in Grafana.

However, the `instance` label has a special meaning in Prometheus, so using it this way is an abuse of the label:

instance: The <host>:<port> part of the target's URL that was scraped.

-- https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series

Since #18592 (Synapse v1.139.0), Synapse metrics carry a `server_name` label that we can group on instead.
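With `server_name` available, the scrape config no longer needs to overwrite `instance`. Below is a minimal sketch of the earlier example rewritten that way, assuming Synapse attaches `server_name` to the exported series itself. The hostnames and ports (`9000`, `9101`, `9102`) are hypothetical placeholders, not values taken from this PR.

```yaml
# Sketch only: same job layout as the old example, but `instance` is left alone.
# Hostnames and ports below are hypothetical placeholders.
- job_name: "synapse"
  scrape_interval: 15s
  metrics_path: "/_synapse/metrics"
  static_configs:
    - targets: ["my.server.here:9000"]
      labels:
        job: "master"
        index: "1"
    - targets: ["my.workerserver.here:9101"]
      labels:
        job: "generic_worker"
        index: "1"
    - targets: ["my.workerserver.here:9102"]
      labels:
        job: "generic_worker"
        index: "2"
    # No `instance` or `server_name` label is set here: Prometheus fills in
    # `instance` with the scraped host:port, and Synapse is assumed to emit
    # `server_name` on its own series.
```

Dashboards can then filter and group on `server_name`, while `instance` keeps its standard meaning of "the endpoint that was scraped".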


Additionally, with Synapse Pro for small hosts, the assumption that a single process serves a single server no longer holds.
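Because one process may export series for several servers, and one server may span several processes, per-server panels need to aggregate by `server_name` rather than by `instance`. A hedged sketch as a Prometheus recording rule, assuming the `synapse_http_server_requests_received_total` counter carries the `server_name` label; the rules file, group name, and recorded metric name are hypothetical.

```yaml
# Hypothetical rules file (e.g. synapse_rules.yml), loaded via `rule_files`
# in prometheus.yml.
groups:
  - name: synapse_per_server
    rules:
      # Sum the request rate across every process (main + workers) that
      # serves the same homeserver, keyed on server_name instead of instance.
      - record: server_name:synapse_http_server_requests_received:rate5m
        expr: sum by (server_name) (rate(synapse_http_server_requests_received_total[5m]))
```

The same `sum by (server_name)` expression can also be used directly in a Grafana panel query instead of being precomputed.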

Part of https://github.com/element-hq/synapse-small-hosts/issues/106

Motivating use case

This change stems from adding Prometheus metrics to our workerized Docker image (#19324, #19336) with a more correct label setup (one that doesn't override `instance`), and from wanting the dashboard to be better.

Todo

  • Add server_name variable to dashboard
  • Figure out how to better display the `process_` metrics
  • Update deploys annotation

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

@MadLittleMods force-pushed the madlittlemods/grafana-instance-to-server-name-refactor branch from 1093fa0 to 2157fa1 on January 3, 2026 03:20