Add a way to expose metrics from the Docker image (`SYNAPSE_ENABLE_METRICS`) #19324

MadLittleMods · 2025-12-26T18:13:25Z

Add a way to expose metrics from the Docker image (SYNAPSE_ENABLE_METRICS).

Spawning from wanting to run a load test against the Complement Docker image of Synapse and see metrics from the homeserver.

Why not just provide your own homeserver config?

Probably possible, but it gets tricky when you try to use the workers variant of the Docker image (docker/Dockerfile-workers). The workaround would probably be a script that uses yq to edit /data/homeserver.yaml and /conf/workers/*.yaml to add the metrics listener, and then modifies /conf/workers/shared.yaml to add enable_metrics: true. Doesn't spark much joy.
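To make that workaround concrete, here is a rough sketch of the edits such a script would have to make, written against already-parsed config dicts rather than real files. The helper names, the worker HTTP port 18009, the worker metrics port 19091, and the exact listener shape are assumptions for illustration; only the file roles and `enable_metrics: true` come from the description above.

```python
import copy


def add_metrics_listener(config: dict, listeners_key: str, port: int) -> dict:
    """Return a copy of `config` with a metrics listener appended.

    `listeners_key` is assumed to be "listeners" for the main process and
    "worker_listeners" for worker configs.
    """
    updated = copy.deepcopy(config)
    listeners = updated.setdefault(listeners_key, [])
    # Avoid adding a second metrics listener if the script is run twice.
    if not any(listener.get("type") == "metrics" for listener in listeners):
        listeners.append({"type": "metrics", "port": port})
    return updated


def enable_metrics(shared_config: dict) -> dict:
    """Return a copy of the shared config with `enable_metrics: true` set."""
    return {**shared_config, "enable_metrics": True}


# Stand-ins for the parsed contents of /data/homeserver.yaml, one
# /conf/workers/*.yaml file and /conf/workers/shared.yaml (all hypothetical).
main_config = {"listeners": [{"type": "http", "port": 8008}]}
worker_config = {"worker_listeners": [{"type": "http", "port": 18009}]}
shared_config = {}

main_config = add_metrics_listener(main_config, "listeners", 19090)
worker_config = add_metrics_listener(worker_config, "worker_listeners", 19091)
shared_config = enable_metrics(shared_config)
```

Even in this simplified form, the script has to know about three kinds of files and keep worker ports unique, which is the bookkeeping that `SYNAPSE_ENABLE_METRICS` is meant to take off the user's hands.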

Testing strategy

Make sure your firewall allows the Docker containers to communicate with the host (host.docker.internal) so they can access exposed ports of other Docker containers. We want to allow Prometheus to access the Synapse container and Grafana to access the Prometheus container.
- sudo ufw allow in on docker0 comment "Allow traffic from the default Docker network to the host machine (host.docker.internal)"
- sudo ufw allow in on br-+ comment "(from Matrix Complement testing) Allow traffic from custom Docker networks to the host machine (host.docker.internal)"
- Complement firewall docs
Build the Docker image for Synapse: docker build -t matrixdotorg/synapse -f docker/Dockerfile . (docs)

Generate config for Synapse:

docker run -it --rm \
    --mount type=volume,src=synapse-data,dst=/data \
    -e SYNAPSE_SERVER_NAME=hs1 \
    -e SYNAPSE_REPORT_STATS=yes \
    -e SYNAPSE_ENABLE_METRICS=1 \
    matrixdotorg/synapse:latest generate

Start Synapse:

docker run -d --name synapse \
   --mount type=volume,src=synapse-data,dst=/data \
   -p 8008:8008 \
   -p 19090:19090 \
   matrixdotorg/synapse:latest

You should be able to see metrics from Synapse at http://localhost:19090/_synapse/metrics
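If you'd rather check this from a script than a browser, something like the following sketch could work. It assumes the endpoint returns the standard Prometheus text exposition format; `fetch_metrics` and `metric_names` are hypothetical helper names, and the sample text is illustrative rather than real Synapse output.

```python
import urllib.request


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Fetch the raw Prometheus text exposition from a metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def metric_names(exposition: str) -> set[str]:
    """Collect the metric names found in Prometheus text exposition output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and `# HELP` / `# TYPE` comments.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


# Illustrative input only; real output from Synapse is much longer.
sample = """\
# HELP synapse_build_info Build information
# TYPE synapse_build_info gauge
synapse_build_info{version="x"} 1.0
process_cpu_seconds_total 12.5
"""
```

Against a running container, `"synapse_build_info" in metric_names(fetch_metrics("http://localhost:19090/_synapse/metrics"))` would be the equivalent check.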

Create a Prometheus config (prometheus.yml)

global:
  scrape_interval: 15s
  scrape_timeout: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: prometheus
    scrape_interval: 15s
    metrics_path: /_synapse/metrics
    scheme: http
    static_configs:
      - targets:
          # This should point to the Synapse metrics listener (we're using `host.docker.internal` because this is from within the Prometheus container)
          - host.docker.internal:19090

Start Prometheus (update the volume bind mount to the config you just saved somewhere):

docker run \
    --detach \
    --name=prometheus \
    -p 9090:9090 \
    -v ~/Documents/code/random/prometheus-config/prometheus.yml:/etc/prometheus/prometheus.yml \
    prom/prometheus

Make sure you're seeing some data in Prometheus. On http://localhost:9090/query, search for synapse_build_info
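The same check can be scripted against the Prometheus HTTP API, which serves instant queries at `/api/v1/query`. This is a minimal sketch: the helper names are hypothetical, and the sample response is a hand-written stand-in for what Prometheus would return.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": expr})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def has_results(response_body: str) -> bool:
    """Return True if a query response succeeded and returned any series."""
    payload = json.loads(response_body)
    results = payload.get("data", {}).get("result", [])
    return payload.get("status") == "success" and len(results) > 0


def query_prometheus(base_url: str, expr: str, timeout: float = 5.0) -> str:
    """Run an instant query and return the raw JSON response body."""
    url = build_query_url(base_url, expr)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hand-written stand-in for a successful response (values are illustrative).
sample_response = json.dumps(
    {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"__name__": "synapse_build_info"}, "value": [0, "1"]}
            ],
        },
    }
)
```

With the containers above running, `has_results(query_prometheus("http://localhost:9090", "synapse_build_info"))` should be true once Prometheus has completed at least one scrape.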

Start Grafana

docker run -d --name=grafana --add-host host.docker.internal:host-gateway -p 3000:3000 grafana/grafana

Visit the Grafana dashboard, http://localhost:3000/ (Credentials: admin/admin)
Connections -> Data Sources -> Add data source -> Prometheus
- Prometheus server URL: http://host.docker.internal:9090
Import the Synapse dashboard: https://github.com/element-hq/synapse/blob/develop/contrib/grafana/synapse.json

Dev notes

instance vs job labels, https://prometheus.io/docs/concepts/jobs_instances/
Metrics on matrix.org: Refactor metrics to be scoped to the homeserver #18592 (comment)
Prometheus doesn't support Unix sockets yet: Support Unix socket for metrics address prometheus/prometheus#12024
- Separate project that has SYNAPSE_METRICS_UNIX_SOCKETS; mentioned in Prometheus Enhancements realtyem/synapse-workers#3 but also there is a comment that Prometheus doesn't support this yet
ACME support (port 8009)
- Added in Enable ACME support in the docker image matrix-org/synapse#4566
- Removed in Remove support for ACME v1 matrix-org/synapse#10194

Pull Request Checklist

Pull request is based on the develop branch
Pull request includes a changelog file. The entry should:
- Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
- Use markdown where necessary, mostly for code blocks.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
Code style is correct (run the linters)

…onf` Original context: matrix-org/synapse#14921 (comment)

MadLittleMods · 2025-12-26T18:17:11Z

docker/Dockerfile

 COPY ./docker/start.py /start.py
 COPY ./docker/conf /conf

-EXPOSE 8008/tcp 8009/tcp 8448/tcp


8009 was removed because its ACME support was removed in matrix-org/synapse#10194

MadLittleMods · 2025-12-26T18:17:29Z

docker/configure_workers_and_start.py

+        # Keep the `shared_config` up to date with the `shared_extra_conf` from each
+        # worker.
+        shared_config = {
+            **worker_config["shared_extra_conf"],
+            # We combine `shared_config` second to avoid overwriting existing keys
+            # because TODO: why?
+            **shared_config,
+        }


Split out to #19323
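For anyone puzzling over the merge order in that diff, here is a small standalone sketch of what the pattern does when run in a loop. The function name and config keys are made up for illustration; the only thing taken from the diff is the `{**worker_config["shared_extra_conf"], **shared_config}` shape.

```python
def merge_shared_config(worker_configs: list[dict]) -> dict:
    """Accumulate `shared_extra_conf` from each worker; the first value seen wins."""
    shared_config: dict = {}
    for worker_config in worker_configs:
        shared_config = {
            **worker_config["shared_extra_conf"],
            # Unpacked second, so keys already present in `shared_config`
            # are not overwritten by later workers.
            **shared_config,
        }
    return shared_config


# Hypothetical worker configs with one overlapping key.
workers = [
    {"shared_extra_conf": {"overlapping": "worker1", "only_in_worker1": 1}},
    {"shared_extra_conf": {"overlapping": "worker2", "only_in_worker2": 2}},
]
merged = merge_shared_config(workers)
```

Because later unpacking wins in a Python dict display, putting `**shared_config` last means an earlier worker's value for a key is kept over a later worker's value, while new keys are still picked up.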

docker/configure_workers_and_start.py

Change is now being done in #19325

…lication

MadLittleMods · 2025-12-29T22:51:17Z

docker/Dockerfile

+# SYNAPSE_ENABLE_METRICS=1). Metrics for workers are on ports starting from 19091 but
+# since these are dynamic we don't expose them by default.
+EXPOSE 19090/tcp


In a future PR, I think it would be useful to add a Prometheus service discovery endpoint to make it easy to discover all of the workers and their dynamic ports here.

Exploring this in #19336
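As a rough idea of what such an endpoint could return, Prometheus HTTP service discovery expects a JSON list of target groups, each with `targets` and optional `labels`. The sketch below is not what #19336 implements: the function name, the `worker` label, the worker names, and the assumption that workers get sequential ports starting at 19091 are all illustrative.

```python
import json

MAIN_METRICS_PORT = 19090  # From the Dockerfile comment above.
FIRST_WORKER_METRICS_PORT = 19091  # Workers start here; sequential order is assumed.


def http_sd_target_groups(worker_names: list[str], host: str = "localhost") -> list[dict]:
    """Build target groups in the Prometheus HTTP service discovery format."""
    groups = [
        {"targets": [f"{host}:{MAIN_METRICS_PORT}"], "labels": {"worker": "main"}}
    ]
    for offset, name in enumerate(worker_names):
        port = FIRST_WORKER_METRICS_PORT + offset
        groups.append({"targets": [f"{host}:{port}"], "labels": {"worker": name}})
    return groups


# Hypothetical worker names, serialized as the HTTP SD response body.
body = json.dumps(http_sd_target_groups(["worker_a", "worker_b"]), indent=2)
```

A Prometheus `http_sd_configs` entry pointed at an endpoint serving `body` would then pick up every process without listing the ports by hand.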

…rate-config` This is necessary as the Docker image actually uses `--generate-config` to generate the main homeserver config. It's only in worker mode that it uses the other route.

Let `ServerConfig.generate_config_section(...)` figure it out

MadLittleMods · 2025-12-30T22:37:29Z

docker/configure_workers_and_start.py

 #         regardless of the SYNAPSE_LOG_LEVEL setting.
 #   * SYNAPSE_LOG_TESTING: if set, Synapse will log additional information useful
 #     for testing.
+#   * SYNAPSE_USE_UNIX_SOCKET: TODO


Something for a future PR to address ⏩

MadLittleMods · 2025-12-30T22:40:24Z

docker/conf/homeserver.yaml

+{% if SYNAPSE_ENABLE_METRICS %}
+  - type: metrics
+    # The main process always uses the same port 19090
+    #
+    # Prometheus does not support Unix sockets so we don't bother with
+    # `SYNAPSE_USE_UNIX_SOCKET`, https://github.com/prometheus/prometheus/issues/12024
+    port: 19090
+{% endif %}


The whole config situation for our Docker image is pretty confusing. This docker/conf/homeserver.yaml is only used for the migrate_config mode,

synapse/docker/start.py

Lines 218 to 230 in 7a24faf

    # In generate mode, generate a configuration and missing keys, then exit
    if mode == "generate":
        return run_generate_config(environ, ownership)

    if mode == "migrate_config":
        # generate a config based on environment vars.
        config_dir = environ.get("SYNAPSE_CONFIG_DIR", "/data")
        config_path = environ.get(
            "SYNAPSE_CONFIG_PATH", config_dir + "/homeserver.yaml"
        )
        return generate_config_from_template(
            config_dir, config_path, environ, ownership
        )

For the generate mode, we also have a bunch of changes to support SYNAPSE_ENABLE_METRICS (see synapse/config)

MadLittleMods · 2025-12-30T22:41:57Z

docker/start.py

+def strtobool(val: str) -> bool:
+    """Convert a string representation of truth to True or False
+
+    True values are 'y', 'yes', 't', 'true', 'on', and '1'; false values
+    are 'n', 'no', 'f', 'false', 'off', and '0'.  Raises ValueError if
+    'val' is anything else.
+
+    This is lifted from distutils.util.strtobool, with the exception that it actually
+    returns a bool, rather than an int.
+    """
+    val = val.lower()
+    if val in ("y", "yes", "t", "true", "on", "1"):
+        return True
+    elif val in ("n", "no", "f", "false", "off", "0"):
+        return False
+    else:
+        raise ValueError("invalid truth value %r" % (val,))


This is copied from

synapse/synapse/util/stringutils.py

Lines 249 to 265 in 7a24faf

    def strtobool(val: str) -> bool:
        """Convert a string representation of truth to True or False

        True values are 'y', 'yes', 't', 'true', 'on', and '1'; false values
        are 'n', 'no', 'f', 'false', 'off', and '0'.  Raises ValueError if
        'val' is anything else.

        This is lifted from distutils.util.strtobool, with the exception that it actually
        returns a bool, rather than an int.
        """
        val = val.lower()
        if val in ("y", "yes", "t", "true", "on", "1"):
            return True
        elif val in ("n", "no", "f", "false", "off", "0"):
            return False
        else:
            raise ValueError("invalid truth value %r" % (val,))

In docker/start.py, it doesn't seem like we have any dependencies on Synapse code so I just lifted it over here.
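For illustration, this is roughly how a helper like that can be used to read a boolean flag such as `SYNAPSE_ENABLE_METRICS` from the environment. `env_flag` is a hypothetical wrapper, not necessarily how `docker/start.py` does it; `strtobool` is repeated so the snippet runs on its own.

```python
from collections.abc import Mapping


def strtobool(val: str) -> bool:
    """Convert a string representation of truth to True or False."""
    val = val.lower()
    if val in ("y", "yes", "t", "true", "on", "1"):
        return True
    elif val in ("n", "no", "f", "false", "off", "0"):
        return False
    else:
        raise ValueError("invalid truth value %r" % (val,))


def env_flag(environ: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Read a boolean-like environment variable (hypothetical helper)."""
    raw = environ.get(name)
    if raw is None:
        return default
    try:
        return strtobool(raw)
    except ValueError as exc:
        # Name the variable so the user knows which setting to fix.
        raise ValueError(f"{name} must be a boolean-like value, got {raw!r}") from exc


enabled = env_flag({"SYNAPSE_ENABLE_METRICS": "1"}, "SYNAPSE_ENABLE_METRICS")
```

In a real entrypoint, `os.environ` would be passed in place of the literal dict.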

docker/start.py

MadLittleMods · 2025-12-30T22:45:08Z

docs/sample_config.yaml

 listeners:
-  - port: 8008
+  - bind_addresses:
+    - ::1
+    - 127.0.0.1
+    port: 8008
+    resources:
+    - compress: false
+      names:
+      - client
+      - federation
    tls: false
    type: http
    x_forwarded: true


These changes are generated from poetry run scripts-dev/generate_sample_config.sh

This has changed because we're now passing in a default set of listeners instead of raw string manipulation. See synapse/config/_base.py and synapse/config/server.py
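To give a feel for the difference, here is a simplified sketch of building the listeners as structured data and optionally appending a metrics listener. This is not the actual code in `synapse/config/server.py`: `default_listeners` is a hypothetical function, the HTTP listener mirrors the generated sample config above, and the metrics listener only carries the `type` and `port` fields seen in the Docker template.

```python
def default_listeners(enable_metrics: bool = False) -> list[dict]:
    """Build a default listener list as data (hypothetical sketch)."""
    listeners = [
        {
            "bind_addresses": ["::1", "127.0.0.1"],
            "port": 8008,
            "resources": [{"compress": False, "names": ["client", "federation"]}],
            "tls": False,
            "type": "http",
            "x_forwarded": True,
        }
    ]
    if enable_metrics:
        # Assumed minimal shape; the real listener may set more fields.
        listeners.append({"type": "metrics", "port": 19090})
    return listeners


with_metrics = default_listeners(enable_metrics=True)
```

Serializing a list like this through a YAML dumper is what produces the expanded, alphabetically ordered keys in the sample config diff, and it makes adding another listener a plain list append rather than a string edit.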

anoadragon453

I agree that the way we generate worker configs for testing is pretty convoluted. I'm also conscious that ESS is trying to do the same thing for production services, albeit with proprietary components.

Either way, thanks for this. Very useful option!

anoadragon453 · 2025-12-31T19:42:15Z

docker/configure_workers_and_start.py

 #         regardless of the SYNAPSE_LOG_LEVEL setting.
 #   * SYNAPSE_LOG_TESTING: if set, Synapse will log additional information useful
 #     for testing.
+#   * SYNAPSE_USE_UNIX_SOCKET: TODO


Suggested change

-#   * SYNAPSE_USE_UNIX_SOCKET: TODO
+#   * SYNAPSE_USE_UNIX_SOCKET: TODO (https://github.com/prometheus/prometheus/issues/12024)

Let's add the blocker here as well.

MadLittleMods · 2025-12-31

This is a general doc comment for SYNAPSE_USE_UNIX_SOCKET, so prometheus/prometheus#12024 isn't relevant to link there.

MadLittleMods · 2026-01-01T20:00:22Z

Thanks for the review @anoadragon453 🦋

…configuration scripts (#19323)

For reference, this PR used to include this whole `shared_config` block in the diff. But #19324 was merged first which introduced parts of it already. Here is what this code used to look like: https://github.com/element-hq/synapse/blob/566670c363915691826b5b435c4aa7acde61b408/docker/configure_workers_and_start.py#L865-L868

---

Original context for why it was changed this way: matrix-org/synapse#14921 (comment)

Previously, this code made me question two things:

1. Do we actually use `worker_config["shared_extra_conf"]` in the templates?
   - At first glance, I couldn't see why we're updating `shared_extra_conf` here. It's not used in the `worker.yaml.j2` template so all of this seemed a bit pointless.
   - Turns out, updating `shared_extra_conf` itself is pointless and it's being done as a convenient place to mix the objects to get things right in `shared_config` (confusing).
2. Does it actually do anything?
   - Because `shared_config` starts out as an empty object, my first glance made me think we were just updating with an empty object and then just re-assigning. But because we're in a loop, we actually accumulate the `shared_extra_conf` from each worker.

I'm not sure whether I'm capturing my confusion well enough here but basically, this made me spend time trying to figure out what/why we're doing things this way, and we can use a clearer pattern to accomplish the same thing.

---

This change is spawning from looking at the `docker/configure_workers_and_start.py` script in order to add a metrics listener ([upcoming PR](#19324)).

MadLittleMods added 4 commits December 24, 2025 14:35

Remove ununsed listeners

517f1bc

Add some space between different concerns

4c7c650

Use a more clear pattern on combining shared_config/`shared_extra_c…

fa37e54

…onf` Original context: matrix-org/synapse#14921 (comment)

SYNAPSE_ENABLE_METRICS

e04a600

MadLittleMods added A-Docker A-Metrics labels Dec 26, 2025

MadLittleMods commented Dec 26, 2025

MadLittleMods mentioned this pull request Dec 26, 2025

Make it more clear how shared_extra_conf is combined in our Docker configuration scripts #19323

Merged

3 tasks

MadLittleMods added 3 commits December 26, 2025 12:19

Better cross-reference SYNAPSE_ENABLE_METRICS

53ce6d7

Be more accurate

04caa64

Add changelog

4b0436d

MadLittleMods mentioned this pull request Dec 26, 2025

Remove ununsed listeners from shared_config in Docker configuration script #19325

Closed

3 tasks

MadLittleMods commented Dec 26, 2025

MadLittleMods added 6 commits December 26, 2025 16:04

Restore listeners

ffba085

Change is now being done in #19325

Fix dict

7cd9898

Use port 19090+ instead of 9090 as 9093 is already used for rep…

b962ce7

…lication

Expose correct port 19090

e2f19ad

Merge branch 'develop' into madlittlemods/docker-metrics

31ec6c6

Remove auto-formatting changes

685b4f0

MadLittleMods commented Dec 29, 2025

MadLittleMods added 9 commits December 30, 2025 11:12

Document missing SYNAPSE_USE_UNIX_SOCKET

eddcd17

Implement SYNAPSE_ENABLE_METRICS and --enable-metrics for `--gene…

6bfde49

…rate-config` This is necessary as the Docker image actually uses `--generate-config` to generate the main homeserver config. It's only in worker mode that it uses the other route.

Fix wrong bind_addresses

501f387

Let `ServerConfig.generate_config_section(...)` figure it out

No need to use enable_metrics arg

03e8b2c

Explain default set of listeners

5f78505

Better docstrings for enable_metrics

4261df3

poetry run scripts-dev/generate_sample_config.sh

5675019

Avoid making the config busy if already False

e9b0bd0

Add tests

ac4052b

Fix lints, update synapse/config/_base.pyi

4e2cdc4

MadLittleMods commented Dec 30, 2025

Better error

72b2d9e

MadLittleMods commented Dec 30, 2025

Explain why -> less noise

6541283

MadLittleMods marked this pull request as ready for review December 31, 2025 00:17

MadLittleMods requested a review from a team as a code owner December 31, 2025 00:17

anoadragon453 approved these changes Dec 31, 2025

MadLittleMods mentioned this pull request Dec 31, 2025

Add Prometheus HTTP service discovery endpoint for easy discovery of all workers in Docker image #19336

Open

3 tasks

MadLittleMods merged commit 9dae6cc into develop Jan 1, 2026
112 of 117 checks passed

MadLittleMods deleted the madlittlemods/docker-metrics branch January 1, 2026 20:00

MadLittleMods mentioned this pull request Jan 2, 2026

Refactor Grafana dashboard to use server_name label #19337

Open

6 tasks
