-
Notifications
You must be signed in to change notification settings - Fork 436
Add a way to expose metrics from the Docker image (SYNAPSE_ENABLE_METRICS)
#19324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| COPY ./docker/start.py /start.py | ||
| COPY ./docker/conf /conf | ||
|
|
||
| EXPOSE 8008/tcp 8009/tcp 8448/tcp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8009 was removed because it's ACME support was removed in matrix-org/synapse#10194
| # Keep the `shared_config` up to date with the `shared_extra_conf` from each | ||
| # worker. | ||
| shared_config = { | ||
| **worker_config["shared_extra_conf"], | ||
| # We combine `shared_config` second to avoid overwriting existing keys | ||
| # because TODO: why? | ||
| **shared_config, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split out to #19323
| # SYNAPSE_ENABLE_METRICS=1). Metrics for workers are on ports starting from 19091 but | ||
| # since these are dynamic we don't expose them by default. | ||
| EXPOSE 19090/tcp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a future PR, I think it would be useful to add Prometheus service discovery endpoint to make it easy to discover all of the workers and random ports here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exploring this in #19336
…rate-config` This is necessary as the Docker image actually uses `--generate-config` to generate the main homeserver config. It's only in worker mode that it uses the other route.
Let `ServerConfig.generate_config_section(...)` figure it out
| # regardless of the SYNAPSE_LOG_LEVEL setting. | ||
| # * SYNAPSE_LOG_TESTING: if set, Synapse will log additional information useful | ||
| # for testing. | ||
| # * SYNAPSE_USE_UNIX_SOCKET: TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something for a future PR to address ⏩
| {% if SYNAPSE_ENABLE_METRICS %} | ||
| - type: metrics | ||
| # The main process always uses the same port 19090 | ||
| # | ||
| # Prometheus does not support Unix sockets so we don't bother with | ||
| # `SYNAPSE_USE_UNIX_SOCKET`, https://github.com/prometheus/prometheus/issues/12024 | ||
| port: 19090 | ||
| {% endif %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole config situation for our Docker image is pretty confusing. This docker/conf/homeserver.yaml is only used for the migrate_config mode,
Lines 218 to 230 in 7a24faf
| # In generate mode, generate a configuration and missing keys, then exit | |
| if mode == "generate": | |
| return run_generate_config(environ, ownership) | |
| if mode == "migrate_config": | |
| # generate a config based on environment vars. | |
| config_dir = environ.get("SYNAPSE_CONFIG_DIR", "/data") | |
| config_path = environ.get( | |
| "SYNAPSE_CONFIG_PATH", config_dir + "/homeserver.yaml" | |
| ) | |
| return generate_config_from_template( | |
| config_dir, config_path, environ, ownership | |
| ) |
For the generate mode, we also have a bunch of changes to support SYNAPSE_ENABLE_METRICS (see synapse/config)
| def strtobool(val: str) -> bool: | ||
| """Convert a string representation of truth to True or False | ||
| True values are 'y', 'yes', 't', 'true', 'on', and '1'; false values | ||
| are 'n', 'no', 'f', 'false', 'off', and '0'. Raises ValueError if | ||
| 'val' is anything else. | ||
| This is lifted from distutils.util.strtobool, with the exception that it actually | ||
| returns a bool, rather than an int. | ||
| """ | ||
| val = val.lower() | ||
| if val in ("y", "yes", "t", "true", "on", "1"): | ||
| return True | ||
| elif val in ("n", "no", "f", "false", "off", "0"): | ||
| return False | ||
| else: | ||
| raise ValueError("invalid truth value %r" % (val,)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copied from
synapse/synapse/util/stringutils.py
Lines 249 to 265 in 7a24faf
| def strtobool(val: str) -> bool: | |
| """Convert a string representation of truth to True or False | |
| True values are 'y', 'yes', 't', 'true', 'on', and '1'; false values | |
| are 'n', 'no', 'f', 'false', 'off', and '0'. Raises ValueError if | |
| 'val' is anything else. | |
| This is lifted from distutils.util.strtobool, with the exception that it actually | |
| returns a bool, rather than an int. | |
| """ | |
| val = val.lower() | |
| if val in ("y", "yes", "t", "true", "on", "1"): | |
| return True | |
| elif val in ("n", "no", "f", "false", "off", "0"): | |
| return False | |
| else: | |
| raise ValueError("invalid truth value %r" % (val,)) |
In docker/start.py, it doesn't seem like we have any dependencies on Synapse code so I just lifted it over here.
| listeners: | ||
| - port: 8008 | ||
| - bind_addresses: | ||
| - ::1 | ||
| - 127.0.0.1 | ||
| port: 8008 | ||
| resources: | ||
| - compress: false | ||
| names: | ||
| - client | ||
| - federation | ||
| tls: false | ||
| type: http | ||
| x_forwarded: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are generated from poetry run scripts-dev/generate_sample_config.sh
This has changed because we're now passing in a default set of listeners instead of raw string manipulation. See synapse/config/_base.py and synapse/config/server.py
anoadragon453
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the way we generate worker configs for testing is pretty convoluted. I'm also conscious that ESS is trying to do the same thing for production services, albeit with proprietary components.
Either way, thanks for this. Very useful option!
| # regardless of the SYNAPSE_LOG_LEVEL setting. | ||
| # * SYNAPSE_LOG_TESTING: if set, Synapse will log additional information useful | ||
| # for testing. | ||
| # * SYNAPSE_USE_UNIX_SOCKET: TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # * SYNAPSE_USE_UNIX_SOCKET: TODO | |
| # * SYNAPSE_USE_UNIX_SOCKET: TODO (https://github.com/prometheus/prometheus/issues/12024) |
Let's add the blocker here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a general doc comment for SYNAPSE_USE_UNIX_SOCKET.
prometheus/prometheus#12024 isn't relevant to link there.
|
Thanks for the review @anoadragon453 🦋 |
…configuration scripts (#19323) For reference, this PR used to include this whole `shared_config` block in the diff. But #19324 was merged first which introduced parts of it already. Here is what this code used to look like: https://github.com/element-hq/synapse/blob/566670c363915691826b5b435c4aa7acde61b408/docker/configure_workers_and_start.py#L865-L868 --- Original context for why it was changed this way: matrix-org/synapse#14921 (comment) Previously, this code made me question two things: 1. Do we actually use `worker_config["shared_extra_conf"]` in the templates? - At first glance, I couldn't see why we're updating `shared_extra_conf` here. It's not used in the `worker.yaml.j2` template so all of this seemed a bit pointless. - Turns out, updating `shared_extra_conf` itself is pointless and it's being done as a convenient place to mix the objects to get things right in `shared_config` (confusing). 1. Does it actually do anything? - Because `shared_config` starts out as an empty object, my first glance made me think we we're just updating with an empty object and then just re-assigning. But because we're in a loop, we actually accumulate the `shared_extra_conf` from each worker. I'm not sure whether I'm capturing my confusion well enough here but basically, this made me spend time trying to figure out what/why we're doing things this way and we can use a more clear pattern to accomplish the same thing. --- This change is spawning from looking at the `docker/configure_workers_and_start.py` script in order to add a metrics listener ([upcoming PR](#19324)).
Add a way to expose metrics from the Docker image (
SYNAPSE_ENABLE_METRICS).Spawning from wanting to run a load test against the Complement Docker image of Synapse and see metrics from the homeserver.
Why not just provide your own homeserver config?
Probably possible but it gets tricky when you try to use the workers variant of the Docker image (
docker/Dockerfile-workers). The way to workaround it would probably be toyqedit everything in a script and change/data/homeserver.yamland/conf/workers/*.yamlto add themetricslistener. And then modify/conf/workers/shared.yamlto addenable_metrics: true. Doesn't spark much joy.Testing strategy
host.docker.internal) so they can access exposed ports of other Docker containers. We want to allow Synapse to access the Prometheus container and Grafana to access to the Prometheus container.sudo ufw allow in on docker0 comment "Allow traffic from the default Docker network to the host machine (host.docker.internal)"sudo ufw allow in on br-+ comment "(from Matrix Complement testing) Allow traffic from custom Docker networks to the host machine (host.docker.internal)"docker build -t matrixdotorg/synapse -f docker/Dockerfile .(docs)prometheus.yml)synapse_build_infoadmin/admin)http://host.docker.internal:9090Dev notes
instancevsjoblabels, https://prometheus.io/docs/concepts/jobs_instances/matrix.org: Refactor metrics to be scoped to the homeserver #18592 (comment)SYNAPSE_METRICS_UNIX_SOCKETS; mentioned in Prometheus Enhancements realtyem/synapse-workers#3 but also there is a comment that Prometheus doesn't support this yet8009)Pull Request Checklist
EventStoretoEventWorkerStore.".code blocks.