diff --git a/CHANGELOG.md b/CHANGELOG.md
index 960d7e56..cd1a9e86 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,7 +7,12 @@
 - Support objectOverrides using `.spec.objectOverrides`.
   See [objectOverrides concepts page](https://docs.stackable.tech/home/nightly/concepts/overrides/#object-overrides) for details ([#726]).
 
+### Fixed
+
+- Default `API_WORKERS` to 1 (instead of letting Airflow default to 4) to prevent crashloop and update/correct docs to reflect this ([#727]).
+
 [#726]: https://github.com/stackabletech/airflow-operator/pull/726
+[#727]: https://github.com/stackabletech/airflow-operator/pull/727
 
 ## [25.11.0] - 2025-11-07
 
diff --git a/docs/modules/airflow/pages/troubleshooting/index.adoc b/docs/modules/airflow/pages/troubleshooting/index.adoc
index c796bb4c..f89d5438 100644
--- a/docs/modules/airflow/pages/troubleshooting/index.adoc
+++ b/docs/modules/airflow/pages/troubleshooting/index.adoc
@@ -29,8 +29,8 @@ See e.g. https://github.com/minio/minio/issues/20845[this MinIO issue] for detai
 == Setting API Workers
 
 In Airflow the webserver (called the API Server in Airflow 3.x+) can use multiple workers.
-This is determined by the environment variable `+AIRFLOW__API__WORKERS+` and is set by default to `4`.
-For most cases the default should work without problem, but if you run into performance issues and would like to add more workers, you can either modulate multiple worker processes at the level of webserver, keeping the default value for each one:
+This is determined by the environment variable `+AIRFLOW__API__WORKERS+` and is set by default to `1`.
+For most cases this should work without problem, but if you run into performance issues and would like to add more workers, you can either modulate multiple worker processes at the level of webserver (keeping the default value for each instance):
 
 [source,yaml]
 ----
@@ -46,7 +46,7 @@ or change the environment variable using `envOverrides`:
 ----
 webservers:
   envOverrides:
-    AIRFLOW__API__WORKERS: "6" # something other than the default
+    AIRFLOW__API__WORKERS: "4" # something other than the default
 ----
 
 TIP: Our strong recommendation is to increase the webserver replicas, with each webserver running a single worker, as this removes the risk of running into timeouts or memory issues.
diff --git a/rust/operator-binary/src/env_vars.rs b/rust/operator-binary/src/env_vars.rs
index b04144bc..92960842 100644
--- a/rust/operator-binary/src/env_vars.rs
+++ b/rust/operator-binary/src/env_vars.rs
@@ -505,6 +505,18 @@ fn add_version_specific_env_vars(
                     JWT_SECRET_SECRET_KEY,
                 ),
             );
+            // The Airflow default for this is 4.
+            // However, with the default resources this could cause problems,
+            // as the Pod went to 100% CPU usage and didn't get healthy
+            // quick enough, resulting in a crashloop.
+            env.insert(
+                "AIRFLOW__API__WORKERS".into(),
+                EnvVar {
+                    name: "AIRFLOW__API__WORKERS".into(),
+                    value: Some("1".into()),
+                    ..Default::default()
+                },
+            );
             if airflow_role == &AirflowRole::Webserver {
                 // Sometimes a race condition can arise when both scheduler and
                 // api-server are updating the DB, which adds overhead (conflicts
diff --git a/tests/templates/kuttl/logging/41-install-airflow-cluster.yaml.j2 b/tests/templates/kuttl/logging/41-install-airflow-cluster.yaml.j2
index 40e027aa..498f6db0 100644
--- a/tests/templates/kuttl/logging/41-install-airflow-cluster.yaml.j2
+++ b/tests/templates/kuttl/logging/41-install-airflow-cluster.yaml.j2
@@ -84,11 +84,6 @@ spec:
           max: 2000m
         memory:
           limit: 3Gi
-    envOverrides:
-      # logging tests use two webservers and if two tests should run in
-      # parallel then the CPU usage can be high if the default number of
-      # workers (4) is used
-      AIRFLOW__API__WORKERS: "1"
     roleGroups:
       automatic-log-config:
         replicas: 1