fix: set restart limits to 0 to prevent being marked as failed #1952

cstockton · 2025-11-28T18:14:47Z

The systemd defaults for these values are `StartLimitInterval=10s` and `StartLimitBurst=5`, with a DefaultRestartUSec of 100ms. Most services set `RestartSec` to 3s, so under most circumstances it takes 15s to restart 5 times and the limit of 5 starts within 10s is not exceeded. However, if other system processes (salt, cloud init) restart the service explicitly, or recovering system services within the --before chain trigger a restart, the limit can be exceeded, causing the service to be marked as failed. Since no service marks gotrue.service as required, it will remain offline until the next explicit restart is issued.

Setting these values to 0 with Restart=always and RestartSec=3 will prevent gotrue from being marked as failed.

samrose

We'll need to create a testing AMI to thoroughly test these changes out. Will request @LGUG2Z to perform these tests as he's also going to be helping us find ways to automate these testing approaches.

samrose

When we ultimately merge this, we should bump the versions in ansible/vars.yml to create a release for these changes. This way, it will be a distinct change instead of bundled with other changes.

cstockton · 2025-12-08T15:53:05Z

Hi @samrose - I've just updated the branch. Any updates on this?

I've noticed that all non-oneshot services set a `RestartSec` of `3s` and that we use the systemd defaults of `StartLimitBurst=5` and `StartLimitInterval=10s`. Together these mean that under typical conditions a service is restarted indefinitely until it comes back up, because five restarts take `3s * 5 = 15s`, which is longer than the `10s` interval. It is still possible for a service to enter a failed state in some scenarios, so this change defensively sets both limits to 0 to keep services in restart loops.
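
To make the timing argument concrete, here is a rough Python sketch of the start rate limit. It is a simplified model and not systemd's actual code: the function name `first_refused_start`, the fixed-window accounting, and the sample timestamps are all assumptions made for illustration. It treats an interval or burst of 0 as "limiting disabled", which is the effect this change relies on.

```python
def first_refused_start(start_times, interval=10.0, burst=5):
    """Return the index of the first start attempt the limiter would refuse.

    Simplified model (an assumption, not systemd's implementation): a window
    opens at the first start, at most `burst` starts are allowed per window,
    and a new window opens once `interval` seconds have passed.
    Returns None if no attempt is refused.
    """
    if interval <= 0 or burst <= 0:
        return None  # limiting treated as disabled in this model

    window_begin = None
    count = 0
    for index, t in enumerate(start_times):
        if window_begin is None or t - window_begin > interval:
            # Window expired (or first start): open a new window.
            window_begin = t
            count = 1
        elif count < burst:
            count += 1
        else:
            return index  # burst exhausted inside the current window
    return None


# Crash loop with automatic restarts only, spaced RestartSec=3s apart.
steady = [3.0 * n for n in range(20)]

# Same crash loop plus hypothetical explicit restarts from other tooling.
noisy = sorted(steady + [1.0, 2.0, 4.0])

print(first_refused_start(steady))                       # defaults, 3s spacing
print(first_refused_start(noisy))                        # defaults, extra restarts
print(first_refused_start(noisy, interval=0, burst=0))   # limits set to 0
```

In this model the steady 3s loop is never refused, the loop with three extra restarts is refused on its sixth start, and with both limits at 0 nothing is refused.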

samrose

Just needs a rebase

cstockton requested review from a team as code owners November 28, 2025 18:14

cstockton enabled auto-merge December 2, 2025 13:18

samrose requested review from darora and pcnc December 2, 2025 13:20

samrose requested changes Dec 2, 2025

samrose requested a review from LGUG2Z December 2, 2025 13:52

samrose requested changes Dec 2, 2025

cstockton force-pushed the cs/gotrue-start-limit-fix branch 2 times, most recently from e097bf1 to 3ef31ba December 8, 2025 17:25

Chris Stockton added 2 commits December 8, 2025 10:28

cstockton force-pushed the cs/gotrue-start-limit-fix branch from 3ef31ba to c89c805 December 8, 2025 17:28

samrose self-requested a review December 11, 2025 05:15

samrose approved these changes Dec 11, 2025
