revisit worker idle time shutdown #170

@escapewindow

Tl;dr: let's modify the workers' idle time config+logic to be able to ping a service to determine whether they should shut down.

We're currently seeing slowness spinning worker pools up and down. Whether we have solved the worker-manager slowness or not, we can still make the behavior of idle workers more efficient. If we solve this before we rewrite worker-manager, we'll spin workers up and down less often, mitigating the current slowness.

It appears that workers shut themselves down after n seconds of idleness, even if that drops the pool below its minimum. This seems to be true of both docker-worker and generic-worker.

This means that if we have a minimum of 10 workers, and we have 10 workers running but idle, they'll all shut down and worker-manager will spin them back up every n seconds.

If spinning up workers were instantaneous, it would be best to shut them all down and only spin them up on demand. Since it can take minutes or sometimes even hours to spin up new workers, and since human time is more valuable than machine time, we should keep a minimum number of critical worker pools running.

We've had people try to work around the worker-manager slowness issue by increasing the idle time, which works, but if we spin up 100 extra workers, all 100 extra workers will stick around for that extra idle time. Increasing the minimum worker capacity is even worse: we'll spin even more workers down and up every n seconds of idleness.

Proposal: workers query a central service, "Should I shut myself down on idle?" The service answers "no" to as many workers as the pool's minimum and "yes" to the rest. With some additional metadata, such as worker start time, we can tell the oldest workers "yes" and keep the newest workers running. We may want the workers to ping the service every ~15 minutes so the service has a better idea of which workers are still running.
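
To make the proposal concrete, here is a rough sketch of what the service-side decision could look like. Everything in it is hypothetical: the names (`IdleShutdownService`, `check_in`, `HEARTBEAT_TIMEOUT`), the in-memory storage, and the choice to treat a worker as gone after two missed ~15 minute pings are assumptions for illustration, not existing worker-manager, docker-worker, or generic-worker behavior.

```python
from dataclasses import dataclass

# Assumption: a worker that has missed two ~15 minute pings is treated as gone.
HEARTBEAT_TIMEOUT = 2 * 15 * 60  # seconds


@dataclass
class Worker:
    worker_id: str
    started_at: float  # epoch seconds when the worker booted
    last_ping: float   # epoch seconds of the most recent check-in


class IdleShutdownService:
    """Hypothetical service answering "should I shut myself down on idle?"."""

    def __init__(self, min_workers):
        self.min_workers = min_workers
        self.workers = {}  # worker_id -> Worker

    def check_in(self, worker_id, started_at, now):
        """Record a ping and return True if this worker may shut down on idle."""
        self.workers[worker_id] = Worker(worker_id, started_at, now)

        # Only count workers that have pinged recently enough to be considered alive.
        live = [
            w for w in self.workers.values()
            if now - w.last_ping <= HEARTBEAT_TIMEOUT
        ]

        # Keep the newest `min_workers` workers; everyone else gets "yes".
        live.sort(key=lambda w: w.started_at, reverse=True)
        keep = {w.worker_id for w in live[: self.min_workers]}
        return worker_id not in keep
```

With a minimum of 2 and three live workers, the two most recently started workers would be told "no" and the oldest would be told "yes". If one of the kept workers stops pinging, the oldest worker moves back into the kept set on its next check-in. A real service would also need per-pool state, persistence, and some handling for workers that are busy rather than idle, none of which is covered here.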

We'll need to write both the worker configs+logic and the service. Thoughts?
