Implement warm pool base struct #1636

djeebus · 2025-12-20T01:59:23Z

Once this looks good, we should use it for the nbd device pool as well.

Note

Introduces a reusable, telemetry‑instrumented pooling primitive and adopts it for network slots in the orchestrator.

Add utils.WarmPool with OTEL metrics, lifecycle handling, and tests; provide Item/ItemFactory interfaces and generated mocks
Refactor orchestrator network pool to utils.WarmPool (network.Pool now wraps a warm pool); add network.Config and Slot.String()
Config/infra: expose pool sizing via NETWORK_SLOTS_REUSE_POOL_SIZE and NETWORK_SLOTS_FRESH_POOL_SIZE; set these in Nomad jobs (orchestrator.hcl, template-manager.hcl)
Call site updates: switch constructors in benchmark and build-template; main orchestrator uses new pool
Tooling: add scripts/fix-meters.sh and improve fix-tracers.sh; run both in make fmt; CI workflow now relies on make fmt (drops explicit tracer fix step)
Minor go.mod/sum updates across packages to support new telemetry/logging deps

Written by Cursor Bugbot for commit 9d960af. This will update automatically on new commits.
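As a rough illustration of how the two pool-sizing variables named in the note could be consumed, here is a minimal sketch. The helper `poolSize`, its fallback behaviour, and the default values are assumptions made for this example; the PR's actual `network.Config` may parse them differently.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// poolSize reads a positive integer from the variable named key using lookup
// (os.LookupEnv in a real process). It falls back to def when the variable is
// unset, is not a number, or is not positive. Hypothetical helper.
func poolSize(lookup func(string) (string, bool), key string, def int) int {
	raw, ok := lookup(key)
	if !ok {
		return def
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return def
	}
	return n
}

func main() {
	// The defaults 8 and 32 are placeholders, not values taken from the PR.
	reuse := poolSize(os.LookupEnv, "NETWORK_SLOTS_REUSE_POOL_SIZE", 8)
	fresh := poolSize(os.LookupEnv, "NETWORK_SLOTS_FRESH_POOL_SIZE", 32)
	fmt.Println("reuse pool size:", reuse, "fresh pool size:", fresh)
}
```

Passing the lookup function in keeps the parsing testable without touching the process environment.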

packages/shared/pkg/utils/warmpool.go

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

packages/shared/pkg/utils/warmpool.go

packages/orchestrator/internal/sandbox/network/pool.go

packages/shared/pkg/utils/warmpool.go

packages/shared/pkg/utils/warmpool.go

packages/orchestrator/internal/sandbox/nbd/pool.go

scripts/fix-meters.sh

packages/shared/pkg/utils/warmpool.go

jakubno · 2025-12-29T19:34:46Z

packages/shared/pkg/utils/warmpool.go

+		return err
+	}
+
+	wp.doneOnce.Do(func() {


nit: why do you close freshItems in a different function? Why not keep it consistent?

It's safe to read from a closed channel, but writing to a closed channel or closing a channel twice panics. For that reason, Go encourages you to close a channel near the writer, not the reader, and only once the writer is done with it. The freshItems channel gives us an easy place to do that: once the loop has exited, nothing will ever write to the channel again. reusableItems is odd, because its writer doesn't write in a loop, so closing it from the writer is awkward; we have to close it in another function and protect the write with a select on both the send and a receive from done.
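The two patterns described in this reply can be sketched as follows. This is a simplified illustration, not the PR's code: `produce` and `tryReturn` are hypothetical names standing in for the loop writer (as with freshItems) and the one-shot writer (as with reusableItems).

```go
package main

import "fmt"

// produce is the only writer to its channel, so it closes the channel itself
// as soon as its loop exits.
func produce(done <-chan struct{}, n int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out) // closed by the writer, after the last send
		for i := 0; i < n; i++ {
			select {
			case <-done:
				return
			case out <- i:
			}
		}
	}()
	return out
}

// tryReturn is a one-shot writer. It never closes ch; it only guards the send
// with a receive from done so it cannot block forever after shutdown.
func tryReturn(done <-chan struct{}, ch chan<- int, v int) bool {
	select {
	case <-done:
		return false
	case ch <- v:
		return true
	}
}

func main() {
	done := make(chan struct{})
	for v := range produce(done, 3) {
		fmt.Println("fresh item", v)
	}

	reusable := make(chan int, 1)
	fmt.Println("returned:", tryReturn(done, reusable, 42))
}
```

Note that the guard in `tryReturn` only prevents blocking. It does not make a send safe if some other goroutine closes `ch`.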

You should check the wg.done before running wg.Go; otherwise, you have no guarantee that a new Return won't be called. Please add a test for that: a long-running Return, plus Close and these Returns running concurrently.

Good point, the mechanism in Populate is much cleaner and easier to get right

Ugh. Okay, nope, this wasn't an option here. I moved it into the Close.

It works perfectly fine if you call Populate before calling Close. If you don't, you end up with a deadlock while trying to empty the channel, as the channel never got closed. Safer to close everything in Close.

packages/shared/pkg/utils/warmpool.go

packages/orchestrator/internal/sandbox/network/pool.go

packages/shared/pkg/utils/warmpool_test.go

packages/shared/pkg/utils/warmpool.go

jakubno · 2025-12-30T16:36:16Z

packages/shared/pkg/utils/warmpool.go

+func (wp *WarmPool[T]) Get(ctx context.Context) (T, error) {
+	var item T
+
+	recordSuccess := func(source string, attrs ...attribute.KeyValue) {


We have 3 different failures for closed

jakubno · 2025-12-30T16:45:16Z

packages/shared/pkg/utils/warmpool.go

+		return err
+	}
+
+	wp.doneOnce.Do(func() {


You should check the wg.done before running wg.Go; otherwise, you have no guarantee that a new Return won't be called. Please add a test for that: a long-running Return, plus Close and these Returns running concurrently.

jakubno · 2025-12-30T16:46:11Z

packages/shared/pkg/utils/warmpool.go

+		return err
+	}
+
+	wp.doneOnce.Do(func() {


Good point, the mechanism in Populate is much cleaner and easier to get right

Co-authored-by: Jakub Novák <jakub@e2b.dev>

packages/shared/pkg/utils/warmpool.go

cursor · 2025-12-30T21:42:38Z

packages/shared/pkg/utils/warmpool.go

+	})
+
+	return err
+}


Race condition between Close and Populate can cause panic

Close doesn't wait for Populate to exit before closing freshItems. The Populate goroutine runs independently and is never added to wp.wg. When Close is called, it closes wp.done, then immediately calls wp.wg.Wait() (which returns since Populate isn't tracked), then closes wp.freshItems. If Populate is in create() during this sequence, it may try to send to the closed freshItems channel in its select statement. Since both <-wp.done and the send case would be "ready", Go may randomly pick the send, causing a panic.

Additional Locations (1)

packages/shared/pkg/utils/warmpool.go#L79-L106
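The underlying language behaviour can be reproduced in isolation. The sketch below is not taken from the PR; `trySend` is a hypothetical helper that performs a send inside a select and reports whether the attempt panicked.

```go
package main

import "fmt"

// trySend attempts a non-blocking send inside a select and reports whether
// the attempt panicked. A send on a closed channel counts as a case that can
// proceed, so the select does not fall through to default.
func trySend(ch chan int, v int) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	select {
	case ch <- v:
	default:
	}
	return false
}

func main() {
	open := make(chan int, 1)
	fmt.Println("open channel panicked:", trySend(open, 1)) // false

	closed := make(chan int, 1)
	close(closed)
	fmt.Println("closed channel panicked:", trySend(closed, 1)) // true
}
```

Adding a `<-done` case to the select would not change this: when both cases are ready, either may be chosen.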

cursor · 2025-12-30T21:42:38Z

packages/shared/pkg/utils/warmpool.go

+	timer.success()
+
+	return item, err
+}


Timer records both success and failure on create error

In the create function, timer.success() is called unconditionally after timer.failure() when there's an error. When acquisition.Create fails, both timer.failure() and timer.success() are executed, adding both result=failure and result=success attributes to the telemetry metric. The timer.success() call needs to be inside an else branch or the function needs to return early after timer.failure().
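A minimal sketch of the early-return fix suggested above. `resultTimer` is a hypothetical stand-in for the PR's timer type and only records which outcomes were reported; the factory signature is likewise an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// resultTimer is a hypothetical stand-in for the PR's timer.
type resultTimer struct{ results []string }

func (t *resultTimer) success() { t.results = append(t.results, "success") }
func (t *resultTimer) failure() { t.results = append(t.results, "failure") }

var errCreate = errors.New("create failed")

// create records exactly one outcome per call: it returns right after
// reporting a failure, so success() is only reached when err is nil.
func create(t *resultTimer, factory func() (int, error)) (int, error) {
	item, err := factory()
	if err != nil {
		t.failure()
		return 0, err
	}
	t.success()
	return item, nil
}

func okFactory() (int, error)      { return 1, nil }
func failingFactory() (int, error) { return 0, errCreate }

func main() {
	t := &resultTimer{}
	_, _ = create(t, failingFactory)
	_, _ = create(t, okFactory)
	fmt.Println(t.results) // [failure success]
}
```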

cursor · 2025-12-30T23:51:29Z

packages/shared/pkg/utils/warmpool.go

+
+			return ctx.Err()
+		case wp.freshItems <- item:
+		}


Race condition can panic when sending to closed channel

There's a race condition between Populate() and Close() that can cause a panic. The cleanup goroutine at line 86 closes freshItems after done is closed. However, Populate's main loop at line 117 may try to send to freshItems in the select statement. If Populate passes the isClosed check at line 96, then Close() closes done, and the cleanup goroutine closes freshItems before Populate enters the select, Go may randomly select the send case. Sending to a closed channel always panics in Go, and the select doesn't prevent this. The race window exists between done being closed and freshItems being closed by the cleanup goroutine.

Additional Locations (1)

packages/shared/pkg/utils/warmpool.go#L84-L86

cursor · 2025-12-31T00:02:54Z

packages/shared/pkg/utils/warmpool.go

+		for item := range wp.freshItems {
+			wp.destroy(ctx, item, operationClose, reasonCleanupFresh)
+		}
+	})


Race condition may panic when closing freshItems channel

When Close() is called, it closes wp.done which unblocks both the cleanup goroutine (line 83) and makes the main loop's select <-wp.done case ready. The cleanup goroutine then closes wp.freshItems at line 86. However, the main loop's select at lines 108-118 has both <-wp.done and wp.freshItems <- item as potentially ready cases, and Go picks randomly among ready cases. If the cleanup goroutine closes wp.freshItems after the select determines the send case is ready but before it executes, sending to the closed channel will cause a panic. This race window exists because both goroutines are triggered by the same close(wp.done) event.

Additional Locations (1)

packages/shared/pkg/utils/warmpool.go#L107-L118
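One way to close the window described in these reports is to track the producer in the wait group and close the item channel only after the producer has exited. The sketch below is a simplified assumption, not the PR's WarmPool: the `pool` type, `Start`, and the integer items are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a hypothetical, simplified stand-in for the warm pool.
type pool struct {
	mu     sync.Mutex
	closed bool
	done   chan struct{}
	wg     sync.WaitGroup
	items  chan int
}

func newPool(size int) *pool {
	return &pool{done: make(chan struct{}), items: make(chan int, size)}
}

// Start launches the producer. The closed check and wg.Add happen under the
// same lock that Close takes, so no producer can be registered after Close
// has begun waiting.
func (p *pool) Start() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.closed {
		return false
	}
	p.wg.Add(1)
	go p.fill()
	return true
}

func (p *pool) fill() {
	defer p.wg.Done()
	for i := 0; ; i++ {
		select {
		case <-p.done:
			return
		case p.items <- i:
		}
	}
}

// Close signals shutdown, waits for every tracked producer to exit, and only
// then closes and drains items. It returns the number of drained items.
func (p *pool) Close() int {
	p.mu.Lock()
	if p.closed {
		p.mu.Unlock()
		return 0
	}
	p.closed = true
	close(p.done)
	p.mu.Unlock()

	p.wg.Wait()    // no writer is running past this point
	close(p.items) // safe: nothing can send any more
	drained := 0
	for range p.items {
		drained++
	}
	return drained
}

func main() {
	p := newPool(2)
	fmt.Println("started:", p.Start())
	fmt.Println("drained on close:", p.Close())
	fmt.Println("start after close:", p.Start())
}
```

The number of drained items depends on scheduling, so it varies between runs.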

djeebus · 2025-12-31T01:02:01Z

Somehow, avoiding race conditions got overly complicated. Going to put this into draft until I can figure out why.

cursor · 2025-12-31T01:03:38Z

packages/shared/pkg/utils/warmpool.go

+		<-cleanup
+
+		// close freshItems
+		println("closing freshItems")


Debug println statements left in production code

Several println() debug statements were left in the production code. These appear at lines 81, 84, 90, 94, 267, and 271 with messages like "creating clean up", "closing clean up", "waiting for clean up", "closing freshItems", "closing done", and "closing reusableItems". These should be removed before merging.

Additional Locations (1)

packages/shared/pkg/utils/warmpool.go#L266-L271

cursor · 2025-12-31T01:03:38Z

packages/shared/pkg/utils/warmpool.go

+		for item := range wp.freshItems {
+			wp.destroy(ctx, item, operationClose, reasonCleanupFresh)
+		}
+	})


Calling Populate twice causes panic on closed channel

When Populate() returns (due to context cancellation), the cleanup goroutine at line 88-102 closes freshItems. If Populate() is called again with a new context, the isClosed() check passes (since only wp.done and context are checked), and the code proceeds to write to freshItems at line 126, causing a panic on write to closed channel. The pool has no protection (like sync.Once) to prevent calling Populate() multiple times. While not typical usage, this could cause crashes if someone attempts to restart population after a transient failure.

Additional Locations (1)

packages/shared/pkg/utils/warmpool.go#L125-L126
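A guard against repeated Populate calls could look like the sketch below. It uses an atomic flag rather than the sync.Once mentioned in the report, so the second caller gets an error instead of silently returning. The `pool` type, `errAlreadyPopulated`, and `populateTwice` are hypothetical names for illustration, not the PR's API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync/atomic"
)

var errAlreadyPopulated = errors.New("populate was already called")

// pool is a hypothetical, simplified stand-in for the warm pool.
type pool struct {
	started atomic.Bool
	items   chan int
}

// Populate may run at most once per pool. Only the call that wins the
// compare-and-swap becomes the writer, so only that call ever sends to or
// closes items.
func (p *pool) Populate(ctx context.Context) error {
	if !p.started.CompareAndSwap(false, true) {
		return errAlreadyPopulated
	}
	defer close(p.items)
	for i := 0; ; i++ {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case p.items <- i:
		}
	}
}

// populateTwice calls Populate with an already cancelled context, then calls
// it again with a fresh one, and returns both results.
func populateTwice() (first, second error) {
	p := &pool{items: make(chan int, 2)}
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	first = p.Populate(ctx)
	second = p.Populate(context.Background())
	return first, second
}

func main() {
	first, second := populateTwice()
	fmt.Println("first call:", first)
	fmt.Println("second call:", second)
}
```

Whether restarting population after a transient failure should be supported is a separate design question; this sketch simply rejects it.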

cursor · 2025-12-31T01:08:02Z

packages/shared/pkg/utils/warmpool.go

+// - When done, destroys all items in the `freshItems` channel before returning
+// - When an error is encountered, continue trying to create more entries
+func (wp *WarmPool[T]) Populate(ctx context.Context) error {
+	println("creating clean up")


Debug println statements left in production code

Multiple println debug statements were left in the production code. These appear at lines 81, 84, 90, 94, 267, and 271 with messages like "creating clean up", "closing clean up", "waiting for clean up", "closing freshItems", "closing done", and "closing reusableItems". These will print to stdout and clutter logs in production.

Additional Locations (1)

packages/shared/pkg/utils/warmpool.go#L266-L271

djeebus added 2 commits December 19, 2025 17:40

implement warm pool base struct

3fb95e6

linting, config, env vars

0f3f2fb

djeebus requested review from ValentaTomas, dobrac and jakubno as code owners December 20, 2025 01:59

djeebus marked this pull request as draft December 20, 2025 01:59

e2b-request-same-site-reviewers bot assigned levb Dec 20, 2025

djeebus mentioned this pull request Dec 20, 2025

Clean up current implementation of network pool #1634

Closed

claude bot reviewed Dec 20, 2025

packages/shared/pkg/utils/warmpool.go Outdated

claude bot reviewed Dec 20, 2025

packages/shared/pkg/utils/warmpool.go

chatgpt-codex-connector bot reviewed Dec 20, 2025

packages/shared/pkg/utils/warmpool.go Outdated

claude bot reviewed Dec 20, 2025

packages/shared/pkg/utils/warmpool.go

claude bot reviewed Dec 20, 2025

packages/shared/pkg/utils/warmpool.go Outdated

claude bot reviewed Dec 20, 2025

packages/orchestrator/internal/sandbox/network/pool.go Outdated

claude bot reviewed Dec 20, 2025

packages/shared/pkg/utils/warmpool.go Outdated

cursor bot reviewed Dec 20, 2025

packages/shared/pkg/utils/warmpool.go

packages/shared/pkg/utils/warmpool.go

packages/shared/pkg/utils/warmpool.go

clean up metrics

de1538f

djeebus marked this pull request as ready for review December 20, 2025 03:22

chatgpt-codex-connector bot reviewed Dec 20, 2025

packages/shared/pkg/utils/warmpool.go Outdated

cursor bot reviewed Dec 20, 2025

packages/shared/pkg/utils/warmpool.go Outdated

packages/shared/pkg/utils/warmpool.go

jakubno self-assigned this Dec 29, 2025

jakubno requested changes Dec 29, 2025

djeebus added 2 commits December 29, 2025 17:15

code review comments

fa6addf

linting

042d245

cursor bot reviewed Dec 30, 2025

packages/shared/pkg/utils/warmpool.go

packages/orchestrator/internal/sandbox/network/pool.go Outdated

djeebus added 4 commits December 29, 2025 17:33

fix some more channel issues, clean up pool error handling

cf036b1

Merge branch 'main' into create-warmpool

b5a6441

destroy slot if it can't be returned in 15 minutes

6c48ebb

fix a flaky test

523f9b8

jakubno requested changes Dec 30, 2025

djeebus and others added 4 commits December 30, 2025 10:18

Update packages/shared/pkg/utils/warmpool_test.go

64e1ca5

Co-authored-by: Jakub Novák <jakub@e2b.dev>

don't start a goroutine if the pool is already closed

d9740f3

only mock destroy on clean up

b296ae8

add a test to verify reuse vs fresh priorities

364074f

cursor bot reviewed Dec 30, 2025

packages/shared/pkg/utils/warmpool.go Outdated

add a return timeout, clean up some tests

6fd59b3

levb reviewed Dec 30, 2025

packages/shared/pkg/utils/warmpool.go Outdated

packages/shared/pkg/utils/warmpool.go Outdated

packages/shared/pkg/utils/warmpool.go Outdated

packages/shared/pkg/utils/warmpool.go

clean up telemetry

d16629b

cursor bot reviewed Dec 30, 2025

packages/shared/pkg/utils/warmpool.go Outdated

packages/shared/pkg/utils/warmpool.go

packages/shared/pkg/utils/warmpool.go Outdated

fix some tests

91680b5

cursor bot reviewed Dec 30, 2025

packages/shared/pkg/utils/warmpool.go

remove unused code

d4fbeb2

cursor bot reviewed Dec 30, 2025

djeebus added 5 commits December 30, 2025 13:47

code review, linting, clean up

046ae00

Merge branch 'main' into create-warmpool

602afa0

remove some extra code

5fda3f5

fix race conditions?

70118ae

a little more clean up

da46f01

cursor bot reviewed Dec 30, 2025

fix a test?

700e8b5

cursor bot reviewed Dec 31, 2025

improvements

9d960af

djeebus closed this Dec 31, 2025

djeebus reopened this Dec 31, 2025

djeebus marked this pull request as draft December 31, 2025 01:02

cursor bot reviewed Dec 31, 2025
