Add e2e test for multiport InferencePool enhancement #1885
base: main
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: RyanRosario. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
Hi @RyanRosario. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Hey @danehans and @nirrozenbaum, my first PR is ready for review.
/ok-to-test

Thanks @RyanRosario. Seems like your PR needs a rebase. Additionally, please note that your commits are not verified, and if the PR is ready for review it would be good to remove the hold.
/retest
Thank you for your patience! The failing test seems to be related to issue #1872. Can we continue with the review, or should #1872 be resolved first?
The failing test isn't blocking the review, but it is blocking the merge.
/hold cancel

All initial feedback regarding rebase, tests, and global configuration changes has been addressed.
```go
func createInferExt(testConfig *testutils.TestConfig, filePath string) {
	inManifests := testutils.ReadYaml(filePath)
```

```go
// This image needs to be updated to open multiple ports and respond.
```
Is this comment still valid?
Good catch. The code comment is stale. I have the fix locally and will push it in the next batched commit to avoid triggering a full CI run just for this doc update.
go.sum (outdated)
```
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.31.2 h1:jpcvIRr3GLoUoEKRkHKSmGjxb6lWwrBlJsXc+eUYQHM=
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.31.2/go.mod h1:Ve9uj1L+deCXFrPOk1LpFXqTg7LCFzFso6PA48q/XZw=
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.34.0 h1:hSfpvjjTQXQY2Fol2CS0QHMNs/WI1MOSGzCm1KhM5ec=
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.34.0/go.mod h1:Ve9uj1L+deCXFrPOk1LpFXqTg7LCFzFso6PA48q/XZw=
```
This file will be removed before merge. It seemed to help me pass the CI test (which was passing locally).
```
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.34.0 // indirect
sigs.k8s.io/json v0.0.0-20250730193827-2d320260d730 // indirect
sigs.k8s.io/randfill v1.0.0 // indirect
)
```
This file will be removed before merge.
Adding @LukeAVanDrie to help review to reduce some load.
Great work on the verification logic! This test looks really good from two standpoints:
I have a few suggestions to make the test suite more robust and easier to debug. We want to avoid flakiness where possible and improve maintainability.
LukeAVanDrie left a comment
Great job on this test! The only major change I'm asking for is to simplify the test setup a bit where possible.
```go
var _ = ginkgo.Describe("InferencePool", func() {
	var infObjective *v1alpha2.InferenceObjective
	ginkgo.BeforeEach(func() {
```
You are dynamically modifying the existing vllm-llama3-8b-instruct Deployment in BeforeEach and trying to revert it in AfterEach. If the test crashes or the runner is killed halfway through, AfterEach might not fully restore the state. This leaves the cluster "dirty" (configured for multi-port) which will cause subsequent single-port tests to fail.
I would encourage creating separate test resources that already have the ports and args configured correctly (e.g., testdata/inferencepool-multiport.yaml) with a corresponding Deployment manifest. This way if the test fails, we just delete the new resources, and the original single-port Deployment remains untouched. It also makes the code a bit easier to understand and maintain.
- In the test, apply this specific manifest.
- In AfterEach, just delete these resources.
This ensures that even if the test fails cataclysmically, the original environment is untouched. It also removes the need for the complex argument-parsing code in BeforeEach.
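A minimal sketch of what such a dedicated manifest might contain (the file name, labels, image, and port numbers below are illustrative assumptions, not the PR's actual content — the key point is that the multi-port Deployment is a separate resource from the original single-port one):

```yaml
# testdata/inferencepool-multiport.yaml (hypothetical sketch)
# All names, labels, and ports here are illustrative and must match the
# constants used by the Go test.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-multiport            # separate from the original single-port Deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-multiport
  template:
    metadata:
      labels:
        app: vllm-multiport
    spec:
      containers:
        - name: model-server
          image: example.com/model-server:latest   # placeholder image
          args: ["--port", "8000", "--port", "8001"]   # illustrative multi-port args
          ports:
            - containerPort: 8000
            - containerPort: 8001
```

With a standalone manifest like this, cleanup is a single delete of the new resources, and a failed run cannot leave the shared single-port Deployment in a modified state.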
```go
for idx, msg := range originalMessages {
	msgCopy := make(map[string]any, len(msg))
	maps.Copy(msgCopy, msg)
	// Inject a unique nonce into the content of *EACH* message
```
Good catch on adding the 'Nonce' to the prompt. Since our scheduling layer prioritizes prefix caching (affinity), sending identical requests would likely result in them all going to the same pod, which defeats the purpose of this test. Varying the prompt body seems like the best approach here.
I think we can simplify the implementation. Instead of the complex struct reflection logic, consider just prepending a simple string prefix.
```go
// Probability: need to compute estimate of number of batches to send to have high confidence of hitting all ports.
// Using the Coupon Collector's Problem formula: n * H_n, where H_n is the nth harmonic number.
// This gives us an expected number of trials to collect all coupons (ports).
batches := int(math.Ceil(numPorts * harmonicNumber(numPorts)))
```
I see you used the "Coupon Collector's Problem" to calculate the necessary requests. This is very cool; however, for E2E tests, we should prioritize determinism and simplicity over efficiency.
For numPorts = 2 this is probably overkill. Instead of calculating the perfect number of requests, let's just brute-force it. Sending 20 requests sequentially is statistically guaranteed to hit both ports if the system is working.
```go
curlCmd := getCurlCommand(envoyName, testConfig.NsName, envoyPort, modelName, curlTimeout, t.api, currentPromptOrMessages, false)

resp, err := testutils.ExecCommandInPod(testConfig, "curl", "curl", curlCmd)
```
Wrapping kubectl exec (which is what ExecCommandInPod does) in Go routines adds a lot of complexity (WaitGroups, channels) for a small gain. Since we are only targeting 2 ports, a simple sequential loop is likely enough and much easier to debug.
```go
// Instead of hardcoding arguments, we can instead replace the arguments that need
// to be changed, preserving any others that may exist.
var newArgs []string
skipNext := false
```
If you move to a dedicated manifest file, this entire block of code disappears, making the test much cleaner and easier to maintain.
```go
}, testConfig.ExistsTimeout, testConfig.Interval).Should(gomega.Succeed())

ginkgo.By("Restarting EPP to force configuration reload")
// We delete the EPP *POD*, not the deployment. The Deployment will recreate it immediately.
```
Nice! This is a good call.
```go
for _, modelServerPod := range modelServerPods {
	for rank := range numPorts {
		metricQueueSize := fmt.Sprintf(
```
Good verification here!
If this test flakes in CI, it helps to know why. Inside your verification loop, can you add a GinkgoWriter log to print the final maps of actualPort and actualModel before the assertions?
Example: `ginkgo.GinkgoWriter.Printf("Port distribution: %v\n", actualPort)`
This way, if it fails, we can see whether it was a total connectivity failure (empty map) or a distribution failure (stuck on one port).
```go
// This gives us an expected number of trials to collect all coupons (ports).
batches := int(math.Ceil(numPorts * harmonicNumber(numPorts)))
// Send curl requests to verify routing to all target ports in the InferencePool.
gomega.Eventually(func() error {
```
Wrapping the entire batch of request generation inside Eventually can be risky. If one request fails, we retry the whole batch, which is slow and heavy. Since we already wait for the deployment to be ready in BeforeEach, we can probably remove the Eventually wrapper around the traffic generation loop. Instead, just loop 20 times.
If a curl fails, you can use a small retry loop just for that specific command (like you did in generateTraffic), but let's avoid retrying the entire batch verification unless absolutely necessary.
```go
)

const (
	firstPort = 8000
```
If you switch to using a static testdata/inferencepool-multiport.yaml, please make sure to add a comment here saying something like:
// Must match ports defined in testdata/inferencepool-multiport.yaml.
This helps future contributors who might edit the YAML but forget to update the Go test.
What type of PR is this?
kind/cleanup
What this PR does / why we need it:
Adds an E2E test for the multi-port enhancement. Currently verifyTrafficRouting is implemented; verifyMetrics will follow.
Which issue(s) this PR fixes:
Fixes #1768
Does this PR introduce a user-facing change?:
NONE