Add e2e test for multiport InferencePool enhancement #1885
base: main
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: RyanRosario. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
Hi @RyanRosario. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Hey @danehans and @nirrozenbaum, my first PR is ready for review.
/ok-to-test

Thanks @RyanRosario. Seems like your PR needs a rebase. Additionally, please note that your commits are not verified, and if the PR is ready for review it would be good to remove the hold.
/retest
Thank you for your patience! The failing test seems to be related to issue #1872. Can we continue with the review, or should #1872 be resolved first?
The failing test isn't blocking the review, but it is blocking the merge.
/hold cancel

All initial feedback regarding rebase, tests, and global configuration changes has been addressed.
```go
func createInferExt(testConfig *testutils.TestConfig, filePath string) {
	inManifests := testutils.ReadYaml(filePath)
```

```go
// This image needs to be updated to open multiple ports and respond.
```
Is this comment still valid?
Good catch. The code comment is stale. I have the fix locally and will push it in the next batched commit to avoid triggering a full CI run just for this doc update.
go.sum (outdated)
```
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.31.2 h1:jpcvIRr3GLoUoEKRkHKSmGjxb6lWwrBlJsXc+eUYQHM=
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.31.2/go.mod h1:Ve9uj1L+deCXFrPOk1LpFXqTg7LCFzFso6PA48q/XZw=
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.34.0 h1:hSfpvjjTQXQY2Fol2CS0QHMNs/WI1MOSGzCm1KhM5ec=
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.34.0/go.mod h1:Ve9uj1L+deCXFrPOk1LpFXqTg7LCFzFso6PA48q/XZw=
```
This file will be removed before merge. It seemed to help me pass the CI test (which was passing locally).
```
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.34.0 // indirect
sigs.k8s.io/json v0.0.0-20250730193827-2d320260d730 // indirect
sigs.k8s.io/randfill v1.0.0 // indirect
)
```
This file will be removed before merge.
Adding @LukeAVanDrie to help review to reduce some load.
Great work on the verification logic! This test looks really good from two standpoints:
I have a few suggestions to make the test suite more robust and easier to debug. We want to avoid flakiness where possible and improve maintainability.
LukeAVanDrie left a comment
Great job on this test! The only major change I'm asking for is to simplify the test setup a bit where possible.
```go
var _ = ginkgo.Describe("InferencePool", func() {
	var infObjective *v1alpha2.InferenceObjective
	ginkgo.BeforeEach(func() {
```
You are dynamically modifying the existing vllm-llama3-8b-instruct Deployment in BeforeEach and trying to revert it in AfterEach. If the test crashes or the runner is killed halfway through, AfterEach might not fully restore the state. This leaves the cluster "dirty" (configured for multi-port) which will cause subsequent single-port tests to fail.
I would encourage creating separate test resources that already have the ports and args configured correctly (e.g., testdata/inferencepool-multiport.yaml) with a corresponding Deployment manifest. This way if the test fails, we just delete the new resources, and the original single-port Deployment remains untouched. It also makes the code a bit easier to understand and maintain.
- In the test, apply this specific manifest.
- In AfterEach, just delete these resources.
This ensures that even if the test fails cataclysmically, the original environment is untouched. It also removes the need for the complex argument-parsing code in BeforeEach.
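A minimal sketch of what such a dedicated manifest might contain (the file name, labels, image, and port numbers below are illustrative assumptions, not the PR's actual content — the key point is that the multi-port Deployment is a separate resource from the original single-port one):

```yaml
# testdata/inferencepool-multiport.yaml (hypothetical sketch)
# All names, labels, and ports here are illustrative and must match the
# constants used by the Go test.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-multiport            # separate from the original single-port Deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-multiport
  template:
    metadata:
      labels:
        app: vllm-multiport
    spec:
      containers:
        - name: model-server
          image: example.com/model-server:latest   # placeholder image
          args: ["--port", "8000", "--port", "8001"]   # illustrative multi-port args
          ports:
            - containerPort: 8000
            - containerPort: 8001
```

With a standalone manifest like this, cleanup is a single delete of the new resources, and a failed run cannot leave the shared single-port Deployment in a modified state.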
```go
for idx, msg := range originalMessages {
	msgCopy := make(map[string]any, len(msg))
	maps.Copy(msgCopy, msg)
	// Inject a unique nonce into the content of *EACH* message
```
Good catch on adding the 'Nonce' to the prompt. Since our scheduling layer prioritizes prefix caching (affinity), sending identical requests would likely result in them all going to the same pod, which defeats the purpose of this test. Varying the prompt body seems like the best approach here.
I think we can simplify the implementation. Instead of the complex struct reflection logic, consider just prepending a simple string prefix.
```go
// Probability: need to compute estimate of number of batches to send to have high confidence of hitting all ports.
// Using the Coupon Collector's Problem formula: n * H_n, where H_n is the nth harmonic number.
// This gives us an expected number of trials to collect all coupons (ports).
batches := int(math.Ceil(numPorts * harmonicNumber(numPorts)))
```
I see you used the "Coupon Collector's Problem" to calculate the necessary requests. This is very cool; however, for E2E tests, we should prioritize determinism and simplicity over efficiency.
For numPorts = 2 this is probably overkill. Instead of calculating the perfect number of requests, let's just brute-force it. Sending 20 requests sequentially is statistically guaranteed to hit both ports if the system is working.
```go
curlCmd := getCurlCommand(envoyName, testConfig.NsName, envoyPort, modelName, curlTimeout, t.api, currentPromptOrMessages, false)

resp, err := testutils.ExecCommandInPod(testConfig, "curl", "curl", curlCmd)
```
Wrapping kubectl exec (which is what ExecCommandInPod does) in Go routines adds a lot of complexity (WaitGroups, channels) for a small gain. Since we are only targeting 2 ports, a simple sequential loop is likely enough and much easier to debug.
```go
// Instead of hardcoding arguments, we can instead replace the arguments that need
// to be changed, preserving any others that may exist.
var newArgs []string
skipNext := false
```
If you move to a dedicated manifest file, this entire block of code disappears, making the test much cleaner and easier to maintain.
```go
}, testConfig.ExistsTimeout, testConfig.Interval).Should(gomega.Succeed())

ginkgo.By("Restarting EPP to force configuration reload")
// We delete the EPP *POD*, not the deployment. The Deployment will recreate it immediately.
```
Nice! This is a good call.
```go
for _, modelServerPod := range modelServerPods {
	for rank := range numPorts {
		metricQueueSize := fmt.Sprintf(
```
Good verification here!
If this test flakes in CI, it helps to know why. Inside your verification loop, can you add a GinkgoWriter log to print the final maps of actualPort and actualModel before the assertions?
Example: `ginkgo.GinkgoWriter.Printf("Port distribution: %v\n", actualPort)`
This way, if it fails, we can see whether it was a total connectivity failure (empty map) or a distribution failure (stuck on one port).
```go
// This gives us an expected number of trials to collect all coupons (ports).
batches := int(math.Ceil(numPorts * harmonicNumber(numPorts)))
// Send curl requests to verify routing to all target ports in the InferencePool.
gomega.Eventually(func() error {
```
Wrapping the entire batch of request generation inside Eventually can be risky. If one request fails, we retry the whole batch, which is slow and heavy. Since we already wait for the deployment to be ready in BeforeEach, we can probably remove the Eventually wrapper around the traffic generation loop. Instead, just loop 20 times.
If a curl fails, you can use a small retry loop just for that specific command (like you did in generateTraffic), but let's avoid retrying the entire batch verification unless absolutely necessary.
```go
)

const (
	firstPort = 8000
```
If you switch to using a static testdata/inferencepool-multiport.yaml, please make sure to add a comment here saying something like:
// Must match ports defined in testdata/inferencepool-multiport.yaml.
This helps future contributors who might edit the YAML but forget to update the Go test.
What type of PR is this?
kind/cleanup
What this PR does / why we need it:
Adds an E2E test for the multi-port enhancement. Currently verifyTrafficRouting is implemented; verifyMetrics will follow.
Which issue(s) this PR fixes:
Fixes #1768
Does this PR introduce a user-facing change?:
NONE