server: prevent data race from HTTP threads #18263
Conversation
Stress testing to reproduce the race condition with and without this PR. Without the PR merged, results: the script seems to be working, perhaps a little too heavy-handed. We'll see what happens with the PR!
With this PR: I'm surprised by the improvement, because I really overloaded the server with this LLM-made script. I think I need to narrow it down, but it's good proof that the PR makes things much more reliable! There is some strange behavior remaining, like HTTP 000 errors.
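For context, a concurrent stress harness of this kind can be sketched in a few dozen lines. The version below is a hypothetical minimal example, not the LLM-made script mentioned above; the endpoint URL, thread count, and request count are assumptions.

```cpp
// Minimal concurrent-request stress sketch (hypothetical, not the script used in this thread).
// Build with: g++ -O2 -std=c++17 stress.cpp -lcurl -pthread
#include <curl/curl.h>
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

// Discard the response body; we only care about the status code.
static size_t discard(char *, size_t size, size_t nmemb, void *) { return size * nmemb; }

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);

    const char * url       = "http://127.0.0.1:8080/v1/models"; // assumed server address
    const int    n_threads = 16;
    const int    n_reqs    = 200;

    std::atomic<int> n_ok{0}, n_err{0}, n_no_status{0};

    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&]() {
            CURL * h = curl_easy_init();
            for (int i = 0; i < n_reqs; ++i) {
                curl_easy_setopt(h, CURLOPT_URL, url);
                curl_easy_setopt(h, CURLOPT_TIMEOUT, 10L);           // client-side timeout
                curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, discard);

                CURLcode res  = curl_easy_perform(h);
                long     code = 0;
                curl_easy_getinfo(h, CURLINFO_RESPONSE_CODE, &code); // 0 if no status line was received

                if (res != CURLE_OK || code == 0) { ++n_no_status; } // what scripts print as "HTTP 000"
                else if (code == 200)             { ++n_ok; }
                else                              { ++n_err; }
            }
            curl_easy_cleanup(h);
        });
    }
    for (auto & w : workers) { w.join(); }

    std::printf("200: %d, other status: %d, no status (\"000\"): %d\n",
                n_ok.load(), n_err.load(), n_no_status.load());
    curl_global_cleanup();
    return 0;
}
```

Counting responses with no HTTP status separately matters here, because that is exactly what curl's %{http_code} output reports as "HTTP 000".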
I'll try to narrow down the HTTP 000 case.
Quite interesting, I think this script can be useful to test changes related to batching too. Btw, looking at your "No PR" results, I suppose the 503 errors were because the test doesn't wait until the server starts, right?
The HTTP 000 case has a timeout of exactly 10 seconds, which seems quite suspicious.
The server was already started. I think the 503 errors in the "No PR" results are caused by the data race your PR fixes!
So yes, the HTTP 000 at 10 s is definitely a curl timeout, with the error code defaulting to 000. The HTTP 000 at 2 s is, I think, a reverse-proxy timeout.
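A small hypothetical check (assuming the same local server as above and libcurl) can tell the two cases apart from the libcurl result code rather than from the printed status alone: a client-side timeout comes back as CURLE_OPERATION_TIMEDOUT, while a proxy or server dropping the connection before sending a status line typically comes back as CURLE_GOT_NOTHING or a receive error, both with a response code of 0.

```cpp
// Hypothetical check to tell a curl-side timeout apart from a dropped/empty reply,
// both of which a "%{http_code}"-based script reports as "HTTP 000".
#include <curl/curl.h>
#include <cstdio>

int main() {
    CURL * h = curl_easy_init();
    curl_easy_setopt(h, CURLOPT_URL, "http://127.0.0.1:8080/completions"); // assumed endpoint
    curl_easy_setopt(h, CURLOPT_TIMEOUT, 10L); // matches the suspicious 10 s cutoff

    CURLcode res  = curl_easy_perform(h); // response body, if any, goes to stdout by default
    long     code = 0;
    curl_easy_getinfo(h, CURLINFO_RESPONSE_CODE, &code);

    if (res == CURLE_OPERATION_TIMEDOUT) {
        std::printf("client-side timeout after 10 s (reported as HTTP %03ld)\n", code);
    } else if (res == CURLE_GOT_NOTHING || res == CURLE_RECV_ERROR) {
        std::printf("connection dropped before a status line arrived (HTTP %03ld)\n", code);
    } else {
        std::printf("result %d, HTTP %03ld\n", (int) res, code);
    }
    curl_easy_cleanup(h);
    return 0;
}
```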
Better script:
I've already restarted the Windows runner. I'll have to test it on my Windows machine! I'll try a server-queue.cpp/h patch.
If that was for the |
prompt processing progress, n_tokens = 1355, batch.n_tokens = 1355, progress = 1.000000
From my smartphone; I'll check tomorrow morning.
I spotted some more bugs on the way, fixed in the last commit(s):
Edit: hmm, the LoRA endpoint can also cause a data race in the same way.
So it turns out the endpoints for LoRA hot-swap can also cause a data race, as they read data directly off the server context.
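In simplified, hypothetical form (the names below are illustrative, not the actual server code), the problematic pattern is an HTTP handler iterating over mutable server state while the main loop may be reallocating it, with no synchronization in between:

```cpp
// Simplified illustration of the race (hypothetical names, not the actual server code).
#include <string>
#include <vector>

struct lora_entry {
    std::string name;
    float       scale;
};

struct server_state {
    std::vector<lora_entry> loras; // mutated by the main loop on hot-swap
};

// HTTP thread: builds a response by reading the list directly.
// If the main loop resizes `loras` at the same time, this iteration may read
// freed memory or a half-updated entry: undefined behavior, i.e. a data race.
std::string handle_get_loras(const server_state & state) {
    std::string out;
    for (const auto & l : state.loras) {
        out += l.name + "=" + std::to_string(l.scale) + "\n";
    }
    return out;
}

// Main loop thread: applies a hot-swap request, reallocating the vector,
// potentially concurrently with handle_get_loras() above.
void apply_lora_hot_swap(server_state & state, std::vector<lora_entry> new_loras) {
    state.loras = std::move(new_loras);
}
```

Routing such reads through data the HTTP thread is allowed to touch, instead of the live server state, is what removes the race.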
Compared to what I did from the phone (where the server wasn't working), after these 5 commits everything works this time: the latest DoS script no longer shows any HTTP 000 errors and the server recovers after a while. It seems more robust.
Is this error related to this PR? This is using the full ROCm docker image, running llama-server with args: I am using the ngxson:xsn/server_data_race branch for building. I am using the sleep option in the models preset file. If the above error is not related to this PR, let me know and I'll open a different ticket. Note that I did not experience this error prior to updating today.
Looks similar to what I hit. When a child exits with error (status 1), the router seems to keep trying to proxy to it. |
@stephensrmmartin the best way to isolate the problem is to use a CPU-only build.
It was a red herring. An error was indeed thrown earlier, due to an unrelated issue: it had to do with using ROCm 7.1. With 7.0 the error does not occur, and the subsequent HTTP error no longer occurs either.
@ServeurpersoCom the tests now pass on CI; would appreciate it if you can re-run the stress test on your side!
I did it early this morning (no new commits); the result is clean, no more "HTTP 000" errors since those 5 commits. Re-testing:
For the last commit: applied and re-tested, about the same result, and no regression for my standard use.
I think this should be pretty much ready to merge then. The last 2 commits are nits, so we don't need to test them again; it's fine as soon as the CI workflow passes. Would be nice if you can have a quick look at the code and approve, @ServeurpersoCom. I also have this mirrored PR with AI-assisted review: ngxson#63
I always test my Linux/Nvidia use case on several models in practice and do some reviews with GPT High/Codex + Claude Thinking/Code before merging :)
macOS-latest-cmake-arm64 failed here, but it's OK on https://github.com/ngxson/llama.cpp/actions/runs/20431225417/job/58702205221?pr=63. I'll retry at the end.
Yes, it's expected that some tests may fail randomly on hosted hardware. For server changes, having just the server workflows pass is enough.
Fix #18234
This includes quite a lot of changes, but on a high level:
- Access to `server_context_impl` class members is now highly restricted. Only some pointers like `vocab`, `model`, `mctx` are exposed.
- Other data goes through `server_context_meta`. This is to prevent any accesses to non-thread-safe data inside `server_context_impl`.
- In `server_routes`, the HTTP code can only access some pointers like `vocab`, `model`, `mctx`. Any other data MUST be passed through `server_context_meta` (a rough sketch of this pattern is included below).

As a consequence:
- `/models` and `/v1/models` can no longer be accessed during model loading. It is NOT thread-safe and can potentially cause a data race.
- `/models` and `/v1/models` can now be accessed while the server is sleeping. This is because they no longer access `server_context_impl` directly.

Also includes some other fixes described in #18263 (comment) to make things safer.
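As a rough sketch of the resulting access pattern, with illustrative names only (the real fields and locking of `server_context_meta` are defined by this PR, not by this snippet): read-only pointers such as `vocab`, `model`, `mctx` are exposed directly, while everything else is copied into a small plain-data struct under a mutex, and HTTP handlers only ever read that copy.

```cpp
// Illustrative-only sketch of the "expose pointers + pass everything else via meta" pattern.
// The real field names, types and locking in server_context_meta are defined by the PR.
#include <mutex>
#include <string>
#include <vector>

struct llama_model;   // opaque, read-only after load
struct llama_vocab;   // opaque, read-only after load

// Plain-data snapshot that is safe to hand to HTTP threads.
struct server_context_meta {
    std::string              model_path;
    std::vector<std::string> lora_names;
    bool                     is_sleeping = false;
};

struct server_context_impl {
    // Exposed read-only pointers (safe to share once the model is loaded).
    const llama_model * model = nullptr;
    const llama_vocab * vocab = nullptr;

    // Everything else stays private to the main loop.
    std::mutex          meta_mutex;
    server_context_meta meta;

    // Main loop updates the snapshot whenever its private state changes.
    void update_meta(server_context_meta next) {
        std::lock_guard<std::mutex> lock(meta_mutex);
        meta = std::move(next);
    }

    // HTTP threads get a copy; they never touch the main loop's private state.
    server_context_meta get_meta() {
        std::lock_guard<std::mutex> lock(meta_mutex);
        return meta;
    }
};
```

Under this split, an endpoint like `/models` only needs the copied metadata, which is why it can now be served while the server is sleeping.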
cc @ServeurpersoCom, would appreciate it if you can do some testing, thanks!