Update rawbody.go - Fix broken encoding due to mishandling Content-Type #35
base: master
Fix: ZenPrivacy#34

This is a classic "mojibake" issue caused by double-decoding. The artifacts you are seeing (`â†’` instead of `→`) occur when UTF-8 bytes are misinterpreted as Windows-1252 (a code page often conflated with Latin-1).

The function `getRawBodyReader` relies on `golang.org/x/net/html/charset.NewReader` to handle character encoding. When the upstream website (e.g., data-star.dev) returns `Content-Type: text/html` without an explicit `charset=utf-8` parameter, and the first 1024 bytes of the body offer no other signal (no BOM, no `<meta charset>`, no non-ASCII bytes that can be recognized as UTF-8), the library falls back to Windows-1252 for compatibility with legacy HTML. The proxy then takes the valid UTF-8 bytes (e.g., `E2 86 92` for `→`), "decodes" them as Windows-1252 (yielding `â`, `†`, `’`), and re-encodes those characters as UTF-8 for the output. The browser receives this re-encoded text and displays `â†’`.

Both `StreamRewrite` and `BufferRewrite` unconditionally set a `Content-Type: ...; charset=utf-8` header on the rewritten response, because they assume `getRawBodyReader` has already converted the content to UTF-8.

Step by step:

1. Original response: `Content-Type: text/html` (no charset specified).
2. `getRawBodyReader`: sees no charset, so `golang.org/x/net/html/charset` falls back to Windows-1252 and reads the valid UTF-8 bytes from the server as if they were Windows-1252 bytes. Result: UTF-8 `→` (bytes `E2 86 92`) becomes the string `â†’`.
3. `StreamRewrite`: sets `Content-Type: text/html; charset=utf-8`.
4. Browser: sees the `charset=utf-8` header and correctly renders the string `â†’` as those three characters, instead of the intended arrow.

Signed-off-by: Gunir <134402102+gunir@users.noreply.github.com>
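The byte-level mechanics above can be reproduced with a short, standard-library-only Go sketch. It does not use `golang.org/x/net/html/charset`; instead it hard-codes only the two Windows-1252 entries from the 0x80-0x9F range that this example needs (`0x86` and `0x92`), so `decodeAsWindows1252` is a hypothetical illustration helper, not a complete decoder.

```go
package main

import (
	"fmt"
	"strings"
)

// win1252High holds only the two Windows-1252 code points this demo needs
// from the 0x80-0x9F range. Hypothetical subset for illustration; a real
// decoder would use the complete code page table.
var win1252High = map[byte]rune{
	0x86: '†', // U+2020 DAGGER
	0x92: '’', // U+2019 RIGHT SINGLE QUOTATION MARK
}

// decodeAsWindows1252 reinterprets each byte as one Windows-1252 character,
// which is what happens when UTF-8 input is mistaken for Windows-1252.
func decodeAsWindows1252(b []byte) string {
	var sb strings.Builder
	for _, c := range b {
		switch {
		case c < 0x80 || c >= 0xA0:
			// ASCII and 0xA0-0xFF map to the Unicode code point of the same value.
			sb.WriteRune(rune(c))
		default:
			if r, ok := win1252High[c]; ok {
				sb.WriteRune(r)
			} else {
				sb.WriteRune('\uFFFD') // not covered by this subset
			}
		}
	}
	return sb.String()
}

func main() {
	arrow := "→"
	fmt.Printf("% X\n", []byte(arrow))              // E2 86 92
	fmt.Println(decodeAsWindows1252([]byte(arrow))) // â†’
}
```

Because Go strings are UTF-8, the decoded result `â†’` is stored as 8 bytes (`C3 A2 E2 80 A0 E2 80 99`), which is the re-encoded text the browser ends up rendering.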
Estimated code review effort: 3 (Moderate), ~25 minutes. Pre-merge checks: 5 passed.
Actionable comments posted: 3
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
httprewrite/rawbody.go (1)
110-116: Critical: `body` is nil in `multiCloser` and will panic on `Close()`.

The `multiCloser` includes `body`, which is `nil`. When the returned reader is closed, `multiCloser.Close()` will attempt to call `Close()` on a nil interface, causing a panic.

🐛 Proposed fix

```diff
 return struct {
 	io.Reader
 	io.Closer
 }{
 	decodedReader,
-	&multiCloser{[]io.Closer{decompressedReader, body}},
+	&multiCloser{[]io.Closer{decompressedReader, res.Body}},
 }, mimeType, nil
```
🤖 Fix all issues with AI agents
In `@httprewrite/rawbody.go`:
- Around lines 79-80: The code uses an undefined variable `header` when reading headers (`encoding := header.Get("Content-Encoding")` and `contentType := header.Get("Content-Type")`), causing a compile error. Update those calls to use the response's header (`res.Header.Get(...)`) so they reference the existing `res.Header`, keeping the same variable names `encoding` and `contentType`.
- Around lines 86-91: The early return uses the named return variable `body`, which is never assigned and is therefore nil. In the pass-through branch that checks `charsetParam`, `encoding` and `params`, return the actual response body (`res.Body`) instead of the zero-valued `body`: either assign `res.Body` to `body` before returning, or return the response body directly along with `mimeType` and a nil error.
- Around lines 93-96: The code passes the uninitialized (nil) variable `body` into `decompressReader`, which will fail. Initialize it with the actual response payload (e.g., `res.Body`, or a `bytes.Reader` wrapping bytes already read from it) before calling `decompressReader` with `encoding`, handle errors, preserve or restore the original body if needed, and close any readers to avoid leaks.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
httprewrite/rawbody.go
🔇 Additional comments (2)
httprewrite/rawbody.go (2)
82-84: LGTM on the fallback handling. Falling back to `text/plain` when Content-Type parsing fails is reasonable. Accessing `params["charset"]` on a nil map safely returns an empty string in Go.
98-102: Good fix for the Windows-1252 default issue. This correctly forces `charset=utf-8` when the upstream response has compression but no explicit charset, preventing `charset.NewReader` from defaulting to Windows-1252 per HTML5 legacy behavior. This addresses the root cause described in issue #34.
Update rawbody.go - Fix some variable mismatches

Signed-off-by: Gunir <134402102+gunir@users.noreply.github.com>
```go
// [FIX 2] If we reach here (because of compression), ensure we don't
// let charset.NewReader default to Windows-1252 if charset is missing.
if charsetParam == "" {
	contentType = mimeType + "; charset=utf-8"
}
```
This will unfortunately prevent charset.NewReader from performing a prescan algorithm to find a potential <meta charset> element, see:
