Conversation
@gunir gunir commented Jan 15, 2026

Fix: #34

This is a classic "Mojibake" issue caused by double-decoding. The artifacts you are seeing (â†’ instead of →) occur when UTF-8 bytes are misinterpreted as Windows-1252 (Latin-1).

The function getRawBodyReader relies on golang.org/x/net/html/charset.NewReader to handle character encoding.

When the upstream website (e.g., data-star.dev) returns Content-Type: text/html without an explicit charset=utf-8 parameter, the Go charset library defaults to Windows-1252 (to be spec-compliant with legacy HTML).

Your proxy takes the valid UTF-8 bytes (e.g., E2 86 92 for →), "decodes" them as Windows-1252 (resulting in â, †, ’), and then re-encodes them as UTF-8 for the output.

The browser receives this re-encoded mess and displays â†’.

Both StreamRewrite and BufferRewrite unconditionally force the Content-Type: text/html; charset=utf-8 header because they assume the content has already been successfully converted to UTF-8 by getRawBodyReader.

Original Response: Content-Type: text/html (No charset specified).

getRawBodyReader: Sees no charset. The golang.org/x/net/html/charset library defaults to Windows-1252 for compatibility. It reads the valid UTF-8 bytes from the server as if they were Windows-1252 bytes.

Result: UTF-8 → (bytes E2 86 92) becomes the string â†’.

StreamRewrite: Sets Content-Type: text/html; charset=utf-8.

Browser: Sees the charset=utf-8 header and renders the string â†’ literally, character by character, instead of the arrow you wanted.
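The double-decoding walkthrough above can be reproduced in a few lines of Go. Here, mojibake is an illustrative helper that maps only the three byte values involved, not a real Windows-1252 decoder:

```go
package main

import "fmt"

// mojibake simulates decoding raw UTF-8 bytes as Windows-1252 and
// re-encoding the result as UTF-8. Only the three byte values needed
// for "→" are mapped here; a real decoder covers all 256.
func mojibake(raw []byte) string {
	cp1252 := map[byte]rune{0xE2: 'â', 0x86: '†', 0x92: '’'}
	out := make([]rune, 0, len(raw))
	for _, b := range raw {
		// Each UTF-8 continuation byte becomes its own character.
		out = append(out, cp1252[b])
	}
	return string(out) // re-encoded as UTF-8 on the way out
}

func main() {
	fmt.Println(mojibake([]byte("→"))) // prints "â†’"
}
```

The three bytes E2 86 92 that together encode one arrow in UTF-8 are each a standalone character in Windows-1252, which is exactly the corruption described above.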


Signed-off-by: Gunir <134402102+gunir@users.noreply.github.com>

coderabbitai bot commented Jan 15, 2026

Walkthrough

Change in httprewrite/rawbody.go modifies content-type parsing to fall back to "text/plain" on parse failure, treats missing charset as UTF-8 for passthrough and post-decompression, and routes decompression/closer logic to a local body variable instead of res.Body.

Changes

Content-type, charset, and body routing — httprewrite/rawbody.go:
- On Content-Type parse failure, use mimeType = "text/plain" rather than returning an error.
- If charset is missing/empty, treat it as UTF-8 for passthrough and decoding decisions.
- When decompression is used, ensure contentType includes charset=utf-8 if absent before creating decoder.
- Use header.Get(...) for header access.
- Redirect decompression input and resource cleanup to a local body variable instead of res.Body.
- Adjust error paths to account for updated contentType/charset handling and decoder creation.
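The parsing fallback summarized above can be sketched with Go's standard mime package. contentCharset is a hypothetical helper used only for illustration; the actual logic in rawbody.go may be structured differently:

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// contentCharset mirrors the described fallback: a Content-Type that
// fails to parse yields "text/plain", and a missing charset parameter
// is treated as UTF-8 (the modern web default).
func contentCharset(contentType string) (mimeType, charset string) {
	mimeType, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		// Parse failure: fall back instead of returning an error.
		return "text/plain", "utf-8"
	}
	// Accessing a nil map is safe in Go and returns the zero value "".
	cs := strings.ToLower(params["charset"])
	if cs == "" {
		cs = "utf-8"
	}
	return mimeType, cs
}

func main() {
	fmt.Println(contentCharset("text/html"))                     // text/html utf-8
	fmt.Println(contentCharset("text/html; charset=ISO-8859-1")) // text/html iso-8859-1
	fmt.Println(contentCharset(""))                              // text/plain utf-8
}
```

With this shape, the downstream decision (pass through vs. decode) only ever sees a normalized, lowercase charset.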

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks — ✅ 5 of 5 passed
  • Title check: ✅ Passed — the title accurately describes the main change (fixing broken encoding in rawbody.go due to Content-Type mishandling), which directly addresses the pull request's core objective.
  • Description check: ✅ Passed — the description provides excellent context about the mojibake issue, explains the root cause (Windows-1252 vs UTF-8), and references the linked issue (#34), though the template sections are left blank.
  • Linked Issues check: ✅ Passed — the PR directly addresses issue #34 by fixing the Content-Type/charset handling in rawbody.go so UTF-8 bytes are no longer misinterpreted as Windows-1252, which was causing mojibake on data-star.dev.
  • Out of Scope Changes check: ✅ Passed — all changes in rawbody.go are focused on the charset/encoding issue identified in #34; no unrelated modifications are present.
  • Docstring Coverage check: ✅ Passed — docstring coverage is 100.00%, above the required 80.00% threshold.


🧹 Recent nitpick comments
httprewrite/rawbody.go (1)

110-116: Pre-existing edge case: potential double-close when encoding == "".

When there's no compression but a non-UTF-8 charset is specified (e.g., charset=iso-8859-1), decompressReader returns res.Body unchanged (line 141). The multiCloser then contains [res.Body, res.Body], causing res.Body.Close() to be called twice.

This is a pre-existing issue (not introduced by this PR), and the PR's FIX 1 actually reduces the frequency of hitting this path. While double-closing may be idempotent in practice, it's not guaranteed.

♻️ Optional fix to avoid double-close
+	// Avoid adding res.Body twice when no decompression occurred
+	var closers []io.Closer
+	if decompressedReader != res.Body {
+		closers = []io.Closer{decompressedReader, res.Body}
+	} else {
+		closers = []io.Closer{res.Body}
+	}
+
 	return struct {
 		io.Reader
 		io.Closer
 	}{
 		decodedReader,
-		&multiCloser{[]io.Closer{decompressedReader, res.Body}},
+		&multiCloser{closers},
 	}, mimeType, nil

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f81e182 and 0725a57.

📒 Files selected for processing (1)
  • httprewrite/rawbody.go
🔇 Additional comments (3)
httprewrite/rawbody.go (3)

82-84: LGTM – graceful fallback for malformed Content-Type.

The fallback to "text/plain" is appropriate. Note that when parsing fails, params is nil, but accessing params["charset"] on line 88 safely returns "" in Go (nil map returns zero value), so the subsequent logic correctly handles this case.


86-91: LGTM – correctly prevents Windows-1252 default for uncompressed UTF-8 content.

This fix correctly addresses the mojibake issue from #34. When there's no compression and the charset is either UTF-8 or unspecified (modern web default), passing through the raw body avoids the charset.NewReader call that would incorrectly default to Windows-1252. The case-insensitive comparison via strings.ToLower is appropriate.


98-102: LGTM – correctly forces UTF-8 for compressed content without explicit charset.

This complements FIX 1 by handling the compressed content path. When charset is unspecified, explicitly setting charset=utf-8 in the contentType passed to charset.NewReader prevents the library from defaulting to Windows-1252. Good approach of modifying only the local variable without affecting the actual response header.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
httprewrite/rawbody.go (1)

110-116: Critical: body is nil in multiCloser – will panic on Close().

The multiCloser includes body which is nil. When the returned reader is closed, multiCloser.Close() will attempt to call Close() on a nil interface, causing a panic.

🐛 Proposed fix
 	return struct {
 		io.Reader
 		io.Closer
 	}{
 		decodedReader,
-		&multiCloser{[]io.Closer{decompressedReader, body}},
+		&multiCloser{[]io.Closer{decompressedReader, res.Body}},
 	}, mimeType, nil
🤖 Fix all issues with AI agents
In `@httprewrite/rawbody.go`:
- Around line 79-80: The code uses an undefined variable header when reading
headers (encoding := header.Get("Content-Encoding") and contentType :=
header.Get("Content-Type")), causing a compile error; update those calls to use
the response's header (res.Header.Get(...)) so they reference the existing
res.Header, keeping the same variable names encoding and contentType.
- Around line 86-91: The early return uses the named return variable body which
is never assigned and is nil; change the pass-through branch so it returns the
actual response body (res.Body) to callers instead of the zero-valued
body—either assign res.Body to body before returning or directly return the
response body along with mimeType and nil error in the branch that checks
charsetParam, encoding and params.
- Around line 93-96: The code passes an uninitialized nil variable "body" into
decompressReader, which will fail; locate where "body" is declared and
initialize it with the actual response payload (e.g., res.Body or a reader
wrapping the already-read bytes) before calling decompressReader(encoding),
ensuring errors are handled and any opened readers are closed to avoid leaks.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6e44ec7 and f81e182.

📒 Files selected for processing (1)
  • httprewrite/rawbody.go
🔇 Additional comments (2)
httprewrite/rawbody.go (2)

82-84: LGTM on the fallback handling.

Falling back to text/plain when Content-Type parsing fails is reasonable. Accessing params["charset"] on a nil map safely returns an empty string in Go.


98-102: Good fix for the Windows-1252 default issue.

This correctly forces charset=utf-8 when the upstream response has compression but no explicit charset, preventing charset.NewReader from defaulting to Windows-1252 per HTML5 legacy behavior. This addresses the root cause described in issue #34.


Update rawbody.go - Fix some variable mismatches

Signed-off-by: Gunir <134402102+gunir@users.noreply.github.com>

gunir commented Jan 15, 2026

With this change the issue is 100% gone; here is a screenshot of testing with Zen:
(screenshot)

@anfragment anfragment self-assigned this Jan 16, 2026
	// [FIX 2] If we reach here (because of compression), ensure we don't
	// let charset.NewReader default to Windows-1252 if charset is missing.
	if charsetParam == "" {
		contentType = mimeType + "; charset=utf-8"
	}


Development

Successfully merging this pull request may close these issues.

Broken encoding on data-star.dev
