Skip to content

Conversation

@sanderpick
Copy link
Contributor

@sanderpick sanderpick commented Jun 13, 2025

This took me forever to diagnose. After testing many different configurations (diff fendermint configs, diff commit refs, old localnet vs. new localnet, node restart sequences), I saw that changing the fendermint log level on a live network lead to a consensus failure... which is, super weird and didn't make any sense. I finally realized this is related to actor debugging.

The actor debug flag cannot change on a live network due to how debugging works with the FVM / Wasmtime. Since WASM modules can't actually log to stdout/err, they are returned via syscall, which effects the respond data for a function call. The upshot is that changing to/from actor debugging on a live network leads to non-determinism and consensus failures. This can now happen when the fendermint log level goes from info to debug or debug to info, due to this change which ties the actor debugging flag to the fendermint log level introduced in #610. I hysterically commented "perfect" on this change :)

Initial consensus failure:

E[2025-06-13|05:39:34.296] CONSENSUS FAILURE!!!                         module=consensus err="+2/3 committed an invalid block: wrong Block.Header.LastResultsHash.  Expected 165B026A01F89BE3D648285701BA0A5E679050E0C1ADE78B4F0D124897F56CB4, got D6372018325A08E8AF1D65C8C63283783DADC5EA5E90B8C141F4B10868729B32" stack="goroutine 2626 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x5e\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()\n\tgithub.com/cometbft/cometbft/consensus/state.go:737 +0x46\npanic({0xf78a00?, 0xc001b0cca0?})\n\truntime/panic.go:770 +0x132\ngithub.com/cometbft/cometbft/consensus.(*State).finalizeCommit(0xc000325888, 0x333)\n\tgithub.com/cometbft/cometbft/consensus/state.go:1651 +0xfb3\ngithub.com/cometbft/cometbft/consensus.(*State).tryFinalizeCommit(0xc000325888, 0x333)\n\tgithub.com/cometbft/cometbft/consensus/state.go:1620 +0x2e8\ngithub.com/cometbft/cometbft/consensus.(*State).handleCompleteProposal(0xc000325888, 0x333)\n\tgithub.com/cometbft/cometbft/consensus/state.go:2016 +0x372\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0xc000325888, {{0x131e3e0, 0xc0027a4ff0}, {0xc00035a810, 0x28}})\n\tgithub.com/cometbft/cometbft/consensus/state.go:847 +0x178\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0xc000325888, 0x0)\n\tgithub.com/cometbft/cometbft/consensus/state.go:773 +0x3f1\ncreated by github.com/cometbft/cometbft/consensus.(*State).OnStart in goroutine 2568\n\tgithub.com/cometbft/cometbft/consensus/state.go:384 +0x10c\n"

Later you will see:

E[2025-06-13|15:39:23.489] Error in validation                          module=blockchain err="wrong Block.Header.LastResultsHash.  Expected CBE836990A8F3B39C27F40D7A7D39240064170C99BA8E7315B6A9E802AD10B3E, got F56F3216717F0641D8B8720918A4B425448E2B3F25BE20F43D2E544DF526DAE7"

Ideally, we could somehow alter this flag without rebuilding fendermint, but we need to think about how to do that in a way that's non "configuration".

Signed-off-by: Sander Pick <sanderpick@gmail.com>
@sanderpick sanderpick requested a review from maksmore June 13, 2025 20:15
Copy link

@maksmore maksmore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great research! We'll check your hypothesis in the next release

@maksmore maksmore merged commit 567108a into main Jun 18, 2025
17 checks passed
@maksmore maksmore deleted the sander/fix-actor-debugging branch June 18, 2025 08:47
@sam701
Copy link
Contributor

sam701 commented Jun 20, 2025

@sanderpick that's a really great finding! 💪

Since WASM modules can't actually log to stdout/err, they are returned via syscall, which effects the respond data for a function call.

Do I understand correct that the logging data are located in the function call response so that changing the log level means changing the contract code? Is it (theoretically) possible to log by calling a runtime function so that it wouldn't change the output data?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants