Skip to content

Conversation

@aymuos15
Copy link

Summary

  • Replace algo_to_pickle() with algo_to_json() using compact JSON + MONAI's existing _target_ ConfigParser pattern for truly pickle-free algo metadata serialization
  • Add _make_json_serializable() helper to handle numpy arrays, tensors, Path objects
  • Backward compatible: algo_from_json() can still load legacy .pkl files (with deprecation warning)
  • Keep algo_to_pickle() / algo_from_pickle() as deprecated aliases

Note: Model weights still use torch.save separately—this PR focuses on the algo object serialization.

Test plan

  • Linting passes (./runtests.sh --codeformat, ./runtests.sh --ruff)
  • Unit tests for _make_json_serializable() and _add_path_with_parent() helpers
  • Integration test with tests/apps/test_auto3dseg_bundlegen.py (requires optional deps)

Fixes #8586

- Rename algo_to_pickle() to algo_to_json():
  - Serialize algo state as JSON with _target_ for class reconstruction
  - Uses MONAI's ConfigParser pattern for dynamic instantiation
  - Truly pickle-free serialization (for algo metadata; model weights still use torch.save)
  - Add _make_json_serializable() to handle numpy arrays, tensors, Path objects

- Rename algo_from_pickle() to algo_from_json()

- Add deprecated aliases for backward compatibility

- Update import_bundle_algo_history() to prefer .json files

Fixes Project-MONAI#8586

Signed-off-by: Soumya Snigdha Kundu <soumya_snigdha.kundu@kcl.ac.uk>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 13, 2026

📝 Walkthrough

Walkthrough

The pull request migrates Auto3DSeg's algorithm serialization from pickle to JSON format. The changes introduce two new public functions (algo_to_json and algo_from_json) that handle JSON-based serialization with state management. Backward compatibility is maintained through deprecated pickle function aliases that delegate to JSON, plus legacy pickle support with deprecation warnings. The implementation includes helpers for JSON-friendly type conversion and template path resolution.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed Title clearly and concisely describes the main change: replacing pickle with JSON for algo serialization.
Description check ✅ Passed Description covers the main changes, backward compatibility, and test status but lacks an issue number reference per template.
Linked Issues check ✅ Passed PR fully addresses issue #8586 by replacing pickle with JSON-based serialization and maintaining backward compatibility.
Out of Scope Changes check ✅ Passed All changes are directly related to replacing pickle with JSON serialization. No unrelated modifications detected.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @monai/auto3dseg/utils.py:
- Around line 474-494: The loop over paths_to_try manipulates sys.path
non-atomically (sys.path.insert/remove) around ConfigParser/
parser.get_parsed_content which can race if concurrent; fix by introducing a
module-level lock (e.g., SYS_PATH_LOCK) and acquire it before any sys.path
modification and release it after cleanup, wrapping the path insert, parser
instantiation (ConfigParser and parser.get_parsed_content) and the removal in a
try/finally so removal always happens and the lock is released; reference the
paths_to_try loop, sys.path.insert/sys.path.remove, ConfigParser,
parser.get_parsed_content, and used_template_path when applying the lock.
🧹 Nitpick comments (10)
monai/apps/auto3dseg/utils.py (1)

67-70: Minor readability nit: condition can be simplified.

The or not only_trained is a double negative. Consider inverting for clarity, though current logic is correct.

monai/auto3dseg/utils.py (9)

281-304: LGTM with minor observation.

Good coverage of common types. The fallback to str(value) on line 304 silently converts unknown types—consider logging a warning for unexpected types to aid debugging.


307-312: Missing docstring parameter/return documentation.

Per coding guidelines, docstrings should describe parameters and return values.

Suggested docstring
 def _add_path_with_parent(paths: list[str], path: str | None) -> None:
-    """Add a path and its parent directory to the list if the path is a valid directory."""
+    """
+    Add a path and its parent directory to the list if the path is a valid directory.
+
+    Args:
+        paths: List to append paths to (modified in-place).
+        path: Directory path to add; skipped if None or not a valid directory.
+    """

327-333: Hardcoded attribute list may become stale.

Consider defining SERIALIZABLE_ATTRS as a module-level constant or documenting why these specific attributes are serialized. Easier to maintain and extend.


346-348: Missing encoding parameter for file write.

Explicit encoding="utf-8" is recommended for cross-platform consistency.

Fix
-    with open(json_filename, "w") as f:
+    with open(json_filename, "w", encoding="utf-8") as f:

414-414: Unused kwargs parameter.

Static analysis (ARG001) notes kwargs is unused. Docstring says "reserved for future use"—acceptable, but consider using _ prefix convention or **_kwargs to suppress linter warnings.


446-447: Missing encoding parameter for file read.

Same as write—explicit encoding="utf-8" recommended.

Fix
-    with open(filename) as f:
+    with open(filename, encoding="utf-8") as f:

449-450: Consider TypeError for type validation.

Static analysis (TRY004) suggests TypeError over ValueError for invalid types. Minor pedantic issue.


479-481: Simplify dictionary access.

Static analysis (RUF019) notes unnecessary key check. Use state.get("template_path") instead.

Fix
             algo_config: dict[str, Any] = {"_target_": target}
-            if "template_path" in state and state["template_path"]:
-                algo_config["template_path"] = state["template_path"]
+            if state.get("template_path"):
+                algo_config["template_path"] = state["template_path"]

499-502: State restoration silently skips unknown attributes.

If the JSON contains an attribute not present on the algo object, it's silently ignored. This is likely intentional for forward compatibility, but logging a debug message would help troubleshoot version mismatches.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 57fdd59 and 23433a7.

📒 Files selected for processing (3)
  • monai/apps/auto3dseg/utils.py
  • monai/auto3dseg/__init__.py
  • monai/auto3dseg/utils.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • monai/apps/auto3dseg/utils.py
  • monai/auto3dseg/__init__.py
  • monai/auto3dseg/utils.py
🧬 Code graph analysis (2)
monai/apps/auto3dseg/utils.py (2)
monai/auto3dseg/utils.py (2)
  • algo_from_json (414-510)
  • algo_to_json (315-350)
monai/utils/enums.py (1)
  • AlgoKeys (687-699)
monai/auto3dseg/__init__.py (1)
monai/auto3dseg/utils.py (3)
  • algo_from_json (414-510)
  • algo_from_pickle (669-670)
  • algo_to_json (315-350)
🪛 Ruff (0.14.10)
monai/auto3dseg/utils.py

384-384: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)


391-391: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)


414-414: Unused function argument: kwargs

(ARG001)


450-450: Prefer TypeError exception for invalid type

(TRY004)


450-450: Avoid specifying long messages outside the exception class

(TRY003)


455-455: Avoid specifying long messages outside the exception class

(TRY003)


480-480: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)


497-497: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: min-dep-py3 (3.12)
  • GitHub Check: min-dep-os (macOS-latest)
  • GitHub Check: min-dep-pytorch (2.7.1)
  • GitHub Check: min-dep-py3 (3.11)
  • GitHub Check: min-dep-pytorch (2.8.0)
  • GitHub Check: min-dep-pytorch (2.6.0)
  • GitHub Check: min-dep-pytorch (2.5.1)
  • GitHub Check: min-dep-py3 (3.10)
  • GitHub Check: min-dep-os (ubuntu-latest)
  • GitHub Check: min-dep-py3 (3.9)
  • GitHub Check: min-dep-os (windows-latest)
  • GitHub Check: flake8-py3 (mypy)
  • GitHub Check: quick-py3 (windows-latest)
  • GitHub Check: quick-py3 (macOS-latest)
  • GitHub Check: quick-py3 (ubuntu-latest)
  • GitHub Check: build-docs
  • GitHub Check: packaging
  • GitHub Check: flake8-py3 (pytype)
  • GitHub Check: flake8-py3 (codeformat)
🔇 Additional comments (7)
monai/apps/auto3dseg/utils.py (3)

17-17: LGTM!

Import updated to use new JSON-based functions.


45-56: LGTM!

Clean preference logic: JSON first, pickle fallback. Good migration strategy.


75-84: LGTM!

Docstring and function body updated to use JSON serialization.

monai/auto3dseg/utils.py (3)

353-411: Legacy pickle loader retained for backward compatibility.

Static analysis flags pickle security (S301). Acceptable here since it's deprecated legacy support with clear warnings. No action required beyond eventual removal.


504-504: used_template_path overwrites state-restored template_path.

Line 501-502 restores template_path from state, but line 504 immediately overwrites it with used_template_path (which could be different). Verify this is the intended behavior.


662-669: LGTM!

Clean deprecated aliases delegating to new JSON functions.

monai/auto3dseg/__init__.py (1)

27-39: LGTM!

New JSON functions properly exported alongside deprecated pickle aliases.

Comment on lines +474 to +494
for path in paths_to_try:
try:
if path and path not in sys.path:
sys.path.insert(0, path)

algo_config: dict[str, Any] = {"_target_": target}
if "template_path" in state and state["template_path"]:
algo_config["template_path"] = state["template_path"]

parser = ConfigParser(algo_config)
algo = parser.get_parsed_content()
used_template_path = path

if path and path in sys.path:
sys.path.remove(path)
break
except (ModuleNotFoundError, ImportError) as e:
if path and path in sys.path:
sys.path.remove(path)
logging.debug(f"Failed to instantiate {target} with path {path}: {e}")
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's see the function containing this code
sed -n '450,500p' monai/auto3dseg/utils.py

Repository: Project-MONAI/MONAI

Length of output: 2103


🏁 Script executed:

# Check what function this is part of
grep -n "^def\|^class" monai/auto3dseg/utils.py | grep -B1 "474"

Repository: Project-MONAI/MONAI

Length of output: 45


🏁 Script executed:

# Search for calls to algo_from_json or algo_from_pickle
rg "algo_from_json|algo_from_pickle" --type py -C 2

Repository: Project-MONAI/MONAI

Length of output: 4135


🏁 Script executed:

# Check if there's any threading or async pattern in this file
rg "Thread|threading|asyncio|concurrent|Executor" monai/auto3dseg/utils.py

Repository: Project-MONAI/MONAI

Length of output: 45


🏁 Script executed:

# Check if the function is exposed in __init__.py or used in tests
rg "algo_from_json|algo_from_pickle" --type py | head -20

Repository: Project-MONAI/MONAI

Length of output: 1560


🏁 Script executed:

# Look for tests of algo_from_json
find . -path ./build -prune -o -name "*test*.py" -type f -exec grep -l "algo_from_json\|algo_from_pickle" {} \;

Repository: Project-MONAI/MONAI

Length of output: 45


🏁 Script executed:

# Check the context of calls in hpo_gen.py to see if concurrent
sed -n '1,100p' monai/apps/auto3dseg/hpo_gen.py | head -60

Repository: Project-MONAI/MONAI

Length of output: 2408


🏁 Script executed:

# Get the full context of algo_from_json and algo_from_pickle function signatures
sed -n '420,510p' monai/auto3dseg/utils.py

Repository: Project-MONAI/MONAI

Length of output: 3791


sys.path manipulation lacks thread-safety guarantees, though current usage is sequential.

While sys.path.insert() and sys.path.remove() are individually atomic due to the GIL, the paired operations aren't atomic together and could race if called concurrently. Current callers invoke this sequentially during initialization (in hpo_gen.py.__init__ and utils.py iteration), so practical risk is low. Consider a lock or context manager if concurrent calls are anticipated.

🧰 Tools
🪛 Ruff (0.14.10)

480-480: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)

🤖 Prompt for AI Agents
In @monai/auto3dseg/utils.py around lines 474 - 494, The loop over paths_to_try
manipulates sys.path non-atomically (sys.path.insert/remove) around
ConfigParser/ parser.get_parsed_content which can race if concurrent; fix by
introducing a module-level lock (e.g., SYS_PATH_LOCK) and acquire it before any
sys.path modification and release it after cleanup, wrapping the path insert,
parser instantiation (ConfigParser and parser.get_parsed_content) and the
removal in a try/finally so removal always happens and the lock is released;
reference the paths_to_try loop, sys.path.insert/sys.path.remove, ConfigParser,
parser.get_parsed_content, and used_template_path when applying the lock.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update Auto3DSeg

1 participant