[Fix] InternNav Doc update for v0.2.0 (#3)

kew6688 · web-flow · commit 10daad1d5847 · 2025-12-18T12:15:35.000+08:00
* update doc for refactor

* rename env section

* add aliyun dlc bash

* add evaluation metrics
diff --git a/source/en/user_guide/internnav/quick_start/installation.md b/source/en/user_guide/internnav/quick_start/installation.md
@@ -190,7 +190,7 @@ pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 \
     --index-url https://download.pytorch.org/whl/cu118
 
 # install InternNav with model dependencies
-pip install -e .[model]
+pip install -e .[model] --no-build-isolation
 
 ```
 
diff --git a/source/en/user_guide/internnav/quick_start/train_eval.md b/source/en/user_guide/internnav/quick_start/train_eval.md
@@ -11,20 +11,39 @@ The training pipeline is currently under preparation and will be open-sourced so
 Before evaluation, we should download the robot assets from [InternUTopiaAssets](https://huggingface.co/datasets/InternRobotics/Embodiments) and move them to the `data/` directory. Model weights of InternVLA-N1 can be downloaded from [InternVLA-N1](https://huggingface.co/InternRobotics/InternVLA-N1).
 
 #### Evaluation on Isaac Sim
+[UPDATE] The local model and Isaac Sim can now run in a single process. Evaluate on Single-GPU:
+
+```bash
+python scripts/eval/eval.py --config scripts/eval/configs/h1_internvla_n1_async_cfg.py
+```
+
+For multi-GPU inference, we currently support environments that expose a torchrun-compatible runtime (e.g., torchrun or Aliyun DLC).
+
+```bash
+# for torchrun
+./scripts/eval/bash/torchrun_eval.sh \
+    --config scripts/eval/configs/h1_internvla_n1_async_cfg.py
+
+# for Aliyun DLC
+./scripts/eval/bash/eval_vln_distributed.sh \
+    internutopia \
+    --config scripts/eval/configs/h1_internvla_n1_async_cfg.py
+```
+
 The main architecture of the whole-system evaluation adopts a client-server model. In the client, we specify the corresponding configuration (*.cfg), which includes settings such as the scenarios to be evaluated, robots, models, and parallelization parameters. The client sends requests to the server, which then submits tasks to the Ray distributed framework based on the corresponding cfg file, enabling the entire evaluation process to run.
 
 First, change the 'model_path' in the cfg file to the path of the InternVLA-N1 weights. Start the evaluation server:
 ```bash
 # from one process
 conda activate <model_env>
-python scripts/eval/start_server.py --config scripts/eval/configs/h1_internvla_n1_cfg.py
+python scripts/eval/start_server.py --config scripts/eval/configs/h1_internvla_n1_async_cfg.py
 ```
 
 Then, start the client to run evaluation:
 ```bash
 # from another process
 conda activate <internutopia>
-MESA_GL_VERSION_OVERRIDE=4.6 python scripts/eval/eval.py --config scripts/eval/configs/h1_internvla_n1_cfg.py
+MESA_GL_VERSION_OVERRIDE=4.6 python scripts/eval/eval.py --config scripts/eval/configs/h1_internvla_n1_async_cfg.py
 ```
 
 The evaluation results will be saved in the `eval_results.log` file in the output_dir of the config file. The whole evaluation process takes about 10 hours at RTX-4090 graphics platform.
@@ -36,13 +55,23 @@ The simulation can be visualized by set `vis_output=True` in eval_cfg.
 Evaluate on Single-GPU:
 
 ```bash
-python scripts/eval/eval_habitat.py --model_path checkpoints/InternVLA-N1 --continuous_traj --output_path result/InternVLA-N1/val_unseen_32traj_8steps
+python scripts/eval/eval.py --config scripts/eval/configs/habitat_dual_system_cfg.py
 ```
 
-For multi-gpu inference, currently we only support inference on SLURM.
+For multi-GPU inference, we currently support SLURM as well as environments that expose a torchrun-compatible runtime (e.g., Aliyun DLC).
 
 ```bash
+# for slurm
 ./scripts/eval/bash/eval_dual_system.sh
+
+# for torchrun
+./scripts/eval/bash/torchrun_eval.sh \
+    --config scripts/eval/configs/habitat_dual_system_cfg.py
+
+# for Aliyun DLC
+./scripts/eval/bash/eval_vln_distributed.sh \
+    habitat \
+    --config scripts/eval/configs/habitat_dual_system_cfg.py
 ```
 
 
@@ -125,7 +154,18 @@ Currently we only support evaluate single System2 on Habitat:
 Evaluate on Single-GPU:
 
 ```bash
-python scripts/eval/eval_habitat.py --model_path checkpoints/InternVLA-N1-S2 --mode system2 --output_path results/InternVLA-N1-S2/val_unseen \
+python scripts/eval/eval.py --config scripts/eval/configs/habitat_s2_cfg.py
+
+# the config file (Python) should set the following fields
+eval_cfg = EvalCfg(
+    agent=AgentCfg(
+        model_name='internvla_n1',
+        model_settings={
+            "mode": "system2",  # inference mode: dual_system or system2
+            "model_path": "checkpoints/<s2_checkpoint>",  # path to model checkpoint
+        }
+    )
+)
 ```
 
 For multi-gpu inference, currently we only support inference on SLURM.
diff --git a/source/en/user_guide/internnav/tutorials/env.md b/source/en/user_guide/internnav/tutorials/env.md
@@ -1,4 +1,4 @@
-# Customizing Environments and Tasks in InternNav
+# Environments Design in InternNav
 
 This tutorial provided a step-by-step guide to define a new environment and a new navigation task within the InternNav framework.
 
@@ -17,26 +17,24 @@ Because of this separation:
 
 - We can run the same agent in simulation (Isaac / InternUtopia) or on a real robot, as long as both environments implement the same API.
 
-- We can benchmark different tasks (VLN, PointGoalNav, etc.) in different worlds without rewriting the agent.
+- We can benchmark different tasks in different worlds without rewriting the agent.
 
-InternNav already ships with two major environment backends:
+![img.png](../../../_static/image/internnav_process.png)
+
+InternNav already ships with three major environment backends:
 
 - **InternUtopiaEnv**:
 Simulated environment built on top of InternUtopia / Isaac Sim. This supports complex indoor scenes, object semantics, RGB-D sensing, and scripted evaluation loops.
-- **HabitatEnv** (WIP): Simulated environment built on top of Habitat Sim.
+
+- **HabitatEnv**: Simulated environment built on top of Habitat Sim. This supports a Gym-style workflow and handles distributed episode setup.
 
 - **RealWorldEnv**:
 Wrapper around an actual robot platform and its sensors (e.g. RGB camera, depth, odometry). This lets you deploy the same agent logic in the physical world.
 
 Both of these are children of the same base [`Env`](https://github.com/InternRobotics/InternNav/blob/main/internnav/env/base.py) class.
 
-## Evaluation Task (WIP)
-For the vlnpe benchmark, we build the task based on internutopia. Here is a diagram.
-
-![img.png](../../../_static/image/agent_definition.png)
 
-
-## Evaluation Metrics (WIP)
+### Evaluation Metrics in VLN-PE
 For the VLN-PE benchmark in internutopia, InternNav provides comprehensive evaluation metrics:
 - **Success Rate (SR)**: The proportion of episodes in which the agent successfully reaches the goal location within a 3-meter radius.
 - **Success Rate weighted by Path Length (SPL)**: Measures both efficiency and success. It is defined as the ratio of the shortest-path distance to the actual trajectory length, weighted by whether the agent successfully reaches the goal.
@@ -47,4 +45,18 @@ A higher SPL indicates that the agent not only succeeds but does so efficiently,
 - **Fall Rate (FR)**: The frequency at which the agent falls or loses balance during navigation.
 - **Stuck Rate (StR)**: The frequency at which the agent becomes immobile or trapped (e.g., blocked by obstacles or unable to proceed).
 
-The implementation is under `internnav/env/utils/internutopia_extensions`, we highly suggested follow the guide of [InternUtopia](../../internutopia).
+### Evaluation Metrics in VLN-CE
+For the VLN-CE benchmark in Habitat, InternNav keeps the original Habitat evaluation configuration and registers the following metrics:
+
+- **Distance to Goal (DistanceToGoal)**: The geodesic distance from the agent’s current position to the goal location.
+
+- **Success (Success)**: A binary indicator of whether the agent stops within **3 meters** of the goal.
+
+- **Success weighted by Path Length (SPL)**: Measures both success and navigation efficiency. It is defined as the ratio of the shortest-path distance to the actual trajectory length, weighted by whether the agent successfully reaches the goal.
+A higher SPL indicates that the agent not only succeeds but does so efficiently, without taking unnecessarily long routes.
+
+- **Oracle Success Rate (OracleSuccess)**: The proportion of episodes in which **any point** along the agent’s trajectory comes within **3 meters** of the goal, representing potential success if the agent were to stop optimally.
+
+- **Oracle Navigation Error (OracleNavigationError)**: The minimum geodesic distance between the agent and the goal over the entire trajectory.
+
+- **Normalized Dynamic Time Warping (nDTW)**: Measures how closely the agent’s trajectory follows the ground-truth demonstration path. Registered only in RxR benchmarks.
diff --git a/source/en/user_guide/internnav/tutorials/index.md b/source/en/user_guide/internnav/tutorials/index.md
@@ -12,7 +12,6 @@ myst:
 :caption: Tutorials
 :maxdepth: 2
 
-core
 dataset
 model
 training
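
Illustrative note (not part of the patch): the per-episode VLN-CE metrics described in `env.md` above can be sketched as follows. This is a minimal sketch, not InternNav or Habitat code. It assumes 2D positions and uses straight-line (Euclidean) distance as a stand-in for geodesic distance. The names `episode_metrics`, `path_length`, and `SUCCESS_RADIUS_M` are hypothetical. SPL uses the common form `S * l / max(p, l)`, where `l` is the shortest-path length and `p` is the length of the path actually travelled.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]

# Success threshold quoted in the metric descriptions (3 meters).
SUCCESS_RADIUS_M = 3.0


def path_length(trajectory: Sequence[Point]) -> float:
    """Sum of straight-line segment lengths along the trajectory."""
    return sum(math.dist(a, b) for a, b in zip(trajectory, trajectory[1:]))


def episode_metrics(
    trajectory: Sequence[Point],
    goal: Point,
    shortest_path_length: float,
) -> Dict[str, float]:
    """Compute per-episode metrics for one trajectory.

    Assumption: Euclidean distance replaces the geodesic distance that a
    simulator would compute on the navigation mesh.
    """
    dists = [math.dist(p, goal) for p in trajectory]
    final_dist = dists[-1]
    success = 1.0 if final_dist <= SUCCESS_RADIUS_M else 0.0

    actual = path_length(trajectory)
    denom = max(shortest_path_length, actual)
    # Guard against a zero-length episode that starts at the goal.
    spl = success * (shortest_path_length / denom if denom > 0 else 1.0)

    return {
        "distance_to_goal": final_dist,
        "success": success,
        "spl": spl,
        # Oracle metrics look at the best point along the whole trajectory.
        "oracle_success": 1.0 if min(dists) <= SUCCESS_RADIUS_M else 0.0,
        "oracle_navigation_error": min(dists),
    }


if __name__ == "__main__":
    goal = (4.0, 5.0)
    traj = [(0.0, 0.0), (4.0, 3.0), (10.0, 3.0)]  # passes the goal, then overshoots
    print(episode_metrics(traj, goal, math.dist(traj[0], goal)))
```

Benchmark-level numbers such as Success Rate are then the mean of the per-episode values. In the overshooting example, `success` is 0 while `oracle_success` is 1, which is the gap the oracle metrics are meant to expose.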