The training pipeline is currently under preparation and will be open-sourced soon.

Before evaluation, download the robot assets from [InternUTopiaAssets](https://huggingface.co/datasets/InternRobotics/Embodiments) and move them to the `data/` directory. The InternVLA-N1 model weights can be downloaded from [InternVLA-N1](https://huggingface.co/InternRobotics/InternVLA-N1).

#### Evaluation on Isaac Sim

[UPDATE] The local model and Isaac Sim can now run in a single process. To evaluate on a single GPU:

For multi-GPU inference, we currently support environments that expose a torchrun-compatible runtime (e.g., torchrun or Aliyun DLC).

The whole-system evaluation adopts a client-server architecture. On the client side, we specify a configuration file (`*.cfg`) that sets the scenarios to evaluate, the robots, the models, and the parallelization parameters. The client sends requests to the server, which submits tasks to the Ray distributed framework according to that cfg file and runs the entire evaluation.
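
The request flow above can be sketched in plain Python. This is a minimal, in-process illustration and not InternNav's implementation. `EvalConfig`, `EvalServer`, `EvalClient`, and `run_episode` are hypothetical names. The config fields only mirror the settings listed above. A `ThreadPoolExecutor` stands in for the Ray cluster, and no networking is involved.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class EvalConfig:
    """Hypothetical stand-in for an InternNav *.cfg file."""
    scenarios: list[str]
    robot: str
    model_path: str
    num_workers: int = 2  # parallelization parameter


def run_episode(scenario: str, robot: str, model_path: str) -> dict:
    """Placeholder for one simulated evaluation task (hypothetical)."""
    return {"scenario": scenario, "robot": robot, "model": model_path, "done": True}


class EvalServer:
    """Receives configs from clients and fans tasks out to a worker pool.

    A ThreadPoolExecutor stands in for the Ray cluster used by InternNav.
    """

    def submit(self, cfg: EvalConfig) -> list[dict]:
        with ThreadPoolExecutor(max_workers=cfg.num_workers) as pool:
            # One task per scenario; results are collected in submission order.
            futures = [
                pool.submit(run_episode, s, cfg.robot, cfg.model_path)
                for s in cfg.scenarios
            ]
            return [f.result() for f in futures]


class EvalClient:
    def __init__(self, server: EvalServer):
        self.server = server

    def request(self, cfg: EvalConfig) -> list[dict]:
        # In the real system this would be a network request to the server.
        return self.server.submit(cfg)


if __name__ == "__main__":
    cfg = EvalConfig(scenarios=["scene_a", "scene_b"], robot="example_robot",
                     model_path="checkpoints/InternVLA-N1")
    results = EvalClient(EvalServer()).request(cfg)
    print(len(results))
```

The client only describes *what* to evaluate, and the server decides *how* to parallelize it. Replacing the executor with Ray remote tasks would keep the same shape.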

First, set `model_path` in the cfg file to the path of the InternVLA-N1 weights, then start the evaluation server:

The evaluation results are saved to `eval_results.log` in the `output_dir` specified in the config file. The full evaluation takes about 10 hours on the RTX 4090 platform.

The simulation can be visualized by setting `vis_output=True` in `eval_cfg`.

For multi-GPU inference, we currently support SLURM as well as environments that expose a torchrun-compatible runtime (e.g., Aliyun DLC).
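
Under torchrun, each worker process receives its identity through the `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` environment variables. The sketch below shows how an evaluation script might use them to split episodes across GPUs. The round-robin sharding and the helper names `get_dist_info` and `shard_episodes` are assumptions for illustration, not InternNav's actual scheme. SLURM exposes its own variables (e.g., `SLURM_PROCID`, `SLURM_NTASKS`), and this sketch does not read them.

```python
import os


def get_dist_info() -> tuple[int, int, int]:
    """Read the rank info that torchrun-compatible launchers export.

    torchrun sets RANK, WORLD_SIZE and LOCAL_RANK for every worker process;
    fall back to a single-process setup when they are absent.
    """
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    return rank, world_size, local_rank


def shard_episodes(episodes: list, rank: int, world_size: int) -> list:
    """Round-robin split so each rank evaluates a disjoint subset (assumed scheme)."""
    return episodes[rank::world_size]


if __name__ == "__main__":
    rank, world_size, local_rank = get_dist_info()
    # Each process would typically bind to GPU `local_rank` at this point.
    my_episodes = shard_episodes(list(range(10)), rank, world_size)
    print(f"rank {rank}/{world_size} (GPU {local_rank}) -> {my_episodes}")
```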

InternNav already ships with three major environment backends:

- **InternUtopiaEnv**: Simulated environment built on top of InternUtopia / Isaac Sim. It supports complex indoor scenes, object semantics, RGB-D sensing, and scripted evaluation loops.
- **HabitatEnv**: Simulated environment built on top of Habitat Sim. It supports a Gym-style workflow and handles distributed episode setup.
- **RealWorldEnv**: Wrapper around an actual robot platform and its sensors (e.g., RGB camera, depth, odometry). This lets you deploy the same agent logic in the physical world.

All three backends are subclasses of the same base [`Env`](https://github.com/InternRobotics/InternNav/blob/main/internnav/env/base.py) class.
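
Sharing one base class is what lets the same agent code run against any backend. The sketch below illustrates the idea with a Gym-style `reset`/`step` interface. The method names and return types are assumptions and do not reproduce the actual `Env` class in `internnav/env/base.py`. `ToyEnv` and `run_agent` are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable


class Env(ABC):
    """Hypothetical sketch of a shared environment interface (not InternNav's actual API)."""

    @abstractmethod
    def reset(self) -> dict[str, Any]:
        """Start a new episode and return the first observation."""

    @abstractmethod
    def step(self, action: Any) -> tuple[dict[str, Any], bool, dict[str, Any]]:
        """Apply an action and return (observation, done, info)."""


class ToyEnv(Env):
    """Minimal hypothetical backend: the agent moves along a line toward a goal."""

    def __init__(self, goal: int = 3):
        self.goal = goal
        self.pos = 0

    def reset(self) -> dict[str, Any]:
        self.pos = 0
        return {"position": self.pos}

    def step(self, action: Any) -> tuple[dict[str, Any], bool, dict[str, Any]]:
        self.pos += int(action)
        done = self.pos >= self.goal
        info = {"distance_to_goal": max(self.goal - self.pos, 0)}
        return {"position": self.pos}, done, info


def run_agent(env: Env, policy: Callable[[dict[str, Any]], Any]) -> int:
    """Backend-agnostic loop: works with any Env subclass; returns the step count."""
    obs, done, steps = env.reset(), False, 0
    while not done:
        obs, done, _ = env.step(policy(obs))
        steps += 1
    return steps
```

For example, `run_agent(ToyEnv(goal=3), lambda obs: 1)` takes three steps. A simulated or real-robot backend would only need to implement the same two methods.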
## Evaluation Task (WIP)

For the VLN-PE benchmark in InternUtopia, InternNav provides comprehensive evaluation metrics:

- **Success Rate (SR)**: The proportion of episodes in which the agent reaches the goal location within a 3-meter radius.
- **Success weighted by Path Length (SPL)**: Measures both efficiency and success. For each episode, it is the success indicator multiplied by the ratio of the shortest-path distance to the larger of the actual trajectory length and the shortest-path distance; the result is averaged over all episodes.
  A higher SPL indicates that the agent not only succeeds but does so efficiently, without taking unnecessarily long routes.
- **Fall Rate (FR)**: The frequency at which the agent falls or loses balance during navigation.
- **Stuck Rate (StR)**: The frequency at which the agent becomes immobile or trapped (e.g., blocked by obstacles or unable to proceed).
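
The sketch below shows one way to aggregate these four metrics from per-episode records. It makes several assumptions:

- Success is judged by the final distance to the goal, with a 3 m threshold.
- SPL uses the standard per-episode term S_i · l_i / max(p_i, l_i).
- FR and StR are the fraction of episodes in which a fall or a stuck event occurred.

`EpisodeResult` and `vlnpe_metrics` are hypothetical names. The real implementation lives under `internnav/env/utils/internutopia_extensions`.

```python
from dataclasses import dataclass

SUCCESS_RADIUS_M = 3.0  # success threshold stated in the metric definitions


@dataclass
class EpisodeResult:
    """Hypothetical per-episode record; field names are assumptions."""
    final_distance_to_goal: float  # meters from the goal when the episode ends
    shortest_path_length: float    # shortest-path distance to the goal (l_i)
    trajectory_length: float       # distance actually traveled (p_i)
    fell: bool = False
    stuck: bool = False


def vlnpe_metrics(episodes: list[EpisodeResult]) -> dict[str, float]:
    """Average SR, SPL, FR and StR over a list of episodes."""
    n = len(episodes)
    if n == 0:
        raise ValueError("no episodes to evaluate")
    sr = spl = fr = st = 0.0
    for ep in episodes:
        success = ep.final_distance_to_goal <= SUCCESS_RADIUS_M
        sr += float(success)
        denom = max(ep.trajectory_length, ep.shortest_path_length)
        if success and denom > 0:
            # Standard SPL term: S_i * l_i / max(p_i, l_i)
            spl += ep.shortest_path_length / denom
        fr += float(ep.fell)
        st += float(ep.stuck)
    return {"SR": sr / n, "SPL": spl / n, "FR": fr / n, "StR": st / n}
```

Consider three episodes: one succeeds along a 12.5 m path where the shortest path is 10 m, one ends stuck, and one falls. Together they give SR = FR = StR = 1/3 and SPL = 0.8 / 3.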

The implementation is under `internnav/env/utils/internutopia_extensions`; we highly recommend following the [InternUtopia](../../internutopia) guide.

### Evaluation Metrics in VLN-CE
For the VLN-CE benchmark in Habitat, InternNav keeps the original Habitat evaluation configuration and registers the following metrics:

- **Distance to Goal (DistanceToGoal)**: The geodesic distance from the agent's current position to the goal location.
- **Success (Success)**: A binary indicator of whether the agent stops within **3 meters** of the goal.
- **Success weighted by Path Length (SPL)**: Measures both success and navigation efficiency. For each episode, it is the success indicator multiplied by the ratio of the shortest-path distance to the larger of the actual trajectory length and the shortest-path distance; the result is averaged over all episodes.
  A higher SPL indicates that the agent not only succeeds but does so efficiently, without taking unnecessarily long routes.
- **Oracle Success Rate (OracleSuccess)**: The proportion of episodes in which **any point** along the agent's trajectory comes within **3 meters** of the goal, representing potential success if the agent were to stop optimally.
- **Oracle Navigation Error (OracleNavigationError)**: The minimum geodesic distance between the agent and the goal over the entire trajectory.
- **Normalized Dynamic Time Warping (nDTW)**: Measures how closely the agent's trajectory follows the ground-truth demonstration path. Only registered for the RxR benchmarks.
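
The sketch below computes the per-episode VLN-CE metrics above. Dataset-level numbers are obtained by averaging over episodes. It makes several assumptions:

- Euclidean distance stands in for Habitat's geodesic distance.
- The agent is assumed to have stopped at the last trajectory point.
- nDTW follows the standard definition nDTW = exp(-DTW(R, Q) / (|R| · d_th)), with d_th set to the 3 m success distance.

The function names are hypothetical and are not the registered Habitat measures.

```python
import math

SUCCESS_DISTANCE_M = 3.0  # success / oracle-success threshold


def dist(a, b) -> float:
    """Euclidean stand-in for Habitat's geodesic distance (assumption)."""
    return math.dist(a, b)


def dtw(query, reference) -> float:
    """Classic O(n*m) dynamic time warping between two point sequences."""
    n, m = len(query), len(reference)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(query[i - 1], reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def episode_metrics(trajectory, goal, reference_path=None) -> dict[str, float]:
    """Per-episode metrics; the agent is assumed to stop at trajectory[-1]."""
    dists = [dist(p, goal) for p in trajectory]
    metrics = {
        "DistanceToGoal": dists[-1],
        "Success": float(dists[-1] <= SUCCESS_DISTANCE_M),
        "OracleNavigationError": min(dists),
    }
    metrics["OracleSuccess"] = float(metrics["OracleNavigationError"] <= SUCCESS_DISTANCE_M)
    if reference_path is not None:
        # nDTW = exp(-DTW(R, Q) / (|R| * d_th)), with d_th = success distance
        scale = len(reference_path) * SUCCESS_DISTANCE_M
        metrics["nDTW"] = math.exp(-dtw(trajectory, reference_path) / scale)
    return metrics
```

A trajectory that passes within 2 m of the goal but stops 4 m away shows the gap between the two success notions: `Success` is 0 while `OracleSuccess` is 1. A trajectory identical to its reference path scores an nDTW of 1.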