@ko3n1g ko3n1g commented Jan 19, 2026

We shouldn't merge it like this. I think it will have negative side effects on Slurm deployments, but on Kubernetes with the KubeRay operator it solves the init hang.

pthombre and others added 2 commits January 14, 2026 10:24
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>

copy-pr-bot bot commented Jan 19, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
**model_config_kwargs,
):
# Use replica-specific environment variables to avoid conflicts
master_addr = "127.0.0.1"

Why do the master address and port need to be hard-coded here?
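
For discussion, here is a minimal sketch of one possible alternative to hard-coding, not the change made in this PR. It assumes that a Slurm launcher has already exported `MASTER_ADDR` and `MASTER_PORT`, and that elsewhere (for example in a KubeRay pod) all ranks of a replica share one node, so loopback is reachable. The helper names `resolve_rendezvous` and `find_free_port` are hypothetical; `SLURM_JOB_ID` is the standard variable Slurm sets inside a job.

```python
import os
import socket


def find_free_port() -> int:
    """Ask the OS for a currently unused TCP port on the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        return s.getsockname()[1]


def resolve_rendezvous(env=None):
    """Hypothetical helper: choose (master_addr, master_port) for one replica.

    Assumption: inside a Slurm job the launcher already exported MASTER_ADDR
    and MASTER_PORT, so those values are reused unchanged. Otherwise fall
    back to loopback with an OS-assigned port.
    """
    env = os.environ if env is None else env
    in_slurm = "SLURM_JOB_ID" in env
    if in_slurm and "MASTER_ADDR" in env and "MASTER_PORT" in env:
        return env["MASTER_ADDR"], int(env["MASTER_PORT"])
    return "127.0.0.1", find_free_port()


if __name__ == "__main__":
    addr, port = resolve_rendezvous()
    print(f"rendezvous at {addr}:{port}")
```

A fallback like this would keep the loopback behaviour on Kubernetes while leaving Slurm jobs on whatever address the launcher provided. Whether that is enough to avoid the side effects mentioned above would still need to be verified on a real Slurm deployment.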

Signed-off-by: oliver könig <okoenig@nvidia.com>
@ko3n1g ko3n1g force-pushed the ko3n1g/pranav/ray_k8s_issue_debug branch from a6c94ee to 7ed9663 Compare January 22, 2026 18:49

3 participants