A Kubernetes-based LLM inference deployment with GPU support, model management, and monitoring using vLLM, Istio, MinIO, Prometheus, and Grafana.
This platform enables deployment and serving of large language models on Kubernetes with:
- GPU-accelerated inference using vLLM
- Centralized model storage using MinIO
- Web-based interaction via Open WebUI
- Observability using Prometheus and Grafana
- Secure traffic routing via Istio Gateway
The Istio Gateway is the entry point for all external traffic.
Hosts:
- `open-webui.*` → Open WebUI
- `grafana.*` → Grafana dashboards
- `minio-console.*` → MinIO console
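A minimal sketch of how this routing could be expressed with an Istio `Gateway` plus one `VirtualService` per host. All names, namespaces, and ports are illustrative assumptions, and `example.com` stands in for the cluster's real domain behind the `open-webui.*` patterns:

```yaml
# Sketch: Istio Gateway accepting the three hostnames (names and domain are assumptions).
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: platform-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway            # default Istio ingress gateway pods
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP               # TLS termination would be added here in a hardened setup
      hosts:
        - "open-webui.example.com"
        - "grafana.example.com"
        - "minio-console.example.com"
---
# Sketch: route the Open WebUI host to its Service (service name/port assumed).
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: open-webui
  namespace: istio-system
spec:
  hosts:
    - "open-webui.example.com"
  gateways:
    - platform-gateway
  http:
    - route:
        - destination:
            host: open-webui.open-webui.svc.cluster.local
            port:
              number: 8080
```

Grafana and the MinIO console would get analogous `VirtualService` objects pointing at their own Services.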
Open WebUI provides the main web interface for interacting with the deployed LLM model.
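Open WebUI reaches the model through vLLM's OpenAI-compatible API. A hedged sketch of that wiring, assuming the vLLM Service is named `vllm` in a `vllm` namespace and listens on port 8000 (all names are assumptions):

```yaml
# Sketch: Open WebUI deployment pointing at the vLLM OpenAI-compatible endpoint.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: open-webui
  namespace: open-webui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: open-webui
  template:
    metadata:
      labels:
        app: open-webui
    spec:
      containers:
        - name: open-webui
          image: ghcr.io/open-webui/open-webui:main
          ports:
            - containerPort: 8080
          env:
            - name: OPENAI_API_BASE_URL
              value: "http://vllm.vllm.svc.cluster.local:8000/v1"
            - name: OPENAI_API_KEY
              value: "not-needed"          # vLLM without auth ignores the key value
```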
The vLLM deployment runs in its own namespace with the following pieces (a manifest sketch follows the list):

- **vLLM Pod (GPU)**: runs the inference server on GPU-enabled nodes.
- **Init Container**: downloads model files from MinIO during pod startup.
- **Persistent Volume (PVC)**: stores downloaded model files to avoid re-downloading.
- **Network Policies**: restrict traffic to only the required services (WebUI, monitoring).
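A condensed sketch of those pieces, assuming a `vllm` namespace, a MinIO Service at `minio.minio.svc.cluster.local:9000`, a `models` bucket, a `minio-credentials` Secret, and the public `vllm/vllm-openai` image; every name, path, and size here is an illustrative assumption:

```yaml
# Sketch: PVC caching downloaded model files across pod restarts.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache
  namespace: vllm
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 100Gi                       # assumed size; depends on the model
---
# Sketch: vLLM Deployment with an init container that pulls the model from MinIO.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm
  namespace: vllm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      volumes:
        - name: model-cache
          persistentVolumeClaim:
            claimName: model-cache
      initContainers:
        - name: fetch-model
          image: minio/mc:latest
          command: ["/bin/sh", "-c"]
          args:
            - >
              mc alias set minio http://minio.minio.svc.cluster.local:9000 "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY" &&
              mc mirror --overwrite minio/models/llama-3-8b /models/llama-3-8b
          envFrom:
            - secretRef:
                name: minio-credentials    # assumed Secret with MINIO_ACCESS_KEY / MINIO_SECRET_KEY
          volumeMounts:
            - name: model-cache
              mountPath: /models
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: ["--model", "/models/llama-3-8b", "--port", "8000"]
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 1            # GPU requested via resource limits
          volumeMounts:
            - name: model-cache
              mountPath: /models
```

The network-policy piece could look like the sketch below, allowing ingress only from the Open WebUI and monitoring namespaces (namespace names and labels are assumptions):

```yaml
# Sketch: restrict ingress to the vLLM pods to WebUI and monitoring traffic only.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vllm-allow-webui-and-monitoring
  namespace: vllm
spec:
  podSelector:
    matchLabels:
      app: vllm
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: open-webui
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 8000
```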
MinIO provides centralized object storage for LLM model artifacts.
- Stores model weights
- Used by init containers to fetch models
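Model weights are typically seeded into the bucket out of band; one hedged way to do it from inside the cluster is a one-off Job using the `mc` client (bucket, paths, image, and credential names are assumptions):

```yaml
# Sketch: one-off Job copying staged model files into the MinIO "models" bucket.
apiVersion: batch/v1
kind: Job
metadata:
  name: upload-model
  namespace: minio
spec:
  template:
    spec:
      restartPolicy: Never
      volumes:
        - name: staging
          persistentVolumeClaim:
            claimName: model-staging       # assumed PVC already holding the model files
      containers:
        - name: upload
          image: minio/mc:latest
          command: ["/bin/sh", "-c"]
          args:
            - >
              mc alias set minio http://minio.minio.svc.cluster.local:9000 "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY" &&
              mc mb --ignore-existing minio/models &&
              mc mirror /staging/llama-3-8b minio/models/llama-3-8b
          envFrom:
            - secretRef:
                name: minio-credentials
          volumeMounts:
            - name: staging
              mountPath: /staging
```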
Monitoring is provided by Prometheus and Grafana:
- Prometheus collects metrics from nodes and workloads
- Grafana visualizes GPU, pod, and system performance
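vLLM exposes Prometheus metrics at `/metrics` on its HTTP port. If the monitoring stack uses the Prometheus Operator (an assumption), a `ServiceMonitor` along these lines would scrape them; names and labels are illustrative:

```yaml
# Sketch: scrape vLLM's /metrics endpoint (assumes Prometheus Operator CRDs are installed).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: vllm
  namespace: monitoring
  labels:
    release: kube-prometheus-stack         # assumed label matched by the Prometheus selector
spec:
  namespaceSelector:
    matchNames: ["vllm"]
  selector:
    matchLabels:
      app: vllm
  endpoints:
    - port: http                           # assumes the vLLM Service names its 8000 port "http"
      path: /metrics
      interval: 30s
```

GPU-level panels in Grafana would typically be fed by a separate exporter such as NVIDIA's DCGM exporter rather than by vLLM itself.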
GPU scheduling works as follows:
- GPU nodes are used exclusively for inference workloads
- vLLM pods request GPUs via Kubernetes resource limits
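One common way to keep GPU nodes dedicated to inference is to taint them and let only the vLLM pods tolerate that taint; the taint key, node label, and values below are assumptions about this cluster:

```yaml
# Sketch: pod-template fragment for the vLLM Deployment. Pairs with a node taint such as:
#   kubectl taint nodes <gpu-node> dedicated=inference:NoSchedule
spec:
  nodeSelector:
    nvidia.com/gpu.present: "true"         # assumed GPU node label (e.g. from the GPU operator)
  tolerations:
    - key: dedicated
      operator: Equal
      value: inference
      effect: NoSchedule
  containers:
    - name: vllm
      resources:
        limits:
          nvidia.com/gpu: 1                # GPU requested via Kubernetes resource limits
```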





