Production-ready, high-performance HTTP server built in Go, designed to handle 1M+ requests with optimal concurrency. Features worker pools, rate limiting, comprehensive metrics, and advanced observability.
- Introduction
- Architecture
- Getting Started
- API Endpoints
- Load Testing
- Project Structure
- Technologies Used
- Configuration
- Performance Characteristics
- Key Features
- Monitoring & Observability
- Development
- Deployment
- Roadmap
- Contributing
- License
This repository contains a production-ready, highly scalable HTTP server built with Golang. The server demonstrates modern concurrency patterns, efficient resource management, and comprehensive observability features. It's designed to handle extreme loads (1M+ requests) while maintaining low latency and high reliability.
The project includes both a high-performance server and a sophisticated load testing client for benchmarking and validation.
```
┌────────────────────────────────────────────────┐
│                HTTP Server (Gin)               │
└───────────────────────┬────────────────────────┘
                        │
┌───────────────────────▼────────────────────────┐
│                Middleware Stack                │
│   ┌────────────────────────────────────────┐   │
│   │          Request ID Generator          │   │
│   │           Structured Logger            │   │
│   │           Metrics Collector            │   │
│   │       Token Bucket Rate Limiter        │   │
│   │        Request Timeout Handler         │   │
│   └────────────────────────────────────────┘   │
└───────────────────────┬────────────────────────┘
                        │
┌───────────────────────▼────────────────────────┐
│                 Route Handlers                 │
│  ┌──────────────┐    ┌──────────────────┐      │
│  │ Fast Compute │    │ Intensive Compute│      │
│  │   (Direct)   │    │  (Worker Pool)   │      │
│  └──────────────┘    └────────┬─────────┘      │
└───────────────────────────────┼────────────────┘
                                │
                   ┌────────────▼──────────────┐
                   │        Worker Pool        │
                   │  ┌─────────────────────┐  │
                   │  │      Job Queue      │  │
                   │  │   (Buffered Chan)   │  │
                   │  └──────────┬──────────┘  │
                   │             │             │
                   │  ┌──────────▼──────────┐  │
                   │  │  Worker Goroutines  │  │
                   │  │   (Configurable)    │  │
                   │  └──────────┬──────────┘  │
                   │             │             │
                   │  ┌──────────▼──────────┐  │
                   │  │   Results Channel   │  │
                   │  └─────────────────────┘  │
                   └───────────────────────────┘
```
- Gin Router: High-performance HTTP router and middleware framework
- Worker Pool: Efficient concurrent job processing with configurable workers
- Rate Limiter: Token bucket algorithm preventing system overload
- Metrics System: Atomic counters tracking performance in real-time
- Health Checker: Pluggable health check system for monitoring
- Graceful Shutdown: Clean shutdown with configurable timeout
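To make the request flow concrete, here is a minimal sketch of how such a stack is typically layered in Gin. The middleware names are illustrative placeholders, not the exact identifiers in cmd/main.go:

```go
// Illustrative middleware layering; names are placeholders, not the
// exact identifiers used in cmd/main.go.
router := gin.New()
router.Use(
    RequestIDMiddleware(),             // attach a UUID to every request
    LoggingMiddleware(),               // structured request/response logs
    MetricsMiddleware(),               // atomic performance counters
    RateLimitMiddleware(),             // token bucket admission control
    TimeoutMiddleware(30*time.Second), // per-request deadline
)
```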
Before you begin, ensure you have the following installed:
- Go 1.23 or later
- Git
- curl (for verifying the server)

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd bootcamp-web-http
   ```

2. Install dependencies:

   ```bash
   go mod download
   ```

3. Build the server:

   ```bash
   go build -o server cmd/main.go
   ```

4. Build the load testing client:

   ```bash
   go build -o client client/client.go
   ```

5. Start the server with default configuration:

   ```bash
   ./server
   ```

6. Start with custom configuration:

   ```bash
   export SERVER_PORT=:3000
   export WORKER_COUNT=16
   export QUEUE_SIZE=20000
   export RATE_LIMIT=200000
   export ENVIRONMENT=production
   ./server
   ```

7. Verify the server is running:

   ```bash
   curl http://localhost:8080/health
   ```
Expected output:
```json
{
  "status": {
    "worker_pool": "healthy"
  },
  "timestamp": 1704974400,
  "goroutines": 42,
  "version": "1.0.0"
}
```

```
GET /health
```

Returns the current health status of the server.
Response:
```json
{
  "status": {
    "worker_pool": "healthy"
  },
  "timestamp": 1704974400,
  "goroutines": 42,
  "version": "1.0.0"
}
```

Example:

```bash
curl http://localhost:8080/health
```

```
GET /metrics
# or
GET /api/v1/stats
```

Returns comprehensive server performance metrics.
Response:
```json
{
  "total_requests": 1543289,
  "success_count": 1542100,
  "error_count": 1189,
  "active_requests": 234,
  "rejected_count": 0,
  "success_rate": 99.92,
  "avg_latency_us": 125,
  "queue_depth": 45,
  "worker_count": 16,
  "goroutines": 42,
  "timestamp": 1704974400
}
```

Metrics Explained:

- `total_requests`: Total requests received since startup
- `success_count`: Successfully processed requests (2xx responses)
- `error_count`: Failed requests (4xx/5xx responses)
- `rejected_count`: Requests rejected by rate limiter (429)
- `active_requests`: Currently processing requests
- `avg_latency_us`: Average request latency in microseconds
- `queue_depth`: Current number of jobs in the worker pool queue
- `success_rate`: Success percentage (success_count / total_requests * 100)
- `worker_count`: Number of active worker goroutines
- `goroutines`: Current number of goroutines
- `timestamp`: Unix timestamp of the metrics snapshot
Example:
```bash
curl http://localhost:8080/metrics | jq
```

```
POST /api/v1/compute/fast
Content-Type: application/json

{
  "number": 42
}
```

Performs direct computation (number²) without using the worker pool. Ideal for lightweight operations.
Request Body:
```json
{
  "number": integer (required)
}
```

Response (Success - 200 OK):
```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "job_id": 0,
  "result": 1764,
  "latency_us": 45,
  "timestamp": 1704974400
}
```

Response Fields:

- `request_id`: Unique identifier for request tracing
- `job_id`: Job identifier (0 for direct processing)
- `result`: Computation result (number²)
- `latency_us`: Processing latency in microseconds
- `timestamp`: Unix timestamp
Example:
```bash
curl -X POST http://localhost:8080/api/v1/compute/fast \
  -H "Content-Type: application/json" \
  -d '{"number": 42}'
```

```
POST /api/v1/compute/intensive
Content-Type: application/json

{
  "number": 42
}
```

Performs CPU-intensive computation using the worker pool. Suitable for heavy operations requiring concurrency control.
Request Body:
```json
{
  "number": integer (required)
}
```

Response (Success - 200 OK):
```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440001",
  "job_id": 12345,
  "result": 1764,
  "latency_us": 234,
  "timestamp": 1704974400
}
```

Example:
```bash
curl -X POST http://localhost:8080/api/v1/compute/intensive \
  -H "Content-Type: application/json" \
  -d '{"number": 100}'
```

`INVALID_REQUEST`: Invalid request format or missing required fields.

```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440002",
  "error": "Invalid request format",
  "code": "INVALID_REQUEST",
  "timestamp": 1704974400
}
```

Example:

```bash
# Missing number field
curl -X POST http://localhost:8080/api/v1/compute/fast \
  -H "Content-Type: application/json" \
  -d '{}'
```

`TIMEOUT`: Request exceeded the configured timeout duration.
```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440003",
  "error": "request timeout",
  "code": "TIMEOUT",
  "timestamp": 1704974400
}
```

`RATE_LIMIT_EXCEEDED` (HTTP 429): Rate limit exceeded. The server is protecting itself from overload.
```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440004",
  "error": "Rate limit exceeded",
  "code": "RATE_LIMIT_EXCEEDED",
  "timestamp": 1704974400
}
```

`SERVICE_OVERLOADED`: Worker pool queue is full. The server cannot accept more jobs.
```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440005",
  "error": "job queue full",
  "code": "SERVICE_OVERLOADED",
  "timestamp": 1704974400
}
```

Internal processing error or worker pool error.

Possible error codes:

- `INTERNAL_ERROR`: General internal server error
- `PROCESSING_ERROR`: Error during job processing in the worker pool
Example (INTERNAL_ERROR):
```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440006",
  "error": "internal server error",
  "code": "INTERNAL_ERROR",
  "timestamp": 1704974400
}
```

Example (PROCESSING_ERROR):
```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440007",
  "error": "invalid data type for cpu_intensive job",
  "code": "PROCESSING_ERROR",
  "timestamp": 1704974400
}
```

The included load testing client can simulate high-volume traffic to benchmark server performance.
1. Ensure the server is running:

   ```bash
   ./server
   ```

2. In another terminal, run the client:

   ```bash
   ./client
   ```

Modify the test scenarios in `client/client.go`:
```go
config := &ClientConfig{
    ServerURL:      "http://localhost:8080",
    TotalRequests:  1_000_000,        // Total requests to send
    Concurrency:    500,              // Concurrent connections
    RequestTimeout: 30 * time.Second, // Timeout per request
    TestDuration:   5 * time.Minute,  // Max test duration
    WarmupRequests: 1000,             // Warmup requests
    ReportInterval: 5 * time.Second,  // Progress report interval
}
```

The client runs two test scenarios:
1. Fast Endpoint Load Test
   - 1,000,000 requests
   - 500 concurrent connections
   - Tests direct processing path

2. CPU Intensive Endpoint Load Test
   - 100,000 requests
   - 500 concurrent connections
   - Tests worker pool performance
```
=== Load Testing HTTP Server ===
✓ Server health check passed
✓ Running 1000 warmup requests...
Warmup completed in 2.3s

=== Fast Endpoint Test ===
Sending 1,000,000 requests with 500 concurrent connections...

Progress: 250,000/1,000,000 (25%) | RPS: 45,234 | Errors: 12
Progress: 500,000/1,000,000 (50%) | RPS: 47,891 | Errors: 23
Progress: 750,000/1,000,000 (75%) | RPS: 46,543 | Errors: 31
Progress: 1,000,000/1,000,000 (100%) | RPS: 48,120 | Errors: 45

Results:
  Total Requests: 1,000,000
  Successful: 999,955 (99.99%)
  Failed: 45 (0.01%)
  Duration: 21.2s
  Requests/sec: 47,169
  Avg Latency: 10.5ms
  Min Latency: 1.2ms
  Max Latency: 234.5ms
  Rate Limit Hits: 0
```
The client reports:
- Total requests/second (RPS): Throughput measurement
- Success rate: Percentage of successful requests
- Latency statistics: Average, min, and max latency
- Error breakdown: Detailed error categorization by status code
- Rate limit hits: Number of 429 responses
```
bootcamp-web-http/
├── cmd/
│   └── main.go            # HTTP server implementation
│       ├── Server struct  # Main server configuration
│       ├── WorkerPool     # Worker pool implementation
│       ├── RateLimiter    # Token bucket rate limiter
│       ├── HealthChecker  # Health check system
│       ├── Metrics        # Metrics collection
│       └── Middleware     # HTTP middleware stack
├── client/
│   └── client.go          # Load testing client
│       ├── ClientConfig   # Client configuration
│       ├── LoadTestClient # Load test client implementation
│       └── Metrics        # Client metrics tracking
├── go.mod                 # Go module dependencies
├── go.sum                 # Dependency checksums
└── README.md              # Project documentation
```
- `cmd/main.go`: Complete server implementation with all features
- `client/client.go`: Sophisticated load testing client
- Server struct: Main server configuration and state
- WorkerPool: Concurrent job processing system
- RateLimiter: Token bucket algorithm implementation
- HealthChecker: Pluggable health check system
- Metrics: Atomic performance counters
- Golang: Primary language (1.23+)
- Gin: High-performance HTTP web framework
- sync/atomic: Lock-free atomic operations for metrics
- context: Request cancellation and timeouts
- time: Token bucket rate limiting
- encoding/json: JSON request/response handling
- log/slog: Structured logging
- os/signal: Graceful shutdown handling
| Variable | Description | Default | Required |
|---|---|---|---|
| `SERVER_PORT` | Server listen address | `:8080` | No |
| `WORKER_COUNT` | Number of worker goroutines | CPU_COUNT * 2 | No |
| `QUEUE_SIZE` | Worker pool job queue size | `10000` | No |
| `SHUTDOWN_TIMEOUT` | Graceful shutdown timeout | `30s` | No |
| `REQUEST_TIMEOUT` | Request timeout duration | `30s` | No |
| `RATE_LIMIT` | Requests per second limit | `100000` | No |
| `ENVIRONMENT` | Environment mode | `development` | No |
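The server reads these variables at startup. A minimal sketch of what env-with-default loading can look like (assumes imports of os, runtime, and strconv; the helper and field names are illustrative, not the exact API in cmd/main.go):

```go
// Illustrative configuration loading; helper and field names are
// assumptions, not the exact API of cmd/main.go.
type Config struct {
    Port        string
    WorkerCount int
    Environment string
}

func getEnv(key, fallback string) string {
    if v, ok := os.LookupEnv(key); ok {
        return v
    }
    return fallback
}

func loadConfig() Config {
    workers := runtime.NumCPU() * 2 // default WORKER_COUNT: CPU_COUNT * 2
    if v, err := strconv.Atoi(os.Getenv("WORKER_COUNT")); err == nil {
        workers = v
    }
    return Config{
        Port:        getEnv("SERVER_PORT", ":8080"),
        WorkerCount: workers,
        Environment: getEnv("ENVIRONMENT", "development"),
    }
}
```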
Development Mode:

```bash
export SERVER_PORT=:8080
export WORKER_COUNT=8
export QUEUE_SIZE=5000
export RATE_LIMIT=50000
export ENVIRONMENT=development
./server
```

Production Mode:

```bash
export SERVER_PORT=:8080
export WORKER_COUNT=32
export QUEUE_SIZE=50000
export RATE_LIMIT=200000
export ENVIRONMENT=production
export REQUEST_TIMEOUT=60s
export SHUTDOWN_TIMEOUT=60s
./server
```

High-Throughput Mode:

```bash
export WORKER_COUNT=64
export QUEUE_SIZE=100000
export RATE_LIMIT=500000
./server
```

- Throughput: 100,000+ requests/second (with default rate limit)
- Latency: Sub-millisecond average for fast endpoint
- Concurrency: Efficiently handles 500+ concurrent connections
- Scalability: Worker pool architecture enables horizontal scaling
- Memory: Low memory footprint with connection pooling
- CPU: Efficient CPU utilization with worker pools
Fast Endpoint (Direct Processing):
- RPS: 150,000+ req/s
- Avg Latency: <1ms
- P95 Latency: <5ms
- P99 Latency: <10ms
CPU Intensive Endpoint (Worker Pool):
- RPS: 50,000+ req/s
- Avg Latency: 2-5ms
- P95 Latency: <15ms
- P99 Latency: <30ms
Scenario 1: Fast Endpoint
- Total Requests: 1,000,000
- Concurrency: 500
- Success Rate: 99.99%
- Duration: ~21 seconds
Scenario 2: CPU Intensive Endpoint
- Total Requests: 100,000
- Concurrency: 500
- Success Rate: 99.95%
- Duration: ~2 seconds
Efficient concurrent request processing using configurable worker pools:
```go
type WorkerPool struct {
    workers     int
    jobQueue    chan Job
    resultQueue chan JobResult
    ctx         context.Context
    cancel      context.CancelFunc
}
```

Benefits:
- Controlled concurrency prevents resource exhaustion
- Buffered job queue handles traffic bursts
- Graceful degradation when queue is full
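A minimal sketch of the worker loop such a pool runs, assuming a Job type with a Process method (the real implementation lives in cmd/main.go):

```go
// Illustrative worker loop; assumes Job has a Process() JobResult method.
func (wp *WorkerPool) Start() {
    for i := 0; i < wp.workers; i++ {
        go func() {
            for {
                select {
                case <-wp.ctx.Done():
                    return // pool is shutting down
                case job, ok := <-wp.jobQueue:
                    if !ok {
                        return // queue closed, drain complete
                    }
                    wp.resultQueue <- job.Process()
                }
            }
        }()
    }
}
```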
Built-in rate limiting prevents system overload:
```go
type RateLimiter struct {
    rate       int
    bucket     int
    maxBucket  int
    lastRefill time.Time
    mu         sync.Mutex
}
```

Features:
- Configurable requests per second
- Smooth traffic distribution
- Prevents thundering herd problems
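For reference, an Allow method consistent with the struct above might look like this (a sketch of the token bucket technique, not the exact implementation):

```go
// Illustrative Allow(): refill tokens based on elapsed time, then
// spend one token per admitted request.
func (rl *RateLimiter) Allow() bool {
    rl.mu.Lock()
    defer rl.mu.Unlock()

    now := time.Now()
    refill := int(now.Sub(rl.lastRefill).Seconds() * float64(rl.rate))
    if refill > 0 {
        rl.bucket += refill
        if rl.bucket > rl.maxBucket {
            rl.bucket = rl.maxBucket
        }
        rl.lastRefill = now
    }
    if rl.bucket > 0 {
        rl.bucket--
        return true
    }
    return false
}
```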
Configurable job queue for handling burst traffic:
- Buffered channels for job distribution
- Queue depth monitoring
- Backpressure when queue is full
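Backpressure falls out of a non-blocking send on the buffered channel; a sketch of what such a Submit method might look like:

```go
// Illustrative non-blocking submission: when the buffered queue is
// full, the caller gets an error immediately instead of blocking,
// and the handler can return the SERVICE_OVERLOADED response above.
func (wp *WorkerPool) Submit(job Job) error {
    select {
    case wp.jobQueue <- job:
        return nil
    default:
        return errors.New("job queue full")
    }
}
```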
Clean shutdown with configurable timeout:
- Stops accepting new connections
- Waits for in-flight requests
- Shuts down worker pool gracefully
- Logs final statistics
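This follows the standard net/http shutdown pattern; a sketch of the sequence, assuming cfg, router, and a workerPool with a Stop method from the surrounding program:

```go
// Illustrative shutdown sequence inside main.
srv := &http.Server{Addr: cfg.Port, Handler: router}
go func() {
    if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
        log.Fatalf("listen: %v", err)
    }
}()

// Block until SIGINT/SIGTERM.
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
<-quit

// Stop accepting new connections and wait for in-flight requests.
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := srv.Shutdown(ctx); err != nil {
    log.Printf("forced shutdown: %v", err)
}

workerPool.Stop() // assumed method: drain the pool, then log final stats
```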
Built-in health check endpoint:
- Customizable health checks
- Goroutine count monitoring
- Timestamp tracking
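A pluggable checker along these lines might be structured as follows (a sketch; RegisterCheck matches the usage shown in the Development section, Status is an assumed helper):

```go
// Illustrative health checker; RegisterCheck mirrors the usage shown
// later in this README, Status is an assumed helper.
type HealthChecker struct {
    mu     sync.RWMutex
    checks map[string]func() error
}

func (hc *HealthChecker) RegisterCheck(name string, fn func() error) {
    hc.mu.Lock()
    defer hc.mu.Unlock()
    hc.checks[name] = fn
}

func (hc *HealthChecker) Status() map[string]string {
    hc.mu.RLock()
    defer hc.mu.RUnlock()
    status := make(map[string]string, len(hc.checks))
    for name, fn := range hc.checks {
        if err := fn(); err != nil {
            status[name] = err.Error()
        } else {
            status[name] = "healthy"
        }
    }
    return status
}
```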
Real-time performance tracking:
- Request counts (total, success, error, rejected)
- Latency tracking (atomic operations)
- Queue depth monitoring
- Active request tracking
- Success rate calculation
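A sketch of how lock-free counters like these are kept with sync/atomic; the field names mirror the /metrics response, while the layout itself is illustrative:

```go
// Illustrative atomic metrics; fields mirror the /metrics response.
type Metrics struct {
    totalRequests  atomic.Int64
    successCount   atomic.Int64
    errorCount     atomic.Int64
    activeRequests atomic.Int64
    totalLatencyUs atomic.Int64 // totalLatencyUs / totalRequests = avg_latency_us
}

func (m *Metrics) Record(latency time.Duration, success bool) {
    m.totalRequests.Add(1)
    m.totalLatencyUs.Add(latency.Microseconds())
    if success {
        m.successCount.Add(1)
    } else {
        m.errorCount.Add(1)
    }
}
```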
Configurable request timeout middleware:
- Per-request timeout enforcement
- Context cancellation
- Timeout error responses
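A minimal version of such middleware attaches a deadline-aware context that downstream handlers can honor (a sketch, not the exact implementation):

```go
// Illustrative timeout middleware: handlers observe the deadline via
// the request context and return the TIMEOUT error when it expires.
func TimeoutMiddleware(d time.Duration) gin.HandlerFunc {
    return func(c *gin.Context) {
        ctx, cancel := context.WithTimeout(c.Request.Context(), d)
        defer cancel()

        c.Request = c.Request.WithContext(ctx)
        c.Next()
    }
}
```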
Automatic request ID generation:
- UUID-based request IDs
- Request ID propagation through middleware
- Included in all responses and logs
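Generation itself is a small middleware; a sketch assuming github.com/google/uuid (the actual UUID library and response header name are assumptions):

```go
// Illustrative request ID middleware; the UUID library and response
// header name are assumptions.
func RequestIDMiddleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        id := uuid.New().String()
        c.Set("request_id", id) // handlers read it via c.GetString("request_id")
        c.Writer.Header().Set("X-Request-ID", id)
        c.Next()
    }
}
```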
JSON-formatted structured logging:
- Configurable log levels
- Request/response logging
- Performance metrics in logs
- Production-ready format
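With log/slog, switching between the two modes is a handler choice; a sketch consistent with the behavior described below:

```go
// Illustrative logger construction: JSON in production, readable
// text with debug detail in development.
func newLogger(environment string) *slog.Logger {
    if environment == "production" {
        return slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
            Level: slog.LevelInfo,
        }))
    }
    return slog.New(slog.NewTextHandler(os.Stdout, &slog.HandlerOptions{
        Level: slog.LevelDebug,
    }))
}
```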
Monitor server performance in real-time:

```bash
# View metrics
curl http://localhost:8080/metrics | jq

# Watch metrics continuously
watch -n 1 'curl -s http://localhost:8080/metrics | jq'
```

- Success Rate: Should stay above 99.9%
- Average Latency: Monitor for degradation
- Queue Depth: Should not consistently max out
- Active Requests: Indicates current load
- Rejected Requests: Rate limit effectiveness
Development Mode:
- Human-readable console output
- Detailed request/response logging
- Debug information included
Production Mode:
- JSON-formatted structured logs
- Optimized log levels
- Request ID tracing
Example log entry:
```json
{
  "time": "2025-01-11T10:30:00Z",
  "level": "INFO",
  "msg": "request completed",
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "method": "POST",
  "path": "/api/v1/compute/fast",
  "status": 200,
  "latency_ms": 1.2,
  "client_ip": "192.168.1.100"
}
```

Implement custom health checks:
```go
server.healthCheck.RegisterCheck("database", func() error {
    // Check database connection
    return db.Ping()
})

server.healthCheck.RegisterCheck("cache", func() error {
    // Check cache connection
    return cache.Ping()
})
```

- Create a handler function:
```go
func (s *Server) HandleNewEndpoint(c *gin.Context) {
    requestID := c.GetString("request_id")

    // Your logic here

    c.JSON(http.StatusOK, Response{
        RequestID: requestID,
        Result:    result,
        Timestamp: time.Now().Unix(),
    })
}
```

- Register the route in `SetupRouter`:
```go
api := router.Group("/api/v1")
{
    api.POST("/your/endpoint", server.HandleNewEndpoint)
}
```

Add custom middleware:

```go
func CustomMiddleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        // Before request
        start := time.Now()

        c.Next()

        // After request
        latency := time.Since(start)
        log.Printf("Request took %v", latency)
    }
}

// Register middleware
router.Use(CustomMiddleware())
```

Register a custom health check:

```go
server.healthCheck.RegisterCheck("custom_check", func() error {
    if isUnhealthy() {
        return fmt.Errorf("custom check failed: %v", reason)
    }
    return nil
})
```

Format code:

```bash
# Format all Go files
go fmt ./...

# Using goimports
go install golang.org/x/tools/cmd/goimports@latest
goimports -w .
```

Run tests:

```bash
# Run all tests
go test ./...

# Run tests with coverage
go test -cover ./...

# Run tests with race detection
go test -race ./...

# Benchmark tests
go test -bench=. -benchmem
```

Create a Dockerfile:
```dockerfile
# Build stage
FROM golang:1.23-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o server ./cmd/main.go

# Runtime stage
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/server .
EXPOSE 8080
CMD ["./server"]
```

Build and run:
```bash
# Build image
docker build -t scalable-http-server .

# Run container
docker run -p 8080:8080 \
  -e WORKER_COUNT=16 \
  -e QUEUE_SIZE=20000 \
  -e RATE_LIMIT=200000 \
  -e ENVIRONMENT=production \
  scalable-http-server
```

Create a docker-compose.yml:
```yaml
version: '3.8'

services:
  server:
    build: .
    ports:
      - "8080:8080"
    environment:
      - SERVER_PORT=:8080
      - WORKER_COUNT=32
      - QUEUE_SIZE=50000
      - RATE_LIMIT=200000
      - ENVIRONMENT=production
      - REQUEST_TIMEOUT=60s
      - SHUTDOWN_TIMEOUT=60s
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
```

Run with Docker Compose:

```bash
docker-compose up -d --build
```

Create deployment.yaml:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scalable-http-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: http-server
  template:
    metadata:
      labels:
        app: http-server
    spec:
      containers:
      - name: server
        image: scalable-http-server:latest
        ports:
        - containerPort: 8080
        env:
        - name: SERVER_PORT
          value: ":8080"
        - name: WORKER_COUNT
          value: "32"
        - name: QUEUE_SIZE
          value: "50000"
        - name: RATE_LIMIT
          value: "200000"
        - name: ENVIRONMENT
          value: "production"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          requests:
            memory: "256Mi"
            cpu: "500m"
          limits:
            memory: "512Mi"
            cpu: "1000m"
---
apiVersion: v1
kind: Service
metadata:
  name: http-server-service
spec:
  selector:
    app: http-server
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: LoadBalancer
```

Deploy to Kubernetes:

```bash
kubectl apply -f deployment.yaml
```
1. Environment Configuration
   - Always set `ENVIRONMENT=production`
   - Use environment-specific configurations
   - Externalize sensitive configuration

2. Resource Tuning
   - Set `WORKER_COUNT` based on CPU cores (2-4x CPU count)
   - Configure `QUEUE_SIZE` based on expected traffic (10x RPS)
   - Adjust `RATE_LIMIT` to prevent overload

3. Monitoring
   - Integrate with Prometheus/Grafana
   - Set up alerts for high error rates
   - Monitor queue depth and latency

4. Load Balancing
   - Run multiple instances behind a load balancer
   - Use health checks for instance management
   - Implement circuit breakers

5. Logging
   - Use centralized logging (ELK, Splunk)
   - Implement log rotation
   - Enable structured JSON logging

6. Security
   - Use HTTPS/TLS in production
   - Implement authentication/authorization
   - Enable CORS with proper configuration
   - Add security headers middleware
- High-performance HTTP server
- Worker pool architecture
- Token bucket rate limiting
- Comprehensive metrics
- Graceful shutdown
- Health check system
- Request tracing
- Structured logging
- Load testing client
- Prometheus metrics integration
- OpenTelemetry tracing
- Circuit breaker pattern
- Request caching layer
- WebSocket support
- gRPC endpoints
- Database connection pooling
- Redis integration
- API authentication (JWT)
- API versioning
- OpenAPI/Swagger documentation
- Distributed tracing (Jaeger)
- Performance profiling endpoints
- Horizontal pod autoscaling (HPA)
- Service mesh integration (Istio)
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Write tests for new features
- Ensure all tests pass (`go test ./...`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request

- Follow Go code style guidelines (`gofmt`, `golint`)
- Maintain or improve test coverage
- Update documentation for new features
- Add examples for new endpoints
- Write clear commit messages
- Include performance benchmarks for optimizations
- Code follows Go best practices
- Tests added and passing
- Documentation updated
- No performance regressions
- Error handling implemented
- Logging added where appropriate
- Metrics updated if needed
This project is licensed under the MIT License - see the LICENSE file for details.