Production-ready, high-performance HTTP server built in Go, designed to handle 1M+ requests with optimal concurrency. Features worker pools, rate limiting, comprehensive metrics, and advanced observability.
- Introduction
- Architecture
- Getting Started
- API Endpoints
- Load Testing
- Project Structure
- Technologies Used
- Configuration
- Performance Characteristics
- Key Features
- Monitoring & Observability
- Development
- Deployment
- Roadmap
- Contributing
- License
This repository contains a production-ready, highly scalable HTTP server built with Golang. The server demonstrates modern concurrency patterns, efficient resource management, and comprehensive observability features. It's designed to handle extreme loads (1M+ requests) while maintaining low latency and high reliability.
The project includes both a high-performance server and a sophisticated load testing client for benchmarking and validation.
```
┌────────────────────────────────────────────────┐
│                HTTP Server (Gin)               │
└───────────────────────┬────────────────────────┘
                        │
┌───────────────────────▼────────────────────────┐
│                Middleware Stack                │
│   ┌────────────────────────────────────────┐   │
│   │          Request ID Generator          │   │
│   │           Structured Logger            │   │
│   │           Metrics Collector            │   │
│   │       Token Bucket Rate Limiter        │   │
│   │        Request Timeout Handler         │   │
│   └────────────────────────────────────────┘   │
└───────────────────────┬────────────────────────┘
                        │
┌───────────────────────▼────────────────────────┐
│                 Route Handlers                 │
│  ┌──────────────┐    ┌──────────────────┐      │
│  │ Fast Compute │    │ Intensive Compute│      │
│  │   (Direct)   │    │  (Worker Pool)   │      │
│  └──────────────┘    └────────┬─────────┘      │
└───────────────────────────────┼────────────────┘
                                │
                   ┌────────────▼──────────────┐
                   │        Worker Pool        │
                   │  ┌─────────────────────┐  │
                   │  │      Job Queue      │  │
                   │  │   (Buffered Chan)   │  │
                   │  └──────────┬──────────┘  │
                   │             │             │
                   │  ┌──────────▼──────────┐  │
                   │  │  Worker Goroutines  │  │
                   │  │   (Configurable)    │  │
                   │  └──────────┬──────────┘  │
                   │             │             │
                   │  ┌──────────▼──────────┐  │
                   │  │   Results Channel   │  │
                   │  └─────────────────────┘  │
                   └───────────────────────────┘
```
- Gin Router: High-performance HTTP router and middleware framework
- Worker Pool: Efficient concurrent job processing with configurable workers
- Rate Limiter: Token bucket algorithm preventing system overload
- Metrics System: Atomic counters tracking performance in real-time
- Health Checker: Pluggable health check system for monitoring
- Graceful Shutdown: Clean shutdown with configurable timeout
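To make the request flow concrete, here is a minimal sketch of how such a stack is typically layered in Gin. The middleware names are illustrative placeholders, not the exact identifiers in cmd/main.go:

```go
// Illustrative middleware layering; names are placeholders, not the
// exact identifiers used in cmd/main.go.
router := gin.New()
router.Use(
    RequestIDMiddleware(),             // attach a UUID to every request
    LoggingMiddleware(),               // structured request/response logs
    MetricsMiddleware(),               // atomic performance counters
    RateLimitMiddleware(),             // token bucket admission control
    TimeoutMiddleware(30*time.Second), // per-request deadline
)
```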
Before you begin, ensure you have the following installed:
- Go 1.23 or later
- Git
- curl (for verifying the server)

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd bootcamp-web-http
   ```

2. Install dependencies:

   ```bash
   go mod download
   ```

3. Build the server:

   ```bash
   go build -o server cmd/main.go
   ```

4. Build the load testing client:

   ```bash
   go build -o client client/client.go
   ```

5. Start the server with default configuration:

   ```bash
   ./server
   ```

6. Start with custom configuration:

   ```bash
   export SERVER_PORT=:3000
   export WORKER_COUNT=16
   export QUEUE_SIZE=20000
   export RATE_LIMIT=200000
   export ENVIRONMENT=production
   ./server
   ```

7. Verify the server is running:

   ```bash
   curl http://localhost:8080/health
   ```
Expected output:
```json
{
  "status": {
    "worker_pool": "healthy"
  },
  "timestamp": 1704974400,
  "goroutines": 42,
  "version": "1.0.0"
}
```

```
GET /health
```

Returns the current health status of the server.
Response:
```json
{
  "status": {
    "worker_pool": "healthy"
  },
  "timestamp": 1704974400,
  "goroutines": 42,
  "version": "1.0.0"
}
```

Example:

```bash
curl http://localhost:8080/health
```

```
GET /metrics
# or
GET /api/v1/stats
```

Returns comprehensive server performance metrics.
Response:
```json
{
  "total_requests": 1543289,
  "success_count": 1542100,
  "error_count": 1189,
  "active_requests": 234,
  "rejected_count": 0,
  "success_rate": 99.92,
  "avg_latency_us": 125,
  "queue_depth": 45,
  "worker_count": 16,
  "goroutines": 42,
  "timestamp": 1704974400
}
```

Metrics Explained:

- `total_requests`: Total requests received since startup
- `success_count`: Successfully processed requests (2xx responses)
- `error_count`: Failed requests (4xx/5xx responses)
- `rejected_count`: Requests rejected by rate limiter (429)
- `active_requests`: Currently processing requests
- `avg_latency_us`: Average request latency in microseconds
- `queue_depth`: Current number of jobs in the worker pool queue
- `success_rate`: Success percentage (success_count / total_requests * 100)
- `worker_count`: Number of active worker goroutines
- `goroutines`: Current number of goroutines
- `timestamp`: Unix timestamp of the metrics snapshot
Example:
```bash
curl http://localhost:8080/metrics | jq
```

```
POST /api/v1/compute/fast
Content-Type: application/json

{
  "number": 42
}
```

Performs direct computation (number²) without using the worker pool. Ideal for lightweight operations.
Request Body:
```json
{
  "number": integer (required)
}
```

Response (Success - 200 OK):
```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "job_id": 0,
  "result": 1764,
  "latency_us": 45,
  "timestamp": 1704974400
}
```

Response Fields:

- `request_id`: Unique identifier for request tracing
- `job_id`: Job identifier (0 for direct processing)
- `result`: Computation result (number²)
- `latency_us`: Processing latency in microseconds
- `timestamp`: Unix timestamp
Example:
```bash
curl -X POST http://localhost:8080/api/v1/compute/fast \
  -H "Content-Type: application/json" \
  -d '{"number": 42}'
```

```
POST /api/v1/compute/intensive
Content-Type: application/json

{
  "number": 42
}
```

Performs CPU-intensive computation using the worker pool. Suitable for heavy operations requiring concurrency control.
Request Body:
```json
{
  "number": integer (required)
}
```

Response (Success - 200 OK):
```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440001",
  "job_id": 12345,
  "result": 1764,
  "latency_us": 234,
  "timestamp": 1704974400
}
```

Example:
```bash
curl -X POST http://localhost:8080/api/v1/compute/intensive \
  -H "Content-Type: application/json" \
  -d '{"number": 100}'
```

`INVALID_REQUEST`: Invalid request format or missing required fields.

```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440002",
  "error": "Invalid request format",
  "code": "INVALID_REQUEST",
  "timestamp": 1704974400
}
```

Example:

```bash
# Missing number field
curl -X POST http://localhost:8080/api/v1/compute/fast \
  -H "Content-Type: application/json" \
  -d '{}'
```

`TIMEOUT`: Request exceeded the configured timeout duration.
```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440003",
  "error": "request timeout",
  "code": "TIMEOUT",
  "timestamp": 1704974400
}
```

`RATE_LIMIT_EXCEEDED` (HTTP 429): Rate limit exceeded. The server is protecting itself from overload.
```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440004",
  "error": "Rate limit exceeded",
  "code": "RATE_LIMIT_EXCEEDED",
  "timestamp": 1704974400
}
```

`SERVICE_OVERLOADED`: Worker pool queue is full. The server cannot accept more jobs.
```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440005",
  "error": "job queue full",
  "code": "SERVICE_OVERLOADED",
  "timestamp": 1704974400
}
```

Internal processing error or worker pool error.

Possible error codes:

- `INTERNAL_ERROR`: General internal server error
- `PROCESSING_ERROR`: Error during job processing in the worker pool
Example (INTERNAL_ERROR):
```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440006",
  "error": "internal server error",
  "code": "INTERNAL_ERROR",
  "timestamp": 1704974400
}
```

Example (PROCESSING_ERROR):
```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440007",
  "error": "invalid data type for cpu_intensive job",
  "code": "PROCESSING_ERROR",
  "timestamp": 1704974400
}
```

The included load testing client can simulate high-volume traffic to benchmark server performance.
1. Ensure the server is running:

   ```bash
   ./server
   ```

2. In another terminal, run the client:

   ```bash
   ./client
   ```

Modify the test scenarios in `client/client.go`:
```go
config := &ClientConfig{
    ServerURL:      "http://localhost:8080",
    TotalRequests:  1_000_000,        // Total requests to send
    Concurrency:    500,              // Concurrent connections
    RequestTimeout: 30 * time.Second, // Timeout per request
    TestDuration:   5 * time.Minute,  // Max test duration
    WarmupRequests: 1000,             // Warmup requests
    ReportInterval: 5 * time.Second,  // Progress report interval
}
```

The client runs two test scenarios:
1. Fast Endpoint Load Test
   - 1,000,000 requests
   - 500 concurrent connections
   - Tests direct processing path

2. CPU Intensive Endpoint Load Test
   - 100,000 requests
   - 500 concurrent connections
   - Tests worker pool performance
```
=== Load Testing HTTP Server ===
✓ Server health check passed
✓ Running 1000 warmup requests...
Warmup completed in 2.3s

=== Fast Endpoint Test ===
Sending 1,000,000 requests with 500 concurrent connections...

Progress: 250,000/1,000,000 (25%) | RPS: 45,234 | Errors: 12
Progress: 500,000/1,000,000 (50%) | RPS: 47,891 | Errors: 23
Progress: 750,000/1,000,000 (75%) | RPS: 46,543 | Errors: 31
Progress: 1,000,000/1,000,000 (100%) | RPS: 48,120 | Errors: 45

Results:
  Total Requests: 1,000,000
  Successful: 999,955 (99.99%)
  Failed: 45 (0.01%)
  Duration: 21.2s
  Requests/sec: 47,169
  Avg Latency: 10.5ms
  Min Latency: 1.2ms
  Max Latency: 234.5ms
  Rate Limit Hits: 0
```
The client reports:
- Total requests/second (RPS): Throughput measurement
- Success rate: Percentage of successful requests
- Latency statistics: Average, min, and max latency
- Error breakdown: Detailed error categorization by status code
- Rate limit hits: Number of 429 responses
```
bootcamp-web-http/
├── cmd/
│   └── main.go            # HTTP server implementation
│       ├── Server struct  # Main server configuration
│       ├── WorkerPool     # Worker pool implementation
│       ├── RateLimiter    # Token bucket rate limiter
│       ├── HealthChecker  # Health check system
│       ├── Metrics        # Metrics collection
│       └── Middleware     # HTTP middleware stack
├── client/
│   └── client.go          # Load testing client
│       ├── ClientConfig   # Client configuration
│       ├── LoadTestClient # Load test client implementation
│       └── Metrics        # Client metrics tracking
├── go.mod                 # Go module dependencies
├── go.sum                 # Dependency checksums
└── README.md              # Project documentation
```
- `cmd/main.go`: Complete server implementation with all features
- `client/client.go`: Sophisticated load testing client
- Server struct: Main server configuration and state
- WorkerPool: Concurrent job processing system
- RateLimiter: Token bucket algorithm implementation
- HealthChecker: Pluggable health check system
- Metrics: Atomic performance counters
- Golang: Primary language (1.23+)
- Gin: High-performance HTTP web framework
- sync/atomic: Lock-free atomic operations for metrics
- context: Request cancellation and timeouts
- time: Token bucket rate limiting
- encoding/json: JSON request/response handling
- log/slog: Structured logging
- os/signal: Graceful shutdown handling
| Variable | Description | Default | Required |
|---|---|---|---|
| `SERVER_PORT` | Server listen address | `:8080` | No |
| `WORKER_COUNT` | Number of worker goroutines | CPU_COUNT * 2 | No |
| `QUEUE_SIZE` | Worker pool job queue size | `10000` | No |
| `SHUTDOWN_TIMEOUT` | Graceful shutdown timeout | `30s` | No |
| `REQUEST_TIMEOUT` | Request timeout duration | `30s` | No |
| `RATE_LIMIT` | Requests per second limit | `100000` | No |
| `ENVIRONMENT` | Environment mode | `development` | No |
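The server reads these variables at startup. A minimal sketch of what env-with-default loading can look like (assumes imports of os, runtime, and strconv; the helper and field names are illustrative, not the exact API in cmd/main.go):

```go
// Illustrative configuration loading; helper and field names are
// assumptions, not the exact API of cmd/main.go.
type Config struct {
    Port        string
    WorkerCount int
    Environment string
}

func getEnv(key, fallback string) string {
    if v, ok := os.LookupEnv(key); ok {
        return v
    }
    return fallback
}

func loadConfig() Config {
    workers := runtime.NumCPU() * 2 // default WORKER_COUNT: CPU_COUNT * 2
    if v, err := strconv.Atoi(os.Getenv("WORKER_COUNT")); err == nil {
        workers = v
    }
    return Config{
        Port:        getEnv("SERVER_PORT", ":8080"),
        WorkerCount: workers,
        Environment: getEnv("ENVIRONMENT", "development"),
    }
}
```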
Development Mode:

```bash
export SERVER_PORT=:8080
export WORKER_COUNT=8
export QUEUE_SIZE=5000
export RATE_LIMIT=50000
export ENVIRONMENT=development
./server
```

Production Mode:

```bash
export SERVER_PORT=:8080
export WORKER_COUNT=32
export QUEUE_SIZE=50000
export RATE_LIMIT=200000
export ENVIRONMENT=production
export REQUEST_TIMEOUT=60s
export SHUTDOWN_TIMEOUT=60s
./server
```

High-Throughput Mode:

```bash
export WORKER_COUNT=64
export QUEUE_SIZE=100000
export RATE_LIMIT=500000
./server
```

- Throughput: 100,000+ requests/second (with default rate limit)
- Latency: Sub-millisecond average for fast endpoint
- Concurrency: Efficiently handles 500+ concurrent connections
- Scalability: Worker pool architecture enables horizontal scaling
- Memory: Low memory footprint with connection pooling
- CPU: Efficient CPU utilization with worker pools
Fast Endpoint (Direct Processing):
- RPS: 150,000+ req/s
- Avg Latency: <1ms
- P95 Latency: <5ms
- P99 Latency: <10ms
CPU Intensive Endpoint (Worker Pool):
- RPS: 50,000+ req/s
- Avg Latency: 2-5ms
- P95 Latency: <15ms
- P99 Latency: <30ms
Scenario 1: Fast Endpoint
- Total Requests: 1,000,000
- Concurrency: 500
- Success Rate: 99.99%
- Duration: ~21 seconds
Scenario 2: CPU Intensive Endpoint
- Total Requests: 100,000
- Concurrency: 500
- Success Rate: 99.95%
- Duration: ~2 seconds
Efficient concurrent request processing using configurable worker pools:
```go
type WorkerPool struct {
    workers     int
    jobQueue    chan Job
    resultQueue chan JobResult
    ctx         context.Context
    cancel      context.CancelFunc
}
```

Benefits:
- Controlled concurrency prevents resource exhaustion
- Buffered job queue handles traffic bursts
- Graceful degradation when queue is full
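A minimal sketch of the worker loop such a pool runs, assuming a Job type with a Process method (the real implementation lives in cmd/main.go):

```go
// Illustrative worker loop; assumes Job has a Process() JobResult method.
func (wp *WorkerPool) Start() {
    for i := 0; i < wp.workers; i++ {
        go func() {
            for {
                select {
                case <-wp.ctx.Done():
                    return // pool is shutting down
                case job, ok := <-wp.jobQueue:
                    if !ok {
                        return // queue closed, drain complete
                    }
                    wp.resultQueue <- job.Process()
                }
            }
        }()
    }
}
```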
Built-in rate limiting prevents system overload:
```go
type RateLimiter struct {
    rate       int
    bucket     int
    maxBucket  int
    lastRefill time.Time
    mu         sync.Mutex
}
```

Features:
- Configurable requests per second
- Smooth traffic distribution
- Prevents thundering herd problems
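For reference, an Allow method consistent with the struct above might look like this (a sketch of the token bucket technique, not the exact implementation):

```go
// Illustrative Allow(): refill tokens based on elapsed time, then
// spend one token per admitted request.
func (rl *RateLimiter) Allow() bool {
    rl.mu.Lock()
    defer rl.mu.Unlock()

    now := time.Now()
    refill := int(now.Sub(rl.lastRefill).Seconds() * float64(rl.rate))
    if refill > 0 {
        rl.bucket += refill
        if rl.bucket > rl.maxBucket {
            rl.bucket = rl.maxBucket
        }
        rl.lastRefill = now
    }
    if rl.bucket > 0 {
        rl.bucket--
        return true
    }
    return false
}
```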
Configurable job queue for handling burst traffic:
- Buffered channels for job distribution
- Queue depth monitoring
- Backpressure when queue is full
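Backpressure falls out of a non-blocking send on the buffered channel; a sketch of what such a Submit method might look like:

```go
// Illustrative non-blocking submission: when the buffered queue is
// full, the caller gets an error immediately instead of blocking,
// and the handler can return the SERVICE_OVERLOADED response above.
func (wp *WorkerPool) Submit(job Job) error {
    select {
    case wp.jobQueue <- job:
        return nil
    default:
        return errors.New("job queue full")
    }
}
```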
Clean shutdown with configurable timeout:
- Stops accepting new connections
- Waits for in-flight requests
- Shuts down worker pool gracefully
- Logs final statistics
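This follows the standard net/http shutdown pattern; a sketch of the sequence, assuming cfg, router, and a workerPool with a Stop method from the surrounding program:

```go
// Illustrative shutdown sequence inside main.
srv := &http.Server{Addr: cfg.Port, Handler: router}
go func() {
    if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
        log.Fatalf("listen: %v", err)
    }
}()

// Block until SIGINT/SIGTERM.
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
<-quit

// Stop accepting new connections and wait for in-flight requests.
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := srv.Shutdown(ctx); err != nil {
    log.Printf("forced shutdown: %v", err)
}

workerPool.Stop() // assumed method: drain the pool, then log final stats
```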
Built-in health check endpoint:
- Customizable health checks
- Goroutine count monitoring
- Timestamp tracking
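A pluggable checker along these lines might be structured as follows (a sketch; RegisterCheck matches the usage shown in the Development section, Status is an assumed helper):

```go
// Illustrative health checker; RegisterCheck mirrors the usage shown
// later in this README, Status is an assumed helper.
type HealthChecker struct {
    mu     sync.RWMutex
    checks map[string]func() error
}

func (hc *HealthChecker) RegisterCheck(name string, fn func() error) {
    hc.mu.Lock()
    defer hc.mu.Unlock()
    hc.checks[name] = fn
}

func (hc *HealthChecker) Status() map[string]string {
    hc.mu.RLock()
    defer hc.mu.RUnlock()
    status := make(map[string]string, len(hc.checks))
    for name, fn := range hc.checks {
        if err := fn(); err != nil {
            status[name] = err.Error()
        } else {
            status[name] = "healthy"
        }
    }
    return status
}
```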
Real-time performance tracking:
- Request counts (total, success, error, rejected)
- Latency tracking (atomic operations)
- Queue depth monitoring
- Active request tracking
- Success rate calculation
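A sketch of how lock-free counters like these are kept with sync/atomic; the field names mirror the /metrics response, while the layout itself is illustrative:

```go
// Illustrative atomic metrics; fields mirror the /metrics response.
type Metrics struct {
    totalRequests  atomic.Int64
    successCount   atomic.Int64
    errorCount     atomic.Int64
    activeRequests atomic.Int64
    totalLatencyUs atomic.Int64 // totalLatencyUs / totalRequests = avg_latency_us
}

func (m *Metrics) Record(latency time.Duration, success bool) {
    m.totalRequests.Add(1)
    m.totalLatencyUs.Add(latency.Microseconds())
    if success {
        m.successCount.Add(1)
    } else {
        m.errorCount.Add(1)
    }
}
```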
Configurable request timeout middleware:
- Per-request timeout enforcement
- Context cancellation
- Timeout error responses
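A minimal version of such middleware attaches a deadline-aware context that downstream handlers can honor (a sketch, not the exact implementation):

```go
// Illustrative timeout middleware: handlers observe the deadline via
// the request context and return the TIMEOUT error when it expires.
func TimeoutMiddleware(d time.Duration) gin.HandlerFunc {
    return func(c *gin.Context) {
        ctx, cancel := context.WithTimeout(c.Request.Context(), d)
        defer cancel()

        c.Request = c.Request.WithContext(ctx)
        c.Next()
    }
}
```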
Automatic request ID generation:
- UUID-based request IDs
- Request ID propagation through middleware
- Included in all responses and logs
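Generation itself is a small middleware; a sketch assuming github.com/google/uuid (the actual UUID library and response header name are assumptions):

```go
// Illustrative request ID middleware; the UUID library and response
// header name are assumptions.
func RequestIDMiddleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        id := uuid.New().String()
        c.Set("request_id", id) // handlers read it via c.GetString("request_id")
        c.Writer.Header().Set("X-Request-ID", id)
        c.Next()
    }
}
```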
JSON-formatted structured logging:
- Configurable log levels
- Request/response logging
- Performance metrics in logs
- Production-ready format
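With log/slog, switching between the two modes is a handler choice; a sketch consistent with the behavior described below:

```go
// Illustrative logger construction: JSON in production, readable
// text with debug detail in development.
func newLogger(environment string) *slog.Logger {
    if environment == "production" {
        return slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
            Level: slog.LevelInfo,
        }))
    }
    return slog.New(slog.NewTextHandler(os.Stdout, &slog.HandlerOptions{
        Level: slog.LevelDebug,
    }))
}
```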
Monitor server performance in real-time:

```bash
# View metrics
curl http://localhost:8080/metrics | jq

# Watch metrics continuously
watch -n 1 'curl -s http://localhost:8080/metrics | jq'
```

- Success Rate: Should stay above 99.9%
- Average Latency: Monitor for degradation
- Queue Depth: Should not consistently max out
- Active Requests: Indicates current load
- Rejected Requests: Rate limit effectiveness
Development Mode:
- Human-readable console output
- Detailed request/response logging
- Debug information included
Production Mode:
- JSON-formatted structured logs
- Optimized log levels
- Request ID tracing
Example log entry:
```json
{
  "time": "2025-01-11T10:30:00Z",
  "level": "INFO",
  "msg": "request completed",
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "method": "POST",
  "path": "/api/v1/compute/fast",
  "status": 200,
  "latency_ms": 1.2,
  "client_ip": "192.168.1.100"
}
```

Implement custom health checks:
```go
server.healthCheck.RegisterCheck("database", func() error {
    // Check database connection
    return db.Ping()
})

server.healthCheck.RegisterCheck("cache", func() error {
    // Check cache connection
    return cache.Ping()
})
```

- Create a handler function:
```go
func (s *Server) HandleNewEndpoint(c *gin.Context) {
    requestID := c.GetString("request_id")

    // Your logic here

    c.JSON(http.StatusOK, Response{
        RequestID: requestID,
        Result:    result,
        Timestamp: time.Now().Unix(),
    })
}
```

- Register the route in `SetupRouter`:
```go
api := router.Group("/api/v1")
{
    api.POST("/your/endpoint", server.HandleNewEndpoint)
}
```

Add custom middleware:

```go
func CustomMiddleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        // Before request
        start := time.Now()

        c.Next()

        // After request
        latency := time.Since(start)
        log.Printf("Request took %v", latency)
    }
}

// Register middleware
router.Use(CustomMiddleware())
```

Register a custom health check:

```go
server.healthCheck.RegisterCheck("custom_check", func() error {
    if isUnhealthy() {
        return fmt.Errorf("custom check failed: %v", reason)
    }
    return nil
})
```

Format code:

```bash
# Format all Go files
go fmt ./...

# Using goimports
go install golang.org/x/tools/cmd/goimports@latest
goimports -w .
```

Run tests:

```bash
# Run all tests
go test ./...

# Run tests with coverage
go test -cover ./...

# Run tests with race detection
go test -race ./...

# Benchmark tests
go test -bench=. -benchmem
```

Create a Dockerfile:
```dockerfile
# Build stage
FROM golang:1.23-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o server ./cmd/main.go

# Runtime stage
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/server .
EXPOSE 8080
CMD ["./server"]
```

Build and run:
```bash
# Build image
docker build -t scalable-http-server .

# Run container
docker run -p 8080:8080 \
  -e WORKER_COUNT=16 \
  -e QUEUE_SIZE=20000 \
  -e RATE_LIMIT=200000 \
  -e ENVIRONMENT=production \
  scalable-http-server
```

Create a docker-compose.yml:
```yaml
version: '3.8'

services:
  server:
    build: .
    ports:
      - "8080:8080"
    environment:
      - SERVER_PORT=:8080
      - WORKER_COUNT=32
      - QUEUE_SIZE=50000
      - RATE_LIMIT=200000
      - ENVIRONMENT=production
      - REQUEST_TIMEOUT=60s
      - SHUTDOWN_TIMEOUT=60s
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
```

Run with Docker Compose:

```bash
docker-compose up -d --build
```

Create deployment.yaml:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scalable-http-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: http-server
  template:
    metadata:
      labels:
        app: http-server
    spec:
      containers:
      - name: server
        image: scalable-http-server:latest
        ports:
        - containerPort: 8080
        env:
        - name: SERVER_PORT
          value: ":8080"
        - name: WORKER_COUNT
          value: "32"
        - name: QUEUE_SIZE
          value: "50000"
        - name: RATE_LIMIT
          value: "200000"
        - name: ENVIRONMENT
          value: "production"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          requests:
            memory: "256Mi"
            cpu: "500m"
          limits:
            memory: "512Mi"
            cpu: "1000m"
---
apiVersion: v1
kind: Service
metadata:
  name: http-server-service
spec:
  selector:
    app: http-server
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: LoadBalancer
```

Deploy to Kubernetes:

```bash
kubectl apply -f deployment.yaml
```
1. Environment Configuration
   - Always set `ENVIRONMENT=production`
   - Use environment-specific configurations
   - Externalize sensitive configuration

2. Resource Tuning
   - Set `WORKER_COUNT` based on CPU cores (2-4x CPU count)
   - Configure `QUEUE_SIZE` based on expected traffic (10x RPS)
   - Adjust `RATE_LIMIT` to prevent overload

3. Monitoring
   - Integrate with Prometheus/Grafana
   - Set up alerts for high error rates
   - Monitor queue depth and latency

4. Load Balancing
   - Run multiple instances behind a load balancer
   - Use health checks for instance management
   - Implement circuit breakers

5. Logging
   - Use centralized logging (ELK, Splunk)
   - Implement log rotation
   - Enable structured JSON logging

6. Security
   - Use HTTPS/TLS in production
   - Implement authentication/authorization
   - Enable CORS with proper configuration
   - Add security headers middleware
- High-performance HTTP server
- Worker pool architecture
- Token bucket rate limiting
- Comprehensive metrics
- Graceful shutdown
- Health check system
- Request tracing
- Structured logging
- Load testing client
- Prometheus metrics integration
- OpenTelemetry tracing
- Circuit breaker pattern
- Request caching layer
- WebSocket support
- gRPC endpoints
- Database connection pooling
- Redis integration
- API authentication (JWT)
- API versioning
- OpenAPI/Swagger documentation
- Distributed tracing (Jaeger)
- Performance profiling endpoints
- Horizontal pod autoscaling (HPA)
- Service mesh integration (Istio)
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Write tests for new features
- Ensure all tests pass (`go test ./...`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request

- Follow Go code style guidelines (`gofmt`, `golint`)
- Maintain or improve test coverage
- Update documentation for new features
- Add examples for new endpoints
- Write clear commit messages
- Include performance benchmarks for optimizations
- Code follows Go best practices
- Tests added and passing
- Documentation updated
- No performance regressions
- Error handling implemented
- Logging added where appropriate
- Metrics updated if needed
This project is licensed under the MIT License - see the LICENSE file for details.