A distributed multi-node key-value store with adaptive quorum-based replication. ACP automatically adjusts read (R) and write (W) quorum parameters in real-time based on cluster health metrics including latency, availability, variance, and error rates.
- Adaptive Quorum System: Dynamic R/W adjustment based on Consistency Confidence Score (CCS)
- CCS Computation: Weighted health score over RTT, availability, variance, and error rates that drives quorum adjustments
- Moving Average Smoothing: 10-sample sliding window to prevent oscillation (see the sketch below)
- Hysteresis Mechanism: 5-second lockout period between adjustments (see the sketch below)
- Real-time Monitoring: Prometheus metrics for all adaptive quorum parameters
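The smoothing and hysteresis mechanics are small enough to sketch. Below is a minimal Go illustration of a 10-sample moving average and a 5-second lockout; the type and method names are hypothetical, not this repo's actual code.

```go
package adaptive

import "time"

// movingAverage keeps the last n CCS samples and averages them,
// damping single-sample spikes that would otherwise flip R/W repeatedly.
type movingAverage struct {
	samples []float64
	size    int // 10 in this project
}

func (m *movingAverage) add(v float64) float64 {
	m.samples = append(m.samples, v)
	if len(m.samples) > m.size {
		m.samples = m.samples[1:] // evict the oldest sample
	}
	sum := 0.0
	for _, s := range m.samples {
		sum += s
	}
	return sum / float64(len(m.samples))
}

// hysteresis refuses a new adjustment until the lockout has elapsed.
type hysteresis struct {
	last    time.Time
	lockout time.Duration // 5s in this project
}

func (h *hysteresis) allow(now time.Time) bool {
	if now.Sub(h.last) < h.lockout {
		return false // still inside the lockout window
	}
	h.last = now
	return true
}
```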
- Clone the repo:
  git clone https://github.com/rachitkumar/acp-kv.git
  cd acp-kv
- Generate gRPC code:
  make proto
- Start the cluster:
  make docker-up
- Access services:
  - Node 1 gRPC: localhost:8080
  - Node 2 gRPC: localhost:8081
  - Node 3 gRPC: localhost:8082
  - Prometheus: http://localhost:9093
  - Prometheus connection URL (inside the Docker network): http://prometheus:9090
  - Grafana: http://localhost:3000 (admin/admin)
- Build the Docker image:
  make build
  docker build -t acp-kv:latest .
- Deploy to Kubernetes:
  kubectl apply -f k8s/service.yaml
  kubectl apply -f k8s/headless-service.yaml
  kubectl apply -f k8s/statefulset.yaml
- Verify deployment:
  kubectl get pods -l app=acp-node
  kubectl logs acp-node-0 --tail=50
- Access services:
  # port-forward gRPC endpoint
  kubectl port-forward acp-node-0 8080:8080
  # port-forward Prometheus metrics
  kubectl port-forward acp-node-0 9090:9090
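With a port-forward in place you can smoke-test a node over gRPC. A minimal Go sketch using the standard grpc_health_v1 health-check service; whether the ACP nodes actually register that service is an assumption here.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// Dial the port-forwarded node from the commands above.
	conn, err := grpc.Dial("localhost:8080",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// An empty service name asks about the server as a whole.
	resp, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{})
	if err != nil {
		log.Fatalf("health check: %v", err)
	}
	fmt.Println("status:", resp.GetStatus()) // expect SERVING on a healthy node
}
```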
Run unit tests:
go test ./internal/storage -v
go test ./internal/replication -v
go test ./internal/adaptive -v

Run integration tests (requires a running cluster):
go test ./test -v

# watch adjuster logs in real-time
kubectl logs -f acp-node-0 | grep -E "(adjuster|ccs|quorum)"
# check prometheus metrics
kubectl port-forward acp-node-0 9090:9090
curl http://localhost:9090/metrics | grep acp_ccs
curl http://localhost:9090/metrics | grep acp_current
curl http://localhost:9090/metrics | grep acp_quorum_adjustments

Expected output shows:
- acp_ccs_raw: Raw consistency confidence score (0.0-1.0)
- acp_ccs_smoothed: 10-sample moving average of CCS
- acp_ccs_component_rtt: RTT health component (1.0 = healthy)
- acp_ccs_component_avail: Availability health (success rate)
- acp_ccs_component_var: Variance health (1.0 = low variance)
- acp_ccs_component_error: Error health (1.0 = no errors)
- acp_current_r: Current read quorum size
- acp_current_w: Current write quorum size
- acp_quorum_adjustments_total: Total number of adjustments
- acp_quorum_adjustment_reason_total{reason="tighten"}: Tighten adjustments
- acp_quorum_adjustment_reason_total{reason="relax"}: Relax adjustments
- acp_hysteresis_active: Whether in lockout period (0 or 1)
In healthy cluster conditions (all nodes up, low latency):
# check CCS is high (close to 1.0)
kubectl logs acp-node-0 --tail=10 | grep smoothed_ccs

Expected: "smoothed_ccs":1.0 or close to 1.0
When CCS > 0.80 (tighten threshold), the system should tighten (increase W, decrease R):
# watch for tightening
kubectl logs -f acp-node-0 | grep "tighten"

Expected logs:
"msg":"ccs above tighten threshold","smoothed_ccs":0.82,"threshold":0.8
"msg":"quorum adjusted","old_r":2,"new_r":1,"old_w":2,"new_w":3,"reason":"tighten"
To test relaxation when CCS < 0.75 (relax threshold), introduce network latency:
# inject 300ms latency on pod network interface
kubectl exec -it acp-node-1 -- tc qdisc add dev eth0 root netem delay 300ms
# watch for relaxation
kubectl logs -f acp-node-0 | grep "relax"

Expected logs:
"msg":"ccs below relax threshold","smoothed_ccs":0.72,"threshold":0.75
"msg":"quorum adjusted","old_r":1,"new_r":2,"old_w":3,"new_w":2,"reason":"relax"
Remove latency after testing:
kubectl exec -it acp-node-1 -- tc qdisc del dev eth0 root

The system should reject adjustments that violate bounds:
# watch for bound violations
kubectl logs -f acp-node-0 | grep "adjustment rejected"

Expected when R hits minR=1:
"msg":"adjustment rejected by validation","attempted_r":0,"attempted_w":4,"error":"r=0 outside bounds [1, 5]"
The system enforces a 5-second lockout between adjustments:
# check adjustment timestamps
kubectl logs acp-node-0 | grep "quorum adjusted" | tail -5

Adjustments should be at least 5 seconds apart.
Configuration is done via environment variables.

Core node settings:
| Variable | Description | Default |
|---|---|---|
| NODE_ID | Unique node identifier | node1 |
| LISTEN_ADDR | gRPC server address | :8080 |
| METRICS_ADDR | Metrics server address | :9090 |
| PEERS | Comma-separated peer addresses | "" |
| QUORUM_R | Initial read quorum size | 2 |
| QUORUM_W | Initial write quorum size | 2 |
| REPLICATION_TIMEOUT | Replication timeout | 500ms |
| HEALTH_PROBE_INTERVAL | Health check interval | 500ms |
Kubernetes discovery settings:

| Variable | Description | Default |
|---|---|---|
| HEADLESS_SERVICE | Headless service name for discovery | "" |
| NAMESPACE | Kubernetes namespace | default |
| CLUSTER_SIZE | Expected cluster size | 3 |
Adaptive quorum settings:

| Variable | Description | Default |
|---|---|---|
| ADAPTIVE_ENABLED | Enable adaptive quorum system | false |
| MIN_R | Minimum read quorum size | 1 |
| MAX_R | Maximum read quorum size | 5 |
| MIN_W | Minimum write quorum size | 2 |
| MAX_W | Maximum write quorum size | 5 |
| ADAPTIVE_INTERVAL | CCS computation and adjustment interval | 2s |
| CCS_RELAX_THRESHOLD | CCS threshold to relax (decrease W) | 0.45 |
| CCS_TIGHTEN_THRESHOLD | CCS threshold to tighten (increase W) | 0.75 |
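For orientation, here is a hedged sketch of how these variables might be read into a struct at startup, applying the documented defaults; the struct and helper names are illustrative, not the project's actual loader.

```go
package config

import (
	"os"
	"strconv"
	"time"
)

// AdaptiveConfig mirrors the adaptive-quorum variables in the table above.
type AdaptiveConfig struct {
	Enabled          bool
	MinR, MaxR       int
	MinW, MaxW       int
	Interval         time.Duration
	RelaxThreshold   float64
	TightenThreshold float64
}

func getInt(key string, def int) int {
	if v := os.Getenv(key); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return def
}

func getFloat(key string, def float64) float64 {
	if v := os.Getenv(key); v != "" {
		if f, err := strconv.ParseFloat(v, 64); err == nil {
			return f
		}
	}
	return def
}

// Load falls back to the defaults listed in the table when a variable is unset.
func Load() AdaptiveConfig {
	interval := 2 * time.Second
	if v := os.Getenv("ADAPTIVE_INTERVAL"); v != "" {
		if d, err := time.ParseDuration(v); err == nil {
			interval = d
		}
	}
	return AdaptiveConfig{
		Enabled:          os.Getenv("ADAPTIVE_ENABLED") == "true",
		MinR:             getInt("MIN_R", 1),
		MaxR:             getInt("MAX_R", 5),
		MinW:             getInt("MIN_W", 2),
		MaxW:             getInt("MAX_W", 5),
		Interval:         interval,
		RelaxThreshold:   getFloat("CCS_RELAX_THRESHOLD", 0.45),
		TightenThreshold: getFloat("CCS_TIGHTEN_THRESHOLD", 0.75),
	}
}
```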
The Consistency Confidence Score (CCS) is computed as:
CCS = α×RTTHealth + β×AvailHealth + γ×VarHealth + δ×ErrorHealth + ε×ClockHealth
Where:
- α = 0.20 (RTT weight)
- β = 0.40 (Availability weight - critical for system health)
- γ = 0.15 (Variance weight)
- δ = 0.15 (Error weight)
- ε = 0.10 (Clock drift weight)
Component formulas:
- RTTHealth = 1 - min(AvgRTT/200ms, 1)
- AvailHealth = Write success rate (0.0-1.0)
- VarHealth = 1 - min(Variance/50ms², 1)
- ErrorHealth = 1 - Error rate (0.0-1.0)
- ClockHealth = 1 - min(ClockDrift/100ms, 1)
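The formula translates directly to code. A minimal Go sketch using the weights and component formulas above; the HealthSample struct is a hypothetical bundle of the raw signals, not a type from this repo.

```go
package adaptive

import "time"

// HealthSample bundles the raw signals the components are derived from.
type HealthSample struct {
	AvgRTT     time.Duration
	WriteOK    float64 // write success rate, 0.0-1.0
	VarianceMs float64 // RTT variance in ms²
	ErrorRate  float64 // 0.0-1.0
	ClockDrift time.Duration
}

func clamp01(x float64) float64 {
	if x < 0 {
		return 0
	}
	if x > 1 {
		return 1
	}
	return x
}

// CCS applies the weighted sum defined above.
func CCS(s HealthSample) float64 {
	rtt := 1 - clamp01(float64(s.AvgRTT.Milliseconds())/200)
	avail := clamp01(s.WriteOK)
	variance := 1 - clamp01(s.VarianceMs/50)
	errs := 1 - clamp01(s.ErrorRate)
	clock := 1 - clamp01(float64(s.ClockDrift.Milliseconds())/100)
	return 0.20*rtt + 0.40*avail + 0.15*variance + 0.15*errs + 0.10*clock
}
```

Worked example: with AvgRTT = 100ms, a 95% write success rate, 25ms² variance, a 5% error rate, and 10ms clock drift, CCS = 0.20×0.5 + 0.40×0.95 + 0.15×0.5 + 0.15×0.95 + 0.10×0.9 = 0.7875.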
- Tighten (CCS > upper_bound): W++, R-- (favors consistency when the cluster is healthy; see the sketch after this list)
- Relax (CCS < lower_bound): W--, R++ (favors availability when the cluster is degraded)
- Stable (lower_bound ≤ CCS ≤ upper_bound): No adjustment
- Hysteresis: 5-second lockout between adjustments
- Bounds: minR ≤ R ≤ maxR, minW ≤ W ≤ maxW
- Invariant: R + W > N (quorum intersection for strong consistency)
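Putting these rules together, one adjustment tick might look like the following Go sketch; the function name and signature are assumptions for illustration, not the project's actual adjuster.

```go
package adaptive

// decide proposes new (R, W) values from the smoothed CCS and validates them.
// lower/upper are the relax/tighten thresholds; n is the cluster size.
func decide(r, w int, ccs, lower, upper float64,
	minR, maxR, minW, maxW, n int) (int, int) {
	newR, newW := r, w
	switch {
	case ccs > upper: // healthy cluster: favor consistency
		newW, newR = w+1, r-1
	case ccs < lower: // degraded cluster: favor availability
		newW, newR = w-1, r+1
	default: // stable band: leave the quorums alone
		return r, w
	}
	// Reject anything outside bounds or breaking quorum intersection.
	if newR < minR || newR > maxR || newW < minW || newW > maxW || newR+newW <= n {
		return r, w
	}
	return newR, newW
}
```

Note that both moves shift one vote between R and W, so R + W is preserved and the intersection invariant keeps holding once it holds initially; the explicit check guards the bounds cases.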
Operation metrics:

| Metric Name | Type | Description |
|---|---|---|
| `acp_puts_total` | Counter | Total PUT requests |
| `acp_gets_total` | Counter | Total GET requests |
| `acp_writes_success_total` | Counter | Successful write operations |
| `acp_writes_failure_total` | Counter | Failed write operations |
| `acp_reads_success_total` | Counter | Successful read operations |
| `acp_reads_failure_total` | Counter | Failed read operations |
| `acp_put_latency_seconds` | Histogram | PUT operation latency |
| `acp_get_latency_seconds` | Histogram | GET operation latency |
| `acp_replicate_acks_total` | Counter | Replication acknowledgements (success/failure) |
| `acp_replicate_latency_seconds` | Histogram | Replication latency per peer |
| `acp_errors_total` | Counter | Errors by type (timeout/rpc) |
Adaptive quorum metrics:

| Metric Name | Type | Description |
|---|---|---|
| `acp_ccs_raw` | Gauge | Raw consistency confidence score |
| `acp_ccs_smoothed` | Gauge | 10-sample moving average of CCS |
| `acp_ccs_component_rtt` | Gauge | RTT health component (0.0-1.0) |
| `acp_ccs_component_avail` | Gauge | Availability health component (0.0-1.0) |
| `acp_ccs_component_var` | Gauge | Variance health component (0.0-1.0) |
| `acp_ccs_component_error` | Gauge | Error health component (0.0-1.0) |
| `acp_current_r` | Gauge | Current read quorum size |
| `acp_current_w` | Gauge | Current write quorum size |
| `acp_quorum_adjustments_total` | Counter | Total quorum adjustments |
| `acp_quorum_adjustment_reason_total` | Counter | Adjustments by reason (tighten/relax) |
| `acp_hysteresis_active` | Gauge | Whether in lockout period (0 or 1) |
Peer health metrics:

| Metric Name | Type | Description |
|---|---|---|
| `acp_peer_health` | Gauge | Peer health status (1=healthy, 0=down) |
| `acp_health_probe_latency_seconds` | Histogram | Health probe latency per peer |
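As one example of how a gauge like acp_ccs_smoothed can be exported, here is a sketch using the standard client_golang library; this is not necessarily how the repo wires it up.

```go
package metrics

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// ccsSmoothed backs the acp_ccs_smoothed gauge from the table above.
var ccsSmoothed = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "acp_ccs_smoothed",
	Help: "10-sample moving average of CCS",
})

func init() {
	prometheus.MustRegister(ccsSmoothed)
}

// Serve exposes /metrics on addr (METRICS_ADDR, e.g. ":9090").
func Serve(addr string) error {
	http.Handle("/metrics", promhttp.Handler())
	return http.ListenAndServe(addr, nil)
}
```

Each adjuster tick would then call ccsSmoothed.Set(v) with the latest smoothed value.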
# start cluster
make docker-up
# view logs
docker-compose logs -f node1
# stop cluster
make docker-down

# build and deploy
make build
docker build -t acp-kv:latest .
kubectl apply -f k8s/
# verify adaptive quorum is working
kubectl logs acp-node-0 | grep "adaptive quorum adjuster started"
kubectl logs -f acp-node-0 | grep smoothed_ccs
# check current quorum values
kubectl port-forward acp-node-0 9090:9090
curl http://localhost:9090/metrics | grep -E "^acp_current_(r|w)"
# scale cluster
kubectl scale statefulset acp-node --replicas=5

If the adaptive quorum is not adjusting:
- Check ADAPTIVE_ENABLED=true in statefulset.yaml
- Verify CCS is being computed: kubectl logs acp-node-0 | grep ccs
- Check if bounds are blocking: kubectl logs acp-node-0 | grep "adjustment rejected"
- Verify hysteresis isn't blocking: wait 5 seconds between expected adjustments
MIT