
DCO peer inflation: Status files show 70-150% more entries than non-DCO despite equal load #931

@alloc33

Description


Environment:

  • OpenVPN 2.6.14 with DCO enabled
  • Linux kernel 6.14.0 with ovpn-dco kernel module
  • Production setup: 50+ OpenVPN server processes per machine
  • dnsproxy forwarding traffic using random 127.0.x.x source IPs (censorship circumvention)

Problem:
A server with DCO consistently shows 70-150% more peer entries in its status files than an identical non-DCO server with equal load distribution:

  • DCO server: 1,282 unique clients (after deduplication)
  • Non-DCO server: 741 unique clients
  • Inflation: 541 extra entries (73%)
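A minimal Python sketch of the arithmetic behind these figures. It assumes that "deduplication" means counting each common name once across all status files on a machine; the helper names `unique_clients` and `inflation` are hypothetical.

```python
def unique_clients(entries):
    """entries: iterable of (process_name, common_name) pairs gathered from
    every status file on the machine (assumed dedup key: common name).
    Returns the number of distinct common names."""
    return len({cn for _process, cn in entries})


def inflation(dco_count, baseline_count):
    """Return (extra entries, inflation in percent) relative to the baseline."""
    extra = dco_count - baseline_count
    return extra, 100.0 * extra / baseline_count


extra, pct = inflation(1282, 741)
print(f"{extra} extra entries ({pct:.0f}%)")
```

With the counts above, `inflation(1282, 741)` yields 541 extra entries, about 73%.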

Root Cause Analysis:
When clients disconnect from process A and reconnect to process B, the old entry in process A is not removed:

  • DCO: Old entries persist indefinitely in status files → accumulation over time
  • Non-DCO: Old entries are cleaned up properly → 1:1 ratio maintained

Evidence:
Example client appearing in 3 different OpenVPN processes:
server-72581: connected 05:38:25 (still in status file after 2+ hours)
server-91967: connected 05:56:00 (still in status file after 1.5+ hours)
server-91970: connected 06:30:16 (current active connection)

All 3 entries remain in their respective status files. With non-DCO, only the most recent connection appears.
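One way to spot such cross-process leftovers is to collect the CLIENT_LIST rows from every process's status file, group them by common name, and treat all but the newest connection as stale candidates. The sketch below assumes `status-version 2` files (comma-separated, with a `HEADER,CLIENT_LIST,...` line naming the columns, including `Common Name` and `Connected Since (time_t)`); the function names are hypothetical. It only reports candidates: with `duplicate-cn`, one common name can legitimately hold several live sessions.

```python
import csv
import io
from collections import defaultdict


def client_rows(status_text):
    """Yield one dict per CLIENT_LIST row, keyed by the column names taken
    from the HEADER,CLIENT_LIST line (assumed status-version 2 layout)."""
    header = None
    for row in csv.reader(io.StringIO(status_text)):
        if not row:
            continue
        if row[:2] == ["HEADER", "CLIENT_LIST"]:
            header = row[2:]
        elif row[0] == "CLIENT_LIST" and header is not None:
            yield dict(zip(header, row[1:]))


def stale_entries(status_by_process):
    """status_by_process: {process_name: status file text}.

    Returns {common_name: [(process_name, connected_since_time_t), ...]}
    listing every entry except the newest one, for each common name that
    appears more than once across all processes."""
    seen = defaultdict(list)
    for process, text in status_by_process.items():
        for row in client_rows(text):
            since = int(row["Connected Since (time_t)"])
            seen[row["Common Name"]].append((since, process))

    stale = {}
    for cn, entries in seen.items():
        if len(entries) > 1:
            entries.sort()  # oldest first; the last entry is the newest
            stale[cn] = [(proc, since) for since, proc in entries[:-1]]
    return stale
```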

Keepalive Configuration:
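keepalive 25 180
In server mode this pushes ping 25 / ping-restart 180 to clients and applies a doubled ping-restart to the server itself, so the server-side timeout should be 360 seconds (180 × 2). Yet old entries never expire.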
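The 360-second figure follows from how `keepalive` expands in server mode: the given `ping` and `ping-restart` values are pushed to clients, while the server uses twice the `ping-restart` value for itself. A small sketch of that expansion (the function name and returned keys are illustrative only):

```python
def keepalive_timeouts(directive):
    """Expand a 'keepalive <ping> <timeout>' line as OpenVPN does in server mode."""
    parts = directive.split()
    if len(parts) != 3 or parts[0] != "keepalive":
        raise ValueError(f"not a keepalive directive: {directive!r}")
    ping, timeout = int(parts[1]), int(parts[2])
    return {
        "pushed_ping": ping,                 # sent to clients
        "pushed_ping_restart": timeout,      # client-side timeout
        "server_ping_restart": 2 * timeout,  # server-side timeout is doubled
    }


print(keepalive_timeouts("keepalive 25 180"))
```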

Log Analysis (2-hour window):

  • New connections created: 634
  • DEL_PEER notifications received: 283
  • Gap: 351 peers never produced a DEL_PEER (expiry) notification
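Subtracting raw totals can miscount if a DEL_PEER in the window belongs to a peer created before it, or if peer IDs are reused. The sketch below pairs events per key in log order instead. It assumes the caller has already extracted `(kind, key)` tuples from the logs, where `kind` is `"create"` or `"delete"` and `key` identifies one peer, for example a hypothetical (process, peer ID) pair; the log patterns to match depend on the actual log format and are not shown here.

```python
def unmatched_peers(events):
    """Walk (kind, key) events in log order.

    Returns (overwritten, still_open):
      overwritten - number of creates for a key that was still open, i.e. the
                    previous peer with that key never received a delete event
      still_open  - keys whose latest create has no delete yet (either still
                    connected or missing a DEL_PEER)
    """
    open_keys = set()
    overwritten = 0
    for kind, key in events:
        if kind == "create":
            if key in open_keys:
                overwritten += 1
            open_keys.add(key)
        elif kind == "delete":
            open_keys.discard(key)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return overwritten, open_keys
```

Keys left in `still_open` whose clients are known to have reconnected elsewhere are the ones worth cross-checking against the status files.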

This suggests the DCO kernel module is not triggering keepalive expiry for all disconnected peers.

What We've Tried:

  1. Periodic cleanup of orphaned instances - Failed: it either found nothing to clean up or removed active connections
  2. Duplicate detection at instance creation - Failed: the common name is not available until the TLS handshake completes
  3. Duplicate detection after the TLS handshake - Partial success: prevents within-process duplicates, but does not fix cross-process inflation (the main problem)

Current Status:

  • Within-process duplicates: Fixed (0 duplicates found in same process)
  • Cross-process stale entries: Not fixed (70% inflation persists)

Question for OpenVPN Team:
Why would DCO fail to send DEL_PEER notifications for roughly half of the peers (351 of the 634 created in the 2-hour window), causing stale entries to persist indefinitely in userspace status files? Is this a known limitation of the DCO keepalive mechanism, or is there a configuration or implementation issue we're missing?

Any guidance on how to ensure proper cleanup of stale DCO peer entries would be greatly appreciated.
