Skip to content

Conversation

@mlejva
Copy link
Member

@mlejva mlejva commented Jan 20, 2026

Add a design document for the persistent filesystem feature to outline its architecture, API, and implementation plan.


Slack Thread

Open in Cursor Open in Web

This design document covers the V0 implementation of persistent volumes that can
be mounted in sandboxes and accessed via SDK/API:

- NFS v3 proxy server using willscott/go-nfs
- GCS backend for file storage
- Database schema for filesystem metadata
- API endpoints for filesystem CRUD and file operations
- SDK interface for TypeScript/JavaScript and Python
- Volume mount configuration for sandbox creation
- Security and observability considerations

Based on discussion in #proj-filesystem-persistence Slack channel.

Co-authored-by: vasek <vasek@e2b.dev>
@cursor
Copy link

cursor bot commented Jan 20, 2026

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.
Learn more about Cursor Agents

team_id UUID NOT NULL REFERENCES teams(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
size_bytes BIGINT NOT NULL, -- Quota/limit for the volume
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SQL schema defines size_bytes BIGINT NOT NULL but the API spec uses sizeGB (integer gigabytes). This creates a conversion point that could lead to precision issues or overflow. Consider using bytes consistently throughout the API and SDK, or clearly document the conversion behavior (e.g., does 1.5GB get truncated to 1GB?).

created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),

UNIQUE(team_id, name)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schema is missing updated_at timestamp update trigger. Based on existing patterns in the codebase (e.g., migrations/20231124185944_create_schemas_and_tables.sql), tables typically need a trigger to auto-update the updated_at field. Without this, the timestamp will never change after creation.

UNIQUE(team_id, name)
);

-- Index for efficient team lookups
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing RLS (Row Level Security) policy. Based on existing patterns in the codebase, tables should have ENABLE ROW LEVEL SECURITY and appropriate policies to prevent cross-team data access. Without this, the security model relies solely on application-level checks, which is less secure than database-level enforcement.

type: integer
minimum: 1
maximum: 100
description: Storage quota in gigabytes
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API spec allows maximum 100GB quotas (maximum: 100) but doesn't define minimum constraints. Should there be a minimum size (e.g., 1GB)? Also, consider if 100GB is appropriate for a PoC - this could lead to high storage costs if many users create max-size volumes.

tags: [filesystems]
security:
- ApiKeyAuth: []
parameters:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path parameter in file operations endpoints uses wildcard matching but lacks validation constraints. File paths need sanitization to prevent directory traversal attacks (e.g., ../../etc/passwd). The design should specify: (1) path validation rules, (2) whether absolute vs relative paths are allowed, and (3) how to handle symbolic links.


### Concurrency Considerations

For V0, we allow concurrent access via NFS (no explicit locking):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GCS provides eventual consistency for concurrent writes, which means multiple sandboxes writing to the same file could result in data loss or corruption. The last write wins, but there's no guarantee of ordering. This is a significant data integrity risk for concurrent access scenarios. Consider adding file locking mechanisms or documenting this limitation prominently.

2. **Orchestrator** resolves filesystem IDs to NFS server addresses
3. **Firecracker VM** starts with NFS client configured
4. **Init script** mounts NFS shares to specified paths:
```bash
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using NFS nolock option (line 236) disables file locking entirely. This combined with eventual consistency in GCS means there's no coordination mechanism for concurrent access. This could lead to: (1) corrupted files from concurrent writes, (2) partial reads of files being written, (3) race conditions in file creation/deletion. The design needs either locking support or very clear documentation of these limitations.

"createdAt": "2026-01-20T12:00:00Z"
}
]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design doesn't specify what happens when a filesystem is deleted while it's still mounted in active sandboxes. This could lead to: (1) sandboxes experiencing I/O errors, (2) stale NFS mounts, (3) orphaned data. Consider adding reference counting or preventing deletion of in-use filesystems.

```typescript
// Create a filesystem
const filesystem = await Filesystem.create('my-volume', '5gb')

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SDK allows mounting by filesystem name ('my-volume') in addition to ID, but name resolution isn't specified. Since names are only unique within a team, the orchestrator needs to resolve names to IDs during sandbox creation. This adds latency and a potential point of failure. Consider whether name-based mounting is necessary for v0 or if ID-only would be simpler.

### NFS Proxy → GCS Mapping

| NFS Operation | GCS Operation |
|---------------|---------------|
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NFS WRITE operations are mapped to GCS objects.insert, but GCS objects are immutable. Each write creates a new object version. For frequently modified files, this will: (1) generate excessive object versions consuming storage, (2) increase costs significantly, (3) require lifecycle policies for cleanup. Consider using objects.patch or implementing a write-back cache to batch updates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants