rearchitect get parts urls #306

@thecaffiend

Description

Right now we have an endpoint that returns a batch of pre-signed PUT part URLs for a multipart upload, given a number of parts. This is problematic in a number of cases, e.g. if you don't know the size of the file you're going to upload because you're building it on the fly, or if you're on a network with such low bandwidth that the pre-signed URLs expire before you get a chance to use them.

We are going to change how this works. Instead of asking for all parts urls up front, we'll just ask for them one at a time when they are needed. The front end will not need to provide a number of parts as we won't be asking for a specific number, just ad hoc.
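A rough sketch of what the client side of the one-at-a-time flow could look like, assuming the data arrives as a stream of unknown total length. `get_part_url` and `put_part` are hypothetical callables standing in for the new endpoint and the HTTP PUT to the pre-signed URL; neither name comes from our API. The 5 MiB minimum part size (for every part except the last) is the S3 multipart limit.

```python
from typing import Callable, Dict, Iterable, Iterator, List

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum size for every part except the last


def chunk_stream(stream: Iterable[bytes], part_size: int = MIN_PART_SIZE) -> Iterator[bytes]:
    """Regroup an arbitrary byte stream into parts of part_size (the last may be smaller)."""
    buffer = bytearray()
    for piece in stream:
        buffer.extend(piece)
        while len(buffer) >= part_size:
            yield bytes(buffer[:part_size])
            del buffer[:part_size]
    if buffer:
        yield bytes(buffer)


def upload_on_demand(
    stream: Iterable[bytes],
    get_part_url: Callable[[int], str],    # hypothetical: one API call per part
    put_part: Callable[[str, bytes], str],  # hypothetical: PUTs the body, returns the ETag
    part_size: int = MIN_PART_SIZE,
) -> List[Dict[str, object]]:
    """Upload parts as they become available, asking for each URL only when needed."""
    completed = []
    for part_number, body in enumerate(chunk_stream(stream, part_size), start=1):
        url = get_part_url(part_number)  # URL is requested right before use, so it is fresh
        etag = put_part(url, body)
        completed.append({"PartNumber": part_number, "ETag": etag})
    return completed  # the part list needed to complete the multipart upload
```

The caller never states a part count; the number of parts falls out of how much data the stream produces.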

The front end will have to change at the same time, or this endpoint will have to co-exist with the others at least until the front end can be changed to take advantage. It may make sense for both ways to exist since asking for all up front actually saves on some API hits.

This will probably end up being a direct integration (aws integration type in the openapi spec) instead of being a lambda. There is nothing we need to do that the s3 endpoint isn't already handling for us.

UPDATE 2025.10.01: We'll leave the existing endpoint working as it is when we implement this. It's better to use it when the upload file size is known (and the file can be uploaded within the expiration time of the pre-signed URLs), as it saves on API calls and thus money.

It's also the case that just using the single-file, non-multipart upload would be preferred on a stable and sufficient network anyway (even fewer API calls).
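To make the trade-off between the three upload paths concrete, here is a minimal sketch of how a client might choose between them. The function name, the mode strings, and the `single_put_max_bytes` threshold are all assumptions for illustration; the issue does not define any of them, and the bandwidth estimate would have to come from the client.

```python
from typing import Optional


def choose_upload_mode(
    size_bytes: Optional[int],   # None when the file is being built on the fly
    bytes_per_second: float,     # client's bandwidth estimate (assumed to be available)
    url_ttl_seconds: float,      # lifetime of a pre-signed URL
    single_put_max_bytes: int,   # hypothetical cutoff above which we prefer multipart
) -> str:
    """Pick the upload path with the fewest API calls that is still safe to use."""
    if size_bytes is None:
        return "on_demand"  # unknown size: cannot request a fixed number of part URLs
    if size_bytes / bytes_per_second > url_ttl_seconds:
        return "on_demand"  # too slow: batch URLs would expire before they are used
    if size_bytes <= single_put_max_bytes:
        return "single_put"  # fewest API calls of all
    return "presigned_batch"  # known size, fast enough: all part URLs up front
```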

UPDATE 2025.10.09: We will wait on implementing this until we know what's going on in the way of tracking raw data to ETL status and pipeline results. The natural way to do this issue is a direct proxy to the S3 API. For tracking, though, we need to generate some sort of id when uploading raw data, which means that for MPUs we'll need the complete MPU step (if not others) to return said id. Those steps would therefore have to be converted away from a direct S3 proxy.
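As a purely illustrative sketch of why the complete step can't stay a direct proxy: something has to sit between the caller and S3 to mint the id and return it. The function name, the response shape, and the use of a UUID are all assumptions, since the tracking design is not decided yet. `complete_mpu` is a hypothetical callable wrapping the S3 CompleteMultipartUpload call.

```python
import uuid
from typing import Callable, Dict, List


def complete_upload_with_tracking(
    complete_mpu: Callable[[str, str, str, List[Dict]], Dict],  # hypothetical S3 wrapper
    bucket: str,
    key: str,
    upload_id: str,
    parts: List[Dict],
) -> Dict[str, object]:
    """Complete the multipart upload, then return a tracking id alongside the result."""
    result = complete_mpu(bucket, key, upload_id, parts)
    tracking_id = str(uuid.uuid4())  # assumption: any unique id scheme would do here
    # A real implementation would also persist tracking_id -> (bucket, key) somewhere.
    return {
        "tracking_id": tracking_id,
        "bucket": bucket,
        "key": key,
        "etag": result.get("ETag"),
    }
```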

Metadata

Labels

P0 (Highest Priority)
