Implementing AWS CodeBuild Batch Builds #31

harshavemula-ua · 2025-10-26T04:51:48Z

harshavemula-ua
Oct 26, 2025
Collaborator

Migration from Self-Hosted ARM Runners to AWS CodeBuild Batch Builds

Problem Statement & Proposed Solution

I think we can migrate our testing pipeline from self-hosted ARM runners to AWS CodeBuild batch builds. Currently, our tests run only on PRs using self-hosted ARM runners, with build times typically reaching 40 minutes per run. This creates significant developer friction: waiting that long for test feedback slows iteration cycles. More critically, our self-hosted runners are not ephemeral, meaning they maintain state between runs, can accumulate cruft over time, and may produce inconsistent test results due to environmental drift.

The proposed AWS CodeBuild solution addresses these issues by providing truly ephemeral compute environments that start clean for every build, execute 11 test suites in parallel (core processing, AWS integrations for short/medium-range/analysis-assim/retro, GCS sources, weights/gpkg, plotting, output formats, and NRDS pipeline), and allow us to customize CPU and RAM allocations per test suite based on actual needs. With fewer than 10 pipeline runs per day, CodeBuild's pay-per-minute pricing model is far more cost-effective than maintaining always-on self-hosted runners.

Built both FP and FP-Deps in ~15 minutes on CodeBuild
I tested a workflow on AWS CodeBuild to run the Forcing Processor (FP) tests in an ARM environment. Below is the reference to the configuration and project setup:

Buildspec.yml: https://github.com/CIROH-UA/forcingprocessor/blob/test_workflows/buildspec.yml

AWS CodeBuild Project: https://879381264451-2wnlq6cv.us-east-1.console.aws.amazon.com/codesuite/codebuild/879381264451/projects/test-code-build/batch/test-code-build%3A1b9f2a0c-aa29-412b-b6f9-51188783e310?region=us-east-1

Customizable Resource Allocation and Ephemeral Environments

CodeBuild provides critical advantages over self-hosted runners through its ephemeral architecture and flexible resource allocation. Each batch job starts with a completely clean environment—no leftover Docker images, no stale cache files, no configuration drift from previous runs. This eliminates an entire class of "works on my machine" and "flaky test" problems that plague persistent infrastructure. Additionally, we can configure different compute types for different test suites: memory-intensive tests (like weight generation) can get more RAM, while I/O-heavy tests (like AWS data fetching) can use optimized instance types. Self-hosted runners force us into a one-size-fits-all configuration where we're either over-provisioning (wasting resources) or under-provisioning (causing failures). CodeBuild's batch build graph also lets us express test dependencies explicitly—all tests depend on the image build, and cleanup runs after all tests complete—ensuring proper ordering without complex workflow logic.
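
To make the build-graph idea concrete, here is a minimal sketch in Python that mirrors the shape of the `batch` / `build-graph` section of a buildspec (`identifier`, `env` with `compute-type`, `depend-on`) and works out which jobs can run in parallel. The suite names and compute types below are hypothetical placeholders, not the contents of our actual buildspec, and only three of the test suites are shown.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical build graph: identifiers and compute types are illustrative
# assumptions, not values taken from the project's buildspec.
BUILD_GRAPH = [
    {"identifier": "build_image",
     "env": {"compute-type": "BUILD_GENERAL1_LARGE"}},
    {"identifier": "test_core",
     "env": {"compute-type": "BUILD_GENERAL1_SMALL"},
     "depend-on": ["build_image"]},
    {"identifier": "test_weights",  # memory-hungry suite gets a larger type
     "env": {"compute-type": "BUILD_GENERAL1_LARGE"},
     "depend-on": ["build_image"]},
    {"identifier": "test_plotting",
     "env": {"compute-type": "BUILD_GENERAL1_SMALL"},
     "depend-on": ["build_image"]},
    {"identifier": "cleanup",
     "env": {"compute-type": "BUILD_GENERAL1_SMALL"},
     "depend-on": ["test_core", "test_weights", "test_plotting"]},
]


def execution_waves(graph):
    """Group job identifiers into waves; jobs within a wave can run in parallel."""
    deps = {node["identifier"]: set(node.get("depend-on", [])) for node in graph}
    unknown = {d for ds in deps.values() for d in ds} - set(deps)
    if unknown:
        raise ValueError(f"depend-on references unknown jobs: {sorted(unknown)}")

    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(BUILD_GRAPH), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Under these assumptions the image build runs first, the three test jobs run together in the second wave, and cleanup runs last. The same check can be used to catch a typo in a `depend-on` entry before pushing a buildspec change.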

Cost Efficiency and Usage-Based Billing

For our usage pattern of fewer than 10 builds per day, CodeBuild's pay-per-minute model is significantly more economical than self-hosted runners. With self-hosted ARM runners, we're paying for compute resources 24/7 regardless of whether they're running tests or sitting idle. At ~40 minutes per build × 10 builds/day = 400 minutes/day of actual usage, we're utilizing these runners less than 28% of the time, yet paying for 100% uptime. CodeBuild eliminates this waste—we pay only for the compute minutes actually used, and we get precise visibility into costs through CloudWatch metrics showing exactly which test suites consume how much time. The ephemeral nature also means no maintenance overhead: no OS updates, no Docker cleanup scripts, no monitoring for runner health. This operational simplicity translates to both direct cost savings (no idle compute) and indirect savings (reduced engineering time managing infrastructure).
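
The utilization figure above can be reproduced with a few lines of Python. This is only the arithmetic from the paragraph (40 minutes per build, 10 builds per day as an upper bound); it deliberately leaves out prices, since no rates are quoted here.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def runner_utilization(minutes_per_build, builds_per_day):
    """Return (busy minutes per day, fraction of the day the runner is busy)."""
    busy_minutes = minutes_per_build * builds_per_day
    return busy_minutes, busy_minutes / MINUTES_PER_DAY


if __name__ == "__main__":
    busy, utilization = runner_utilization(minutes_per_build=40, builds_per_day=10)
    # prints: 400 busy minutes/day, 27.8% utilization, 72.2% idle
    print(f"{busy} busy minutes/day, {utilization:.1%} utilization, "
          f"{1 - utilization:.1%} idle")
```

Because 10 builds per day is an upper bound, real utilization of an always-on runner would be at or below this figure.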

I'd appreciate feedback on this proposal. For teams currently using self-hosted runners with low utilization rates, this approach could significantly reduce both costs and operational complexity while improving test reliability. What concerns should we address before proceeding with this migration?

JordanLaserGit · 2025-10-28T18:55:22Z

JordanLaserGit
Oct 28, 2025
Maintainer

Thanks for taking the time to write this up @harshavemula-ua.

Initially I'd like to shy away from using a cloud provider to perform testing for us when it comes to testing the functionality in Docker containers. I'm not sure the benefits outweigh the complexity that comes with an added layer. You mention ephemeral instances, but I believe GitHub offers ephemeral instances. Is there a substantial cost difference? By default, I don't believe GitHub Actions preserves state from one workflow to the next. Have you observed this occurring?

You mentioned the speed increase, but it's not clear to me why things would be faster using CodeBuild. Based on this buildspec, https://github.com/CIROH-UA/forcingprocessor/blob/test_workflows/buildspec.yml , I don't believe you're building the container on ARM. So I think the speed increase you're observing comes from not performing the ARM build. If that build can be made faster, I don't see why we wouldn't implement that in the existing process.

Does CodeBuild provide logs so we could debug why these tests fail?

I'm not familiar with CodeBuild, so I may be overly wary and default to what I'm comfortable with (GitHub Actions). Happy to continue the discussion here to see what the best route is for this project (and potentially others).

4 replies

harshavemula-ua Oct 28, 2025
Collaborator Author

You’re right that GitHub’s hosted runners are ephemeral and don’t preserve state between workflow runs. However, we’re currently relying on self-hosted ARM runners backed by long-lived EC2 instances. These do retain state across jobs — including leftover Docker layers, cached Python environments, and artifacts from previous PRs. I’ve already noticed cases where this leads to inconsistent test behavior and makes builds less reproducible unless we manually clean the runner.

That’s why I brought up AWS CodeBuild — it gives us native ARM environments that are fully ephemeral per build and also supports parallel batch testing, without us having to maintain custom ARM runners. This becomes especially important as we scale the number of ARM builds. For example, if we start running FP, DS, and other ARM workloads in parallel, a single shared self-hosted runner will become a bottleneck. All builds will compete for the same underlying EC2 resources, causing queue delays and slower builds. In contrast, CodeBuild provisions isolated compute for each build, so parallel jobs don’t impact each other’s performance.

I also want to clarify the buildspec you saw — that was something I was quickly testing for x86. I’ll share the actual ARM buildspec.

build_spec_bkp.yml

And to answer your question — yes, AWS CodeBuild does provide full logs for debugging. Every build automatically streams logs to CloudWatch Logs, and you can also enable S3 log exports if we want to persist logs per run (similar to how we track Step Function executions today). Debugging failed tests was pretty straightforward using the CloudWatch log stream for each test phase in the batch.
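
For anyone who wants to pull those logs programmatically rather than through the console, here is a rough sketch. It assumes boto3 clients are passed in (`boto3.client("codebuild")` and `boto3.client("logs")`) and uses the `batch_get_builds` and `get_log_events` calls; the function name and the build ID in the usage comment are placeholders, and for a batch build each child build would need to be looked up by its own ID.

```python
def fetch_build_log_lines(codebuild, logs, build_id):
    """Return every log message for one CodeBuild build as a list of strings.

    `codebuild` and `logs` are expected to be boto3 clients for the
    "codebuild" and "logs" services; they are passed in so the function
    can also be exercised with simple stand-ins.
    """
    response = codebuild.batch_get_builds(ids=[build_id])
    builds = response.get("builds", [])
    if not builds:
        raise LookupError(f"build not found: {build_id}")

    log_info = builds[0]["logs"]  # CloudWatch log group and stream for the build
    request = {
        "logGroupName": log_info["groupName"],
        "logStreamName": log_info["streamName"],
        "startFromHead": True,  # read oldest events first
    }

    lines = []
    token = None
    while True:
        if token is not None:
            request["nextToken"] = token
        page = logs.get_log_events(**request)
        lines.extend(event["message"].rstrip("\n") for event in page["events"])
        next_token = page.get("nextForwardToken")
        # The end of the stream is signalled by getting back the same token.
        if not next_token or next_token == token:
            break
        token = next_token
    return lines


# Usage sketch (requires boto3 and AWS credentials; the ID is a placeholder):
#   import boto3
#   lines = fetch_build_log_lines(
#       boto3.client("codebuild"), boto3.client("logs"), "my-project:build-id")
#   print("\n".join(lines[-50:]))
```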

Just to note — CodeBuild is widely used in the industry for ARM workloads precisely because of its ephemerality, reproducibility, and cost control. But I’m not tied to CodeBuild — if there’s another clean way to achieve ARM test isolation + parallel scalability + minimal maintenance, I’m happy to explore it.

Happy to iterate based on whatever direction makes the most sense for the team.

JordanLaserGit Oct 28, 2025
Maintainer

I don't believe the EC2 instances are long-lived. They shouldn't be. The infra creates and destroys the EC2 instance on every execution. Can you provide an example of how you're using the infra to issue jobs to the same instance from different workflows?

I'm not sure if there is a case when we will be running any of the repository's actions in parallel, but if we did, the actions would run in parallel, right? How is CodeBuild simplifying that? Again, the workflows shouldn't be using the same EC2 instance. Each workflow will create its own fresh instance from an AMI and terminate it after the command has either completed or failed.

harshavemula-ua Oct 28, 2025
Collaborator Author

I should’ve been more specific earlier — I’m not referring to the EC2 instances created by the Step Function workflow. Those are ephemeral and get spun up and terminated per execution as expected.

What I meant was the self-hosted ARM GitHub runner we currently use outside of the Step Function path. That runner is a long-lived EC2 instance registered with GitHub Actions, and since it isn’t recreated for every workflow, it can retain state between runs (Docker layers, cached files, temp artifacts, etc.). That’s what I was referring to when I mentioned environment drift and lack of isolation during ARM testing.

To clarify:
Step Function EC2 → ephemeral (used only for ARM integration testing)
Self-hosted ARM GitHub runner → long-lived (used for ARM build + test workflows)

The CodeBuild POC is focused only on replacing the self-hosted ARM runner for build/test of FP — not the Step Function–based ARM FP DS integration testing. Hope that clears it up!

JordanLaserGit Oct 28, 2025
Maintainer

Oh! I get what you're saying now. Yes, I agree, we should not use any non-ephemeral hosts for testing. Can we switch the ARM GitHub runner to be ephemeral and launch from some basic AWS AMI? If not, we can either use the Step Function for the ARM build and test workflows too (not just the datastreamcli integration) or go with CodeBuild if you feel that's the way to go there. If we are using AWS either way, I'm more amenable to using CodeBuild. I just figure the ARM build is possible in GitHub somehow, though it may not be possible for our use case.
