Implementing AWS CodeBuild Batch Builds #31
Thanks for taking the time to write this up @harshavemula-ua. My initial inclination is to shy away from using a cloud provider to run our tests when the goal is to test functionality in Docker containers. I'm not sure the benefits outweigh the complexity that comes with an added layer. You mention ephemeral instances, but I believe GitHub offers ephemeral instances as well. Is there a substantial cost difference? By default, I don't believe GitHub Actions preserves state from one workflow run to the next. Have you observed this occurring? You mentioned the speed increase, but it's not clear to me why things would be faster using CodeBuild. Based on this buildspec, I don't believe you're building the container on ARM: https://github.com/CIROH-UA/forcingprocessor/blob/test_workflows/buildspec.yml . So I think the speed increase you're observing comes from not performing the ARM build. If that build can be made faster, I don't see why we couldn't implement that in the existing process. Does CodeBuild provide logs so we could debug why these tests fail? I'm not familiar with CodeBuild, so I may be overly wary and defaulting to what I'm comfortable with (GitHub Actions). Happy to continue the discussion here to work out the best route for this project (and potentially others).
Migration from Self-Hosted ARM Runners to AWS CodeBuild Batch Builds
Problem Statement & Proposed Solution
I think we can migrate our testing pipeline from self-hosted ARM runners to AWS CodeBuild batch builds. Currently, our tests run only on PRs using self-hosted ARM runners, with build times typically reaching 40 minutes per run. This creates significant developer friction: waiting roughly 40 minutes for test feedback slows iteration cycles considerably. More critically, our self-hosted runners are not ephemeral, so they maintain state between runs, can accumulate cruft over time, and may produce inconsistent test results due to environmental drift.
The proposed AWS CodeBuild solution addresses these issues by providing truly ephemeral compute environments that start clean for every build, execute 11 test suites in parallel (core processing, AWS integrations for short/medium-range/analysis-assim/retro, GCS sources, weights/gpkg, plotting, output formats, and NRDS pipeline), and allow us to customize CPU and RAM allocations per test suite based on actual needs. With fewer than 10 pipeline runs per day, CodeBuild's pay-per-minute pricing model is far more cost-effective than maintaining always-on self-hosted runners.
Built both FP and FP-Deps in ~15 minutes on CodeBuild
I tested a workflow on AWS CodeBuild that runs the Forcing Processor (FP) tests in an ARM environment. The configuration and project setup are linked below:
Buildspec.yml: https://github.com/CIROH-UA/forcingprocessor/blob/test_workflows/buildspec.yml
AWS CodeBuild Project: https://879381264451-2wnlq6cv.us-east-1.console.aws.amazon.com/codesuite/codebuild/879381264451/projects/test-code-build/batch/test-code-build%3A1b9f2a0c-aa29-412b-b6f9-51188783e310?region=us-east-1
Customizable Resource Allocation and Ephemeral Environments
CodeBuild provides critical advantages over self-hosted runners through its ephemeral architecture and flexible resource allocation. Each batch job starts with a completely clean environment—no leftover Docker images, no stale cache files, no configuration drift from previous runs. This eliminates an entire class of "works on my machine" and "flaky test" problems that plague persistent infrastructure. Additionally, we can configure different compute types for different test suites: memory-intensive tests (like weight generation) can get more RAM, while I/O-heavy tests (like AWS data fetching) can use optimized instance types. Self-hosted runners force us into a one-size-fits-all configuration where we're either over-provisioning (wasting resources) or under-provisioning (causing failures). CodeBuild's batch build graph also lets us express test dependencies explicitly—all tests depend on the image build, and cleanup runs after all tests complete—ensuring proper ordering without complex workflow logic.
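As a rough illustration of the ordering described above, here is a minimal Python sketch that models the build graph and checks that the image build runs first and cleanup runs last. The suite identifiers, the `build_image` and `cleanup` names, and the choice of which suite gets a larger compute type are all hypothetical and not taken from the actual buildspec.yml. The dictionary keys (`identifier`, `depend-on`, `env`, `compute-type`) mirror the `build-graph` entries of a CodeBuild batch buildspec, but this is only a model of the idea, not the project's configuration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical suite identifiers; the real buildspec.yml may use other names.
TEST_SUITES = ["core", "aws_short_range", "weights_gpkg", "plotting"]


def build_graph(suites, large=()):
    """Return a list shaped like the `build-graph` list of a batch buildspec."""
    nodes = [{"identifier": "build_image"}]  # hypothetical image-build step
    for name in suites:
        # Assumption: memory-heavy suites get a larger compute type.
        size = "BUILD_GENERAL1_LARGE" if name in large else "BUILD_GENERAL1_SMALL"
        nodes.append({
            "identifier": name,
            "depend-on": ["build_image"],  # every test waits for the image
            "env": {"compute-type": size},
        })
    # Cleanup waits for every test suite to finish.
    nodes.append({"identifier": "cleanup", "depend-on": list(suites)})
    return nodes


def execution_order(nodes):
    """Return one valid ordering in which dependencies come before dependents."""
    deps = {n["identifier"]: set(n.get("depend-on", [])) for n in nodes}
    return list(TopologicalSorter(deps).static_order())


graph = build_graph(TEST_SUITES, large={"weights_gpkg"})
order = execution_order(graph)
print(order[0], "...", order[-1])  # build_image ... cleanup
```

The test suites in the middle of the ordering have no dependencies on each other, which is what allows them to run in parallel once the image build finishes.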
Cost Efficiency and Usage-Based Billing
For our usage pattern of fewer than 10 builds per day, CodeBuild's pay-per-minute model is significantly more economical than self-hosted runners. With self-hosted ARM runners, we're paying for compute resources 24/7 regardless of whether they're running tests or sitting idle. At ~40 minutes per build × 10 builds/day = 400 minutes/day of actual usage, we're utilizing these runners less than 28% of the time, yet paying for 100% uptime. CodeBuild eliminates this waste—we pay only for the compute minutes actually used, and we get precise visibility into costs through CloudWatch metrics showing exactly which test suites consume how much time. The ephemeral nature also means no maintenance overhead: no OS updates, no Docker cleanup scripts, no monitoring for runner health. This operational simplicity translates to both direct cost savings (no idle compute) and indirect savings (reduced engineering time managing infrastructure).
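The utilization figure above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the 40-minute build time and 10 builds per day come from the estimate above, while the `daily_costs` helper and its two rate parameters are hypothetical placeholders that would need to be filled in from actual pricing before drawing any conclusion.

```python
MINUTES_PER_DAY = 24 * 60


def utilization(build_minutes, builds_per_day):
    """Fraction of the day an always-on runner spends actually running builds."""
    return build_minutes * builds_per_day / MINUTES_PER_DAY


def daily_costs(build_minutes, builds_per_day, per_minute_rate, always_on_hourly_rate):
    """Return (pay-per-minute cost, always-on cost) for one day.

    Both rates are placeholders supplied by the caller, not real prices.
    """
    on_demand = build_minutes * builds_per_day * per_minute_rate
    always_on = 24 * always_on_hourly_rate
    return on_demand, always_on


u = utilization(build_minutes=40, builds_per_day=10)
print(f"{u:.1%}")  # 27.8%
```

Note that with batch builds each parallel suite accrues its own build minutes, so `build_minutes` passed to `daily_costs` should be the sum across suites rather than the wall-clock time of the batch.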

I'd appreciate feedback on this proposal. For teams currently using self-hosted runners with low utilization rates, this approach could significantly reduce both costs and operational complexity while improving test reliability. What concerns should we address before proceeding with this migration?