This is the official repository for versioned input databases used by the Open Energy Outlook (OEO) initiative. It contains a command-line tool (datamanager) designed to manage these Temoa-compatible SQLite databases using a secure, auditable, and CI/CD-driven workflow.
The SQLite databases hosted here are designed to be used as inputs for Temoa, an open-source energy system optimization model. This data is curated and maintained by the Open Energy Outlook (OEO) team. The goal is to provide a transparent, version-controlled, and publicly accessible set of data for energy systems modeling and analysis.
The system works by treating your Git repository as a source of truth for metadata. The final publication of data is handled by a trusted, automated GitHub Actions workflow after a Pull Request has been reviewed and merged.
This two-phase process ensures security and consistency:
- **Prepare Phase (Local):** A developer prepares a new data version. The large file is uploaded to a temporary staging bucket, and a change to `manifest.json` is proposed.
- **Publish Phase (Automated):** After the proposal is approved and merged into the `main` branch, a GitHub Action performs a secure, server-side copy from the staging bucket to the final production bucket, making the data live.
```mermaid
---
config:
  layout: elk
  theme: mc
  look: classic
---
flowchart TD
    subgraph Developer_Machine["Developer Machine"]
        B["Staging R2 Bucket"]
        A["datamanager prepare"]
        C["New Git Branch"]
        D["GitHub"]
    end
    subgraph GitHub["GitHub"]
        E["Open Pull Request"]
        F["main branch"]
        G["Publish Workflow"]
    end
    subgraph Cloudflare_R2["Cloudflare R2"]
        H["Production R2 Bucket"]
    end
    A -- Upload GBs --> B
    A -- Commit manifest change --> C
    C -- Push Branch --> D
    D --> E
    E -- Review & Merge --> F
    F -- Triggers --> G
    G -- "Server-Side Copy" --> B
    B -- "...to" --> H
    G -- Finalize manifest --> F
```
- CI/CD-Driven Publishing: Data publication is transactional and automated via GitHub Actions after a pull request is merged, preventing inconsistent states.
- Enhanced Security: Production credentials are never stored on developer machines; they are only used by the trusted GitHub Actions runner.
- Interactive TUI: Run `datamanager` with no arguments for a user-friendly, menu-driven interface.
- Data Lifecycle Management: A full suite of commands for rollback, deletion, and pruning, all gated by the same secure PR workflow.
- Integrity Verification: All downloaded files are automatically checked against their SHA256 hash from the manifest (see the sketch after this list).
- Credential Verification: A detailed verify command reports read/write/delete permissions for both production and staging buckets.
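
Under the hood, that integrity check is a plain SHA256 comparison. As a rough manual equivalent (the `jq` path below assumes a manifest layout with per-version `sha256` fields, which is a guess at the schema, not documented behavior):

```bash
# Illustrative only: re-check a downloaded file against manifest.json by hand.
# The .datasets/.versions/.sha256 field names are assumptions about the schema.
expected=$(jq -r '.datasets["energy-data.sqlite"].versions[-1].sha256' manifest.json)
actual=$(sha256sum energy-data.sqlite | awk '{print $1}')
[ "$expected" = "$actual" ] && echo "hash OK" || echo "hash MISMATCH"
```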
- Python 3.12+
- Git
- `sqlite3` command-line tool
- An active Cloudflare account with two R2 buckets (one for production, one for staging).
- For the data in this repo, contact the OEO team for access to the R2 buckets.
- **Clone the Repository:**

  ```bash
  git clone git@github.com:TemoaProject/data.git
  cd data
  ```

- **Install Dependencies:** This project uses and recommends `uv` for fast and reliable dependency management.

  ```bash
  # Create a virtual environment and install dependencies
  uv venv
  source .venv/bin/activate
  uv pip install -e .
  ```

  The `-e` flag installs the package in "editable" mode, so changes to the source code are immediately reflected.

- **Configure Environment Variables:** The tool is configured using a `.env` file. Create one by copying the example:

  ```bash
  cp .env.example .env
  ```

  Now, edit the `.env` file with your Cloudflare R2 credentials. This file should be in your `.gitignore` and never committed to the repository.

  ```env
  # Get these from your Cloudflare R2 dashboard
  R2_ACCOUNT_ID="your_cloudflare_account_id"
  R2_ACCESS_KEY_ID="your_r2_access_key"
  R2_SECRET_ACCESS_KEY="your_r2_secret_key"
  R2_PRODUCTION_BUCKET="your-production-bucket-name"
  R2_STAGING_BUCKET="your-staging-bucket-name"
  ```

- **Verify Configuration:** Run the `verify` command to ensure your credentials and bucket access are correct.

  ```bash
  uv run datamanager verify
  ```
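
If `verify` reports unexpected failures, you can cross-check the same credentials directly against R2's S3-compatible endpoint. This diagnostic is not part of the tool; it assumes the AWS CLI is installed and that the `R2_*` variables from your `.env` are available in the shell:

```bash
# Export the R2 credentials under the names the AWS CLI expects
export AWS_ACCESS_KEY_ID="$R2_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="$R2_SECRET_ACCESS_KEY"

# List the staging bucket via Cloudflare's S3-compatible endpoint
aws s3 ls "s3://$R2_STAGING_BUCKET" \
  --endpoint-url "https://$R2_ACCOUNT_ID.r2.cloudflarestorage.com"
```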
All changes to the data—whether creating, updating, or deleting—follow a strict, safe, and reviewable Git-based workflow.
Always start by creating a new branch from the latest version of `main`. This isolates your changes.
```bash
git checkout main
git pull
git checkout -b feat/update-energy-data
```

Use the `datamanager` tool to stage your changes. The `prepare` command handles both creating new datasets and updating existing ones.
```bash
# This uploads the file to the staging bucket and updates manifest.json locally
uv run datamanager prepare energy-data.sqlite ./local-files/new-energy.sqlite
```

The tool will guide you through the process. For other maintenance tasks like rollback or delete, use the corresponding command.
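
Before committing, it is worth eyeballing exactly what `prepare` changed. The `git diff` below is always safe; the `jq` query is a sketch that assumes a `.datasets` map in the manifest, which may not match the real schema:

```bash
# Review the staged manifest change that will go into the PR
git diff manifest.json

# Optional: inspect one dataset's entry (".datasets" is an assumed field name)
jq '.datasets["energy-data.sqlite"]' manifest.json
```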
Commit the modified `manifest.json` file to your branch with a descriptive message. This message will become the official description for the new data version.
```bash
git add manifest.json
git commit -m "feat: Add 2025 energy data with new technology columns"
git push --set-upstream origin feat/update-energy-data
```

Go to GitHub and open a pull request from your feature branch to `main`. The diff will clearly show the proposed changes to the manifest for your team to review.
Once the PR is reviewed, approved, and all status checks pass, merge it. The CI/CD pipeline takes over automatically:
- It copies the data from the staging bucket to the production bucket (sketched below).
- It finalizes the `manifest.json` with the new commit hash and description.
- It pushes a final commit back to `main`.
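
For intuition, that server-side copy is conceptually an S3 `CopyObject` call against R2's S3-compatible API, so the file never flows through the Actions runner. The actual logic lives in this repository's GitHub Actions workflow; the bucket names, object key, and endpoint below are placeholders, not the real layout:

```bash
# Conceptual equivalent of the publish step (placeholders throughout);
# R2 copies the object internally, entirely server-side.
aws s3api copy-object \
  --endpoint-url "https://<account-id>.r2.cloudflarestorage.com" \
  --copy-source "your-staging-bucket-name/energy-data.sqlite" \
  --bucket "your-production-bucket-name" \
  --key "energy-data.sqlite"
```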
The new data version is now live and available to all users via `datamanager pull`.
The primary workflow is now to prepare a dataset, then use standard Git practices to propose the change.
For a guided experience, simply run the command with no arguments:
```bash
uv run datamanager
```

This will launch a menu where you can choose your desired action, including the new "Prepare a dataset for release" option.
You can also use the command-line interface directly for specific tasks or for scripting purposes.
Prepares a dataset for release by uploading it to the staging area and updating the manifest locally. This command intelligently handles both creating new datasets and updating existing ones.
This is the first step of the new workflow.
```bash
uv run datamanager prepare <dataset-name.sqlite> <path/to/local/file.sqlite>
```

After running `prepare`, follow the on-screen instructions:

- `git add manifest.json`
- `git commit -m "Your descriptive message"`
- `git push`
- Open a Pull Request in GitHub.
Lists all datasets currently tracked in `manifest.json`.
```bash
uv run datamanager list-datasets
```

Downloads a dataset from the production R2 bucket and verifies its integrity.

```bash
# Pull the latest version
uv run datamanager pull user-profiles.sqlite

# Pull a specific version
uv run datamanager pull user-profiles.sqlite --version v2
```

Prepares a rollback to a previous stable version by creating a new version entry that points to the old data.
```bash
uv run datamanager rollback <dataset-name.sqlite> --to-version v1
```

Prepares the permanent deletion of an entire dataset and all its versions. Requires strong confirmation.
```bash
uv run datamanager delete <dataset-name.sqlite>
```

Prepares the permanent deletion of old versions of a dataset, keeping a specified number of recent versions.
```bash
uv run datamanager prune-versions <dataset-name.sqlite> --keep 5
```

Checks R2 credentials and reports granular read/write/delete permissions for both production and staging buckets.
```bash
uv run datamanager verify
```

To contribute to the tool's development:
- Install development dependencies using `uv pip install -e .[dev]`.
- Run the test suite using `pytest`:

  ```bash
  uv run pytest
  ```

- For code quality checks, run `pre-commit`:

  ```bash
  uv run pre-commit run --all-files
  ```