@damonrand (Contributor)

Summary

When simulating periods with sparse missing data (e.g., 30 half-hour slots out of 17,520 in a full year), the entire summary would show NaN values for rates and costs. This made annual simulations unusable when even 0.17% of imbalance pricing data was missing.

Root cause

Both aggregation paths propagated NaN unconditionally (a minimal reproduction follows the list):

  1. output.py: safe_average() used np.average(), which returns NaN if any input value is NaN
  2. breakdown.py: cost totals used .sum(skipna=False).sum(skipna=False), which returns NaN if any cell is NaN
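A minimal sketch of the propagation, using hypothetical price values (not from the actual dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical half-hourly prices with one missing slot.
prices = pd.Series([10.0, 12.0, np.nan, 11.0])

# np.average() propagates NaN: one missing value poisons the result.
print(np.average(prices))  # nan

# The same happens when summing a cost table with skipna=False on
# both axes: any NaN cell makes the grand total NaN.
costs = pd.DataFrame({"import": [1.0, np.nan], "export": [2.0, 3.0]})
print(costs.sum(skipna=False).sum(skipna=False))  # nan
```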

Solution

Add a 5% NaN threshold to both aggregation functions (sketched below):

  • safe_average(): Filter out NaN values and their weights before calculating the weighted average. Only return NaN if >5% of the data is missing.
  • safe_sum(): New helper that uses np.nansum() to sum the valid values. Only return NaN if >5% of the data is missing.
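A sketch of the two helpers under these rules; the names and the 5% threshold come from this PR, but the exact signatures in output.py and breakdown.py may differ:

```python
import numpy as np

NAN_THRESHOLD = 0.05  # fraction of missing data above which results are flagged

def safe_average(values, weights):
    """Weighted average that tolerates up to 5% missing data."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    valid = ~np.isnan(values)
    if np.mean(~valid) > NAN_THRESHOLD:
        return np.nan  # too much data missing: flag the result as unreliable
    return np.average(values[valid], weights=weights[valid])

def safe_sum(values):
    """Sum that tolerates up to 5% missing data."""
    values = np.asarray(values, dtype=float)
    if np.isnan(values).mean() > NAN_THRESHOLD:
        return np.nan  # too much data missing: flag the result as unreliable
    return np.nansum(values)
```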

This allows simulations to complete with valid results when small amounts of data are missing, while still flagging unreliable results when too much data (>5%) is absent.
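Continuing the sketch, the scenario from the summary (30 missing half-hour slots out of 17,520, with hypothetical prices) now yields finite results:

```python
rng = np.random.default_rng(0)
prices = rng.uniform(20.0, 80.0, size=17_520)  # hypothetical imbalance prices
weights = np.ones_like(prices)
prices[rng.choice(prices.size, size=30, replace=False)] = np.nan

safe_average(prices, weights)  # finite: only 0.17% missing, under the 5% threshold
safe_sum(prices)               # finite for the same reason
```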

Files changed

  • output.py: Enhanced safe_average() with NaN threshold
  • breakdown.py: Added safe_sum() helper, replaced double sum calls
  • pyproject.toml: Bump version to 2.0.1

Test plan

  • Full year 2025 Findhorn simulation now completes with valid BESS gains
  • All 4 scenarios (Baseline, Bess500_500, Bess1000_1000, Bess500_1000) produce valid results
  • Lint passes

@damonrand merged commit b317512 into main on Jan 9, 2026 (2 checks passed).