Conversation

@JanMaartenvanDoorn (Collaborator) commented Dec 19, 2025

Changes proposed in this PR include:

@egordm added the feature (New feature or request) and OpenSTEF 4.0 (Work for OpenSTEF 4.0) labels Dec 19, 2025

@egordm (Collaborator) left a comment

Great changes. I only have a few nitpicks about the tests, plus a suggestion to possibly move some logic to TimeSeriesDataset.

load_lag_PT1H=[1.0, np.nan, np.nan],
load_lag_PT2H=[4.0, 1.0, np.nan],
load_lag_PT3H=[7.0, 4.0, 1.0],
available_at=index,
Collaborator

Usually the forecast input dataset no longer has an available_at column, so this could be removed for simplicity. If the input dataset has an extra timestamp, it is the horizon, but in this case only one horizon is supported.
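For illustration, a minimal sketch of what the expected frame could look like with available_at dropped. It assumes the test builds the frame with pd.DataFrame from keyword-style column data over a three-step hourly index; the start timestamp and the name `expected` are hypothetical, since the review only shows the column values.

```python
import numpy as np
import pandas as pd

# Hypothetical three-step hourly index; the real test index is not shown in the review.
index = pd.date_range("2025-01-01 00:00", periods=3, freq="1h")

# Lag columns copied from the test; available_at is simply omitted.
expected = pd.DataFrame(
    dict(
        load_lag_PT1H=[1.0, np.nan, np.nan],
        load_lag_PT2H=[4.0, 1.0, np.nan],
        load_lag_PT3H=[7.0, 4.0, 1.0],
    ),
    index=index,
)
```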

Comment on lines +187 to +222
def _infer_frequency(index: pd.DatetimeIndex) -> pd.Timedelta:
    """Infer the frequency of a pandas DatetimeIndex if the freq attribute is not set.

    This method calculates the most common time difference between consecutive timestamps,
    which is more permissive of missing chunks of data than the pandas infer_freq method.

    Args:
        index (pd.DatetimeIndex): The datetime index to infer the frequency from.

    Returns:
        pd.Timedelta: The inferred frequency as a pandas Timedelta.

    Raises:
        ValueError: If the index has fewer than 2 timestamps.
    """
    minimum_required_length = 2
    if len(index) < minimum_required_length:
        raise ValueError("Cannot infer frequency from an index with fewer than 2 timestamps.")

    # Calculate the differences between consecutive timestamps
    deltas = index.to_series().diff().dropna()

    # Find the most common difference
    return deltas.mode().iloc[0]

def _frequency_matches(self, index: pd.DatetimeIndex) -> bool:
    """Check if the frequency of the input data matches the model frequency.

    Args:
        index (pd.DatetimeIndex): The input data to check.

    Returns:
        bool: True if the frequencies match, False otherwise.
    """
    input_frequency = self._infer_frequency(index) if index.freq is None else index.freq
    return input_frequency == self.frequency
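To see the behaviour the docstring describes, here is a small sketch with module-level copies of both methods. Assumptions: the model frequency (self.frequency in the PR) is passed in as an explicit `frequency` argument, and the timestamps are made up. When index.freq is set, the comparison is between a pandas offset and a Timedelta; for fixed-size offsets such as 15 minutes, pandas treats these as equal.

```python
import pandas as pd


def infer_frequency(index: pd.DatetimeIndex) -> pd.Timedelta:
    # Same logic as _infer_frequency: the most common gap between consecutive timestamps.
    if len(index) < 2:
        raise ValueError("Cannot infer frequency from an index with fewer than 2 timestamps.")
    deltas = index.to_series().diff().dropna()
    return deltas.mode().iloc[0]


def frequency_matches(index: pd.DatetimeIndex, frequency: pd.Timedelta) -> bool:
    # Same logic as _frequency_matches, with the model frequency passed in explicitly.
    input_frequency = infer_frequency(index) if index.freq is None else index.freq
    return input_frequency == frequency


# Regular 15-minute data: index.freq is set, so the offset itself is compared.
regular = pd.date_range("2025-01-01 00:00", periods=8, freq="15min")

# Same resolution with a one-hour gap (00:45 -> 02:00): index.freq is None.
gappy = pd.date_range("2025-01-01 00:00", periods=4, freq="15min").append(
    pd.date_range("2025-01-01 02:00", periods=4, freq="15min")
)

print(pd.infer_freq(gappy))  # None: the spacing is not uniform
print(infer_frequency(gappy))  # 15 minutes, the most common gap
print(frequency_matches(regular, pd.Timedelta("15min")))  # True
print(frequency_matches(gappy, pd.Timedelta("1h")))  # False
```

One consequence of the mode-based rule: Series.mode() returns tied modes in sorted order, so if two gap sizes are equally common, iloc[0] picks the smaller one.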
Collaborator

It might be nice to move this to TimeSeriesDataset, as a single function (something like validate_sample_interval) that checks the data against the configured sample interval. Users who want to be sure can then call it.

That would make it easier to test, and the median model code would be a lot simpler.
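A minimal sketch of that suggestion, assuming a dataset object that holds its data as a DataFrame with a DatetimeIndex plus a configured sample_interval. The TimeSeriesDataset below is a hypothetical stand-in, not the OpenSTEF class, and raising ValueError (rather than returning a bool) is an illustrative choice.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class TimeSeriesDataset:
    """Hypothetical stand-in; the real OpenSTEF TimeSeriesDataset may differ."""

    data: pd.DataFrame
    sample_interval: pd.Timedelta

    def validate_sample_interval(self) -> None:
        """Raise ValueError if the data spacing does not match sample_interval.

        Uses index.freq when it is set, otherwise the most common gap between
        consecutive timestamps (the same rule as _infer_frequency in this PR).
        """
        index = self.data.index
        if index.freq is not None:
            # Fixed-size offsets (e.g. 15 minutes) convert cleanly to a Timedelta.
            observed = pd.Timedelta(index.freq)
        else:
            if len(index) < 2:
                raise ValueError("Cannot infer frequency from an index with fewer than 2 timestamps.")
            observed = index.to_series().diff().dropna().mode().iloc[0]
        if observed != self.sample_interval:
            raise ValueError(
                f"Data sample interval {observed} does not match expected {self.sample_interval}."
            )


index = pd.date_range("2025-01-01", periods=8, freq="15min")
dataset = TimeSeriesDataset(
    data=pd.DataFrame({"load": range(8)}, index=index),
    sample_interval=pd.Timedelta("15min"),
)
dataset.validate_sample_interval()  # intervals match, so nothing is raised
```

Raising keeps the check a one-line call for users who want to be sure. A bool-returning variant would suit callers, such as a model's fit method, that prefer to branch on the result.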

3 participants