-
Notifications
You must be signed in to change notification settings - Fork 39
Implement-median-model-v4 #793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release/v4.0.0
Are you sure you want to change the base?
Conversation
egordm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great changes. I have only a few nipicks about test and possibly moving some logic to timeseries dataset.
| load_lag_PT1H=[1.0, np.nan, np.nan], | ||
| load_lag_PT2H=[4.0, 1.0, np.nan], | ||
| load_lag_PT3H=[7.0, 4.0, 1.0], | ||
| available_at=index, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually forecast input dataset no longer has an available at. So this could be removed for simplicity. If input dataset as an extra timestamp then it's horizon but in this case only one is supported.
| def _infer_frequency(index: pd.DatetimeIndex) -> pd.Timedelta: | ||
| """Infer the frequency of a pandas DatetimeIndex if the freq attribute is not set. | ||
|
|
||
| This method calculates the most common time difference between consecutive timestamps, | ||
| which is more permissive of missing chunks of data than the pandas infer_freq method. | ||
|
|
||
| Args: | ||
| index (pd.DatetimeIndex): The datetime index to infer the frequency from. | ||
|
|
||
| Returns: | ||
| pd.Timedelta: The inferred frequency as a pandas Timedelta. | ||
|
|
||
| Raises: | ||
| ValueError: If the index has fewer than 2 timestamps. | ||
| """ | ||
| minimum_required_length = 2 | ||
| if len(index) < minimum_required_length: | ||
| raise ValueError("Cannot infer frequency from an index with fewer than 2 timestamps.") | ||
|
|
||
| # Calculate the differences between consecutive timestamps | ||
| deltas = index.to_series().diff().dropna() | ||
|
|
||
| # Find the most common difference | ||
| return deltas.mode().iloc[0] | ||
|
|
||
| def _frequency_matches(self, index: pd.DatetimeIndex) -> bool: | ||
| """Check if the frequency of the input data matches the model frequency. | ||
|
|
||
| Args: | ||
| index (pd.DatetimeIndex): The input data to check. | ||
|
|
||
| Returns: | ||
| bool: True if the frequencies match, False otherwise. | ||
| """ | ||
| input_frequency = self._infer_frequency(index) if index.freq is None else index.freq | ||
| return input_frequency == self.frequency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be nice to move this to TimeSeriesDataset. To have something like one function called validate_sample_interval that checks the data against the set sample interval. If user wants to be sure they can call it.
It would make it easier to test and median model code would be a lot simpler.
Changes proposed in this PR include: