model_selection.split_data_train_validation_test: Test set starts one point too early #774

@don-alejandrino

When model_selection.split_data_train_validation_test() is called in backtest mode, the test set starts one point too early. Suppose the DataFrame has 100 rows and test_fraction is 0.1.
Line 176 computes start_date_test = end_date - np.round(number_indices * test_fraction) * delta, which places the start of the test set 10 time steps before the last timestamp. With the timestamps numbered 1 to 100, the test set therefore starts at timestamp 90 and runs through timestamp 100 inclusive. Because of this off-by-one error, the test set contains 11 points instead of the expected 10.
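
The arithmetic can be reproduced without calling the library. The snippet below is a minimal sketch, not the library's code: it assumes a daily pandas DatetimeIndex (the start date and frequency are made up) and assumes the test set is selected inclusively from start_date_test to end_date, which matches the behavior described above. The variable names mirror the expression quoted from line 176.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 daily timestamps (date and frequency are assumptions).
index = pd.date_range("2024-01-01", periods=100, freq="D")

number_indices = len(index)
test_fraction = 0.1
delta = index[1] - index[0]   # spacing between consecutive timestamps
end_date = index[-1]

# Same arithmetic as the line 176 expression; the rounded value is cast
# to int here only to keep the Timedelta multiplication explicit.
n_test = int(np.round(number_indices * test_fraction))
start_date_test = end_date - n_test * delta

# Assumed inclusive selection on both ends.
test_index = index[(index >= start_date_test) & (index <= end_date)]

print(n_test)           # requested test size
print(len(test_index))  # actual test size, one larger than requested
```

Stepping back n_test intervals from the last timestamp spans n_test + 1 points once both endpoints are included, which is where the extra point comes from.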

A simple fix would be to change line 176 in model_selection/model_selection.py to

start_date_test = end_date - (np.round(number_indices * test_fraction) - 1) * delta

Labels

fix (Something isn't working)
