Dataset: monash_tsf
Tasks:
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: found
Annotation creators: no-annotation
Source datasets: original
License:
The Monash Time Series Forecasting Archive is the first comprehensive time series forecasting repository containing datasets of related time series to facilitate the evaluation of global forecasting models. All datasets are intended for research purposes only. The repository contains 30 datasets, including both publicly available time series datasets (in different formats) and datasets curated by the archive's authors. Many datasets have different versions based on the frequency and the inclusion of missing values, bringing the total number of dataset variations to 58. The archive includes both real-world and competition time series datasets covering varied domains.
The following table shows a list of datasets available:
| Name | Domain | No. of series | Freq. | Pred. Len. | Source |
|---|---|---|---|---|---|
| weather | Nature | 3010 | 1D | 30 | Sparks et al., 2020 |
| tourism_yearly | Tourism | 1311 | 1Y | 4 | Athanasopoulos et al., 2011 |
| tourism_quarterly | Tourism | 1311 | 1Q-JAN | 8 | Athanasopoulos et al., 2011 |
| tourism_monthly | Tourism | 1311 | 1M | 24 | Athanasopoulos et al., 2011 |
| cif_2016 | Banking | 72 | 1M | 12 | Stepnicka and Burda, 2017 |
| london_smart_meters | Energy | 5560 | 30T | 60 | Jean-Michel, 2019 |
| australian_electricity_demand | Energy | 5 | 30T | 60 | Godahewa et al., 2021 |
| wind_farms_minutely | Energy | 339 | 1T | 60 | Godahewa et al., 2021 |
| bitcoin | Economic | 18 | 1D | 30 | Godahewa et al., 2021 |
| pedestrian_counts | Transport | 66 | 1H | 48 | City of Melbourne, 2020 |
| vehicle_trips | Transport | 329 | 1D | 30 | fivethirtyeight, 2015 |
| kdd_cup_2018 | Nature | 270 | 1H | 48 | KDD Cup, 2018 |
| nn5_daily | Banking | 111 | 1D | 56 | Ben Taieb et al., 2012 |
| nn5_weekly | Banking | 111 | 1W-MON | 8 | Ben Taieb et al., 2012 |
| kaggle_web_traffic | Web | 145063 | 1D | 59 | Google, 2017 |
| kaggle_web_traffic_weekly | Web | 145063 | 1W-WED | 8 | Google, 2017 |
| solar_10_minutes | Energy | 137 | 10T | 60 | Solar, 2020 |
| solar_weekly | Energy | 137 | 1W-SUN | 5 | Solar, 2020 |
| car_parts | Sales | 2674 | 1M | 12 | Hyndman, 2015 |
| fred_md | Economic | 107 | 1M | 12 | McCracken and Ng, 2016 |
| traffic_hourly | Transport | 862 | 1H | 48 | Caltrans, 2020 |
| traffic_weekly | Transport | 862 | 1W-WED | 8 | Caltrans, 2020 |
| hospital | Health | 767 | 1M | 12 | Hyndman, 2015 |
| covid_deaths | Health | 266 | 1D | 30 | Johns Hopkins University, 2020 |
| sunspot | Nature | 1 | 1D | 30 | Sunspot, 2015 |
| saugeenday | Nature | 1 | 1D | 30 | McLeod and Gweon, 2013 |
| us_births | Health | 1 | 1D | 30 | Pruim et al., 2020 |
| solar_4_seconds | Energy | 1 | 4S | 60 | Godahewa et al., 2021 |
| wind_4_seconds | Energy | 1 | 4S | 60 | Godahewa et al., 2021 |
| rideshare | Transport | 2304 | 1H | 48 | Godahewa et al., 2021 |
| oikolab_weather | Nature | 8 | 1H | 48 | Oikolab |
| temperature_rain | Nature | 32072 | 1D | 30 | Godahewa et al., 2021 |
To load a particular dataset, specify its name from the table above, e.g.:

```python
from datasets import load_dataset

load_dataset("monash_tsf", "nn5_daily")
```
Notes:
The univariate time series forecasting task involves learning the future one-dimensional target values of a time series in a dataset for some prediction_length time steps. The performance of the forecast models can then be validated against the ground truth in the validation split and tested against the test split.
multivariate-time-series-forecasting
The multivariate time series forecasting task involves learning the future vector of target values of a time series in a dataset for some prediction_length time steps. As in the univariate setting, the performance of a multivariate model can be validated against the ground truth in the validation split and tested against the test split.
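As a minimal sketch of this evaluation protocol, the snippet below scores a naive last-value forecast against the final prediction_length values of a series. The helper names (`naive_forecast`, `mean_absolute_error`) and the toy series are illustrative assumptions, not part of the dataset or its loader, and the naive forecast merely stands in for a real model.

```python
from typing import List


def naive_forecast(history: List[float], prediction_length: int) -> List[float]:
    """Repeat the last observed value for every future step (a baseline, not a real model)."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * prediction_length


def mean_absolute_error(actual: List[float], predicted: List[float]) -> float:
    """Average absolute difference between ground truth and forecast."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy series (hypothetical values); the last `prediction_length` points act as ground truth.
series = [14.0, 18.0, 21.0, 20.0, 22.0, 20.0, 19.0, 23.0]
prediction_length = 3

history = series[:-prediction_length]
ground_truth = series[-prediction_length:]

forecast = naive_forecast(history, prediction_length)
mae = mean_absolute_error(ground_truth, forecast)
print(forecast, mae)
```

Any error measure could replace the mean absolute error here; it is used only because it is simple to verify by hand.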
A sample from the training set is provided below:
```python
{
    'start': datetime.datetime(2012, 1, 1, 0, 0),
    'target': [14.0, 18.0, 21.0, 20.0, 22.0, 20.0, ...],
    'feat_static_cat': [0],
    'feat_dynamic_real': [[0.3, 0.4], [0.1, 0.6], ...],
    'item_id': '0'
}
```
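The record stores only the timestamp of the first observation, so the time index of the remaining `target` values has to be rebuilt from `start` and the dataset frequency. A minimal sketch, assuming a daily frequency purely for illustration and using a hypothetical helper `timestamps_for`:

```python
import datetime
from typing import List


def timestamps_for(start: datetime.datetime, n_points: int,
                   step: datetime.timedelta) -> List[datetime.datetime]:
    """Return the timestamp of each observation, assuming a fixed step between points."""
    return [start + i * step for i in range(n_points)]


# Values copied from the sample above (only the six values that are shown).
record = {
    "start": datetime.datetime(2012, 1, 1, 0, 0),
    "target": [14.0, 18.0, 21.0, 20.0, 22.0, 20.0],
}

# Assumption: the series is daily; use the frequency from the table for a real config.
index = timestamps_for(record["start"], len(record["target"]), datetime.timedelta(days=1))
for ts, value in zip(index, record["target"]):
    print(ts.date(), value)
```

A fixed `timedelta` only suits frequencies with a constant step (seconds, minutes, hours, days, weeks). Monthly, quarterly, and yearly series need calendar-aware arithmetic instead.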
For the univariate regular time series, each series has the following keys: `start`, `target`, `feat_static_cat`, `feat_dynamic_real`, and `item_id`.
For the multivariate time series, the target is a vector of the multivariate dimension for each time point.
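To make the multivariate layout concrete, here is a small sketch. It assumes the target is stored as one list per dimension (a dimension-by-time layout). That layout and the toy values are assumptions for illustration, so check the shape of a loaded record before relying on it.

```python
from typing import List

# Hypothetical multivariate target with 3 dimensions and 4 time steps,
# assumed to be laid out as target[dimension][time].
target: List[List[float]] = [
    [1.0, 2.0, 3.0, 4.0],      # dimension 0
    [10.0, 20.0, 30.0, 40.0],  # dimension 1
    [0.1, 0.2, 0.3, 0.4],      # dimension 2
]


def vector_at(target: List[List[float]], t: int) -> List[float]:
    """Collect the value of every dimension at time step t."""
    return [dimension[t] for dimension in target]


# Transposing gives one vector per time point.
per_time_point = [list(values) for values in zip(*target)]

print(vector_at(target, 2))
print(len(per_time_point), len(per_time_point[0]))
```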
The datasets are split in time according to the prediction length specified for each dataset. In particular, for each time series in a dataset, the validation split contains one prediction-length window of future values beyond the training data, and the test split contains one more prediction-length window beyond that.
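The split scheme can be sketched as follows. This is an illustration of the description above, under the assumption that each split is a prefix of the full series and that validation and test each add one prediction-length window. `split_series` is a hypothetical helper, not a function of the dataset loader.

```python
from typing import Dict, List


def split_series(target: List[float], prediction_length: int) -> Dict[str, List[float]]:
    """Split one series in time: each later split extends the previous one by one window."""
    if prediction_length <= 0:
        raise ValueError("prediction_length must be positive")
    if len(target) <= 2 * prediction_length:
        raise ValueError("series is too short for two held-out windows")
    return {
        "train": target[: -2 * prediction_length],
        "validation": target[: -prediction_length],
        "test": list(target),
    }


series = [float(v) for v in range(10)]  # toy series 0.0 .. 9.0
splits = split_series(series, prediction_length=2)

# The ground truth for each evaluation is the window the previous split has not seen.
validation_window = splits["validation"][len(splits["train"]):]
test_window = splits["test"][len(splits["validation"]):]
print(validation_window, test_window)
```

Under this assumption a model is fit on the training prefix, tuned against `validation_window`, and scored once against `test_window`.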
The archive was curated to facilitate the evaluation of global forecasting models. All datasets in the repository are intended for research purposes and for evaluating the performance of new forecasting algorithms.
Of the 30 datasets, 23 were already publicly available on different platforms and in different data formats. The original sources of all datasets are listed in the datasets table above.
After extracting and curating these datasets, the authors analysed them individually to identify those containing series with different frequencies and missing observations. Nine datasets contain time series with different frequencies, and the archive provides a separate dataset for each frequency.
Who are the source language producers?
The data comes from the datasets listed in the table above.
The annotations come from the datasets listed in the table above.
Who are the annotators?
[More Information Needed]
License: Creative Commons Attribution 4.0 International
@InProceedings{godahewa2021monash,
author = "Godahewa, Rakshitha and Bergmeir, Christoph and Webb, Geoffrey I. and Hyndman, Rob J. and Montero-Manso, Pablo",
title = "Monash Time Series Forecasting Archive",
booktitle = "Neural Information Processing Systems Track on Datasets and Benchmarks",
year = "2021",
note = "forthcoming"
}
Thanks to @kashif for adding this dataset.