Glimpse of dataset - (2) real-data

Paper Writing 1/Experiments

Glimpse of dataset - (2) real-data

밤 편지 2024. 10. 27. 02:00

Previous works on time-series foundation models have shown that, to achieve good zero-shot forecasting performance, it is necessary to train on a large-scale time-series corpus that covers diverse domains, trends, seasonality patterns, and time granularities.

Additionally, MOIRAI has released a large dataset as part of its efforts to create a foundation model.

However, it is challenging for me to train my model on massive datasets due to limited time and resources. To address this problem, I decided to train my model on a combination of synthetic and real data.

I aimed to make my pretraining corpus as diverse as possible in terms of domains, patterns, and time frequencies.

Referring to previous work (TimesFM), I created a mixture of 80% real data and 20% synthetic data, ensuring that the real data was evenly weighted across all granularities.

(1) BuildingBench

105 variates with length 8,760

(2) LibCity

156 variates with length 2,976

(3) ClimateUSA

862 variates with length 16,470

(4) ClimateUSA precipitation

862 variates with length 11,323