-
Glimpse of dataset - (2) real-dataPaper Writing 1/Experiments 2024. 10. 27. 02:00
Previous works on time-series foundation models have shown that, to achieve good zero-shot forecasting performance, it is necessary to train on a large-scale time-series corpus that covers diverse domains, trends, seasonality patterns, and time granularities.
Additionally, MOIRAI has released a large dataset as part of its efforts to create a foundation model.
However, it is challenging for me to train my model on massive datasets due to limited time and resources. To address this problem, I decided to train my model on a combination of synthetic and real data.
I aimed to make my pretraining corpus as diverse as possible in terms of domains, patterns, and time frequencies.
Referring to previous work (TimesFM), I created a mixture of 80% real data and 20% synthetic data, ensuring that the real data was evenly weighted across all granularities.
(1) BuildingBench
105 variates with length 8,760
(2) LibCity
156 variates with length 2,976
(3) ClimateUSA
862 variates with length 16,470
(4) ClimateUSA precipitation
862 variates with length 11,323
'Paper Writing 1 > Experiments' 카테고리의 다른 글
[breaktime] LLM's pattern recognition (0) 2024.10.28 Sanity check (1) 2024.10.27 Glimpse of dataset - (1) synthetic time series generation (0) 2024.10.25 issue #1 (0) 2024.10.24 Modeling (0) 2024.10.20