Sanity check
< supervised long-term forecasting results of my base model* >
* base model: GPT-2 without injecting any additional information
The backbone can be any LLM; for simplicity I used GPT-2 with 6 layers as the default.
I may run an ablation study over different LLM variants and sizes: several previous studies have shown that scaling laws also hold for time-series forecasting with respect to model parameter count and training-corpus size.
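As a rough sketch of the backbone setup described above, the snippet below builds a GPT-2 model truncated to 6 layers with the Hugging Face `transformers` library. The width settings (`n_embd=32`, `n_head=8`) are taken from the `dm32_nh8` fields in the run names below; whether the actual model reuses pretrained weights or trains from scratch is not stated here, so this is only an illustrative configuration, not the exact training code.

```python
from transformers import GPT2Config, GPT2Model

# Randomly initialised 6-layer GPT-2 backbone, sized to match the run names
# below (d_model = 32, 8 attention heads). These widths are read off the
# "dm32_nh8" fields and may differ from the real training config.
config = GPT2Config(n_layer=6, n_embd=32, n_head=8)
backbone = GPT2Model(config)

# If the pretrained weights were used instead, one plausible variant is to
# load the full "gpt2" checkpoint and keep only the first 6 blocks:
# import torch
# full = GPT2Model.from_pretrained("gpt2")
# full.h = torch.nn.ModuleList(full.h[:6])
# full.config.n_layer = 6
```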
context length 512 / forecasting horizon 96
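The run names below appear to encode the hyper-parameters as tagged fields (`sl` = sequence/context length, `pl` = prediction length, `dm` = d_model, `nh` = number of heads, `df` = feed-forward dimension); the trailing `_0` is presumably a run index. A small hypothetical helper makes the convention explicit — the field meanings are inferred from the entries, not documented anywhere:

```python
import re

def parse_run_name(name):
    # Extract tagged hyper-parameter fields (e.g. "sl512" -> sl=512) from a
    # run name like "512_96_MyModel_ETTh1_sl512_pl96_dm32_nh8_df128_0".
    fields = re.findall(r"(sl|pl|dm|nh|df)(\d+)", name)
    return {key: int(value) for key, value in fields}

parse_run_name("512_96_MyModel_ETTh1_sl512_pl96_dm32_nh8_df128_0")
# {'sl': 512, 'pl': 96, 'dm': 32, 'nh': 8, 'df': 128}
```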
1) ETTh1 : training epochs 10
512_96_MyModel_ETTh1_sl512_pl96_dm32_nh8_df128_0
test on the ETTh1 dataset: mse: 0.3996824, mae: 0.4219979
2) ETTm1: training epochs 10
512_96_MyModel_ETTm1_sl512_pl96_dm32_nh8_df128_0
test on the ETTm1 dataset: mse: 0.3175505, mae: 0.3626745
3) Weather : training epochs 1
512_96_MyModel_Weather_sl512_pl96_dm32_nh8_df32_0
test on the weather dataset: mse: 0.1589350, mae: 0.2111652
4) Electricity: training epochs 1
512_96_MyModel_ECL_sl512_pl96_dm32_nh8_df32_0
test on the electricity dataset: mse: 0.1420454, mae: 0.2483649
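For reference, the MSE and MAE numbers reported above are the standard pointwise definitions, averaged over all forecast points; a minimal plain-Python version is:

```python
def mse(y_true, y_pred):
    # Mean squared error over paired observations.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # Mean absolute error over paired observations.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

In the actual pipeline these are computed on the flattened (horizon x channels) test predictions, typically after the same normalization used at training time.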
Some visualizations (cherry-picked)
1) ETTh1
2) ETTm1
3) Weather