Experimental results # 5
* The effect of injecting auxiliary information via vision prompting
In this section, we assess whether incorporating exogenous information via vision prompts as a prefix helps guide the large language model (LLM) to improve forecasting accuracy.
To investigate the effect of vision prompting, we evaluate the forecasting performance of our pretrained model both with and without vision prompts on the electricity consumption dataset and compare the results.
The electricity consumption data (kWh) were collected from June 1, 2022, to August 24, 2022, and include the hourly power usage of 100 distinct buildings. These buildings are classified into 11 categories: 'Others', 'Public', 'University', 'Data Center', 'Outlet', 'Hospital', 'Commercial', 'Residential', 'Research Institute', 'Knowledge Industry Center', 'Discount Mart', and 'Hotel', each exhibiting distinct energy consumption patterns.
We fine-tune our pretrained base models on this dataset for 10 epochs. For the vision-prompted model, we additionally input an image corresponding to one of the 11 building categories. All other experimental configurations are kept identical.
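To make the mechanism concrete, below is a minimal sketch of prefix-style vision prompting. It assumes a hypothetical `VisionEncoder` that maps the category image to prompt embeddings, a patch-embedding module that aligns time series patches with the LLM's embedding space, and a frozen Hugging Face-style backbone that accepts `inputs_embeds`; the component names are illustrative, not our actual implementation.

```python
import torch
import torch.nn as nn

class VisionPromptedForecaster(nn.Module):
    """Illustrative sketch: prepend a vision-prompt embedding to time series patch tokens."""

    def __init__(self, llm_backbone, vision_encoder, patch_embed, head):
        super().__init__()
        self.llm = llm_backbone               # frozen pretrained LLM backbone
        self.vision_encoder = vision_encoder  # hypothetical: category image -> (B, N_prefix, d_model)
        self.patch_embed = patch_embed        # time series patches -> (B, N_patches, d_model)
        self.head = head                      # LLM hidden states -> forecast of length H

    def forward(self, series, category_image=None):
        tokens = self.patch_embed(series)                  # (B, N_patches, d_model)
        n_prefix = 0
        if category_image is not None:                     # vision-prompted variant
            prefix = self.vision_encoder(category_image)   # (B, N_prefix, d_model)
            n_prefix = prefix.shape[1]
            tokens = torch.cat([prefix, tokens], dim=1)    # inject auxiliary info as a prefix
        hidden = self.llm(inputs_embeds=tokens).last_hidden_state
        return self.head(hidden[:, n_prefix:])             # drop prefix positions before projection
```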
We set the context length to T = 512 and use three prediction horizons H ∈ {96, 192, 336}. We omit the 720 horizon due to the sequence length limitation of the dataset. The evaluation metrics are mean squared error (MSE) and mean absolute error (MAE).
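For reference, both metrics follow their standard definitions, averaged over all forecasted points:

```python
import numpy as np

def mse(pred: np.ndarray, true: np.ndarray) -> float:
    """Mean squared error over all forecasted points."""
    return float(np.mean((pred - true) ** 2))

def mae(pred: np.ndarray, true: np.ndarray) -> float:
    """Mean absolute error over all forecasted points."""
    return float(np.mean(np.abs(pred - true)))
```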
Interestingly, the vision-prompted model outperforms the model without vision information across all prediction horizons, with the performance gap widening as the prediction horizon increases.
We further compare our approach to a wide range of state-of-the-art models, including recent LLM-based time series forecasting models.
Our baselines consist of several Transformer-based methods, namely PatchTST (2023), FEDformer (2022), Autoformer (2021), Informer (2021), and Reformer (2020), as well as recent competitive models such as Time-LLM (2023), GPT4TS (2023), DLinear (2023), and TimesNet (2023).
To ensure fair comparisons, we use the same experimental configurations across all baselines with a unified evaluation pipeline (https://github.com/thuml/Time-Series-Library).
For the Time-LLM prompt bank, we use the following prompt template: "The Electricity consumption data (kWh) was collected from June 1, 2022, to August 24, 2022, and includes the hourly power usage of 100 distinct buildings. These 100 buildings are classified into 11 categories: 'Others,' 'Public,' 'University,' 'Data Center,' 'Outlet,' 'Hospital,' 'Commercial,' 'Residential,' 'Research Institute,' 'Knowledge Industry Center,' 'Discount Mart,' and 'Hotel.' Each category exhibits distinct energy consumption patterns."
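For completeness, the sketch below shows how such a dataset description can be combined with simple per-window statistics into a textual prefix, in the spirit of Time-LLM's prompt-as-prefix; the task and statistics wording here is paraphrased for illustration, not copied from the Time-LLM implementation.

```python
import numpy as np

# Dataset description used as the fixed part of the prompt (quoted above).
DATASET_DESCRIPTION = (
    "The Electricity consumption data (kWh) was collected from June 1, 2022, to August 24, 2022, "
    "and includes the hourly power usage of 100 distinct buildings. These 100 buildings are "
    "classified into 11 categories: 'Others,' 'Public,' 'University,' 'Data Center,' 'Outlet,' "
    "'Hospital,' 'Commercial,' 'Residential,' 'Research Institute,' 'Knowledge Industry Center,' "
    "'Discount Mart,' and 'Hotel.' Each category exhibits distinct energy consumption patterns."
)

def build_prompt(window: np.ndarray, seq_len: int = 512, pred_len: int = 96) -> str:
    """Combine the dataset description with simple statistics of one context window."""
    trend = "upward" if window[-1] >= window[0] else "downward"
    return (
        f"Dataset description: {DATASET_DESCRIPTION} "
        f"Task: forecast the next {pred_len} steps given the previous {seq_len} steps. "
        f"Input statistics: min {window.min():.2f}, max {window.max():.2f}, "
        f"median {np.median(window):.2f}, overall trend {trend}."
    )
```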
Interestingly, most of the baseline models exhibit a significant degradation in performance when the prediction horizon is set to 336 compared to 96 or 192. This is likely due to the reduced training split ratio we applied for the 336 horizon, necessitated by the sequence length limitation. This result suggests that under data-scarcity conditions, many baseline models struggle to accurately extrapolate future trajectories. In contrast, our model demonstrates a relatively smaller performance drop as the prediction horizon increases and the amount of training data decreases, especially when vision prompting is incorporated.
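As a rough illustration of this sequence length limitation, assume roughly 85 days of hourly data per building (about 2,040 steps) and a standard sliding-window split; the maximum number of training windows then shrinks as the horizon grows. The actual split ratios we used differ, so these counts are only indicative.

```python
# Each training sample needs seq_len context steps plus pred_len target steps,
# so longer horizons leave fewer usable windows in a fixed-length series.
N_STEPS = 85 * 24        # ~2,040 hourly observations per building (June 1 - August 24, 2022)
SEQ_LEN = 512            # context length T

for pred_len in (96, 192, 336):
    max_windows = N_STEPS - SEQ_LEN - pred_len + 1
    print(f"H={pred_len}: at most {max_windows} sliding windows per building")
# H=96:  at most 1433 windows
# H=192: at most 1337 windows
# H=336: at most 1193 windows
```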
We attribute these results not only to the successful activation of the LLM's generalization capabilities through aligning time series patches with word embeddings and subsequently pretraining on a large corpus of time series data, but also to the LLM's multi-modal reasoning capability enabled by incorporating visual information through prompting.