All Posts
-
strict=False | Paper Writing 1/Experiments | 2024. 10. 30. 02:53
A mystery... shouldn't strict=False just make this work? The vision checkpoint was originally loaded inside the model, but when I tried to fine-tune from the base-model checkpoint and called load_state_dict, it kept demanding the vision checkpoint even with strict=False. I set the loading up exactly the same way as the in-model loading, but glob.glob couldn't read the files either, so I hard-coded the path. Then, without strict=False, a mismatch occurred... So what did I end up doing? Haha... "vision_tower.vision_model.embeddings.patch_embedding.weight", "vision_tower..
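A minimal sketch of the direction this points toward, assuming the vision weights live under a "vision_tower." prefix and are restored separately inside the model (the prefix, path handling, and helper name are assumptions, not the post's actual code): filter those keys out of the checkpoint before load_state_dict, so strict=False only has to cover keys that are genuinely loaded elsewhere.

```python
import torch

def load_base_checkpoint(model, ckpt_path, skip_prefix="vision_tower."):
    """Hypothetical helper: load a base-model checkpoint into a model that
    also owns a vision tower. The tower is restored separately inside the
    model, so its keys are dropped here; strict=False then tolerates the
    keys that remain missing."""
    state_dict = torch.load(ckpt_path, map_location="cpu")
    base_sd = {k: v for k, v in state_dict.items()
               if not k.startswith(skip_prefix)}
    missing, unexpected = model.load_state_dict(base_sd, strict=False)
    # Sanity check: everything still missing should belong to the vision tower.
    assert all(k.startswith(skip_prefix) for k in missing), missing
    return missing, unexpected
```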
-
Meltdown | Paper Writing 1/Experiments | 2024. 10. 28. 22:26
Ah, I'm really depressed... utter despair. I finally got the code written well and everything runs, but the moment SigLIP is attached, the model becomes so heavy that training time balloons and memory keeps getting eaten up, so training cannot proceed. Even when I accept a vision_prompt=False argument to turn the vision-prompting feature off entirely, so that images are not processed within the dataloader batch and the vision tower is not built in the model (i.e., only the base model runs), merely having SigLIP attached slows everything down enormously. After loading a checkpoint trained with only the base model, the vision-prompting feature..
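If the slowdown really does come from the tower merely being instantiated, one plausible mitigation is lazy construction: only build SigLIP when vision_prompt=True, so the base-model-only path never pays for it. A hypothetical sketch, not my actual class:

```python
import torch.nn as nn

class PromptedForecaster(nn.Module):
    """Hypothetical sketch: construct the SigLIP tower only when vision
    prompting is on. Built unconditionally, its parameters occupy memory
    (and can leak into the optimizer state and checkpoints) even when
    vision_prompt=False."""

    def __init__(self, base_model, vision_tower_factory=None, vision_prompt=False):
        super().__init__()
        self.base_model = base_model
        # Deferred construction: base-model-only runs never touch SigLIP.
        self.vision_tower = vision_tower_factory() if vision_prompt else None

    def forward(self, tokens, images=None):
        if self.vision_tower is not None and images is not None:
            return self.base_model(tokens, self.vision_tower(images))
        return self.base_model(tokens)
```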
-
[breaktime] LLM's pattern recognition | Paper Writing 1/Experiments | 2024. 10. 28. 15:00
Just for fun! I'm curious about how the model can recognize patterns in sequences. Later on, I'm planning to research how a model with a mixture of attention and S6 layers can recognize various patterns in sequences. I think it will be interesting. I trained my base model* (GPT-2 frozen, w/o additional information injection, only alignment (cross-attn) & head parameters updated) on..
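For concreteness, a sketch of this freezing scheme; the keyword strings used to match parameter names are placeholders, not the module names in my actual code:

```python
def freeze_all_but(model, trainable_keywords=("cross_attn", "head")):
    """Freeze every parameter except those whose names contain one of the
    given keywords (here: the alignment cross-attention and the head)."""
    for name, param in model.named_parameters():
        param.requires_grad = any(k in name for k in trainable_keywords)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable:,} / {total:,} parameters")
```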
-
Sanity check | Paper Writing 1/Experiments | 2024. 10. 27. 16:17
* base model: GPT-2 without injecting any additional information. The backbone model can be any LLM, but I used GPT-2 with 6 layers as the default for simplicity. I may conduct an ablation study on different LLM model variants and sizes. Several previous studies have demonstrated that the scaling law also applies to time-series forecasting in relation to the number of model parameters and the size o..
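A minimal sketch of instantiating such a backbone with Hugging Face transformers; only the 6-layer depth comes from the text above, the remaining hyperparameters are left at GPT-2 defaults (an assumption), and the weights here are randomly initialized rather than pretrained:

```python
from transformers import GPT2Config, GPT2Model

config = GPT2Config(n_layer=6)    # n_embd=768, n_head=12 remain at defaults
backbone = GPT2Model(config)      # the forecasting head and alignment layers
                                  # would be added on top of this backbone
backbone.requires_grad_(False)    # frozen, matching the training setup
```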
-
Glimpse of dataset - (2) real-data | Paper Writing 1/Experiments | 2024. 10. 27. 02:00
Previous works on time-series foundation models have shown that, to achieve good zero-shot forecasting performance, it is necessary to train on a large-scale time-series corpus that covers diverse domains, trends, seasonality patterns, and time granularities. Additionally, MOIRAI has released a large dataset as part of its efforts to create a foundation model. However, it is challenging for me t..
-
Glimpse of dataset - (1) synthetic time series generation | Paper Writing 1/Experiments | 2024. 10. 25. 23:50
Several studies have trained models using synthetic time series data, either as a complement to real-world data or even as a standalone approach, showing comparable zero-shot performance (ForecastPFN, TimesFM, Fu et al., 2024). Given my limited resource budget, incorporating synthetic data into my training dataset is a viable option. Additionally, conducting an ablation study to evaluate the effectivene..
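A toy sketch of the kind of generator these recipes involve: trend + seasonality + noise. The component forms and parameter ranges below are illustrative assumptions, not the exact recipe from any of the cited papers.

```python
import numpy as np

def synthetic_series(length=512, seed=0):
    """Generate one synthetic series as trend + seasonality + noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(length, dtype=np.float64)
    trend = rng.uniform(-0.01, 0.01) * t                  # linear trend
    period = rng.choice([7, 12, 24, 52])                  # common granularities
    season = rng.uniform(0.5, 2.0) * np.sin(2 * np.pi * t / period)
    noise = rng.normal(scale=rng.uniform(0.05, 0.3), size=length)
    return trend + season + noise
```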
-
issue #1 | Paper Writing 1/Experiments | 2024. 10. 24. 12:25
How do I get past this roadblock? Because images are fed as input within each batch, memory gets eaten up at a frightening rate, and training keeps getting interrupted midway. Even after reading the input image, del-ing the object, and running garbage collection, the available memory keeps shrinking. I probably just don't know the proper way to fully release the allocated memory. And even if I find that way, doing this work for every dataloader batch by itself lengthens training time enormously. That said, looking at the loss up to the point where training is killed by the memory shortage, learning appears to be going well. Putting images into the prompt to provide additional information is my model's..
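One workaround worth considering (an assumption on my part, not what the post settles on): encode each prompt image once, offline, and cache the features, so the dataloader serves small tensors instead of re-reading and re-encoding raw images every batch. A sketch:

```python
import torch

@torch.no_grad()
def cache_image_features(vision_tower, preprocess, image_paths, out_path):
    """Hypothetical one-off preprocessing pass: encode every prompt image and
    save the features. During training, the dataloader looks features up by
    path instead of decoding images, removing the per-batch memory churn."""
    vision_tower.eval()
    feats = {}
    for path in image_paths:
        img = preprocess(path)                        # -> tensor [3, H, W]
        feats[path] = vision_tower(img.unsqueeze(0)).squeeze(0).cpu()
    torch.save(feats, out_path)
```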
-
[TimesFM] A decoder-only foundation model for time-series forecasting | Paper Writing 1/Related_Work | 2024. 10. 22. 02:36
https://arxiv.org/pdf/2310.10688
https://github.com/google-research/timesfm
(Oct 2023, Google Research)
Abstract: Motivated by recent advances in large language models for Natural Language Processing (NLP), we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised foreca..