Interesting topic to think about
Causality/thought · 2025. 3. 19. 20:08

    https://kiciman.org/


    With DoWhy, one of the projects I'm excited about is PyWhy LLM. Right now it's an experimental library that's just starting up, but what we're looking at is how we can incorporate LLMs into the analysis process with DoWhy. So using PyWhy LLM to help people use LLMs to generate causal graphs and to refute, to critique, their assumptions. So plugging in certainly at the beginning and the end of this four-step analysis process, and then experimenting with opportunities to do maybe identification-style analyses, for example using domain knowledge to identify potential instrumental variables, and also maybe even providing support to code up analyses as well. So those are a little bit tentative, but we're relatively confident that we'll be able to use LLMs in some way to bootstrap assumptions and critique assumptions.
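    For context, here is a minimal sketch of the standard four-step DoWhy workflow (model, identify, estimate, refute) that PyWhy LLM is meant to plug into. The toy variables and data-generating process are invented for illustration, and since PyWhy LLM is still experimental its own API is not shown; the comments only mark where LLM-suggested graphs and critiques would enter.

```python
# Minimal sketch (not PyWhy LLM itself) of DoWhy's four-step workflow with toy, made-up data.
import numpy as np
import pandas as pd
from dowhy import CausalModel

rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=n)                      # confounder
x = 0.8 * z + rng.normal(size=n)            # treatment
y = 1.5 * x + 0.5 * z + rng.normal(size=n)  # outcome
df = pd.DataFrame({"Z": z, "X": x, "Y": y})

# Step 1 (model): encode the assumed causal graph as a GML string. This is the step
# where an LLM could propose or critique edges before a human signs off on them.
graph = (
    'graph[directed 1 '
    'node[id "Z" label "Z"] node[id "X" label "X"] node[id "Y" label "Y"] '
    'edge[source "Z" target "X"] edge[source "Z" target "Y"] edge[source "X" target "Y"]]'
)
model = CausalModel(data=df, treatment="X", outcome="Y", graph=graph)

# Step 2 (identify): derive the estimand implied by the graph.
estimand = model.identify_effect()

# Step 3 (estimate): backdoor adjustment via linear regression.
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print("Estimated effect of X on Y:", estimate.value)

# Step 4 (refute): stress-test the result; another natural place for LLM-generated critiques.
refutation = model.refute_estimate(estimand, estimate, method_name="random_common_cause")
print(refutation)
```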

     

    Causality has recently been at the forefront of the hottest, most engaged discussions about artificial intelligence, not always explicitly, sometimes implicitly. It comes from the fact that models like GPT, then Sora, and the recently released large world models are all going in the direction of modeling the world in some way.

     

    And if you want to model the world in a way that is sound, causality, in my opinion, is a necessary element of such a model. What are your thoughts about all the generative and non-generative methods that we have today, and do you think those models can learn world models, or causal world models, or approximately causal world models?

     

    I think it's plausible to think that they can. With current large language models, you can imagine that the amount of data they've seen has let them observe many counterfactual scenarios, and it's plausible that this could lead them to actually learn a true causal model, if that were the most efficient representation, for example. However, I don't think we've necessarily done that on purpose, and I certainly think that even if this were true, we would probably only have observed counterfactuals for certain scenarios; you wouldn't have full population support, and at some point, once you start extrapolating, it's not clear to me what would happen. So do I believe it's possible for them to learn causal models? I think it's possible. Do I think that they are? No, not now.

     

    Then there's a second, meta question, especially for the language models, which is that they're not actually modeling the world, they're modeling language. So now the question is, if they are learning a causal model, they're learning a causal model of language, which is not the same thing as learning a causal model of the world. And so then I think we have to think about what it would mean for them to learn a model of language, and at what point we would think that leads to something deeper; that's a very squishy, very ill-defined question. Probably the more formal way to say it is that as we move foundation models to operate over different kinds of data, not language but more direct observations of the world, that'll give us an opportunity to think more clearly about what the models are actually capturing. I think that's very interesting.

     

    What you said about population support for those models, meaning observing the full scope of possible situations, rings a bell for me; it's very close to my own thinking about these models, especially when Sora was released and OpenAI suggested that Sora is a physics simulator. I thought that was an overstatement, and I think you can see it in the videos: it can simulate physics in certain areas but not necessarily in others. And when we think about it from this point of view, we can take, broadly speaking, perhaps two perspectives. One is that the model only learns to predict something, so it just learns certain shortcuts that lead to plausible-looking output. The other is that it really learns a correct, or approximately correct, function of how the world works locally. What are your thoughts about this, and which of those do you see as more plausible?

     

    I think you're right that, if anything, it's learning an approximate local simulation, and I think that's quite reasonable. In some ways it ties into questions about whether it's okay for these models to be generating ungrounded responses or hallucinations. If you're doing creative writing, yes, that's perfectly fine, it's part of the task; but if you're summarizing a conversation and you want to make sure that summary is accurate, no, it's not okay.

    ...

    When I'm trying to figure out how LLMs might behave causally, I need to get my hands dirty to really give myself confidence about any intuition. I also try to think about how we might use these models not just on their own terms, text in, text out, but also about what we can be doing at different layers of the inference stack to control their behavior: what are all the knobs we might use to influence whether a foundation model is appropriate for a particular task, or to make it appropriate for a particular task.
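    As one concrete, hypothetical example of such a knob, even plain decoding parameters change how conservative or exploratory the output is. The sketch below uses Hugging Face transformers with GPT-2, which the talk does not mention; it is only meant to illustrate turning one dial in the inference stack.

```python
# Hypothetical illustration of decoding-layer "knobs"; the talk names no specific parameters.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Possible causes of increased ice cream sales include"
inputs = tokenizer(prompt, return_tensors="pt")

# Low temperature + nucleus sampling: steer toward conservative, high-probability
# continuations (closer to what a faithful summarization task would want).
conservative = model.generate(
    **inputs, max_new_tokens=40, do_sample=True, temperature=0.3, top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)

# High temperature: allow more exploratory, creative continuations.
creative = model.generate(
    **inputs, max_new_tokens=40, do_sample=True, temperature=1.2, top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(conservative[0], skip_special_tokens=True))
print(tokenizer.decode(creative[0], skip_special_tokens=True))
```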

    ...

    The major effort that I'm excited about in the context of causality goes in two directions. One is continuing to push on making it more practical to use large language models to support people in the standard causal analysis process: how can we use large language models to suggest causal graphs, suggest potentially missing data and missing confounders, and critique the analysis as people put it together?
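    A rough sketch of what that LLM-in-the-loop step could look like is below. The prompt wording, the variable names, and the use of the OpenAI chat API are my own illustrative assumptions, not PyWhy LLM's actual interface; any suggested edges would still need expert review before being encoded as a graph for DoWhy, as in the earlier four-step sketch.

```python
# Hedged sketch: asking a chat model to propose candidate edges and possible missing
# confounders for a causal graph. Prompt format and model choice are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

variables = ["ice_cream_sales", "temperature", "drowning_incidents"]

prompt = (
    "You are helping set up a causal analysis.\n"
    f"Variables: {', '.join(variables)}\n"
    "1) List plausible directed causal edges, one per line, in the form 'A -> B'.\n"
    "2) Name any likely unobserved confounders we may be missing.\n"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
suggestion = response.choices[0].message.content
print(suggestion)

# Extract the proposed edges for human review before building a DoWhy graph.
edges = [line.strip() for line in suggestion.splitlines() if "->" in line]
print(edges)
```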

     

    The second direction, which might be starting up, is looking at how foundation models might help us better model more complex, physics-style systems. That's very early going, but it's something I'm starting to look at to see if it's plausible.


    An interesting guest talk.

    I watched it with great interest on the subway to and from school.

    It gives me inspiration about what opportunities there might be when combining causal inference with LLMs, and what limitations we should be wary of.

     

    It also raises a rather fundamental question for me:

    Is an LLM omniscient?

     

    Does it understand every mechanism in the world, in the sense that 'because this exists, that exists; this was the cause, so that came about'?

    Can it, then, infer the world's complex causal relationships?

     

    While building a library for applying causal inference to real applications, Emre studies how to use LLMs at each step of the inference process, but he also points out the limitations and the things we should be wary of when relying on LLMs.

     

    What I'm curious about is this: what about foundation models trained not just on text but on multiple modalities, vision and so on?

     

    In the Buddhist scriptures, a sage is described as 'one who knows the beginning and end of all causes and effects.'

     

    If an LLM really were omniscient, we could say 'GPT = God'..

     

    (As an aside, the Buddhist scriptures mention '제망찰해' (Indra's net), which means that all things shine brightly by reflecting one another. It also means that if the light of any single jewel in Indra's net goes out, the light could vanish everywhere in an instant. I love this expression because it immediately brings to mind the subtle, complex network between people; I cannot define an 'I' that is separate from its relations. Even looking at a neural network's weight matrix reminds me of Indra's net..

     

    Back in undergrad, these kinds of philosophical conversations would come up over drinks with my seniors. I miss those days. haha)

     


     
