Heehee
    Campus Life 2025. 5. 15. 19:28

    Heehee, I'm so excited.

    My professor replied.

    Since my question on the board went unanswered, I thought he'd never look at it, but then, an unexpected reply.

    Wow, I'm so grateful.

    Reading it over slowly and thinking it through...

    I'll walk laps around the field while I mull it over.

     

    Is it reasonable to interpret GANs as some sort of "reinforcement learning", where the Generator acts as a "policy network" and the Discriminator as a "reward network" such that the policy network optimizes its distributional policy based on the given reward? 


    Yes, it is interesting to draw the analogy between GANs and RL as you stated. Under this analogy, a fundamental difference between them is that the discriminator is learnable, while the reward is given by the environment.
    Even in the case where we learn a reward network, the goal of that learning is to simulate the environment, rather than to be adversarial against the agent (or its policy).
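
    To make the analogy concrete for myself, a minimal sketch, assuming PyTorch (all names and toy sizes below are my own, not from the reply):

    import torch
    import torch.nn as nn

    latent_dim, data_dim = 16, 2  # toy sizes, chosen arbitrarily

    # In the analogy: G = "policy network", D = "reward network"
    G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
    D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    def train_step(real_batch):
        n = real_batch.size(0)
        z = torch.randn(n, latent_dim)

        # Discriminator step: unlike an RL reward fixed by the environment,
        # this "reward network" is itself trained to oppose the generator.
        fake = G(z).detach()
        loss_D = bce(D(real_batch), torch.ones(n, 1)) + bce(D(fake), torch.zeros(n, 1))
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()

        # Generator step: like a policy update, G shifts its output
        # distribution to raise the "reward" D assigns to its samples.
        loss_G = bce(D(G(z)), torch.ones(n, 1))
        opt_G.zero_grad(); loss_G.backward(); opt_G.step()
        return loss_D.item(), loss_G.item()

    # usage with stand-in "real" data: train_step(torch.randn(32, data_dim))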


    Would it be reasonable to interpret "Depthwise Separable Convolution" as operationally similar to "multi-head attn + FFNN"?

    I mean, in multi-head attention, each head focuses on different parts of the input features, analogous to how, in depthwise convolution, each filter looks at a different input channel. And the FFNN mixes information across feature dimensions, like how the pointwise convolution mixes information across channels.


    "Depthwise Separable Convolution" are analogous to "multi-head attn + FFNN" in some sense, as you stated.
    Maybe the authors of transformer were inspired by the design of Depthwise Separable Convolution, that FFN followed by multihead attn is a reasonable choice taking account of the fact taht the behavior of multihead attn is similar to depthwise conv in efficient CNNs.
    Still there are some difference between them, e.g., the receptive field size is full for multihead attn, while it is local for conv layers.
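
    Writing both side by side makes the correspondence (and the receptive-field difference) easier to see. A sketch assuming PyTorch; the sizes are toy choices of mine:

    import torch
    import torch.nn as nn

    C = 64  # channels / model dim (toy choice)

    # Depthwise separable convolution: per-channel spatial mixing with a
    # *local* receptive field, then 1x1 cross-channel mixing.
    depthwise = nn.Conv2d(C, C, kernel_size=3, padding=1, groups=C)
    pointwise = nn.Conv2d(C, C, kernel_size=1)
    x_img = torch.randn(1, C, 8, 8)
    y_img = pointwise(depthwise(x_img))

    # Transformer counterpart: token mixing with a *full* receptive field,
    # then cross-feature mixing by the FFN.
    attn = nn.MultiheadAttention(embed_dim=C, num_heads=8, batch_first=True)
    ffn = nn.Sequential(nn.Linear(C, 4 * C), nn.ReLU(), nn.Linear(4 * C, C))
    x_seq = torch.randn(1, 64, C)  # (batch, tokens, dim)
    y_seq = ffn(attn(x_seq, x_seq, x_seq)[0])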


    When training a student ViT to match the predictions of a teacher CNN for distillation, is it possible to use divergence metrics other than KL divergence, such as Jensen-Shannon or Wasserstein?


    You can use any kind of metric as long as it is differentiable.
    However, there may be a tradeoff between the choice of metric and the optimization behavior/efficiency. KL divergence is a good default choice, in that it aligns with the conventional cross-entropy loss and has proven to work well in general.
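
    To convince myself both are drop-in losses, a sketch assuming PyTorch and the usual temperature-scaled softmax (the T^2 factor follows the standard distillation convention; everything here is my own illustration):

    import torch
    import torch.nn.functional as F

    def kl_distill_loss(student_logits, teacher_logits, T=2.0):
        # KL(teacher || student) on temperature-softened distributions,
        # scaled by T^2 as in standard distillation.
        p_t = F.softmax(teacher_logits / T, dim=-1)
        log_p_s = F.log_softmax(student_logits / T, dim=-1)
        return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

    def js_distill_loss(student_logits, teacher_logits, T=2.0):
        # Jensen-Shannon: symmetric and bounded, and still differentiable,
        # so it can replace KL in the same training loop.
        p_s = F.softmax(student_logits / T, dim=-1)
        p_t = F.softmax(teacher_logits / T, dim=-1)
        m = 0.5 * (p_s + p_t)
        def kl(p, q):
            return (p * (p.clamp_min(1e-8).log() - q.clamp_min(1e-8).log())).sum(-1).mean()
        return 0.5 * kl(p_s, m) + 0.5 * kl(p_t, m)

    # toy usage:
    s, t = torch.randn(4, 10), torch.randn(4, 10)
    print(kl_distill_loss(s, t).item(), js_distill_loss(s, t).item())

    I only sketched JS; Wasserstein between softmax outputs is usually approximated (e.g., with Sinkhorn iterations) to get smooth gradients, which needs more machinery.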


     
