[Project Proposal] Improving the performance of machine-generated text (MGT) detection by identifying the significance of individual tokens
Research/NLP_Paper 2024. 11. 11. 14:49
※ Work in progress.
※ This is the project proposal for Team 5 in the 2024 NLP class.
※ The main idea for this project was provided by D.H. Lee.
※ The content of this proposal is based on discussions with our team members: S.J.Kim, D.H.Lee, S.J.Lee, S.Y.Park.
※ The final proposal PPT will be created in collaboration with S.J.Lee.
※ The paper review presentation will be given by S.J.Kim.
※ The proposal & project presentations will be given by D.H.Lee.
※ All experiments will be conducted by D.H.Lee.
1. Project Subject
Improving the performance of machine-generated text (MGT) detection by identifying the significance of individual tokens.
2. Background & Motivation
- Recent advances in LLMs have had a significant impact on our lives. LLMs can improve the efficiency of daily work and life activities, such as content creation, program code analysis, and writing.
- However, text generated by LLMs can result in various issues, such as fake news, misinformation, and social media spam.
- Research on solving the problems caused by LLM-generated text and on effectively distinguishing between machine-generated and human-written text is becoming increasingly important.
- However, the popularity of LLMs has led to a significant increase in the amount of text they generate, making it difficult for humans to differentiate between machine-generated and human-written text.
=> There is a need for research on "how to distinguish between the two automatically."
- To address this issue, several methods have been proposed for detecting machine-generated text. However, previous work has shown several limitations in MGT detection performance. We aim to address these shortcomings.
=> Our project aims to investigate the significance of individual tokens in MGT detection using BERT, and to apply masking to these tokens, thereby enhancing detection performance.
3. Related Work & Limitations
- DetectGPT proposes a curvature-based criterion for determining whether a passage is generated by a given LLM. This method leverages the observation that text sampled from an LLM tends to occupy regions of negative curvature in the model's log probability function. A key advantage of this approach is that it does not require additional training, relying only on log probabilities computed by the target model and random perturbations of the passage from another generic pre-trained language model.
- Limitation of DetectGPT
"Information loss caused by the random masking"
DetectGPT has significant limitations. It perturbs the original sample randomly and without restrictions, which can introduce noise and degrade performance. Additionally, entity-relationship structures, which are important for detection, may be disrupted by the random perturbations used in DetectGPT.
=> 1. We need to devise data perturbation methods tailored for MGT detection.
=> 2. "Selective Strategy Perturbation" adapts the mask-selection probability of each token based on its importance, generating perturbed inputs with strategically placed masks. This better represents meaningful recombination spaces while preserving the inherent semantic features of the text, ultimately enhancing the diversity of samples.
=> 3. "Token Importance Assessment" must come first, because Selective Strategy Perturbation cannot be implemented without it.
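To make the idea of an importance-dependent mask probability concrete, here is a minimal sketch in plain Python. It is an illustration under our own assumptions, not the method of any cited paper: the function names, the 0.15 base rate, and the `protect_important` switch are all hypothetical. The switch exists because the direction (masking important tokens less often, or more often) is a design choice that the experiments still have to settle.

```python
import random


def mask_probabilities(importance, base_rate=0.15, protect_important=True):
    """Map per-token importance scores to per-token mask probabilities.

    The probabilities sum to roughly base_rate * len(importance), so the
    expected number of masks matches uniform random masking at base_rate.
    base_rate and protect_important are illustrative assumptions.
    """
    lo, hi = min(importance), max(importance)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    scaled = [(s - lo) / span for s in importance]  # rescale to [0, 1]
    # Protecting important tokens gives them a small weight; otherwise a large one.
    weights = [(1.0 - s) if protect_important else s for s in scaled]
    total = sum(weights)
    if total == 0:
        return [base_rate] * len(importance)  # fall back to uniform masking
    n = len(importance)
    return [min(1.0, base_rate * n * w / total) for w in weights]


def selective_mask(tokens, importance, base_rate=0.15,
                   protect_important=True, mask_token="[MASK]", seed=None):
    """Replace each token by mask_token with its own mask probability."""
    rng = random.Random(seed)
    probs = mask_probabilities(importance, base_rate, protect_important)
    return [mask_token if rng.random() < p else tok
            for tok, p in zip(tokens, probs)]


tokens = ["the", "model", "wrote", "this", "essay"]
importance = [0.1, 0.9, 0.5, 0.1, 0.7]  # made-up scores for illustration
print(selective_mask(tokens, importance, seed=0))
```

With `protect_important=True` the highest-scoring token receives mask probability 0, so it is never perturbed, while low-importance tokens absorb the masking budget.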
4. Proposed Method
We propose a novel method for assessing the importance of individual tokens using attribution scores. We evaluate the effectiveness of our method by masking tokens with high importance scores, paraphrasing them, and analyzing the corresponding detection performance.
First, we identify tokens with high importance by estimating their attribution scores.
Next, we apply masking and paraphrasing to these tokens.
Finally, we evaluate the impact of this masking on detection performance.
Through these experiments, we aim to measure the effectiveness of the attribution score-based masking method in MGT detection.
* Dataset
- HC3 (Human ChatGPT Comparison Corpus)
- A dataset designed to systematically investigate the linguistic characteristics of both human and ChatGPT responses, and to analyze the differences and gaps between them.
- A dataset useful for evaluating the effectiveness of detection models.
- Collection of 40K responses from both human experts and ChatGPT, covering questions across open-domain topics, computer science, finance, medicine, law and psychology.
(B. Guo, et al. How close is chatGPT to human experts? Comparison corpus, evaluation, and detection)
(https://github.com/Hello-SimpleAI/chatgpt-comparison-detection)
(https://arxiv.org/pdf/2301.07597)
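As a rough sketch of how HC3 records could be turned into labeled examples for a binary detector, the snippet below flattens question-level records into (text, label) pairs. The field names `human_answers` and `chatgpt_answers` reflect our understanding of the public HC3 release and should be checked against the actual files; the toy record is invented purely for illustration.

```python
def flatten_hc3(records):
    """Flatten HC3-style records into (text, label) pairs.

    Label 0 = human-written, label 1 = ChatGPT-generated.
    The field names are assumptions about the dataset schema.
    """
    examples = []
    for rec in records:
        for answer in rec.get("human_answers", []):
            if answer.strip():  # skip empty answers
                examples.append((answer.strip(), 0))
        for answer in rec.get("chatgpt_answers", []):
            if answer.strip():
                examples.append((answer.strip(), 1))
    return examples


# Invented toy record, not taken from the dataset.
toy_records = [
    {
        "question": "What is overfitting?",
        "human_answers": ["When a model memorizes noise in the training set."],
        "chatgpt_answers": ["Overfitting occurs when a model fits training data too closely."],
    }
]
for text, label in flatten_hc3(toy_records):
    print(label, text)
```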
* Model
- BERT
- We investigate the tokens with high importance for MGT detection using Transformer encoder-based models (e.g., BERT, RoBERTa).
* Implementation details
- Optimizer: AdamW
- Learning rate: TBD
- Batch size: TBD
- Epochs: TBD
- Weight decay: 0.01
- GPU: L40S or RTX 3060
* Tools
1) Captum
2) Transformers-Interpret
- https://github.com/cdpierse/transformers-interpret
* (1) Find important tokens
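The sketch below illustrates the idea of a per-token attribution score with leave-one-out occlusion: a token's importance is the drop in the detector's score when that token is replaced by a mask. This is a simpler stand-in for the gradient-based attribution methods available in Captum, not what we will necessarily run. `toy_detector` and its cue-word list are hypothetical placeholders for a fine-tuned BERT classifier.

```python
def occlusion_importance(tokens, score_fn, mask_token="[MASK]"):
    """Leave-one-out attribution: score drop when each token is occluded."""
    base = score_fn(tokens)
    scores = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + [mask_token] + tokens[i + 1:]
        scores.append(base - score_fn(occluded))
    return scores


def toy_detector(tokens):
    """Hypothetical stand-in for a detector's 'machine-generated' score.

    Returns the fraction of tokens that belong to a made-up cue-word set.
    """
    cues = {"furthermore", "overall", "additionally"}
    return sum(t.lower() in cues for t in tokens) / max(len(tokens), 1)


tokens = ["Overall", "the", "results", "are", "good"]
scores = occlusion_importance(tokens, toy_detector)
for tok, s in zip(tokens, scores):
    print(f"{tok:10s} {s:+.2f}")
```

Only "Overall" changes the toy score when occluded, so it is the only token with a non-zero attribution here. With a real classifier, `score_fn` would wrap a forward pass that returns the probability of the machine-generated class.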
* (2) Paraphrasing important tokens
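A minimal sketch of this step, assuming importance scores are already available: pick the k highest-scoring positions and rewrite only those tokens. `rewrite_fn` is a hypothetical hook; in practice it could be a mask-filling model (DetectGPT uses T5 for its perturbations), whereas the toy synonym table below is invented for illustration.

```python
def top_k_indices(importance, k):
    """Return the positions of the k highest importance scores, in text order."""
    ranked = sorted(range(len(importance)),
                    key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:k])


def paraphrase_important(tokens, importance, k, rewrite_fn):
    """Rewrite only the k most important tokens; leave the rest untouched."""
    targets = set(top_k_indices(importance, k))
    return [rewrite_fn(tok) if i in targets else tok
            for i, tok in enumerate(tokens)]


# Invented synonym table standing in for a real paraphrasing model.
TOY_SYNONYMS = {"overall": "broadly", "good": "strong"}


def toy_rewrite(token):
    """Hypothetical rewriter: look up a synonym, else keep the token."""
    return TOY_SYNONYMS.get(token.lower(), token)


tokens = ["Overall", "the", "results", "are", "good"]
importance = [0.2, 0.0, 0.0, 0.0, 0.1]  # made-up scores for illustration
print(paraphrase_important(tokens, importance, k=2, rewrite_fn=toy_rewrite))
```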
* (3) Evaluate paraphrased texts
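One way to quantify the effect of paraphrasing is to compare the detector's AUROC on the original texts and on the paraphrased texts. The choice of AUROC is our assumption here (it is the metric reported by DetectGPT); the proposal has not fixed a metric yet. The sketch computes AUROC directly from its pairwise definition, and all scores below are made up.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative.

    labels: 1 = machine-generated, 0 = human-written.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
scores_original = [0.9, 0.8, 0.3, 0.1]     # made-up detector scores
scores_paraphrased = [0.9, 0.2, 0.3, 0.1]  # second machine text now scores low

print("original   :", auroc(labels, scores_original))
print("paraphrased:", auroc(labels, scores_paraphrased))
```

A large drop from the first number to the second would indicate that the rewritten tokens were the ones the detector relied on.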
5. Expected Results
- By identifying important tokens through attribution scores, our method enables more effective selective perturbation. As a result, we expect this approach to reduce the information loss caused by the random masking used in DetectGPT.
- Consequently, improved MGT detection will strengthen efforts to prevent the misuse of large language models (LLMs).
6. Conclusion & Future work
6.1. Summary
- Advances in large language models (LLMs) have revolutionized the field of natural language processing. However, text generated by LLMs can lead to various problems, such as fake news, misinformation, and social media spam. Moreover, detecting machine-generated text (MGT) is becoming increasingly challenging as this text closely mimics human writing.
- Several supervised learning methods have been proposed to address these issues. However, these models remain vulnerable to attacks involving textual variations, such as paraphrasing.
- Unlike previous work, DetectGPT introduces a zero-shot detection method that leverages the log probabilities of sentence tokens.
- While DetectGPT demonstrates improved performance over earlier methods for detecting MGT, we identified a critical limitation: it does not account for the importance of individual tokens, and its random masking of tokens leads to information loss.
- To mitigate these issues, we propose a masking method based on the importance of tokens, as measured by attribution scores.
- We will conduct experiments to validate the effectiveness of our approach.
- We anticipate that our results will contribute to expanding the scope of research on LLM-generated text detection.
6.2. Future work
- Additional experiments may be conducted with a variety of recent models.
- Further research to improve detection robustness is necessary as language models continually improve their imitation of human text.