#ai #gpt3 #nlp
Want to reduce labeling cost? This research paper proposes using the GPT-3 language model for data annotation in NLP. GPT-3 is an autoregressive language model that uses deep learning to produce human-like text. The authors run extensive experiments to evaluate the quality of the labels produced by GPT-3 and how cost-effective they are compared to labels from human annotators.
⏩ Abstract: Data annotation is a time-consuming and labor-intensive process for many NLP tasks. Although there exist various methods to produce pseudo data labels, they are often task-specific and require a decent amount of labeled data to start with. Recently, the immense language model GPT-3 with 175 billion parameters has achieved tremendous improvement across many few-shot learning tasks. In this paper, we explore ways to leverage GPT-3 as a low-cost data labeler to train other models. We find that, to make the downstream model achieve the same performance on a variety of NLU and NLG tasks, it costs 50% to 96% less to use labels from GPT-3 than using labels from humans. Furthermore, we propose a novel framework of combining pseudo labels from GPT-3 with human labels, which leads to even better performance with limited labeling budget. These results present a cost-effective data labeling methodology that is generalizable to many practical applications.
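To make the cost comparison in the abstract concrete, here is a minimal Python sketch of the arithmetic behind a "costs X% less" claim. The function names and every number in the usage note are hypothetical placeholders for illustration, not figures from the paper; the sketch only assumes that each labeling source has a per-label price and needs some number of labels to reach the same downstream performance.

```python
def relative_saving(human_cost_per_label, human_labels_needed,
                    gpt3_cost_per_label, gpt3_labels_needed):
    """Fraction of the human labeling cost saved by using GPT-3 labels.

    All four inputs are assumptions supplied by the caller: the price of
    one label from each source and how many labels each source needs for
    the downstream model to reach the same score.
    """
    human_total = human_cost_per_label * human_labels_needed
    gpt3_total = gpt3_cost_per_label * gpt3_labels_needed
    return 1.0 - gpt3_total / human_total


def labels_within_budget(budget, cost_per_label):
    """Number of whole labels that a fixed budget can buy."""
    return int(budget // cost_per_label)
```

For example, with made-up prices of 0.10 per human label and 0.01 per GPT-3 label, and assuming 1000 human labels or 4000 GPT-3 labels reach the same score, `relative_saving(0.10, 1000, 0.01, 4000)` gives a saving of about 60%.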
Please feel free to share the content and subscribe to my channel :)
⏩ Subscribe - /channel/UCoz8NrwgL7U9535VNc0mRPA
⏩ OUTLINE:
0:00 - Abstract and Introduction
02:30 - GPT-3 Input Construction
04:35 - Labeling Cost Analysis
05:45 - Four Data Labeling Strategies under Fixed Budget
07:18 - GPT-3 Labeling
08:05 - GPT-3-Human Labeling
09:29 - Active labeling and Wrap-up
⏩ Paper Title: Want To Reduce Labeling Cost? GPT-3 Can Help
⏩ Paper: https://www.microsoft.com/en-us/research/uploads/prod/2021/09/emnlp2021.pdf
⏩ Authors: Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu, Michael Zeng
⏩ Organisation: Microsoft Cognitive Services Research Group
**********************************************
If you want to support me financially (totally optional and voluntary ❤️),
you can buy me a chai (because I don't drink coffee :) ) at https://www.buymeacoffee.com/TechvizCoffee
❤️ Support using Paypal - https://www.paypal.com/paypalme/TechVizDataScience
**********************************************
⏩ Youtube - /c/TechVizTheDataScienceGuy
⏩ LinkedIn - https://linkedin.com/in/prakhar21
⏩ Medium - https://medium.com/@prakhar.mishra
⏩ GitHub - https://github.com/prakhar21
⏩ Twitter - https://twitter.com/rattller
*********************************************
Tools I use for making videos :)
⏩ iPad - https://tinyurl.com/y39p6pwc
⏩ Apple Pencil - https://tinyurl.com/y5rk8txn
⏩ GoodNotes - https://tinyurl.com/y627cfsa
#techviz #datascienceguy #nlproc #research #machinelearning
About Me:
I am Prakhar Mishra and this channel is my passion project. I am currently pursuing my MS (by research) in Data Science. I have 3 years of industry experience in Data Science and Machine Learning, with a particular focus on Natural Language Processing (NLP).