Duration 11:13

Want To Reduce Labeling Cost? GPT-3 Can Help (Machine Learning Research Paper Walkthrough)

Published 16 Sep 2021

#ai #gpt3 #nlp

Want To Reduce Labeling Cost? This research paper proposes using the GPT-3 language model for data annotation in NLP. GPT-3 is an autoregressive language model that uses deep learning to produce human-like text. The authors run extensive experiments to evaluate the quality of the labels GPT-3 produces and its cost-effectiveness compared with human annotators.

⏩ Abstract: Data annotation is a time-consuming and labor-intensive process for many NLP tasks. Although there exist various methods to produce pseudo data labels, they are often task-specific and require a decent amount of labeled data to start with. Recently, the immense language model GPT-3 with 175 billion parameters has achieved tremendous improvement across many few-shot learning tasks. In this paper, we explore ways to leverage GPT-3 as a low-cost data labeler to train other models. We find that, to make the downstream model achieve the same performance on a variety of NLU and NLG tasks, it costs 50% to 96% less to use labels from GPT-3 than using labels from humans. Furthermore, we propose a novel framework of combining pseudo labels from GPT-3 with human labels, which leads to even better performance with limited labeling budget. These results present a cost-effective data labeling methodology that is generalizable to many practical applications.

Please feel free to share the content and subscribe to my channel :)
⏩ Subscribe - /channel/UCoz8NrwgL7U9535VNc0mRPA

⏩ OUTLINE:
0:00 - Abstract and Introduction
02:30 - GPT-3 Input Construction
04:35 - Labelling Cost Analysis
05:45 - Four Data Labeling Strategies under Fixed Budget
07:18 - GPT-3 Labeling
08:05 - GPT3-Human Labeling
09:29 - Active labeling and Wrap-up

⏩ Paper Title: Want To Reduce Labeling Cost? GPT-3 Can Help
⏩ Paper: https://www.microsoft.com/en-us/research/uploads/prod/2021/09/emnlp2021.pdf
⏩ Authors: Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu, Michael Zeng
⏩ Organisation: Microsoft Cognitive Services Research Group

**********************************************
If you want to support me financially, which is totally optional and voluntary ❤️
You can consider buying me chai (because I don't drink coffee :) ) at https://www.buymeacoffee.com/TechvizCoffee
❤️ Support using Paypal - https://www.paypal.com/paypalme/TechVizDataScience
**********************************************
⏩ Youtube - /c/TechVizTheDataScienceGuy
⏩ LinkedIn - https://linkedin.com/in/prakhar21
⏩ Medium - https://medium.com/@prakhar.mishra
⏩ GitHub - https://github.com/prakhar21
⏩ Twitter - https://twitter.com/rattller
*********************************************
Tools I use for making videos :)
⏩ iPad - https://tinyurl.com/y39p6pwc
⏩ Apple Pencil - https://tinyurl.com/y5rk8txn
⏩ GoodNotes - https://tinyurl.com/y627cfsa

#techviz #datascienceguy #nlproc #research #machinelearning

About Me: I am Prakhar Mishra and this channel is my passion project. I am currently pursuing my MS (by research) in Data Science. I have 3 years of industry work experience in Data Science and Machine Learning, with a particular focus on Natural Language Processing (NLP).
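
⏩ Quick sketch: to get a feel for the fixed-budget idea from the abstract (mixing human labels with cheaper GPT-3 pseudo labels), here is a small, purely illustrative Python snippet. The function name `plan_labels`, the `LabelPlan` type, and every cost figure are my own assumptions for illustration; they are not the paper's numbers and this is not the authors' selection strategy. It only shows how many labels of each kind a budget could buy for a chosen split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelPlan:
    human_labels: int
    gpt3_labels: int


def plan_labels(budget_cents: int, human_cost_cents: int,
                gpt3_cost_cents: int, human_percent: int) -> LabelPlan:
    """Split a fixed budget between human and GPT-3 labels.

    All amounts are integer cents so the arithmetic stays exact.
    human_percent is the share (0-100) of the budget reserved for humans.
    """
    if not 0 <= human_percent <= 100:
        raise ValueError("human_percent must be between 0 and 100")
    if human_cost_cents <= 0 or gpt3_cost_cents <= 0:
        raise ValueError("per-label costs must be positive")

    # Buy as many human labels as the reserved share allows.
    human_budget = budget_cents * human_percent // 100
    human_labels = human_budget // human_cost_cents

    # Whatever is left over goes to GPT-3 pseudo labels.
    remaining = budget_cents - human_labels * human_cost_cents
    gpt3_labels = remaining // gpt3_cost_cents
    return LabelPlan(human_labels, gpt3_labels)


if __name__ == "__main__":
    # Hypothetical prices: 10 cents per human label, 1 cent per GPT-3 label.
    for percent in (0, 50, 100):
        print(percent, plan_labels(10_000, 10, 1, percent))
```

With these made-up prices, a 50/50 split of a 100-dollar budget buys 500 human labels and 5,000 GPT-3 labels. Whether that mix actually helps the downstream model depends on label quality, which is what the paper evaluates.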
