DMI Webinar "The challenges, and some solutions, to crowdsourcing for NLU"

Zoom Meeting

"The challenges, and some solutions, to crowdsourcing for NLU"

26 April 2021, 17:30-18:30 CEST


As NLP models and their pretraining data balloon in size, the models are achieving "human level" performance on new NLP benchmarks at an accelerating clip. For example, the GLUE benchmark went from challenging to essentially solved within the first year of its release. There is a clear and growing need for high-quality evaluation data. Such data is necessary not only to study and benchmark the models' language understanding ability, but also to gauge how biased our models are. However, there are large open questions around how to build such data, and specifically around how to crowdsource challenging written examples. In this talk, we'll discuss some of the challenges we face and possible solutions for crowdsourcing high-quality written examples for NLU.


Nikita Nangia is a third-year PhD student at New York University’s Center for Data Science (CDS), where she is advised by Sam Bowman. Her research is on machine learning and natural language processing, with a focus on crowdsourcing high-quality datasets. In 2018, she received a Master’s degree in Data Science from NYU. Prior to that, she did R&D for a few years at a start-up, after graduating from the University of Chicago in 2013 with a Bachelor’s in Physics. While in the physics world, she studied and did research in experimental high-energy physics.

The talks will be held online. If you would like to participate, please fill out this form.

For more information, write to