We bring together natural language processing and robotics to connect language to the world (RoboNLP). Our lab is broadly interested in connecting language to agent perception and action, and in lifelong learning through interaction.
URAP REU available Fall 2022 - Spring 2023 for one undergraduate student.
Project Description: Sun is to day as moon is to ____? Analogies are a compact way to test our understanding of a written language. They are intuitive, even fun, for humans to solve and to create. However, the large language models now used for Natural Language Processing (NLP) tasks, from web search to customer service dialogue, have mixed abilities to reason about them. In particular, human language understanding is grounded in experience with the world, so analogies that draw heavily on physical or social intuitions, for example "glass is to break as rubber is to ____", can stump language models trained only on large-scale text from the Internet. For instance, the popular OpenAI GPT-2 large language model predicts "...rubber is to chew" with 99% confidence. To study the effects of experiential, multimodal training on language understanding models, we propose to use this analogy completion task: we will collect and curate a benchmark of linear word analogies to test the understanding capabilities of modern large language models. Benchmarks that probe physical and social understanding are difficult and expensive to create, so we propose analogies as a minimal probe that requires little human effort to create, and we will collect analogies that are easy for humans and hard for models through an online game.
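As a rough illustration of the analogy completion task described above, the sketch below formats an analogy as a cloze prompt and turns per-candidate scores into a normalized confidence over a fixed candidate list. Everything here is an assumption made for illustration: the function names (`analogy_prompt`, `rank_completions`, `toy_log_prob`), the candidate words, and the scores in the toy scorer are hypothetical placeholders, not outputs of GPT-2 or any other model and not the project's actual evaluation code.

```python
import math
from typing import Callable, Dict, List


def analogy_prompt(a: str, b: str, c: str) -> str:
    """Format 'a is to b as c is to ____' as a prompt ending before the blank."""
    return f"{a} is to {b} as {c} is to"


def rank_completions(
    prompt: str,
    candidates: List[str],
    log_prob: Callable[[str, str], float],
) -> Dict[str, float]:
    """Normalize candidate scores into probabilities with a softmax.

    `log_prob(prompt, candidate)` is assumed to return a log-probability-like
    score for the candidate given the prompt; higher means more likely.
    """
    scores = [log_prob(prompt, cand) for cand in candidates]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return {cand: w / total for cand, w in zip(candidates, weights)}


def toy_log_prob(prompt: str, candidate: str) -> float:
    """Hypothetical stand-in scorer with made-up numbers (NOT model output)."""
    made_up_scores = {"chew": -0.1, "bounce": -5.0, "stretch": -6.0}
    return made_up_scores.get(candidate, -20.0)


prompt = analogy_prompt("glass", "break", "rubber")
probs = rank_completions(prompt, ["chew", "bounce", "stretch"], toy_log_prob)
best = max(probs, key=probs.get)
print(prompt, "->", best)
```

In an actual experiment, `toy_log_prob` would presumably be replaced by a function that sums the token log-probabilities a causal language model assigns to the candidate after the prompt; restricting the softmax to a fixed candidate list is one possible design choice among several.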
Bonus skills: Familiarity with PyTorch code and HuggingFace models.
Expected outcome: Develop a web game for collecting analogies, benchmark large language models on a curated analogy dataset, and draft and submit a research paper.
Apply: Fill out this form and mention the project, your background and interests, and why you might be a good match.