Research Mission
To democratize programming, especially to empower social scientists, journalists, data scientists, and other non-traditional programmers.
To make programming tools so usable that automating a task is easier than teaching the same task to a human.
Students
David Minh-Duy Cao (PhD)
Slim Lim (PhD)
Justin Lubin (PhD)
Gabriel Matute (PhD)
Hellina Hailu Nigatu (PhD)
Eric Rawn (PhD)
Parker Ziegler (PhD)
Eunice Jun (Postdoc)
Jeremy Ferguson (MSc)
Sora Kanosue (MSc)
Jacob Yim (MSc)
Dhanya Jayagopal (MSc, graduated)
Lisa Rennels (MSc, graduated)
Pragya Kallanagoudar (undergraduate)
Rebecca Hicke (undergraduate, graduated)
Rajavi Mishra (undergraduate, graduated)
Kevin Ye (undergraduate, graduated)
Teaching
CS294-184 Building User-Centered Programming Tools (Spring 2024)
CS164 Programming Languages and Compilers (Fall 2023)
CS39-001 Technology, Society, and Power (Fall 2023)
CS298-007 Research Culture and Community Norms (Fall 2023)
CS294-184 Building User-Centered Programming Tools (Spring 2023)
CS164 Programming Languages and Compilers (Fall 2022)
CS294-184 Building User-Centered Programming Tools (Spring 2022)
CS164 Programming Languages and Compilers (Fall 2021)
CS298-007 Research Culture and Community Norms (Fall 2021)
CS298-007 Research Culture and Community Norms (Fall 2020)
Research Groups
EPIC Data Lab
Berkeley Programming Systems Research Group
Programming Languages for Approachable and Inclusive Tools (PLAIT)
Faculty affiliate at the Berkeley Institute for Data Science (BIDS)
Projects
For details of my lab's recent and ongoing research, see:
Programming Languages for Approachable and Inclusive Tools (PLAIT)
End-User Programming for Web Automation
Helena is a high-level programming language for web automation tasks such as data collection and data entry.
Users draft Helena programs by recording themselves completing a subtask; Helena's Programming By Demonstration (PBD) tools can use these recordings to synthesize programs for completing all subtasks. (See below.)
The Helena editor lets users adapt, extend, and understand their programs.
Ringer is a low-level programming language for web automation tasks.
Many statements in the high-level Helena language are implemented with Ringer.
Ringer comes with a record and replay tool; when a user demonstrates how to complete an interaction in a normal browser, the tool writes a straight-line Ringer program that completes the same interaction on the same pages.
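As a rough illustration of record and replay (this is not Ringer's actual trace format or API), a recorded demonstration can be pictured as a flat list of events that a replayer re-issues in order. The `Event` schema and the `FakePage` stand-in for a browser page are hypothetical names introduced only for this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """One recorded user action (hypothetical schema, not Ringer's)."""
    kind: str       # "click" or "type"
    selector: str   # which page element the action targets
    text: str = ""  # typed text, used only by "type" events


@dataclass
class FakePage:
    """Stand-in for a browser page; it only logs the actions it receives."""
    log: list = field(default_factory=list)

    def click(self, selector):
        self.log.append(f"click {selector}")

    def type(self, selector, text):
        self.log.append(f"type {selector} {text!r}")


def replay(trace, page):
    """Re-issue each recorded event in order: a straight-line program."""
    for ev in trace:
        if ev.kind == "click":
            page.click(ev.selector)
        elif ev.kind == "type":
            page.type(ev.selector, ev.text)
        else:
            raise ValueError(f"unknown event kind: {ev.kind}")


# A two-step "recording": fill in a search box, then submit.
trace = [
    Event("type", "#search", "programming tools"),
    Event("click", "#submit"),
]
page = FakePage()
replay(trace, page)
```

The trace has no loops or branches, which is what makes the replayed program straight-line.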
Rousillon is a PBD tool for writing Helena programs that automate large-scale web data collection tasks. It is implemented as a Chrome extension. Rousillon lets users demonstrate how to collect the first row of a multi-relational dataset, then generalizes the straight-line interaction into a program for collecting all rows.
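To make the generalization step concrete, here is a minimal sketch. It is not Rousillon's synthesis algorithm: it only contrasts what a first-row demonstration covers with what the generalized program covers, using one loop per relation. The `site` dictionary, its placeholder contents, and both function names are hypothetical.

```python
# Toy "website": each author (outer relation) has a list of papers (inner relation).
site = {
    "author A": ["paper 1"],
    "author B": ["paper 2", "paper 3"],
}


def first_row(site):
    """What the user demonstrates: the first author and that author's first paper."""
    author = next(iter(site))
    return (author, site[author][0])


def all_rows(site):
    """The generalized program: the same extraction, wrapped in nested loops."""
    rows = []
    for author, papers in site.items():  # loop over the outer relation
        for paper in papers:             # loop over the inner relation
            rows.append((author, paper))
    return rows


rows = all_rows(site)
```

The demonstrated row is the first row the generalized program produces; the loops supply the rest.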
DIYDA (the DIY Digital Assistant) is a tool for adding custom 'skills' to a voice assistant.
To add a new skill, users (i) provide the text of the question they'll ask, (ii) demonstrate how to find the answer on the web, and (iii) tell DIYDA what to say aloud in response.
Based on the user's demonstration, DIYDA writes a Helena program to automate the web interaction.
This project is still in development, but check back for updates!
End-User Programming for Other Domains
Dapper synthesizes Probabilistic Programming Language (PPL) models from input datasets.
The goal is to put small, readable PPL models in the hands of social scientists and data scientists, so that they have access to the powerful abstractions PPLs offer for inference and simulated interventions.
Dapper currently produces output programs in the BLOG language, but its intermediate representation (IR) makes it easy to retarget to other PPLs.
Driver uses Syntax-Guided Synthesis (SyGuS) to synthesize reactive robot motion planners.
As input, it takes a simple description of the environment, the target, the obstacles, how the obstacles can move, and how the robot itself can move.
As output, it produces a motion planner that can react to an adversarial environment.
Usable Parallelization
Dicer is a framework for parallelizing large web automation experiments.
As more tools consume webpages as inputs, the need for controlled experiments on those tools grows.
Dicer facilitates such experiments by (i) using a custom caching proxy server to hold webpage inputs constant during a programmer-defined session and (ii) offering a simple programming model that allows Dicer to automatically and transparently parallelize experiments.
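The two ideas can be sketched together in a few lines. This is an illustration under stated assumptions, not Dicer's implementation: an in-memory `SessionCache` stands in for the caching proxy server, `changing_fetch` simulates a web whose pages differ on every request, and `run_experiment` is a hypothetical programming model that fans jobs out to a thread pool.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class SessionCache:
    """Serve the first response seen for each URL for the rest of the session."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._pages = {}
        self._lock = threading.Lock()

    def get(self, url):
        # The lock ensures that concurrent workers never fetch a URL twice.
        with self._lock:
            if url not in self._pages:
                self._pages[url] = self._fetch(url)
            return self._pages[url]


_versions = itertools.count()


def changing_fetch(url):
    """Simulated live web: the content changes on every request."""
    return f"{url} v{next(_versions)}"


def run_experiment(tools, urls, cache, workers=4):
    """Run every tool on every URL in parallel, reading pages through the cache."""
    jobs = [(name, tool, url) for name, tool in tools.items() for url in urls]

    def run(job):
        name, tool, url = job
        return (name, url, tool(cache.get(url)))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, jobs))


# Two "tools" that simply report the page they were given.
tools = {"first": lambda page: page, "second": lambda page: page}
results = run_experiment(tools, ["a", "b"], SessionCache(changing_fetch))
by_key = {(name, url): out for name, url, out in results}
```

Because both tools read through the same session cache, they see identical page contents even though the simulated web changes between requests.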
The Parallel Skip Block, a new language construct introduced in Helena, offers end users a way to parallelize their web automation scripts by answering questions they already understand.
In particular, users indicate how they decide whether two online objects (e.g., two authors or two restaurants) represent the same entity.
People who identify as non-programmers can use this construct to parallelize their programs in 61 seconds, on average.
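One way to picture how an entity-equality answer can drive parallelization is sketched below. It is an assumption-laden illustration, not Helena's implementation or syntax: the user's answer is modeled as a list of key fields, and each worker claims an entity by its key before processing it, skipping any entity that another worker has already claimed. The `SkipBlock` class, `worker` function, and sample records are all hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SkipBlock:
    """Track which entities have been claimed, keyed by user-chosen fields."""

    def __init__(self, key_fields):
        self._key_fields = key_fields
        self._claimed = set()
        self._lock = threading.Lock()

    def try_claim(self, entity):
        """Return True for the first caller to see this entity, False afterwards."""
        key = tuple(entity[f] for f in self._key_fields)
        with self._lock:
            if key in self._claimed:
                return False
            self._claimed.add(key)
            return True


def worker(entities, block, process):
    """Walk the full list, processing only the entities this worker claims."""
    return [process(e) for e in entities if block.try_claim(e)]


restaurants = [
    {"name": "Cafe A", "address": "1 Main St", "page": 1},
    {"name": "Cafe B", "address": "2 Oak Ave", "page": 1},
    {"name": "Cafe A", "address": "1 Main St", "page": 2},  # same entity, seen again
]

# The user's answer: two restaurants are the same if name and address match.
block = SkipBlock(["name", "address"])
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [
        pool.submit(worker, restaurants, block, lambda e: e["name"])
        for _ in range(2)
    ]
    processed = sorted(name for f in futures for name in f.result())
```

Both workers run the same script over the same list, yet each distinct restaurant is processed exactly once across them.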
Service
POPL 2025 (Program Committee)
PLDI 2024 (Program Committee)
PLATEAU 2024 (Co-Organizer)
PLATEAU 2023 (Co-Organizer)
WICSE 2023 (Faculty Mentor)
PL+HCI “Swimmer” School 2022 (Co-Organizer)
PLDI 2022 (Program Committee)
PLATEAU 2021 (Co-Organizer)
SRC Grand Finals 2021 (Reviewing)
OOPSLA 2021 (Review Committee)
PLDI 2021 (SRC Co-Chair)
ASPLOS 2021 (Program Committee)
PL+HCI “Swimmer” School 2020 (Co-Organizer)
IPA 2020 (Program Committee)
HATRA 2020 (Review Committee)
PLATEAU 2020 (Co-Organizer)
OOPSLA 2020 (Review Committee)
PLDI 2020 (SRC Co-Chair)
PLDI 2020 (External Review Committee)
MAPL 2020 (Program Committee)
PLATEAU 2019 (Co-Organizer)
SPLASH 2018 (SRC Reviewer and SRC Judge)
PLATEAU 2018 (Co-Organizer)
SNAPL 2017 (Submissions Chair)
PLATEAU 2017 (Program Committee)
PLATEAU 2017 (Co-Organizer)