Big Data for Development and Governance

From October 20, 2022 to October 21, 2022

Organized by DevLab@Penn/PDRI and co-sponsored by Perry World House, USAID’s Center for Democracy, Human Rights and Governance, USAID’s Innovation, Technology, and Research (ITR) Hub, and Penn SAS’ Data Driven Discovery Initiative, this event will convene a group of academics and practitioners to discuss the current state of big data and machine learning in the development sector. While the earlier conference provided a broad overview of topics, the goal of this conference is to dig more deeply into four issues of crucial importance to both the research and practitioner communities:

  • The use of computational tools to detect corruption and provide a tool for accountability
  • The use of big data from media to measure civic activity, political polarization and disinformation
  • The use of large-scale, micro-level data and computational tools to study the causes, dynamics, and consequences of conflict
  • The use of social media data to monitor and respond to humanitarian crises

RSVP by Oct 17 by emailing Diego Romero.


Thursday, October 20th
Location: Auditorium in Perelman Center for Political Science & Economics

1:40 p.m. – 1:45 p.m.: Welcome | Erik Wibbels, DevLab@Penn & University of Pennsylvania

1:45 p.m. – 2:45 p.m.: WORKSHOP 1: Training BERT (and BERT-like) Models for Text Classification | Machine Learning for Peace Team

Advances in natural language processing have greatly improved the capacity of researchers to treat text (e.g., the content of laws, newspaper articles, or tweets) as data. However, applied research in the social sciences still often relies on dated techniques (e.g., dictionaries, keyword searches, or LDA models) to classify text. In this workshop, the presenter will give a practical overview of using pre-trained Transformer models à la BERT for development-related text classification.

2:45 p.m. – 3:00 p.m.: Coffee Break

3:00 p.m. – 4:00 p.m.: WORKSHOP 2: Gathering and Analyzing Social Media Data | Tiago Ventura, NYU/Georgetown

This session provides an overview of the ways in which social media data can be gathered, analyzed and used for research on development and governance. This will be a practical session, walking participants through data sources and beginner code to understand basic characteristics of the data.

4:00 p.m. – 4:15 p.m.: Coffee Break

4:15 p.m. – 5:15 p.m.: WORKSHOP 3: Running Digital Experiments | Ernesto Calvo (iLCSS & University of Maryland) 

The presenter will introduce participants to the implementation of digital experiments for research on development and governance. The presenter will walk participants through the process of implementing a series of typical experiments bearing on attention, disinformation, information sharing, etc. and the analysis of the subsequent data.

6:00 p.m.: Dinner for Conference Participants

Friday, October 21st

Location: The Forum in Perelman Center for Political Science & Economics

8:30 a.m. – 9:00 a.m.: Breakfast available

9:00 a.m. – 10:30 a.m.: SESSION 1: Big Data, Machine Learning and Corruption | Academic Discussant: Mihály Fazekas (Central European University), Policy Discussant: Camila Salazar (Open Contracting Partnership)

The papers in this panel contain research related to the application of machine learning methods to the measurement and study of corruption. Topics include predicting corruption and collusion in public procurement and identifying networks of corruption.


  • Vítězslav Titl (Utrecht University) 
  • Diego Romero (University of Pennsylvania)
  • Jorge Gallego (Universidad del Rosario, Colombia) 
  • Massimo Mastruzzi (World Bank)


10:30 a.m. – 10:45 a.m.: Coffee Break

10:45 a.m. – 12:15 p.m.: SESSION 2: Big Data and Conflict | Academic Discussant: Nicholas Sambanis (University of Pennsylvania), Policy Discussant: Rebecca Wolfe (University of Chicago)

The papers in this panel contain research related to the application of machine learning methods to large-scale, micro-level conflict data.


  • Yuri Zhukov (University of Michigan) 
  • Zachary C. Steinert-Threlkeld (UCLA) 
  • Austin Wright (University of Chicago) 
  • Tamar Mitts (Columbia University) 


12:15 p.m. – 1:30 p.m.: Informal moderated lunch discussion: Big data at the interface between the academic and policy worlds

  • Rebecca Wolfe (University of Chicago)
  • Andrew Shaver (UC Merced)
  • Amanda Blair (USIP)
  • Joshua Tucker (New York University)


1:30 p.m. – 3:00 p.m.: SESSION 3: Big Data and Humanitarian Crises | Academic Discussant: Andrew Shaver (UC Merced), Policy Discussant: Theresa Beltramo (UNHCR)

The papers in this panel contain research on the application of big data to monitoring and understanding humanitarian crises.


  • Thiri Shwesin Aung (Harvard University)
  • Emily Aiken (University of California, Berkeley) 
  • Katherine Hoffmann Pham (UN Global Pulse) 
  • Ernesto Calvo (iLCSS & University of Maryland), on behalf of the #Data4Ukraine research team

3:00 p.m. – 3:30 p.m.: Coffee Break

3:30 p.m. – 5:00 p.m.: SESSION 4: Panel on Civic Space, Polarization and Misinformation | Academic Discussant: Alexandra Siegel (University of Colorado), Policy Discussant: Olivia Sohr (Chequeado)

The papers in this panel contain research related to misinformation, polarization and civic space-relevant events based on the use of big data (e.g., social media, large amounts of newspaper articles).


  • Tiago Ventura (NYU/Georgetown University)
  • Josephine Lukito (University of Texas – Austin)
  • Erik Wibbels/Machine Learning for Peace (University of Pennsylvania)
  • Joshua Tucker (New York University)