Data Science for Social Good

In 2022, the Data Driven Discovery initiative launched its initial Data Science for Social Good (DSSG) projects. Our currently active DSSG projects are listed below.

Leveraging Large Scale Human Mobility Data for Epidemic Modeling

Prof. Hamed Hassani, Department of Electrical and Systems Engineering and Department of Math, is working to validate large-scale (several TBs per year) GPS mobility datasets so that realistic epidemic networks can be constructed from sparse and incomplete data. A primary challenge in working with such data is that 50% of devices are tracked for less than 35% of the days. Addressing this challenge would result in tractable methods that can infer the true state of an epidemic.

Why the poor are more depressed and differently depressed

Prof. Martha Farah, Department of Psychology, and Prof. Lyle Ungar, Department of Computer and Information Science, are studying why depression associated with low socioeconomic status (SES) differs from general depression in terms of neuroanatomical symptoms and causes. These differences mean that established risk factors, prognoses, and treatments may not carry over to people with low SES. To explore this question, the research team will use UK Biobank data from 500,000 participants, with psychiatric screening questionnaires, genomic data, and other markers, including 40,000 MRI scans.

Linking climate with poverty and social instability: a hotspot mapping project

Prof. Irina Marinov, Department of Earth and Environmental Science, and Prof. Michael Weisberg, Department of Philosophy, will be exploring where hot spots of climate change will be, how their distribution will change between now and 2100, how global poverty and inequality will change in response to expected climate changes, and whether indicators have predictive power in determining socioeconomic and political outcomes. The project involves the newest generation of climate models, with high spatial and temporal resolution and more sophisticated representations of Earth System components. The petabyte-scale datasets require substantial computation time, and the analysis relies on Python on the new Pangeo platform for big-data geoscience in the cloud.

Machine Learning and Police Reform

Prof. Hanming Fang, Department of Economics, and Prof. David Abrams, Penn Law, will be using policing data from Chicago and Philadelphia to improve our understanding of what causes police confrontations and escalations. The project will use machine learning methods to explore whether the characteristics of police partners affect the likelihood that an incident escalates. The project will also assess whether important external factors, such as economic conditions, time of day, calendar effects, or weather, influence the likelihood of negative incidents.

The Immigration Courts: Processing and Analyzing Data from The Executive Office for Immigration Review

Prof. Emilio Parrado, Department of Sociology, will be using data science methods to organize and explore data from the Executive Office for Immigration Review (EOIR). Immigration cases now represent close to 52 percent of all federal criminal prosecutions, making prosecutions for illegal entry, illegal reentry, and other immigration violations the major focus of federal criminal enforcement efforts. EOIR is required to make the case data available for public inspection, but the raw data is far from usable. This project will extract key information from close to 10 million court proceedings going back to 1951. The project will explore factors affecting court decisions, such as removals resulting from illegal entries and asylum petitions.

Black Representation in and of International Governance: Evidence from a Text-As-Data Approach to African American Newspapers

Prof. Julia Gray, Department of Political Science, will conduct a pioneering study on coverage of international affairs in black newspapers in the US. These include Pennsylvania papers such as the Christian Recorder (the oldest continually published black paper in the US); the Philadelphia Independent (which ceased publication in 1971); the Philadelphia Tribune; and the Pittsburgh Courier (defunct in 1966), as well as prominent mid-Atlantic papers such as the Baltimore Afro-American, the Washington Informer, and the Washington Sun. The project uses text-as-data methodologies, including Named Entity Recognition, Structural Topic Modeling, and sentiment analysis, to quantify and analyze coverage of international issues in black communities in the US, particularly surrounding the creation of the League of Nations and the formation and development of the United Nations. Black newspapers are a rich, although surprisingly understudied, source of information about historical discourse on international racial issues and racial representation in the 20th century.