Instructors: If you have a data science course that is not listed here, please contact us.


We have listed here a selection of data science courses offered in Penn’s School of Arts and Sciences. Please note that these courses are not necessarily taught every semester or every year, and new data science courses are regularly being developed. A complete list of courses offered this semester is available via the course roster. Penn’s Computer Science and Statistics departments, as well as others in SEAS and Wharton, offer data science courses that SAS students are encouraged to consider.

Information about the Data Driven Discovery Initiative’s new Data Science and Analytics Minor is available via this link, along with the curriculum for the minor.

BIOL 4536/BIOL 5535/CIS 4360: Introduction to Computational Biology & Biological Modeling

The goal of this course is to develop a deeper understanding of techniques and concepts used in Computational Biology. The course will strive to focus on a small set of approaches to gain both theoretical and practical understanding of the methods. We will aim to cover practical issues such as programming and the use of programs, as well as theoretical issues such as algorithm design, statistical data analysis, theory of algorithms and statistics. This course WILL NOT provide a broad survey of the field nor teach specific tools but focus on a deep understanding of a small set of topics. We will discuss string algorithms, hidden Markov models, dimension reduction, and machine learning (or phylogeny estimation) for biomedical problems.

Prerequisites: MATH 1400 AND (BIOL 2510 OR BIOL 5510)

COGS 4290/PSYC 4290: Big Data, Memory and the Human Brain

Advances in brain recording methods over the last decade have generated vastly more brain data than had been collected by neuroscientists during the previous century. To understand the human brain, scientists must now use computational methods that exploit the power of these huge data sets. This course will introduce you to the use of big data analytics in the study of human memory. Through hands-on Python-based programming projects, we will analyze very large data sets both to replicate existing phenomena and to make new discoveries.

Prerequisites: CIS 1100 AND MATH 1400

COMM 3130: Computational Text Analysis for Communication Research

In this ‘big data’ era, presidents and popes tweet daily. Anyone can broadcast their thoughts and experiences through social media. Speeches, debates and events are recorded in online text archives. The resulting explosion of available textual data means that journalists and marketers summarize ideas and events by visualizing the results of textual analysis (the ubiquitous ‘word cloud’ just scratches the surface of what is possible). Automated text analysis reveals similarities and differences between groups of people and ideological positions. In this hands-on course students will learn how to manage large textual datasets (e.g. Twitter, YouTube, news stories) to investigate research questions. They will work through a series of steps to collect, organize, analyze and present textual data by using automated tools toward a final project of relevant interest. The course will cover linguistic theory and techniques that can be applied to textual data (particularly from the fields of corpus linguistics and natural language processing). No prior programming experience is required. Through this course students will gain skills writing Python programs to handle large amounts of textual data and become familiar with one of the key techniques used in data science, currently one of the most in-demand professions.

COMM 3180: Stories From Data: Introduction to Programming for Data Journalism

Today masses of data are available everywhere, capturing information on just about everything and anything. Related but distinct data streams about newsworthy events and issues — including activity from social media and open data sources (e.g., The Open Government Initiative) — have given rise to a new source for and style of reporting sometimes called Data Journalism. Increasingly, news sites and information portals present visually engaging, dynamic, and interactive stories linked to the underlying data (e.g., The Guardian DataBlog). This course offers an introduction to Python programming for data analysis and visualization. Students will learn how to collect, analyze, and present various forms of data. Because numbers and their visualizations do not speak for themselves but require context, interpretation, and narrative, students will practice making effective stories from data and presenting them in blogs and other formats. No programming experience is required for this class.

COMM 6840: Data Visualization for Research

Empirical research employs data to gain insights and build a theoretical understanding of the world. An appropriate visualization of data is key to illuminating hidden patterns and effectively communicating the main findings of research. This course will discuss the visualization strategies of published research, give recommendations of best practice, and discuss tips and techniques for specific research purposes (e.g. hypothesis testing, group comparison) and data structures, including temporal, geographic, and network data. The course will equip students with tools they can use to learn through visualization and to communicate their own research more effectively.

CRIM 4002/CRIM 6002: Criminal Justice Data Analytics

This course covers the tools and techniques necessary to acquire, organize, and visualize complex data in order to answer questions with a primary focus on crime and the criminal justice system. The course is organized around key questions about police shootings, victimization rates, identifying crime hotspots, calculating the cost of crime, and finding out what happens to crime when it rains. On the way to answering these questions, the course will cover topics including data sources, basic programming techniques, building and working with SQL databases, regular expressions, web scraping, and working with geographic data and geocoding. The course will use R, an open-source, object-oriented scripting language with a large set of available add-on packages.

ECON 4330: Econometric Machine Learning Methods and Models

This course covers econometric methods, machine learning methods, and their interface, focusing on aspects of estimation, inference, and prediction in causal and non-causal environments. Topics may include Bayesian learning; recursive estimation and optimal filtering; randomized controlled trials and their approximation; latent variables; classification; topic analysis; LDA models; neural networks; random forests; regularization (shrinkage, selection, …); network estimation and description.

ENGL 1670: Data Science for the Humanities

Over the last decade, humanists have turned to data and to computational methods of data analysis to seek new understandings of literature, history, and culture. This course will provide you with a practical introduction to data-driven inquiry in the humanities, with a focus on statistical analysis in the Python programming language. (No prior knowledge of programming is required or expected.) In addition to learning foundational scripting and data science skills, we will ask questions about the role of data in the humanities. How does humanities data differ from data in the physical and social sciences? What new research questions in the humanities can we investigate using data-driven methods? And how can we make our conclusions relevant within the larger frame of humanistic inquiry? Course work will include readings, weekly programming exercises, and a final project.

GAFL 5310: Data Science for Public Policy

In the 21st century, Big Data surround us. Data are being collected about all aspects of our daily lives. To improve transparency and accountability, an increasing number of public organizations are sharing their data with the public. But data are not information. You need good information to make sound decisions. To be an effective public leader, you will need to learn how to harness information from available data. This course will introduce you to key elements of data science, including data transformation, analysis, visualization, and presentation. An emphasis is placed on manipulating data to create informative and compelling analyses that provide valuable evidence in public policy debates. We will teach you how to present information using interactive apps that feature software packages. As in all courses at Fels, we will concentrate more on practical skills than on the theoretical concepts behind the techniques. This course is designed to expand upon core concepts in data management and analysis that you learned in GAFL 640: Program Evaluation and Data Analysis. This is a graduate-level course and, while GAFL 640 is not a prerequisite, students are expected to have a foundation in data management and analysis before beginning this course.

LING 0700/PSYC 2314: Data Science for Studying Language and the Mind

Data Science for Studying Language and the Mind is an entry-level course designed to teach basic principles of data science to students with little or no background in statistics or computer science. Students will learn to identify patterns in data using visualizations and descriptive statistics; make predictions from data using machine learning and optimization; and quantify the certainty of their predictions using statistical models. This course aims to help students build a foundation of critical thinking and computational skills that will allow them to work with data in all fields related to the study of the mind (e.g. linguistics, psychology, philosophy, cognitive science).

PHYS 3358: Data Analysis for the Natural Sciences I: Fundamentals

This is a course on the fundamentals of data analysis and statistical inference for the natural sciences. Topics include probability distributions, linear and non-linear regression, Monte Carlo methods, frequentist and Bayesian data analysis, parameter and error estimation, Fourier analysis, power spectra, and signal and image analysis techniques. Students will obtain the theoretical background in data analysis and get hands-on experience analyzing real scientific data.

Prerequisites: Prior programming experience, MATH 2400 AND PHYS 2260

PHYS 3359: Data Analysis for the Natural Sciences II: Machine Learning

This is a course on data analysis and statistical inference for the natural sciences focused on machine learning techniques. The main topics are a review of modern statistics, including probability distribution functions and their moments, conditional distributions and Bayes’ theorem, parameter estimation, and Markov chains; and the fundamentals of machine learning, including training/validation samples, cross-validation, supervised vs. unsupervised learning, regularization and resampling methods, tree-based methods, support vector machines, neural networks, deep learning, and image analysis with convolutional neural networks. Students will obtain the theoretical background in data analysis and get hands-on experience analyzing real scientific data. This course forms a two-course sequence with PHYS 3358.

Prerequisites: Prior programming experience in Python, PHYS 3358

PSCI 1800: Introduction to Data Science

Understanding and interpreting large, quantitative data sets is increasingly central in political and social science. Whether one seeks to understand political communication, international trade, inter-group conflict, or other issues, the availability of large quantities of digital data has revolutionized the study of politics. Nonetheless, most data-related courses focus on statistical estimation, rather than on the related but distinctive problems of data acquisition, management and visualization (in a term, data science). This course addresses that imbalance by focusing squarely on data science. Leaving this course, students will be able to acquire, format, analyze, and visualize various types of political data using the statistical programming language R. This course is not a statistics class, but it will increase the capacity of students to thrive in future statistics classes. While no background in statistics or political science is required, students are expected to be generally familiar with contemporary computing environments (e.g. know how to use a computer) and have a willingness to learn a variety of data science tools. You are encouraged (but certainly not required) to register for both this course and PSCI 1801 at the same time, as the courses cover distinct, but complementary material.

PSCI 3800: Applied Data Science

Jobs in data science are quickly proliferating throughout nearly every industry in the American economy. The purpose of this class is to build the statistics, programming, and qualitative skills that are required to excel in data science. The substantive focus of the class will largely be on topics related to politics and elections, although the technical skills can be applied to any subject matter.

Prerequisites: PSCI 1800 OR PSCI 1801

SOCI 6050: Public-Use Data for Social Science Research

Public-use data are quantitative information obtained from surveys and other databases that are available for anyone to use at no cost. This course prepares students to work with public-use data to address social science research questions. Participants will become familiar with the origins, purpose, design, structure, and limitations of US and international public-use data to study individuals, families, neighborhoods, and institutions such as schools and state and national governments; acquire skills to design analytic samples and manage data for reproducibility and replicability; and apply a variety of quantitative methods to public-use data to answer illustrative research questions.