Instructors: If you have a data science course that is not listed here, please contact us.


Penn’s School of Arts and Sciences is in the process of developing comprehensive data science course offerings. These may constitute a data science minor that provides a foundation in the methodology of data science coupled with disciplinary applications that students can select in accordance with their interests. The minor will consolidate existing courses and add new ones that provide a complete education in data science.

We have listed here a selection of SAS courses in data science. These courses are not necessarily taught every semester or every year. New data science courses are regularly being added to the course roster. A complete list of courses available this semester is viewable in the course roster. Penn’s Computer Science and Statistics departments, as well as others in SEAS and Wharton, offer data science courses that SAS students are encouraged to consider.


CRIM402/CRIM602/SOCI605: Data Analytics in R

This course covers the tools and techniques necessary to acquire, organize, and visualize complex data in order to answer questions with a primary focus on crime and the criminal justice system. The course is organized around key questions about police shootings, victimization rates, identifying crime hotspots, calculating the cost of crime, and finding out what happens to crime when it rains. On the way to answering these questions, the course will cover topics including data sources, basic programming techniques, building and working with SQL databases, regular expressions, web scraping, and working with geographic data and geocoding. The course will use R, an open-source, object-oriented scripting language with a large set of available add-on packages.

ECON712: Computational Methods in Economics

This course will introduce some of the essential tools to undertake research in computational economics. Among other topics, this course covers: elementary concepts in software engineering (scientific programming languages, programming paradigms, code design and implementation), basic software tools (version control software, compilers, IDEs, debuggers, profilers, parallelization), introduction to large data sets (databases and web scraping), concepts of numerical analysis (differentiation, integration, optimization), introduction to machine learning (neural networks and deep learning), and projection and perturbation methods. This is an intensive graduate course that covers a lot of material in 15 weeks. Students should be ready to spend time on all the coding homework and the replication project.

ENGL098: Data Science for the Humanities

Over the last decade, humanists have turned to data and to computational methods of data analysis to seek new understandings of literature, history, and culture. This course will provide you with a practical introduction to data-driven inquiry in the humanities, with a focus on statistical analysis in the Python programming language. (No prior knowledge of programming is required or expected.) In addition to learning foundational scripting and data science skills, we will ask questions about the role of data in the humanities. How does humanities data differ from data in the physical and social sciences? What new research questions in the humanities can we investigate using data-driven methods? And how can we make our conclusions relevant within the larger frame of humanistic inquiry? Course work will include readings, weekly programming exercises, and a final project.

LING172/PSYC215: Data Science for Studying Language and the Mind

Data Science for Language & Mind is an entry-level course designed to teach basic principles of data science to students with little or no background in statistics or computer science. Students will learn to identify patterns in data using visualizations and descriptive statistics; make predictions from data using machine learning and optimization; and quantify the certainty of their predictions using statistical models. This course aims to help students build a foundation of critical thinking and computational skills that will allow them to work with data in all fields related to the study of the mind (e.g. linguistics, psychology, philosophy, cognitive science, neuroscience).


PHYS358/PHYS359: Statistics, Data Mining, Machine Learning

This is a two-semester sequence of courses on numerical methods, statistics, and data analysis techniques with particular emphasis on data mining and machine learning applied to large datasets. Topics include basic numerical methods and algorithms, probability theory, classical and Bayesian statistical inference, model fitting, Monte Carlo methods, and classification. A number of machine learning approaches are introduced, including neural networks. Assignments will involve applications to different areas of the natural sciences as well as business and medicine. We will be using Python for programming exercises. Working knowledge of calculus and prior experience in programming (in any language) are required.

PSCI107: Introduction to Data Science

Understanding and interpreting large, quantitative data sets is increasingly central in political and social science. Whether one seeks to understand political communication, international trade, inter-group conflict, or a host of other issues, the availability of large quantities of digital data has revolutionized the study of politics. Nonetheless, most data-related courses focus on statistical estimation, rather than on the related but distinctive problems of data acquisition, management, and visualization – in a term, data science. This course seeks to address that imbalance by focusing squarely on the tools of data science. Leaving this course, students will be able to acquire, format, analyze, and visualize various types of political data using the statistical programming language R. This course is not a statistics class, but it will increase the capacity of students to thrive in future statistics classes. Entering this class, students are expected to have a basic familiarity with computation. While no background in statistics or political science is required, students are expected to be generally familiar with contemporary computing environments (e.g. know how to use a computer) and have a willingness to learn a wide variety of data science tools.

PSCI207: Applied Data Science

Jobs in data science are quickly proliferating throughout nearly every industry in the American economy. The purpose of this class is to build the statistics, programming, and qualitative skills that are required to excel in data science. The substantive focus of the class will largely be on topics related to politics and elections, although the technical skills can be applied to any subject matter.