This class will examine methods and practices for text analysis. Freely available tools and excellent tutorials have made it easier to apply computational text analysis techniques, but researchers may still find themselves struggling to build a corpus, decide between methods, and interpret results. We will survey the hows and whys of a variety of commonly used methods, including word counting, topic modeling, and natural language processing techniques.
Students who take this course will be able to:
Find and prepare texts for analysis.
Store, access, and document their text objects and data.
Discuss their corpus-building decisions and textual data in ways that are methodologically and disciplinarily sound.
Identify appropriate text analysis methods for a given question.
Engage in text analysis methods that use word frequency, word location, and natural language processing.
Articulate statistical, computational, and linguistic principles — and how they intersect with humanistic approaches to texts — for a few text analysis methods.
We will use a mixture of free tools and scripts in R and Python (you don’t need to know R or Python to take the class). We will primarily work together from shared data sets that the instructors will provide. This course is appropriate for people at all levels of technical expertise. Students should have administrative rights to install software on their laptops.
Representing the libraries and working with Price Lab stakeholders, Dr. Scott Enderle collaborates with Penn faculty, staff, students, librarians, and the international community to facilitate the creation, use, and re-use of data in both instructional and research environments. He provides expert advisory services for the creation of digital textual materials, for the management of digital textual projects, and for the form of online publications in order to create effective, innovative, sustainable digital products in support of Penn’s mission. Before coming to Penn, Scott was Visiting Assistant Professor of English at Skidmore College, where he taught courses and mentored research on Digital Humanities and 18th-century literature. In addition, Scott has more than 10 years of programming experience. Some of his research involves interpreting the networks produced by topic modeling as well as text mining, archival research, and data visualization. Scott holds a B.A. in English Literature from Texas A&M and an M.A. and a Ph.D. in English Literature from Penn. Scott was the 2010-11 recipient of the Brizdle-Schoenberg Fellowship in the History of Material Texts at Penn Libraries.
Katie Rawson is Director of Library Services and Operations at the University of Pennsylvania’s Annenberg School for Communication. She previously worked as the Director of Learning Innovation and the Coordinator for Digital Research at the Penn Libraries and the English librarian at Emory University. She has a PhD from Emory’s Institute for the Liberal Arts. With Franky Abbott and Sarah Melton (among others), she published Southern Spaces; with Molly Des Jardin, she founded Word Lab; with Trevor Muñoz, she developed Curating Menus; with Elliott Shore, she wrote Dining Out: A Global History of Restaurants. She has taught computational text analysis workshops in various venues, including HILT, for the past nine years. Whether studying data models or short-order cooks, her research focuses on ways of knowing.