Text Analysis

This class will examine methods and practices for text analysis. Freely available tools and excellent tutorials have made it easier to apply computational text analysis techniques, but researchers may still find themselves struggling to build a corpus, decide between methods, and interpret results. We will survey the hows and whys of a variety of commonly used methods, including word counting, topic modeling, and natural language processing techniques.

Students who take this course will be able to:

  • Find and prepare texts for analysis.
  • Store, access, and document their text objects and data.
  • Discuss their corpus-building decisions and textual data in ways that are methodologically and disciplinarily sound.
  • Identify appropriate text analysis methods for a given question.
  • Engage in text analysis methods that use word frequency, word location, and natural language processing.
  • Articulate statistical, computational, and linguistic principles — and how they intersect with humanistic approaches to texts — for a few text analysis methods.

We will use a mixture of free tools and scripts in R and Python (you don’t need to know R or Python to take the class). We will primarily work together from shared data sets the instructors will provide. This course is appropriate for people at all levels of technical expertise. Students should have administrative rights to install software on their laptops.

Instructor: Scott Enderle, Penn

About the Instructor: Representing the libraries and working with Price Lab stakeholders, Scott Enderle collaborates with Penn faculty, staff, students, librarians, and the international community to facilitate the creation, use, and re-use of data in both instructional and research environments. He provides expert advisory services for the creation of digital textual materials, for the management of digital textual projects, and for the form of online publications in order to create effective, innovative, sustainable digital products in support of Penn’s mission. Before coming to Penn, Scott was Visiting Assistant Professor of English at Skidmore College, where he taught courses and mentored research on Digital Humanities and 18th-century literature. In addition, Scott has more than 10 years of programming experience. His research includes interpreting the networks produced by topic modeling, as well as text mining, archival research, and data visualization. Scott holds a B.A. in English Literature from Texas A&M and an M.A. and a Ph.D. in English Literature from Penn. He was the 2010-11 recipient of the Brizdle-Schoenberg Fellowship in the History of Material Texts at Penn Libraries.
