This class will examine methods and practices for text analysis. Freely available tools and excellent tutorials have made it easier to apply computational text analysis techniques, but researchers may still find themselves struggling to build a corpus, decide between methods, and interpret results. We will survey the hows and whys of a variety of commonly used methods, including word counting, topic modeling, and natural language processing techniques.
Students who take this course will be able to:
- Find and prepare texts for analysis.
- Store, access, and document their text objects and data.
- Discuss their corpus-building decisions and textual data in ways that are methodologically and disciplinarily sound.
- Identify appropriate text analysis methods for a given question.
- Engage in text analysis methods that use word frequency, word location, and natural language processing.
- Articulate statistical, computational, and linguistic principles — and how they intersect with humanistic approaches to texts — for a few text analysis methods.
We will use a mixture of free tools and scripts in R and Python (you don’t need to know R or Python to take the class). We will primarily work together from shared data sets the instructors will provide. This course will be appropriate for people at all levels of technical expertise. Students should have administrative rights to load software on their laptop.
Nichole Nomura is a PhD candidate in the Stanford University Department of English and a graduate of Stanford’s Graduate School of Education (M.A.). She studies how science fiction teaches and is taught, using methods from the digital humanities, literary criticism, and education. A member of Stanford’s Literary Lab, she has worked on projects including Microgenres, Voice, and the Young Readers Database of Literature.
J.D. Porter is a DH Project Specialist at the Price Lab. He received his PhD in English from Stanford University in 2017 and then worked as the Associate/Technical Director at the Stanford Literary Lab. He specializes in text mining, American modernism, and race and ethnicity theory, with occasional forays into jazz studies, network analysis, and ordinary language philosophy. His work has appeared in (or is slated to appear in) Episteme, Cultural Analytics, the pamphlet series of the Stanford Literary Lab, and the edited volumes Ralph Ellison in Context and The Cambridge Companion to the American Short Story.