Many tools and tutorials promise to help you clean up your messy data, an essential step before any kind of network, text, spatial, or quantitative analysis or visualization. But how do we even figure out what “clean” means when it comes to complex humanities knowledge, especially when we may not yet know what kind of analysis we eventually want to do? Participants will come out of this class understanding how to create a data plan that captures the parts of their sources that matter for their research questions, handles complex relationships and uncertainty, and formats that information into tidy data that can then be reshaped as needed to drive databases, websites, analyses, and visualizations. We will also cover the practical side of using software such as Google Sheets, OpenRefine, and Palladio to collect and tidy humanistic data and to make exploratory visualizations of it. Additionally, we’ll learn about using Linked Open Data to interconnect our research databases with objects, documents, and authority lists maintained by institutions such as archives, libraries, and museums, focusing on pragmatic steps that real-life researchers can take to get the most out of connecting their newly created knowledge (and the data that come with it) back into the larger ecosystem on which we all depend. This course assumes no prior knowledge of databases or coding, and will use freely available open-source tools. We will work with some sample data sets over the course of the week, but participants are encouraged to bring their own data, or sources that they hope to transform into data, to group “data therapy” sessions, where they can apply each day’s lessons to their own research.
Dr. Matthew Lincoln is a senior software engineer for text and data mining at JSTOR Labs. He earned his PhD in Art History at the University of Maryland, College Park, and has previously held positions at the National Gallery of Art, the Getty Research Institute, and Carnegie Mellon University Libraries, and served as technical lead of The Programming Historian. He is a co-project director of the NEH Digital Humanities Advancement Grant Freedom and the Press Before Freedom of the Press: Tools, Data, and Methods for Researching Secret Printing. His recent publications include The Index of Digital Humanities Abstracts and “Tangled Metaphors: Network Thinking and Network Analysis in the History of Art,” in The Routledge Companion to Digital Humanities and Art History.