Simone Rebora, Giovanni Pietro Vitali: Distant Reading in R. Analyse the Text & Visualize the Data

SIMONE REBORA holds a PhD in Foreign Literatures and Literary Studies (University of Verona) and a BSc in Electronic Engineering (Polytechnic University of Torino). He worked as a research fellow at the Universities of Göttingen, Basel, and Bielefeld. Between 2020 and 2022 he was assistant coordinator of the European Network ELIT (Empirical Study of Literature Training Network). Currently, he works as a postdoc at the University of Mainz and teaches comparative literature and computational thinking at the University of Verona. His main research interests are theory and history of literary historiography, reader response studies, and computational literary studies. His essays have appeared in journals such as “PLOS ONE”, “Digital Scholarship in the Humanities”, and “Modern Language Notes”. In Italian, he published the monographs Claudio Magris (2015) and History/Histoire and Digital Humanities (2018).

GIOVANNI PIETRO VITALI is Associate Professor in Digital Humanities at the University of Versailles Saint-Quentin-en-Yvelines – Paris Saclay and Secretary of Humanistica, the French-speaking association of digital humanities. Between 2018 and 2020, he was a Marie Curie European Research Fellow with the project “Last Letters from the World Wars”. This research was conducted at University College Cork in collaboration with New York University and the University of Reading. He holds a PhD in Italian Literature, obtained in France at the University of Lorraine, and a PhD in Language Sciences, obtained at the University for Foreigners in Perugia. He is also Associate Researcher at the University of Oxford, where he is in charge of the digital analysis of a multilingual corpus of translations of Charlotte Brontë’s novel “Jane Eyre” in the AHRC programme “Prismatic Translations”.

Giovanni Vitali’s research activities reflect his interdisciplinary background. His three main research areas are history, conflict and politics; literature and multilingualism; and digital methodologies.

His approach combines cultural studies and the use of digital technology, in particular Natural Language Processing techniques oriented towards discourse analysis. His work is situated after the creation of the corpus, between the digital analysis of the text and the representation of the data. His first area of specialisation in computer science is data visualisation: he is a specialist in digital cartography, GIS (Geographic Information Systems) and network analysis.

Short Description of Workshop

1.0 The Workshop 

Distant reading has been one of the best-known methodological approaches in digital humanities since its formalisation by Franco Moretti in the article “Conjectures on World Literature” (2000). Distant reading benefits greatly from the use of computational tools. For this reason, we are proposing a course based on R, one of the most popular programming languages used today by the scientific community.

The course is suitable for beginners who want to start digital humanities training with a complete overview of the most common tools used for distant reading. 

The philosophy of the course is to “analyse the text & visualize the data”, and the course is structured around this dichotomy.

The objective of the course is to provide the participants with methodological and practical tools that they can use in their own research. At the end of the two weeks, they will be able to use R and RStudio to perform textual and spatial analysis. Analyses in R produce results that can easily be presented as graphs, trees, or maps. For this reason, part of the course will be dedicated to open-source programs such as Gephi, Gimp and Inkscape, which are used to rework vector and graphics files.

2.0 Schedule

The course runs over two weeks so that participants can choose to attend one or both parts. However, participation in the entire course is strongly advised.

The first week is dedicated to the basics of R and natural language processing, three of the most common methods used for distant reading (sentiment analysis, topic modeling, and stylometry) and a brief introduction to machine learning. The objective of this first week is to provide a basic theoretical / methodological understanding of distant reading techniques, together with the practical tools to analyse texts in an R environment. 
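To give a sense of the kind of first step such an analysis involves, here is a minimal sketch in base R that splits a text into words and counts them. It is only an illustration and not taken from the course materials: the sample sentence and the variable names (`text`, `tokens`, `freq`) are assumptions made for this example.

```r
# Toy text standing in for a real corpus (an assumption for illustration)
text <- "The whale is white. The sea is wide, and the whale is far."

# Lowercase the text, then split it on any run of non-letter characters
tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
tokens <- tokens[tokens != ""]  # drop any empty strings left by the split

# Count how often each word occurs and sort from most to least frequent
freq <- sort(table(tokens), decreasing = TRUE)

# Show the most frequent words
print(head(freq, 3))
```

Real texts need more careful tokenisation (accented characters, apostrophes, hyphens), and dedicated R packages typically handle it.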

The second week is dedicated to data visualization. In this module the participants will focus on mapping, network analysis and graphics. The objective of this week is to give participants the tools to organise the visualisation of data graphically, chronologically, and spatially. If a participant is interested in the second week only, we will assume that they have more than basic knowledge of the R programming language.

At the beginning of the course, the workshop leaders will divide the class into two groups according to the participants’ research interests. Each group will carry out some research to be presented on the last day of the workshop, using one of the methodologies introduced during the week.

Week 1: Analyse the text

  • Day 1: Introduction to the course (1st hour); Introduction to R and RStudio (2nd to 4th hours)
  • Day 2: Natural Language Processing (all four hours)
  • Day 3: Sentiment analysis (all four hours)
  • Day 4: Stylometry (all four hours)
  • Day 5: Topic modelling (1st and 2nd hours); Machine learning (3rd and 4th hours)

Week 2: Visualize the data

  • Day 6: Network analysis with Gephi (all four hours)
  • Day 7: Network analysis (1st and 2nd hours); Inkscape & Gimp (3rd and 4th hours)
  • Day 8: Named-entity Recognition (1st and 2nd hours); Mapping (Coordinates) (3rd and 4th hours)
  • Day 9: Mapping (all four hours)
  • Day 10: Mapping (all four hours)

3.0 Technical Requirements

  • Participants should have their own computer with at least 5-10GB of available space. 
  • Operating System: Windows (preferably 7+), Linux or Mac OS X.
  • Java 8 for your operating system. You may need to create an Oracle account to download Java 8.
  • Zip/unzip programs (these are programs that you normally have by default in your computer, like 7-Zip or WinZip for Windows, to manage compressed folders).
  • Browser: Mozilla Firefox and Google Chrome. 
  • A simple text editor (for txt and csv files), such as Sublime Text 3 for Windows, Linux and Mac.
  • Google account  
  • R version 3.6.3 (2020-02-29), “Holding the Windsock”
  • RStudio and XQuartz (the latter for Mac)
  • OpenOffice
  • Gephi 
  • Inkscape 
  • Gimp