This article was published in our August 2018 newsletter.
Though it might sound rather odd given the endless accusations of plagiarism in the academic domain, copying is in fact a cultural technique. Without manuscripts being copied, the knowledge of the ancient world would never have been passed on. And the Brothers Grimm would not have left us with their collection of fairy tales. They combed through books and manuscripts in search of stories and transcribed them. They also asked their acquaintances to write down folk tales for them. Together with their transcriptions, these notes were the raw material that the Brothers Grimm elaborated for their collection and published for the first time in 1812 in the form of their "Children's and Household Tales".
Using IT tools to "query" digital data
The history of how this collection of fairy tales evolved raises many questions. What was the origin of the stories? How much did the Brothers Grimm add to them? Why were some stories rewritten in the numerous subsequent editions, while others were removed and new ones added? In just the same way that IT tools are used nowadays to analyse academic work for suspected instances of plagiarism, such instruments can also help literary scholars and historians research the way literary and historical texts evolved and were passed down from one generation to the next.
A team of young computer scientists and humanities scholars at the University of Göttingen have been busy designing precisely such a tool. "We are developing a system that can scour extensive text data for traces of intertextuality – i.e. elements shared by different texts", says Marco Büchler. The researcher with a PhD in computer science has run the Electronic Text Reuse Acquisition Project (eTRAP) research group since 2014. This is one of six groups of young researchers that the Federal Ministry of Education and Research (BMBF) is funding in the digital humanities until early 2019.
Recognising fairy tale motifs as patterns
One of the things the Göttingen-based group is studying is the variants of Grimms' fairy tales in different languages. To this end, they are looking for similarities, for instance between Grimms' "Snow White and the Seven Dwarfs" and the Russian "Tale of the Dead Princess and the Seven Knights" by Alexander Pushkin. The "yardstick" by which similarities are measured is the set of narrative motifs. Step by step, the researchers supply the digitised fairy tale texts with annotations relating to the motifs, and train their Text Reuse Detection Machine, or TRACER for short, to recognise them as patterns. This also reveals the differences, of course: in Pushkin's tale it is not dwarfs who protect the princess, as in Grimms' story, but knights.
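To give a rough idea of what comparing tales by their motifs can look like, here is a minimal Python sketch. It is not TRACER's actual method, which the article does not describe in detail. The motif labels are hypothetical annotations invented for illustration, and the Jaccard index is simply one common way to score the overlap between two sets.

```python
def motif_similarity(motifs_a, motifs_b):
    """Jaccard index: shared motifs divided by all motifs seen in either tale."""
    a, b = set(motifs_a), set(motifs_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical motif annotations (illustrative only, not project data).
snow_white = {
    "jealous stepmother", "magic mirror", "poisoned apple",
    "seven protectors", "protectors are dwarfs",
}
dead_princess = {
    "jealous stepmother", "magic mirror", "poisoned apple",
    "seven protectors", "protectors are knights",
}

shared = snow_white & dead_princess      # motifs found in both tales
differing = snow_white ^ dead_princess   # motifs found in only one tale
score = motif_similarity(snow_white, dead_princess)

print(sorted(shared))
print(sorted(differing))
print(round(score, 2))
```

In this toy example the two annotation sets differ only in who protects the princess, which is the difference the researchers point out.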
Automatically tracking texts through space and time
Marco Büchler has developed 700 algorithms for TRACER. This digital research tool is designed to be non-language-specific and has already been applied to texts in nine languages, including English, Arabic, Coptic, Hebrew and Tibetan. "Once the tool has been fully developed, it will be possible to track the history of texts, and the way they have been handed down, in detail", explains Büchler.
TRACER allows data to be worked through more quickly. For example, it takes only three to five hours to compare the seven editions of Grimms' Children's and Household Tales that were published during the course of four decades.
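Comparing several editions with one another means comparing every pair of editions, so seven editions give 21 pairs. The sketch below illustrates the idea with overlapping three-word sequences ("shingles"), a generic text reuse technique assumed here for illustration. It is not TRACER's own pipeline, and the edition labels and sample sentences are placeholders.

```python
from itertools import combinations


def shingles(text, n=3):
    """Return the set of overlapping n-word sequences in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ratio(text_a, text_b, n=3):
    """Share of the smaller text's shingles that also occur in the other text."""
    a, b = shingles(text_a, n), shingles(text_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def compare_editions(editions, n=3):
    """Score every unordered pair of editions once."""
    return {
        (x, y): shared_ratio(editions[x], editions[y], n)
        for x, y in combinations(sorted(editions), 2)
    }


# Placeholder texts standing in for full editions (illustrative only).
toy = {
    "edition_a": "once upon a time there was a king",
    "edition_b": "once upon a time there lived a king",
    "edition_c": "a miller had three sons",
}
pairs = compare_editions(toy)
for pair, ratio in pairs.items():
    print(pair, ratio)
```

A real comparison would also have to cope with spelling variation, paraphrase and texts of very different lengths, which this sketch ignores.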
Tutorials for academics in Europe
It will be quite some time before the tool is child’s play to operate, however. As yet it does not feature an intuitive user interface. Academics wishing to use TRACER have to be thoroughly familiarised with its functionality so that they learn how to configure the system to their specific task. "People also have reservations and concerns, as is always the case when it comes to artificial intelligence", explains Büchler. This is why the Göttingen team supports academics interested in their new tool. They tour Europe's universities, giving their colleagues tutorials and trying to convince them of the benefits. The researchers also present their work at trade fairs such as the international Cebit, thereby ensuring that a wider public can keep abreast of developments in the digital humanities.
Digital humanities group of young researchers at the University of Würzburg
"Computational literary genre stylistics", or CLiGS for short, is the name of another young research group funded by the BMBF. This project at the Department for Literary Computing at the University of Würzburg is studying several collections of French and Spanish texts. One objective is to bring together two research camps that have often been somewhat distant in the past: literary scholars trained in the "traditional" hermeneutic way and practitioners of computer-based text analysis.
www.cligs.hypotheses.org