In Codice Ratio is a research project that aims at developing novel methods and tools to support content analysis and knowledge discovery from large collections of historical documents.
The goal is to provide humanities scholars with novel tools to conduct data-driven studies over large historical sources. The project concentrates on the collections of the Vatican Secret Archives, one of the largest and most important historical archives in the world. Across 85 kilometres of shelving, it holds more than 600 archival collections of historical documents on the Vatican's activities, such as the acts promulgated by the Vatican, account books, and the correspondence of the popes, starting from the eighth century.
The project team is developing a full-fledged system to automatically transcribe the contents of the manuscripts based on character segmentation. The idea is to govern imprecise character segmentation by assuming that the correct segments are those that yield the sequence of characters most likely to compose a Latin word. The team has designed a principled solution that relies on convolutional neural networks and statistical language models.
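As a rough illustration of how a language model can arbitrate between competing segmentations, the following minimal Python sketch scores every path through a small set of candidate segments. It is not the project's implementation: the function names, the toy bigram table, and the classifier probabilities are all hypothetical, and a simple character-bigram model stands in for the statistical language models the team mentions.

```python
import math

def bigram_logprob(word, bigram, floor=1e-6):
    """Log-probability of a word under a character-bigram model.
    '^' and '$' mark word start and end; unseen pairs get a small floor."""
    padded = "^" + word + "$"
    return sum(math.log(bigram.get(pair, floor)) for pair in zip(padded, padded[1:]))

def best_transcription(n_cuts, segment_scores, bigram):
    """Pick the character sequence with the best combined score.

    segment_scores[(i, j)] maps candidate characters to the (hypothetical)
    classifier probability for the strip of ink between cut i and cut j.
    Every path from cut 0 to cut n_cuts is one possible segmentation.
    """
    best = (-math.inf, "")

    def walk(pos, text, logp):
        nonlocal best
        if pos == n_cuts:
            # Combine classifier evidence with language-model plausibility.
            total = logp + bigram_logprob(text, bigram)
            if total > best[0]:
                best = (total, text)
            return
        for (i, j), chars in segment_scores.items():
            if i == pos:
                for ch, p in chars.items():
                    walk(j, text + ch, logp + math.log(p))

    walk(0, "", 0.0)
    return best[1]

# Toy data (invented): three strokes that could be read as "m", "in", "ni" or "ui".
SEGMENTS = {
    (0, 1): {"i": 0.8},
    (1, 3): {"n": 0.7},
    (0, 2): {"n": 0.6, "u": 0.3},
    (2, 3): {"i": 0.8},
    (0, 3): {"m": 0.7},
}
BIGRAM = {
    ("^", "i"): 0.3, ("i", "n"): 0.4, ("n", "$"): 0.3,
    ("^", "n"): 0.1, ("n", "i"): 0.1, ("i", "$"): 0.1,
    ("^", "m"): 0.1, ("m", "$"): 0.05,
}

print(best_transcription(3, SEGMENTS, BIGRAM))
```

In this toy example the classifier alone slightly prefers reading the three strokes as a single "m", but the bigram model makes "in" the more plausible word, so the combined score selects "in". Exhaustive enumeration is used only for brevity; a real system would need a more efficient search over segmentations.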
"Our approach requires minimal training efforts, making the transcription process more scalable as the production of training sets requires a few pages and can be easily crowdsourced," the researchers said. "Our system has been able to produce good transcriptions that can be used by paleographers as a solid basis to speedup the transcription process at a large scale."
These documents record the inbound and outbound correspondence of the popes: political letters that testify to the broad activities of the popes in the ecclesiastical and temporal spheres; authoritative opinions on legal issues; documents addressed to sovereigns and to religious and political institutions scattered throughout the globe; and correspondence relating to the collection of tithes and tributes due to the Church. Never having been transcribed in the past, these documents are of unprecedented historical relevance. Preliminary results are encouraging.
Source: In Codice Ratio
Top image: Vatican Registers