In HisDoc III we target historical document classification for large amounts of uncategorized facsimiles, with the intent to provide new capabilities for researchers in the Digital Humanities. In particular, we will address the task of categorizing document images with respect to content, language, script, and layout. To do so, we will leverage the expertise gained from our previous projects HisDoc and HisDoc 2.0.
In HisDoc we have shown that historical Document Image Analysis (DIA) can be effectively applied to extract layout structures and textual transcriptions, and in the current HisDoc 2.0 project we successfully retrieved additional paleographic information. The novel contributions of HisDoc III will be complemented by these methods to cope with large document collections.
HisDoc III is funded by the Swiss National Science Foundation (SNF) under project number 169618.