MPIWG

The Sphere: Knowledge System Evolution and the Shared Scientific Identity of Europe

Web Services



CorDeep

CorDeep is a machine-learning-based web application that extracts visual elements from historical sources and classifies pages containing numerical and alphanumerical tables. It locates visual elements and classifies them into four categories: “Content Illustrations,” “Initials,” “Decorations,” and “Printer’s Marks.” CorDeep was trained on the Sphaera corpus, a collection of 359 early modern treatises comprising about 78,000 pages, 30,000 visual elements, and 10,000 pages containing tables. The visual elements were manually annotated with bounding boxes and semantic labels, while the pages containing tables were identified semi-automatically by an incrementally improved model supervised by a human expert. CorDeep achieves an average precision of up to 98% for the detection of visual elements and an accuracy of 94% for the classification of pages containing tables. These values may vary depending on the style, content, and quality of the input images.

Go to the CorDeep Web Application

This web service is based on the following article:
Büttner J, Martinetz J, El-Hajj H, Valleriani M. “CorDeep and the Sacrobosco Dataset: Detection of Visual Elements in Historical Documents”. Journal of Imaging. 2022; 8(10):285. https://doi.org/10.3390/jimaging8100285