In the first project phase, a functional model for the OCR-D workflow was developed. Full text recognition is seen as a complex process that includes several upstream and downstream steps in addition to the actual text recognition. First, a digital image is preprocessed for text recognition by cropping, deskewing, dewarping, despeckling and binarizing it into a black-and-white image. This is followed by layout recognition, which identifies the text areas of a page down to line level. The recognition of the lines or the baseline is particularly important for the subsequent text recognition, which is based on neural networks in all modern approaches. [...] or elements of the fully text-recognized document are then classified according to their typographic function before the OCR result is improved in the post-correction. Finally, the end result is transferred to [...]

From the project proposals for the DFG's module project call in March 2017, eight projects were approved:

- Scalable methods of text and structure recognition for full text digitization of historical prints: Image Optimisation
- Scalable text and structure recognition methods for the full text digitization of historical prints: Layout Recognition
- Further development of a semi-automated open source tool for layout analysis and region extraction and classification (LAREX) of early printing
- NN/FST – Unsupervised OCR-Postcorrection based on Neural Networks and Finite-state Transducers
- Optimized use of OCR processes – Tesseract as a component in the OCR-D workflow
- Automatic post-correction of historical OCR captured prints with integrated optional interactive correction
- Development of a model repository and an automatic font recognition for OCR-D
- OLA-HD – An OCR-D long-term archive for historical books

Scalable methods of text and structure recognition for full text digitization of historical prints: Image Optimisation

German Research Center for Artificial Intelligence (DFKI)
Project participants: Andreas Dengel, Martin Jenckel, Khurram Hashmi

DFKI was involved in the OCR-D project with two modules: Image optimization and [...] In both modules several processors were developed and integrated into the OCR-D software system.

The first module project, image optimization, focused on the pre-processing of the digitized material with the aim of improving the image quality and thus the performance of the subsequent OCR modules. For this purpose, tools for binarization, deskewing, cropping and dewarping were implemented. The cropping tool based on computer vision is particularly noteworthy for its [...] It predominantly achieves very good results on the entire project data. The dewarping tool is also interesting due to its novel architecture. Instead of determining explicit transformations for the dewarping, generative neural networks are used to generate dewarped variants of images.
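To make the binarization step mentioned above more concrete, here is a minimal sketch in pure Python. It uses Otsu's global threshold, a classic textbook method chosen here only for illustration; the text does not say which algorithm the DFKI binarization tool uses, and the function names `otsu_threshold` and `binarize` are hypothetical, not part of OCR-D.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that maximises between-class variance.

    Illustrative assumption: Otsu's method, not the project's actual tool.
    """
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += hist[level]          # pixels at or below this level
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg     # pixels above this level
        if weight_fg == 0:
            break
        sum_bg += level * hist[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(image):
    """Map a 2-D list of grey values to 0 (ink) or 255 (paper)."""
    threshold = otsu_threshold([v for row in image for v in row])
    return [[0 if v <= threshold else 255 for v in row] for row in image]


# Tiny hypothetical "page": dark ink values next to bright paper values.
page = [[10, 12, 200],
        [11, 210, 205]]
black_and_white = binarize(page)
```

A single global threshold like this tends to struggle with stained or unevenly lit historical prints, which is one reason dedicated binarization tools exist; the sketch only shows the basic idea of turning a grey-scale scan into a black-and-white image.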