Most conventional characters extraction methods include binarization (background determination), region segmentation, and region identification. Incorrect binarization results adversely influence the segmentation and identification results. This can be a problem when color documents are printed with different background color regions as the binarization will not have effective threshold results and subsequent segmentation and identification steps will not work properly. Conventional region segmentation methods are time-consuming for large document images. Conventional region identification methods are applied for the preceding segmentation results, using a bottom-up method. This study presents an intelligent method to solve these problems, which integrates background determination, region segmentation, and region identification to extract characters in color documents with highlight regions. The results demonstrate that the proposed method is more effective and efficient than other methods in terms of binarization results, extraction results, and computational performance.
關聯:
Modern Approaches in Applied Intelligence, Lecture Notes in Computer Science Volume 6704/2011 P.143-152