Binary document images remain one of the most important information carriers in this era of data. In this final exam, we present two novel techniques for learning and understanding low-level features of document images, and we apply these techniques to compression, reconstruction, registration, and search.
The first learning technique is entropy-based dictionary learning, a method for learning a strong prior for document images. The information in this prior is used to encode the image effectively. When more than one page is to be encoded, we impose a hierarchical structure on the dictionary and update the dictionary dynamically. Compared with the best existing methods, we achieve a much higher compression ratio.
The dictionary prior we propose is also used to restore noisy document images. Our dictionary-based restoration improves the document image quality and the encoding effectiveness simultaneously.
The second learning technique is layout structure detection for document images. Our layout detection method is faster and more efficient than conventional methods. Using this technique, we construct a sparse feature set for document images, which is then used in our novel, efficient document image search system.
|Advisor:||Bouman, Charles A.|
|Committee:||Bell, Mark R., Comer, Mary L.|
|Department:||Electrical and Computer Engineering|
|School Location:||United States -- Indiana|
|Source:||DAI-B 77/01(E), Dissertation Abstracts International|
|Subjects:||Statistics, Electrical engineering, Computer science|
|Keywords:||Binary document image, Compression, Image analysis, Reconstruction, Search, Sparse coding|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved