Deteriorated image classification model for malayalam palm leaf manuscripts
Document Type
Article
Publication Title
Journal of Intelligent and Fuzzy Systems
Abstract
This paper presents a document image classification method for six categories of Malayalam palm leaf manuscripts. The proposed approach consists of three phases: dataset analysis, which builds a Bag of Words repository; text recognition; and classification using a voting approach. In the dataset analysis phase, the palm leaf manuscripts are first subjected to pre-processing and subjective analysis to create the Bag of Words repository. Next, the textual components of the manuscripts are extracted and recognized using Tesseract 4 OCR with default and self-adapted training sets, as well as a deep learning algorithm. In the third phase, the Bag of Words approach categorizes the palm leaf manuscripts through a voting process over the textual components recognized by OCR. Experiments evaluated the proposed approach with and without voting, varying the size of the Bag of Words, with default and self-adapted training datasets, using Tesseract OCR and a deep learning model. The results indicate that the Bag of Words technique performs comparably with and without voting when using Tesseract OCR: document classification reaches an overall accuracy of 83% without voting and 84.5% with voting, with an F-score of 0.90 in both cases. Overall, trial-wise experiments with the Bag of Words suggest that the proposed approach is highly generalizable, offering a reliable way to classify deteriorated handwritten Malayalam palm leaf manuscripts.
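The voting step described in the abstract can be illustrated with a minimal sketch. It assumes that each category has its own vocabulary in the Bag of Words repository and that every OCR-recognized word casts one vote for each category whose vocabulary contains it. The abstract does not specify the voting rule actually used, so the function name `classify_by_voting`, the category labels, and the placeholder tokens below are all hypothetical.

```python
from collections import Counter


def classify_by_voting(recognized_words, bag_of_words):
    """Return the category with the most vocabulary matches, or None if no word matches.

    recognized_words: iterable of tokens produced by OCR (assumed input format).
    bag_of_words: dict mapping a category label to a set of known words (assumed layout).
    """
    votes = Counter()
    for word in recognized_words:
        for category, vocabulary in bag_of_words.items():
            if word in vocabulary:
                votes[category] += 1  # one vote per matching category
    if not votes:
        return None
    # most_common(1) returns a one-element list of (category, vote_count)
    return votes.most_common(1)[0][0]


# Hypothetical two-category repository with placeholder tokens
bag = {
    "category_a": {"w1", "w2", "w3"},
    "category_b": {"w3", "w4"},
}
ocr_output = ["w1", "w3", "w2", "w9"]
print(classify_by_voting(ocr_output, bag))
```

In this sketch, words that OCR misrecognizes simply cast no vote, which gives one possible reading of why a Bag of Words lookup can tolerate recognition errors on deteriorated manuscripts.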
First Page
4031
Last Page
4049
DOI
10.3233/JIFS-223713
Publication Date
8-24-2023
Keywords
ancient document images, deep learning, document image classification, handwritten document analysis, palm leaf manuscripts, Tesseract OCR
Recommended Citation
B.J. Bipin Nair, N. Shobha Rani, and M. Khan, "Deteriorated image classification model for malayalam palm leaf manuscripts," Journal of Intelligent and Fuzzy Systems, vol. 45, no. 3, pp. 4031-4049, Aug 2023. doi:10.3233/JIFS-223713
Comments
IR Deposit conditions:
OA version (pathway a) Accepted version
No embargo
Published source must be acknowledged with citation
Must link to published version with DOI