About This Project
Devanagari Handwritten Word Recognition
This project applies deep learning to the recognition of handwritten words in the Devanagari script. It was developed as a Final Year Project under the guidance of the National Informatics Centre (NIC), with the aim of creating a practical tool that showcases OCR capabilities for Indic scripts.
Motivation
India possesses a vast repository of historical documents, government records, and literature written primarily in Devanagari and other Brahmi-derived scripts. Digitizing these handwritten resources by hand is slow, costly, and error-prone. Optical Character Recognition (OCR) offers a technological solution, but generic OCR systems frequently struggle with the complexities of Indic scripts such as Devanagari. Features such as the header line (shirorekha) that runs along the top of a word, vowel modifiers (matras), conjunct characters, and variation in handwriting styles call for specialized models. This project addresses that gap by building and demonstrating a model for isolated handwritten Devanagari words.
Objectives
- Develop and implement a deep learning model (Convolutional + Recurrent Neural Network with CTC loss) tailored for handwritten Devanagari word recognition.
- Create an intuitive and accessible web-based interface for demonstrating the model's recognition capabilities using user-uploaded images or provided examples.
- Evaluate the performance and accuracy of the developed OCR model on relevant test data.
- Highlight how such specialized OCR technology could help digitize handwritten records and improve data processing in government and public sector domains, in line with NIC's mission.
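
To make the first and third objectives more concrete, here is a minimal sketch of two steps that usually accompany a model trained with CTC loss: greedy (best-path) decoding of per-frame predictions into a word, and a character error rate for evaluation. It is an illustration only, not the project's actual code. The function names, the toy alphabet, and the choice of index 0 for the CTC blank are all assumptions made for this example, and the error rate counts Unicode code points rather than full Devanagari grapheme clusters.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_labels: Sequence[int], alphabet: Sequence[str]) -> str:
    """Best-path CTC decoding: merge consecutive repeats, then drop blanks.

    frame_labels holds the most likely class index for each time step;
    alphabet[k - 1] is the character for class index k (hypothetical layout).
    """
    chars: List[str] = []
    previous = BLANK
    for label in frame_labels:
        if label != previous and label != BLANK:
            chars.append(alphabet[label - 1])
        previous = label
    return "".join(chars)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    previous_row = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current_row = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous_row[j - 1] + (char_a != char_b)
            insertion = current_row[j - 1] + 1
            deletion = previous_row[j] + 1
            current_row.append(min(substitution, insertion, deletion))
        previous_row = current_row
    return previous_row[-1]


def character_error_rate(predicted: str, reference: str) -> float:
    """Edit distance divided by the reference length (in code points)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(predicted, reference) / len(reference)


if __name__ == "__main__":
    # Toy alphabet covering the code points of the word "नमस्ते".
    alphabet = ["न", "म", "स", "्", "त", "े"]
    frames = [1, 1, 0, 2, 0, 3, 3, 4, 0, 5, 6, 6]
    word = ctc_greedy_decode(frames, alphabet)
    print(word)
    print(character_error_rate(word, "नमस्ते"))
```

The blank symbol is what lets CTC represent a genuinely doubled character: the frame sequence `[1, 0, 1]` decodes to two characters, while `[1, 1]` decodes to one. A production evaluation might prefer to measure errors over grapheme clusters, since a single visible Devanagari unit often spans several code points.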
Potential Relevance (NIC Context)
Accurate and efficient Devanagari OCR technology holds significant potential to support various NIC initiatives and e-Governance goals under the Digital India programme. Key applications include:
- **Digitization of Archives:** Accelerating the conversion of legacy paper-based government records, historical manuscripts, and official archives into searchable digital formats.
- **Improved Data Entry:** Streamlining processes that involve capturing information from handwritten forms (e.g., census data, surveys, application forms).
- **Enhanced Searchability:** Enabling full-text search across large volumes of previously inaccessible handwritten documents.
- **Accessibility:** Supporting initiatives to make information more accessible by converting handwritten materials into machine-readable text that can be used with assistive technologies.
Yashaswini Khansama