Volume 19, No. 4, 2022

Multi-Lingual (Urdu and English) Text Detection and Identification in Natural Images Using Attention-Based RNN-CNN


Syed Ishfaq Manzoor, Dr. Suruchi Talwani, Dr. Sur Jimmy Singla

Abstract

Urdu, an Indo-Aryan language predominantly spoken in South Asia, is the national language and lingua franca of Pakistan, where it is an official language alongside English. In India, Urdu is listed in the Eighth Schedule of the Constitution, which acknowledges its cultural heritage and status, and several Indian states grant it official status. In Nepal, Urdu is registered as a regional dialect, and in South Africa it is a protected language under the constitution. It is also a minority language in Afghanistan and Bangladesh, where it has no official recognition. Urdu usage among internet users has increased in recent times. However, building a robust recognition system (RS) for cursive languages such as Urdu is challenging, and the difficulty grows with variations in text size, font, color, orientation, lighting conditions, and noise within the dataset. Deep learning models have shown promising results in data modeling and in handling large datasets. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have proven effective in various research areas, including text recognition, voice recognition, and natural language processing (NLP). This paper introduces a CNN-RNN model with an attention mechanism for Urdu image text recognition. The model takes an input image and generates feature sequences using a CNN. These sequences are then processed by a bidirectional RNN to obtain the features in the correct order. To improve text segmentation, a bidirectional RNN with an attention mechanism is employed to produce the output. The attention mechanism enables the model to focus on relevant information in the feature sequences. The model is trained end to end with the standard backpropagation algorithm, aided by the attention mechanism.
State-of-the-art dataset normalization and balancing techniques, such as SMOTE, have been adopted to achieve enhanced results.
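
The attention step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the scoring function, so dot-product scoring is assumed here, and the names `softmax`, `attend`, `features`, and `query` are hypothetical. The sketch shows only how attention weights let a decoder emphasize some positions of a feature sequence over others.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, query):
    """Dot-product attention over a feature sequence (assumed scoring).

    features: list of feature vectors, one per sequence position
              (standing in for bidirectional RNN outputs).
    query:    vector standing in for the current decoder state.
    Returns the attention weights and the weighted-sum context vector.
    """
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
    ]
    return weights, context


# Toy input: three sequence positions, two feature channels each.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend(features, query)
```

With this toy input, positions 0 and 2 match the query equally and receive equal weight, while position 1 receives less, so the context vector is dominated by the positions most relevant to the query.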


Pages: 742-754

Keywords: image text recognition; deep learning; recurrent neural networks (RNNs); convolutional neural networks (CNNs); bidirectional RNN; attention mechanism; text segmentation; natural scene images
