Web-based transliteration and translation system between the Urdu and Hindi languages. Punjabi University

Project background and justification

South Asia is one of those unique parts of the world where a single language may be written in different scripts. Hindi-Urdu is a case in point: in India it is written as Hindi in the Devanagari script (left-to-right), while in Pakistan it is written as Urdu in an Arabic-based script (right-to-left).

The difficulty of written communication between readers of Hindi and Urdu has long been a social barrier between the Muslim and Hindu populations of India and a key obstacle to people-to-people contact between India and Pakistan.

This project aims to bring Urdu- and Hindi-speaking people closer together by developing a transliteration/translation tool for the two languages that may help people connect across a hostile geographical divide. In doing so, we will provide an ICT solution to a social problem that has seemed insurmountable for centuries.

The transliteration component is intended to make the well-developed poetic verse of the Urdu language available to the Hindi-literate public. The translation tool will enable conversion of Urdu websites to Hindi and vice versa, so that readers of each script can read the content in their own. This will in turn facilitate electronic and written communication between people living in India and Pakistan.

Project summary

Over 600 million people in India and Pakistan speak Hindi and Urdu. The two languages share the same grammar and more than 70 per cent of their commonly used words, yet they are written in mutually incomprehensible scripts.

After 12 years of academic research on the development of languages and literacy in Pakistan and India, a bi-directional web-based Hindi-Urdu Language Transliteration/Translation Tool has been developed to facilitate electronic and written communication between Indians and Pakistanis.

The main modules to be developed are:

• Urdu-Hindi electronic dictionary
• Transliteration/translation rules and mapping tables
• Parallel Hindi/Urdu Corpus
• Tool to convert any Unicode-based Hindi website to Urdu and vice versa
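
As a rough illustration of what a mapping table over Unicode text might look like, the following minimal Python sketch detects the script of a string from its Unicode block and applies a character-level Hindi-to-Urdu mapping. It is not the project's implementation: the names DEVANAGARI_TO_URDU, detect_script and transliterate_hindi_to_urdu are hypothetical, the table covers only a handful of characters, and it assumes a simple one-character-at-a-time substitution with no context rules.

```python
# Toy mapping table: Devanagari character -> Urdu (Arabic-script) string.
# Hypothetical and deliberately tiny; NOT the project's actual mapping tables.
DEVANAGARI_TO_URDU = {
    "\u0915": "\u06A9",  # DEVANAGARI KA  -> ARABIC KEHEH
    "\u0924": "\u062A",  # DEVANAGARI TA  -> ARABIC TEH
    "\u0928": "\u0646",  # DEVANAGARI NA  -> ARABIC NOON
    "\u092C": "\u0628",  # DEVANAGARI BA  -> ARABIC BEH
    "\u092E": "\u0645",  # DEVANAGARI MA  -> ARABIC MEEM
    "\u093E": "\u0627",  # vowel sign AA  -> ARABIC ALEF
    "\u093F": "",        # vowel sign I: short vowel, usually left unwritten in Urdu
}


def detect_script(text):
    """Guess the script by counting characters in each Unicode block."""
    devanagari = sum(1 for ch in text if "\u0900" <= ch <= "\u097F")
    arabic = sum(1 for ch in text if "\u0600" <= ch <= "\u06FF")
    if devanagari > arabic:
        return "devanagari"
    if arabic > devanagari:
        return "arabic"
    return "unknown"


def transliterate_hindi_to_urdu(text):
    """Replace each mapped character; leave unmapped characters unchanged."""
    return "".join(DEVANAGARI_TO_URDU.get(ch, ch) for ch in text)


if __name__ == "__main__":
    word = "\u0928\u093E\u092E"  # Hindi word "naam" (name)
    print(detect_script(word))
    print(transliterate_hindi_to_urdu(word))
```

A plain character substitution of this kind only works for the simplest words. Handling inherent vowels, conjuncts and words that differ between the two languages would presumably need the rules, dictionary and parallel corpus listed above.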

The target groups will be media organizations (such as magazines and newspapers), literary and literacy promotion organizations, writers, NGOs involved in dissemination activities among the urban and rural poor, virtual Hindi-Urdu speaking communities, schools, and colleges.

Organization profile

Punjabi University, established in 1962, is one of the premier public institutions of higher education in Northern India and holds the highest grading of A from the NAAC. Five departments and one research centre are exclusively devoted to the development of various aspects of the Punjabi language, literature and culture.

The Advanced Centre for the Technical Development of the Punjabi Language, Literature and Culture was established in February 2004, with the aim of conducting research and development into the linguistic and computational aspects of the Punjabi language. The centre has also carved a niche in the area of transliteration and translation. It developed the first online Shahmukhi (Urdu) to Gurmukhi transliteration tool and a Punjabi to Hindi translation system. Both were demonstrated at the 22nd International Conference on Computational Linguistics, held in Manchester, UK, in August 2008.

More information about Punjabi University is available at http://www.punjabiuniversity.ac.in/