
Unlock new horizons of learning outcomes with our Translation tool and access a wide range of learning materials in your preferred language. Our Neural Machine Translation technology combines Artificial Intelligence with human intelligence to provide accurate translations in real time.

Embibe is an AI platform for delivering learning outcomes at scale [5][6]. We are committed to serving students across the world, studying in any language. The translation project aims to provide educational content in vernacular languages to millions of students in India. It is important to curate, create, or translate content so that students receive personalised learning, practice, and assessment content throughout their learning journey [7][8].

Most high-quality academic content is available in English, so translating it into Indian vernacular languages will immensely help students. To do that, we built in-house Neural Machine Translation models for all major vernacular languages of India. Each model receives academic English sentences as input and outputs the translated sentences in the target language.
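To make this sentence-in, sentence-out interface concrete, here is a minimal sketch that uses a publicly available English-to-Hindi MarianMT checkpoint from Hugging Face as a stand-in. It is not Embibe's in-house model, but the calling pattern is the same idea: English sentences go in, translated sentences come out.

```python
# Minimal sketch of the sentence-level translation interface, using a public
# English->Hindi checkpoint as a stand-in for an in-house NMT model.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-hi"  # public stand-in checkpoint
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

sentences = ["Which of the following law was given by Einstein:"]
batch = tokenizer(sentences, return_tensors="pt", padding=True)
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```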

Currently, we support 11 Indian languages:

  1. Hindi
  2. Gujarati
  3. Marathi
  4. Tamil
  5. Telugu
  6. Bengali
  7. Kannada
  8. Assamese
  9. Oriya
  10. Punjabi
  11. Malayalam

Google Translate sometimes makes mistakes as it’s not specifically built for the academic domain. Here are some examples:

| English | Google translation | NMT translation |
| --- | --- | --- |
| Which of the following law was given by Einstein: | निम्नलिखित में से कौन सा कानून आइंस्टीन द्वारा दिया गया था: | निम्नलिखित में से कौन सा नियम आइंस्टीन द्वारा दिया गया था: |
| Which one of the following is not alkaline earth metal? | निम्नलिखित में से कौन क्षारीय पृथ्वी धातु नहीं है | निम्नलिखित में से कौन सा क्षारीय मृदा धातु नहीं है? |
| Endogenous antigens are produced by intra-cellular bacteria within a host cell. | अंतर्जात प्रतिजन एक मेजबान कोशिका के भीतर इंट्रा-सेलुलर बैक्टीरिया द्वारा निर्मित होते हैं। | अंतर्जात प्रतिजन एक परपोषी कोशिका के भीतर अंत: कोशिकीय जीवाणु द्वारा उत्पन्न किए जाते हैं। |

Approach

Building Neural Machine Translation models from scratch requires a lot of data, at least a few million sentences. So, we built a feedback loop whose translations keep improving over time, with the help of academic translators for each of these languages.

We provide machine-translated sentences to academic translators. They make minor corrections where needed and return the corrected translations, i.e., feedback data, to us. We then train our models on this new feedback data, and with the updated Neural Machine Translation models, the quality of machine-translated sentences improves over the previous iteration.
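As a rough illustration of the retraining step in this loop, the sketch below fine-tunes a public MarianMT English-to-Hindi checkpoint on a handful of translator-corrected pairs with the Hugging Face Seq2SeqTrainer. The checkpoint, data format, and hyperparameters are assumptions for illustration only, not Embibe's actual training setup.

```python
# A rough sketch of retraining on translator-corrected pairs, using a public
# English->Hindi checkpoint and Hugging Face's Seq2SeqTrainer as stand-ins
# (not Embibe's actual training pipeline or data).
from torch.utils.data import Dataset
from transformers import (DataCollatorForSeq2Seq, MarianMTModel, MarianTokenizer,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

model_name = "Helsinki-NLP/opus-mt-en-hi"  # public stand-in checkpoint
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Feedback data: English sentences paired with human-corrected translations.
feedback_pairs = [
    ("Which of the following law was given by Einstein:",
     "निम्नलिखित में से कौन सा नियम आइंस्टीन द्वारा दिया गया था:"),
    ("Which one of the following is not alkaline earth metal?",
     "निम्नलिखित में से कौन सा क्षारीय मृदा धातु नहीं है?"),
]

class FeedbackDataset(Dataset):
    """Tokenises (source, corrected-target) pairs for seq2seq fine-tuning."""
    def __init__(self, pairs):
        self.examples = [tokenizer(src, text_target=tgt, truncation=True, max_length=128)
                         for src, tgt in pairs]
    def __len__(self):
        return len(self.examples)
    def __getitem__(self, idx):
        return self.examples[idx]

args = Seq2SeqTrainingArguments(output_dir="nmt-feedback",
                                num_train_epochs=1,
                                per_device_train_batch_size=8)
trainer = Seq2SeqTrainer(model=model, args=args,
                         train_dataset=FeedbackDataset(feedback_pairs),
                         data_collator=DataCollatorForSeq2Seq(tokenizer, model=model))
trainer.train()  # the updated model produces better drafts for the next round
```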

Here is a diagram that shows the overall structure of the project.

So, to solve the translation problem, we are leveraging both Human Intelligence and AI.

By using Embibe's Neural Machine Translation technology, the manual work required of academic translators has been reduced by ~80%, and their productivity has increased severalfold. The cost of translation has also gone down significantly.

Image Translation:

We’re also trying to solve the image-translation problem, where an image with English labels will be fed into the system, and the output will be an image with labels in the target language.

For example, this input image…

…will be automatically converted to the below output image:

We can do minor font-styling updates on top of this output image to make it perfect.

For this project, we first detect text labels in the image, then run OCR on each label, then translate the recognised text using Neural Machine Translation APIs, and lastly place the translated text back into the image at the corresponding position.
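Below is a minimal sketch of this four-step pipeline, assuming pytesseract for detection and OCR and Pillow for redrawing the labels. The translate_text() helper, its endpoint URL, and the Devanagari font path are hypothetical placeholders rather than Embibe's actual APIs; a production version would also group detected words into complete labels before translating.

```python
# Minimal sketch of the image-translation pipeline: detect labels, OCR them,
# translate, and draw the translations back in place.
import pytesseract
import requests
from PIL import Image, ImageDraw, ImageFont

def translate_text(text, target_lang="hi"):
    # Hypothetical call to an in-house NMT service; replace with your own endpoint.
    resp = requests.post("https://nmt.example.com/translate",
                         json={"text": text, "target": target_lang})
    return resp.json()["translation"]

def translate_image(path, target_lang="hi",
                    font_path="NotoSansDevanagari-Regular.ttf"):
    img = Image.open(path).convert("RGB")
    draw = ImageDraw.Draw(img)

    # Steps 1 and 2: detect text labels and run OCR in a single pass.
    data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)

    for i, word in enumerate(data["text"]):
        if not word.strip() or float(data["conf"][i]) < 60:
            continue  # skip empty or low-confidence detections
        x, y, w, h = (data["left"][i], data["top"][i],
                      data["width"][i], data["height"][i])

        # Step 3: translate the detected label.
        translated = translate_text(word, target_lang)

        # Step 4: paint over the original label and draw the translation in place.
        draw.rectangle([x, y, x + w, y + h], fill="white")
        font = ImageFont.truetype(font_path, size=max(h, 12))
        draw.text((x, y), translated, fill="black", font=font)

    return img

# Example usage (hypothetical file names):
# translate_image("cell_diagram_en.png", "hi").save("cell_diagram_hi.png")
```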

