Text Correction (Sanskrit) work from home job/internship at IIT Bombay
Text Correction (Sanskrit)
Start Date
Immediately
6 Months
₹ 1000/month + Incentives
Apply By
6 Dec' 20
About IIT Bombay
The Indian Institute of Technology, Bombay (IITB) is one of the fifteen higher institutes of technology in the country set up with the objective of making facilities available for higher education, research, and training in various fields of science and technology. With the same mission and vision, Prof. Ganesh Ramakrishnan is gearing up to take rural India a leap ahead. For his outstanding contributions, he received the IBM Faculty Award in 2011. IIT Bombay has also honored Prof. Ganesh's work on "Adaptive framework for end-to-end corrections in Indic OCR".
About the work from home job/internship
Selected intern's day-to-day responsibilities include:

1. Checking pages of Sanskrit text against the original scanned Sanskrit textbook and highlighting the mistakes
2. Correcting the occasional mistakes in the annotated text
3. Working on the text annotation of the scanned documents

Note: This internship does not involve programming or technical knowledge; it primarily deals with manual text verification. Secondary aspects are software testing and software installation.
Skill(s) required
Hindi Proficiency (Spoken), Hindi Proficiency (Written)
Who can apply

Only those candidates can apply who:

1. are available for the work from home job/internship

2. can start the work from home job/internship between 21st Nov'20 and 26th Dec'20

3. are available for a duration of 6 months

4. have relevant skills and interests

* Women wanting to start/restart their career can also apply.

Other requirements

1. Proficiency in Sanskrit (written) is a must

2. Selected candidates are expected to be familiar with basic computer operations and working with computer software

3. Only candidates with advanced knowledge of Sanskrit, prior experience with the language, or a degree in Sanskrit will be considered

Certificate, Letter of recommendation, Flexible work hours, 5 days a week
Additional Information

Stipend structure: This is a performance-based internship. In addition to the minimum-assured stipend, you will also be paid a performance-linked incentive (₹ 8 per page).
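As a rough illustration of the stipend structure above, the monthly payout can be sketched as the assured amount plus the per-page incentive. The function name is hypothetical, and the sketch assumes that the minimum-assured stipend is the ₹ 1000/month listed earlier and that every corrected page earns the ₹ 8 incentive with no threshold or cap; the listing does not spell out those details.

```python
def monthly_stipend(pages_corrected: int, base: int = 1000, per_page: int = 8) -> int:
    """Estimate one month's stipend in rupees.

    Assumptions (not confirmed by the listing): `base` is the
    minimum-assured monthly stipend and the incentive is paid
    on every corrected page.
    """
    if pages_corrected < 0:
        raise ValueError("pages_corrected must be non-negative")
    return base + per_page * pages_corrected


if __name__ == "__main__":
    # Example: 250 pages corrected in a month.
    print(monthly_stipend(250))
```

Under these assumptions, an intern who corrects no pages in a month would still receive the assured amount.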

Prof. Ganesh Ramakrishnan (Dept. of CSE) and Prof. Ramasubramanian (Dept. of Humanities and Social Sciences) are attempting to significantly speed up the digitization of Sanskrit texts. Enabled by the OCR and post-editing technologies developed at IIT Bombay (see https://www.cse.iitb.ac.in/~ganesh/videosurvellianceanalytics/), they are now seeking the participation of the community of Sanskrit lovers: those with even basic knowledge of Sanskrit who want to volunteer for Sanskrit (with some nominal financial compensation for the time spent).
Volunteers are required to be proficient in Sanskrit with sound knowledge of grammar, preferably having some degree or professional credentials in Sanskrit.
The demo video for our framework is at https://youtu.be/u9bqUDrGugc (MUST WATCH for applying candidates). To install the software, go to https://github.com/rohitsaluja22/OpenOCRCorrect and follow the instructions given in https://youtu.be/0hcdlF-zn8E . This can be a remote internship.

Optical Character Recognition (OCR) is the process of converting document images into an editable electronic format. This has many advantages, such as data compression, enabling search or edit options in the images/text, and creating databases for other applications like Machine Translation, Speech Recognition, and enhancing dictionaries and language models. OCR in Indian languages is quite challenging due to their richness in inflections. Using open-source and commercial OCR systems, we have observed Word Error Rates (WER) of around 20-50% on printed documents in four different Indic languages. Moreover, even an OCR system with accuracy as high as 90% is not useful unless aided by a mechanism to identify errors.

So, we started with the problem of developing "OpenOCRCorrect", an end-to-end framework for Error Detection and Corrections in Indic-OCR. Our models outperform state-of-the-art results in “Error Detection in Indic-OCR” for six Indic languages with varied inflections, and we have solved the Out of Vocabulary problem for “Error Correction in Indic-OCR” in our ICDAR-2017 conference paper. We further improve the results with the help of sub-word embeddings in our ICDAR-2019 conference paper. The demo video for our framework is the first video linked above.

Currently, we are targeting Sanskrit. Even with good OCR accuracy, the detected text needs a lot of improvement. Further, in the digitization process of such texts, the second step would be spelling correction and formatting of the text detected by the OCR models. Hence, the selected candidate’s task would be verifying the corrected OCR text in accordance with the scanned images.
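To make the Word Error Rate figure concrete, here is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, insertions, deletions) between the reference text and the OCR output, divided by the number of reference words. This is a generic illustration, not the OpenOCRCorrect implementation; the function name and the sample strings are made up for the example, and splitting on whitespace is a simplifying assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the OCR output
            insertion = curr[j - 1] + 1  # extra word in the OCR output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical OCR output with one misrecognized word out of four.
    truth = "धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः"
    ocr = "धर्मक्षेत्रे कुरुक्षत्रे समवेता युयुत्सवः"
    print(word_error_rate(truth, ocr))
```

A single wrong vowel sign makes the whole word count as an error, which is one reason word-level error rates on highly inflected scripts can look high even when most characters are recognized correctly.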

Number of openings
