Text Verification (Sanskrit) Work From Home Internship

Applications are closed for this internship.
Text Verification (Sanskrit)
Start Date
Immediately
Duration
6 Months
Stipend
₹ 1,000 /month + Incentives
APPLY BY
26 Jan' 22
Posted 3 weeks ago
Internship

About the work from home job/internship

Selected intern's day-to-day responsibilities include:

1. Checking pages of Sanskrit text against the original scanned Sanskrit textbook and highlighting the mistakes
2. Correcting the occasional mistakes in the annotated text
3. Working on the text annotation of the scanned documents

Skill(s) required

Hindi Proficiency (Spoken), Hindi Proficiency (Written)

Who can apply

Only those candidates can apply who:

1. are available for the work from home job/internship

2. can start the work from home job/internship between 12th Jan'22 and 16th Feb'22

3. are available for a duration of 6 months

4. have relevant skills and interests

* Women wanting to start/restart their career can also apply.

Perks

Certificate, Letter of recommendation, Flexible work hours
Additional information

Stipend structure: This is a performance-based internship. In addition to the minimum-assured stipend, you will also be paid a performance-linked incentive (₹ 8 per page).
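As a rough illustration of the stipend structure described above, the sketch below adds the per-page incentive to the minimum-assured amount. It assumes the incentive is simply the number of verified pages multiplied by ₹ 8; the listing does not say how pages are counted or approved, so the function name and the page counts are hypothetical.

```python
# Figures taken from the listing.
BASE_STIPEND_INR = 1000        # minimum-assured stipend per month
INCENTIVE_PER_PAGE_INR = 8     # performance-linked incentive per page


def monthly_stipend(pages_verified: int) -> int:
    """Estimate one month's pay in rupees (assumption: flat per-page incentive)."""
    if pages_verified < 0:
        raise ValueError("pages_verified cannot be negative")
    return BASE_STIPEND_INR + INCENTIVE_PER_PAGE_INR * pages_verified


if __name__ == "__main__":
    # Hypothetical workloads, for illustration only.
    for pages in (0, 100, 250):
        print(pages, "pages ->", monthly_stipend(pages), "INR")
```

Under this assumption, an intern who verifies 250 pages in a month would receive ₹ 1,000 + ₹ 2,000 = ₹ 3,000.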

1. Prof. Ganesh Ramakrishnan (from the department of CSE) and Prof. Ramasubramanian (from the department of humanities and social sciences) are attempting to significantly speed up the process of digitization of Sanskrit texts
2. Enabled by the OCR and post-editing technologies developed at IIT Bombay (see https://www.cse.iitb.ac.in/~ganesh/videosurvellianceanalytics/), they are now seeking the participation of the community of Sanskrit lovers, including those with even basic knowledge of Sanskrit who want to volunteer for Sanskrit (with some nominal financial compensation for the time spent)
3. The demo video for our framework is at https://youtu.be/u9bqUDrGugc; all applicants must watch it
4. To install the software, visit https://github.com/rohitsaluja22/OpenOCRCorrect and follow the instructions given in https://youtu.be/0hcdlF-zn8E
5. This can be a remote internship
6. Optical character recognition (OCR) is the process of converting document images into an editable electronic format
7. This has many advantages like data compression, enabling search or edit options in the images/text, and creating the database for other applications like machine translation, speech recognition, and enhancing dictionaries and language models
8. OCR in Indian languages is quite challenging due to richness in inflections
9. Using open-source and commercial OCR systems, we have observed word error rates (WER) of around 20-50% on printed documents in four different Indic languages
10. Moreover, even a highly accurate OCR system, with an accuracy as high as 90%, is not useful unless it is aided by a mechanism to identify errors
11. So, we started with the problem of developing 'OpenOCRCorrect', an end-to-end framework for error detection and corrections in Indic-OCR
12. Our models outperform state-of-the-art results in “error detection in Indic-OCR” for six Indic languages with varied inflections, and we have solved the out-of-vocabulary problem for “error correction in Indic-OCR” in our ICDAR-2017 conference paper
13. We further improve the results with the help of sub-word embeddings in our ICDAR-2019 conference paper
14. The demo video for our framework is the first video linked above
15. Currently, we are targeting Sanskrit; even with good OCR accuracy, the recognized text needs a lot of improvement
16. Further, in the digitization process of such texts, the second step would be spelling correction and formatting of the text detected by the OCR models
17. Hence, the selected candidate’s task would be to verify the corrected OCR text following the scanned images
18. This internship does not involve programming or technical knowledge; it primarily deals with manual text verification, with software testing and software installation as secondary aspects
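For readers curious about the word error rate (WER) figures quoted in the list above, the following is a minimal sketch of the commonly used definition: the word-level edit distance between the OCR output and the correct text, divided by the number of words in the correct text. It is not taken from OpenOCRCorrect and may differ from the exact metric used in the ICDAR papers; the function name and the transliterated sample strings are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one of four words is misread by the OCR.
    correct_text = "dharmakshetre kurukshetre samaveta yuyutsavah"
    ocr_output = "dharmakshetre kuruksetre samaveta yuyutsavah"
    print(f"WER = {word_error_rate(correct_text, ocr_output):.2f}")
```

In this illustration one word out of four is wrong, giving a WER of 0.25, which falls inside the 20-50% range mentioned above. Manual verification by the intern is what brings the remaining errors down.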

Number of openings

5

About IIT Bombay

The Indian Institute of Technology, Bombay (IITB) is one of the fifteen higher institutes of technology in the country, set up with the aim of making facilities available for higher education, research, and training in various fields of science and technology. Professor Ganesh Ramakrishnan (department of CSE) and Professor Ramasubramanian (department of humanities and social sciences) are attempting to significantly speed up the digitization of Sanskrit texts. Enabled by the OCR and post-editing technologies developed at IIT Bombay, they are now seeking the participation of the community of Sanskrit lovers, software developers, machine learning enthusiasts, project managers, etc.
Activity on Internshala
Hiring since December 2013
418 opportunities posted
108 candidates hired