Machine translation from one language to another is one of the most studied fields in machine learning and artificial intelligence, and researchers are working on many projects to improve it. Google, Bing and Microsoft already translate billions of words per day using machine learning and artificial intelligence, but the accuracy of machine translation still does not match that of human translators. These technology giants are working to improve their translation quality.
Researchers at Dartmouth College have turned to the Bible to train their machine learning model in pursuit of higher accuracy. Bible verses are a largely untapped data set for machine learning.
The Dartmouth research team is exploring the potential of Bible data, which a Dartmouth press release described as “a large, previously untapped data set of aligned parallel text (or translation/s)”. The university released a press statement on 23 October detailing the research.
The Bible is a huge data set of written text in many styles, along with its translations into other languages. For example, each version of the Bible contains more than 31,000 verses, and matching verses across versions can produce 1.5 million unique pairs of source and translated text. That is a very large data set for training any machine learning model.
Previously, models of this kind were trained on Wikipedia, the works of Shakespeare and other openly available data. This is the first time Bible data has been used to train such a model and produce machine translations.
The Bible is already indexed by chapter and verse number, which makes it much easier for data scientists to create training data and get the best possible accuracy from a machine learning model. Such pre-indexed data also reduces the risk of alignment errors, because the program is given well-structured, human-verified text.
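To illustrate why the verse index matters, here is a minimal Python sketch of how parallel training pairs could be built by matching verses on their (book, chapter, verse) key. It is not the Dartmouth team's code: the version names, the placeholder verse texts and the build_pairs function are all hypothetical and exist only for this example.

```python
from itertools import permutations

# Hypothetical toy corpus: each version maps a (book, chapter, verse) key
# to that verse's text. The texts are placeholders, not real quotations.
versions = {
    "version_a": {
        ("Genesis", 1, 1): "<version A text of Genesis 1:1>",
        ("Genesis", 1, 2): "<version A text of Genesis 1:2>",
    },
    "version_b": {
        ("Genesis", 1, 1): "<version B text of Genesis 1:1>",
        ("Genesis", 1, 2): "<version B text of Genesis 1:2>",
    },
    "version_c": {
        # This version lacks Genesis 1:2, so that verse cannot be paired with it.
        ("Genesis", 1, 1): "<version C text of Genesis 1:1>",
    },
}


def build_pairs(versions):
    """Return (source_version, target_version, key, source_text, target_text)
    tuples for every ordered pair of versions and every verse they share."""
    pairs = []
    for src_name, tgt_name in permutations(versions, 2):
        src, tgt = versions[src_name], versions[tgt_name]
        # The shared verse key does the alignment; no sentence matching is needed.
        for key in sorted(src.keys() & tgt.keys()):
            pairs.append((src_name, tgt_name, key, src[key], tgt[key]))
    return pairs


pairs = build_pairs(versions)
for src_name, tgt_name, key, _, _ in pairs:
    print(src_name, "->", tgt_name, key)
```

Because every version shares the same index, alignment reduces to a dictionary lookup. With most other parallel corpora, the sentences must first be matched up, and that step can introduce errors.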
The Dartmouth research team used 34 stylistically distinct Bible versions, ranging in linguistic complexity from the “King James Version” to the “Bible in Basic English”, to train the model. The project used “Moses”, a statistical machine translation system, and “Seq2Seq”, a neural sequence-to-sequence approach.
The resulting system translates text from one style of writing into another, producing versions suitable for readers of various age groups, non-native English speakers or any other audience.