Statistical Machine Translation with Rule Based Re-ordering of Source Sentences
Amit Sangodkar
Vasudevan N
Om P. Damani
(CSE, IIT Bombay)
Illustration - Re-ordering
Dependency tree nodes: sung_have, poets, praise_in, songs, land_of, this, Many, Bengali
Dependency relations: nsubj, prep, dobj, amod, nn, prep, det
Output, built up step by step:
Many Bengali poets this land of praise in
Many Bengali poets this land of praise in songs
Many Bengali poets this land of praise in songs sung have
Hindi: कई बंगाली कवियों ने इस महान भूमि की प्रशंसा के गीत गाए हैं (Many Bengali poets have sung songs in praise of this great land)
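The re-ordering illustrated above can be sketched as a recursive walk over the dependency tree that emits every dependent before its head, which moves the verb to the end and turns prepositions into postpositions. This is a minimal sketch, not the authors' rule set: the `Node` class, the `PRIORITY` table, and the attachment of `praise_in` to the verb are assumptions made only to reproduce the output shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str                 # merged nodes use "_" (e.g. "sung_have")
    rel: str = "root"         # dependency relation to the parent
    children: list = field(default_factory=list)


# Assumed order in which dependents are emitted before their head.
# Relations not listed keep their original relative order (stable sort).
PRIORITY = {"nsubj": 0, "prep": 1, "dobj": 2}


def reorder(node):
    """Return the words of the subtree with all dependents before the head."""
    ordered = sorted(node.children, key=lambda c: PRIORITY.get(c.rel, 0))
    words = []
    for child in ordered:
        words.extend(reorder(child))
    # Emit the head last; a merged node contributes its parts in order.
    words.extend(node.word.split("_"))
    return words


# Tree from the illustration (attachment of "praise_in" is an assumption).
tree = Node("sung_have", children=[
    Node("poets", "nsubj", [Node("Many", "amod"), Node("Bengali", "nn")]),
    Node("songs", "dobj"),
    Node("praise_in", "prep", [Node("land_of", "prep", [Node("this", "det")])]),
])

print(" ".join(reorder(tree)))
# Many Bengali poets this land of praise in songs sung have
```

The resulting word order follows the Hindi sentence closely, which is the point of re-ordering the source before training and decoding.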
Experimental Setup
Procedure
Train Moses on the training data with a 6-gram language model
Tune Moses on the development data
Decode the test data with the trained and tuned Moses
Run this procedure on both the original (baseline) data and the re-ordered data
Results
Corpus         Metric   Baseline           Re-ordered
                        Dev      Test      Dev      Test
EILMT          BLEU     0.1488   0.1450    0.1751   0.1601
               NIST     4.7600   4.7287    4.8539   4.6923
IIIT Data Set  BLEU     0.0815   0.0842    0.0836   0.0853
               NIST     3.9036   4.2426    3.7335   4.0140
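To make the comparison easier to read, the relative change from baseline to re-ordered can be computed for each score. The sketch below assumes the column order Baseline (Dev, Test) followed by Re-ordered (Dev, Test) when reading the results; the `scores` dictionary and `relative_change` helper are illustrative names, not part of the original experiments.

```python
def relative_change(baseline, reordered):
    """Percentage change of the re-ordered score relative to the baseline."""
    return (reordered - baseline) / baseline * 100.0


# (corpus, metric, split) -> (baseline, re-ordered)
# Column assignment is an assumption about the layout of the results slide.
scores = {
    ("EILMT", "BLEU", "Dev"): (0.1488, 0.1751),
    ("EILMT", "BLEU", "Test"): (0.1450, 0.1601),
    ("EILMT", "NIST", "Dev"): (4.7600, 4.8539),
    ("EILMT", "NIST", "Test"): (4.7287, 4.6923),
    ("IIIT", "BLEU", "Dev"): (0.0815, 0.0836),
    ("IIIT", "BLEU", "Test"): (0.0842, 0.0853),
    ("IIIT", "NIST", "Dev"): (3.9036, 3.7335),
    ("IIIT", "NIST", "Test"): (4.2426, 4.0140),
}

for (corpus, metric, split), (base, reord) in scores.items():
    print(f"{corpus:5} {metric} {split:4} {relative_change(base, reord):+.1f}%")
```

Under this reading, BLEU rises in all four settings while NIST falls in three of the four, so the two metrics do not agree on the size or even the direction of the effect.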
Translation Example - I
Actual : इसी वर्ष नील व़्यापार और नील उत़्पादन के इतिहास में एक मोड़ आया.
Baseline : इस वर्ष में एक निर्धारित बिंदु रहे के इतिहास में नील व्यापार और नील उत़्पादन.
Re-ordered : इस साल नील व्यापार और नील उत़्पादन के इतिहास में यह एक रहा था.
Translation Example - II
Actual : वे गुलामी की जिंदगी से रिहाई चाहते हैं.
Baseline : वे चाहते हैं कि deliverance का जीवन से गुलामी की है.
Re-ordered : वे गुलामी की जिंदगी से रिहाई चाहते हैं.
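Example II can be used to illustrate the clipped n-gram precision that underlies BLEU. The sketch below is not full BLEU (no brevity penalty, no geometric mean over n-gram orders, no corpus-level aggregation); the `clipped_precision` helper, whitespace tokenisation, and the removal of the final full stop are simplifying assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with counts clipped."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Sentences from Translation Example II, final full stop removed.
actual = "वे गुलामी की जिंदगी से रिहाई चाहते हैं"
baseline = "वे चाहते हैं कि deliverance का जीवन से गुलामी की है"
reordered = "वे गुलामी की जिंदगी से रिहाई चाहते हैं"

for name, hyp in [("Baseline", baseline), ("Re-ordered", reordered)]:
    p1 = clipped_precision(hyp, actual, 1)
    p2 = clipped_precision(hyp, actual, 2)
    print(f"{name}: unigram={p1:.2f} bigram={p2:.2f}")
```

Because the re-ordered output matches the reference exactly, its precision is 1.0 at every n-gram order, while the baseline loses most of its bigram matches through wrong word order.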
Conclusion
Using linguistic knowledge, here rule-based re-ordering of source sentences, appears to improve SMT quality
The applicability of the BLEU score in this context needs to be investigated
Acknowledgements
We acknowledge the Department of IT (DIT), Government of India and the English-to-Indian Languages (EILMT) consortium for making the EILMT tourism dataset available.
IIIT Data Set: data acquired during the DARPA TIDES MT project (2003) and later refined at LTRC, IIIT-H.
References
[Hieu2008] Hieu Hoang, Philipp Koehn. Design of the Moses Decoder for Statistical Machine Translation. ACL Workshop on Software Engineering, Testing, and Quality Assurance for NLP, 2008.
[Marie2006] Marie-Catherine de Marneffe, Bill MacCartney, Christopher D. Manning. Generating Typed Dependency Parses from Phrase Structure Parses. In Proceedings of LREC-06, 2006.
[Manual2008] Stanford Dependencies Manual. Available at http://nlp.stanford.edu/software/dependencies_manual.pdf
[Moses] Moses Tutorial. Available at http://www.statmt.org/moses/?n=Moses.Tutorial
[Singh2007] Smriti Singh, Mrugunk Dalal, Vishal Vachhani, Pushpak Bhattacharyya, Om P. Damani. Hindi Generation from Interlingua (UNL). Machine Translation Summit XI, 2007.