A Linguistic Evaluation of the Output Quality of 'Google Translate' and 'Bing Translator' in Chinese-English Translation
Abstract
This study investigates and compares the output quality of two statistical machine translation (SMT) systems, Google Translate and Bing Translator, using a human evaluation method called 'linguistic evaluation'. The language pair in the translation tasks is Chinese-English (with English as the target language), and the domain is news articles. Fifty Chinese sentences extracted from several lengthy Chinese news articles were automatically translated by Google Translate and Bing Translator, yielding fifty pairs of English output sentences. Errors in the output of both systems were manually analysed and annotated according to the proposed error taxonomy, which allowed me to evaluate the two MT systems at each linguistic level, namely the orthographic, morphological, semantic, lexical, and syntactic levels.
A fine-grained taxonomy of linguistic errors is proposed and implemented in the study. Subcategories of errors at each linguistic level are tailored to and defined for the Chinese-English language pair (with English as the target language). The output sentences are analysed thoroughly, using a standardised 'markup' format with an input-output mapping.
The results show that, on the same set of Chinese-to-English translation tasks, Bing Translator, an SMT system that incorporates linguistic information, outperforms Google Translate, a pure SMT system that does not use linguistic rules to perform translation tasks. In general, Bing produces fewer linguistic errors, especially at the syntactic level. The distribution of error types shows that syntactic and lexical errors are particularly problematic for both SMT systems, suggesting that these are the areas where developers should focus when attempting to improve the output quality of Chinese-English translation.