Based on Alwazna (2024), the present study assesses the translation output quality of ChatGPT and Gemini on selected articles of the Ḥanbalī Sharīʿa Code, using automatic evaluation with the TER, BLEU, and ChrF3 metrics and human evaluation against the criteria of adequacy and fluency. Automatic evaluation is carried out first, followed by human evaluation, in order to test whether the automatic scores are reliable in assessing the two AI-generated translations and whether such translations are dependable within the Arabic-English legal translation context. The paper argues that AI software programs, including ChatGPT and Gemini, have made considerable advances in Arabic-English legal translation, although they still face challenges, particularly in rendering legal terms. It further claims that automatic evaluation metrics cannot be relied upon on their own to assess AI translation output quality in Arabic-English legal translation; rather, human evaluation should follow automatic evaluation to test whether or not the automatic scores are reliable. The study thus offers a baseline for assessing AI translation output quality in Arabic-English legal translation, which may have implications for other similar language-pair contexts.
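To illustrate one of the automatic metrics named above, the following is a minimal, stdlib-only sketch of sentence-level chrF with beta = 3 (i.e. ChrF3), which scores a hypothesis against a reference by averaging character n-gram precision and recall and combining them with an F-beta score that weights recall nine times more than precision. This is an illustrative simplification, not the implementation used in the study; in practice a standard toolkit such as sacrebleu would be used, and the function name and defaults here are the author's own choices.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    # Character n-grams over the string with whitespace removed,
    # as chrF operates on character sequences rather than tokens.
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 3.0) -> float:
    """Minimal sentence-level chrF score in [0, 100]; beta=3 gives ChrF3.

    For each n-gram order 1..max_n, compute the clipped overlap between
    hypothesis and reference character n-grams, derive precision and
    recall, average them across orders, and combine with F-beta.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        if sum(hyp.values()) > 0:
            precisions.append(overlap / sum(hyp.values()))
        if sum(ref.values()) > 0:
            recalls.append(overlap / sum(ref.values()))
    if not precisions or not recalls:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r) * 100


# An identical hypothesis and reference score 100; no character
# overlap at all scores 0.
print(chrf("the cat sat", "the cat sat"))  # → 100.0
print(chrf("abc", "xyz"))                  # → 0.0
```

The recall-heavy weighting (beta = 3) reflects the intuition, relevant to legal translation, that omitting reference content should be penalized more heavily than adding extra material.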