Challenges and Recommendations for LLMs-as-a-Judge in Multilingual Settings and Low-Resource Languages
LLM-as-a-Judge has become the dominant evaluation paradigm for many natural language generation tasks, due to shortcomings of conventional metrics and high correlations with human judgment, albeit mostly in English. There are now attempts to extend LLM-as-a-Judge to multilingual settings including low-resource languages. However, LLMs have limited proficiency in low-resource languages, and there is often no adequate human validation in these settings. To highlight the scope of the problem and current practices, we explore the use of LLM-as-a-Judge evaluators in ACL Anthology papers focusing on
Authors: A. Seza Doğruöz, Xixian Liao, Verena Blaschke, Jakob Prange, Senyu Li, David Ifeoluwa Adelani
