Paper · arXiv · Published yesterday

Challenges and Recommendations for LLMs-as-a-Judge in Multilingual Settings and Low-Resource Languages

LLM-as-a-Judge has become the dominant evaluation paradigm for many natural language generation tasks, owing to the shortcomings of conventional metrics and its high correlation with human judgment, albeit mostly in English. There are now attempts to extend LLM-as-a-Judge to multilingual settings, including low-resource languages. However, LLMs have limited proficiency in low-resource languages, and adequate human validation is often missing in these settings. To highlight the scope of the problem and current practices, we explore the use of LLM-as-a-Judge evaluators in ACL Anthology papers focusing on …


Linked via arXiv authors:

  • A. Seza Doğruöz
  • Xixian Liao
  • Verena Blaschke
  • Jakob Prange
  • Senyu Li
  • David Ifeoluwa Adelani
