Universal Dependencies for the Alemannic Alsatian Dialects

Authors: Barbara Hoff, Nathanaël Beiner, Delphine Bernhard

Work carried out as part of the ANR DIVITAL project on the regional languages of France

Abstract

We present the first corpus of Alsatian Alemannic dialects annotated according to the guidelines of Universal Dependencies (UD), a project that already covers many of the world’s languages. Standard languages are better represented in UD than non-standard varieties, and our corpus helps address the scarcity of resources for the Alsatian dialects, which are spoken in northeastern France, by providing their first UD treebank. The corpus is annotated with part-of-speech tags and dependency information, as well as French glosses and German lemmas. It contains 975 sentences and 19,286 tokens in total and spans various text genres. In this article, we present our data, the details of the annotation process, and some specific syntactic phenomena that differentiate Alsatian from, and situate it with regard to, both Standard German and some other non-standard German varieties. Adding this corpus to the UD project gives the Alemannic Alsatian dialects greater visibility in linguistic research and provides a valuable resource for research in many fields, including NLP, syntax, and comparative Germanic linguistics.

Authors

  • Barbara Hoff, University of Strasbourg, UR 1339/LiLPa & LiRiC
  • Nathanaël Beiner, University of Strasbourg, UR 1339/LiLPa & LiRiC
  • Delphine Bernhard, University of Strasbourg, UR 1339/LiLPa & LiRiC

Cite as

Barbara Hoff, Nathanaël Beiner, Delphine Bernhard. Universal Dependencies for the Alemannic Alsatian Dialects. 23rd Workshop on Treebanks and Linguistic Theories (TLT), Aug 2025, Ljubljana, Slovenia. pp.10-22. ⟨hal-05143697⟩