Publications

Latest preprints

    1. A Principled Framework for Evaluating on Typologically Diverse Languages
      Esther Ploeger, Wessel Poelman, Andreas Holck Høeg-Petersen, and 3 more authors
      Jul 2024
    2. How Good Is Your Wikipedia?
      Kushal Tatariya*, Artur Kulmizev*, Wessel Poelman, and 6 more authors
      Nov 2024

Conference papers

    1. NoDaLiDa
      The Roles of English in Evaluating Multilingual Language Models
      Wessel Poelman and Miryam de Lhoneux
      In the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies, Mar 2025
    2. NAACL
      BPE-knockout: Pruning Pre-existing BPE Tokenisers with Backwards-compatible Morphological Semi-supervision
      Thomas Bauwens and Pieter Delobelle
      In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Jun 2024
    3. EMNLP
      What Is “Typological Diversity” in NLP?
      Esther Ploeger*, Wessel Poelman*, Miryam de Lhoneux, and 1 more author
      In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024
    4. EMNLP
      Pixology: Probing the Linguistic and Visual Capabilities of Pixel-based Language Models
      Kushal Tatariya, Vladimir Araujo, Thomas Bauwens, and 1 more author
      In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024
    5. COLM
      Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP
      François Remy, Pieter Delobelle, Hayastan Avetisyan, and 3 more authors
      In First Conference on Language Modeling, Nov 2024

Journal papers

    1. TACL
      CreoleVal: Multilingual Multitask Benchmarks for Creoles
      Heather Lent, Kushal Tatariya, Raj Dabre, and 18 more authors
      Transactions of the Association for Computational Linguistics, Sep 2024

Workshop papers

    1. SIGTYP
      Sociolinguistically Informed Interpretability: A Case Study on Hinglish Emotion Classification
      Kushal Tatariya, Heather Lent, Johannes Bjerva, and 1 more author
      In Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, Mar 2024
    2. SIGTYP
      A Call for Consistency in Reporting Typological Diversity
      Wessel Poelman*, Esther Ploeger*, Miryam de Lhoneux, and 1 more author
      In Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, Mar 2024
    3. MRL
      Recipe for Zero-shot POS Tagging: Is It Useful in Realistic Scenarios?
      Zeno Vandenbulcke, Lukas Vermeire, and Miryam de Lhoneux
      In 4th Multilingual Representation Learning (MRL) Workshop, Mar 2024
    4. HumEval
      Exploratory Study on the Impact of English Bias of Generative Large Language Models in Dutch and French
      Ayla Rigouts Terryn and Miryam de Lhoneux
      In Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, May 2024
    5. Transfer Learning for Code-Mixed Data: Do Pretraining Languages Matter?
      Kushal Tatariya, Heather Lent, and Miryam de Lhoneux
      In Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, Jul 2023