Publications

Latest preprints

      1. How Good Is Your Wikipedia? Auditing Data Quality for Low-resource and Multilingual NLP
        Kushal Tatariya*, Artur Kulmizev*, Wessel Poelman, and 6 more authors
        Nov 2024

          Conference papers

          1. EACL
            Form and Meaning in Intrinsic Multilingual Evaluations
            Wessel Poelman and Miryam de Lhoneux
            In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics Mar 2026
          2. EACL (Findings)
            Typologically Informed Parameter Aggregation
            Stef Accou and Wessel Poelman
            In Findings of the Association for Computational Linguistics: EACL 2026 Mar 2026
          1. NoDaLiDa
            The Roles of English in Evaluating Multilingual Language Models
            Wessel Poelman and Miryam de Lhoneux
            In The Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies Mar 2025
          2. ACL
            GRaMPa: Subword Regularisation by Skewing Uniform Segmentation Distributions with an Efficient Path-counting Markov Model
            Thomas Bauwens, David Kaczér, and Miryam de Lhoneux
            In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) Jul 2025
          3. ACL (Findings)
            Supervised and Unsupervised Probing of Shortcut Learning: Case Study on the Emergence and Evolution of Syntactic Heuristics in BERT
            Elke Vandermeerschen and Miryam de Lhoneux
            In Findings of the Association for Computational Linguistics: ACL 2025 Jul 2025
          4. EMNLP
            Confounding Factors in Relating Model Performance to Morphology
            Wessel Poelman, Thomas Bauwens, and Miryam de Lhoneux
            In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing Nov 2025
          5. AACL
            On the Interplay between Positional Encodings, Morphological Complexity, and Word Order Flexibility
            Kushal Tatariya, Wessel Poelman, and Miryam de Lhoneux
            In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics Dec 2025
          1. NAACL
            BPE-knockout: Pruning Pre-existing BPE Tokenisers with Backwards-compatible Morphological Semi-supervision
            Thomas Bauwens and Pieter Delobelle
            In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) Jun 2024
          2. EMNLP
            What Is “Typological Diversity” in NLP?
            Esther Ploeger*, Wessel Poelman*, Miryam de Lhoneux, and 1 more author
            In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing Nov 2024
          3. EMNLP
            Pixology: Probing the Linguistic and Visual Capabilities of Pixel-based Language Models
            Kushal Tatariya, Vladimir Araujo, Thomas Bauwens, and 1 more author
            In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing Nov 2024
          4. COLM
            Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP
            François Remy, Pieter Delobelle, Hayastan Avetisyan, and 3 more authors
            In First Conference on Language Modeling Nov 2024

              Journal papers

                1. CL
                  A Principled Framework for Evaluating on Typologically Diverse Languages
                  Esther Ploeger, Wessel Poelman, Andreas Holck Høeg-Petersen, and 3 more authors
                  Computational Linguistics Oct 2025
                1. TACL
                  CreoleVal: Multilingual Multitask Benchmarks for Creoles
                  Heather Lent, Kushal Tatariya, Raj Dabre, and 18 more authors
                  Transactions of the Association for Computational Linguistics Sep 2024

                    Workshop papers

                      1. MRL
                        Type and Complexity Signals in Multilingual Question Representations
                        Robin Kokot and Wessel Poelman
                        In Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025) Nov 2025
                      1. SIGTYP
                        Sociolinguistically Informed Interpretability: A Case Study on Hinglish Emotion Classification
                        Kushal Tatariya, Heather Lent, Johannes Bjerva, and 1 more author
                        In Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP Mar 2024
                      2. SIGTYP
                        A Call for Consistency in Reporting Typological Diversity
                        Wessel Poelman*, Esther Ploeger*, Miryam de Lhoneux, and 1 more author
                        In Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP Mar 2024
                      3. MRL
                        Recipe for Zero-shot POS Tagging: Is It Useful in Realistic Scenarios?
                        Zeno Vandenbulcke, Lukas Vermeire, and Miryam de Lhoneux
                        In 4th Multilingual Representation Learning (MRL) workshop Mar 2024
                      4. HumEval
                        Exploratory Study on the Impact of English Bias of Generative Large Language Models in Dutch and French
                        Ayla Rigouts Terryn and Miryam de Lhoneux
                        In Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024 May 2024
                      1. Transfer Learning for Code-Mixed Data: Do Pretraining Languages Matter?
                        Kushal Tatariya, Heather Lent, and Miryam de Lhoneux
                        In Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis Jul 2023