Publications

Latest preprints

  1. A Principled Framework for Evaluating on Typologically Diverse Languages
    Esther Ploeger, Wessel Poelman, Andreas Holck Høeg-Petersen, and 3 more authors
    Jul 2024
  2. How Good Is Your Wikipedia?
    Kushal Tatariya*, Artur Kulmizev*, Wessel Poelman, and 6 more authors
    Nov 2024

Conference papers

  1. NAACL
    BPE-knockout: Pruning Pre-existing BPE Tokenisers with Backwards-compatible Morphological Semi-supervision
    Thomas Bauwens and Pieter Delobelle
    In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Jun 2024
  2. EMNLP
    What Is “Typological Diversity” in NLP?
    Esther Ploeger*, Wessel Poelman*, Miryam de Lhoneux, and 1 more author
    In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024
  3. EMNLP
    Pixology: Probing the Linguistic and Visual Capabilities of Pixel-based Language Models
    Kushal Tatariya, Vladimir Araujo, Thomas Bauwens, and 1 more author
    In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024
  4. COLM
    Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP
    François Remy, Pieter Delobelle, Hayastan Avetisyan, and 3 more authors
    In First Conference on Language Modeling, Nov 2024

Journal papers

  1. TACL
    CreoleVal: Multilingual Multitask Benchmarks for Creoles
    Heather Lent, Kushal Tatariya, Raj Dabre, and 18 more authors
    Transactions of the Association for Computational Linguistics, Sep 2024

Workshop papers

  1. SIGTYP
    Sociolinguistically Informed Interpretability: A Case Study on Hinglish Emotion Classification
    Kushal Tatariya, Heather Lent, Johannes Bjerva, and 1 more author
    In Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, Mar 2024
  2. SIGTYP
    A Call for Consistency in Reporting Typological Diversity
    Wessel Poelman*, Esther Ploeger*, Miryam de Lhoneux, and 1 more author
    In Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, Mar 2024
  3. MRL
    Recipe for Zero-shot POS Tagging: Is It Useful in Realistic Scenarios?
    Zeno Vandenbulcke, Lukas Vermeire, and Miryam de Lhoneux
    In 4th Multilingual Representation Learning (MRL) workshop, Mar 2024
  4. HumEval
    Exploratory Study on the Impact of English Bias of Generative Large Language Models in Dutch and French
    Ayla Rigouts Terryn and Miryam de Lhoneux
    In Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, May 2024
  5. Transfer Learning for Code-Mixed Data: Do Pretraining Languages Matter?
    Kushal Tatariya, Heather Lent, and Miryam de Lhoneux
    In Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, Jul 2023