
MODIFED: Morphosyntactic Dialect Feature Detection Workshop

Jelena Prokic

Date: Thursday 20 June 2024 – Friday 21 June 2024
Times: 20 June, 9:00–17:00; 21 June, 9:00–14:00
Location: P.J. Veth Building, Leiden University
Room: P.J. Veth 1.01 & 1.07

Workshop Description

In the past two decades, computational approaches to dialect variation, known as dialectometry, have allowed researchers to work efficiently with large amounts of data and, in a data-driven manner, to define dialect groups, identify specific dialect features, and search for general tendencies in language variation. One of the main advantages of data-driven dialectology is “avoiding the need to select which features to use as the basis of characterization” (Nerbonne, 2008). However, most published studies in dialectometry are based on data extracted from dialect atlases or surveys containing linguistic features carefully selected by human experts. Automatic extraction and analysis of meaningful features from raw text, such as interviews, would enable researchers to work with data that has not been chosen by experts and that can be considered unbiased. Despite the attractiveness of this type of approach, automatic feature extraction at all linguistic levels remains challenging (Kroon, 2022) and understudied.

We are excited to announce a MODIFED workshop organized by the Re-examining Dialect Syntax Network (REEDs) and the Leiden University Centre for Digital Humanities (LUCDH). This two-day workshop will take place at Leiden University on Thursday 20 June and Friday 21 June 2024. The event is designed to foster collaboration among specialists in dialectology, computational linguistics, and corpus linguistics, with a focus on identifying morphosyntactic dialect features in various semi-structured and unstructured sources. It will give researchers and research groups an opportunity to reflect on theoretical and/or methodological problems and solutions related to automatic morphosyntactic dialect feature extraction.

Invited Speaker:

Our invited speaker on Thursday is Anne Breitbarth (Ghent University), who will speak on 'Hunting for structures in treebank forests: Considerations on the use of parsed corpora of spontaneous dialect speech'.

Workshop Program

Thursday 20 June

9:30-10:00 Registration and Coffee

10:00-10:15 Workshop opening

10:15-11:15 Hunting for structures in treebank forests: Considerations on the use of parsed corpora of spontaneous dialect speech (Abstract)
Anne Breitbarth, Ghent University (Invited talk)

11:15-12:00 Extracting morphological features from published grammars of African Arabic dialects: methodological considerations (Abstract)
Carolina Zucchi, University of Bayreuth

12:00-14:00 Lunch

14:00-14:45 Automatic discovery of phonological and morphological features in dialect corpora with orthographic normalization (Abstract)
Yves Scherrer, University of Helsinki

14:45-15:30 Extracting dialect features from German social media data using local spatial autocorrelation (Abstract)
Dana Roemling & Jack Grieve, University of Birmingham

15:30-16:00 Coffee break

16:00-16:45 From Feature Extraction to Measuring Dialect Typicality (Abstract)
Matthew Sung, Leiden University

16:45-17:30 Discussion

18:00 Conference dinner

Friday 21 June

9:30-10:00 Registration and Coffee

10:00-11:00 Automatic Detection of Syntactic Differences through the Minimum Description Length principle and feature mapping (Abstract)
Martin Kroon, Utrecht University

11:00-11:45 Drawing on Research on Explainability of Dialect Classifiers to Extract Greek Dialect Features (Abstract)
Erofili Psaltaki and Dana Roemling, University of Helsinki

11:45-12:30 Automatic Detection of Morphosyntactic Dialect Features in African American English Oral Histories (Abstract)
Kevin Tang, University of Düsseldorf

Registration:

To register for the workshop, please use the following link:

MODIFED Workshop