CorpusPhon


Organizers

Eleanor Chodroff
University of Zurich
(Switzerland)

Christian DiCanio 
University at Buffalo
(United States)

Morgan Sonderegger
McGill University
(Canada)

Márton Sóskuthy 
University of British Columbia
(Canada)

 

Workshop information

Date/Time: 09:00-16:50, Wednesday 26 June 2024
Location: 612, HIT, Hanyang University

If you would like to join over Zoom, please register via the Zoom registration link.


Program

Time Event Title Authors
9:00-9:10 Intro    
9:10-10:00 Invited speaker Montreal Forced Aligner 3.0 Michael McAuliffe (Amazon)
10:00-10:20  Break with coffee + snacks    
10:20-10:40 Talk 1 Informativity effects can be probability effects in disguise Vsevolod Kapatsinski (U of Oregon)
10:40-11:00 Talk 2 Applying Big Data and Automation Techniques in Phonetics: A Case Study on Hyperarticulation in Korean Word-Initial Stops Cheonkam Jeong and Andrew Wedel (U of Arizona)
11:00-11:20 Talk 3 Predictability and phonological context interact in conditioning the acoustic reduction of Seoul Korean lenis obstruents Seung Suk Lee (U of Massachusetts, Amherst)
11:20-11:40 Talk 4 Corpus Phonetics in the Signed Modality: One Approach Kathleen Currie Hall, Kaili Vesik, Anushka Asthana, Maggie Reid, Grace Zhang, Yiran Gao, Grace Hobby, Stanley Nam and Oksana Tkachman (U of British Columbia)
11:40-12:00 Talk 5 Language-specific /s/ acoustics for early Cantonese-English bilinguals? Molly Babel, Victor Wong, Sabrina Luk, Kai Fong and Ragul Loganathan (U of British Columbia)
12:00-13:00 Catered lunch (provided you have registered)    
13:00-13:10 Lightning Talk 1 Introducing the Speech Maturity Dataset: Research opportunities for speech scientists and linguistic fieldworkers Margaret Cychosz, Kasia Hitczenko, William Havard, Loann Peurey, Madurya Suresh, Theo Zhang and Alex Cristia (UCLA; George Washington U; U Grenoble Alpes; École normale supérieure; UCLA; UCLA; École normale supérieure)
13:10-13:20 Lightning Talk 2 Creating a corpus of web-data with Pyrlato. A demonstration. Giuseppe Magistro and Claudia Crocco (U of Ghent)
13:20-13:30 Lightning Talk 3 The Multi-ethnic Hong Kong Cantonese Corpus for the Study of Child-Directed Speech Alan Yu, Nathan Delisle, Nicholas Martin, Vivienne Zhang, Yao Yao and Carol To (U of Chicago; U of Chicago; U of Chicago; U of Chicago; Hong Kong Polytechnic U; U of Hong Kong)
13:30-13:40 Lightning Talk 4 The XPF Corpus: Rule-based grapheme to phoneme translation schemes for hundreds of languages Uriel Cohen Priva (Brown U)
13:40-13:50 Lightning Talk 5 Creating Multimodal Corpora for Co-Speech Gesture Research Walter Dych, Karee Garvin and Kathryn Franich (Binghamton U; Harvard U; Harvard U)
13:50-14:00 Lightning Talk 6 AutoRPT: Automatic Detection of Prosodic Prominence and Boundary Seth Heiney and Jonathan Howell (Montclair State U)
14:00-15:00 Walkabout: Poster session + demos See Poster List  
15:00-15:20 Break with coffee + snacks    
15:20-15:40 Talk 6 Large-scale assessment of speech intelligibility Seung-Eun Kim, Matthew Goldrick and Ann R. Bradlow (Northwestern U)
15:40-16:00 Talk 7 Cross-linguistic differences in the phonetic implementation of /s/ Massimo Lipari, Morgan Sonderegger and Meghan Clayards (McGill U)
16:00-16:20 Talk 8 Harvesting spontaneous speech data from digital reservoirs to study prosody Aviad Albert, Constantijn Kaland, T. Mark Ellison, Francesco Cangemi, Bodo Winter and Martine Grice (U of Cologne; U of Cologne; U of Cologne; Tokyo U of Foreign Studies; U of Birmingham; U of Cologne)
16:20-16:40 Talk 9 A corpus phonetics study of nominal prominence marking in two Australian languages Catalina Torres and Sarah Babinski (U of Zurich)
16:40-16:50 Closing
       
14:00-15:00 Posters
  Poster 1 Patterns of misaccentuation of unaccented words in English speakers’ Japanese Kakeru Yazawa (U of Tsukuba)
  Poster 2 Variable /s/ weakening in Canary Islands Spanish – a sociophonetic corpus study Karolina Broś (U of Warsaw)
  Poster 3 Vowel Classification in Conversational Speech Corpus Hyun Jin Hwangbo (Pukyong National U)
  Poster 4 Attention-LSTM Autoencoder for Phonotactics Learning from Raw Audio Input Youngah Do and Frank Lihui Tan (U of Hong Kong)
  Poster 5 AnglistikVoices: an L2 English speech dataset for educational and technological advancement in speech technology Akhilesh Kakolu Ramarao and Anna Sophia Stein (Heinrich Heine University)
  Poster 6 Automatic analysis of phonemic context-dependent cue productions in acoustic cue-labeled speech Jeung-Yoon Elizabeth Choi, Sofie Chung and Stefanie Shattuck-Hufnagel (Massachusetts Institute of Technology)
  Poster 7 A phonetic comparison of lexical /i/ and epenthetic /i/ in Korean speech corpus Hyunjin Lee (U of Georgia)
  Poster 8 Spectral energy properties of non-modal phonations Yuan Chai, Padmini Bhagavatula, Serene Wong and Patricia Keating (U of Washington; U of Washington; U of Washington; UCLA)
  Poster 9 Investigating the Predictability of an Upcoming Code-switch in Cantonese-English Bilinguals Nikolai Andrés Schwarz-Acosta (UC Berkeley)
  Poster 10 A sociophonetic study of tones on Jeju Island Moira Saltzman (California State U Northridge)
  Poster 11 On the Advantages and Challenges of Working with Large Corpora of Naturalistic Speech Johanna Cronenberg and Ioana Chitoran (Université Paris Cité; Université Paris Diderot)
  Poster 12 F0 characteristics of sexuality-diverse Australian adolescents with and without symptoms of depression in The Future Proofing Study Corpus Tuende Szalay, Brian Stasak, Kate Maston, Debopriyo Bal, Helen Christensen, Aliza Werner-Seidler and Mark Larsen (U of New South Wales; U of New South Wales; The Black Dog Institute; The Black Dog Institute; The Black Dog Institute; The Black Dog Institute; Centre for Big Data Research)

※ Funding for catering and refreshments comes from the Canada Research Chair in Speech Variability and the Swiss National Science Foundation.

 

Call for papers

Speech production can be examined in both laboratory and non-laboratory settings. Laboratory settings allow researchers to carefully target specific, controlled aspects of production, while non-laboratory settings allow researchers to examine speech in more ecologically valid contexts. Together with advances in computational power and increased access to automated techniques, this perspective has made corpus phonetics a major approach to research in phonetics and phonology. Corpus phonetic methods are now used in a wide range of contexts, from the analysis of fieldwork data from small numbers of speakers to the automated processing of cross-linguistic speech datasets representing hundreds or thousands of speakers. The primary goal of the CorpusPhon workshop is to create an inclusive forum for this diverse set of practitioners, bringing together researchers who use corpus phonetic tools with a view towards building a cohesive community.

The workshop will be held alongside LabPhon 19 at Hanyang University in Seoul, South Korea, on June 26, 2024. It will offer a venue for discussing methodological best practices in corpus phonetics, demonstrating a diversity of approaches, examining the relevance of corpus data to laboratory phonology and phonetics, addressing problems in collecting or analyzing corpus data at different scales, presenting results of corpus studies, and showcasing data and tools. We are pleased to welcome Dr. Michael McAuliffe, developer of the Montreal Forced Aligner, as an invited speaker.


Areas of interest

We are soliciting original, unpublished research on topics related to corpus phonetics, as well as tutorials on existing data or tools and strong work in progress. Appropriate sub-topics include (but are not limited to) the following:

  • Corpus phonetic studies, including studies involving smaller speech corpora, endangered/underdocumented language data, prosody, sociophonetics, cross-linguistic/dialectal variation, longitudinal data, historical data, or large-scale corpora;
  • Processing tools, such as forced alignment, grapheme-to-phoneme conversion, automated annotation, and automated phonetic measurement;
  • Quantitative analysis (statistical methods, visualization) for corpus/observational data;
  • Issues in corpus development, such as validation and quality control; issues related to data storage, management, and metadata; and ethical issues;
  • Presentation of new corpora appropriate for research in laboratory phonology.

Submissions should specify whether the presentation is better suited to a standard conference talk (~20 min + 10 min questions) or a demonstration (10-min lightning talk + participation in a 1-hour walkabout session). For example, a talk could report new research using an existing corpus, summarize a “closed” corpus (e.g., one co-developed with a language community), or discuss broader methodological and conceptual considerations for corpus phonetics. A demonstration could present a tool for automatic speech analysis, showcase a new “open” corpus, or give a quick tutorial.


Submission instructions

Submissions should consist of a 1-page abstract, with a second page for figures and references. Formatting should follow the LabPhon abstract requirements (Times New Roman, 12-pt font, single spacing, 1-inch margins). Abstracts should be submitted via EasyChair.

Link for submission: https://easychair.org/conferences/?conf=corpusphon2024 

Please specify whether your abstract should be considered for a demonstration slot or a standard talk slot. Demonstrations should be given in person. We might be able to offer a hybrid presentation option for a limited number of presenters who are giving a standard talk.


Important dates

  • Submissions are due by Wednesday, March 13, 2024 (extended from March 6), 11:59 PM Anywhere on Earth (AoE).
  • Notifications will be sent out by March 22, 2024 (originally March 15).
  • Date/Time (tentative): 09:00-16:50, Wednesday 26 June 2024
  • Location: TBA (at the main conference venue, HIT, Hanyang University)


Workshop structure

  • Regular talks: 15 min talk + 5 min questions
  • Lightning talks: 10 min talk
  • Posters: Please use the LabPhon specifications: “The recommended poster size is A0, with a horizontal width of 84.1cm and a vertical height of 118.9cm. The maximum width allowed is 90cm and the maximum height allowed is 150cm.”
