CorpusPhon

Organizers


Eleanor Chodroff, University of Zurich (Switzerland)
Christian DiCanio, University at Buffalo (United States)
Morgan Sonderegger, McGill University (Canada)
Márton Sóskuthy, University of British Columbia (Canada)

Workshop information

Date/Time: 09:00-16:50, Wednesday 26 June 2024
Location: 612, HIT, Hanyang University

If you would like to join over Zoom, please register via this link: Zoom LINK

Program

Time	Event	Title	Authors
9:00-9:10	Intro
9:10-10:00	Invited speaker	Montreal Forced Aligner 3.0	Michael McAuliffe (Amazon)
10:00-10:20	Break with coffee + snacks
10:20-10:40	Talk 1	Informativity effects can be probability effects in disguise	Vsevolod Kapatsinski (U of Oregon)
10:40-11:00	Talk 2	Applying Big Data and Automation Techniques in Phonetics: A Case Study on Hyperarticulation in Korean Word-Initial Stops	Cheonkam Jeong and Andrew Wedel (U of Arizona)
11:00-11:20	Talk 3	Predictability and phonological context interact in conditioning the acoustic reduction of Seoul Korean lenis obstruents	Seung Suk Lee (U of Massachusetts, Amherst)
11:20-11:40	Talk 4	Corpus Phonetics in the Signed Modality: One Approach	Kathleen Currie Hall, Kaili Vesik, Anushka Asthana, Maggie Reid, Grace Zhang, Yiran Gao, Grace Hobby, Stanley Nam and Oksana Tkachman (U of British Columbia)
11:40-12:00	Talk 5	Language-specific /s/ acoustics for early Cantonese-English bilinguals?	Molly Babel, Victor Wong, Sabrina Luk, Kai Fong and Ragul Loganathan (U of British Columbia)
12:00-13:00	Catered lunch (provided you have registered)
13:00-13:10	Lightning Talk 1	Introducing the Speech Maturity Dataset: Research opportunities for speech scientists and linguistic fieldworkers	Margaret Cychosz, Kasia Hitczenko, William Havard, Loann Peurey, Madurya Suresh, Theo Zhang and Alex Cristia (UCLA; George Washington U; U Grenoble Alpes; École normale supérieure; UCLA; UCLA; École normale supérieure)
13:10-13:20	Lightning Talk 2	Creating a corpus of web-data with Pyrlato. A demonstration.	Giuseppe Magistro and Claudia Crocco (U of Ghent)
13:20-13:30	Lightning Talk 3	The Multi-ethnic Hong Kong Cantonese Corpus for the Study of Child-Directed Speech	Alan Yu, Nathan Delisle, Nicholas Martin, Vivienne Zhang, Yao Yao and Carol To (U of Chicago; U of Chicago; U of Chicago; U of Chicago; Polytechnic U of Hong Kong; U of Hong Kong)
13:30-13:40	Lightning Talk 4	The XPF Corpus: Rule-based grapheme to phoneme translation schemes for hundreds of languages	Uriel Cohen Priva (Brown U)
13:40-13:50	Lightning Talk 5	Creating Multimodal Corpora for Co-Speech Gesture Research	Walter Dych, Karee Garvin and Kathryn Franich (Binghamton U; Harvard; Harvard)
13:50-14:00	Lightning Talk 6	AutoRPT: Automatic Detection of Prosodic Prominence and Boundary	Seth Heiney and Jonathan Howell (Montclair State U)
14:00-15:00	Walkabout: Poster session + demos	See Poster List
15:00-15:20	Break with coffee + snacks
15:20-15:40	Talk 6	Large-scale assessment of speech intelligibility	Seung-Eun Kim, Matthew Goldrick and Ann R. Bradlow (Northwestern U)
15:40-16:00	Talk 7	Cross-linguistic differences in the phonetic implementation of /s/	Massimo Lipari, Morgan Sonderegger and Meghan Clayards (McGill U)
16:00-16:20	Talk 8	Harvesting spontaneous speech data from digital reservoirs to study prosody	Aviad Albert, Constantijn Kaland, T. Mark Ellison, Francesco Cangemi, Bodo Winter and Martine Grice (U of Cologne; U of Cologne; U of Cologne; Tokyo U of Foreign Studies; U of Birmingham; U of Cologne)
16:20-16:40	Talk 9	A corpus phonetics study of nominal prominence marking in two Australian languages	Catalina Torres and Sarah Babinski (U of Zurich)
16:40-16:50	Closing

14:00-15:00	Poster List
	Poster 1	Patterns of misaccentuation of unaccented words in English speakers’ Japanese	Kakeru Yazawa (U of Tsukuba)
	Poster 2	Variable /s/ weakening in Canary Islands Spanish – a sociophonetic corpus study	Karolina Broś (U of Warsaw)
	Poster 3	Vowel Classification in Conversational Speech Corpus	Hyun Jin Hwangbo (Pukyong National U)
	Poster 4	Attention-LSTM Autoencoder for Phonotactics Learning from Raw Audio Input	Youngah Do and Frank Lihui Tan (U of Hong Kong)
	Poster 5	AnglistikVoices: an L2 English speech dataset for educational and technological advancement in speech technology	Akhilesh Kakolu Ramarao and Anna Sophia Stein (Heinrich Heine University)
	Poster 6	Automatic analysis of phonemic context-dependent cue productions in acoustic cue-labeled speech	Jeung-Yoon Elizabeth Choi, Sofie Chung and Stefanie Shattuck-Hufnagel (Massachusetts Institute of Technology)
	Poster 7	A phonetic comparison of lexical /i/ and epenthetic /i/ in Korean speech corpus	Hyunjin Lee (U of Georgia)
	Poster 8	Spectral energy properties of non-modal phonations	Yuan Chai, Padmini Bhagavatula, Serene Wong and Patricia Keating (U of Washington; U of Washington; U of Washington; UCLA)
	Poster 9	Investigating the Predictability of an Upcoming Code-switch in Cantonese-English Bilinguals	Nikolai Andrés Schwarz-Acosta (UC Berkeley)
	Poster 10	A sociophonetic study of tones on Jeju Island	Moira Saltzman (California State U Northridge)
	Poster 11	On the Advantages and Challenges of Working with Large Corpora of Naturalistic Speech	Johanna Cronenberg and Ioana Chitoran (Université Paris Cité; Université Paris Diderot)
	Poster 12	F0 characteristics of sexuality-diverse Australian adolescents with and without symptoms of depression in The Future Proofing Study Corpus	Tuende Szalay, Brian Stasak, Kate Maston, Debopriyo Bal, Helen Christensen, Aliza Werner-Seidler and Mark Larsen (U of New South Wales; U of New South Wales; The Black Dog Institute; The Black Dog Institute; The Black Dog Institute; The Black Dog Institute; Centre for Big Data Research)

※ Funding for catering and refreshments comes from the Canada Research Chair in Speech Variability and the Swiss National Science Foundation

Call for papers

The production of speech can be examined in both laboratory and non-laboratory settings. While the former context allows researchers to carefully target specific, controlled aspects of production, the latter allows researchers to examine speech in more ecologically valid settings. Alongside advances in computational power and increased access to automated techniques, this perspective has elevated corpus phonetics as a major approach to research in phonetics and phonology. Corpus phonetic methods are now used in a wide range of contexts, from the analysis of fieldwork data from small numbers of speakers to the automated processing of cross-linguistic speech data sets representing hundreds or thousands of speakers. The primary goal of the CorpusPhon workshop is to create an inclusive forum for this diverse set of practitioners, bringing together researchers who use corpus phonetic tools with a view towards building a cohesive community.

The workshop will be held alongside LabPhon 19 in Seoul, South Korea at Hanyang University on June 26, 2024. It will offer a venue for discussing methodological best practices in corpus phonetics, demonstrating a diversity of approaches, examining the relevance of corpus data to laboratory phonology and phonetics, analyzing problems relating to collecting or analyzing corpus data at different scales, presenting results of corpus studies, and showcasing data and tools. We are pleased to welcome Dr. Michael McAuliffe, developer of the Montreal Forced Aligner, as an invited speaker.

Areas of interest

We are soliciting work on original and unpublished research on topics related to corpus phonetics, as well as tutorials on existing data/tools, or strong work in progress. Appropriate sub-topics include (but are not limited to) the following:

Corpus phonetic studies, including studies involving smaller speech corpora, endangered/underdocumented language data, prosody, sociophonetics, cross-linguistic/dialectal variation, longitudinal data, historical data, or large-scale corpora;
Processing tools, such as forced alignment, grapheme-to-phoneme conversion, automated annotation, and automated phonetic measurement;
Quantitative analysis (statistical methods, visualization) for corpus/observational data;
Issues in corpus development, such as validation and quality control; issues related to data storage, management, and metadata; and ethical issues;
Presentation of new corpora appropriate for research in laboratory phonology.

Submissions should specify whether the presentation is better suited for a standard conference talk (~20 min + 10 min questions) or a demonstration (10-min lightning talk + participation in a 1-hour walk-about session). For example, a talk could report new research using an existing corpus, summarize a “closed” corpus (e.g. co-developed with a language community), or discuss broader methodological and conceptual considerations for corpus phonetics. A demonstration could present a tool for automatic speech analysis, show a new “open” corpus, or give a quick tutorial.

Submission instructions

Submissions consist of a 1-page abstract with a second page for figures and references. The formatting should adhere to the LabPhon abstract formatting requirements (Times New Roman, 12pt font, single spacing, 1-inch margins). Abstracts should be submitted on EasyChair.

Link for submission: https://easychair.org/conferences/?conf=corpusphon2024

Please specify whether your abstract should be considered for a demonstration slot or a standard talk slot. Demonstrations should be given in person. We might be able to offer a hybrid presentation option for a limited number of presenters who are giving a standard talk.

Important dates

Submissions are due by Wednesday, ~~March 6~~ March 13, 11:59 PM, Anywhere on Earth (AoE).
Notifications will be sent out by ~~March 15~~ March 22, 2024.
Date/Time (Tentative): 09:00-16:50, Wednesday 26 June 2024
Location: TBA (but the same place as the conference venue, HIT, Hanyang University)

Workshop structure

Regular talks: 15 min talk + 5 min questions
Lightning talks: 10 min talk
Posters: Please use the LabPhon specifications: “The recommended poster size is A0, with a horizontal width of 84.1cm and a vertical height of 118.9cm. The maximum width allowed is 90cm and the maximum height allowed is 150cm.”
