A large corpus of recorded oral tradition can be created using two recording machines, one playing back the spoken texts and the other used to capture an oral annotation. Recording speakers who are commenting on earlier recordings is a method for providing annotations that bypasses literacy.
Technology
Wagiman electronic dictionary
Aidan Wilson went up to Pine Creek and Kybrook Farm in the Northern Territory last week to deliver the various versions of the Wagiman electronic dictionary to the Wagiman community. You can read about it at the Project for Free Electronic Dictionaries blog.
Wunderkammer in Canberra
Dearest Canberrans, I’ll be giving a presentation of the Wunderkammer mobile phone dictionary software at the ANU in Canberra at 11 am on 18 September. If you’re interested and in the area, come by. Full details, including the exact location, can be found here.
New ELAR publications
The Endangered Languages Archive (ELAR), based at SOAS, has recently published two new articles on the Endangered Languages Project website that may be of interest to readers of this blog: Bernard Howard’s detailed review of the new Zoom H4n audio recorder. Bernard puts the machine through its paces and concludes his review with the words: …
Look What They’ve Done to My Song (and other time-aligned data and analysis), Ma
At the Linguistic Society of America Summer Institute in Berkeley last week (17-19th July) the National Science Foundation sponsored Cyberling 2009, a workshop exploring how computational infrastructure (called “cyberinfrastructure” in the US, and e-Science or e-Humanities in the UK) can support linguistic research in a variety of fields. There was a panel discussion about data sharing that looked at the proposal:
“A cyberinfrastructure for linguistic data would allow unprecedented access [to] the empirical base of our field, but only if we collectively build that empirical base by contributing data. This panel addresses the benefits of data sharing and the obstacles to the widespread adoption of sharing practices, from the perspective of a variety of subfields”
But the bulk of the workshop was given over to closed discussion sessions by seven working groups looking at annotation standards, other standards, new multi-purpose software (so-called “killer apps”), data reliability and provenance, models from other fields, funding sources, and collaboration structure. The group discussions and resulting final day presentations are available on the Cyberling Wiki.
I was co-chair of Working Group 4, which was charged with discussing “protecting data reliability and provenance”, i.e. how to keep track of the creation of data and analysis and their passage through the electronic infrastructure as researchers access and use each other’s materials. As the Cyberling Wiki says, this is crucial
“for data creators (who need credit for the work they have done and the academic contribution of collecting, curating and annotating data) and the data users (who need to know where the data has come from so they can form an opinion of how much credence to give it and how to give proper credit to the originator of the data)”.
We also looked at how to establish a culture of data sharing and what mechanisms might be put in place to encourage people to share data. Clearly, for endangered language research where data are unique and fragile, these are very important issues.
After two and a half days of intense discussions our group came up with a set of proposals relating to data reliability and provenance that can be summarised as follows:
Endangered languages and technology in the New York Times
The New York Times has just published an article about the role technology plays in helping to save endangered languages. A few specific projects are mentioned, including some work supported by SOAS and MPI Nijmegen and our own mobile phone dictionary project.
How to import a basic transcript into ELAN
The problem: you have text files and audio files, but the text files are not aligned to the audio files.
I imagine there are a few readers out there who have transcriptions of audio files that never made it past an utterance per line text file. This is a post for you, if you’d like to know how to import and time-align those files in ELAN.
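As a rough illustration of one way to get started (not necessarily the method the full post describes), the sketch below turns an utterance-per-line transcript into tab-delimited rows with placeholder start and end times. ELAN can, as far as I know, read such a file through its CSV / tab-delimited text import, where you tell it which column holds the tier name, the times and the annotation text. The function name, the tier name "utterance" and the even spacing of the time slots are all assumptions made for this example; the placeholder boundaries would still need to be dragged to the real utterance boundaries in ELAN afterwards.

```python
def transcript_to_tab_delimited(lines, total_duration_s, tier="utterance"):
    """Build tab-delimited rows (tier, start, end, text) from transcript lines.

    Hypothetical helper: times are evenly spaced placeholders in seconds,
    not real alignments. Blank lines in the transcript are skipped.
    """
    utterances = [ln.strip() for ln in lines if ln.strip()]
    if not utterances:
        return []
    # Give every utterance an equal share of the recording as a starting point.
    slot = total_duration_s / len(utterances)
    rows = []
    for i, text in enumerate(utterances):
        start = i * slot
        end = (i + 1) * slot
        rows.append(f"{tier}\t{start:.3f}\t{end:.3f}\t{text}")
    return rows


if __name__ == "__main__":
    # Example: two utterances spread over a 10-second recording.
    for row in transcript_to_tab_delimited(["first utterance", "second utterance"], 10.0):
        print(row)
```

Writing the returned rows to a text file, one per line, gives something that can be imported and then time-aligned by hand against the audio.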
Cold dead media
PARADISEC’s director Linda Barwick has been raising the alarm for years about the way media are becoming obsolete because the machines to read them are dying. So it was very sad to hear the death-rattle on the CHILDES list in this message from Brian MacWhinney: Dear Colleagues, It appears that we are now just about …
More Good News
Following on from Jane’s announcement during the week of all the great news regarding successful grant applications, I have another bit of good news to share: James McElvenny and I recently applied for, and even more recently received, a grant from a philanthropic foundation to support our current work in compiling dictionaries.
Road Testing the Nagra Ares BB+ – Ana Kondic
[from Ana Kondic at the University of Sydney]
I have just spent eight months doing field work in Mexico, where for audio recording I used a Nagra Ares BB+ (with a Sony ECM-MS 957 microphone) that I borrowed from PARADISEC at Sydney University.
I worked with a highly endangered Mayan language, South Eastern Huastec. It is spoken in the region of La Huasteca, in the municipality of Chontla, in the north of Veracruz, Mexico, where the majority of the population speaks it as their first language, alongside Spanish.
The area of La Huasteca is tropical, with high temperatures and very high humidity. I chose the “cold” period from October to May, which has pleasant months in December and January (about 20 C during the day, dropping to 5 C or so at night) but a very warm April and May (up to 35 C). The humidity is very high all year, mostly 85-95%.