Look What They’ve Done to My Song (and other time-aligned data and analysis), Ma

At the Linguistic Society of America Summer Institute in Berkeley last week (17-19th July) the National Science Foundation sponsored Cyberling 2009, a workshop exploring how computational infrastructure (called “cyberinfrastructure” in the US, and e-Science or e-Humanities in the UK) can support linguistic research in a variety of fields. There was a panel discussion about data sharing that looked at the proposal:

“A cyberinfrastructure for linguistic data would allow unprecedented access [to] the empirical base of our field, but only if we collectively build that empirical base by contributing data. This panel addresses the benefits of data sharing and the obstacles to the widespread adoption of sharing practices, from the perspective of a variety of subfields”

But the bulk of the workshop was given over to closed discussion sessions by seven working groups looking at annotation standards, other standards, new multi-purpose software (so-called “killer apps”), data reliability and provenance, models from other fields, funding sources, and collaboration structure. The group discussions and resulting final day presentations are available on the Cyberling Wiki.
I was co-chair of Working Group 4, which was charged with discussing “protecting data reliability and provenance”, i.e. how to keep track of the creation of data and analysis and its passage through the electronic infrastructure as researchers access and use each other’s materials. As the Cyberling Wiki says, this is crucial

“for data creators (who need credit for the work they have done and the academic contribution of collecting, curating and annotating data) and the data users (who need to know where the data has come from so they can form an opinion of how much credence to give it and how to give proper credit to the originator of the data)”.

We also looked at how to establish a culture of data sharing and what mechanisms might be put in place to encourage people to share data. Clearly, for endangered language research where data are unique and fragile, these are very important issues.
After two and a half days of intense discussions our group came up with a set of proposals relating to data reliability and provenance that can be summarised as follows:

Read more

World Oral Literature Project

The website of a new project called World Oral Literature Project: Voices of Vanishing Worlds has just gone live at the University of Cambridge. The project kicked off early this year under the leadership of Mark Turin, an anthropological linguist whose major research area is Nepal (his PhD thesis was a grammar of Thangmi, a … Read more

3L Summer School final report

The two-week 3L Summer School continued last week with plenary lectures on documentation and linguistic theory, language policy, language archiving, and documentation and language typology. Courses in the second week included Amazonian languages, Caucasian languages, Grammar writing, and documenting special vocabulary, together with the continuation of documenting sign languages, and sociolinguistics of language endangerment. The … Read more

3L summer school mid-term report

Well, we have just passed the half-way point of the 3L Summer School and things seem to be going pretty much according to plan. Despite some last-minute scrambles (presenters dropping out and needing to be replaced, equipment needing to be bought, rooms being taken out of service), all the classes got organised on time and have run well so far. Even Blackboard, the e-learning support environment, is functioning faultlessly, enabling us to do away with photocopying handouts and having useless piles of paper at the end of each class.
There are 97 students attending the 3L summer school, representing 42 nationalities (Argentinian, Australian, Belgian, Benin, Brazilian, British, Cameroonian, Canadian, Croatian, Czech, Danish, Dutch, Ethiopian, Finnish, French, German, Ghanaian, Greek, Indian, Indonesian, Irish, Israeli, Italian, Japanese, Korean, Malaysian, Malian, Mexican, Nigerian, Norwegian, Pakistani, Polish, Portuguese, Russian, Saami, South African, Spanish, Swedish, Swiss, Taiwanese, Ugandan, USA). There are 18 instructors, who come from the three consortium universities (SOAS, Lyon and Leiden), along with colleagues from University College London. Three tutors from SOAS and a group of student volunteers, plus our Administrator Alison Kelly, make up the rest of the 3L team.

Read more

New publications from SOAS and FEL

Two new groups of publications are now available from SOAS. 1. LDD 6: Volume 6 of Language Documentation and Description is now available. This volume is a fully-refereed collection of papers dealing with language documentation methodology, sociolinguistics and pedagogy for endangered languages, and software applications. The papers were specially written for the volume, and include … Read more

Technologically-enhanced fieldwork

Last year I wrote about how mobile phones are being used to do “fieldwork at a distance”, checking data with consultants, or collecting text messages of writing in endangered languages.
A recent blog post by ESL educator Tom Leverett alerted me to yet another possible technological aid for linguistic data collection and checking, Skype. Many of us know Skype as a way to make cheap (or even free) voice and video phone calls, but Tom points out another use for the software (in association with audio and video software) — conducting and recording conversations. He reports on an experiment that he carried out with a colleague:

“Thom T., our lab director, who makes it his business to know these things, agreed to place a call, and sure enough, from my office to his, we not only had a call, but also recorded it; furthermore, he bundled up that tiny recording (he had recorded only a few minutes of it – still, he said, it was quite a large bundle) and sent that bundle to me over the text chat function that is right there on Skype … one can send songs, movies, documents, anything, as one would on an IM or another chat function. But, you can do it, and look the other person in the eye as you do it. Look ’em in the videocam eye, anyway”

So, I thought, what about interviewing consultants on Skype and using it to collect material to be added to a documentary corpus, check grammaticality judgements, socialise with the community, get feedback on materials, or indeed, just about anything that involves two-way communication? There are, however, limitations, as Tom points out. Two of these are bandwidth and interference:

Read more

Australia beats US, again

That’s my tabloid journalist headline for what is a serious, some would say momentous, development in the history of the Linguistic Society of America (LSA), namely the adoption last month by the Executive Committee of the LSA of an Ethics Statement [.pdf]. Its Ethics Committee has been working on a draft statement for the past two and a half years, and has engaged in consultation within the Society.
There is an article dealing with the issue in this week’s Inside Higher Ed, but it focuses on what I believe are two less important aspects of thinking about ethical issues in linguistic research, namely what could be paraphrased as “how to stop linguists from screwing things up” and “how to get round the Institutional Review Board (IRB) process”.

Read more

Endangered Languages in Chronicle of Higher Education

This week’s Chronicle of Higher Education has two articles by Peter Monaghan on endangered languages issues. The first, entitled Languages on Life Support: Linguists debate their role in saving the world’s endangered tongues (viewable free online), includes material from interviews with Nick Evans, Michael Krauss, Richard Rhodes, Noam Chomsky, and myself. Some of the topics covered will be familiar to readers of this blog, like what Monaghan calls “a ‘commando style’ of recording trip” (something Jane wrote about as Fifo (fly in fly out) fieldwork).

Read more