Hugo Schuchardt Archiv

I’ve been meaning to express my love and gratitude for the excellent Hugo Schuchardt Archiv at the Uni Graz for a while now. I was thinking of maybe saying a little something about Schuchardt for his birthday or Todestag, but the dates passed and in any case I come to exhume Schuchardt, not to praise … Read more

How much room is there in the arc(hive)?

Forty-five years ago the annual fieldwork reports of some of the researchers funded by the then Australian Institute of Aboriginal Studies (now AIATSIS) included specifications of how much research had been completed in terms of the number of feet of tapes that had been recorded during the project year (“this year was especially productive with 45 feet 3 inches of tape being recorded”). The modern measure of this kind of quantitative nonsense is the number of gigabytes of digital files (soon to be terabytes) created by the researcher. Don’t mind the quality, it’s the length/bytes that count.
My colleague David Nathan, Director of the Endangered Languages Archive (ELAR) at SOAS, has been approached on several occasions by researchers (both those funded by ELDP and those not (yet)) asking how much data they would be allowed to deposit in the archive. “Would it be OK if I deposit 500 gigabytes of data?” they ask. When you think about it for a moment or two, this is a truly odd request, but one driven by part of what David (in Nathan 2004, see also Dobrin, Austin and Nathan 2007, 2009) has termed “archivism”. This is the tendency for researchers to think that an archive should determine their project outcomes. Parameters stated in terms of audio resolution and sampling rate, file format, and encoding standards take the place of discussions of documentation hypotheses, goals, or methods that are aligned with a project’s actual needs and intentions. David’s response to such a question is usually: if the material to be deposited is “good quality” (stated in terms of some parameters (not volume!) established by the project in discussion with ELAR) then the archive will be interested in taking it.
Another quantity that comes up in this context (and in the context of grant applications as well) is the statement that “10% of the deposited archival data will be analysed”. The remainder of the archive deposit will be, in the worst case, a bunch of media files, or in the best case, media files plus transcription (and/or translation). Where does this magical 10% come from? It seems to have originated around 10 years ago with the DOBES project which established a set of guidelines for language documentation during its pilot phase in 2000. As Wittenburg and Mosel (2004:1) state:

“During a pilot year intensive discussions … took place amongst the participants. The participants agreed upon a number of basic guidelines for language documentation projects. … For some material a deep linguistic analysis should be provided such that later researchers will be able to reconstruct the (grammar of the) language”

Similarly, the guidelines for ELDP grant applications (downloadable here) include the following:

“Note that audio and video are not usable, accessible or archivable without accompanying textual materials such as transcription, annotation, or notes about content and participants. While you are encouraged to transcribe and annotate as much of the material as possible, we recognise that this is very time-consuming and you may not be able to do this for all recorded materials. However, you must provide some text indication of the content of all recordings. This does not have to be the linguistic content and could include, for example, description of the topics or events (e.g. names of songs), or names of participants, preferably with time alignment (indication of where they occur in the recording).”

No actual figure is given of how much “some material” (for DOBES) or “as much of the material as possible” (for ELDP) amounts to. In earlier published versions of advice to applicants both DOBES and ELDP did mention 10%.
Interestingly, Wittenburg (2009, slide 34) has done an analysis of the language documentation data collected by DOBES projects between 2000 and 2009, and he notes that the average project team has recorded 131 hours of media (59 hours of audio, 72 hours of video), transcribed 50 hours of this, and translated 29 hours. Linguistic analysis on average exists for 14 hours of recordings; strikingly, this is about 10.7% of the average corpus!
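As a quick check on the arithmetic, the proportions can be recomputed from the averages quoted above. This is only a sketch: the variable and function names are my own, and the inputs are the rounded hours as cited, so the results may differ slightly from figures computed on the underlying project data.

```python
# Average per-project DOBES figures as quoted above (hours of recordings).
audio_hours = 59
video_hours = 72
total_hours = audio_hours + video_hours  # 131 hours of media

# Hours of the corpus that received each kind of processing.
processed = {
    "transcribed": 50,
    "translated": 29,
    "linguistically analysed": 14,
}


def share_of_corpus(hours, total=total_hours):
    """Return `hours` as a percentage of the total recorded hours."""
    return 100 * hours / total


for label, hours in processed.items():
    print(f"{label}: {hours} h = {share_of_corpus(hours):.1f}% of {total_hours} h")
# transcribed: 50 h = 38.2% of 131 h
# translated: 29 h = 22.1% of 131 h
# linguistically analysed: 14 h = 10.7% of 131 h
```

On these rounded figures, a little over a third of the average corpus is transcribed, a little over a fifth is translated, and roughly a tenth is linguistically analysed.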
How much of the corpus needs to be linguistically annotated so that “later researchers will be able to reconstruct the (grammar of the) language” or indeed so that the rest of the corpus can be parsed? Well, it depends on a range of factors, including the nature of the language(s) being documented. Some Austronesian languages, like Sasak or Toratan, have relatively little morphology, with pretty straightforward morpho-phonemics for such morphology as does exist, and so a relatively small amount of morpheme-by-morpheme glossed materials in conjunction with a lexicon would enable users to bootstrap the morphological analysis of other parts of a transcribed corpus in those languages. Other languages, like Athapaskan tongues with their fiendishly complex verb morphology, might need more annotated data to help the user deal with the whole corpus.
This is however an empirical question, and one that to my knowledge has not been addressed so far. There are now a number of documentary corpora available, with more coming on stream, and it should be possible to establish whether the “magical 10%” is a real goal to be aimed for, or just a figure that researchers have created and continue to repeat to one another.


Wunderkammer Import Package 2 final release

The final release of Wunderkammer Import Package 2 is now available for download. Check out the Wunderkammer website for more info. Thanks to everyone who pointed out bugs and made suggestions for improvement. In this release several bugs have been squished and a bit of input validation and some friendlier error messages have been added. … Read more

New website: Aboriginal Languages Network, Port Augusta

The Aboriginal Languages Network is a team of teachers and Aboriginal language and culture experts in Port Augusta, South Australia, and is working on language revitalisation and materials development for threatened languages spoken in northern South Australia. Mohamed Azkour at Augusta Park Primary School in Port Augusta, has developed a website of Aboriginal language materials … Read more

Wunderkammer Import Package 2

The latest version of the Wunderkammer mobile phone dictionary software, Wunderkammer Import Package 2 Beta, is now available for download. The major advance in this distribution is a new easy to use graphical user interface. There’s also a new set of documentation to go with the new user interface. This is a beta release. We … Read more

Contemporary Aboriginal naming practices

As I reported in this recent blog post, at least one family in South Australia is still speaking the Dieri (Diyari) Aboriginal language. During our discussions last week I took down a genealogy for seven generations of the family, and noticed something interesting about the names given to children of each generation.
The first generation for which I have information were children born around 1880 (such as Frieda Merrick, born in 1885). Many Dieri at that time were associated with Killalpaninna mission run by German Lutheran missionaries. The English language names given to children of this generation have Biblical and Germanic sources, e.g. Frieda, Gottlieb, Timotheus, Katerina, Selma, Alfred and Walter.
Children of the next generation, born around 1900, typically have ‘Anglo’ names that were also common among the non-Aboriginal population at the time, e.g. Ben, Ernest, Shirley, Myra, George, Martha, Albert, Suzie. This practice continued for the next three generations, born in the 1920s to 1960s, who had names like Arthur, Rosa, Eileen, Nora, Robert, Joan, Jeffrey, Reg and Ian. By the 1970s other names (also used among the wider population) make an appearance, such as Donica, Trevan, Kyle, Liam, Kristen, Brenton and Michele.
A change seems to have happened in the last 10 years for children born around 2000 and later. The names given to them are all ‘unusual’ in not being ‘typically Anglo’ but rather based on African-American names, especially those of popular black singers and rap artists (with a number of girls’ names ending in -esha). Additionally, names of the current youngest generation are spelled in many unusual ways, with lots of unexpected consonant clusters, and even the use of punctuation in the case of De’Ron. The following are the names I collected:

BillyLee Damelia De’Ron
Iesha Jaima Jaran
Jenola Kaiha Kanolan
Katasha Kyrahn Lailani
Lamiah Latesha Mikayla
Nikkiesha Quandelia Quanesha
Quintella Ronice Shareena
Shekogan Shonesha Sianne
Talesha Trayton Trevan
Tyrelle Vaniah Virion
Zander Zysdonehia

Colleagues living in New South Wales have noticed a similar phenomenon and reported the presence of highly distinctive and unusually spelled names among young Aboriginal children there too. There is clearly a distinctive naming system evolving for some Aboriginal groups, a system with its own dynamics though influenced by exposure to popular US black music culture.


And still they speak it

From 1974 to 1978 I worked intensively on Dieri (Diyari), an Aboriginal language spoken in the far north of South Australia, mainly in Port Augusta and Marree. I completed my PhD, which was a descriptive grammar of Diyari, in 1978, and published a revised version with Cambridge University Press in 1981. I later published some texts in Diyari, and in 1988 together with Luise Hercus and Philip Jones published a life-history of Ben Murray, one of our main consultants, in the journal Aboriginal History.
Since 1978, jobs in the US, Australia, Hong Kong, Japan, Germany and the UK have kept me busy working on other languages and other topics. My last fieldtrip to South Australia was in 1977. At that time there were about 12 fluent speakers of Diyari, all aged over 50, and in the intervening years all of them have died (Ben Murray passed away in 1994 aged 101). According to the latest edition of Ethnologue, Dieri (DIF) is now “extinct”.
This year I am taking my first sabbatical leave since starting work at SOAS over 7 years ago, and have had the opportunity to return to Australia for an extended visit and to start to think about Diyari again. In 2009 I was contacted by Greg Wilson, South Australian Department of Education and Children’s Services (DECS), who told me about a pilot project to introduce the language into schools in South Australia with sponsorship from the Dieri Aboriginal Corporation (DAC) (which just last year purchased Marree Station for the Dieri people – see photos) and financial support from the Australian Federal Department of Environment, Water, Heritage and the Arts (DEWHA). For the past year Greg has worked on creating a CD-ROM of basic language materials in Dieri (as the community members prefer to call it), recording words and simple sentences from a number of people in Port Augusta, Whyalla and Adelaide. At the beginning of this year DAC, with DEWHA funding, asked Greg to start a main phase project to develop Dieri language lessons for R-12 students. He had already produced a massive program for the neighbouring Arabana language, using materials from Luise Hercus’ grammar and dictionary, and working with a number of Arabana speakers; however, it looked like the same would not be possible for Dieri, as the level of language knowledge seemed much more fragmentary.
Last week Greg organised for me to visit South Australia and travel with him to Port Augusta to meet members of the Dieri community, especially Winnie Naylon and Renie Warren, and their children and grand-children. They are sisters, and the grand-daughters of one of my main consultants from the 1970s, the late Frieda Merrick. Frieda was born in 1885 (she passed away in 1978) and had spent her early years at Killalpaninna Mission that was run by Lutheran missionaries and where Dieri was the main language in use. Her husband Gottlieb Merrick had also been involved with the mission. Frieda spoke only Dieri to her daughters, one of whom was Suzie Kennedy, the mother of Renie and Winnie. I once had the opportunity to interview Suzie Kennedy in 1974 but she was very busy with her family and the opportunity to work with her didn’t arise again.
Renie Warren and her son Reg remembered me from my visits to study Dieri with their grand-mother (and great-grand-mother), and once initial shyness had passed, helped along by a few jokes (my saying nhawu parlali nganayi yingkangu and yidni piti thungka nganayi had the whole room in stitches), it turned out that Renie was very fluent in Dieri, easily able to converse and tell stories. She even told me yidni manyu marla yathayi Diyari yawarra ‘You speak Dieri really well’, quite a compliment for someone who hadn’t spoken the language for 33 years!
Greg and I got to work on Lesson 1 of the Dieri language program, recording Winnie and Renie, as well as Reg, who is pretty fluent, despite having spent the past 20 years away from Dieri country working on various mining projects (he is currently working as a driving instructor for the massive dump trucks used to cart ore in the Pilbara). Renie’s grandson Robert also joined in with recording bird names.
So, Dieri (Diyari) is not extinct, indeed far from it. The language has been kept alive continuously within this family, and now I have had the pleasure of studying Dieri with five successive generations. In the future I hope to assist Greg and the community with development of further language learning materials.


Fieldwork training workshop in Manchester

The Institute for Linguistics and Language Studies (ILLS) at The University of Manchester and the Subject Centre for Languages, Linguistics and Area Studies are co-organising a fieldwork training workshop to be held in Manchester on 20th May. This event is aimed at both postgraduate students and lecturers with an interest in teaching field methods for … Read more

Bienvnus a Dgernesi

As part of the MA in Language Documentation in the Endangered Languages Academic Programme (ELAP) at SOAS, students are able to participate in a two-week fieldtrip to Guernsey, Channel Islands, to undertake first-hand fieldwork and document the local highly endangered indigenous language Dgernesiais (or Guernesiais). The fieldtrip is organised by Julia Sallabank, Lecturer in Language … Read more

How long is a piece of string?

Last month I received the following email query from a colleague:

“I am currently submitting a grant application for a small grant at the HRELP to document …. One concern I have is how many hours it will realistically take to transcribe one hour of text. I have done fieldwork in the past, but this would be the first time that I will have trained a transcriber who would work (mostly) independently. (The linguists on the project would consult with them.) I would like to give some sort of concrete number of total hours transcribed and translated (in contrast to fully annotated).”

Since this is an issue I have been asked about several times, I present here an elaborated version of what I wrote back to my correspondent. (Here I am using ‘source language’ to refer to the language of the recording, and ‘target language’ to refer to the language of a translation of the recording. I restrict my remarks to transcription of spoken languages.)
I wrote back:
The answer to your questions is kind of like the answer to the question: ‘How long is a piece of string?’
There are so many variables:
