Suzzy Data Workshop – Guest blogger Bruce Birch – Endangered Languages and Cultures

Dear ELAN Workshop attendees, and anyone who might find this of interest,
There were a few loose ends left at the end of the ELAN workshop last week. I’d particularly like to address one, the question as to whether we should aim for a standard set of ELAN templates which everyone uses.

To me this is OK if we are talking about a community of people working in the same way on the same sort of data who wish to agree on certain standards in order to make various kinds of data-sharing and analysis easier. It is obviously a good idea to use the same categories and the same set of values within those categories (although, I’ve noticed, it is not always easy to get people to do that!).
However, it needs to be stressed that ELAN is a program which can be used to mark up many kinds of data in an infinite number of ways, only one of which is the Toolbox-friendly set of interlinearization tiers I displayed during the workshop.
To give an example, I am marking up my data for different prosodic features. I have, among others, an Intonation tier on which I mark up different intonational tunes, and combinations of tunes. Using the powerful search function of ELAN, I am able to search instantly across hundreds of my files marked up in this way, for a particular tune. I am then able to view and hear (in its original context) any of the selections listed in my search window simply by clicking on it in the displayed list. I am then able to instantly open that selection in Praat by right-clicking on the selected area of waveform, and so am able to do acoustic analysis, or export pitch traces for illustrations in papers. Trust me, it’s incredibly useful. In effect it gives me an acoustic analysis database.
This tier, and the entries in it, are not standard, as on the one hand they are language-specific, and on the other, I’m working it out as I go.
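For anyone curious what a search like this amounts to outside ELAN itself, here is a minimal sketch in Python. It is not ELAN's own search function, just an illustration that reads the XML inside .eaf files directly. The tier name "Intonation" comes from the example above; the tune label "rise-fall", the folder name and the function names are hypothetical, and the sketch assumes the tier holds time-aligned annotations.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_on_tier(eaf_path, tier_id, wanted):
    """Return (start_ms, end_ms, value) for each matching annotation on one tier."""
    root = ET.parse(eaf_path).getroot()
    # Map each time slot ID to its offset in milliseconds.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    hits = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        # Only time-aligned annotations are handled in this sketch.
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            value = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            if value == wanted:
                start = slots.get(ann.get("TIME_SLOT_REF1"))
                end = slots.get(ann.get("TIME_SLOT_REF2"))
                hits.append((start, end, value))
    return hits


def search_corpus(folder, tier_id, wanted):
    """Search every .eaf file in a folder; return {file name: list of hits}."""
    results = {}
    for path in sorted(Path(folder).glob("*.eaf")):
        hits = find_on_tier(path, tier_id, wanted)
        if hits:
            results[path.name] = hits
    return results
```

Calling search_corpus("my_eaf_folder", "Intonation", "rise-fall") would list, file by file, the start and end times of every annotation with that label. Unlike ELAN's search window, it does not play the media or hand the selection on to Praat.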
As another example, I have a Topic Index tier, which means I can instantly locate all the data we have collected on, say, Green Sea Turtle anatomy. This is useful in preparing new interviews on the subject, and for introducing the relevant data to specialists working with our project. I can have all of this data at my fingertips instantly, and export it as text in various formats, or export a translation tier as subtitles for a QuickTime movie.
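ELAN does the subtitle export itself, but to show roughly what is involved, here is a small Python sketch that turns one tier of an .eaf file into SubRip (.srt) subtitle text. The function names and the tier name are hypothetical, and the sketch assumes the tier is time-aligned. A translation tier that depends on a parent tier stores references rather than times, and would need its times looked up from the parent, which this sketch does not do.

```python
import xml.etree.ElementTree as ET


def srt_time(ms):
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def tier_to_srt(eaf_path, tier_id):
    """Return the time-aligned annotations of one tier as SRT text."""
    root = ET.parse(eaf_path).getroot()
    # Map each time slot ID to its offset in milliseconds.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    cues = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            # Skip empty annotations and ones without usable times.
            if start is None or end is None or not text:
                continue
            cues.append((start, end, text))
    cues.sort()
    # Each SRT cue is: running number, time range, text, then a blank line.
    blocks = [
        f"{i}\n{srt_time(s)} --> {srt_time(e)}\n{t}\n"
        for i, (s, e, t) in enumerate(cues, 1)
    ]
    return "\n".join(blocks)
```

Writing the returned string to a file with a .srt extension gives something most video players can load alongside the media.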
Furthermore, as it is possible to merge any number of ELAN annotations which refer to the same media, I can view an interlinear gloss imported from Toolbox, at the same time, and in the same window, as any of the non-standard annotation tiers I have created for that media. If there are tiers I’m not particularly interested in at a given time, they can be hidden so that the tier display is not cluttered.
‘Intonation’ and ‘Topic Index’ are just two examples of the many tiers I have created to facilitate my particular research needs. Clearly someone interested in aspects of syntax would have their data marked up quite differently. And someone working on Song would have a different set of tiers again. Etc, etc.
There seems to be a notion out there that ELAN is useful for aligning existing Toolbox or Shoebox annotations with media files, and that’s about it. That is far from the truth. I do initial transcription work in ELAN, then export selected texts to Toolbox for automatic interlinearization and lexicon building, then bring them back into ELAN and merge them with other tiers I may have been using during the same period, or tiers which I create subsequently. Of course, once imported, the morphemic gloss from Toolbox becomes available to the ELAN search function, so I can instantly bring up a list of all instances of 3sgA>1pl.incl prefixes, or whatever I want.
All of that said, it would be useful to have a few ELAN templates available for download, as Rachel Nordlinger suggested on David Nash’s behalf, if I remember correctly. In particular, it would be good to make readily available the ELAN template which exports to Toolbox, and the marker file required by ELAN to import text from Toolbox. I’ll check if these are currently available anywhere on the web, and if not, I’ll attempt to get them uploaded somewhere and let people know.
I hope this may have clarified a few things for a few people.
Best wishes,
Bruce Birch.
Please note I have a new email address:
birchb (at) unimelb.edu.au
Iwaidja Documentation Project
Department of Linguistics and Applied Linguistics
University of Melbourne
VIC 3010
Australia
phone: +61 (0)3 8344 4588
mobile (aust): +61 (0) 410 103 965
mobile (europe) +49 (0) 162 380 4213

5 thoughts on “Suzzy Data Workshop – Guest blogger Bruce Birch”

All sounds reasonable to me Bruce. Some useful templates can be shared around, if only as examples for newbies. Perhaps the RNLD could host some. One of the FAQs at RNLD is relevant, with the links to Andrew Margetts’s helpful advice (of May 2005 and thus about ELAN version 2.4).

I agree that it would be useful to have some simple shared templates – but that the overall potential scope of projects is hard to regularise and cater for. We need a sort of ELAN template cookbook. Soup and entree, and the whole meal for anyone mad enough. To me the possible relationships between the tiers are not immediately intuitive, and so some simple examples of those structures would be useful.

Yes indeed. I think the intention of the original suggestion was to do exactly as Jenny and David suggest, namely provide a template that beginners could start with (if they want to), in order to get their heads around the software, and then modify to suit their particular needs.
And don’t think the umlaut went unnoticed, Bruce!

I’ll put up my templates on my web site (I’ll link here when I’ve done it). I’m not suggesting that they are particularly wonderful, but they are good for beginners, and they work.

Only slightly more than a month after I said I’d put these up, here they are:
Elan Templates
