First Footprints, Farsd Fatbrontz, Verst Pitprands: Spelling as if the language matters

I have watched the excellent series First Footprints a couple of times. It is a great overview of the origins of human occupation of Australia, with fantastic visual effects and photography. It starts with the declaration that “First Footprints seeks to treat Indigenous cultures and beliefs with respect”. Respecting Indigenous Australian languages should involve at least treating them the way you would any other language, and checking that words in those languages are written accurately. Think of the times you have watched a film with misspelled English subtitles, and what that made you think of the care the subtitler took. It took me only a little effort to check the following mistakes by web-browsing and by talking to people with experience in the particular languages.

ARC Centre of Excellence for the Dynamics of Language

We have great pleasure in announcing that the ARC has funded a Centre of Excellence for the Dynamics of Language over seven years. This project will be led by Nick Evans at ANU with a collaborative team from there, the University of Western Sydney, the University of Queensland and the University of Melbourne, and with many partners from other universities and institutions including AIATSIS and Appen.

We want this to be a centre for collaboration, for generating ideas and inspiration for linguistics in Australia and the world. In the New Year we’ll be putting up a web-page to give more information. In the meantime, here’s an overview of what we are planning.

Research, records and responsibility conference: Ten years of PARADISEC

The conference celebrating ten years of PARADISEC in early December had a suitably interdisciplinary mix of presentations. Joining in the reflection on building records of the world’s languages and cultures were musicologists, linguists, and archivists from India, Hong Kong, Poland, Canada, Alaska, Hawai’i, Australia, the UK and Russia. The range of topics covered can be seen in the program: http://paradisec.org.au/RRRProgram.html

The conference ended with a discussion of what was missing in our current tools and methods. While it is clear that linguists have done pretty well at using appropriate tools for transcribing and annotating text, and building repositories to provide long-term citation and access to the material, there is still a long way to go.

The long road to language resources—CLARIN

CLARIN, the ‘Common Language Resources and Technology Infrastructure’, is a European initiative to support the creation, curation and exploration of language material for research purposes and for as broad an audience as possible. The stated aim is that you should not need to be a technical expert to use the corpora, lexica and annotations that are targeted in CLARIN.

It is part of the European Research Infrastructure Consortium (ERIC). This is a huge project, with a budget of some €104 million. CLARIN-D is the German section of CLARIN and it recently had its 2-year showcase, which I was able to attend (see current activities at http://clarin-d.net/de/aktuelles/). Given that these are the first two years of a long-term project, it has clearly achieved a great deal already, and certainly more than can be glimpsed in a short blog post.

This is part of a ‘roadmap’ process that actually leads somewhere, unlike the Australian version I reported on earlier that appears to have cost hundreds of thousands of dollars only to have been abandoned even before it was published.

Print on demand, again

In an earlier post I talked about getting texts from Toolbox into books for use in the language community. The print-on-demand service I was so enthusiastic about, and which I pointed to for copies of my books, has now closed, having fallen victim to a change of bookshop ownership at Melbourne Uni. After talking with Manfred …

Exploring data from language documentation

The workshop ‘Exploring data from language documentation’, organised by Kilu von Prince and Felix Rau (May 10/11 2013), included a number of interesting presentations, which can be downloaded here: http://www.zas.gwz-berlin.de/1701.html

I talked about some gaps in the current language documentation workflow and tools that could help fill them, in particular ExSite9 for improving metadata collection, and EOPAS for presenting text and media online for citation and verification.

Christian Chanard and Amina Mettouchi showed a hybrid version of Elan they have developed that allows parsing and morphological labeling, as well as another tool that allows web searching of Elan files: http://corpafroas.tge-adonis.fr/tools.html

PARADISEC’s decade celebration conference

Announcing the conference “Research, records and responsibility (RRR): Ten years of the Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC)”

Dates: 2nd-3rd December 2013
Venue: University of Melbourne, Australia
Keynote speaker: Shubha Chaudhuri, Associate Director General (Academic), Archives and Research Centre for Ethnomusicology, American Institute of Indian Studies, Gurgaon, India

For details …

Fieldwork helper – ExSite9

ExSite9 is an open-source cross-platform tool for creating descriptions of files created during fieldwork. We have been working on the development of ExSite9 over the past year and it is now ready for download and use: http://www.intersect.org.au/exsite9 and https://github.com/IntersectAustralia/exsite9/wiki/Install-packages

ExSite9 collects information about files from a directory you have selected on your laptop, and presents it to you onscreen for your annotation, as can be seen in the following screenshot. The top left window shows the filenames, and the righthand window shows metadata characteristics that can be clicked once a file or set of files is selected. The manual is here: http://bit.ly/ExSite9Manual

Researchers who undertake fieldwork, or capture research data away from their desks, can use ExSite9 to support the quick application of descriptive metadata to the digital data they capture. This also enables researchers to prepare a package of metadata and data for backup to a data repository or archive for safekeeping and further manipulation.
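The workflow described above (list the files in a chosen directory, apply descriptive metadata to a selection of them, then write out a package description) can be sketched in a few lines of Python. This is only an illustration of the idea, not ExSite9's own code or file format: the function names, the record fields and the JSON manifest are all assumptions made for this example.

```python
import json
import os
from datetime import datetime, timezone


def scan_directory(root):
    """Walk `root` and return one record per file, sorted by relative path."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            stat = os.stat(full_path)
            records.append({
                # Relative path with forward slashes, so it reads the same on any platform.
                "path": os.path.relpath(full_path, root).replace(os.sep, "/"),
                "size_bytes": stat.st_size,
                "modified": datetime.fromtimestamp(
                    stat.st_mtime, tz=timezone.utc
                ).isoformat(),
                # Descriptive metadata is filled in later by annotate().
                "metadata": {},
            })
    return sorted(records, key=lambda record: record["path"])


def annotate(records, selected_paths, **fields):
    """Apply the same descriptive fields to every selected file (hypothetical helper)."""
    selected = set(selected_paths)
    for record in records:
        if record["path"] in selected:
            record["metadata"].update(fields)
    return records


def write_manifest(records, manifest_path):
    """Write the records to a JSON manifest (an assumed format, for illustration only)."""
    with open(manifest_path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, ensure_ascii=False, indent=2)
```

A typical use would be to call `scan_directory` on a fieldwork folder, call `annotate` once for each group of files that share a speaker or a recording session, and then call `write_manifest` to save the result alongside the data before backing it up.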

Scholars in the Humanities, Arts and Social Sciences (HASS) typically need to organise heterogeneous file-based information from a multitude of sources, including digital cameras, video and sound recording equipment, scanned documents, files from transcription and annotation software, spreadsheets and field notes.

The aim of this tool is to facilitate better management and documentation of research data close to the time it is created. An easy-to-use interface enables researchers to capture metadata that meets their research needs and matches the requirements for repository ingestion.

Digitising Mali’s cultural heritage — Simon Tanner

Simon Tanner has a blog post on his experience of working with various manuscript collections and the tragic destruction of potentially thousands of manuscripts from the New Ahmed Baba Institute building in Mali: “I have worked with manuscripts for over 20 years now; as a librarian, academic and as a consultant helping others to digitise …

Counting Collections

As will be clear to regular readers of this blog, we are concerned here to encourage the creation of the best possible records of small languages. Since much of this work is done by researchers (linguists, musicologists, anthropologists etc.) within academia, there needs to be a system for recognising collections of such records in themselves as academic output. This question is being discussed more widely in academia and in high-level policy documents as can be seen by the list of references given below.

The increasing importance of language documentation as a paradigm in linguistic research means that many linguists now spend substantial amounts of time preparing corpora of language data for archiving. Scholars would of course like to see appropriate recognition of such effort in various institutional contexts. Preliminary discussions between the Australian Linguistic Society (ALS) and the Australian Research Council (ARC) in 2011 made it clear that, although the ARC accepted that curated corpora could legitimately be seen as research output, it would be the responsibility of the ALS (or the scholarly community more generally) to establish conventions to accord scholarly credibility to such products. Here, we report on some of the activities of the authors in exploring this issue on behalf of the ALS and discuss issues in two areas: (a) what sort of process is appropriate in according some form of validation to corpora as research products, and (b) what are the appropriate criteria against which such validation should be judged?

“Scholars who use these collections are generally appreciative of the effort required to create these online resources and reluctant to criticize, but one senses that these resources will not achieve wider acceptance until they are more rigorously and systematically reviewed.” (Willett, 2004)
