This blogpost comes to you from Salt Lake City at the University of Utah, thanks to the Center for American Indian Languages, which is co-hosting a workshop for the Endangered Languages Information and Infrastructure (ELIIP) project with Linguist List (organised by Lyle Campbell, Helen Aristar Dry and Anthony Aristar). The project is intended mostly for the specialist, but there’s an interesting push to reach out to the general public: if they don’t understand what we do, they won’t support it. Cute and less cute facts help in conveying this; more on this later.
A thousand flowers on endangered languages are blooming on the web, from Wikipedia to blogs on particular languages to the language resources catalogued by libraries. Helen Aristar Dry suggested that users want to view the whole flowerbed from a convenient vantage point. That’s the II of ELIIP: do we need a comprehensive catalogue/database/website/portal of endangered languages?
So suppose Jane LUser does a Google search for ‘Ossetian language’.
Top hits
Not until page 2 do I get:
- OLAC, the major US-based data harvester on languages. This links to Ethnologue, and to Linguist List, and then to OLAC’s own report, which contains information on language resources including the WALS Online Resources for Ossetic, and links back to LINGUIST List Resources for Osetin
In the first two pages I do NOT get to:
- the major European language documentation resource CLARIN’s link to the TITUS Ossetian corpus
- Ethnologue, run by SIL, which has information on speakers, dialects and geography, and is the closest thing we have to a worldwide resource.
- The World Atlas of Language Structures Online, which has lots of comments on typological features, and references to documentation on the language (but not to the TITUS corpus).
- Linguist List site by subject language, which for Ossetian reports one linguist found (Erschler, David, Independent University of Moscow) and adds that “There are materials linked to Avestan, Yagnobi, which are closely related to the language you selected. You can view this information by clicking on their name above.”
- the Linguist List site by language or family, which gets various names (Osetin, Ossete, Ossetic, Ossetian), ISO code, family information and links to
- Multitree family trees
- a map (LL-MAP) which then links to
- the description in Ethnologue,
- any interlinear texts in the ODIN database (“We’re sorry, no records for language code ‘OSS’ can be found in ODIN”)
- The World Atlas of Language Structures Online
- UNESCO Interactive Atlas of the World’s Languages in Danger (which has it under Ossetic only)
So the lack of a simple portal with a high page rank is why linguistics departments get rung up by people looking for basic information. We need a portal because:
- speakers of the languages want to get stuff on them
- so do researchers
- We don’t know which languages are endangered, what interesting typological traits they may have, what projects are underway, who works on them
- Existing data structures on languages can’t easily be integrated with other data structures on languages
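To make the integration problem in the last point concrete, here is a minimal sketch of what a portal would have to do: pull records from several catalogues, join them on a shared language code, and index every name variant so that a search for any of them lands on the same entry. Everything here is an assumption for illustration: the catalogue names, the field names and the listed resources are made up, and no existing site exposes its data in this shape. The only details taken from above are the name variants and the code “OSS”.

```python
from collections import defaultdict

# Hypothetical records, one per source. Field names, source names and
# resources are illustrative only; no real catalogue uses this layout.
RECORDS = [
    {"source": "catalogue_a", "code": "oss", "name": "Ossetic",
     "resources": ["grammar notes"]},
    {"source": "catalogue_b", "code": "OSS", "name": "Osetin",
     "resources": ["map"]},
    {"source": "catalogue_c", "code": "oss", "name": "Ossetian",
     "resources": ["corpus", "map"]},
]


def merge_by_code(records):
    """Group records from different sources under one normalised code."""
    merged = defaultdict(
        lambda: {"names": set(), "resources": set(), "sources": set()}
    )
    for rec in records:
        # Sources disagree on case ("oss" vs "OSS"), so normalise first.
        entry = merged[rec["code"].lower()]
        entry["names"].add(rec["name"])
        entry["resources"].update(rec["resources"])
        entry["sources"].add(rec["source"])
    return dict(merged)


def build_name_index(merged):
    """Map every known name variant (case-insensitively) to its code."""
    index = {}
    for code, entry in merged.items():
        for name in entry["names"]:
            index[name.casefold()] = code
    return index


merged = merge_by_code(RECORDS)
name_index = build_name_index(merged)
```

Looking up `name_index["osetin"]` or `name_index["ossetic"]` then leads to the same merged entry. The hard part in practice is not this join but agreeing on the shared key and on whose data wins when sources conflict, which the sketch deliberately ignores.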
We do need a window on the flowerbed. A similar metaphor is the ‘virtual language observatory’ – the CLARIN project’s label for their resource discovery portal. BUT both metaphors obscure other important factors – speakers live their languages rather than observe them (the languages are their flowers), and speakers and researchers want to contribute to the assembly and verification of resources (?cultivating the flowers?).
The workshop’s task then is working out the structure of the flower bed.
- What’s in it? (we have huge wishlists, we’re torn between accuracy and presenting competing hypotheses)
- How do people learn about it? (E.g. high page rank, simple interface and RSS feeds updating you when a new resource is added to languages you follow)
- Is it run as a data manager or a data harvester and aggregator?
- And whatever way, how do people contribute material? WALS has a nice feature whereby users can comment on any data-point – and they receive LOTS of comments, many of them useful.
- How is it moderated and verified?
In between working groups looking at these ideas, and looking out the window at the high snow-capped mountains, we heard about all sorts of interesting ideas, projects, and tools, both in the program and in breaks. Here are some that struck me.
Not all apparently healthy languages are safe, as Arienne Dwyer (University of Kansas, Committee on Endangered Languages and Their Preservation (CELP)) showed. There are around 10 million speakers of Uyghur. But recent changes in government education policy, aimed at increasing access to the dominant language, Chinese (Putonghua), have changed Uyghur from a language of instruction to a subject language. This devalues Uyghur and is likely to lead to a reduction in use. She and colleagues have been preparing instructional materials in Uyghur.
How the context of the speakers shapes the work (contra a ‘Noah’s Ark approach to language documentation’) was discussed by Tony Woodbury, with respect to Chatino speaker-linguists’ work on the importance of land-related knowledge and of verbal art.
Alice Harris gave a lovely paper on Exuberant exponence in Batsbi describing psycholinguistic field testing of 40 native Batsbi speakers (their average age was 67 and they constitute 20% of all speakers – it’s endangered!) to see if having lots of exponents of the same class marker helped processing and word recognition. Short answer: No.
A nice example of a web dictionary is the Dictionary of the Archi (Daghestanian) Language (sounds and pictures) organised by Greville G. Corbett and Marina Chumakina.
Finally, access to information on languages isn’t enough: Carol Genetti and Margaret Florey are proposing a consortium on training in language documentation and conservation. Brian Joseph gave a neat description of a capstone general education unit on Language endangerment and language death, with no knowledge of linguistics assumed.
More tomorrow.
As someone watching this from the sidelines, it seems to me there is an obvious solution to the so-called “problem” of a “single portal” for language materials that this meeting is thinking about, and it’s in your commentary. The top hit is Wikipedia so take advantage of that and edit the Wikipedia entries to improve them in the ways that your group wants, and add all the links to other stuff, like those you listed. Give this as a task to a bunch of linguists and see what the result is — combine Wikipedia drawing power with improved content and whamo there’s your solution!
It’s ironical to me that you should choose Ossetic as an example language but not list ANY sources in Russian (like http://ironau.ru/index.html) or Turkish, the languages that Ossetic speakers might be accessing the internet in. Is there any discussion of multilingual interfaces in the good old US of A where you are meeting? Interestingly, Linguamon in Barcelona, which aims to provide information on language diversity to the general public, operates its website in 21 languages (that’s 21, not 2 or even 1, and includes Basque and Tamazight).
Oh, and Wikipedia exists in languages other than English too.
Just a thought.
Snap! WIKI was intensively discussed here yesterday. And localisation on the one hand, and aggregation of data from languages other than English on the other, are being talked about right now.
Peter, the irony is Google’s fault, not Jane’s. Search for the English language name, get English hits. Search for осетинский язык and you get the Russian wiki site, http://www.triadna.ru/dictionary/lang/ossetin.htm, and a bunch of other things. No highly ranked page lists the language name in multiple languages.
We spent some time talking about localizations for content and the interface (Russian, Arabic, “Chinese”, Spanish, French and Indonesian came up).
Claire’s point holds in general: it’d be great if Google provided alternatives to Ossetian in different languages and scripts. (Using the ‘any language’ feature doesn’t do it for you.)
BUT to my shame I did skip over a .ru site in English on the first page of hits, ‘Minority languages in Russia’ http://www.peoples.org.ru/eng_oset.html, the corresponding Russian version of which has quite an extensive bibliography (but different population figures from Ethnologue).
And none of us have mentioned Iron aevzag, the Ossetian wikipedia entry in Ossetian.