In a few weeks’ time, reports and PowerPoint slides from the ELIIP workshop will be up on the ELIIP website for discussion.
I took away memories of the beauty of the mountains and salt lakes, the strange comfortableness of bison, and a slight increase in knowledge about the Latter-day Saints – how can one not feel sympathetic to the nineteenth-century Welsh Mormon who set sail for Zion equipped with an English and Welsh dictionary?
There’s a lively group of people at the University of Utah working on Native American languages (from Brazil north to Ojibway). One project that especially struck me was a Shoshone outreach program. Several Shoshone were at the ELIIP workshop. Last year ten Shoshone high school students came to the Center for a six-week summer camp funded by a donation from a local mining company. In the program they learned some Shoshone language, as well as crafts from Shoshone elders. The students worked as paid interns on language documentation and on preparing language learning material in Shoshone. It was a great introduction, not only to language documentation but to university life generally. What a good idea!
Back to the workshop. Yes, we need something like ELIIP – a list of endangered languages with information about them and pointers to other sources about them. But it won’t work unless it is aimed at more than just linguists. And it must point to rich information. And it must be inclusive. And it must be simple to use. And, since there is very little money around, it must be designed to keep maintenance costs as low as possible.
Summing up, I’d say the workshop allowed various ideas to gel about what the one-stop shop for languages would look like. I thought the most important were:
- Avoid duplication. A lot of work has already gone into collecting material. Don’t waste it.
- Data-freshness. People will be drawn to the site if they believe that the data is fresh, rich and reliable.
- …which comes at a cost. Whatever’s built has to be updatable and maintainable at minimal cost; even with a web crawler, maintaining links is beyond many sites.
- Buy-in. If it’s to work, lots of communities, archives and linguists need to be able to add material easily and to feel that it belongs to all of us.
- Simple interface for searching AND for uploading. This means paying for good design and testing with a range of users. Maybe there’ll be several interfaces for different types of user.
- Wish-things
- There was a strong swell of opinion in favour of digital archives where people could deposit data files and update information easily.
- Snapshots in time. People will want to know what a language was like 10 or 20 years ago – how many speakers it had, whether children spoke it, and so on (see the sketch after this list).
- Localisation. How do we translate the material into other languages for countries where outreach on the importance of helping speakers keep their languages is really needed? Spanish, Chinese, Russian, French and pidgins may be the main lingua francas for some of these areas.
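On the snapshots idea, here is a minimal sketch of how dated vitality observations might be kept alongside a language record so that nothing is overwritten. The field names are invented for illustration, not a proposed schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VitalitySnapshot:
    """One dated observation of a language's vitality (hypothetical fields)."""
    year: int
    speakers: Optional[int] = None            # estimated speaker numbers, if known
    children_speaking: Optional[bool] = None  # were children still acquiring it?
    source: str = ""                          # who reported this, and where

@dataclass
class LanguageRecord:
    iso_639_3: str                            # e.g. "shh" for Shoshoni/Shoshone
    name: str
    snapshots: list = field(default_factory=list)

    def as_of(self, year: int) -> Optional[VitalitySnapshot]:
        """Return the most recent snapshot at or before the given year."""
        past = [s for s in self.snapshots if s.year <= year]
        return max(past, key=lambda s: s.year) if past else None
```

The point is simply that appending a new snapshot, rather than replacing the old figures, is what makes the 10- and 20-year look-backs possible.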
A divide was proposed by Gary Simons between curated web services (where people create data and people manage that data) – like Wikipedia – and aggregating web services (where automatic harvesters gather data from archives, libraries, etc.) – like Google. I think the consensus was that we needed both – linking to information that is out there, and filling in the gaps.
Aggregation means work for existing archives as well as for ELIIP. If an archive’s data is to be harvested, it has to be accessible to data harvesters. And access has many levels – first, knowing that it is there (e.g. via a URL which builds in the ISO code for the language). Getting language cataloguing information (metadata) into a shape that is harvestable is hard and time-consuming, as was noted by researchers and archivists wrestling with OLAC (http://www.language-archives.org/OLAC/metadata.html) and IMDI metadata. And then, if you want to go beyond a link, there’s extracting the information from the page itself. (I liked the way WALS (The World Atlas of Language Structures) allows you to go to actual references on Google Books or equivalent.)
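Since OLAC came up: OLAC metadata is exposed for gathering over the standard OAI-PMH harvesting protocol. As a rough illustration of what the harvesting side of aggregation involves, here is a minimal Python sketch that pulls the first batch of records from an archive’s endpoint and extracts the language codes, assuming the OLAC 1.1 record layout. The endpoint URL is a placeholder, and a real harvester would also follow resumption tokens and handle errors:

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespaces in OLAC records: the OAI-PMH envelope, Dublin Core elements,
# and the OLAC extension whose olac:code attribute carries ISO 639 codes.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"
OLAC_CODE = "{http://www.language-archives.org/OLAC/1.1/}code"

def harvest_language_codes(base_url):
    """Yield language codes from the first page of an archive's OLAC records."""
    url = base_url + "?verb=ListRecords&metadataPrefix=olac"
    with urllib.request.urlopen(url) as resp:
        root = ET.fromstring(resp.read())
    for record in root.iter(OAI + "record"):
        for el in record.iter(DC + "language"):
            code = el.get(OLAC_CODE)   # e.g. "shh" for Shoshoni
            if code:
                yield code

# Hypothetical endpoint -- substitute a participating archive's real base URL:
# print(sorted(set(harvest_language_codes("http://archive.example.org/oai"))))
```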
What kind of interface? It has to be simple – for searching, for uploading data, and for commenting on existing data. We suggested a basic interface (possibly offered by another organisation, e.g. Sorosoro). Doug Whalen suggested a hinged model – one underlying database which could be presented as a UNESCO list for policymakers, as a view for ELIIP researchers, as a view for the general public, and as lots of community portals for communities.
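To make the hinged model concrete, here is a tiny sketch of the idea: one record, several renderings, one per audience. The record fields, vitality label and views are all invented for illustration:

```python
# One underlying record, several "hinges": each audience gets its own rendering.
record = {
    "iso": "shh",
    "name": "Shoshoni",
    "vitality": "severely endangered",   # illustrative label, not an official rating
    "sources": ["(placeholder reference)"],
    "learning_materials": ["http://example.org/shoshoni-phrasebook"],
}

def policymaker_view(rec):
    # UNESCO-style one-liner: name, code, vitality label.
    return f"{rec['name']} ({rec['iso']}): {rec['vitality']}"

def researcher_view(rec):
    # Researchers see the full record, sources included.
    return rec

def community_view(rec):
    # A community portal might lead with learning resources.
    links = "\n".join(rec["learning_materials"])
    return f"Resources for {rec['name']}:\n{links}"

print(policymaker_view(record))
```

The attraction is that the expensive part – keeping the data fresh – happens once, in one place, while each audience sees only what it needs.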
On community portals, I was impressed by the way the DoBeS people can semi-automatically generate community portals from the material in their archive – an advantage of having highly structured data in the first place. One example is the Dane-zaa Community Portal, built to facilitate the use of the archive collected by the DoBeS team together with the elders. An interactive community portal that community members could customise and manage would be great.
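The general trick behind semi-automatic portal generation is just “structured data in, portal page out”. A minimal sketch, with invented metadata fields (a real archive schema, e.g. IMDI session metadata, is far richer):

```python
from string import Template

# A trivial page template; real portal generators use proper templating engines.
PAGE = Template("""<html><body>
<h1>$language community portal</h1>
<ul>
$items
</ul>
</body></html>""")

# Hypothetical archive sessions -- in practice these come from structured metadata.
sessions = [
    {"title": "Elders' narrative", "url": "http://archive.example.org/s1"},
    {"title": "Song recording", "url": "http://archive.example.org/s2"},
]

items = "\n".join(
    '<li><a href="{url}">{title}</a></li>'.format(**s) for s in sessions
)
print(PAGE.substitute(language="Dane-zaa", items=items))
```

If the metadata is clean and consistent, a portal per language becomes a rendering job rather than a hand-building job.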
Simple interactivity is important – free-form comments are easier than web forms, but the information has to go to the right people, and this can be tricky when there are thousands of different right people for different questions. Hans-Jörg Bibiko brought up the WALS database, where thousands of comments can be made on any datapoint or set of datapoints. He estimates that roughly 60% of commenters are linguists, 30% are noise and 10% are native speakers. It is a blog system. The 65 authors of WALS are linked in via RSS feeds, and because the chapters are theirs they have some incentive to correct mistakes. Having public tracking keeps the administrator honest. And it turned out to be simple to implement.
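The routing problem is worth spelling out: every comment arrives against some datapoint, and each datapoint has its own responsible editor(s). A sketch of the mapping, with invented identifiers and addresses:

```python
# Map each datapoint to whoever is responsible for it (all values hypothetical).
EDITORS = {
    "wals/chapter/81": ["chapter-author@example.org"],
    "language/shh": ["regional-editor@example.org"],
}

def route_comment(datapoint, comment):
    """Return who should be notified about a comment; fall back to the site
    admins so nothing silently disappears."""
    return EDITORS.get(datapoint, ["admin@example.org"])

print(route_comment("wals/chapter/81", "Is this datapoint right?"))
```

In the WALS-style setup, the per-author RSS feed effectively plays this role: each author subscribes to comments on their own chapters.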
Wikipedia cropped up many times. It has superb page rank and data freshness. BUT… a number of drawbacks were noted, many by Doug Whalen. Regular Wikipedia doesn’t support heaps of links, and for data richness we need that. It doesn’t go for original research (it wants citations), and people are loath to put work into something which some non-specialist can then change. Only 700 or so languages have pages, and who would create the others? Cold hard truth crept in here: researchers live off recognition. Getting them to maintain a language site requires some incentive other than a sense of virtue. So there needs to be some minimal recognition of people who do contribute information – whether by authoring chapters as in WALS, or by having a public list of regional editors which people can then cite as evidence of community/research service.
Wikipedia has only one level of access, with widely varying types of information, and so it can be rather daunting for Joe User. Various suggestions were batted around. One was having basic and advanced interfaces. Another was creating a template for language entries in Wikipedia, with automatic links to ELIIP and Ethnologue, and suggestions of archives. A promising line of enquiry for more reliability is a more controlled type of wiki, such as Wikispecies (also easier to translate, as you can see from the language list, which includes Eald Englisc).
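The template idea could be as simple as a stub generator: feed in a language name and code, get back a skeleton entry with the outbound links pre-filled. A hypothetical sketch – the Ethnologue URL pattern and the ELIIP link are illustrative, not confirmed:

```python
# Hypothetical stub generator for the "template for language entries" idea.
def language_stub(name, iso):
    return (
        f"{{{{Infobox language\n"
        f"| name = {name}\n"
        f"| iso3 = {iso}\n"
        f"}}}}\n"
        f"* [http://www.ethnologue.com/language/{iso} Ethnologue entry]\n"
        f"* ELIIP entry (URL scheme still to be decided)\n"
    )

print(language_stub("Shoshoni", "shh"))
```

Generating the 6,000-odd missing stubs this way would at least solve the “who would create the others?” problem, leaving humans the job of enriching them.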
Anyway, many ideas – watch ELIIP’s space!
Maybe Scholarpedia is a better model than Wikipedia?
Several people have also brought up Wikispecies as a potential model, both for the taxonomic organisational aspect and for content.
Thanks for this summary, Jane! For those of us looking on from afar, it’s much appreciated.