Today I have a story to share that involves intellectual property violations, taking materials without attribution from a copyrighted dictionary of an Australian indigenous language, and publication of a book that contains such bad scholarship, ridiculous claims, nonsense, and stupid howlers that it is actually funny.
Over the past couple of years I have presented sessions at various workshops and training courses (most recently at a grantee training workshop held at SOAS 11-17th June) on the topics of "ethics, intellectual property rights and copyright". I have learnt a bit about copyright and moral rights in the process – my PowerPoint slides for the most recent presentation can be found here.
One of the issues that is often raised by fieldworkers and researchers during these presentations can be summarised as: "I don’t want to make my data publicly available because someone will steal it and publish it under their own name". I usually reply in terms of the low likelihood of such an event happening (as Andrew Garrett said at an archiving workshop at the January 2008 Linguistic Society of America annual meeting (and I paraphrase): "Sorry to tell you this, but actually no-one wants to steal your data") and the protection afforded by copyright and moral rights (mentioning the World Intellectual Property Organisation and various other lobby groups).
Well, unfortunately, I have to change my tune, folks, because it has happened to me. A subset of materials which I have published in book form (and deposited as Word .doc files with the ASEDA archive) and co-published with David Nathan on the web as the Kamilaroi/Gamilaraay Web Dictionary, all of which are clearly marked as copyright, have been reproduced without attribution or recognition of our authorship both on a website and in a recent book publication. Fortunately, this has been done in such a way as to reveal an ignorance on the part of the violator that is truly laughable. Sadly, this individual is attempting to profit financially from both our intellectual property and that of an Australian Aboriginal group, along with potentially damaging the trust we have built up by years of work with the community.
The story goes like this.
Professor Philip M. Parker PhD, Professor of Marketing at INSEAD "The Business School for the World" (based in Fontainebleau, France) has established a website called Webster’s Online Dictionary, using the term "Webster’s" which is now out of copyright. Note that this has nothing to do with Merriam-Webster, a highly reputable dictionary-making firm.
The blurb about Parker’s website goes as follows:
"The goal of Websters Online Dictionary is to give all people of the world free access to a complete mapping of all known words to and from all written languages. In fulfilment of this goal, Websters Online Dictionary also offers as much information as possible for each word, including definitions, translations, images, trade name usage, quotations, and encyclopedic knowledge."
On the website it is possible to access wordlists in a wide variety of languages (see Note 2).
In addition to the web materials, Parker also publishes books which are marketed via Amazon and Target (in the US) and which purport to be "thesaurus dictionaries" of a range of languages. David Nash, who brought Parker’s website to my attention, pointed out to me that one of these languages is Kamilaroi (more commonly and correctly known as Gamilaraay), a language traditionally spoken in north-west New South Wales that I have been working on since 1972 (an article I wrote on the history of research on Gamilaraay appears in the book by Bill McGregor that I blogged about recently).
So, I thought, how interesting to discover that there is a new Kamilaroi English Thesaurus Dictionary that has just been published, and curious to see its contents, I duly placed an order with Amazon.co.uk, paid UK£17.38 (that’s roughly A$35) and waited for two weeks for the book to arrive.
And what an unusual and funny little piece of work it is. For my A$35 I got a little A5 book of 65 pages that contains a Preface, Kamilaroi to English Thesaurus, Index of English Subjects to Kamilaroi Subjects, Vocabulary Study Lists, and an Index. Nowhere is the identity or location of the Kamilaroi language given, nor the fact that it belongs to the Gamilaraay people of northern New South Wales. There is nothing about the spelling of Kamilaroi words, though the spelling system follows exactly that established in my publications.
The Preface begins with a truly silly statement (probably inserted from some boilerplate):
"This is an English thesaurus designed for Kamilaroi speakers who wish to better understand the ambiguities and richness of the English language"
Sorry, mate, but every Gamilaraay person is perfectly at home in English and doesn’t need your little book to help them, certainly not at A$35 a toss, and not based on "444 Kamilaroi subject words".
The "Kamilaroi to English Thesaurus" is a list of 444 Gamilaraay headwords followed by a single word English gloss, or in some instances numbered multiple glosses. This information is taken straight from the web dictionary (or from my Gamilaraay Reference Dictionary published in 1993). This looks like stealing, you might think, but under a strict interpretation of copyright (which Prof Parker has no doubt checked with a lawyer) this is not "creative work". It is not possible to copyright common knowledge such as words and meanings. Unfortunately for Parker, some of the quoted forms, like muRumuRu on page 11 are creative works since they are reconstitutions which I have posited on the basis of 19th century published and unpublished amateur recordings (as explained in the preface of my dictionaries — note that the orthographic R is not a Gamilaraay sound but a cover term for where I could not determine whether the source represented a flap rr or a continuant r). Now that is copying of creative work without attribution, in my view.
Following the gloss there is a string of "synonyms" and, for some entries, "antonyms" which have apparently been computer generated by a program that is seeded by the gloss. This is where the author’s bad scholarship truly comes into play. Here is the entry for bindaya, a word borrowed into Australian English as "bindieye"; both it and the word ‘burr’ in the sense of a prickly weed are clearly missing from Professor Parker’s computer program lexicon:
"bindaya burr; synonyms (n) flash, beard, brogue, burring, bramble, drawl, enunciation, inflection, intonation, pronunciation, twang, clinker (v) bur, clank, clink, jangle, pipe, creak, grate, jar, snub (adj) awn, catchweed, cleavers, clivers, goose, grass, hackle, hairif, hatchel"
Look out for those slurred bindieyes when you open your mouth next!
The remaining 443 entries are more-or-less of this type – a useless list of English words that bear little or no connection to the meaning of the Gamilaraay headword and that do not respect its morpho-syntactic category (headwords that contain a hyphen plus conjugation marker, like baaya-li ‘to chop’, are verbs in Gamilaraay but they still get noun glosses in this book). I was interested and surprised to learn that there are aardvarks and pangolins in Australia (listed as synonyms in the entry for bigibila ‘echidna’). And so it goes for 19 pages.
The following "Index of English Subjects and Kamilaroi Subjects" is simply a listing of the English glosses with Gamilaraay headword repeated from the first 19 pages. Again this material is taken from my publications. The "Vocabulary Study Lists" are listings of Gamilaraay headwords with English gloss classified into groups as "Verbs", "Nouns" and "Adjectives" (again done with poor scholarship — ‘black swan’ baRamal (a reconstitution, again) is listed under "Adjectives"). The book ends with a 34 page index giving page references for the English words that appear in all the preceding sections.
When I started reading this sad little book I felt angry, but when I got to the end of it I realised it is so badly done that I had to laugh.
So where does this leave us in relation to copyright and publication without attribution? I believe that what Parker has done is an aberration and that true scholars do not disrespect other scholars’ work in this way (nor disrespect the communities whose languages we seek to document). Also, by making the Gamilaraay dictionary available on-line and depositing the materials with an archive, we are able to demonstrate clearly that our work has been made use of, and intellectual property rights violated. So my advice is do go ahead and publish your work — that’s how you establish copyright and get to assert moral rights. If or when they are abused then point this out publicly, as I have done here.
Oh, and don’t waste your money buying copies of Parker’s terrible book. It just encourages him.
Note 1: Thanks to David Nash for telling me about Parker’s website and books and for corrections and comments on an earlier version of this posting. Thanks also to David Nathan both for digging around inside Parker’s website and for correcting and commenting on an earlier draft of this blog. Neither David can be held responsible for the opinions expressed here.
Note 2: The terms of use state (rather ironically as it turns out):
"Students: If you are a student and want to use some of the sites content for a classroom assignment please feel free to cut and paste sections off of the web site, and paste them directly into your document. All of the pages are Microsoft Word compatible. Remember to include the "Source" at the bottom of the tables so your teachers cannot accuse you of plagiarism."
On this page we find:
"Copyright Notice: This site and its contents are Copyright 2004, Philip M. Parker and Webster’s Online Dictionary (websters-online-dictionary.org). All rights reserved. All contents cited with permission, with license, or quoted under fair use doctrine remain the intellectual property of their respective originators. Other contents are in the public domain, and are used with attribution."
Pity that what is good for the goose isn’t so good for the gander!
Parker has also published Webster’s Kamilaroi to English Crossword Puzzles: Level 1 and Webster’s English to Kamilaroi Crossword Puzzles: Level 1, each costing A$15. I have not wasted my money buying these two books, but I guess the content is also taken from the Gamilaraay Web Dictionary.
I stand corrected, Peter!
Actually I still believe my main point – sadly few linguists are interested in using, let alone misusing, unpublished language data deposited in archives – but this amazing case does highlight a danger of publication, probably especially web publication.
It’s hard to believe that the guy can’t be taken to task somehow for something. According to Amazon he has 85,764 books (that’s right, eighty-five thousand and change). The many language books are bad enough, including not just dictionaries but also items like the “Kamilaroi” crossword puzzle book, but the long series of medical books could surely actually be dangerous to someone’s health.
All but 3 of his 85,764 books are the product of data harvesting and automatic book publishing. I’ve done a bit of background research on this guy and plan to blog about it when I get the time to finalise the post (ALS is taking its toll).
I have bought each of the Wageman [sic] / English crossword books, and despite there being absolutely no information about the language, the speakers, or previous research or anything, they’re not such a bad resource for teaching kids. Although, garbage in, garbage out; if the wordlist that he harvests isn’t ready for publication, the result could be a sub-standard orthography, incorrectly glossed words, whatever.
While it’s a risk to my own legal position, I’ll post a scanned pdf of one of the pages.
Peter is there any lobbying action you could suggest for us concerned ppl to take? Even just an email to him to tell him off?
Wamut – some people have tried this already. Stephen Wilson, whose Wagiman Online Dictionary was the source for materials that Parker published as a “Wageman” print dictionary and crossword puzzles wrote to him and got the following response:
As I noted in my posting, Parker is adopting a strictly legal interpretation of copyright that enables him to view his work as not violating the law regarding copyright materials. So you could write to him to complain but I personally doubt that he will take much notice. It seems that he’s got a little cottage industry going and won’t be diverted by linguists pointing out the error of his ways, but I could be wrong.
I don’t think that this counterexemplifies Andrew’s point at all: this guy is not a scholar, he’s a professor of marketing. That’s like being an instructor in pimping, but with better liquor.
With regard to the question of whether the inclusion of reconstructions makes a difference legally, I am not so sure. The basic legal principle involved is that information cannot be copyrighted, only the expression of that information. The reason that you can publish your own version of the telephone directory, for example, is not that no creativity goes into constructing the numbers and pairing them with names and addresses. Rather, it is that the numbers, names, and addresses are uncopyrightable information, so a publication of them may only be copyrighted insofar as it presents them in a sufficiently creative way. If, for example, you were to publish a telephone directory in which each entry consisted of a limerick, it would be copyrightable due to the creativity of the limericks.
So my inclination is to say that due to the content/expression dichotomy in copyright law, the fact that some of the entries in your dictionary are your reconstructions rather than directly recorded words makes no legal difference. However, there are some decisions about “fictional facts” that might support the opposite conclusion. If you really want to know, short of a decision by a court you need a true copyright guru to look into it.
Bill – the only way really to test whether reconstructions (or reconstitutions based on early wordlists) are considered to be “creative works” is in court, I suspect, and I have neither the funds nor the time to do that. Probably all we can do is yell about how ‘unacademic’ and ‘non-collegial’ Prof Parker is.
Would you be willing to publicise Prof Parker’s exploits on Language Log which would reach a wider audience than this blog or Matjjin-nehen? We know that materials from various Australian languages have been plagiarised but it may be that other linguists could find their e-published data has been vacuumed up by Prof Parker as well. It may be an issue of interest to the wider linguist audience.
Another tack would be to tell Amazon they are selling bad products. If someone has time to do up a standard email or letter, I’d be happy to send them a copy from me and I’m sure plenty of others will do the same.
Parker did the same thing to our online data for the Cheyenne language. He even included words from another language, Blackfoot, which are not in our online database. See my amazon.com reviews:
http://www.amazon.com/Websters-Cheyenne-English-Thesaurus-Dictionary/dp/0497834677
http://www.amazon.com/Websters-Cheyenne-English-Crossword-Puzzles/dp/0497826089/ref=sr_1_1?ie=UTF8&s=books&qid=1217128806&sr=1-1
I have informed a tribal linguist and suggested that the tribal attorney might want to get involved with the copyright issues. The tribal college owns the copyright to the website as well as a new dictionary which we are readying for publication.
Philip Parker has recently responded to Wayne Leman saying that he hasn’t received e-mails of complaint, and that perhaps they have been filtered by his junk mail filter. You could try sending more emails in the kind of language that won’t get filtered to
Dear Peter, Jane,
I received an email from Wayne Layman this week pointing me to your blog. If you have sent me an email, I am sorry for not responding; his emails were sent to an account that is monitored for volunteers, and I did not see them until now. I inferred that he felt that you may have sent me an email (or two). I do not have records of these (I have checked all folders, including spam folders, but there may be errors in this way of looking). If you sent me an email, can I ask you a favor to resend to phil.parker AT insead.edu. I will be happy to respond.
Cheers,
Phil
Perhaps “Phil Parker”, having discovered this site, might like to actually reply here to the questions raised, rather than simply pointing out his email problems?
Cheers
Rod
Rod,
Absolutely, I am preparing an open letter. I will finish shortly, but am in the middle of travels (I do not scan blogs, so email is best for me).
Best,
Phil
I, for one, eagerly await said open letter.
Before I finalize the letter, just an update. Having emailed all the persons I think are concerned (if you were missed, please let me know), I have told all that I am more than happy to add citations to any book in any language where someone asks, or delist any book that has errors or does not serve a pedagogical purpose (due to there being few or no fluent speakers seeking to learn basic English). This is being done now (it can take a week or two). I have offered to correct any errors, for example, on the crossword puzzles or elsewhere, and allow the linguist to republish, freely distribute or use the titles as they wish – or they can delist the title as well (I have noted that educators are always free to make copies). I am still waiting for a response from one, and want to give a summary and more complete explanation of my activities after I hear back from this person. There have been a number of issues raised (copyrights, versus citations, versus errors, versus formats – thesauri versus crosswords), etc. In a nutshell, language and dictionaries are a hobby of mine and in no way an academic endeavor (the books are not at all scholarly, nor intended to be, and I am not a professional linguist, this we can all agree). I am interested in education (irrespective of language), hence the focus on elementary education. I am happy to read that Aidan Wilson found that, – “The books actually appear to be a pretty good educational resource, assuming that the school in Pine Creek is up to the point of recommencing its Wagiman language programs, of which I’ve only ever seen fleeting bits of evidence of ever having taken place.” My only intention is to create educationally useful material for the k to 12+2 student. I will explain why via amazon, whether there are free versions, etc. in the letter. I agree with Wamut that there is a dearth of bi-lingual materials, and that these may be useful to some people (I am an agnostic, when it comes to format and educational approaches).
I am grateful to amazon for agreeing that I can list low or no volume titles.
I have never earned an income from these activities, which are subsidized by my econometric studies. I had no intention to steal academic credit and in no way wanted to upset. I also did not think that one could be seen to plagiarize using a translation (any definitions or similar explanations are never used). My online dictionary asks for volunteers to help, of which I have had many:
Donate an Electronic Bi-lingual Language File: We are thrilled to accept contributions in bits and bytes. We have been especially grateful to readers who have sent in bi-lingual language files. These must have no copyrights attached, and be in a clean format (i.e. English words or expressions in column#1 and the translation in column #2; Ms Access and/or Excel are preferred file formats but we can handle just about anything). For the moment, we are most interested in languages of indigenous peoples in all regions of the world (including transliterations). We would like to give priority to languages spoken by over 1 million persons, but are happy to accept files from less populous linguistic cultures.
Based on the following reviews (and being listed by various agencies), I have received files from all over (including from academics from major university language departments):
The dyslexicographer
Margaret Marks at Translation Blawg rightly wonders what on earth the Webster’s Online Dictionary (WOD) is all about. Although there is quite a lot of background information available on the site, I decided to find out from its creator Phil Parker. Here’s the score.
A Professor at INSEAD, the European business school, Philip Parker was born dyslexic. This meant he found reading dictionaries – lists of words and their definitions – much easier than sustained prose, which demanded too much time to decipher. So over the past 30 years he has been collecting dictionaries of all kinds. Around the year 2000, large dictionaries on the web started charging for ‘premium’ words of the sort he needed in his research and that really “pissed him off”. So he decided to leverage the definitions he had collected from his own store, borrowed the out-of-copyright ‘Webster’ badge, and started building WOD, which he intends to make the biggest multilingual dictionary site on the web.
He was lucky since he had loads of help from academic and other assistants, benefited from donations of out of print dictionaries and word lists, and was able to finance the whole thing himself. He even uses a firm in Togo to keyboard in content. This summer he hopes to upgrade the site to feature dictionaries covering 600 languages (10% of the world’s current language population), and in the case of existing site languages such as Spanish, he hopes to increase the entry count from around 100,000 to 600,000 entries.
To give global coverage he is working in a sequence of passes. The first pass was to work by time zones, taking a location such as Europe and collecting dictionary materials for all ‘major’ languages. The second pass, now under way, is to include ‘secondary’ languages (say Maltese in Europe). Next year, he plans to start the third pass by incorporating locally endangered languages, using volunteer help where necessary. One technique is this: he donates a computer and a small stipend to missionary children (e.g. for Tarahumara in Mexico) who then create a local language/English dictionary.
What’s next, once he’s got all these bilingual word lists? Create a total lexical linker, whereby you can click from any word to its equivalent in any other language, using English as the underlying pivot language. An “N-dimensional cube of words in every language to every language,” as he puts it, that will by this summer be the world’s largest compilation of language items ever produced. His content currently weighs in at around one terabyte.
How useful is Phil’s site proving? He reckons it is among the top ten sites used to search Arabic words in Arabic script, since the whole hoard has been programmed for Unicode. And because the Webster word is a synonym for ‘dictionary’ for Americans (as Kodak once was for cameras or Google is for search engines) WOD ranks between 5 and 7 on, well, Google for “Webster” out of about 150 “Webster” sites on the web these days. Probably the best way to appreciate the ambition of Phil Parker’s site is to search the term Webster itself, and see the degree of encyclopedic potential – words, images, statistical findings from corpora, sign language versions, et alia multa – that he is trying to pack into what he calls a hobby. But the definitions don’t include a more recent decomposition – web + ster (as in napster) – a linguistic peer to peer resource.
Source: http://www.multilingualblog.com/index.php/weblog/the_dyslexicographer
Péter’s Digital Reference Shelf – April 2005
Title: Webster’s Online Dictionary, Rosetta Edition
Publisher: Philip M. Parker, INSEAD
Cost: Free
Tested: March 18-25, 2005
Webster’s Online Dictionary, Rosetta Edition defies my efforts to write a traditional review. I always try to evaluate and review digital ready-reference sources in this column in a systematic way. For example, I test general dictionaries using a benchmark of about 150 terms that represent a mix of contemporary, formal, slang, archaic, recently coined, foreign, borrowed, technical, medical, scientific, and everyday words. I give a score for each ranging from 0 (no entry) to 5 (perfect entry) depending on the quality of definitions, sample sentences, attributions, usage notes, etymology, print and audio pronunciation help and visual illustrations. I can’t do that with this dictionary.
I put the dictionaries in context, comparing them with alternative sources that I reviewed or at least used extensively. I determine the hit rate, add up the scores, and calculate their average, then compare the numbers with those garnered by other dictionaries in the same league. These scores give me a quantifiable result, such as 85% hit rate in the American Heritage Dictionary (4th edition, 2003 digital update) with a total score of 541 points for 154 words (3.51 average) versus Merriam-Webster’s 10th College edition (2002 digital update), with a hit rate of 75%, total score of 380 points for the same 154 words (2.47 average). Then I look at the typography, the layout, and various software aspects and write my review. It still may not be completely objective, but it is at least systematic and is based on extensive samples.
I can’t follow this process with the Rosetta edition of the Webster’s Online Dictionary. It is as if I were to try to describe a jam session featuring many of the best musicians, vocalists, other artists and performers. You must see it, hear it and feel it.
It is the brainchild of professor Philip M. Parker. His very short biography gives a hint of his lexicographic interest and competence. His affiliation with INSEAD may not impress you as much as it should because the institute is not well-known in the U.S. Suffice it to say that last year it was ranked no. 11 among the executive education programs in the worldwide yearly survey of the Financial Times. Its faculty have published many unconventional, eye-opening and award-winning scholarly articles and books. They may not be booked on TV morning shows and afternoon talk fests (a dubious sign of celebrity in the contemporary culture), but this faculty is certainly a very good company for the unorthodox and scholarly thinkers, doers and projects.
The project is based on the 1913 edition of Webster’s Unabridged Dictionary, but that is like saying that New York is based on New Amsterdam. It has been enhanced by millions of copyright-cleared entries (including images, drawings, book covers, posters and photographs) from both historical and contemporary sources. That’s why you would find definitions and examples for such neologisms as bling, blog and wiktionary.
Parker is not only the instigator, but also the editor-in-chief of the dictionary (although he does not use this term). He has been assisted by contributors. But this is not one of the dime-a-dozen free-for-all wiki projects with contributions without attributions – though the list is not yet complete. (Yes, I know, the dictionary does include entries from Wikipedia.) Definitions and illustrations for the words are included from a variety of sources, and each entry is meticulously acknowledged and, if possible, linked to – just as in the splendid Answers.com service, formerly known as GuruNet, Sling and Atomica, which I have reviewed more than once in this column and will probably do so again.
The editor seems to bear the brunt of the intellectually demanding selections and compilations. No matter how sophisticated computer technology is applied in this project, I doubt that “[t]he dictionary will soon consist of over 400 modern languages, and 10 ancestral languages, with some 30 million individual entries across languages.”
As for the Rosetta qualifier, it’s an obvious homage to the Rosetta Stone, the important cultural heritage from Egypt that included the same decree in three languages and whose deciphering was crucial for translating hieroglyphic text and for learning about ancient cultures. For further details and background about the project check out the About Us page.
I only illustrate here the lay of the land and pinpoint a few of the landmarks. Don’t start by looking up words such as “love” or “money” as the results will be overwhelming. Instead go for the more esoteric words, such as, well Rosetta.
Each word has its own Web page. Most of the pages are very long, but an excellent index, which is always at hand, can help you skip quickly to the sections that interest you the most. The entries start with a traditional definition, etymology notes when appropriate, and dating of first usage. These are followed by definitions from special/subject dictionaries and crossword puzzles, usage examples from contemporary book and video titles, and even software titles. For the word Rosetta there is a series of images in slide show format (it did not work when I tested it), as well as thumbnails about the object or the person with links to the larger (and sharper) images (photos, engravings, clip arts, etc.).
The next section is the word usage statistics that reveal how frequently the word appears in the 100 million word subset of the huge British National Corpus, and the word’s frequency rank among the 700,000 words used in English. If the word is also a personal name, similar statistics (based on U.S. census data) are shown for its use and popularity rank as first and/or last name. This may be followed by lists of derivative names, company names and compound terms in which the word is used.
The statistical data about the daily use of the word in queries submitted to the most popular English language search engines is very interesting, and a goldmine for Web site optimizers.
Translations of the word in a variety of languages are then listed (only three for this word, but dozens for others) along with a list of words for which rosetta is recommended by spell checking programs as the correct term. This section is followed by direct anagram(s) for the word (such as toaster and rotates) and by various Scrabble riddles with some of the letters in the word. A series of professional photos may follow this section, which includes images of books, CDs, software and household items whose name, or author’s or performer’s first or last name includes the search term.
This section concludes with a few bibliographic citations of primary newspaper and magazine articles from HighBeam with the search word automatically passed forward. HighBeam is not a free service, so you can see only a small snippet of the full HighBeam record if you follow the link. As a nod toward Google, there is a Google search box with your term already in the search cell, ready for launching.
You will probably not use this last option too often because you may already be full from the mountain of well-clustered information about a single word.
And this is only the tip of the iceberg. For other more common words, there are definitions from many more dictionaries, encyclopedias and thesauri in zillions of contexts with example sentences to illustrate the use of the word in a great variety of sources, such as:
* the Bible (in numerous English versions of different eras and in many foreign language translations)
* classical literary texts (dramas, novels, poems)
* famous historical and contemporary speeches and talks
* contemporary fiction and non-fiction
The word is often shown in different notation systems and orthographies ranging from hexadecimal notations to Braille and Morse code, from sign language to Leonardo’s mirror-writing. For some words there are also animations and sound bites.
This is a fascinating carnival of words. It is a very smart and honest project aimed at appreciating and learning about English, as well as foreign languages and cultures. After all, Noah Webster was a polyglot and solving the enigma of the Rosetta Stone depended on understanding foreign languages and cultures.
Source: http://www.galegroup.com/free_resources/reference/peter/current.htm
I have received over 27 million translations across 2000+ file/database submissions (many databases cover hundreds of languages) from 200+ volunteers across 1000 languages. I have been swapping or purchasing files for years with many academics, NGOs, PVOs, governments, translators, etc., and none have asked for citation, so I did not know this was an issue. I was told all languages and simple translations are public domain (more on this later). Furthermore, files submitted do not indicate the primary sources used for each translation. I am now doing an extensive audit to see if I can backtrack any original sources that might want credit or that could have been used (I do not have a bot running across the internet; this is a more manual process; for Cheyenne, for example, we had 10 different submissions, some of which used online sources – the online sources that might have been used show over 22,000 entries, but I received 780 translations, many of which overlapped with each other, some of which came from the local community – we receive files from high school and college students).
One person, maybe a year ago, asked for a citation on my online dictionary (an Australian), but at the time, we were unable to do so quickly, as this would result in us re-generating all the pages of the dictionary, which can take several years (we also did not have someone available to make these changes on the dictionary). We created a credit page and listed his work (but not in the right format). In a new version of the site (based on XML), we will be listing all possible translation sources, which should encourage more submissions. A second person, Stephen Wilson, authored a book on Wagiman (listed on amazon) and claimed copyrights over the language and its translations; he asked that I delist a title. I did so last summer, not over copyrights, but because my intention was not to upset anyone, least of all a linguist studying the language (the language has fewer than 10 speakers, so the utility of such a book is reduced). Stephen did not ask for citation, but claimed a copyright (I will write about what I have been told about this later). For another person, the story is slightly different, etc., but the sum has been me delisting titles for about 5 extinct, nearly extinct, or endangered languages, and some titles will be relisted with citation and/or corrections following the wishes of the linguist involved (by the way, the name of the language we use in the title may be different from what the linguist uses on their site/publications, because we use the name used with the file submitted, and we try to cross check the use of that name in Ethnologue – this is not to avoid copyright issues, but rather different people refer to languages using different names).
As I do not scan the internet for the use of my own name (there are several million pages indexed in google and well over 1000 sites that link to the dictionary), I am sorry I was not aware of the concerns raised on this blog (neither Jane nor Peter emailed me but used a blog instead to raise their concerns) – this is regrettable. I was unaware that blogs are used to convey such concerns, and this is the first public criticism of my activities (as far as I know, with the exception of a few weird emails about the politics of definitions, which come with editing dictionaries). The blog came to my attention months later, thanks to someone in INSEAD’s external relations department who thought I had received an email from Wayne Leman, whom I promptly contacted and who then sent a link to this page. If you have not as yet contacted me, please feel free to do so using my academic address (my online dictionary has no permanent staff, and we use its email address for volunteers to submit files, so the emails sent to that address are not regularly monitored).
More on my projects (which date back to the 1990s), their processes, their motivation, what I have been told about intellectual property laws that you may find interesting, my plans for bilingual educational materials for kids and older students, etc. will come later. When I heard there was a concern, I consulted with INSEAD’s Dean, the Dean of R&D, and Dean of Faculty who agreed with my suggestion that an open letter is appropriate in this situation (as it seems that I may not have received emails from other persons who might be concerned). I see this as a learning opportunity for myself and am sorry for making anyone angry, and I hope I can share this with others.
On a different note, I am creating a cross-cultural research platform via my site, which takes a non-Anglo-Saxon perspective (e.g. an Arabic speaker wishing to learn more about Tarahumara, and vice versa, without passing through the English language). The modifications will be able to gather quantitative sociometric and econometric data across linguistic cultures. If you find this interesting or would like to collaborate or use the data in any way, please let me know. If you have not seen my latest “Word of the Day” effort, please check it out at http://www.websters-online-dictionary.org. Click on the word of the day, hour, minute, etc. I will be doing similar efforts for all languages (even the smallest) – bringing the power of YouTube and automated video creation to bilingual education (a popular format for the younger ones). Let me know if you like the quality of the videos (you need to wait till the end of them to see the definition).
Phil
p.s. I will prepare the letter on IPR, etc., after I hear back from the last linguist I have emailed.
Hi everyone. Since my last post, above, I have not really heard from anyone.
I am also waiting on final clarifications to emails from Claire and Peter. I am generally encouraged by the comments of Aidan Wilson (who bought the Wagiman crosswords/thesaurus) who found “they’re not such a bad resource for teaching kids,” and that they “actually appear to be a pretty good educational resource, assuming that the school in Pine Creek is up to the point of recommencing its Wagiman language programs, of which I’ve only ever seen fleeting bits of evidence of ever having taken place.” I am encouraged since the titles were explicitly designed for younger, local language fluent audiences (as is my online dictionary – the average users are teens, with about 70 percent coming from outside the Anglophone world).
As mentioned earlier, I have delisted all titles when someone has made this request (up to yesterday, that covered 3 languages). Aidan Wilson recently showed this blog to a Sydney journalist; I spoke to this person about what I have learned – he learned of my previous decisions to delist the titles.
While I have not heard from anyone in response to my blog posting above, after speaking to the journalist I decided to delist all the English learning books created for all native Australian language speakers (even titles that no-one has mentioned, or purchased). I have not heard from a local speaker about any titles. I have delisted the equivalent titles for native American languages as well, thinking ‘latent’ reactions might be similar (though a local native speaker/colleague of mine thinks I should do otherwise). I have been contacted (or I contacted) persons working on 3 Australian languages (Wagiman, Gamilaraay, and Nhirrpi) who created dictionaries or wordlists on those languages (who made copyright claims directly to me), and from Wayne for Cheyenne. I am not sure why I did not hear from others. Again, my goal was to never offend or upset, and I am sorry this happened. I am still working on my K to 12 +2 project covering bi-lingual and mono-lingual math, science, biology, chemistry, reading, spelling, etc. materials for all languages that could not have these materials based on the current economics of the publishing industry (I am worried that my approach may be the only feasible way to do so in the near term). These are created for educators for use by younger students (you will be seeing some cool java games, cartoon formatted, etc. that tests have shown are pretty fun and educational). For paper versions, I have always given educators the rights to make full copies as they wish. Online versions (non-paper) will continue to be free. Amazon, being the only firm that can deliver paper versions to the remotest areas of the world, will continue to be available for teachers that need paper versions to work from (many village schools do not have computer access, or do not use computers in teaching; teachers copy from books to blackboards – my personal experience is limited to West Africa, and paper versions are about the only way to go).
I am excluding Australian and native American languages (not due to copyright issues, but due to the non-copyright concerns) from this part of my project. In addition, while it looks like the titles are Anglo-centric, I will be doing crosses to/from languages without having to pass via English.
On the technical front, when a title is delisted, Amazon labels it “temporarily out of stock”. See:
http://www.amazon.com/s/ref=nb_ss_b?url=search-alias%3Dstripbooks&field-keywords=0497837595&x=0&y=0
The titles, however, are permanently out of stock (their computer program that posts status does not distinguish between the two – at this point at least). Delisting can take some time. All told, it appears that fewer than 10 copies of the books being delisted were sold.
Andrew Garrett laments that sadly few linguists are interested in using language data deposited in archives. I am more than happy to share research data (coming in 2009 from my online dictionary – see the geographic distribution graph I just added at http://www.websters-online-dictionary.org), and help members of this blog create educational materials (with any citations one might want), offer these for local publication by the linguists or local speakers (completely under their control and IP), and will be happy to include any local languages in my activities if you request it. So, if Aidan or anyone else interested in educational materials for Australian language speakers thinks that children would like to use the puzzles, or create new ones, I will be happy to create and email them to you (or relist the title he thought might be useful to kids). My educational activities, including my online dictionary, are a hobby of mine, so I am happy to help. I am working with some eminent international educators who might also be interested in topics you raise.
My open letter (in draft form) is basically finished, while I wait for word from Peter Austin and Claire Bowern.
Cheers,
Phil
Hello,
I have heard back from Claire (the question was her use of the expression “moral rights” which Peter also used) and from Peter who gave me the name of someone who can look into the crosswords being appropriate for Gamilaraay revitalization. I am overseas this week, and will be back home next week, and will probably polish the letter then.
Thanks to all.
Phil
Jane, Peter, Wayne, Claire, Aidan and Stephen,
I have drafted two documents, one is a generic “Open Letter to Field Linguists” (which summarizes the previous posts here), and the other is a guide to “Copyrights and Moral Rights for Languages and their Translations” which relates to what I have been told by legal experts, linguists (including yourselves), and a few Native American language professors, which may be of interest to your community. I will not post these here (as they are lengthy and potentially of relevance to persons beyond those familiar with this blog). Rather, I will post these on the ‘volunteer page’ of my dictionary (linking to/from all languages). I am not sure when this will appear, as we are in the midst of updating the site (a major overhaul that has been in the works for the last 3 years). I will modify the volunteer page (and elsewhere, when appropriate) to note that some communities and/or non-native linguists claim ownership rights over bilingual translations to/from the languages they speak, or study, etc. (I am trying to compile a list of these, though this is difficult – there does not seem to be a public list of these languages or linguists). In the case of the translations between Nhirrpi and English, Claire has emailed me that she and an estate (of perhaps another linguist who may have passed away) claim ownership rights over the translations; the last Australian speaker passed away in the 1960s. If there are other such cases, please feel free to let me know. For now, I prefer to err on the side of prudence, and assume that the above might hold for all extinct and existing Australian languages.
In this regard, I would be happy to receive any emails giving any constructive comments or advice on the next phases of my project, as there will definitely be references to Australian languages. Minimally, for each language, there will be citations to as much of the extant literature as possible (being linked to/from each translation, within each language) – including as many library holdings and other references as I could find. Here is the link to the ‘demo hub’ that is used by programmers to set this process up (virtually none of the secondary links within work yet) – think of this as a construction site at this point.
Demo hub: http://195.101.240.36/functions/sections/demo.asp
Please type in words like “Kamilaroi” or “Warumungu” or “Warlpiri” (please note that these pages are themselves incomplete, but give you an idea). For all words, any usage (including proper noun usage) is noted for various domains (e.g. Wagiman is also the last name of an inventor). You will note the bibliographic references, as well as the Wikipedia pages for some languages, which, for example, Aidan has edited. I am also now collecting all known online sources (links) that people may have or will perhaps use for bi-lingual translations, and these will be posted as well (for both citation, and usage purposes). At the end of each bibliography is a link to titles at amazon.com where publications by Jane, Peter, Wayne and Stephen are sold or listed, just in case they are not listed in the bibliographies (this will ensure that new materials you create or offer for sale are linked as a reference).
In addition, you will note the ‘look’ of the dictionary can appear juvenile, as it is meant to appeal to younger audiences. We will be including various mono-lingual and bi-lingual games as well. In the following link, please type in the word ‘zeal’ to get a flavor for these. While created for kids, you will see that some adults may find these fun/challenging as well. The bi-lingual versions will give a word in one language, and the solution will be in another. The monolingual games (e.g. Maltese to Maltese) will be based on games like hangman, word search, etc. Here is the link (again, type in ‘zeal’ to start the process – these are the monolingual English versions).
Game demo: http://195.101.240.36/gametest/mainmenu/GameDemo.asp
If you think that no Australian language community might benefit from this or that specific ones should be excluded, please email me.
Finally, people have asked for French, German and especially English ‘pronunciation’ features (e.g. people in Africa or Latin America wanting to know how to pronounce words in the local commercial languages). I have created “EVE” who will speak about 10 languages at first, and then perhaps use phonetic morphs for others. She is really in rough stages, and will be posted sometime later this year when full functionality is in place. One will be able to ask her, for example,
Q: Say japbany in English. (typed in by a user)
A: Slow. (spoken by EVE)
or
Q: What is japbany? (or similar question)
A: It is a Wagiman word for “slow”.
or
Q: Say japbany in Chinese. (typed in English by a Wagiman speaker, or by a Chinese speaker who types in Chinese)
A: – (she responds in Chinese “The English translation for japbany is ‘slow’.”)
Or, I can modify her responses to be variable:
Q: What is japbany?
A: According to Stephen Wilson’s online dictionary, it is a Wagiman word for ‘slow’.
The system is XML based and is quite flexible:
Eve: http://195.101.240.36/functions/Sections/eve/evenew.asp
Working with a number of educators (e.g. ELS teachers, etc.), I am having many of her responses modeled using graph theory and a meta-analytic framework (a hybrid MDS). Users will be able to choose which language they would like to learn, and, when relevant, bi-lingual subtitles will appear below the images displayed. She will not be able to speak all languages, e.g. Wagiman; however, the bi-lingual subtitles will be available for all languages, of the user’s choice. I am working on a phonetic system blending phonemes across 12 languages to create language-specific pronunciations (i.e. blending French with English to create an Urdu word – this is very experimental at this point). If successful, we may be able to theoretically have her pronounce any word in any language for which there is a common or bridged phonetic translation (this is a long way off). While I personally do not like chat bots, some young people do enjoy this and my dictionary users from all over have requested this. Again, if you think that no Australian language community might benefit from this or that particular ones should be excluded, please let me know.
It may seem odd given Peter’s comments, but I am a stickler for giving citation (referees for my peer-reviewed work often slap me for this). In a review of my online dictionary, which for about 8 years has posted many bi-lingual translations (with some 1 million+ views a month from about 190 countries), a noted dictionary reviewer, Peter Jacso, wrote “Definitions and illustrations for the words are included from a variety of sources, and each entry is meticulously acknowledged and, if possible, linked to”. I have been careful to give attribution in fields where I thought this was the tradition. Having started with the largest languages (citing the first authors of word pairs between French and English is simply not feasible, nor a tradition for things like crossword puzzles, language learning publications, or online sites, like babel fish, etc.), it did not dawn on me that the citation tradition would change once one fell into endangered languages. My naïve “faux pas” of not citing linguists was inadvertent, and the result of me starting with the larger languages where such questions have never arisen, but also of me receiving millions of translations from various language Professors and native speakers around the world (with whom I have exchanged files over the years), who have never suggested citation requirements or claimed ownership rights. For my oversight, I apologize. I have done this process so that all languages can be linked to all others (e.g. Arabic to Maltese, not just English to Maltese – as all languages can appear on the same page for a given subject word). If you think that any particular Australian language community might not benefit from this or should be excluded, please email me. Feel free to share this request with other linguists or language teachers you might know who would also have an opinion (I have only heard from you at this point).
Please keep in mind that the focus is not teaching Anglophones non-English languages, but the reverse (teaching Japanese to someone who speaks Ewe; or teaching Ewe to Chinese speakers, working in Togo; or teaching Korean to someone in the Wagiman community, etc.), esp. for younger audiences.
Finally, many thanks to Wayne for already giving me more Cheyenne bibliographic references. Please note that the demo hubs are not at all ready for public consumption (many sections and letters are not loaded, etc.). There are many rough edges, so please see it for what it is – the sandbox of a hobbyist. I will not ignore your inputs.
Best Regards,
Phil Parker
I recalled this discussion as I came across something entitled Webster’s Ilongot-English dictionary on GoogleBooks. I was quite surprised as this is a language of the North Philippines which I’ve been interested in for some time and I am sure that there is no dictionary for it nor has anyone else been working on it. The error it seems here was even more grave than the ones recounted above. This dictionary was not of the Ilongot language at all but rather of the Ilonggo language (a completely different language from another region in the Philippines). How can we take seriously this mock concern for the speakers on the part of the copyist Phil Parker if he cannot even be bothered to correctly copy the name of the language?!? I find this disinformation highly insulting to all communities involved as well as to the scholars who produced the original.