This semester I have been helping Jane with the technical side of her wonderful Field Methods class: recording, uploading files onto the server, and letting students download both the .wav and .mp3 files securely and quickly. I took this course myself some years ago and it was a great experience for me and the whole class; many of its members have gone on to do field research of their own, and I'm sure the Field Methods class was as much a help to their research as it was to mine.
But this post is not about when I took the class. Instead, it’s about how I almost buggered up this semester’s class in what can best be described as a lesson in keeping backups of your recordings.
(Warning: Some computer nerd stuff follows after the fold.)
The course is being run in conjunction with Paradisec, which is where my helping hand comes in: we provided the equipment for the class to record their informants (two Karo Batak speakers), along with space on our server for the recordings to sit. Eventually they will be archived in the larger Paradisec collection.
I always like to find ways of doing things quickly and simply with a bit of basic programming. I'm not much of a programmer as such, but I know my way around bash, and I've been using it to do most things automatically, such as moving files around and producing an mp3 version of each recording.
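Producing the mp3s, for instance, comes down to a one-liner along these lines (just a sketch of the idea; lame here stands in for whichever encoder you have handy, and the settings don't matter for the story):
$ for x in *.wav; do lame "$x" "${x%.wav}.mp3" ; done
Translation: for every .wav file, run it through the encoder and write out an mp3 of the same name.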
This week the field methods students began their individual sessions with their informant, meaning that instead of a single recording per week there are suddenly closer to five or six per day, two days a week. With this in mind, Jane suggested we organise the recordings into directories based on the day they were recorded. A very sound suggestion, which I was happy to implement.
Of course, it would have been too easy to do it manually, so I tried to do it in a couple of lines of code. The first step was to take the names of the recordings (which are named in line with our specifications at Paradisec) and create directories based on those filenames, so that each directory would hold all the recordings from a given day. To take an example, we might have a list of recordings such as the following.
- FM2-20100310-01.wav
- FM2-20100317-01.wav
- FM2-20100317-02.wav
The command I wrote would create two directories, FM2-20100310 and FM2-20100317 (it would also try to create a directory for the last file, but that fails, since the directory already exists after being created for the second). Here's the code:
$ for x in *; do mkdir ${x%-*} ; done
Translation: for all files, make a directory of the same name, but strip off everything from the last dash (-).
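In hindsight, a slightly more careful version would quote the filename and use mkdir -p, which quietly skips directories that already exist instead of letting mkdir complain; something like this (a sketch, not what I actually typed):
$ for x in *; do mkdir -p "${x%-*}" ; done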
The quick version worked fine, and despite the redundant attempts I ended up with a bunch of directories, one for each day. The next step was to move each file, like those above, into the directory corresponding to its day (which is always predictable from the filename). The code for this should have been:
$ for x in *; do mv $x ${x%-*}/ ; done
Translation: For all files, move them to the directory which has the same name, but with everything from the last dash (-) stripped off.
However, I missed the crucial forward slash in ${x%-*}/, meaning I had sent each file not into the directory of the same name, but to a file of that name.
Now, when you have several files from the same day, the target filename for this command is the same for each of them. So as the command runs, it takes the first file, say FM2-20100420-01.wav, and moves it (which is the same as renaming it) to the file FM2-20100420. If there is a second file, say FM2-20100420-02.wav, it likewise moves it to FM2-20100420, thus overwriting what was there before.
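In hindsight, a no-clobber flag on mv would at least have limited the damage. On systems whose mv supports -n (or -i, which prompts before each overwrite), the botched command with -n added would still have renamed the first file of each day, but it would have refused to overwrite anything after that:
$ for x in *; do mv -n $x ${x%-*} ; done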
As I pointed out earlier, only this week did the class begin their individual sessions, so only this week was there more than a single recording in a given day. And therefore only recordings from this week were adversely affected (by which I mean deleted). The others were merely renamed.
Luckily, I realised what was going on from the fact that a mere move was taking far too long, and managed to stop it after only a couple of files had been deleted. Even more luckily, and this is what saved my skin, we had kept backups and the data is safe.
The problem can be boiled down – computationally speaking – to a mere missing slash. But the real culprit here was my trying to be too clever by half.
So let this be a lesson: Always, always keep backups. Especially if you are going to do any work on your recordings, even if you think it’s as mundane as simply moving them from one location to another.
I always tell my students there are three basic principles of documentary linguistics: backup, backup, backup. And not just any kind of backup – one that’s not useless (for which have a look at this advice).
Interesting to see the word “informant” used a couple of times in this post.
And Backup Offsite – a student has just had her laptop AND backups on memory sticks stolen. All that labour coding data…. Terrible.
Jane – that’s one of the ways to create useless backups, as explained in the web page I linked to.
I recommend using Dropbox or similar facilities (like Files Anywhere or Jungle Disk), which provide a couple of gigabytes of storage for free, or more storage at fairly low cost. Alternatively, open a Gmail account and email files to yourself, or get on Google Docs, which provides 1 gigabyte of free storage (you can purchase more for US 25 cents per gigabyte, and you can set different access privileges for files stored there). Reportedly it will soon be possible to store any type of file, including audio and video, on Google Docs in its original format (though there is an upload limit of 250 MB per file).
Aidan,
You might like to add “Always carefully test your regular expressions” to your lesson.
As an old Unix hand, I usually test things like this by replacing the “mv” or “mkdir” with an “echo” and having a close look at the output before committing to anything irrevocable.
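In Aidan's case that would amount to something like this (just a sketch of the idea):
$ for x in *; do echo mv $x ${x%-*}/ ; done
which prints each mv that would be run, so you can eyeball the targets before doing it for real.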
Oh, and Peter, I recommend Dropbox to everyone. I swear by it for both backup and convenience – no more leaving files at home or work.
Tony
Yes, I was introduced to Dropbox by Claire Bowern and have used it extensively, both to backup files and to share files with colleagues on the other side of the world when doing a joint publication project. I have found it simple and easy to use, and, as you say, it means files are accessible from anywhere. We have also used Googledocs to write documents together (so much easier than emailing “track changes” documents back and forth) though there are limitations on what you can do with Googledocs, especially in terms of formatting.
This site has some great information about backup: different types and methods are discussed in detail. It is intended for digital photographers and was an initiative funded by the Library of Congress, but its advice has wide applicability. See also here.
Hi Aidan, I enjoyed reading this little anecdote, having tried myself to be too clever with little shell scripts. Apart from the obvious importance of backups which you thankfully had, I always like to try it out on a test file/directory first.
Nice to hear you are helping out with the Field Methods class – I have such fond memories of our semester with Muna. That course is gold.
Shelley.