Monday, September 7, 2009

Manuscript Complications

An earlier post designated UCLA's Catalogue of Digitized Medieval Manuscripts as our primary affiliate project for transcription. We thought that since the collection is incipient, there might be more willingness to join with our project in developing a program to promote the distribution of classical writings and to enable viewers of the collection to actually comprehend the contents of its manuscripts. However, this convenience also presented a sort of catch-22. Though the cryptic nature of most of the manuscripts in the collection validates our endeavors to some degree, it also complicates the process of transcription: if MTurk users are unable to read the manuscripts, then they are unable to transcribe them for anyone else. It is correspondingly difficult to "proofread" the submissions we would receive from those workers.


After discovering that UCLA's collection was unsuitable, at least for our immediate purposes, I decided to browse the internet for other digital collections of handwritten manuscripts. The best candidate is an online collection presented by the Library of Congress called "The Thomas Jefferson Papers." It is purportedly the largest collection of Thomas Jefferson's handwritten manuscripts in the world (over 27,000), including letters and speeches. These have several advantages. Because they were written to be read, they are predominantly legible. Also, though the language is dated in some respects, the letters and speeches are written in English, providing the workers with context to aid in transcription. Furthermore, the collection has a cultural relevance that may further substantiate our project. Our pilot run will likely use one of these manuscripts.

Below is a link to an image of one of Jefferson's letters of correspondence:

Logic, Logistics, and Formats

The last entry briefly discussed my attempts to develop an adequate template for crowdsourcing the manuscripts to be transcribed. Since then, I have encountered a few new challenges--and occasionally "frustrations"--and learned a bit more of what is required of the "Requesters" using MTurk. The next few entries will address some of the logistics of "piloting" our concept through Mechanical Turk.



It should first be noted that, while we originally intended to have MTurk workers e-mail the image transcriptions to WrittenRummage@gmail.com, it has now proven prudent to operate primarily through MTurk. In designing a template for the general form of all transcription tasks we will be requesting, I found that MTurk supplies the Requester with a variety of tools to serve whatever function might prove necessary, including multiple-choice bubbles for surveys and text boxes for purposes similar to ours. Given that, as a Requester, we must include one of these supplied tools in our template, I infer that Mechanical Turk wants most--if not all--information transfer to be done through its own program. This is not entirely inconvenient or impractical; we can use the provided text box as the area in which workers will type their transcriptions, thus saving several delaying steps of log-ins and file formatting. I therefore changed the instructions in the template.
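
To make the idea of a text-box template a little more concrete, here is a rough sketch in Python of what such a task form might look like. It is only an illustration, not our actual template: the field name "transcription", the placeholder "image_url", and the example address are all assumptions of mine, and Python's own string.Template stands in for whatever substitution mechanism MTurk itself uses.

```python
from html import escape
from string import Template

# Hypothetical task layout: one manuscript image plus one free-text box.
# The field name "transcription" and the placeholder "image_url" are
# illustrative assumptions, not names taken from our actual template.
HIT_TEMPLATE = Template("""\
<p>Please transcribe the handwritten text shown in the image below.</p>
<img src="${image_url}" alt="Manuscript page to transcribe">
<p><textarea name="transcription" rows="20" cols="80"></textarea></p>
""")


def render_hit(image_url: str) -> str:
    """Fill the template for one manuscript image, escaping the URL for HTML."""
    return HIT_TEMPLATE.substitute(image_url=escape(image_url, quote=True))


if __name__ == "__main__":
    # Placeholder address; a real run would point at a scanned manuscript page.
    print(render_hit("https://example.org/manuscripts/page-001.jpg"))
```

The point of keeping the image address as a placeholder is that one template can serve every manuscript page: only the address changes from task to task, while the instructions and the text box stay the same.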

Images will be provided once the first MTurk request is actually made.