On the Written, the Digital, and the Transcription: Data Collection Logistics

In an attempt to expedite the transcription process, I have found it necessary to raise the compensation per transcribed page. Each page is currently valued at a dime. That is still considerably cheaper than the $1.00 per page suggested by one chagrined Turker, but it is not as desirable as the $0.03 and $0.05 we were paying last month. Unfortunately, while there does not yet seem to be a proportional relationship between compensation and quality, there is one between compensation and the rate at which tasks get completed. Tasks can still be accepted at low rates, but, understandably, they are not a priority for Turkers. Being at the "bottom of the barrel" is never ideal when working with deadlines: senior paper deadlines now, but also library deadlines later if this project develops. My concern right now is finding a tenable rate that keeps our product both expedient and affordable. It may take some calculus.
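To make the pricing trade-off concrete, here is a minimal sketch comparing worker payments for one batch at the per-page rates mentioned above. The 500-page batch size is a hypothetical figure, not one from this project, and the totals cover worker payments only; Amazon's requester fees are deliberately left out.

```python
# Per-page rates mentioned in the post, kept in whole cents to avoid float rounding.
RATES_CENTS = {
    "last month (low)": 3,
    "last month (high)": 5,
    "current": 10,
    "Turker's suggestion": 100,
}


def batch_cost_cents(pages: int, rate_cents: int) -> int:
    """Total worker payment for a batch; Amazon's requester fees are NOT included."""
    return pages * rate_cents


def fmt(cents: int) -> str:
    """Format an integer number of cents as dollars, e.g. 5000 -> '$50.00'."""
    return f"${cents // 100}.{cents % 100:02d}"


PAGES = 500  # hypothetical batch size, chosen only for illustration

for label, rate in RATES_CENTS.items():
    print(f"{label:>20}: {fmt(batch_cost_cents(PAGES, rate))}")
```

At 500 pages, the current dime rate comes to $50.00, against $15.00 and $25.00 at last month's rates and $500.00 at a dollar a page. The harder part of the "calculus," how much faster tasks are picked up at each rate, would have to come from observing our own batches.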

Requesting large numbers of tasks at once is preferable, if manageable. One present hurdle is identifying which document a worker has attempted to transcribe. At the moment I have to check manually by surveying the manuscript images. This would have to be automated if larger quantities of manuscripts needed to be transcribed at a time, for instance if multiple libraries needed work done. Also, sometimes only some of the tasks within a given cluster of manuscripts get picked up, leaving others untouched. This means new CSV files have to be made containing only the manuscripts left unfinished from prior requests. The same has to be done for dud or inadequate submissions. Sometimes workers simply cannot read the manuscript; sometimes I receive an advertisement as a submission (and I'm usually not interested enough in the product to bear the inconvenience of having the task request wasted). Again, automating some of these steps will be difficult but probably necessary.
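Part of that re-batching could be scripted. Below is a minimal sketch, using only Python's standard `csv` module, that compares the original batch input against the downloaded results and keeps only the manuscripts still lacking a usable transcription. The column names (`image_url`, `AssignmentId`, `Input.image_url`, `Answer.transcription`) are assumptions loosely modeled on how Mechanical Turk results echo input fields back; check them against an actual results file. The length-based dud check is a crude placeholder, so advertisements and other bad answers still need to be flagged by hand through the `rejected` set of assignment IDs.

```python
import csv
import io

# Hypothetical column names; adjust to match the real batch and results files.
INPUT_COL = "image_url"                      # column in the batch input CSV
RESULT_INPUT_COL = "Input.image_url"         # same value echoed in the results CSV
RESULT_ANSWER_COL = "Answer.transcription"   # the worker's transcription
ASSIGNMENT_COL = "AssignmentId"              # identifies a single submission


def is_usable(answer: str, min_chars: int = 50) -> bool:
    """Crude placeholder: treat very short answers as dud submissions."""
    return len(answer.strip()) >= min_chars


def unfinished_rows(input_csv: str, results_csv: str, rejected=frozenset()):
    """Return the batch input rows that still lack a usable transcription.

    `rejected` holds assignment IDs flagged by hand (e.g. advertisements).
    """
    done = set()
    for row in csv.DictReader(io.StringIO(results_csv)):
        if row.get(ASSIGNMENT_COL) in rejected:
            continue
        if is_usable(row.get(RESULT_ANSWER_COL, "")):
            done.add(row[RESULT_INPUT_COL])
    return [row for row in csv.DictReader(io.StringIO(input_csv))
            if row[INPUT_COL] not in done]


def to_csv(rows, fieldnames) -> str:
    """Serialize rows back into CSV text for a new batch request."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    batch = "image_url\nms1.jpg\nms2.jpg\nms3.jpg\n"
    results = ("AssignmentId,Input.image_url,Answer.transcription\n"
               "A1,ms1.jpg," + "x" * 60 + "\n"
               "A2,ms2.jpg,buy cheap watches\n")
    # ms2 got a dud answer and ms3 was never picked up, so both are re-queued.
    print(to_csv(unfinished_rows(batch, results), [INPUT_COL]))
```

Because the results file carries the input value alongside each answer, the same join would also address the first hurdle of identifying which document a worker attempted, without surveying the images by hand.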

Friday, February 25, 2011