Sunday, January 25, 2009

A Meeting of Interests

Despite our expectations, the Holy Spirit Research Center has few handwritten manuscripts in its possession. Much of its material consists of old articles, magazines, and typeset research and publications. Though these do not need to be transcribed from handwriting into searchable, digital text (which was our original intent in developing this technology), the center nonetheless expressed an interest in converting the texts it does possess into a form that is both digital and searchable. The head of the research center expressed a desire to begin digitizing its texts en masse with a device released by Snapter. This device uses two digital cameras to photograph the pages of a book resting open, so that the binding is not broken, and then uses the same Snapter technology we intend to use to make organized PDF files of the book. This, however, does not make the digitized texts searchable, which has become an information-age necessity for facilitating and hastening research. Here Dr. Lang's interests and mine intersected with those of the Holy Spirit Research Center: we can use crowdsourcing technology to take digitized texts (and, if necessary, handwritten texts as well) and convert them to searchable texts. This will begin with some texts that are largely unreadable by standard OCR technology: a compilation of some of the original Azusa Street movement articles. This will be the first step in our progress toward providing a nonprofit transcription method to local and eventually national libraries.

Castingwords.com and Crowdsourcing (in Brief)

Castingwords.com is an online audio transcription service that receives audio of varying quality by either mail or internet. It takes the submitted audio, breaks it into timed segments, and uses Amazon's “Mechanical Turk” service to have the audio transcribed through crowdsourcing. Crowdsourcing is the process by which a company outsources a function to an undefined network of people rather than hiring one or several professionals to accomplish the same function*. The company chooses the “winning” solution, pays the successful user the predetermined (generally low) reward, and keeps the rights to the work and method. The work can be done by users cooperating, by users operating individually, or by many individuals completing separate tasks that add up to a cohesive whole. It gives the company a solution without having to pay higher wages, while the users benefit from a quick job and compensation, which is especially profitable for users in foreign countries where the American dollar is worth more than the local currency. Furthermore, the company has a broader pool of amateur and potentially expert talent to select from and pays only when it is satisfied with the product.
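
The segmenting step mentioned above can be pictured with a small sketch. This is only an illustration, not Castingwords' actual method: the function name split_into_segments and the 60-second default are assumptions made for the example. It cuts a recording of a given length into consecutive time windows, each of which could then be posted as a separate task.

```python
from typing import List, Tuple


def split_into_segments(total_seconds: float,
                        segment_seconds: float = 60.0) -> List[Tuple[float, float]]:
    """Cut a recording into consecutive (start, end) windows, in seconds.

    The 60-second default is an assumed value for illustration only.
    """
    if total_seconds < 0 or segment_seconds <= 0:
        raise ValueError("lengths must be non-negative and segments positive")

    segments = []
    start = 0.0
    while start < total_seconds:
        # The last window is shortened so it never runs past the recording.
        end = min(start + segment_seconds, total_seconds)
        segments.append((start, end))
        start = end
    return segments


if __name__ == "__main__":
    # A two-and-a-half-minute recording becomes three tasks.
    for start, end in split_into_segments(150, 60):
        print(f"task covers {start:.0f}s to {end:.0f}s")
```

Short windows keep each task small enough for one person to finish quickly, which is the property the crowdsourcing model depends on.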

Negatives, Drawbacks, Counterarguments
Crowdsourcing by nature entails a lack of accountability. By not hiring a professional or working with a specific group of people, the company is less able to hold workers accountable for smooth progress or for finishing by a certain deadline. There are weaker forms of accountability, such as predetermined completion dates and predetermined expectations for quality, but there is not the level of commitment that comes standard with contracted professional employment. A worker is liable either to begin a task and not complete it or to fall short of the company's standard. Thus there is no assurance that anyone who undertakes the crowdsourced task will produce the quality of work the company wants. Because of this risk, the time and money taken to inspect the quality and accuracy of the work, especially on large-scale projects, could make crowdsourcing less profitable than a more controlled business model. We will attempt to get around this problem by crowdsourcing not only the transcription of the original manuscript but also the proofreading. If the submitted files match in size, the transcription will be accepted; if not, we will crowdsource again until the transcription and the proofread text match.
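
The accept-or-retry rule just described might look roughly like the sketch below. Everything named here is hypothetical: transcribe and proofread stand in for whatever mechanism posts a task to the crowd and returns the submitted text, and max_rounds is an assumed safety limit that the plan itself does not specify.

```python
from typing import Callable, Optional


def sizes_match(transcription: str, proofread_text: str) -> bool:
    """Return True when both texts have the same size in UTF-8 bytes."""
    return len(transcription.encode("utf-8")) == len(proofread_text.encode("utf-8"))


def transcribe_until_match(page_id: str,
                           transcribe: Callable[[str], str],
                           proofread: Callable[[str, str], str],
                           max_rounds: int = 5) -> Optional[str]:
    """Repeat the transcribe-then-proofread cycle until the sizes agree.

    `transcribe` and `proofread` are hypothetical stand-ins for posting a
    crowdsourced task; `max_rounds` is an assumed limit so the loop ends.
    Returns the accepted text, or None if no round produced a match.
    """
    for _ in range(max_rounds):
        draft = transcribe(page_id)
        checked = proofread(page_id, draft)
        if sizes_match(draft, checked):
            return checked
    return None
```

Two files of equal size are not necessarily identical, so comparing the texts themselves (draft == checked) would be a stricter version of the same check.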

Snapter in Brief

Snapter is a recently developed piece of internet software that converts digital images of paper pages into PDF format. It uses “complicated algorithms” to crop, straighten, and flatten the pictured page; the flattening is especially useful when one is uploading images of books with the notorious “roll” in the page. One can thus scan a book without the usual wear and tear on it, which is ideal when working with old books that have accrued both value and dust. Snapter should therefore ease the process of scanning the books libraries want transcribed, without the concern of destroying the original in the process. With the images converted into PDF format by Snapter, we can then easily distribute the PDF files to the crowdsourcing users (probably through Mturk) in a format they can access, read, and transcribe.