Manual to Using Amazon Mechanical Turk

Upload: ngobao

Post on 13-Mar-2018

Manual to Using Amazon Mechanical Turk

Shaney Flores, May 2016

Table of Contents

I. The Basics

II. Requester Site

III. Production Site

IV. Requester/Production Sandbox Site

V. Building a Study for MTurk

VI. Command Line Tools

a. mturk.properties

b. .input file

c. .question file

d. .properties file

e. run.sh

f. getResults.sh

g. approveWork.sh

h. reviewResults.sh

i. extendHITs.sh

VII. Communicating with Workers

VIII. Custom Made MTurk Scripts

a. assignQuals.py

b. grantBonus.py

I. The Basics

Amazon Mechanical Turk (MTurk) is a service offered by Amazon where Amazon users can complete small tasks for monetary rewards. Originally developed to assist Amazon in recruiting humans to identify duplicate product pages on its retail site, it has quickly become a new method for behavioral researchers to run quick, large sample, cost-effective online studies with roughly similar validity to running a study in the lab.

The tasks posted on MTurk are called human intelligence tasks (HITs). The community of users on MTurk who can accept and complete these HITs are called workers. The community of users who post HITs are called requesters. When posting HITs, requesters can create multiple copies of the same HIT (i.e., make it so that several workers can complete the same HIT). These multiple copies are called assignments. For example, if I have a task where people watch and segment a single movie, I can post that task as a single HIT and then add 29 more assignments to it so that 30 total workers can complete the same task for me.

Because of the distinction between workers and requesters, Amazon has built two sites to accommodate each group. The site where workers can accept, complete and submit HITs is at https://www.mturk.com/mturk/welcome. The site where requesters can build and post HITs and review work submitted is at https://www.requester.mturk.com. Both of these sites require an Amazon account. The account for the DCL is [email protected]. The lab manager can provide the password for the account.

HITs on MTurk typically pay fairly small amounts ($1 - $5). As more and more researchers have started to use the service, the amount workers expect to get paid has gone up as well. It is recommended that we pay an amount competitive with other researchers using MTurk. The more you pay for a HIT, the faster the HITs will be completed. However, MTurk does charge you for posting HITs. Currently, there is a 20% administration fee for posting a HIT and an additional 20% service fee if you post more than 8 assignments of a HIT. You will need to keep track of both how much you are paying workers and how much you will pay Amazon.
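To make the fee arithmetic concrete, here is a minimal budgeting sketch. It uses the percentages and the assignment threshold stated in this manual; the function name, the choice to work in whole cents, and the rule of rounding fees up to a cent are assumptions for illustration, and Amazon's current fee schedule should be checked before relying on the numbers.

```python
def hit_cost_cents(reward_cents, assignments, base_fee_pct=20,
                   extra_fee_pct=20, extra_fee_min_assignments=9):
    """Return (worker_pay, amazon_fees, total) in cents for one HIT.

    The defaults follow this manual: a 20% fee, plus another 20% when a
    HIT has more than 8 assignments. All parameters are illustrative.
    """
    worker_pay = reward_cents * assignments
    fee_pct = base_fee_pct
    if assignments >= extra_fee_min_assignments:
        fee_pct += extra_fee_pct
    # Integer ceiling division: round fees up to a whole cent (assumption).
    amazon_fees = -((-worker_pay * fee_pct) // 100)
    return worker_pay, amazon_fees, worker_pay + amazon_fees


# One HIT paying $2.00 with 30 assignments (30 workers in total).
pay, fees, total = hit_cost_cents(200, 30)
print(f"workers: ${pay / 100:.2f}, Amazon: ${fees / 100:.2f}, total: ${total / 100:.2f}")
```

With these assumptions, 30 assignments at $2.00 cost $60.00 in worker pay plus $24.00 in fees, so the account needs at least $84.00 available before posting.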

Experiments conducted on MTurk can be considered low-risk. Therefore it is advised that whenever you go through the Human Research Protection Office to get approval to conduct an MTurk study, you request a waiver of documented consent. Additionally, unless you are collecting PHI or other identifiable information from workers, you may also wish to classify the project as an exempt project to avoid submitting a continuing review every year.

Further reading providing an overview of the MTurk system can be found in: Mason, W., & Suri, S. (2012). Conducting behavioral research on Amazon’s Mechanical Turk. Behavior Research Methods, 44(1), 1-23.

You may also want to follow Amazon’s blog for recent developments to the MTurk platform at this site.

II. Requester Site

The lab is considered a requester on MTurk because we post online studies as HITs. Therefore, the site you will be using most is the requester site. To access this site, simply go to the requester site URL shown above (in “The Basics” section) and log in.

Once you have logged in, there are multiple actions you can take. If you need to add money to the account, click on the “My Account” link in the upper right-hand corner (as seen in Figure 1). You will be directed to a page where you can add money to the account, view the current balance, change the username and password, and view a history of transactions from our account to workers or MTurk.

Figure 1. “My Account” page of MTurk requester site. Here you can manage money paid for HITs, view current balance and transaction history, and change user settings.

Whenever a HIT is posted, the administrative/service fees and amount paid to workers for completing that HIT will be held for liability and will be deducted from your available balance. This is to prevent you from posting more HITs than you can afford. This money will not be paid out until the requester approves the HITs.

Another action you can take is to create HITs through MTurk’s web interface. This is mainly reserved for creating “compensation HITs,” HITs that are posted for workers who experienced a technical or other issue that prevented them from actually completing one of our studies. To post this kind of HIT, you will need to click on the “Create” tab on the requester site page (as seen in Figure 2). Then you will need to either click on “Copy” to create a new HIT or click on “Publish Batch” and then enter the number of HITs you would like to create. You will then need to provide the worker with the URL to access the compensation HIT. Workers will then simply accept the HIT and submit it.

Once it is submitted, the requester can then approve the HIT and pay a bonus equal to the amount the worker would’ve received for the HIT. When setting up this kind of HIT, always set the amount to be paid out to the worker as $0. This will help hide the HIT from other workers or “bots” that may wish to take advantage of you not thoroughly monitoring this HIT. Additionally, it allows us the flexibility to compensate a worker whatever amount we feel is appropriate (typically how much they would’ve received had they been able to finish the HIT).

Compensating workers in this manner offers two benefits: (1) it improves the worker’s HIT acceptance rate, making them more desirable for other requesters to recruit, and (2) it pays them for the work they would have submitted had there not been an issue. Both of these benefits favor the worker, and in the MTurk community, happy workers are more likely to recommend your HITs to other workers.

Figure 2. If you want to create a new compensation HIT, you can click “Copy” for a previously created HIT and then change all the necessary fields. If you wish to reuse an old HIT, you can simply click on “Publish Batch” and post a new batch of HITs. However, once a HIT is created, you cannot change its properties or fields, so be sure you check what those are before posting a new batch for a previously created HIT.

To manage currently running HITs (i.e., approve or reject submissions, expire HITs, add additional assignments to a HIT, etc.), you can navigate to the “Manage” section (shown in Figure 3). Here, each HIT created through the MTurk web interface will be shown in an ascending list based on creation date.

Figure 3. The Manage HITs page. Clicking on the name of the HIT will take you to a web-management page for the HIT where you view properties of the HIT and download results from the HIT. It is recommended that you do not use this page to manage your currently running HITs.

By clicking on the name of the HIT, you will enter a simplified web-management properties page for that specific HIT (as seen in Figure 4) where you can view properties associated with the HIT, view costs allocated to the HIT, and download worker submissions. The number of actions you can actually take on this page is somewhat limited; therefore, it is more useful to manage HITs individually, where you have more options on what actions to take (more on this further below).

In the “Manage” tab, you will also be able to access a webpage that lists the worker id number for every worker that has completed a HIT for our group (seen in Figure 4). Workers on this page are sorted in alphanumeric order by their worker id number, starting from the second character (every worker id is prefixed with the letter A as the first character). Clicking on someone’s worker id will take you to another page where you can assign/revoke qualifications for the worker, pay the worker a bonus, or block the worker from accepting any future HITs (it is highly encouraged that you do not block a worker unless it is absolutely necessary as blocking a worker can have negative repercussions!).

Figure 4. A listing of worker ids for workers who have previously completed HITs for us.

For HITs that were not created through the MTurk web interface (i.e., HITs created through the command line tools), you will need to click on the “Manage HITs individually” link in the right corner of the Manage>>Results page. This page will also list any HITs created through the MTurk web interface. It is on this page that you will have the greatest flexibility and the widest array of actions for managing your currently running HITs. You should use this page.

Figure 5. This page allows you to individually manage each HIT posted onto MTurk from the CLTs or the web interface. You will want to be careful when performing actions on this page as most actions performed on MTurk cannot be cancelled after they’ve been performed.

Once you enter the “Manage HITs individually” page (as seen in Figure 5), you will be able to perform any of the functions that are necessary to manage your currently running HITs, whether they were created through the web interface or the command line. Such actions include adding more assignments to a certain HIT, extending the deadline for HITs to be completed or expiring that deadline altogether, downloading submissions for each HIT, or approving/rejecting individual submissions for a HIT. It is typically recommended that you use this page to manage all HITs that are posted on MTurk. Clicking on any of the actions that could be taken (e.g., download results, add time) will redirect you to a web-management page for the HIT (shown in Figure 6).

Figure 6. Web-management page for a HIT managed individually. This page should be used to manage your currently running HITs as it provides the greatest ease and flexibility with changing the number of assignments associated with the HIT, approving/rejecting specific submissions, or extending or expiring the time limit for the HIT to be posted on MTurk.

Further information on how to use the requester site can be found here in the API manual. Amazon also provides a basic tour of the site here.

III. Production Site

The production site is the official location where workers can go and accept HITs to work on (shown in Figure 7). Whenever anything is posted onto the production site, it is considered a paid service by Amazon and will be subject to all fees. Additionally, the system requires that every submission from a worker be approved or rejected so that payment can be doled out or withheld. The system is designed so that if a requester does not approve a submission within a specified period of time (usually determined by the requester prior to posting the HIT), Amazon will automatically approve the submission and pay the worker.

Figure 7. The MTurk Production Site. Only HITs that are ready to go live and public should ever be posted to this site. If you plan to do any tests, you should use the developer sandbox and worker sandbox sites instead.

The only thing that should go on the production site is the finished product we actually want to use; anything else should go to the developer sandbox (described in the next section).

IV. Requester/Production Sandbox

Before you attempt to post any HITs to the production site, it is best practice to test the HIT first to ensure the HIT is working properly with Amazon and to ensure that all data are received. Amazon created a few sandbox sites for you to help with this. A sandbox is simply any environment that is self-contained and typically used for development. A sandbox site exists for both the requester and production sites (the hyperlinks to these sites are provided below).

Think of the sandbox as an exact copy of the production and requester sites with one major difference: nothing you do in the sandbox impacts your reputation, finances, or posting history in the MTurk community. No money is paid out when HITs are posted or completed/approved to either the worker or Amazon in the sandbox. The sandbox is only for development and testing of HITs and is free of charge.

You can post HITs through the sandbox requester site (or command line tools described below) to the sandbox production site. Every means of communication between the two sandbox sites is exactly the same as the actual requester and production site.

However, you should NEVER use the sandbox production site when you are ready to start collecting data. As this site does not pay out any actual money and is an entirely independent site from the production site, if you post a HIT to the sandbox no actual worker will complete it.

The requester sandbox can be accessed at this site. The production sandbox can be accessed at this site.

V. Building a Study for MTurk

Although MTurk lets you build HITs through the requester site, it does not provide enough space or capabilities to run the kinds of studies we are used to running in the lab. Because of this, it is preferable to host our studies on a college web server. MTurk allows you to do this through a feature called external submit, where workers can be redirected from the MTurk production site to an external site to complete the study. The external site can then send the information back to MTurk to notify Amazon that the worker completed the HIT and that the requester should review the submission.

To access the college web server here, you will need to first download an FTP client program (e.g., Cyberduck or Filezilla). An FTP client is simply a program that allows file transfers between two remote machines. You will need this to transfer the web code for your study from your local machine to the server. Additionally, you will want some kind of text editor to help you build your code. The recommended program would be TextWrangler (a text editor Mac app that can interpret many different programming languages and organize your code logically according to the syntax of the language). Apple’s TextEdit or Emacs in the Unix environment would also be fine.

To access the web server, you will need to set up an FTP connection to the following address on Port 21: web.artsci.wustl.edu. The username is dcl. The lab manager can provide the password for the account. Once you have connected to the web server, you will be placed in the dcl’s home directory. There should be a folder called “public_html”. This folder contains all the web code that is publicly accessible over the Internet. You should create a new directory for your experiment on the web server and upload any web code you have into that directory.

For our experiments, we use several different web technologies (depending on the task). These typically include HTML, JavaScript, PHP, AJAX, and CSS. Tutorials for each of these can be found at w3schools.com or codecademy.com.

The final submission page for your HIT should contain the following code:

You should have this code execute when the worker clicks on the final submit button for the HIT. The purpose of this code is to send all collected data through an external page controlled by Amazon to our requester account for review and approval. If this code is not included, then the data from the worker will not be sent and it will be as if the worker never completed the HIT. It is very important to have this code.
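As a rough illustration of what an external submission involves (a sketch, not the lab's own code), MTurk appends parameters such as assignmentId and turkSubmitTo to the URL of an external HIT, and the final page posts a form containing the assignmentId and the collected data to the "/mturk/externalSubmit" path on the turkSubmitTo host. The Python helper below only builds that form as a string; the function name and the example field "response" are hypothetical, and on the lab's site the equivalent would be written in HTML/PHP.

```python
from html import escape


def external_submit_form(turk_submit_to, assignment_id, data):
    """Build the HTML form that returns a worker's data to MTurk.

    turk_submit_to and assignment_id are the values MTurk passes in the
    URL of the external page; data maps field names to collected values.
    """
    if assignment_id == "ASSIGNMENT_ID_NOT_AVAILABLE":
        # MTurk sends this placeholder while a worker is only previewing.
        raise ValueError("worker has not accepted the HIT yet")
    action = turk_submit_to.rstrip("/") + "/mturk/externalSubmit"
    fields = {"assignmentId": assignment_id, **data}
    inputs = "\n".join(
        f'  <input type="hidden" name="{escape(name)}" value="{escape(str(value))}">'
        for name, value in fields.items()
    )
    return (
        f'<form method="post" action="{escape(action)}">\n'
        f"{inputs}\n"
        '  <input type="submit" value="Submit">\n'
        "</form>"
    )


# Example with made-up values; "response" is a hypothetical data field.
print(external_submit_form("https://workersandbox.mturk.com", "EXAMPLE_ASSIGNMENT", {"response": "b"}))
```

Taking the host from turkSubmitTo rather than hard-coding it means the same page works unchanged against both the sandbox and the production site.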

Another way to collect data online without having to learn or use the web code described above is to use Qualtrics. Qualtrics is useful for running Likert scale surveys or simple tasks in a web browser. The university has a license accessible to all university members. Inquire at the department chair’s office to learn how to gain access to this service.

The methods available through MTurk to prevent repeat workers (i.e., workers who complete more than one HIT for us) are extremely limited. Therefore, it is best to handle this process on our end. The best method we have discovered so far is two-fold. First, at the very end of your experiment (on the final submit page), you should write a function that prints the worker’s id number to a text file stored on the web server. Accessing the worker id is not very hard as Amazon provides this information when workers access our external site. Once the worker id number is recorded, you can prevent them from accessing your study again by including another page at the beginning of the study that opens the text file, searches for the worker’s id number, and, if located, prevents them from going any further. If they are not in the text file, then the worker can proceed to the study. Creating such a process does not take much beyond some simple JavaScript, PHP and AJAX code.
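The two-step check described above can be sketched as follows. This is an illustration in Python of the logic only (the lab's pages implement it in PHP and JavaScript); the function names and the one-id-per-line file layout are assumptions.

```python
from pathlib import Path


def has_completed(worker_id, log_path):
    """Return True if worker_id is already recorded in the log file."""
    path = Path(log_path)
    if not path.exists():
        return False  # nobody has finished the study yet
    return worker_id in path.read_text().split()


def record_completion(worker_id, log_path):
    """Append worker_id to the log file (call this on the final submit page)."""
    with open(log_path, "a") as log:
        log.write(worker_id + "\n")
```

The first page of the study would call has_completed with the worker id that Amazon supplies and stop the worker if it returns True; the final submit page would call record_completion. Comparing whole ids, rather than searching for a substring, avoids matching one id that happens to be contained in another.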

A simple example of using HTML and PHP for web coding is provided over the next few pages.

Figure 8a. HTML code for the “FirstPage.html” page containing a form, some text, and a submit button. Once the submit button is pressed, the form will automatically send the data to the page “question.html”.

Figure 8b. Visualization of the “FirstPage.html” page.

Figure 9a. HTML code for the “question.html” page containing a multiple-choice question with radio buttons. Clicking on the submit button will send the data to the “response.php” page.

Figure 9b. Visualization of the “question.html” page.

Figure 10a. PHP and HTML code used for the “response.php” page. This page utilizes PHP to retrieve the response selected from the previous page and prints it onto the current page.

Figure 10b. Visualization of the “response.php” page.

A more complex example (closer to the types of studies we run in the lab) can be found here. (To run the code, you will need to change “ABC123” in the URL to a new value.) To access the code that runs that example, log on to the college web server using an FTP client and navigate to public_html>>ExampleSite. There, you will find several PHP, JavaScript, and CSS documents that contain the code to run that site, along with comments that explain what the major components of each section of the code do.

The text-file check described above works only for workers who are attempting to access HITs from the same study. To prevent people from accessing our HITs more than once across studies, we typically assign qualifications. Further information on qualifications is provided in section VIII.a.

If you have any questions or requests for the college web server, you can contact Dale Abernathie (or his equivalent) at the A&S Computing Center.

VI. Command Line Tools (CLTs)

In our experience, working with the MTurk web interface can be a headache. There is no search feature to sort through HITs that have been posted or to find a specific worker you may wish to block or grant a bonus. Additionally, downloading files from the web interface can take some time, depending on the size of the file. To circumvent these issues, a series of Unix scripts were written to execute many of the basic and complex tasks needed without using the web interface. These scripts are referred to as command line tools (CLTs). These tools are run through a terminal and can be downloaded here.

These scripts are especially useful as they can be customized for each experiment and provide the requester with a greater degree of control over the entire MTurk data collection process than using the web interface.

To install these tools, follow this procedure:

1. Download a copy of the CLTs.
2. Uncompress the downloaded file in the desired system directory.
3. A new directory called "aws-mturk-clt-1.3.1" should have been created in the location of the uncompressed downloaded file.
4. Navigate to "[system directory path]/aws-mturk-clt-1.3.1/bin".
5. Open the mturk.properties file using a text editor.
6. Enter the following in the specified location (for the eventsegmentationstudy acct):

access_key=AKIAJNZCEZPCP5TWBKIQ
secret_key=oQ20TgETZLhtOO3MgiqBpYDV7e5TRJL0EDNfsrYU

7. Replace all instances of "http://" in the "service_url=http://mechanicalturk.amazon.com/..." with "https://".
8. Save the file with changes.
9. In terminal, enter "export MTURK_CMD_HOME=[system directory path]/aws-mturk-clt-1.3.1".
10. In terminal, enter "export JAVA_HOME=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home". If your Java VM directory is not in this location, then provide the absolute path for your Java VM directory.
11. To check if setup and installation worked correctly, run the getBalance script by typing "./getBalance.sh" into the terminal.
12. If it prints out the correct account balance, then your setup has been successful!

A brief description of each of the most commonly used command line tools and their functions is provided below. Also provided are descriptions of the three parameter files (input, properties, question) that are necessary for loading and managing HITs. Examples of these files are stored on the college web server in the public_html>>ExampleSite directory; these files are named “exp.properties”, “exp.question”, and “exp.input”.

a. mturk.properties

A file stored in the bin directory of MTurk’s CLTs once they are installed. This file toggles between running HITs on the worker sandbox or the production site. You can tell the two apart because the sandbox will have a service url of “https://sandbox.mechanicalturk.com/…”. Be sure to check this file before running any HITs on MTurk through the CLTs to ensure it is toggled to the correct Amazon site.

This file should NEVER be changed except to toggle between the worker sandbox and the production site. You can do this by entering a # symbol before the service_url line of the site you DO NOT wish the HIT to be loaded onto. The # symbol marks a comment in this file (as it does in many programming languages), so lines that start with a # symbol are ignored.
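The toggle can also be scripted so that it is harder to comment out the wrong line by hand. The sketch below assumes the file contains one service_url line per site and that only the sandbox line has "sandbox" in its URL; the function name and the ".example" URLs are hypothetical placeholders, not the real service addresses.

```python
def select_site(properties_text, use_sandbox):
    """Comment out the service_url line of the site we do NOT want to use.

    Every other line of the properties text is left untouched.
    """
    out = []
    for line in properties_text.splitlines():
        stripped = line.lstrip("# \t")  # drop any existing comment marker
        if stripped.startswith("service_url="):
            is_sandbox_line = "sandbox" in stripped
            # Keep the wanted site's line active and comment out the other.
            line = stripped if is_sandbox_line == use_sandbox else "#" + stripped
        out.append(line)
    return "\n".join(out) + "\n"


example = (
    "service_url=https://sandbox.example/?Service=Demo\n"
    "#service_url=https://production.example/?Service=Demo\n"
)
print(select_site(example, use_sandbox=False))
```

Printing the result before overwriting mturk.properties gives you a chance to confirm which site is active.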

b. .input file

The input file is a tab-delimited text file that typically acts as a counterbalance for our studies. Its first row contains the names of each column and thus the names of each condition we want to run. Subsequent rows list the possible values each column can be assigned. For example, in Figure 11 below, we plan to show a participant four movies. The first row contains a variable for each movie (movie1, movie2, etc.). Subsequent rows contain the order in which the four movies (BlindDate, Classroom, etc.) can be shown. This specific input file will contain a total of 25 rows (24 from counterbalancing the order of the four movies and 1 header row containing column names).

Figure 11. A tab-delimited input file that acts as a counterbalance for our experiments. The first row must always contain the title names of each column.
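A fully counterbalanced input file like this one can be generated rather than typed by hand. The sketch below is illustrative: BlindDate and Classroom come from the example above, while the other two movie names and both function names are hypothetical.

```python
import csv
import io
from itertools import permutations


def counterbalance_rows(movies):
    """Return a header row followed by every possible ordering of movies."""
    header = [f"movie{i}" for i in range(1, len(movies) + 1)]
    return [header] + [list(order) for order in permutations(movies)]


def to_tab_delimited(rows):
    """Render rows as tab-delimited text, one row per line."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# "MovieC" and "MovieD" stand in for the two movies not named in the text.
movies = ["BlindDate", "Classroom", "MovieC", "MovieD"]
rows = counterbalance_rows(movies)
print(len(rows))  # 25: one header row plus 4! = 24 orderings
print(to_tab_delimited(rows[:3]))
```

Writing the output of to_tab_delimited(rows) to a file with the .input extension gives one row per counterbalance condition.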

Meaningful column names and values should be used in the input file because these names and values will be used in the question file, which directs workers to the external site where the experiment is hosted.

It is typically useful to go to a previous MTurk study and borrow a copy of their input file to help with creating your own.

Further information on the input file can be found here.

c. .question file

The question file serves as the hyperlink address to the experiment. This tells Amazon what should display in the iframe that is posted for each HIT on MTurk for workers to preview. For our experiments, we like to show the consent form in the iframe to give participants a chance to read it before accepting the HIT (which we treat as the moment they provide their consent). The consent form is typically hosted on a PHP web page called “FirstPage.php” (seen in Figure 12).

Figure 12. The question file. The https protocol should always be used for our external webpages. The name of the server we use is web.artsci.wustl.edu.

This file also provides the external experiment site with the necessary information to select the appropriate movies and/or tasks. You will notice that each of the column names from the input file appears in the URL following the “?” symbol. The code reading “{helper.urlencode($movie1)}” tells MTurk to substitute the value in the movie1 column of the input file as the value for movie1 in the URL. Feeding these values into the URL will give your external site the information it requires to set up the experiment session for the worker.

Currently, MTurk can only accept URLs that use a secure http connection (i.e., your hyperlink address will need to have https: instead of http: at the beginning). If you attempt to use a non-secure http connection (i.e., http:), the consent form will not render on MTurk and will instead display a blank page. Workers will then not be able to access your HIT, therefore you must ensure you are using https in your URL.
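To see what the finished URL looks like for one row of the input file, the substitution can be mimicked in a few lines. This is only an illustration of the result, not how the CLTs perform the substitution; the function name and the directory in the example URL are hypothetical, while the server name and “FirstPage.php” come from this manual.

```python
from urllib.parse import urlencode


def external_url(base_url, row):
    """Append one input-file row to base_url as a URL-encoded query string."""
    if not base_url.startswith("https://"):
        # A plain http URL renders as a blank page on MTurk.
        raise ValueError("the external URL must begin with https://")
    return base_url + "?" + urlencode(row)


# "dcl/MyStudy" is a made-up directory used only for this example.
row = {"movie1": "BlindDate", "movie2": "Classroom"}
print(external_url("https://web.artsci.wustl.edu/dcl/MyStudy/FirstPage.php", row))
```

Opening the printed URL in a browser is a quick way to confirm that the first page of the study reads its parameters correctly before the HIT is posted.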

The only part of this file that should ever be changed is the field between the <ExternalURL> tags. Changing anything else can result in fatal errors when the HIT is posted.

It is typically useful to go to a previous MTurk study and borrow a copy of their question file to help with creating your own. You can then edit the URL you are using while leaving everything else intact.

Further information on the question file can be found here.

d. .properties file

The properties file provides MTurk with all the necessary information to post and manage the HIT. It is highly recommended that you borrow a copy of the properties file from a previous MTurk study.

Figure 13. The properties file. This file provides the basic descriptive, timing, and qualification properties necessary to post the HIT on the MTurk production site. Several of these properties (including the HIT description and qualifications) are difficult to change once a HIT is posted, so it is important to ensure this file is correct before running it.

Properties files can change from study to study, depending on the need. However, they will always have the same format. An example file is provided in Figure 13.

The first section of the properties file deals with the general properties of an external HIT (i.e., what the worker will see when they access the HIT and what they can access). This includes the HIT’s title, a description, search keywords and how much the HIT pays. An additional field lists the number of assignments to post for each HIT. It is important to note that posting multiple assignments for a HIT is an effective way to prevent people from accessing the HIT more than once. If you, for example, needed to collect data from 130 people and posted a separate HIT for each person, it would be possible for one worker to access 30 of those HITs and submit them as their work. This is a system feature of MTurk (and a rather annoying one). To prevent this from happening, it is recommended that you submit only 1 HIT per counterbalance condition and then add assignments to those HITs based on the number of people you need in each condition.

The second section deals with time limits for the HIT, all of which are specified in seconds. These are the amount of time a worker has to complete the HIT once it is accepted (assignment duration) and how long the HIT will be available on MTurk (HIT lifetime). Additionally, there is a field for the auto-approval time limit; this is how much time can elapse between a worker’s submission and the requester’s decision to approve or reject it. Should a submission still be under review once this time has elapsed, it will automatically be approved by Amazon and the worker paid. Set this to a reasonable time limit to give you the opportunity to review the work you receive. You should never take longer than 1.5 weeks to review work.
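Because every time limit is entered in seconds, it is easy to mistype a value by a factor of 60 or 24. A small helper like the one below (the function name is hypothetical) converts human-readable durations into the numbers to paste into the properties file.

```python
from datetime import timedelta


def to_seconds(**duration):
    """Convert a duration such as hours=2 or weeks=1.5 to whole seconds."""
    return int(timedelta(**duration).total_seconds())


print(to_seconds(hours=1))    # 3600
print(to_seconds(days=7))     # 604800
print(to_seconds(weeks=1.5))  # 907200, the longest review period advised above
```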

The final section deals with qualifications workers must have (or not have) to accept and complete the HIT. For our purpose, we want to use high performance workers (i.e., approval ratings over 95%) and workers from the U.S. Additionally, the qualification that blocks previous workers from doing the HIT should also be included. It is recommended that the ‘private’ field for all these qualifications be set to TRUE. Qualifications only prevent a worker’s access to HITs, not their ability to preview them. Setting this field to TRUE will prevent previous workers from both accessing and previewing the HIT. This will save you considerable time responding to emails.

Further information on the fields in the properties file can be found here.

e. run.sh

The run shell script is used to post the HITs to the production site. This script requires several arguments in order to work. These arguments correspond to the locations and names of the three parameter files mentioned above (input, questions, and properties). Without these, the run script will not execute and your HITs will not be posted to the production site.

Figure 14. The run shell script. The LABEL argument is helpful if all the parameter files and the success file will have the same name, though the script can run without this argument. To help organize success files, a Success/ directory is created in the experiment folder.

A typical run shell script is shown in Figure 14. The run script operates by executing another script inside of the mturk bin directory that loads HITs to MTurk’s servers. After successfully executing, the script outputs a file called a success file. This file will list the HIT IDs and HIT Type IDs for each HIT posted. It is very important that this file be saved as this file will be used to retrieve data and approve work. Each time the run script is executed, it will output a success file with the name specified in the script.

If you plan to execute the run script more than once, it is imperative that you rename the success file. Otherwise, the original(s) may be overwritten.
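One way to avoid overwriting a success file is to include a timestamp in its name. The sketch below is an illustration only; the function name, the Success directory default and the .success extension are assumptions, not features of the run script itself.

```python
from datetime import datetime
from pathlib import Path


def unique_success_path(label, directory="Success", now=None):
    """Build a success-file path that embeds the date and time of the run.

    `label`, `directory` and the ".success" extension are hypothetical
    naming choices for this sketch.
    """
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(directory) / f"{label}_{stamp}.success"


# Example with a fixed time so the result is predictable.
example = unique_success_path("study1", now=datetime(2016, 5, 1, 9, 30, 0))
print(example)
```

The resulting name could then be used when renaming the success file after each execution of the run script.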

When running the run shell script in a terminal, it will also print out the URL hyperlink to the MTurk webpage hosting the HIT. It is highly recommended you copy this link to your NOTES file for your experiment so that you may refer to it later, if needed. Some workers may report issues when accessing the first page of your HIT on MTurk, and it will be easier to enter this URL into your browser rather than searching MTurk to find your HIT.

Further information on the loadHITs shell script can be found here.


Further information on the success file can be found here.

f. getResults.sh

Once a HIT has been posted, you can automatically download whatever work has been submitted for that HIT. To do this, you can use the getResults shell script. This script requires as one of its arguments the location and name of the success file produced from running the run shell script. Without this file, getResults will not be able to locate the data you wish to retrieve and will print an error.

Figure 15. The getResults shell script. The LABEL argument is helpful if the success and results files will have the same name, though the script can run without this argument. To organize results, a separate Results/ directory is typically created in the experiment folder.

An example getResults shell script is shown in Figure 15. The getResults script will output a tab-delimited text file, called a results file, containing all the information Amazon received from the external site. You can name this file whatever you wish but be aware that Amazon removes results files that have been sitting on their servers for several months, so you should always have a copy on DCL_ARCHIVE_2 of this file. You can open this file in most versions of Excel and in R by the following methods:


Excel: Click and drag the file over the MS Excel icon. Excel will open it automatically.

R: Use the following command to load the file into the R workspace: read.table(filename, header=TRUE, sep="\t", fill=TRUE)
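Because the lab's custom scripts are written in Python, it may also be convenient to read the results file there. The following is a minimal sketch using only the standard library; the column names in the sample data are hypothetical stand-ins, not the actual columns Amazon writes.

```python
import csv
import io


def read_results(handle):
    """Parse a tab-delimited results file into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


# A tiny in-memory stand-in for a results file (column names are hypothetical).
sample = io.StringIO("workerid\tAnswer.response\nW1\tyes\nW2\tno\n")
rows = read_results(sample)
print(len(rows), rows[0]["workerid"])
```

With a real file, pass an open file handle instead, e.g. open(filename, newline="").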

The results file can also be accessed via the “Manage” tab on the requester site. However, this file can take some time to download as Amazon will convert it to a csv file as it is downloading. Furthermore, this file will not contain the names of the form fields you created on the external experiment site. As a result, the columns in the results file are given generic names (e.g., Answer1, Answer2, Answer3, etc.). Unless you can immediately identify which column corresponds to which data field, it is highly recommended that you download the results file through the command line, as this file will contain all the names you specified for each form field on the external site.

Further information on the getResults shell script can be found here.

g. approveWork.sh

After reviewing worker submissions and finding them acceptable, you can approve every submission using the approveWork shell script. You will need to provide the name and location of the success file as an argument to use this script.

Figure 16. The approveWork shell script. Provide this script with the location and name of the success file produced from the run shell script. This script will approve ALL submissions associated with the success file. If you do not plan to approve all submissions, use the reviewResults shell script instead.

An example approveWork shell script is shown in Figure 16. Once executed, the script will begin approving each submission and paying workers. This will consequently also result in all applicable service and administrative fees being charged to our MTurk account for each submission. This script should only be used if you are going to approve every submission.

In the event that you do not plan to approve all submissions (e.g., suppose you want to reject one worker’s submission because it appears they intentionally did not comply with the instructions), you will want to use the reviewResults shell script instead. This script provides the ability to reject certain submissions if they are not up to standards.

Further information on the approveWork shell script can be found here.

h. reviewResults.sh

The process for using this script is more complex than for any of the scripts previously mentioned. You will first need to download a copy of the results file from the Amazon server using the getResults script. Then, you will need to open the results file in either Excel or R. In the results file, there is a column called ‘approval’. In this column, you will need to make some kind of indication (an “x”, for example) on the submissions you wish to reject. Any submission without an indication will be treated as a submission to approve. Once you have done this, save the file and then execute reviewResults.sh with the name and location of the results file as a parameter. Any rejected submission will result in the worker not being paid, and the HIT will not go back into the pool of available HITs for workers to complete; regardless of whether you accept or reject a submission, once a submission is received from a worker, Amazon considers the HIT or assignment done. We will be charged for posting the HIT and any other fees, if applicable.
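If many submissions need to be rejected, marking them by hand becomes error-prone. The sketch below shows one way the marking step might be automated in Python. It assumes the results file has already been parsed into rows, that the worker id column is named workerid (a hypothetical name), and that the column to mark is the ‘approval’ column described above; check both against your own results file before relying on it.

```python
import csv
import io


def mark_rejections(rows, reject_ids, id_column="workerid",
                    mark_column="approval", mark="x"):
    """Return copies of the rows with `mark` placed on submissions to reject.

    Rows whose worker id is not in `reject_ids` are left unchanged.
    Column names are assumptions for this sketch.
    """
    marked = []
    for row in rows:
        updated = dict(row)
        if row[id_column] in reject_ids:
            updated[mark_column] = mark
        marked.append(updated)
    return marked


def write_tsv(rows, handle):
    """Write the rows back out as tab-delimited text with a header line."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Tiny example with hypothetical worker ids.
rows = [{"workerid": "W1", "approval": ""}, {"workerid": "W2", "approval": ""}]
marked = mark_rejections(rows, reject_ids={"W2"})
out = io.StringIO()
write_tsv(marked, out)
print(out.getvalue())
```

The saved file would then be passed to reviewResults.sh as described above.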

Further information on the reviewResults shell script can be found here.

i. extendHITs.sh

When initially creating a HIT on MTurk using the command line tools, you are asked to provide an expiration time for the HIT (in seconds) and the number of assignments for each HIT in the properties file. The expiration time only determines when the HIT will no longer be publicly accessible on the production site. In some cases, you may need to extend this deadline (particularly in cases where you still have a substantial number of HITs/assignments that need to be completed and are close to the HIT expiration date) or may wish to add more assignments to a particular HIT. To do this, you can use the extendHITs.sh script, which requires the location and name of the success file of the HITs you wish to extend.

An example is provided in Figure 17 below. Inside the script is an argument for the amount of time by which you wish to extend the expiration of the HIT (in hours) and another argument for the number of assignments you wish to add to a HIT.

Figure 17. The extendHITs shell script. This script can be toggled between extending both the expiration date and assignments for a HIT or just the assignments. You will need to provide the location and name of the success file for the HITs you want to extend. This script will extend all the HITs associated with the success file. To extend only certain HITs, use the web-management page for the HIT on the requester site.

Be sure you use the correct argument when extending HITs. We will only be charged for using this script if we decide to add more assignments to a HIT, not if we decide to extend the expiration date. If you are adding assignments, the cost associated with adding those assignments will be held for liability once the script successfully executes. If you do not have enough funds to extend the assignments, the script will post only those assignments for which you have enough money. It will then print an error message informing you of insufficient funds.

Further information on the extendHITs shell script can be found here.


VII. Communicating with Workers

Whenever you post HITs, workers may experience some kind of issue that prevents them from completing the HIT. Many of these issues should be caught and corrected in the development phase, but given that we are conducting research in a web environment, other issues that we did not expect could pop up. When workers are not able to complete a HIT, they will usually contact the requester via MTurk’s email system. This system sends their message directly to the email associated with the requester’s MTurk account (in our case, [email protected]) along with the worker’s id number.

You should respond to these emails through our gmail account within 24 hours. Any longer and workers can become impatient or consider you “non-responsive”. Although these are web-based workers, you will want to communicate with them in the same manner and tone as dealing with any participant brought into the lab. You should also sign your name in the emails sent to these workers so they know exactly with whom they are communicating.

Some of the common issues workers will email you with are presented below, along with the recommended solutions to these issues:

Problem: Experienced an issue while doing the HIT that caused it to crash or not submit successfully.
Solution: Apologize to the worker and set up a “compensation HIT” to pay them for attempting the study. Analyze your code to determine if the issue came from our end or a problem with the worker’s browser/internet connection. Do not attempt to let the worker complete the HIT again as we do not know how far they got the first time.

Problem: Unable to access the HIT past the consent form.
Solution: Apologize to the worker and attempt to determine the cause of the problem. Correct the problem and offer the worker the chance to take the HIT once it is clear they can access it. If they previously participated in one of our studies, tell them they can’t participate again.

Problem: Unsure if data was submitted successfully.
Solution: Determine if we received a submission from the worker. If so, inform the worker their work was received and is under review. If not, set up a compensation HIT and pay the worker for attempting the study. Determine why data was not received.

As MTurk is community-based, several review sites have been created for workers to share reviews of requesters and post comments or complaints about requesters. One of these sites, turkopticon.com, is operated by UC San Diego. Anyone with an MTurk worker id number can create an account on this site, including requesters. Our username for turkopticon is [email protected]. Ask the lab manager for the password.

You will want to access these forums while running a HIT to determine if there are issues occurring that workers may not be reporting. Again, you will want to be courteous when responding to any posts on turkopticon. If a worker complains they suffered a technical issue while attempting to complete the HIT, then ask the worker to email our lab account and request compensation for their attempt. It is not advisable to post the URLs to compensation HITs in these forums. Other workers who did not attempt the HIT may try to take advantage and claim they did (nearly all workers are fair players but there are always a few bad apples). This defeats the purpose of trying to keep a good reputation in the MTurk community. We want workers to contact us directly so we control the whole technical support process, from directly speaking with the worker to paying out only what is needed.

Given that MTurk is a web community, it is possible that you may post a compensation HIT and then have another worker who was not supposed to work on that HIT accept and submit an assignment. In most cases, this is a worker who is hoping to take advantage of the system or a “bot” that automatically accepts HITs and submits them. However, there may be cases where someone did experience a technical issue but didn’t report it, found the HIT and decided to accept it to be compensated. Because of this, it is very important that you collect the worker id number for those individuals whom we ask to complete compensation HITs.

When you access the web-management page for the compensation HIT, it will provide you the worker id for each submission it receives. You can then verify that only the individuals who were told to perform the HIT actually performed it. If someone completes the HIT who was not supposed to, you can email the worker through the web-management page.
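The verification step can be done with a simple comparison of two lists of worker ids. The following is a minimal sketch with hypothetical ids: one list holds the workers you asked to complete the compensation HIT, the other holds the worker ids shown on the web-management page.

```python
def unexpected_workers(submitted_ids, invited_ids):
    """Worker ids that submitted the compensation HIT without being invited."""
    invited = set(invited_ids)
    return sorted({w for w in submitted_ids if w not in invited})


def missing_workers(submitted_ids, invited_ids):
    """Invited worker ids that have not yet submitted the compensation HIT."""
    submitted = set(submitted_ids)
    return sorted({w for w in invited_ids if w not in submitted})


# Hypothetical worker ids for illustration.
invited = ["W1", "W2", "W3"]
submitted = ["W2", "W9", "W1"]

print("Not supposed to complete the HIT:", unexpected_workers(submitted, invited))
print("Still waiting on:", missing_workers(submitted, invited))
```

Any id in the first list is a worker you would email through the web-management page.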

Customarily, it is appropriate to give a worker at least three days to respond to your email and to provide them with the [email protected] address they should respond to in the body of the email. If the individual has not responded to the email, you are free to reject their submission with reason “Was not supposed to complete HIT.”

VIII. Custom Made MTurk Scripts

There are a few scripts in the CLTs that are cumbersome to work with. For example, if we wanted to give every worker who completed a HIT a bonus of $1, we would need to either call the grantBonus shell script for each worker and provide the bonus amount and reason as arguments for each call or access the worker’s page in the MTurk web interface and enter the bonus manually. This is simple enough if you have three workers to pay bonuses to but far more complex when dealing with 300 workers.

To assist with this, the lab has written a few MTurk CLT scripts in python that make this process easier. These scripts are described below:

a. assignQuals.py

For a lot of our experiments, we need to ensure we get samples that have had no prior experience with our stimuli or tasks. To do this, we assign workers a qualification that prevents them from accepting another HIT from us again. This qualification can always be revoked for a worker but should remain in place unless otherwise decided.

Other qualifications can be assigned as well. For example, if you were running a longitudinal study and wanted to denote people who were completing all the required sessions by their deadlines, you could assign a qualification for each session to ensure workers do not skip a session.

To assign qualifications, an in-lab custom-made python script was written. This script is fed a tab-delimited text file of worker id numbers (one id number per line) and runs the assignQualification shell script for each worker. This script can be copied from another project that used the script. The only features that will need to be changed are the qualification type id number (a number that tells Amazon exactly which qualification to assign) and the score (the value assigned to that qualification, which ranges from 0-100). Note that a score can only be used once for a qualification, meaning that I cannot have two qualifications and assign a score of 10 to both; I can only assign that score to one of them. For our qualification that prevents previous workers from completing future HITs, we assign a score of 100.
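The overall logic of such a script can be sketched as follows. This is not the lab's assignQuals.py; it is a simplified illustration that only builds and prints the commands (a dry run) rather than executing them. The script path, the flag names (-qualtypeid, -workerid, -score) and the qualification type id are assumptions and must be checked against the CLT documentation and your own setup.

```python
import io
import shlex


def read_worker_ids(handle):
    """Read one worker id per line, skipping blank lines and duplicates."""
    ids = []
    for line in handle:
        worker_id = line.split("\t")[0].strip()
        if worker_id and worker_id not in ids:
            ids.append(worker_id)
    return ids


def build_command(worker_id, qual_type_id, score,
                  script="./assignQualification.sh"):
    """Build the argument list for one worker.

    The script path and flag names are assumptions for this sketch.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    return [script, "-qualtypeid", qual_type_id,
            "-workerid", worker_id, "-score", str(score)]


# Dry run with hypothetical ids; nothing is sent to MTurk.
worker_file = io.StringIO("W1\nW2\n\nW1\n")
for worker_id in read_worker_ids(worker_file):
    command = build_command(worker_id, "HYPOTHETICAL_QUAL_TYPE_ID", 100)
    print(" ".join(shlex.quote(part) for part in command))
```

To actually run each command, the argument list could be passed to subprocess.run from the standard library.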

To make the python script more user-friendly, a short help page was written into it. This help page provides information on the purpose of the script, what arguments it requires, and what are the suggested names for the files the script accepts and outputs. To access this help page for the assignQuals python script, simply enter assignQuals.py –help

The typical usage for the assignQuals.py script is as follows:

./assignQuals.py <inputfile> <outputfile>
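For readers writing a similar script, Python's standard argparse module is one common way to provide two positional arguments together with a built-in help page. The sketch below is an assumption about how this could be done, not a description of how the lab's script implements its help page; the description and help strings are illustrative.

```python
import argparse


def build_parser():
    """Create a parser for <inputfile> and <outputfile> with a generated help page."""
    parser = argparse.ArgumentParser(
        prog="assignQuals.py",
        description="Assign a qualification to every worker id listed in a file.",
    )
    parser.add_argument(
        "inputfile",
        help="tab-delimited text file of worker id numbers, one per line",
    )
    parser.add_argument(
        "outputfile",
        help="log file that receives a copy of the terminal output",
    )
    return parser


# Parse a hypothetical command line instead of the real sys.argv.
args = build_parser().parse_args(["workers.txt", "assignQuals_log.txt"])
print(args.inputfile, args.outputfile)
```

With argparse, the help page is generated automatically from the description and help strings.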

Once the script is running, it will print its output to the terminal. However, the python script was written so that it also copies that output to an external file (the outputfile argument). This acts as a log of what you have done and any errors you encountered while running the script. It is recommended that each execution of the python script be given a new output file name so you can keep track of everything you did.
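Copying output to both the terminal and a log can be done with a small helper that writes to several streams at once. The following is a minimal sketch of that idea, not the lab's implementation; an in-memory buffer stands in for the output file so the example can be run anywhere.

```python
import io
import sys


class Tee:
    """Write everything to several streams at once (e.g., terminal and log file)."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text):
        for stream in self.streams:
            stream.write(text)
        return len(text)

    def flush(self):
        for stream in self.streams:
            stream.flush()


# io.StringIO stands in for a real log file such as open(outputfile, "w").
log = io.StringIO()
out = Tee(sys.stdout, log)

print("Assigned qualification to worker W1", file=out)
out.flush()
```

After running, the same line appears on the terminal and in the log.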

A copy of the assignQuals.py script can be found on DCL_ARCHIVE_2 in Documents>>MechanicalTurk.

Further information on assigning qualifications can be found here. Additional information on revoking qualifications can be found here.

b. grantBonus.py

It is very rare but, in some cases, you may want to pay workers who have completed a HIT a bonus. There are several ways to pay a bonus to workers: (1) you can access the Manage>>Workers page on MTurk, find the worker in the list, go to their worker page and pay them a bonus there; (2) access the individual HIT page from “Manage HITs individually”, find the worker’s submission and pay them a bonus there; or (3) use the CLTs to pay a bonus for a worker using the grantBonus command.

All these methods can be cumbersome if you have to pay a bonus to every worker that completed a HIT or even a large number of workers. To streamline this process, the lab wrote a custom python script that simply reads in a list of worker ids and then assigns a bonus to each of those workers. You will need to open up the script in a text editor and change the fields that state how much should be paid to the worker as a bonus and the reason for the bonus.

The script operates by running the grantBonus shell script for each worker from the CLTs. This shell script requires three arguments: the worker id, the bonus amount, and the reason for the bonus. Without these arguments, the script will not execute.
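The sketch below illustrates this loop as a dry run: it builds one argument list per worker and totals the bonus amount before anything is paid. It is not the lab's grantBonus.py. The script path and flag names (-workerid, -amount, -reason) are assumptions, and the real command may require additional arguments (for example, an assignment id), so check the CLT documentation before adapting it. The total excludes any Amazon fees.

```python
from decimal import Decimal


def build_bonus_command(worker_id, amount, reason, script="./grantBonus.sh"):
    """Build the argument list for one bonus payment.

    The script path and flag names are assumptions for this sketch.
    """
    amount = Decimal(str(amount))
    if amount <= 0:
        raise ValueError("bonus amount must be positive")
    return [script, "-workerid", worker_id,
            "-amount", f"{amount:.2f}", "-reason", reason]


def total_bonus(worker_ids, amount):
    """Total bonus paid across all workers, excluding any fees."""
    return Decimal(str(amount)) * len(worker_ids)


# Dry run with hypothetical worker ids; nothing is sent to MTurk.
workers = ["W1", "W2", "W3"]
for worker_id in workers:
    print(build_bonus_command(worker_id, 1, "Bonus for completing the study"))
print("Total bonus (before fees):", total_bonus(workers, 1))
```

Checking the total first is a simple safeguard when paying a large number of workers, such as the 300 in the earlier example.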

To make this script more user-friendly, a short help page was written into it. This help page provides information on the purpose of the script, what arguments it requires, and what are the suggested names for the files the script accepts and outputs. To access this help page for the grantBonus python script, simply enter grantBonus.py –help

The typical usage for the grantBonus.py script is as follows:

./grantBonus.py <inputfile> <outputfile>

Once the script is running, it will print its output to the terminal. However, the python script was written so that it also copies that output to an external file (the outputfile argument). This acts as a log of what you have done and any errors you encountered while running the script. It is recommended that each execution of the python script be given a new output file name so you can keep track of everything you did.

A copy of the grantBonus.py script can be found on DCL_ARCHIVE_2 in Documents>>MechanicalTurk.

Further information on granting bonuses can be found here.