The Hidden Cost of Using Amazon Mechanical Turk for Research
Abstract
In this paper, we investigate the attentiveness exhibited by participants sourced through Amazon Mechanical Turk (MTurk), uncovering a significant level of inattentiveness among the platform's top crowd workers (those classified as 'Master', with an 'Approval Rate' of 98% or more, and a 'Number of HITs approved' value of 1,000 or more). A total of 564 individuals from the United States participated in our experiment. They were asked to read a vignette outlining one of four hypothetical technology products and then complete a related survey. Three forms of attention check (logic, honesty, and time) were used to assess attentiveness. Through this experiment we determined that a total of 126 (22.3%) participants failed at least one of the three forms of attention check, with most (94) failing the honesty check, followed by the logic check (31) and the time check (27). Thus, we established that significant levels of inattentiveness exist even among the most elite MTurk workers. The study concludes by reaffirming the need for multiple forms of carefully crafted attention checks, irrespective of whether participant quality is presumed to be high according to MTurk criteria such as 'Master', 'Approval Rate', and 'Number of HITs approved'. Furthermore, we propose that researchers adjust their proposals to account for the effort and costs required to address participant inattentiveness.
Keywords
- Amazon Mechanical Turk
- MTurk
- Attention checks
- Inattentive respondents
- Worker reputation
- Worker quality
- Data quality
Introduction
Over time, online services for participant recruitment by researchers have increased in popularity [30]. Amazon Mechanical Turk (MTurk; also known as Mechanical Turk [15]) is one of the oldest and most frequently selected tools from a spectrum of web-based resources, enabling researchers to recruit participants online and lowering the required time, effort, and cost [24, 39]. A Google Scholar search for the term 'Mechanical Turk' reveals continuing growth in its use, with 1,080, 2,750, and 5,520 items found when filtering the results for 2010, 2012, and 2014 respectively [24]. The technology facilitates "an online labor market" where "individuals and organizations (requestors)" can "hire humans (workers) to complete various computer-based tasks", which they describe as "Human Intelligence Tasks or HITs" [24]. The MTurk platform's suitability for use in research has been extensively evaluated [12], with most published studies describing it as a suitable means for recruitment [8], although a few also state reservations [9]. This paper builds on that work by investigating the reliability of the top crowd workers that can potentially be sourced from the MTurk platform, while concurrently motivating them through offers of high compensation. Specifically, we focus on the attentiveness exhibited by United States based workers classified as 'Master', who had completed at least 98% of the tasks they committed to completing (i.e., an 'Approval Rate' of 98% or more). Additionally, the number of these activities was required to exceed 999 (i.e., the 'Number of HITs approved' had a value of 1,000 or more). To the best of our knowledge, this segment of the platform has yet to be studied with a focus on participant attention.
Background
Amazon does not disclose real-time information on the total number of workers available for hire via their MTurk service, or those online at any particular moment. Several researchers offer insight into what those values might be [9, 13, 36]. Ross et al. [36] report that in 2010 the platform had more than 400,000 workers registered. Likewise, there were anywhere between 50,000 and 100,000 HITs available at any given time. In 2015, Chandler, Mueller, and Paolacci [9] wrote: "From a requester's perspective, the pool of available workers can seem limitless, and Amazon declares that the MTurk workforce exceeds 500,000 users". Stewart et al. [40] report that the turnover rate is not dissimilar to what one would experience in a university environment, with approximately 26% of the potential participants on MTurk retiring and being replenished by new people. More recently (2018), Difallah, Filatova, and Ipeirotis [13] found that at least 100,000 workers were registered on the platform, with 2,000 active at any given time. The authors also state that a significant worker turnover exists, with the half-life for workers estimated to be between 12 and 18 months [13]. Such numbers as those reported by Stewart et al. [40] and Difallah, Filatova, and Ipeirotis [13] indicate that recruiting the same worker more than once for a given experiment is highly unlikely.
MTurk is probably the most thoroughly studied of the available platforms for online participant recruitment through crowdsourcing. The literature on the suitability of MTurk for research presents a somewhat 'rosy' picture, labeling it as adequate for use with experiments. This includes the work of Casler, Bickel, and Hackett [8], who compared the data obtained through the recruitment of participants on MTurk with data collected from participants recruited through social media, and those recruited on an academic campus [8]. The authors found that the data was similar across all three pools and highlight that the MTurk sample was the most diverse [8]. Moreover, the authors [8] reveal that the results were similar irrespective of whether the experiments had been completed in-lab or online. Both the replicability and reliability of data collected through the MTurk platform have been established. Rand [34] found that participant responses across experiments were consistent, allowing for replication of results, and Paolacci, Chandler, and Ipeirotis [31] found increased reliability of data. Hauser and Schwarz [17] found that participants recruited from MTurk exhibited superior attention to the assigned task compared to participants recruited using traditional approaches.
Despite all these reassuring findings, a growing body of literature raises warnings that must be addressed [2, 5, 20, 25, 50], and "significant concerns" remain [15]. Such concerns are not new. For example, the use of attention checks to identify inattentive participants was standard practice for a large group of research communities, and prior work from past decades shows that many participants (from 5% to 60%) answer survey questions carelessly [6, 21, 28]. Still, to some extent, this would not be expected with MTurk, as it would be assumed that individuals are essentially workers, and as such, they would be devoted to the task and paying attention. The main concern is the assertion that the low remuneration attracts workers with limited abilities who cannot find better employment [15]. Stone et al. [42] note that participants recruited through MTurk "tend to be less satisfied with their lives than more nationally representative samples", although they comment that "the reasons for this discrepancy and its implications are far from obvious". The reliability of crowd workers has been widely discussed and studied by investigating the impact of attentiveness on the reliability of crowd worker responses (e.g., [37]).
Researchers are increasingly concerned that participants sourced through MTurk "do not pay sufficient attention to study materials" [15]. A prominent example is the work of Chandler et al. [9], who reveal that participants were not always entirely devoted to the assigned task and were instead multitasking. Litman et al. [25] identified that the practice of multitasking while participating in a research study is problematic, as it can lead to inattentiveness and reduce participants' focus on details [25]. Consequently, studies relying on participants devoting their full attention to the current work are at risk [25]. The authors also state that "these findings are particularly troubling, considering that the participants in the Chandler et al. study were some of the most reliable of MTurk respondents, with cumulative approval ratings over 95%". The current research seeks to understand this inattentiveness, trusting that our findings will be useful to others who source participants for research using this tool.
Methodology
This section outlines the methodology used for the study.
Experimental Design
To investigate participant attention, an experimental approach was adopted. Participants solicited through MTurk were forwarded to the Qualtrics web-based software, where they were randomly presented with a vignette describing one of four hypothetical technology products. Subsequently, they were asked questions on their intention to adopt that technology. MTurk has been used on numerous occasions to understand user intention to adopt technology [29, 38, 41, 48, 49, 51]. Participants were then given one of two technology acceptance questionnaires to share their perceptions of the technology presented in their respective vignettes. The questionnaires were adaptations of the most popular models used to study user adoption of technology [35]. The first questionnaire (short) reflected the instrument for the second version of the unified theory of acceptance and use of technology (UTAUT2) [45] model and comprised 52 questions in total. The second questionnaire (long) reflected the instrument for the third version of the technology acceptance model (TAM3) [46] and comprised 74 questions. Both questionnaires also included 10 demographic and experience questions. Aside from the demographic questions, each question was rated through a 7-point Likert scale ranging from 'strongly disagree' to 'strongly agree'.
Assessing Attention
Three forms of attention check, derived from the work of Abbey and Meloy [1], were used to assess participant attention. The first was a logic check based on logical statements. It required participants to demonstrate "comprehension of logical relationships" [1]. An example of such a question might be 'at some point in my life, I have had to consume water in some form'. This check comprised two such logical statements to answer, as shown in Table 1. The second was an honesty check to "ask a respondent directly to reveal their perceptions of their effort and data validity for the study" [1]. An example was, 'I expended effort and attention sufficient to warrant using my responses for this research study'. As part of this check, participants were asked two questions regarding their perception of the attention invested in the experiment. Table 1 shows the questions used. These questions were also rated using a 7-point Likert scale ranging from 'strongly disagree' to 'strongly agree'. Participants who did not respond to both questions by selecting the 'strongly agree' option were deemed to have failed their respective attention checks.
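For illustration, this scoring rule can be expressed in a few lines of Python. The following is a minimal sketch, not the authors' code: the column names (logic_1, logic_2, honesty_1, honesty_2) and the input file are hypothetical, and responses are assumed to be coded 1–7, with 7 representing 'strongly agree'.

```python
import pandas as pd

STRONGLY_AGREE = 7  # Likert responses assumed coded 1-7

responses = pd.read_csv("responses.csv")  # hypothetical survey export

# Logic check: both logical statements are assumed to be worded so that
# 'strongly agree' is the only correct answer (as with the water example).
responses["failed_logic"] = ~(
    (responses["logic_1"] == STRONGLY_AGREE)
    & (responses["logic_2"] == STRONGLY_AGREE)
)

# Honesty check: a participant passes only by selecting 'strongly agree'
# on both self-assessment items.
responses["failed_honesty"] = ~(
    (responses["honesty_1"] == STRONGLY_AGREE)
    & (responses["honesty_2"] == STRONGLY_AGREE)
)
```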
The third form of attention check was a time check, which used "response time" to ascertain attention, employing the concept that response times might be "overly fast or slow based on distributional or expected timing outcomes" [1]. Participants who were unable to complete the experiment within a reasonable time were deemed to have failed that check. To estimate the response time, we totaled the number of words that participants would read as part of the informed consent document, instructions, and vignette. We then used the conservative reading rate (200 words per minute) described by Holland [18] to estimate the time participants would require to read that material. To determine the time participants would need to complete each Likert question, we used the estimate provided by Versta Research [47], which is 7.5 s on average. Table 2 summarizes our calculations. Participants who spent less than 70% of the estimated time on the survey were deemed to have failed the time check.
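The time-check estimate is simple arithmetic. The sketch below uses the two figures cited above (200 words per minute [18] and 7.5 s per Likert item [47]); the word and question counts in the example are illustrative placeholders, not the study's actual values (those appear in Table 2).

```python
READING_RATE_WPM = 200    # conservative reading rate [18]
SECONDS_PER_ITEM = 7.5    # average time per Likert question [47]
THRESHOLD = 0.70          # fail when under 70% of the estimated time

def estimated_seconds(words_to_read: int, likert_items: int) -> float:
    """Estimated completion time: reading time plus answering time."""
    reading = words_to_read / READING_RATE_WPM * 60.0
    answering = likert_items * SECONDS_PER_ITEM
    return reading + answering

def failed_time_check(actual_seconds: float, words: int, items: int) -> bool:
    return actual_seconds < THRESHOLD * estimated_seconds(words, items)

# Illustrative values: 1,200 words of material and 52 Likert items give an
# estimate of 360 s + 390 s = 750 s, so under 525 s counts as a failure.
print(estimated_seconds(1200, 52))       # 750.0
print(failed_time_check(500, 1200, 52))  # True
```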
Compensation
A factor that was considered important and that needed to be controlled for was compensation. The concern was that the level of compensation might influence participant attention. However, numerous studies have investigated how compensation influences the quality of data produced by MTurk workers [4, 7, 25]. Most found that the quality of results is not linked to the rate of compensation, with Litman, Robinson, and Rosenzweig [25] stating that "payment rates have virtually no detectable influence on data quality". One example of such a study was conducted by Buhrmester et al. [7], who offered participants 2 cents (i.e., $0.25/hour), 10 cents, or 50 cents (i.e., $6 per hour) to complete a 5-minute task. The authors found that while recruiting participants took longer when lower compensation was offered, the data quality was similar irrespective of the offered compensation. A similar study was conducted by Andersen and Lau [4], who provided participants with either $2, $4, $6, or $8 to complete a task. They found that the remuneration did not influence participants' performance, writing that there was "no consistent or clear evidence that pay rates influenced our subject behavior".
A smaller number of studies show that the quality of work produced by those on MTurk is influenced by the compensation size. An example is seen in Aker et al. [3], who compensated participants for a task at rates of $4, $8, and $10 per hour. Their "results indicate that in general higher payment is better when the aim is to obtain high quality results" [3]. Overall, most tasks on MTurk offer a minimal level of compensation [7, 14, 33]. In 2010, the mean and median wages were $3.63/hour and $1.38/hour, respectively [19]. In 2019, the median compensation in the United States was $3.01/hour [16]. Paolacci, Chandler, and Ipeirotis [31] comment that "given that Mechanical Turk workers are paid so little, one may wonder if they take experiments seriously".
The rate offered to participants in this study surpassed $22/hour in order to ensure that participants were adequately motivated and thereby control for compensation (Table 2). The size of this compensation could be considered excessive when considering what is traditionally offered to participants on MTurk, the federal minimum wage in the United States of $7.25/hour, and what is presented in studies examining the effect of compensation on performance. For example, Aker et al. [3] describe $10/hour as high. Offering an extremely generous wage was expected to negate undesirable effects and induce participants to devote their full attention to our study.
Participants
Participants were selected to represent the top workers the MTurk platform offers. This was achieved by using a filtering mechanism, allowing only workers satisfying certain criteria to participate in the study [10]. The filters used were: 1) located in the United States; 2) classified by Amazon as 'Master' level; 3) had completed at least 98% of the tasks they committed to completing (i.e., an 'Approval Rate' of 98% or more); and 4) had completed at least 1,000 tasks (i.e., their 'Number of HITs approved' rating was 1,000 or more). The data was collected through two batches over 16 days (between December 22nd, 2019, and December 30th, 2019, and again between February 1st, 2020, and February 7th, 2020). Participation in this study was voluntary and all our participants were first asked to confirm that they were willing to participate before being allowed to begin the experiment. The privacy of participants was protected using confidential coding.
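For readers replicating this setup, the four filters correspond to MTurk qualification requirements. The boto3 sketch below is an illustration under stated assumptions, not the authors' code: the system qualification type IDs for locale, approval rate, and approved HITs are the commonly documented values, and the 'Master' qualification ID shown is an assumed production value that should be verified against current AWS documentation before use.

```python
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

qualification_requirements = [
    {   # 1) located in the United States
        "QualificationTypeId": "00000000000000000071",  # Locale
        "Comparator": "EqualTo",
        "LocaleValues": [{"Country": "US"}],
        "ActionsGuarded": "DiscoverPreviewAndAccept",
    },
    {   # 2) classified by Amazon as 'Master' level
        # NOTE: assumed production ID; confirm in the AWS docs before use.
        "QualificationTypeId": "2F1QJWKUDD8XADTFD2Q0G6UTO95ALH",
        "Comparator": "Exists",
        "ActionsGuarded": "DiscoverPreviewAndAccept",
    },
    {   # 3) 'Approval Rate' of 98% or more
        "QualificationTypeId": "000000000000000000L0",  # PercentAssignmentsApproved
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [98],
        "ActionsGuarded": "DiscoverPreviewAndAccept",
    },
    {   # 4) 'Number of HITs approved' of 1,000 or more
        "QualificationTypeId": "00000000000000000040",  # NumberHITsApproved
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [1000],
        "ActionsGuarded": "DiscoverPreviewAndAccept",
    },
]

# The list would then be passed as the QualificationRequirements parameter
# of mturk.create_hit(...) when publishing the HIT.
```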
The sample comprised 564 participants, 293 (51.95%) identified as male, and 271 (48.05%) identified as female. Most participants were in the 31–55 age range (73.94%), had some form of undergraduate (74.47%) or postgraduate (9.93%) training, and earned below $60,000 (60.29%) per year. Most of the participants (480, equating to 85.11% of the sample) identified as white. Finally, most participants were either never married (284, or 50.35%) or married (215, or 38.12%). Table 3 shows the participants' demographics for the sample in greater detail. Figure 1 shows the locations of participants within the US. All states were represented except for Wyoming, with the five most prevalent in the sample being California, Florida, Pennsylvania, Texas, and Michigan.
Analysis and Results
To analyze our data, we relied on three techniques. The first examined the frequency with which attention checks were passed or failed by participants; this revealed that 126 of the 564 participants (22.34%) failed at least one form of attention check. The attention check that most participants failed was the honesty check (94/564), followed by the logic check (31/564), and the time check (27/564). Some participants failed more than one check, with 14/564 (2.48%) failing both logic and honesty checks and 6/564 (1.06%) failing both time and honesty checks. Finally, 6/564 (1.06%) participants failed all three attention checks (logic, honesty, and time). Figure 2 illustrates the numbers of participants who failed and passed each form of attention check. As expected, participants who passed the time check were more likely to pass the other attention checks (logic and honesty).
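This frequency analysis reduces to counting boolean failure flags. A minimal sketch, assuming a hypothetical scored data file containing the failed_logic and failed_honesty columns from the earlier scoring sketch plus an analogous failed_time flag:

```python
import pandas as pd

responses = pd.read_csv("scored_responses.csv")  # hypothetical scored data
checks = ["failed_logic", "failed_honesty", "failed_time"]

failed_any = responses[checks].any(axis=1)
print(f"failed at least one check: {failed_any.sum()}/{len(responses)} "
      f"({failed_any.mean():.2%})")  # reported above as 126/564 (22.34%)

print(responses[checks].sum())  # failures per individual check
print((responses["failed_logic"] & responses["failed_honesty"]).sum())  # logic + honesty
print((responses["failed_time"] & responses["failed_honesty"]).sum())   # time + honesty
print(responses[checks].all(axis=1).sum())  # failed all three checks
```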
The second technique used Spearman rank-order (rho) correlations to assess the correlation between the characteristics of age, gender, income, race, and prior experience with the technology and each of the three forms of attention checks (i.e., logic, honesty, and time checks). No significant correlation was found except in two instances. First, prior experience of using the technology being studied was positively correlated with the logic check (rs = 0.192, p = 0.000). Second, prior experience of using the technology being studied was positively correlated with the honesty check (rs = 0.213, p = 0.000). Therefore, the more familiar participants were with the technology, the more likely they were to pass the logic and honesty checks.
Spearman rho correlations were also used to assess the relationship between the three different forms of attention checks and the passing of all three checks. A positive correlation was found between participants passing the logic check and passing the honesty check (rs = 0.310, p = 0.000), failing the time check and failing the honesty check (rs = 0.132, p = 0.002), and failing the time check and failing the logic check (rs = 0.139, p = 0.001). That is, participants who pass one of the three attention checks are more likely to pass the other two attention checks. Table 4 shows the results of the correlations.
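Such correlations can be computed with scipy. A minimal sketch for one pair of checks, assuming the hypothetical pass/fail indicators from the earlier sketches:

```python
import pandas as pd
from scipy.stats import spearmanr

responses = pd.read_csv("scored_responses.csv")  # hypothetical scored data

passed_logic = ~responses["failed_logic"].astype(bool)
passed_honesty = ~responses["failed_honesty"].astype(bool)

# Spearman rank-order correlation between two binary pass/fail indicators
rho, p = spearmanr(passed_logic, passed_honesty)
print(f"logic vs honesty: rs = {rho:.3f}, p = {p:.3f}")  # reported: rs = 0.310
```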
Finally, a logistic regression analysis was used to investigate whether age, income, gender, prior experience, and time on task influenced participant attention. All four assumptions required for logistic regression were satisfied [23]. Only prior experience with the technology contributed to the model (B = 0.484, SE = 0.136, Wald = 12.706, p = 0.000). The estimated odds ratio favored an increase of 62.3% [Exp(B) = 1.623, 95% CI (1.244, 2.118)] in participant attention for every unit increase in experience. None of the other variables were found to be statistically significant (Table 5).
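A minimal sketch of such a model with statsmodels, assuming hypothetical, numerically coded predictor columns and an 'attentive' outcome equal to 1 when a participant passed the checks; Exp(B) is simply the exponentiated coefficient:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

responses = pd.read_csv("scored_responses.csv")  # hypothetical scored data
predictors = ["age", "income", "gender", "experience", "time_on_task"]

X = sm.add_constant(responses[predictors])  # categorical variables assumed pre-coded
model = sm.Logit(responses["attentive"], X).fit()

print(model.summary())           # coefficients (B), SE, z, and p per predictor
print(np.exp(model.params))      # odds ratios, i.e. Exp(B)
print(np.exp(model.conf_int()))  # 95% confidence intervals for the odds ratios
```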
Discussion
Litman et al. [25] describe MTurk as "a constantly evolving marketplace where multiple factors can contribute to data quality". In this work, the attentiveness exhibited by an elite segment of the MTurk worker community was investigated: specifically, workers holding the coveted 'Master' qualification, with an 'Approval Rate' of 98% or more (i.e., who had completed at least 98% of the tasks they had committed to completing) and a 'Number of HITs approved' value of 1,000 or more (i.e., the number of these activities exceeded 999). It was conjectured that these characteristics would ensure that this group of workers would be free of behavior reflecting inattentiveness and that this higher level of attentiveness would justify the additional cost attached to using workers holding the 'Master' qualification.
To confirm this hypothesis, an experimental approach was adopted in which participants were asked to complete a simple task involving reading about a hypothetical product and then answering questions on their perceptions of the product. Participant attentiveness was ascertained by using a series of questions originally proposed by Abbey and Meloy [1] and evaluating the amount of time spent on the survey. Surprisingly, the results revealed that over a fifth (22.34%) of the participants were not paying attention, having failed one of the three categories of attention checks. This result could be explained by the work of Chandler et al. [9], who examined the attentiveness of workers with an 'Approval Rate' exceeding 95% and discovered that participants were not always entirely devoted to the current task and were multitasking. In particular, 27% of participants in their sample disclosed that they were with other people while completing the study, 18% were watching television, 14% were listening to music, and 6% were chatting online [9]. This would explain the lack of attention being paid.
We contrast our findings with the work of Peer, Vosgerau, and Acquisti [32], who investigated how attention differs through two experiments. The first experiment compared workers with 'low reputation' (i.e., an 'Approval Rate' below 90%) and 'high reputation' (i.e., an 'Approval Rate' exceeding 95%). The second experiment compared what the authors describe as workers with 'low productivity' (i.e., their 'Number of HITs approved' was less than 100) and 'high productivity' (i.e., their 'Number of HITs approved' was more than 500). At least one attention check question was failed by 33.9% of the 'low reputation' workers, 2.6% of the 'high reputation' workers, 29% of the 'low productivity' workers, and 16.7% of the 'high productivity' workers. Given that we took an even more selective approach than Peer, Vosgerau, and Acquisti [32], our findings are concerning. Our failure rate of 22.34% is closer to what they classify as 'low reputation' (33.9%), and between the 'low productivity' (29%) and 'high productivity' (16.7%) workers. Three possibilities could explain this divergence. First, the attention checks we used did not work as expected; second, the high level of compensation was miscommunicated; third, a seasonal influence on attentiveness exists. We explore each of these possibilities in greater detail in the following subsections.
Suitability of the Attention Checks
Abbey and Meloy [1] describe the process of forming attention check questions as a delicate task [1], the concern residing with the nature of these constructs. They give as an example the logic check, which comprises questions that "require comprehension of logical relationships", such as "preferring to eat fruit over paper" [1]. The danger with such questions is that "the more subtle the statement, the less objective these checks can become" [1]. Thus, both the participants' and the researcher's responses to a question can be tainted by their interpretations. In the honesty check, participants are asked to "reveal their own perceptions of their effort and data validity for the study" [1]. Effectively, the honesty check asks "a respondent to [self] identify the level of effort and attention to detail they [perceive that they] gave to the study and if that effort warrants use of their data" [1]. The weakness of this form of attention check is that respondents may have been paying adequate attention but were overly critical of themselves when submitting their responses. Consequently, they did not respond by selecting the 'strongly agree' option. An alternative approach might be to use objective tasks to gauge participant attention, as demonstrated by Aker et al. [3]. However, even these questions have issues. For example, if presented with a mathematical problem, the participant must have the skill to solve the question.
Although our study is not the first to use attention checks with survey research to identify careless (i.e., inattentive) participants [22], the use of attention checks has been questioned by some researchers, as it is believed that this can negatively interact with the survey response quality [22, 44]. Some researchers contend that attention checks should not be used at all [22, 44]. Abbey and Meloy [1] warn that the process to exclude respondents who are not paying attention "can become subjective" in cases where the study is not "largely a replication of known results with expected times, scales, or constructs". Our attention checks may have been too sensitive. If the criterion for rejection were failing more than one attention check, the inattentiveness rate would drop to 4.61% (26/564). This rate is closer to what has been found in other studies. More specifically, it is similar to the rate of 4.17% reported by Paolacci, Chandler, and Ipeirotis [31] and closer to the finding of 2.6% by Peer, Vosgerau, and Acquisti [32] (for what they classified as 'high reputation' workers with an 'Approval Rate' exceeding 95%).
The Effect of Compensation
Another aspect to consider is compensation and its effect on participant attention. In our study, despite offering an extremely high hourly wage to our participants (above $22/hour), we found substantial evidence of inattentiveness, as the high wage did not eliminate the problem of lack of attention exhibited by participants. The magnitude of the compensation we offered can be better understood when it is compared to the median wage for MTurk workers in the United States, which in 2019 was said to be $3.01/hour [16], and the current federal minimum wage of $7.25/hour [43]. Correspondingly, our participants should have been well enticed, and no evidence of inattentiveness should have been discovered. Thus, a high wage does not eliminate the possibility of having inattentive participants whose work must be discarded. An explanation for this finding might be that participants do not consider the hourly wage but rather the total compensation offered. For example, one may prefer a reward of $0.50/hour if the total compensation of a task were $10 rather than a reward of $20/hour when the total compensation from the task was $1. A tradeoff appears to exist where, as per Aker et al. [3], increasing compensation leads to improved data quality. Nevertheless, our research suggests that the ability of money to improve attention is limited after a certain point. Additional research is needed to create a better understanding of the marginal effects of wages on participants' attention and identify an optimal point that maximizes attention vis-à-vis compensation.
The Explanation of Seasonality
An alternative explanation of the varying range of inattentiveness exhibited by participants in the studies mentioned above may be found in the work of Chmielewski and Kucker [11], who replicated an experiment four times: the first between December 2015 and January 2016; the second between March 2017 and May 2017; the third between July 2018 and September 2018; and the fourth in April 2019. In their work, the percentage of participants who failed at least one attention check (which they called a "validity indicator") slowly increased from 10.4% to 13.8%, then jumped to 62%, and finally dropped to 38.2%. Given that we collected our data eight months after the conclusion of their final data collection, our inattentiveness rate of 22.34% is not only similar but might indicate a downward trend and possibly a cyclical pattern.
The Irrelevance of User Characteristics
We also attempted to determine whether a pattern exists that could help to predict which participants would fail our attention checks. The characteristics of age, gender, income, marital status, race, and schooling were examined, but no relationship was found concerning participant attention. It appears that the lack of attention does not reside with any specific demographic group. Instead, everyone has an equal chance of being inattentive. This outcome is slightly puzzling, as specific demographics have already been linked with participant attention, such as age [26] and culture [27]. The analysis did identify that participants' prior experience of using the technology applied in the study influenced their attention, with those having the most prior experience with the technology exhibiting the greatest attention.
Implications
This work has several noteworthy implications. The first concerns the discovery that participant inattentiveness persists within the population we investigated. This group consisted of MTurk workers with the 'Master' qualification, an 'Approval Rate' of 98% or more, and a 'Number of HITs approved' value of 1,000 or more. Coupled with the high compensation to ensure participants were highly motivated, it is evident that no 'silver bullet' exists that can reliably eliminate the manifestation of participant inattentiveness. Thus, there appears to be no justification in undertaking the additional expense associated with recruiting just participants with the 'Master' qualification. If inattentiveness can be observed under these 'optimal' conditions, this concern cannot be discounted. The fact that there is no one characteristic (i.e., age, education, gender, income, or marital status) that can be used to explain the phenomenon offers minimal hope of an informed intervention. Instead, researchers must vigilantly review participants for inattentiveness and not presume that certain criteria will ensure participants pay attention. Ultimately, the finding highlights the importance of using attention checks to identify inattentive participants and implementing a process to address these occurrences. Specifically, with an inattentiveness rate as high as 22.34%, such a practice would demand "researcher time, funds, and other resources" [11].
A tactic to mitigate the additional cost might be to refuse to compensate any participant who fails to satisfy one or a combination of attention checks. Yet, this involves challenges. Participants who are refused compensation may object and thus require additional (potentially costly) resources to be invested by the researcher to address those concerns. Participants who have earnestly participated as best they can but failed to produce results that pass the attention check(s) would be unfairly denied compensation. An alternative strategy to withholding payment might be to offer a low rate for participation in studies but offer a bonus for submissions matching a particular pattern. The problem with this approach is that participants may not focus on the research but on producing the illusion that they paid attention. Moreover, this may introduce biases in the responses, as participants may not respond honestly and authentically but rather as they believe the researchers want them to respond.
No simple solution exists. Consequently, to address participant inattentiveness, researchers should consider adjusting their proposals to account for the effort and costs required to identify participants who do not pay attention, address the issues that arise when dealing with their poor performance, and recruit additional participants to replace submissions that must be disregarded.
References
- Abbey, J., Meloy, M.: Attention by design: using attention checks to detect inattentive respondents and improve data quality. J. Oper. Manag. 53–56, 63–70 (2017). https://doi.org/10.1016/j.jom.2017.06.001
- Aguinis, H., et al.: MTurk research: review and recommendations. J. Manag. 47(4), 823–837 (2021). https://doi.org/10.1177/0149206320969787
- Aker, A., et al.: Assessing crowdsourcing quality through objective tasks. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey, pp. 1456–1461. European Language Resources Association (ELRA) (2012)
- Andersen, D., Lau, R.: Pay rates and subject performance in social science experiments using crowdsourced online samples. J. Exp. Polit. Sci. 5(3), 217–229 (2018). https://doi.org/10.1017/XPS.2018.7
- Barends, A.J., de Vries, R.E.: Noncompliant responding: comparing exclusion criteria in MTurk personality research to improve data quality. Pers. Individ. Differ. 143, 84–89 (2019). https://doi.org/10.1016/j.paid.2019.02.015
- Berry, D.T.R., et al.: MMPI-2 random responding indices: validation using a self-report methodology. Psychol. Assess. 4(3), 340–345 (1992). https://doi.org/10.1037/1040-3590.4.3.340
- Buhrmester, M., et al.: Amazon's Mechanical Turk: a new source of inexpensive, yet high-quality, data? Perspect. Psychol. Sci. 6(1), 3–5 (2011). https://doi.org/10.1177/1745691610393980
- Casler, K., et al.: Separate but equal? A comparison of participants and data gathered via Amazon's MTurk, social media, and face-to-face behavioral testing. Comput. Hum. Behav. 29(6), 2156–2160 (2013). https://doi.org/10.1016/j.chb.2013.05.009
- Chandler, J., Mueller, P., Paolacci, G.: Nonnaïveté among Amazon Mechanical Turk workers: consequences and solutions for behavioral researchers. Behav. Res. Methods 46(1), 112–130 (2013). https://doi.org/10.3758/s13428-013-0365-7
- Chen, J.J., et al.: Opportunities for crowdsourcing research on Amazon Mechanical Turk. Presented at the CHI 2011 Workshop on Crowdsourcing and Human Computation. https://www.humancomputation.com/crowdcamp/chi2011/papers/chen-jenny.pdf. Accessed 9 June 2021
- Chmielewski, M., Kucker, S.C.: An MTurk crisis? Shifts in data quality and the impact on study results. Soc. Psychol. Pers. Sci. 11(4), 464–473 (2019). https://doi.org/10.1177/1948550619875149
- Crump, M.J.C., et al.: Evaluating Amazon's Mechanical Turk as a tool for experimental behavioral research. PLoS ONE 8(3), 1–18 (2013). https://doi.org/10.1371/journal.pone.0057410
- Difallah, D., et al.: Demographics and dynamics of Mechanical Turk workers. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, New York, NY, USA, pp. 135–143. Association for Computing Machinery (2018). https://doi.org/10.1145/3159652.3159661
- Fort, K., et al.: Amazon Mechanical Turk: gold mine or coal mine? Comput. Linguist. 37(2), 413–420 (2011). https://doi.org/10.1162/COLI_a_00057
- Goodman, J.K., et al.: Data collection in a flat world: the strengths and weaknesses of Mechanical Turk samples. J. Behav. Decis. Mak. 26(3), 213–224 (2013). https://doi.org/10.1002/bdm.1753
- Hara, K., et al.: Worker demographics and earnings on Amazon Mechanical Turk: an exploratory analysis. In: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, New York, NY, USA, pp. 1–6. ACM Inc. (2019). https://doi.org/10.1145/3290607.3312970
- Hauser, D.J., Schwarz, N.: Attentive Turkers: MTurk participants perform better on online attention checks than do subject pool participants. Behav. Res. Methods 48(1), 400–407 (2015). https://doi.org/10.3758/s13428-015-0578-z
- Holland, A.: How estimated reading times increase engagement with content. https://marketingland.com/estimated-reading-times-increase-engagement-79830. Accessed 9 June 2021
- Horton, J.J., Chilton, L.B.: The labor economics of paid crowdsourcing. In: Proceedings of the 11th ACM Conference on Electronic Commerce, Cambridge, Massachusetts, USA, pp. 209–218. ACM Inc. (2010). https://doi.org/10.1145/1807342.1807376
- Hydock, C.: Assessing and overcoming participant dishonesty in online data collection. Behav. Res. Methods 50(4), 1563–1567 (2017). https://doi.org/10.3758/s13428-017-0984-5
- Johnson, J.A.: Ascertaining the validity of individual protocols from Web-based personality inventories. J. Res. Pers. 39(1), 103–129 (2005). https://doi.org/10.1016/j.jrp.2004.09.009
- Kung, F.Y.H., et al.: Are attention check questions a threat to scale validity? Appl. Psychol. Int. Rev. 67(2), 264–283 (2018). https://doi.org/10.1111/apps.12108
- Laerd Statistics: Binomial logistic regression using SPSS Statistics. https://statistics.laerd.com/spss-tutorials/binomial-logistic-regression-using-spss-statistics.php#procedure. Accessed 29 Nov 2020
- Levay, K.E., et al.: The demographic and political composition of Mechanical Turk samples. SAGE Open 6(1) (2016). https://doi.org/10.1177/2158244016636433
- Litman, L., Robinson, J., Rosenzweig, C.: The relationship between motivation, monetary compensation, and data quality among US- and India-based workers on Mechanical Turk. Behav. Res. Methods 47(2), 519–528 (2014). https://doi.org/10.3758/s13428-014-0483-x
- Lufi, D., Haimov, I.: Effects of age on attention level: changes in performance between the ages of 12 and 90. Aging Neuropsychol. Cogn. 26(6), 904–919 (2019). https://doi.org/10.1080/13825585.2018.1546820
- Masuda, T.: Culture and attention: recent empirical findings and new directions in cultural psychology. Soc. Pers. Psychol. Compass 11(12), e12363 (2017). https://doi.org/10.1111/spc3.12363
- Meade, A.W., Craig, S.B.: Identifying careless responses in survey data. Psychol. Methods 17(3), 437–455 (2012). https://doi.org/10.1037/a0028085
- Okumus, B., et al.: Psychological factors influencing customers' acceptance of smartphone diet apps when ordering food at restaurants. Int. J. Hosp. Manag. 72, 67–77 (2018). https://doi.org/10.1016/j.ijhm.2018.01.001
- Palan, S., Schitter, C.: Prolific.ac—a subject pool for online experiments. J. Behav. Exp. Financ. 17, 22–27 (2018). https://doi.org/10.1016/j.jbef.2017.12.004
- Paolacci, G., et al.: Running experiments on Amazon Mechanical Turk. Judgm. Decis. Mak. 5(5), 411–419 (2010)
- Peer, E., Vosgerau, J., Acquisti, A.: Reputation as a sufficient condition for data quality on Amazon Mechanical Turk. Behav. Res. Methods 46(4), 1023–1031 (2013). https://doi.org/10.3758/s13428-013-0434-y
- Pittman, M., Sheehan, K.: Amazon's Mechanical Turk a digital sweatshop? Transparency and accountability in crowdsourced online research. J. Media Ethics 31(4), 260–262 (2016). https://doi.org/10.1080/23736992.2016.1228811
- Rand, D.G.: The promise of Mechanical Turk: how online labor markets can help theorists run behavioral experiments. J. Theor. Biol. 299, 172–179 (2012). https://doi.org/10.1016/j.jtbi.2011.03.004
- Rondan-Cataluña, F.J., et al.: A comparison of the different versions of popular technology acceptance models: a non-linear perspective. Kybernetes 44(5), 788–805 (2015). https://doi.org/10.1108/K-09-2014-0184
- Ross, J., et al.: Who are the crowdworkers? Shifting demographics in Mechanical Turk. In: CHI 2010 Extended Abstracts on Human Factors in Computing Systems, New York, NY, USA, pp. 2863–2872. Association for Computing Machinery (2010). https://doi.org/10.1145/1753846.1753873
- Rouse, S.V.: A reliability analysis of Mechanical Turk data. Comput. Hum. Behav. 43, 304–307 (2015). https://doi.org/10.1016/j.chb.2014.11.004
- Salinas-Segura, A., Thiesse, F.: Extending UTAUT2 to explore pervasive information systems. In: Proceedings of the 23rd European Conference on Information Systems, Münster, DE, pp. 1–17. Association for Information Systems (2015). https://doi.org/10.18151/7217456
- Schmidt, G.B., Jettinghoff, W.M.: Using Amazon Mechanical Turk and other compensated crowdsourcing sites. Bus. Horiz. 59(4), 391–400 (2016). https://doi.org/10.1016/j.bushor.2016.02.004
- Stewart, N., et al.: The average laboratory samples a population of 7,300 Amazon Mechanical Turk workers. Judgm. Decis. Mak. 10(5), 479–491 (2015)
- Stieninger, M., et al.: Factors influencing the organizational adoption of cloud computing: a survey among cloud workers. Int. J. Inf. Syst. Proj. Manag. 6(1), 5–23 (2018)
- Stone, A.A., et al.: MTurk participants have substantially lower evaluative subjective well-being than other survey participants. Comput. Hum. Behav. 94, 1–8 (2019). https://doi.org/10.1016/j.chb.2018.12.042
- U.S. Department of Labor: Minimum Wage. https://www.dol.gov/general/topic/wages/minimumwage. Accessed 25 Nov 2020
- Vannette, D.: Using attention checks in your surveys may harm data quality. https://www.qualtrics.com/blog/using-attention-checks-in-your-surveys-may-harm-data-quality/. Accessed 07 Jan 2021
- Venkatesh, V., et al.: Consumer acceptance and use of information technology: extending the unified theory of acceptance and use of technology. MIS Q. 36(1), 157–178 (2012). https://doi.org/10.2307/41410412
- Venkatesh, V., Bala, H.: Technology acceptance model 3 and a research agenda on interventions. Decis. Sci. 39(2), 273–315 (2008). https://doi.org/10.1111/j.1540-5915.2008.00192.x
- Versta Research: How to estimate the length of a survey. https://verstaresearch.com/newsletters/how-to-estimate-the-length-of-a-survey/. Accessed 10 Apr 2020
- Yang, H.C., Wang, Y.: Social sharing of online videos: examining American consumers' video sharing attitudes, intent, and behavior. Psychol. Mark. 32(9), 907–919 (2015). https://doi.org/10.1002/mar.20826
- Yoo, W., et al.: Drone delivery: factors affecting the public's attitude and intention to adopt. Telematics Inform. 35(6), 1687–1700 (2018). https://doi.org/10.1016/j.tele.2018.04.014
- Zack, E.S., et al.: Can nonprobability samples be used for social science research? A cautionary tale. Surv. Res. Methods 13, 215–227 (2019)
- Zimmerman, J., et al.: Field trial of Tiramisu: crowd-sourcing bus arrival times to spur co-design. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, New York, NY, USA, pp. 1677–1686. Association for Computing Machinery (2011). https://doi.org/10.1145/1978942.1979187
Author information
Affiliations
Corresponding author
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Reprints and Permissions
Copyright information
© 2021 The Author(s)
About this paper
Cite this paper
Saravanos, A., Zervoudakis, S., Zheng, D., Stott, N., Hawryluk, B., Delfino, D. (2021). The Hidden Cost of Using Amazon Mechanical Turk for Research. In: , et al. HCI International 2021 - Late Breaking Papers: Design and User Experience. HCII 2021. Lecture Notes in Computer Science, vol 13094. Springer, Cham. https://doi.org/10.1007/978-3-030-90238-4_12
Download citation
- .RIS
- .ENW
- .BIB
- DOI: https://doi.org/10.1007/978-3-030-90238-4_12
- Published:
- Publisher Name: Springer, Cham
- Print ISBN: 978-3-030-90237-7
- Online ISBN: 978-3-030-90238-4
- eBook Packages: Computer Science, Computer Science (R0)