Informatics Colloquium: Human Factors and Bias in Crowdsourced Information Retrieval Evaluation
Colloquium / Congress / Forum. Target audience: open to the general public
In Information Retrieval evaluation, the classic approach of adopting binary relevance judgments has been replaced by multi-level relevance judgments and by gain-based metrics leveraging such multi-level judgment scales.
In this talk I will present recent research in which we explore different relevance scales to make judgments more natural for crowd assessors. Our results show that a 100-level relevance scale maintains the flexibility of unbounded scales in providing assessors with ample choice when judging document relevance. It also allows assessors to judge on a more familiar scale and to perform efficiently from the very first judging task.
I will then discuss research on how users perceive bias in search results, and the degree to which their perceptions differ and/or might be predicted based on user attributes.
Finally, I will discuss how crowd workers could undermine data quality. One of the most popular quality assurance mechanisms in crowdsourcing is based on gold questions: a small set of tasks for which the requester knows the correct answer and can therefore directly assess the quality of crowd work. We show that such a mechanism is prone to an attack by a group of colluding crowd workers, and that the attack is easy to implement and deploy.
Site PER 21 / Room A230
Bd de Pérolles 90, 1700 Fribourg
Dr. Gianluca Demartini