I’ve been very quiet here lately, the result of relocating back to Europe and directing my efforts into more personal projects. Recently, however, I returned to science in a most enjoyable way by assisting in a workshop on the reproducibility crisis in science.
You might have heard about this problem before. In short, researchers found that a large proportion of scientific findings in medicine, psychology, economics and other fields could not be replicated using the same analysis as the original paper (read more here).
In light of these findings, the US Defense Advanced Research Projects Agency initiated a program for Systematizing Confidence in Open Research and Evidence (SCORE). This project aims to develop automated tools that assign “confidence scores” to research outcomes and claims from various social and behavioural sciences. The confidence score should indicate how likely it is that a particular claim can be reproduced.
The SCORE program consists of several steps. First, a group of scientists at the Centre for Open Science created a database of 30,000 claims from peer-reviewed, published papers and extracted the statements from these papers that support each claim.
In the second step, experts will predict the likelihood of replication for 3,000 of these published claims. This is the repliCATS part of SCORE. It is led by Fiona Fidler and consists of a group of researchers from the University of Melbourne in collaboration with the Centre for Environmental Policy at Imperial College London.
Third, algorithms will be developed to see whether the likelihood of reproducibility can be predicted automatically. Finally, replication will be attempted for about 300 claims, allowing both the humans’ and the computers’ efforts to be measured and scored. The whole project is a massive attempt to help decision makers (and others) get an idea of how general a scientific claim is, without having to understand the whole article.
My part in this is tiny, but I certainly became passionate about the project quickly. Over the last two days, I helped the crew of the repliCATS project run a huge workshop in which 575 claims were assessed by 30 groups of 5–7 researchers from various fields of psychology. As a facilitator, I guided my group through the assessment process, which is based on the IDEA protocol (Investigate, Discuss, Estimate and Aggregate) – a protocol originally developed at the University of Melbourne to elicit more accurate estimates from experts under uncertainty.
The elicitation process was split into two rounds. In the first round, the participants independently investigated a claim, provided a personal judgement on its replicability, and commented on the reasoning behind their judgement. Afterwards, the experts saw their peers’ judgements, the associated comments and the aggregated judgement. After discussing their views on the claim, they were allowed to revise their previous estimate and describe why their thinking had changed.
Personally, I was quite nervous about the facilitation process at first. But once I realised that my group was active, interested and genuinely nice, it was a lot of fun. Five PhD students from different parts of the world assessed the likelihood of claims from research ranging from consumer behaviour to education to criminology and many other fields.
Generally, most claims seemed quite plausible to us, but we were frequently unsure about the reasoning behind some methods. For example, while it seems reasonable that you’re more satisfied with your relationship if you like your partner more, does a preference for your partner’s initials really show that you implicitly like your partner more? Questionable, yes. But possible? Not sure. The authors of the latter claim definitely deserve recognition, as they started their paper by quoting Shakespeare’s Juliet, which greatly entertained us after assessing the dry statistics of 17 articles.
Another claim worth mentioning was that extroverts have more extroverted friends, while an introvert’s friend group is a more accurate representation of the study population (a new MBA class). Generally, we were amazed at the range of topics that our 22 claims alone covered. We’ll be interested to see whether one of the claims we assessed will be chosen for replication.
For more information and updates on the study’s progress, follow @replicats on Twitter.