Different conclusions can arise from identical data, a recent study in the journal BMC Biology reveals. Jessica Abbott, professor of biodiversity and evolution at Lund University, contributed to the study. Her analysis revealed a clear pattern, while others found different results. The study raises important questions about the reliability of scientific results. Can we really trust them?
The ability to repeat experiments and get the same results has always been an important part of science. But there is another issue – even the same data can lead to different conclusions.
– The question is how big of a difference it makes just how you analyse the data. With complex datasets it is not always clear what the best method is and there are many decisions that researchers have to make, says Jessica Abbott.
In biological research, fieldwork often generates vast datasets with many possible methods and models of analysis. Which parameters scientists include, how they handle outliers, and which statistical model they choose can all significantly influence the final conclusions. The BMC Biology study was designed to investigate this variability in data interpretation. It was a large-scale effort, spanning several years and involving 174 analyst teams from different countries and universities. All teams were given identical research questions and datasets – one on blue tit nestling growth and the other on eucalyptus tree recruitment – but were allowed to use different methodological approaches, and their resulting findings were compared.
Common research biases
Model selection bias: The outcome changes depending on which statistical model is chosen.
Confirmation bias: Prioritising data that supports pre-existing beliefs or hypotheses while ignoring contradictory evidence.
Overfitting: Creating models so complex they capture random noise instead of the underlying trend.
Reporting bias: Only certain results are published or highlighted, often those that are positive or statistically significant.
P-hacking: Tweaking analyses or trying multiple methods until statistically significant results emerge.
Source: National Institutes of Health (NIH)
Jessica Abbott analysed the blue tit dataset and peer-reviewed results from the eucalyptus dataset. Her approach was to first pick out a few models that seemed reasonable and then use a model comparison tool to find which one fitted the data best. She observed a strong relationship between sibling competition in bird nests and the weight gain of nestlings. However, not all analyst teams found significant results, and some even reported the opposite correlation.
– It is surprising that they could build a model that gave opposite results. My guess is that they included parameters that masked the effects, Jessica Abbott says.
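Her approach, shortlisting a few plausible models and then using a model comparison tool to see which fits best, can be sketched in code. The example below is not the study's analysis: it uses Python with statsmodels, hypothetical column names (weight_gain, brood_size, hatch_date) and AIC as the comparison criterion, purely to illustrate the workflow.

```python
# Illustrative sketch only: candidate models of nestling weight gain,
# compared by AIC. Column names and file name are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("blue_tit_data.csv")  # hypothetical dataset

# A handful of models that differ in which parameters they include
formulas = [
    "weight_gain ~ brood_size",               # sibling competition only
    "weight_gain ~ brood_size + hatch_date",  # add timing of hatching
    "weight_gain ~ brood_size * hatch_date",  # allow an interaction
]

fits = {f: smf.ols(f, data=df).fit() for f in formulas}

# Lower AIC means a better trade-off between fit and model complexity
for formula, fit in sorted(fits.items(), key=lambda kv: kv[1].aic):
    print(f"AIC = {fit.aic:8.1f}   {formula}")
```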
Another surprising result from the study concerned random effects. These are variables that may affect the outcome of an experiment but are not necessarily relevant to the research question. For the blue tit data, for example, a random factor could be year-to-year variation in nestling growth. Many biological researchers have argued that accounting for random effects leads to better results. However, nothing in the study indicated this.
– It is interesting, because the inclusion of random factors has been heavily debated, and now it was shown that it actually does not matter. But we also do not know what the best or true result is in this case. The mean value of all the analyses was considered the truth, and that is a simplification, Jessica Abbott adds.
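To make the idea concrete, here is a minimal sketch, again with hypothetical column names, of the same relationship fitted with and without a random intercept for year. It is not taken from the study; it only shows what the disputed modelling choice looks like in practice.

```python
# Illustrative sketch only: the same fixed effect (sibling competition)
# estimated with and without year as a random factor.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("blue_tit_data.csv")  # hypothetical dataset

# Plain linear model: year-to-year variation is ignored
fixed_only = smf.ols("weight_gain ~ brood_size", data=df).fit()

# Mixed model: year enters as a random intercept, since between-year
# variation matters for the fit but is not the research question itself
with_random = smf.mixedlm("weight_gain ~ brood_size",
                          data=df, groups=df["year"]).fit()

print("Effect of brood size, fixed-effects model:", fixed_only.params["brood_size"])
print("Effect of brood size, mixed model:        ", with_random.params["brood_size"])
```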
So how can research be trusted?
– If there are multiple studies with the same findings, there is no reason to be worried. The important thing is that the results can be reproduced, she says.
Jessica Abbott also emphasises that in her experience, if an effect is strong, it will show up no matter what analytical method is used. The question of different outcomes mostly concerns datasets with less clear patterns. But one should be cautious and avoid overinterpreting findings.
– There is a known bias in the scientific literature where new, striking results have a greater chance of being published. For articles in leading scientific journals, it is not unusual for the effect to be strongest in the first study, Jessica Abbott says.
– It is quite common for the initial results to be overestimated, and for later studies to show that the effect was not as striking as it first seemed. The top-ranked journals also have the most retractions, meaning that articles are withdrawn for this reason.
The authors of the study argue that, in the future, many different analytical methods must be used to increase the reliability of results. Jessica Abbott supports this but points out that it also comes with a danger, namely p-hacking: researchers selectively reporting only the most striking result.
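One safeguard, consistent with the transparency she describes, is to report every specification that was run rather than only the most favourable one. The sketch below, again with hypothetical column names, shows the difference: the whole table of estimates is kept, not just the row with the smallest p-value.

```python
# Illustrative sketch only: run several reasonable specifications and
# keep all of the estimates. Selectively reporting the "best" row from
# this table would be p-hacking; publishing the whole table is not.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("blue_tit_data.csv")  # hypothetical dataset

specs = [
    "weight_gain ~ brood_size",
    "weight_gain ~ brood_size + hatch_date",
    "weight_gain ~ brood_size + hatch_date + nest_box_type",
]

rows = []
for formula in specs:
    fit = smf.ols(formula, data=df).fit()
    rows.append({
        "model": formula,
        "estimate": fit.params["brood_size"],
        "p_value": fit.pvalues["brood_size"],
    })

print(pd.DataFrame(rows))
```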
At the same time, Jessica Abbott sees growing awareness and transparency in the research community. Raw data and analysis code are now usually required and made available. By sharing the actual computer script used to process the data, researchers make their methods more accessible and reproducible. Additionally, peer reviewers often request that other analytical methods be tested before an article is accepted.
Pre-registered studies could also be a way to increase transparency. This BMC Biology study, Jessica Abbott explains, is itself an example: the authors published their hypotheses and methods before conducting the study. Any deviations from the original plan must then be reported and explained.
– It is also about trust. That is how it is in the research world – we have to trust each other that we are trying to do the right thing, she says.