A recent study in Nature Machine Intelligence, published by the European Commission’s Joint Research Centre jointly with the University of Cambridge, the Universitat Politècnica de València and the Universidad de Oviedo, analyses how benchmarking is transforming scientific research in Artificial Intelligence (AI) and its concrete applications in different fields.
A benchmark is ‘a point of reference for measuring the performance of any new AI system, algorithm, and method’.
Booming since 2016, benchmarking has been gradually replacing the traditional methods of evaluating scientific outputs (peer review, etc.) and has become one of the major indicators of the quality of many new research papers.
Consequences of the use of benchmarking
One consequence of this change is that outperforming existing benchmarks (achieving a ‘breakthrough’) has become the major indicator of innovation in the field of AI research.
Another consequence is that this leads to the creation of communities of long-term collaboration in AI research, beyond the standard co-authorship relations.
Among these communities, the hybrid ones that include both universities and companies seem to be the most likely to create breakthroughs in AI benchmarks and therefore tend to lead innovation.
Another important finding is that the presence of tech giants (Google, Microsoft and Facebook, among others) in a research community makes that community more likely to outperform current benchmarks.
These dynamics shape innovation in AI, including the ways in which it will develop into scientific applications in many fields.
The two-way dynamic between research and benchmarks
The study also notes that the more influential particular benchmarks become, the more careers and project funding may depend on good performance on them.
Conversely, very quick progress on a benchmark may be a sign that the field is ‘overfitting’ to it, so that the gains may not translate well to the real world or to the research challenges the benchmark is intended to represent.
This leads to a two-way dynamic: not only do benchmarks influence research, but research influences the new generation of benchmarks, in a ‘challenge-solve-and-replace’ evaluation dynamic.
Variety of dimensions for measuring breakthroughs
The main conclusion of the study is that the measurement of a breakthrough should include a variety of dimensions and not only the outperforming of an existing benchmark.
This approach would help prevent merely incremental research and identify innovative research able to push forward the concrete applicability of AI in a way that is meaningful for society.
This study has been published in the context of AI WATCH, the European Commission’s knowledge service to monitor the development, uptake, and impact of AI for Europe, which also aims to provide a better understanding of the evolution and progress of the discipline.