This workstream aims to give European regulators and market authorities their own means to cross-evaluate AI systems, in particular general-purpose AI systems (GPAIS), independently of industry. It also seeks to provide developers with facilities to measure various properties of AI systems (accuracy, robustness, generality, interpretability, etc.) in rigorous, standardized experiments. To that end, it aims to bring together EU member states’ metrology capabilities to begin establishing benchmarks relevant to the EU AI Act. Thanks to our efforts, the AI Act includes clauses to ensure the creation of a pool of metrology labs, coordinated at the EU level, to fulfil that purpose.
In February 2020, the European Commission published a White Paper on Artificial Intelligence. In response, TFS published a memo titled “Experimentation, testing & audit as a cornerstone for trust and excellence,” advocating for numerous changes, including the development of means to benchmark AI technologies in a coherent way, through experiments, tests and evaluations.
When the European Commission’s proposed AI Act was unveiled in April 2021, benchmarking was not explicitly included, though related functions were implied by requirements concerning AI systems’ levels of “accuracy” and “robustness.” To help clarify this aspect, TFS published a memo in August 2021 titled “Trust in Excellence & Excellence in Trust,” which put forward recommendations for testing and experimentation facilities, such as the development of test beds and benchmarking protocols.
In late 2021, under the banner of The Athens Roundtable, TFS launched the Working Group on Interoperable Benchmarking of AI Systems, bringing together members from major standards organizations and technical communities—including U.S. NIST, CEN-CENELEC, IEEE, LNE, VDE, and Greece’s National Centre of Scientific Research—with the goal of producing interoperable benchmarks that assess the extent to which AI-enabled systems do, in fact, meet their intended real-world purpose.
Throughout 2021, 2022, and early 2023, TFS continued to build capacity and understanding among policymakers about the critical role of independent benchmarking in AI governance, notably through our work on standard-setting and on the NIST AI Risk Management Framework. We welcome the European Parliament’s decision of June 2023 to take up our recommendations regarding the establishment of an EU-level pool of benchmarking authorities, in particular for addressing general-purpose AI systems. We remain available to contribute independent expertise and opinions, free from industry influence, to the development of benchmarking capabilities in the EU.
Related resources
Policy achievements in the EU AI Act
The draft AI Act approved by the European Parliament contains a number of provisions for which TFS has been advocating, including a special governance regime tailored to general-purpose AI systems. Collectively, these operationalize safety, fairness, accountability, and transparency in the development and deployment of AI systems.
2021 Edition of the Athens Roundtable on Artificial Intelligence and the Rule of Law
An international, cross-organizational dialogue on how to uphold the rule of law in the age of AI.
Input on Europe’s future AI policies
The Future Society recommends "Experimentation, testing & audit as a cornerstone for trust and excellence" in response to the European Commission's White Paper on AI.