Assessing Risks and Impacts of AI

A compelling set of scenarios will aim to explore risks and related impacts across three levels of testing: model testing, red-teaming, and field testing.

AI Challenge Problem Overview

ARIA, the latest in a portfolio of evaluations managed by the NIST Information Technology Laboratory, will assess models and systems submitted by technology developers from around the world. The ARIA evaluation environment is sector and task agnostic.

ARIA will support three evaluation levels: model testing, red-teaming, and field testing. ARIA is unique in that it will move beyond an emphasis on system performance and accuracy and produce measurements on technical and contextual robustness.

The program will result in guidelines, tools, methodologies, and metrics that organizations can use for evaluating their systems and informing decision making regarding positive or negative impacts of AI deployment.

ARIA

The initial evaluation (ARIA) will be conducted as a pilot effort to fully exercise the NIST ARIA test environment. ARIA will focus on risks and impacts associated with large language models (LLMs). Future iterations of ARIA may consider other types of generative AI technologies, such as text-to-image models, or other forms of AI, such as recommender systems or decision support tools. An exploratory set of tasks will aim to elicit both pre-specified and non-specified risks and impacts across three levels of testing: model testing, red-teaming, and field testing.

Schedule

May 2024	ARIA Kickoff
May 2024	Evaluation Plan Release
August 2024	Evaluation Plan Update
November 12, 2024	Workshop 1: ARIA Kickoff
December 2024	Start of Pilot Testing
January 2025	End of Pilot Testing
February-May 2025	Pilot Analysis
Summer 2025	Pilot Summary Report Out

ARIA Email Distribution List

Join to receive important ARIA program news and announcements
