Assessing Risks and Impacts of AI
A set of scenarios will explore risks and related impacts across three levels of testing: model testing, red-teaming, and field testing.
AI Challenge Problem Overview
ARIA, the latest in a portfolio of evaluations managed by the NIST Information Technology Laboratory, will assess models and systems submitted by technology developers from around the world. ARIA is a sector- and task-agnostic evaluation environment.
ARIA will support three evaluation levels: model testing, red-teaming, and field testing. ARIA is unique in that it moves beyond an emphasis on system performance and accuracy to produce measurements of technical and contextual robustness.
The program will produce guidelines, tools, methodologies, and metrics that organizations can use to evaluate their systems and inform decision-making about the positive or negative impacts of AI deployment. ARIA will inform the work of the U.S. AI Safety Institute at NIST.
ARIA 0.1
The initial evaluation (ARIA 0.1) will be conducted as a pilot to fully exercise the NIST ARIA test environment. ARIA 0.1 will focus on risks and impacts associated with large language models (LLMs). Future iterations of ARIA may consider other types of generative AI, such as text-to-image models, or other forms of AI, such as recommender systems or decision support tools. An exploratory set of tasks will aim to elicit both pre-specified and unanticipated risks and impacts across the three levels of testing: model testing, red-teaming, and field testing.