
Evaluating Generative AI Technologies

A NIST evaluation program to support research in Generative AI technologies.

NIST GenAI Overview

NIST GenAI is a new evaluation program administered by the NIST Information Technology Laboratory to assess generative AI technologies developed by the research community around the world. It is an umbrella program that supports evaluations for research and measurement science in generative AI by providing a platform for test and evaluation (T&E).

The objectives of the NIST GenAI evaluation include but are not limited to:

  • Providing a T&E platform to measure and understand the capabilities and limitations of AI models across multiple modalities,
  • Conducting adversarial evaluations between generative AI and discriminative AI with relevant metrics and analyses,
  • Evolving the creation of benchmark datasets,
  • Measuring the effect of prompting on the generation of both credible and misleading content,
  • Facilitating the development of AI technologies for identifying content authenticity, as well as the source of fake or misleading information, and
  • Conducting human studies to compare human performance with AI system performance.

GenAI Ongoing Evaluations

The NIST GenAI program provides rigorous, science-based testing and evaluation (T&E) of Generators (generative AI), Detectors (discriminative AI), and Prompters (prompt engineering) across multiple modalities: text, image, code, audio, and video.

GenAI is an adversarial testing framework: Generators create high-quality synthetic content (AI content) using frontier AI models, Detectors build AI tools to identify whether content is AI-generated and believable, and Prompters vary content quality by employing different prompting strategies.
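As a minimal sketch of how this adversarial loop can be scored, the snippet below computes ROC AUC for a detector over a mixed pool of AI-generated and human-authored items: the probability that a randomly chosen AI item receives a higher "AI-likelihood" score than a randomly chosen human item. The function name and the toy scores are illustrative assumptions, not the NIST GenAI submission format or official metric.

```python
def detection_auc(scores_ai, scores_human):
    """ROC AUC by pairwise comparison: the probability that a random
    AI-generated item scores higher than a random human-authored item.
    Ties count as half a win. Scores are detector outputs where higher
    means 'more likely AI-generated'."""
    wins = 0.0
    for a in scores_ai:
        for h in scores_human:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(scores_ai) * len(scores_human))

# Toy detector outputs (hypothetical values for illustration).
scores_ai = [0.9, 0.8, 0.65]    # detector scores on AI-generated items
scores_human = [0.2, 0.4, 0.7]  # detector scores on human-authored items

print(detection_auc(scores_ai, scores_human))  # 8/9: one AI item is outscored
```

An AUC of 1.0 would mean the detector perfectly separates the two pools; 0.5 means it performs no better than chance, which is the regime a strong Generator tries to push Detectors toward.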

Our study aims to measure and understand AI system behavior, particularly focusing on the performance gap between generation and detection.

Our first pilot challenge, on text summarization (NIST AI 700-1), demonstrated that while strong detectors exist, three generators produced summaries that fooled every detector, highlighting the urgent need for trustworthy AI safeguards.

Our ongoing evaluations are:

  • Image (Indistinguishability): Can you tell whether the content was generated by AI or by a human?
  • Text (Believability): How convincing and credible is the content?
  • Code (Reliability): Can AI reliably generate code for testing software?

The evaluations provide valuable lessons and insights for future research on advanced AI technologies, help inform AI standards, and guide responsible and safe use of digital content.

Schedule

  • Nov 14, 2025: Schedule Updates
  • Dec 5, 2025: Code Round-2 Available
  • Dec 11, 2025: Image D-Round-2 Sub Deadline
  • Jan 12, 2026: Text Registration Open