Evaluating Generative AI Technologies
A NIST evaluation program to support research in Generative AI technologies.
GenAI Challenge Problem Overview
NIST GenAI is a new evaluation program administered by the NIST Information Technology Laboratory to assess generative AI technologies developed by the research community around the world. It is an umbrella program that supports a range of evaluations for Generative AI research by providing a platform for Test and Evaluation. These evaluations will inform the work of the U.S. AI Safety Institute at NIST.
The objectives of the NIST GenAI evaluation include but are not limited to:
- Evolving benchmark dataset creation,
- Facilitating the development of content authenticity detection technologies for different modalities (text, audio, image, video, code),
- Conducting a comparative analysis using relevant metrics, and
- Promoting the development of technologies for identifying the source of fake or misleading information.
NIST GenAI Pilot
The pilot study aims to measure and understand system behavior in discriminating between synthetic and human-generated content in the text-to-text (T2T) and text-to-image (T2I) modalities. The pilot addresses the research questions of how human content differs from synthetic content and how the evaluation findings can guide users in differentiating between the two. The generator task produces high-quality synthetic outputs, while the discriminator task detects whether a target output was generated by an AI model or by a human.
Generator teams will be tested on their system's ability to generate synthetic content that is indistinguishable from human-produced content.
Discriminator teams will be tested on their system's ability to detect synthetic content created by generative AI models including large language models (LLMs) and deepfake tools.
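NIST GenAI does not prescribe any particular detection approach. As a rough illustration of what a T2T discriminator entry involves, the sketch below is a minimal, hypothetical baseline (assuming scikit-learn as a dependency; the training texts and labels are placeholder examples, not evaluation data) that trains a character n-gram classifier and reports the probability that a candidate document is synthetic.

```python
# Minimal discriminator sketch: a bag-of-character-n-grams baseline that scores
# how likely a text is to be AI-generated. The texts and labels below are
# hypothetical placeholders; a real T2T entry would train on a large, curated
# corpus of human-written and model-generated documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training corpus: 1 = AI-generated, 0 = human-written.
train_texts = [
    "I grabbed coffee with Sam before the meeting ran long again.",
    "In conclusion, it is important to note that there are many factors.",
    "The committee convened to review the annual budget proposal.",
    "Overall, this demonstrates the significance of the aforementioned points.",
]
train_labels = [0, 1, 0, 1]

# Character n-gram TF-IDF features feed a logistic-regression classifier.
detector = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
detector.fit(train_texts, train_labels)

# Score an unseen document: probability that it is synthetic.
candidate = "Furthermore, it is worth noting that several considerations apply."
p_synthetic = detector.predict_proba([candidate])[0, 1]
print(f"P(synthetic) = {p_synthetic:.2f}")
```

An actual submission would be evaluated against the program's test material and would typically rely on far stronger detectors; the sketch only illustrates the basic input-output contract of the discriminator task: text in, synthetic-versus-human score out.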
The pilot evaluations provide valuable lessons for future research on these cutting-edge technologies and guidance for the responsible and safe use of digital content.