GenAI

GenAI: Image Challenge

Evaluating generators for creating AI images and discriminators for detecting AI-generated images.

Overview

NIST GenAI Image Challenge is an evaluation series that supports research in Generative AI Image modality. Which generative AI models are capable of producing synthetic content that can deceive the best discriminators as well as humans? The performance of generative AI models can be measured by (a) humans and (b) discriminative AI models. To evaluate the "best" generative AI models, we need the most competent humans and discriminators. The most proficient discriminators are those that possess the highest accuracy in detecting the "best" generative AI models. Therefore, it is crucial to evaluate both generative AI models (generators) and discriminative AI models (discriminators).

What

The Image Generators (Image-G) task is to automatically generate high-quality images given a statement of information needed ("topic"). For more details, please refer to the generator data specification.

The Image Discriminators (Image-D) task is to detect if a target output image has been generated using a Generative AI system or a Human. For more details, please see the discriminator evaluation plan.

Who

We welcome and encourage teams from academia, industry, and other research labs to contribute to Generative AI research through the GenAI platform. The platform is designed to support various modalities and technologies, including both "Generators" and "Discriminators".
Generators will supplement the evaluation test material with their own AI-generated content based on the given task (e.g., automatic generation of images). These participants will use cutting-edge tools and techniques to create synthetic content. By incorporating this data into our test material, our test sets will evolve in pace with technology advancements. In the GenAI pilot, generators do “well” when their synthetic content is not detected by humans or AI detectors.
Discriminators are automatic algorithms identifying whether a piece of media (text, audio, image, video, code) originated from generative AI or a human. In the GenAI pilot, discriminators do “well” when correctly categorizing the test material produced by AI or Humans.

How

To take part in the GenAI evaluations you need to register on this website and complete the data license to download/upload the data. NIST will make all necessary data resources available to the generator and discriminator participants. Each team will receive access to data resources upon completion of all needed data agreement forms and based on the published schedule of each task data release date. Please refer to the published schedule for data release dates. Once your system is functional, you will be able to upload your data (generators) or system outputs (discriminators) to the challenge website and see your results displayed on the leaderboard.

Task Coordinator

If you have any questions, please email to the NIST GenAI team

Schedule

Date	Generators (G)	Discriminators (D)
March 13, 2025	Data Specification available	Evaluation Plan available
March 19, 2025	Registration period open NIST topics available (G-Testset-1) Building a generator system (1)	Registration period open
June 13, 2025	Round-1 data submission is due	Building a detection system (1)
June 27, 2025	Building a generator system (2)	Receive D-Testset-1
July 28, 2025	Building a generator system (2)	System output submission deadline on the D-Testset-1
August 4, 2025	Leaderboard round-1 results NIST topics available (G-Testset-2)	Leaderboard round-1 results
September 8, 2025	Round-2 data submission deadline	Building a detection system (2)
September 22, 2025	Building a generator system (3)	Receive D-Testset-2
October 27, 2025	Building a generator system (3)	System output submission deadline on the D-Testset-2
November 10, 2025	Leaderboard round-2 results NIST topics available (G-Testset-3)	Leaderboard round-2 results
December 8, 2025	Round-3 data submission is due	Building a detection system (3)
December 19, 2025		Receive D-Testset-3
January 19, 2026		System output submission deadline on the D-Testset-3
January 30, 2026	Leaderboard round-3 results	Leaderboard round-3 results
March, 2026	GenAI pilot evaluation workshop

GenAI Image Challenge Evaluation Rules

Participation in the GenAI evaluation program is voluntary and open to all who find the task of interest and are willing and able to abide by the rules of the evaluation. To fully participate, a registered site must:

become familiar with and abide by all evaluation rules;
develop/enhance an algorithm that can process the required evaluation datasets;
submit the necessary files to NIST for scoring; and
attend the evaluation workshop (if one occurs) and openly discuss the algorithm and related research with other evaluation participants and the evaluation coordinators.

The goal of the GenAI-Image pilot is help participants to learn the strengths and weaknesses of their systems that could result in improvements to their systems. To encourage participants to focus on improving their systems and not think about ‘gaming’ the leaderboard, we will only display the various metrics in terms of ‘bins’ that they fall in. Full results will be revealed at the end of each round. In the 3rd round, we will only allow 5 submissions (at most) and the leaderboard will be closed until all submissions have been made
All teams must have signed and submitted their license agreements back to NIST in order to have their submissions accepted, scored and published to the leaderboard. Thus, please plan to submit any license ASAP.
Participants are free to publish results for their own system but must NOT publicly compare their results with other participants (ranking, score differences, etc.) without explicit written consent from the other participants and NIST.
While participants may report their own results, participants may NOT make advertising claims about their standing in the evaluation, regardless of rank, winning the evaluation, or claim NIST endorsement of their system(s). The following language in the U.S. Code of Federal Regulations (15 C.F.R. § 200.113(d)) shall be respected: NIST does not approve, recommend, or endorse any proprietary product or proprietary material. No reference shall be made to NIST or to reports or results furnished by NIST in any advertising or sales promotion which would indicate or imply that NIST approves, recommends, or endorses any proprietary product or proprietary material or which has as its purpose an intent to cause directly or indirectly the advertised product to be used or purchased because of NIST test reports or results.
At the conclusion of the evaluation, NIST may generate a report summarizing the system results for conditions of interest. Participants may publish or otherwise disseminate these charts unaltered and with appropriate reference to their source.
NIST disclaims any liability or responsibility for the presence of potentially objectionable, offensive, or otherwise inappropriate content within the dataset generated by various AI/LLM models participating in the evaluation. Given the volume of submissions and data, it is not feasible for the GenAI team to conduct an exhaustive review of every submission for such content. Nevertheless, the GenAI team is committed to maintaining a dataset that is respectful, safe, and free from objectionable material. To support this goal, a procedure has been established for addressing reports of offensive or inappropriate content. If a participant or other individual identifies potentially objectionable material within the dataset and reports it to the GenAI team, the team will promptly investigate. Upon determining, in its sole discretion, that the reported content is objectionable or otherwise inappropriate, the GenAI team reserves the right to remove such content to uphold the integrity, safety, and respectfulness of the dataset.
All NIST distributed data (eg. Images) to discriminator teams are licensed under Creative Commons ( Specifically CC BY, CC BY-NC, CC BY-NC-ND, CC BY-NC-SA, CC BY-ND, CC BY-SA, and CC0). All teams agree to comply with these licenses, and all usage of the distributed data should only be for research purposes.
The challenge participant agrees NOT to use publicly available NIST-released data to train their systems or tune parameters, however, they may use other publicly available data that complies with applicable laws and regulations to train their models.
The challenge participant agrees NOT to examine the test data manually or through human means, including analyzing the media and/or training their model on the test data, to draw conclusions from prior to the evaluation period to the end of the leaderboard evaluation.
All machine learning or statistical analysis algorithms must complete training, model selection, and tuning prior to running on the test data. This rule does NOT preclude online learning/adaptation during test data processing so long as the adaptation information is NOT reused for subsequent runs of the evaluation collection.
The participants agree to make at least one valid submission for participating tasks. Evaluation participants must do so to be included in downloading the next round of datasets.
The participants agree to have one or more representatives at the post-evaluation workshop, to present a meaningful description of their system(s). Evaluation participants must do so to be included in future evaluation participation.
New participants can join any round. Once they register for the evaluation, they cannot skip the next rounds. If they don’t submit in the current round, they cannot move on to the next round."

Resources

Trustworthy & Responsible AI Resource Center.

Image Discriminators Overview

The primary goal of the GenAI pilot is to understand system behavior detecting AI-generated vs. human-generated content.

The Image-D task is a detection task focused on determining if a target output image was generated using Generative AI models or humans.

For each Image-D trial consisting of a single image, the Image-D detection system must render a confidence score with a higher number indicating a higher likelihood that the target image was generated using AI models. The primary metric for measuring detection performance will be the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC), as well as the Equal Error Rate (EER), True Positive Rate (TPR) at a given False Positive Rate (FPR), and Brier score values for targets (AI-generated images) and non-targets (human images).

The GenAI pilot challenge provides data (test sets) created by both G-participants and the NIST GenAI team. This allows D-participants to develop and run a system on their own hardware platform. Discriminator participants can then submit their system outputs to a web-based leaderboard, where scores and results are displayed.

The data from G-participants will only be accessible to D-participants once the G-participants submit their data packages to NIST and the NIST GenAI team approves the data. NIST will report performance measures for D-participant system outputs, displayed through a leaderboard, using the G-participants data.

Please refer to the discriminator evaluation plan for the details.

Data resources will be available for download once the registration is open and the data release has been announced. NIST will also release GenAI Scorer and Format Validator scripts.

Image Discriminator Instructions

The participants are not allowed to use the generators’ topics files by any means to extract information that can assist their systems in predicting the testing image source. Teams who participate in both tasks (Generators and Discriminators) are required to take all needed precautions to isolate the generators' topic files from discriminator's system development.

During result analysis and for presentation of results, all discriminator only teams will be in a separate category for reporting results purposes, while teams who participate as a generator and discriminator will be in a separate category.

All images must be processed independently of each other within a given round of evaluation and across all evaluation rounds, meaning content extracted from the data must not affect the processing of data from another rounds’ data.

System Input File

For a given task, a system’s input is the task index file, called <modality_id>_<dataset_id>_<task_id>_index.csv. Given an index file, each row specifies a test trial. Taking the corresponding media (eg. texts or images) as input(s), systems perform detection tasks.

The following format constitutes the index file for the D-participant system input:

genai25_Image-D_detection_index.csv
DatasetID	(string) The ID of the dataset release (e.g., GenAI25-G1-Image-set1)
TaskID	(string) The globally unique ID of tasks. Tasks could be summarization, generation, translation, question-answering (e.g., Detection)
FileID	(string) The globally unique ID of the image trials (e.g., file_0001.webP)

Example of the CSV file with delimiter “|”.

DatasetID             | TaskID      | FileID         GenAI25-G1-Image-set1 | Detection   | file_0001.webp

System Output File

The system output file must be a CSV file with the separator “|”. Please include the optimal cutoff (threshold) for the confidence score binary classification in your file name. For example, "cutoff-50" means the threshold is 0.5 The filename for the output file must be a user-defined string that identifies the submission with no spaces or special characters besides ‘_-.’ (e.g., `genai25_image_d_sys_model-01_cutoff-50.csv`).

The system output CSV file for the Image-D detection task must follow the format below:

genai25_image_d_sys_model-01_cutoff-XX.csv
DatasetID	(string) The ID of the dataset release, e.g., GenAI25-G1-Image-set1
TaskID	(string) The unique ID of the task, e.g., Detection
DiscriminatorID	(string) The globally unique ID of Discriminator (D) participants, e.g. D-participant_001
ModelVersion	(string) The system model version on D-participant submission (e.g., MySystem_DallE)
FileID	(string) The globally unique ID of the image trials (e.g., file_0001.webp)
ConfidenceScore	(float) in the range [0,1], the larger, the more confidence that the output is AI generated

Example of the CSV file with delimiter “|”.

DatasetID             | TaskID    | DiscriminatorID  | ModelVersion |FileID       | ConfidenceScore GenAI25-G1-Image-set1 | Detection | D-participant_01 | MySys_DallE | file_0001.webp |  0.7

Validation

The FileID column in the system output [submission-file-name].csv must be consistent with the FileID in the <modality_id>_<dataset_id>_<task_id>_index.csv file. The row order may change, but the number of the files and file names from the system output must match the index file.

To validate your system output locally, D-Participants may use the validation script provided by NIST with the dataset release.

Submission

System output submission to NIST for subsequent scoring must be made through the web platform using the submission instructions described above. To prepare your submission, you will first make .tar.gz (or .tgz) file of your system output CSV file via the UNIX command ‘tar zcvf [submission_name].tgz [submission_file_name].csv’ and then upload the system output tar file under a new or existing ‘System’ label. This system label is a longitudinal tracking mechanism that allows you to track improvements to your specific technology over time.

Please ensure timely submission of your files to allow us sufficient time to address any transmission errors before the due date. Note that submissions received after the stated due dates for any reason will be marked late and may not be scored. Please refer to the published schedule for the details.

Please take into consideration that submitting your system outputs indicates and assumes your agreement to the Rules of Behavior.

For the best user experience with the scoreboard, please use Google Chrome.

coming soon..

Image Generators Overview

The primary goal of the pilot GenAI evaluations is to understand system behavior for creating AI-generated content. The Image-G participants will be given a list of topics in the form of textual descriptions for a set of images to be generated. Each text description from NIST GenAI team is expected to be approximately 3 sentences per image. Your task is to use Generative AI to automatically generate up to 10 realistic images per topic that could be viewed to satisfy the textual description of the topic.

Sex
Age
Lighting Conditions
Environment
Background
Facial Expression
Accessories/Appearance

There will be about 150 topics in the test data for generator teams. This set of generated images from all generator teams, in addition to supplemental data by GenAI NIST team will serve as the testing data for discriminator teams, who will work on detecting whether the content is human-generated or AI-generated.

The images generated will be evaluated by determining how easy or difficult it is to discriminate AI-generated images from human-generated images, i.e., the goal of generators is to generate images that are indistinguishable from human-generated images.

For more information and details about the task specifics for generator teams, please refer to the generator data specification

Data Generation Instructions

NIST GenAI team created a set of topics using a set of images covering a diverse set of attributes and categories (see sample images in specification plan). The goal of the descriptions and topics is to represent a real-world dataset of persons captured images as uploaded by general internet users. Topics will be distributed by NIST. Only GenAI generator participants who have completed and submitted all required data agreement forms will be allowed access. In total, there will be about 150 topics. As the example below shows, each topic includes an id (num), title, and the required topic prompt ( textual description ). All topics will be aggregated in one master xml file and released to Generator teams. Please check the published schedule for testing data release dates

Example of topic:

<topic>
                            <num> topic_5445 </num>
                            <title> Young Man  </title>
                            <prompt> A young Asian man with glasses looking upward to the sky  </prompt>
                            </topic>

Submission Guidelines

Each team may submit up to 10 runs per round. Each run should include a maximum of 10 images per topic. A submission run should be in the format of a single compressed zip file to include the run results xml file (see below for specifications), any sample images used as part of the prompt, and all images generated for all topics. No multiple zip files are allowed per run.
Please use the following folder naming conventions for your run submission: images_prompts : folder name (as applicable) to contain any images used as part of any topic. Please use the following descriptive file names format to easily identify topics that images have been part of (topic.1.prompt.1.webp, topic.1.prompt.2.webp, etc). images_generated : folder name to contain all images for all topics (e.g. topic.2.image.1.webp, topic.2.image.2.webp, … etc)
Each run should contain images for all topics; a run can not skip a topic or submit images for a subset of the topics.
All images must be submitted as webP lossless files, and follow the guidelines for aspect ratios and/or resolution allowed. Please see specification document for allowed values. EXIF metadata must be preserved in the images.
Please remember to dedicate 1 image out of the 10 to be generated using the NIST-provided textual description. The other 9 images should all be generated using a fixed prompt built automatically (not manually).
Modified system prompts may include sample images (i.e. final prompt is composed of text + image(s)). If a team decides to add sample images to the prompt, then a maximum number of 2 images should be used per prompt. These images are optional and a team may decide to use text prompts only. To be specific, these image samples should be fixed for all prompts within a topic (a team must use the same 1 or 2 image samples as part of the prompt for all generated images within a topic). For example, given 150 topics, the maximum number of image prompts that a team may use will be 300 images.
Image content should be free from offensive visuals or inappropriate remarks. Prohibited images include, but are not limited to, any image with explicit nudity, sexual content, hate symbols, child exploitation, or non-consensual images. NIST has the right to exclude any image or whole runs if it determines the content to be inappropriate for the general public. The GenAI team will have a process to automatically detect image content and any flagged images will be reviewed by a team of NIST personnel for final recommendation.
Each run should include high-level metadata to characterize the generator system as requested by the below run format and DTD file. As explained in the DTD file, teams need to provide some required information/parameters, such as:

teamName: The name of the team as registered on the NIST GenAI website
trainingData: Name of training dataset or collection of different datasets or source data
priority: The priority of the submitted run (the lower number, the higher the priority). For any required manual review of submissions, NIST may need to limit effort to only the highest priority runs. Priority should be an integer with a range between 1 to 10
trained: A boolean (T or F) to indicate if the run was the output of a trained system by the team specifically for this task (T) or the output of an already existing system that the team used to generate the outputs (F)
desc: A high-level description of the system that generated this run
link: A link to the model used to generate the run (e.g. GitHub, etc)
topic: The topic id (the “num” field in the topic xml file)
elapsedTime:The processing time of the model per topic to generate the image after the topic was presented to it.
usedImagePrompts: A boolean (T or F) to indicate if a topic has used (T) sample images (up to 2 images) as part of the prompt in addition to text, or did not use any images (F).
Please use the image file naming convention as demonstrated in the example below (eg. topic.1.image.1.webp, topic.1.image.2.webp, etc) to save and transmit your image files.
prompt: The text description (prompt) used by the system to generate the corresponding image.
NIST-prompt:A boolean to indicate if the generated image was based on the provided based on the official prompt (T), or based on a modified prompt by the team (F)

Example of a sample run:

                                &
                                    lt;!DOCTYPE GeneratorResults SYSTEM "GeneratorResults.dtd">
                                    <GeneratorResults teamName="participant_1" >
                                    <GeneratorRunResult trainingData="OpenAI" priority="1" trained="T" desc="This run uses the top secret x-component" link="TBD">

                                    <GeneratorTopicResult topic="1" elapsedTime="5" usedImagePrompts="F">
                                    <Image filename="topic.1.image.1.webp" prompt="prompt_NIST_text_description" NIST-prompt="T">
                                    <Image filename="topic.1.image.2.webp" prompt="fixed_prompt_1" NIST-prompt="T">
                                    <Image filename="topic.1.image.3.webp" prompt="fixed_prompt_1" NIST-prompt="F">
                                    <!-- ... -->
                                    <!-- ... -->
                                    </GeneratorTopicResult>
                                
                                    <GeneratorTopicResult topic="2" elapsedTime="5" usedImagePrompts="T">
                                    <Image filename="topic.2.image.1.webp" prompt="prompt_NIST_text_description" NIST-prompt="T">
                                    <Image filename="topic.2.image.2.webp" prompt="fixed_prompt_2" NIST-prompt="F">
                                    <Image filename="topic.2.image.3.webp" prompt="fixed_prompt_2" NIST-prompt="F">
                                    <!-- ... -->
                                    <!-- ... -->
                                    </GeneratorTopicResult>
                                    <!-- ... -->

                                    <GeneratorTopicResult topic="150" elapsedTime="150" usedImagePrompts="T">
                                    <Image filename="topic.150.image.1.webp" prompt="prompt_NIST_text_description" NIST-prompt="T">
                                    <Image filename="topic.150.image.2.webp" prompt="fixed_prompt_150" NIST-prompt="F">
                                    <Image filename="topic.150.image.3.webp" prompt="fixed_prompt_150" NIST-prompt="F">
                                    <!-- ... -->
                                    <!-- ... -->
                                    </GeneratorTopicResult>
                                    </GeneratorRunResult>
                                    </GeneratorResults>

Generator Data submission validation

NIST will provide, prior to submission dates, a validator script to participants to validate their output XML files format as well as content specific to the task guidelines (e.g. topic ids, empty required attributes, etc). All generator teams should validate their runs before submitting them to NIST. Example of available DTD validators (via a shell script): xmllint --valid simple_sample.xml

class=">notes: according to the published schedule , the submission page (form) will be open and available (via the GenAI website) for teams to submit their data outputs. Please make sure to follow the schedule and submit on time as extending the submission dates may not be possible.
Upon submission, NIST will validate the data outputs uploaded and report any errors to the submitter.
Please take into consideration that submitting your data outputs indicates and assumes your agreement to the code of conduct & Rules of Behavior .

GenAI: Image Challenge

Overview

What

Who

How

Task Coordinator

Schedule

GenAI Image Challenge Participation Rules

GenAI Image Challenge Evaluation Rules

Resources

Image Discriminators Overview

Image Discriminator Instructions

System Input File

System Output File

Validation

Submission

For the best user experience with the scoreboard, please use Google Chrome.

coming soon..

Image Generators Overview

Data Generation Instructions

Submission Guidelines

Example of a sample run:

Generator Data submission validation

For the best user experience with the scoreboard, please use Google Chrome.