GenAI: Text-to-Text (T2T)

Evaluating generators and discriminators for AI-generated text vs human-written text.

Overview

NIST GenAI T2T is an evaluation series that supports research in the Generative AI Text-to-Text modality. It asks which generative AI models can produce synthetic content that deceives both the best discriminators and humans. The performance of generative AI models can be measured by (a) humans and (b) discriminative AI models. To evaluate the "best" generative AI models, we need the most competent humans and discriminators; the most proficient discriminators are those with the highest accuracy in detecting the output of the "best" generative AI models. It is therefore crucial to evaluate both generative AI models (generators) and discriminative AI models (discriminators).


What

The Text-to-Text Generators (T2T-G) task is to automatically generate high-quality summaries given a statement of information needed ("topic") and a set of source documents to summarize. For more details, please see the generator data specification.

The Text-to-Text Discriminators (T2T-D) task is to detect if a target output summary has been generated using a Generative AI system or a Human. For more details, please see the discriminator evaluation plan.


Who

We welcome and encourage teams from academia, industry, and other research labs to contribute to Generative AI research through the GenAI platform. The platform is designed to support various modalities and technologies, including both "Generators" and "Discriminators".
Generators will supplement the evaluation test material with their own AI-generated content based on the given task (e.g., automatic summarization of documents). These participants will use cutting-edge tools and techniques to create synthetic content. By incorporating this data into our test material, our test sets will evolve in pace with technology advancements. In the GenAI pilot, generators do “well” when their synthetic content is not detected by humans or AI discriminators.
Discriminators are automatic algorithms identifying whether a piece of media (text, audio, image, video, code) originated from generative AI or a human. In the GenAI pilot, discriminators do “well” when correctly categorizing the test material produced by AI or Humans.


How

To take part in the GenAI evaluations, you need to register on this website and complete the data usage agreement and the data transfer agreement before you can download or upload data. NIST will make all necessary data resources available to the generator and discriminator participants. Each team will receive access to data resources once it has completed all required data agreement forms, according to the data release date given for each task in the published schedule. Once your system is functional, you will be able to upload your data (generators) or system outputs (discriminators) to the challenge website and see your results displayed on the leaderboard.


Task Coordinator

If you have any questions, please email the NIST GenAI team.

Schedule

Date | Generators (G) | Discriminators (D)
April 15, 2024 | Data Specification available | Evaluation Plan available
May 1, 2024 | Registration period opens | Registration period opens
June 3, 2024 | NIST source article data available | Test set-1: NIST pilot set-1 available
July 12, 2024 | Registration closes | Registration closes
August 16, 2024 | Round-1 data submission deadline | System output submission deadline on the test set-1 (Leaderboard)
September 2, 2024 / September 16, 2024 | G-Scorer results for the Round-1 data available (Leaderboard) | Test set-2: NIST pilot set-2 + G-participant round-1 data available
November 1, 2024 / November 6, 2024 | Round-2 data submission deadline | System output submission deadline on the test set-2 (Leaderboard)
November 13, 2024 | G-Scorer results for the Round-2 data available (Leaderboard) | Test set-3: NIST pilot set-3 + G-participant round-2 data available
January 10, 2025 / January 27, 2025 | | System output submission deadline on the test set-3 (Leaderboard)
January 2025 | Close
February 2025 | Results release for both G and D
March 2025 | GenAI pilot evaluation workshop

GenAI T2T Evaluation Rules (Updated: 5/15/2024)

  • Participation in the GenAI evaluation program is voluntary and open to all who find the task of interest and are willing and able to abide by the rules of the evaluation. To fully participate, a registered site must:
    • become familiar with and abide by all evaluation rules;
    • develop/enhance an algorithm that can process the required evaluation datasets;
    • submit the necessary files to NIST for scoring; and
    • attend the evaluation workshop (if one occurs) and openly discuss the algorithm and related research with other evaluation participants and the evaluation coordinators.
  • Participants are free to publish results for their own system but must NOT publicly compare their results with other participants (ranking, score differences, etc.) without explicit written consent from the other participants and NIST.
  • While participants may report their own results, participants may NOT make advertising claims about their standing in the evaluation, regardless of rank, winning the evaluation, or claim NIST endorsement of their system(s). The following language in the U.S. Code of Federal Regulations (15 C.F.R. § 200.113(d)) shall be respected: NIST does not approve, recommend, or endorse any proprietary product or proprietary material. No reference shall be made to NIST or to reports or results furnished by NIST in any advertising or sales promotion which would indicate or imply that NIST approves, recommends, or endorses any proprietary product or proprietary material or which has as its purpose an intent to cause directly or indirectly the advertised product to be used or purchased because of NIST test reports or results.
  • At the conclusion of the evaluation, NIST may generate a report summarizing the system results for conditions of interest. Participants may publish or otherwise disseminate these charts unaltered and with appropriate reference to their source.
  • The challenge participant agrees NOT to use publicly available NIST-released data to train their systems or tune parameters; however, they may use other publicly available data that complies with applicable laws and regulations to train their models.
  • The challenge participant agrees NOT to examine the test data manually or through human means (including analyzing the media and/or training their model on the test data) to draw conclusions, from before the evaluation period through the end of the leaderboard evaluation.
  • All machine learning or statistical analysis algorithms must complete training, model selection, and tuning prior to running on the test data. This rule does NOT preclude online learning/adaptation during test data processing so long as the adaptation information is NOT reused for subsequent runs of the evaluation collection.
  • The participants agree to make at least one valid submission for participating tasks. Evaluation participants must do so to be included in downloading the next round of datasets.
  • The participants agree to have one or more representatives at the post-evaluation workshop, to present a meaningful description of their system(s). Evaluation participants must do so to be included in future evaluation participation.

T2T Discriminators Overview

The primary goal of the GenAI pilot is to understand system behavior detecting AI-generated vs. human-generated content.

The T2T-D task is a detection task focused on determining whether a target output was generated by generative AI or by a human. Specifically, the T2T-D detection task is to detect whether a target text summary was generated by a large language model (LLM) such as ChatGPT.

For each T2T-D trial, which consists of a single summary, the T2T-D detection system must output a confidence score; a higher score indicates a higher likelihood that the target text summary was generated by an LLM-based model. The primary metric for measuring detection performance is the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC). Additional metrics are the Equal Error Rate (EER), the True Positive Rate (TPR) at a given False Positive Rate (FPR), and the Bayes risk for varying tradeoffs between the costs of errors.
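As a rough illustration of how these metrics relate to the confidence scores, the following Python sketch computes AUC, TPR at a given FPR, and an approximate EER from labeled trials. It is not the NIST GenAI Scorer: the function names are hypothetical, and the handling of ties, interpolation, and the EER approximation are assumptions that may differ from the official scoring.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr), sweeping the threshold from high to low.

    labels: 1 for AI-generated, 0 for human-written (assumed encoding).
    scores: confidence that the summary is AI-generated (higher = more likely AI).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one AI and one human trial")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Consume all trials tied at this score before emitting a point.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def tpr_at_fpr(points, target_fpr):
    """Largest TPR among ROC points whose FPR does not exceed target_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= target_fpr)


def eer(points):
    """Approximate EER: the ROC point where FPR and FNR (1 - TPR) are closest."""
    fpr, tpr = min(points, key=lambda p: abs(p[0] - (1 - p[1])))
    return (fpr + (1 - tpr)) / 2


# Toy example: three AI summaries (label 1) and three human summaries (label 0).
points = roc_points([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.7, 0.3, 0.2])
print(auc(points), tpr_at_fpr(points, 0.1), eer(points))
```

A system that ranks every AI summary above every human summary would reach an AUC of 1.0 and an EER of 0.0 under this sketch.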

The GenAI pilot challenge provides data, including dry-run sets and test sets, created by both G-participants and the NIST GenAI team. This allows D-participants to develop and run a system on their own hardware platform. Discriminator participants can then submit their system outputs to a web-based leaderboard, where scores and results are displayed.

The data from G-participants will only be accessible to D-participants once the G-participants submit their data packages to NIST and the NIST GenAI team approves the data. However, NIST will provide pilot data generated by the NIST GenAI team for D-participants to start the development of their systems. NIST reports performance measures for D-participant system outputs, displayed through a leaderboard, using either NIST pilot data or the evolved G-participants data.

Please refer to the discriminator evaluation plan for the details.

Data resources will be available for download once the registration is open and the data release has been announced. NIST will also release GenAI Scorer and Format Validator scripts.

T2T Discriminator Instructions

System Input File

For a given task, a system’s input is the task index file, called <modality_id>_<dataset_id>_<task_id>_index.csv. Each row of the index file specifies one test trial. Systems take the corresponding media (texts or images) as input and perform the detection task.

The following format constitutes the index file for the D-participant system input:

genai24_T2T-D_detection_index.csv
DatasetID (string) The ID of the dataset release (e.g., GenAI24-NIST-pilot-T2T-D-set-1)
TaskID (string) The globally unique ID of tasks. Tasks could be summarization, generation, translation, question-answering (e.g., detection)
FileID (string) The globally unique ID of the text summary trials (e.g., file_0001.txt)

Example of the CSV file with delimiter “|”.

DatasetID                      | TaskID      | FileID         
GenAI24-NIST-pilot-T2T-D-set-1 | detection   | xxx_000011.txt
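To make the index format concrete, here is a minimal Python sketch that reads a "|"-delimited index into a list of dictionaries. The function name `read_pipe_csv` is hypothetical, and stripping the padding around each cell is an assumption based on the aligned example above; the released files may not be padded.

```python
import csv
import io


def read_pipe_csv(text):
    """Parse "|"-delimited CSV text into a list of dicts keyed by the header row."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    # Strip the padding that aligns the columns in the example (assumption).
    rows = [[cell.strip() for cell in row] for row in reader if row]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


example = (
    "DatasetID                      | TaskID      | FileID         \n"
    "GenAI24-NIST-pilot-T2T-D-set-1 | detection   | xxx_000011.txt\n"
)
trials = read_pipe_csv(example)
print(trials[0]["FileID"])
```

Each returned dictionary corresponds to one test trial, so a detection system can iterate over the list and load the file named by FileID.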

System Output File

The system output file must be a CSV file with the separator “|”. Please include, in your filename, the recommended cutoff (threshold) for the confidence score for binary classification. For example, "cutoff-50" means the threshold is 0.5, and confidence scores greater than or equal to 0.5 are classified as "AI" and confidence scores less than 0.5 are classified as "Human" generated content. The filename for the output file must be a user-defined string that identifies the submission with no spaces or special characters besides ‘_-.’ (e.g., `genai24_t2t_d_sys_model-01_cutoff-50.csv`).  
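The filename convention and cutoff rule can be sketched in Python as follows. The helper names are hypothetical, and two points are assumptions: that the number after "cutoff-" is a percentage (the text only gives "cutoff-50" meaning 0.5), and that the allowed characters are ASCII letters and digits plus '_', '-', and '.'.

```python
import re

# Assumed character set: ASCII letters, digits, and the three allowed symbols.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_.-]+")
CUTOFF = re.compile(r"cutoff-(\d+)")


def cutoff_from_filename(name):
    """Extract the threshold from a name such as '..._cutoff-50.csv' (50 -> 0.5)."""
    if not ALLOWED_NAME.fullmatch(name):
        raise ValueError(f"filename contains disallowed characters: {name!r}")
    match = CUTOFF.search(name)
    if match is None:
        raise ValueError(f"no 'cutoff-NN' tag found in: {name!r}")
    # Assumption: the tag is a percentage of the [0, 1] confidence range.
    return int(match.group(1)) / 100


def classify(score, cutoff):
    """Scores at or above the cutoff are 'AI'; scores below it are 'Human'."""
    return "AI" if score >= cutoff else "Human"


cutoff = cutoff_from_filename("genai24_t2t_d_sys_model-01_cutoff-50.csv")
print(cutoff, classify(0.7, cutoff), classify(0.49, cutoff))
```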

The system output CSV file for the T2T-D detection task must follow the format below:

genai24_t2t_d_sys_model-01.csv
DatasetID (string) The ID of the dataset release, e.g., GenAI24-NIST-pilot-T2T-D-set-1
TaskID (string) The ID of the task, e.g., detection
DiscriminatorID (string) The site name of Discriminator (D) participants, e.g. D-NIST_Site
ModelVersion (string) The system model version on D-participant submission (e.g., mySys_GPT4.0)
FileID (string) The globally unique ID of the text summary trials (e.g., file_0001.txt)
ConfidenceScore (float) in the range [0,1], the larger, the more confidence that the output is AI generated

Example of the CSV file with delimiter “|”.

DatasetID                      | TaskID    | DiscriminatorID | ModelVersion | FileID        | ConfidenceScore
GenAI24-NIST-pilot-T2T-D-set-1 | detection | D-NIST_Site     | MySys_GPT4.0 | file_0001.txt | 0.7
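One way to produce a file in this layout is sketched below in Python. The function name `format_system_output` is hypothetical, and writing cells without alignment padding is an assumption; the Format Validator released by NIST is the authority on what is accepted.

```python
import csv
import io

COLUMNS = ["DatasetID", "TaskID", "DiscriminatorID",
           "ModelVersion", "FileID", "ConfidenceScore"]


def format_system_output(rows):
    """Render rows (dicts keyed by COLUMNS) as "|"-delimited CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="|", lineterminator="\n")
    writer.writerow(COLUMNS)
    for row in rows:
        score = float(row["ConfidenceScore"])
        # The specification requires scores in the closed range [0, 1].
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"ConfidenceScore out of [0, 1]: {score}")
        writer.writerow([row[name] for name in COLUMNS[:-1]] + [score])
    return buffer.getvalue()


print(format_system_output([{
    "DatasetID": "GenAI24-NIST-pilot-T2T-D-set-1",
    "TaskID": "detection",
    "DiscriminatorID": "D-NIST_Site",
    "ModelVersion": "MySys_GPT4.0",
    "FileID": "file_0001.txt",
    "ConfidenceScore": 0.7,
}]))
```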

Validation

The FileID column in the system output [submission-file-name].csv must be consistent with the FileID column in the <modality_id>_<dataset_id>_<task_id>_index.csv file. The row order may differ, but the number of files and the file names in the system output must match the index file.

To validate system output locally, D-Participants may use the validation script provided by NIST with the dataset release.
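For a quick check before running the official script, the FileID consistency rule can be approximated in a few lines of Python. This is not the NIST validation script; the function names and the shape of the returned report are hypothetical.

```python
from collections import Counter


def check_file_ids(index_ids, output_ids):
    """Report FileIDs that are missing, unexpected, or duplicated in the output."""
    index_counts = Counter(index_ids)
    output_counts = Counter(output_ids)
    return {
        "missing": sorted(set(index_counts) - set(output_counts)),
        "unexpected": sorted(set(output_counts) - set(index_counts)),
        "duplicated": sorted(fid for fid, n in output_counts.items() if n > 1),
    }


def is_consistent(index_ids, output_ids):
    """Row order may differ, so compare the two FileID columns as multisets."""
    return Counter(index_ids) == Counter(output_ids)


index_ids = ["file_0001.txt", "file_0002.txt", "file_0003.txt"]
output_ids = ["file_0003.txt", "file_0001.txt", "file_0001.txt"]
print(is_consistent(index_ids, output_ids))
print(check_file_ids(index_ids, output_ids))
```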

Submission

System output submission to NIST for subsequent scoring must be made through the web platform using the submission instructions described above. To prepare your submission, first create a .tar.gz (or .tgz) archive of your system output CSV file with the UNIX command ‘tar zcvf [submission_name].tgz [submission_file_name].csv’, then upload the archive under a new or existing ‘System’ label. This system label is a longitudinal tracking mechanism that allows you to track improvements to your specific technology over time.
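Where the tar command is not available, an equivalent gzip-compressed archive can be built with Python's standard tarfile module. The sketch below assumes the archive should contain only the CSV file at its top level, as the tar command above would produce when run in the file's directory; the function name is hypothetical.

```python
import tarfile
from pathlib import Path


def package_submission(csv_path, archive_path):
    """Create a gzip-compressed tar archive containing the single output CSV."""
    csv_path = Path(csv_path)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname drops any directory prefix so the CSV sits at the archive root.
        tar.add(csv_path, arcname=csv_path.name)
    return Path(archive_path)
```

For example, `package_submission("genai24_t2t_d_sys_model-01_cutoff-50.csv", "my_submission.tgz")` would write `my_submission.tgz` in the current directory; the archive name here is illustrative only.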

Please ensure timely submission of your files to allow us sufficient time to address any transmission errors before the due date. Note that submissions received after the stated due dates for any reason will be marked late and may not be scored. Please refer to the published schedule for the details.

Please take into consideration that submitting your system outputs indicates and assumes your agreement to the Rules of Behavior.


We emphasize that this is a pilot study (Round-1 Submissions on Testset-1). The primary purpose of the GenAI pilot is to develop an evaluation pipeline between the NIST team and participants. Therefore, we encourage all participants to submit their system output, regardless of their system performance.



T2T Discriminators Pilot Round 1

Updated: 2025-04-10 16:03:11 +0000
SUBMISSIONID TEAM AUC AUC@FPR=0.1 BRIER EER TNR@FNR=0.1 TPR@FPR=0.1
1 804fe 0.441 0.002 0.419 0.528 0.075 0.109
10 804fe 0.953 0.082 0.11 0.089 0.95 0.922
100 6655b 0.554 0.008 0.266 0.411 0.148 0.125
101 6655b 0.544 0.006 0.282 0.48 0.055 0.078
102 6655b 0.537 0.002 0.318 0.5 0.06 0.125
103 6655b 0.533 0.012 0.269 0.508 0.12 0.172
104 87a8c 0.836 0.05 0.21 0.222 0.38 0.578
11 804fe 0.982 0.087 0.073 0.061 0.942 0.953
117 804fe 0.5 0.0 0.615 0.5 0.1 0.1
118 9de37 0.922 0.052 0.187 0.148 0.691 0.812
12 6655b 0.998 0.048 0.023 0.041 0.97 1.0
13 18126 0.458 0.009 0.45 0.569 0.023 0.188
14 18126 0.477 0.014 0.495 0.544 0.04 0.219
15 6655b 0.511 0.002 0.615 0.452 0.08 0.047
16 804fe 0.993 0.069 0.056 0.036 0.975 0.984
17 804fe 0.993 0.069 0.056 0.036 0.975 0.984
18 804fe 0.993 0.069 0.056 0.036 0.975 0.984
19 18126 0.777 0.03 0.256 0.283 0.29 0.5
2 804fe 0.802 0.048 0.31 0.27 0.35 0.562
20 18126 0.884 0.0 0.094 0.116 0.814 0.484
21 18126 0.884 0.0 0.094 0.116 0.814 0.484
25 9de37 0.917 0.052 0.194 0.141 0.64 0.828
26 6fc49 0.93 0.056 0.359 0.153 0.655 0.844
34 b3cd9 0.931 0.035 0.066 0.069 0.928 0.939
36 9de37 0.902 0.068 0.196 0.133 0.605 0.75
37 9de37 0.922 0.052 0.187 0.148 0.691 0.812
38 9de37 0.922 0.052 0.187 0.148 0.691 0.812
4 29d48 1.0 0.1 0.202 0.0 1.0 1.0
40 d718e 0.825 0.036 0.321 0.25 0.565 0.547
43 18126 0.884 0.0 0.094 0.116 0.814 0.484
44 18126 0.478 0.0 0.421 0.522 0.08 0.095
46 993ad 0.0 0.0 1.0
47 18126 0.81 0.002 0.192 0.242 0.357 0.016
49 18126 0.834 0.0 0.131 0.166 0.721 0.323
5 804fe 0.656 0.0 0.423 0.344 0.146 0.381
50 993ad 0.471 0.006 0.3 0.528 0.018 0.078
52 993ad 0.559 0.008 0.43 0.431 0.139 0.125
54 6655b 0.998 0.048 0.023 0.041 0.97 1.0
59 6655b 0.522 0.009 0.268 0.536 0.12 0.109
6 804fe 0.917 0.062 0.228 0.125 0.755 0.734
60 6655b 0.998 0.048 0.023 0.041 0.97 1.0
68 18126 0.62 0.0 0.348 0.38 0.167 0.16
7 0dea0 0.522 0.012 0.334 0.52 0.06 0.203
72 6655b 0.522 0.009 0.268 0.536 0.12 0.109
73 6655b 0.485 0.0 0.336 0.519 0.106 0.074
74 6655b 0.998 0.048 0.023 0.041 0.97 1.0
76 87a8c 0.575 0.014 0.403 0.48 0.202 0.188
77 87a8c 0.58 0.012 0.507 0.48 0.18 0.219
78 87a8c 0.367 0.0 0.446 0.612 0.105 0.016
8 0dea0 0.97 0.081 0.037 0.048 0.975 0.984
81 804fe 0.994 0.046 0.056 0.048 0.984 0.984
82 804fe 0.89 0.035 0.126 0.162 0.768 0.391
83 804fe 0.988 0.066 0.06 0.056 0.957 0.969
84 804fe 0.994 0.046 0.056 0.048 0.984 0.984
85 804fe 0.988 0.065 0.06 0.084 0.945 0.984
86 804fe 0.993 0.069 0.057 0.048 0.974 0.984
87 804fe 0.993 0.093 0.057 0.048 0.974 0.969
9 804fe 0.933 0.062 0.104 0.141 0.855 0.703
92 87a8c 0.867 0.04 0.354 0.242 0.54 0.672
93 87a8c 0.83 0.047 0.307 0.27 0.44 0.594
96 87a8c 0.843 0.038 0.247 0.27 0.47 0.609
98 6fc49 0.774 0.0 0.372 0.219 0.448 0.25
99 6655b 0.554 0.008 0.266 0.411 0.148 0.125

T2T Discriminators Pilot Round 2

Updated: 2025-04-10 16:03:12 +0000
SUBMISSIONID TEAM AUC AUC@FPR=0.1 BRIER EER TNR@FNR=0.1 TPR@FPR=0.1
119 18126 0.617 0.0 0.317 0.388 0.162 0.163
121 9de37 0.801 0.051 0.377 0.254 0.28 0.607
124 6655b 0.925 0.086 0.1 0.126 0.77 0.874
125 9de37 0.846 0.052 0.369 0.232 0.46 0.654
126 9de37 0.857 0.05 0.386 0.23 0.483 0.628
127 18126 0.684 0.006 0.544 0.318 0.159 0.434
128 18126 0.807 0.026 0.275 0.188 0.3 0.711
129 18126 0.804 0.02 0.275 0.192 0.296 0.696
135 18126 0.683 0.044 0.227 0.392 0.06 0.486
136 29d48 0.935 0.079 0.43 0.122 0.834 0.856
137 29d48 0.935 0.079 0.43 0.122 0.834 0.856
138 29d48 0.915 0.072 0.16 0.149 0.67 0.844
139 29d48 0.903 0.063 0.298 0.16 0.64 0.76
140 29d48 0.935 0.079 0.43 0.122 0.834 0.856
141 29d48 0.926 0.077 0.191 0.12 0.778 0.856
142 29d48 0.936 0.081 0.237 0.105 0.83 0.886
143 29d48 0.941 0.084 0.222 0.11 0.8 0.886
144 29d48 0.922 0.064 0.342 0.128 0.68 0.812
163 804fe 0.912 0.073 0.121 0.159 0.67 0.8
164 804fe 0.271 0.003 0.304 0.67 0.0 0.067
165 804fe 0.616 0.023 0.151 0.429 0.109 0.291
166 804fe 0.915 0.074 0.119 0.16 0.68 0.798
167 804fe 0.672 0.023 0.184 0.371 0.162 0.354
168 804fe 0.834 0.033 0.168 0.211 0.36 0.697
169 804fe 0.914 0.074 0.12 0.159 0.706 0.798
170 804fe 0.866 0.053 0.142 0.189 0.495 0.709
171 804fe 0.886 0.045 0.102 0.172 0.63 0.762
172 804fe 0.895 0.062 0.112 0.167 0.69 0.77
173 804fe 0.899 0.062 0.111 0.176 0.64 0.765
174 804fe 0.899 0.062 0.102 0.176 0.64 0.765
175 804fe 0.899 0.062 0.102 0.176 0.64 0.765
176 804fe 0.912 0.073 0.141 0.159 0.67 0.8
177 804fe 0.915 0.065 0.136 0.158 0.677 0.814
178 804fe 0.915 0.065 0.133 0.158 0.677 0.814
179 804fe 0.874 0.042 0.127 0.178 0.61 0.749
180 804fe 0.874 0.042 0.122 0.178 0.61 0.749
181 804fe 0.909 0.06 0.101 0.149 0.725 0.774
182 804fe 0.909 0.06 0.101 0.149 0.725 0.774
183 804fe 0.898 0.066 0.126 0.178 0.59 0.791
184 804fe 0.878 0.055 0.162 0.178 0.55 0.721
185 804fe 0.855 0.059 0.148 0.201 0.41 0.691
205 6fc49 0.62 0.0 0.186 0.38 0.217 0.142
207 18126 0.798 0.022 0.248 0.259 0.48 0.44
208 18126 0.727 0.031 0.373 0.312 0.186 0.456
209 29d48 0.935 0.079 0.476 0.122 0.834 0.856
210 29d48 0.935 0.079 0.464 0.122 0.834 0.856
211 29d48 0.935 0.079 0.452 0.122 0.834 0.856
212 29d48 0.935 0.079 0.441 0.122 0.834 0.856
213 29d48 0.93 0.073 0.134 0.117 0.82 0.87
214 29d48 0.936 0.074 0.12 0.11 0.837 0.884
215 29d48 0.94 0.075 0.134 0.119 0.81 0.881
216 29d48 0.914 0.068 0.263 0.141 0.7 0.802
217 29d48 0.906 0.062 0.324 0.159 0.725 0.819
218 29d48 0.919 0.077 0.478 0.119 0.757 0.849
219 29d48 0.919 0.078 0.456 0.119 0.757 0.851
220 29d48 0.934 0.078 0.465 0.122 0.832 0.856
221 29d48 0.916 0.069 0.27 0.139 0.7 0.835
222 29d48 0.923 0.065 0.246 0.128 0.78 0.854
223 29d48 0.924 0.076 0.224 0.129 0.78 0.854
224 29d48 0.925 0.076 0.206 0.114 0.773 0.856
225 29d48 0.926 0.077 0.191 0.12 0.778 0.856
226 29d48 0.926 0.078 0.178 0.122 0.75 0.86
227 29d48 0.925 0.07 0.169 0.128 0.713 0.856
228 29d48 0.924 0.079 0.163 0.13 0.68 0.854
229 29d48 0.922 0.072 0.16 0.147 0.68 0.844
230 29d48 0.944 0.076 0.377 0.111 0.85 0.888
231 29d48 0.943 0.085 0.33 0.102 0.86 0.893
232 29d48 0.942 0.085 0.288 0.101 0.83 0.893
233 29d48 0.941 0.084 0.252 0.102 0.84 0.886
234 29d48 0.941 0.084 0.222 0.11 0.8 0.886
235 29d48 0.939 0.075 0.198 0.112 0.79 0.884
236 29d48 0.941 0.084 0.222 0.11 0.8 0.886
237 29d48 0.939 0.075 0.198 0.112 0.79 0.884
238 29d48 0.937 0.083 0.18 0.124 0.76 0.877
239 29d48 0.934 0.082 0.167 0.122 0.73 0.87
240 29d48 0.93 0.081 0.161 0.124 0.73 0.865
241 29d48 0.935 0.079 0.42 0.122 0.834 0.856
242 29d48 0.935 0.079 0.409 0.122 0.834 0.856
243 29d48 0.935 0.079 0.425 0.122 0.834 0.856
244 29d48 0.935 0.079 0.414 0.122 0.834 0.856
245 29d48 0.935 0.079 0.405 0.122 0.834 0.856
247 6fc49 0.532 0.0 0.189 0.469 0.158 0.107
248 804fe 0.931 0.076 0.124 0.126 0.725 0.849
249 804fe 0.926 0.075 0.127 0.136 0.74 0.826
252 18126 0.759 0.049 0.154 0.319 0.15 0.56
254 18126 0.795 0.061 0.24 0.265 0.11 0.702
257 804fe 0.944 0.078 0.095 0.141 0.815 0.854
258 804fe 0.937 0.078 0.098 0.139 0.817 0.854
259 804fe 0.944 0.079 0.095 0.14 0.82 0.854
260 804fe 0.944 0.079 0.095 0.141 0.815 0.851
261 b3cd9 0.902 0.012 0.163 0.102 0.533 0.831
262 18126 0.791 0.054 0.257 0.27 0.08 0.693
263 804fe 0.944 0.079 0.095 0.14 0.814 0.851
264 18126 0.742 0.045 0.279 0.282 0.05 0.56
265 18126 0.75 0.054 0.269 0.29 0.026 0.651
266 18126 0.824 0.036 0.201 0.19 0.336 0.558
267 18126 0.922 0.07 0.242 0.132 0.79 0.812
268 18126 0.903 0.069 0.256 0.142 0.62 0.795
269 18126 0.923 0.072 0.279 0.132 0.79 0.812
270 18126 0.921 0.073 0.212 0.137 0.785 0.854
271 18126 0.922 0.072 0.204 0.112 0.815 0.872
272 18126 0.904 0.056 0.284 0.148 0.773 0.781
273 18126 0.923 0.072 0.215 0.118 0.813 0.844
274 18126 0.932 0.065 0.218 0.114 0.836 0.879
275 18126 0.933 0.066 0.143 0.11 0.86 0.891
276 18126 0.91 0.062 0.175 0.151 0.648 0.835
278 18126 0.931 0.073 0.146 0.11 0.877 0.87
279 18126 0.935 0.077 0.145 0.098 0.88 0.886
280 18126 0.934 0.066 0.145 0.107 0.89 0.895
281 18126 0.907 0.066 0.161 0.151 0.68 0.788
282 18126 0.94 0.062 0.134 0.114 0.87 0.893
283 18126 0.937 0.079 0.135 0.108 0.86 0.881
284 18126 0.941 0.081 0.127 0.102 0.902 0.895
285 18126 0.937 0.08 0.132 0.104 0.867 0.874
286 18126 0.95 0.081 0.143 0.092 0.941 0.916
287 18126 0.948 0.081 0.134 0.089 0.93 0.912
288 18126 0.951 0.082 0.138 0.083 0.946 0.914
289 18126 0.94 0.081 0.142 0.098 0.914 0.905
290 18126 0.954 0.082 0.152 0.092 0.916 0.926
291 18126 0.951 0.082 0.143 0.098 0.908 0.905
292 18126 0.949 0.072 0.144 0.087 0.91 0.916
293 18126 0.95 0.082 0.147 0.096 0.91 0.893
294 18126 0.944 0.065 0.133 0.08 0.947 0.921
295 18126 0.951 0.082 0.162 0.116 0.882 0.884
296 29d48 0.852 0.033 0.537 0.19 0.64 0.551
297 29d48 0.885 0.033 0.536 0.152 0.822 0.633
302 87a8c 0.648 0.032 0.199 0.41 0.1 0.421
303 87a8c 0.67 0.033 0.327 0.37 0.064 0.465
304 87a8c 0.647 0.034 0.306 0.4 0.068 0.43
305 87a8c 0.647 0.034 0.424 0.401 0.07 0.426
307 18126 0.916 0.046 0.14 0.088 0.911 0.914
308 87a8c 0.647 0.034 0.424 0.401 0.07 0.426
309 18126 0.946 0.082 0.143 0.096 0.94 0.907
310 87a8c 0.647 0.034 0.424 0.401 0.07 0.426
311 18126 0.951 0.082 0.138 0.083 0.946 0.914
313 18126 0.942 0.08 0.152 0.102 0.884 0.891
314 18126 0.937 0.08 0.15 0.102 0.898 0.895
315 18126 0.926 0.078 0.162 0.119 0.84 0.874
316 18126 0.946 0.078 0.162 0.118 0.885 0.867
317 804fe 0.836 0.046 0.249 0.241 0.5 0.565
318 804fe 0.942 0.078 0.096 0.132 0.8 0.854
319 804fe 0.942 0.078 0.096 0.139 0.81 0.854
320 804fe 0.941 0.076 0.095 0.132 0.81 0.844
321 804fe 0.897 0.077 0.194 0.166 0.522 0.828
322 804fe 0.942 0.069 0.094 0.139 0.813 0.856
323 804fe 0.942 0.078 0.096 0.13 0.81 0.846
324 804fe 0.858 0.052 0.215 0.21 0.53 0.598
325 804fe 0.944 0.07 0.095 0.13 0.82 0.856

T2T Discriminators Pilot Round 3

Updated: 2025-04-10 16:03:12 +0000
SUBMISSIONID TEAM AUC AUC@FPR=0.1 BRIER EER TNR@FNR=0.1 TPR@FPR=0.1
327 9de37 0.974 0.09 0.313 0.078 0.961 0.938
328 18126 0.972 0.082 0.116 0.053 0.958 0.959
329 804fe 0.98 0.092 0.041 0.067 0.978 0.944
330 29d48 0.974 0.09 0.313 0.078 0.961 0.938
335 87a8c 0.801 0.04 0.15 0.256 0.32 0.608
336 6655b 0.978 0.09 0.039 0.044 1.0 0.963
337 87a8c 0.811 0.045 0.12 0.254 0.348 0.584
338 18126 0.968 0.083 0.096 0.054 0.956 0.958
339 18126 0.95 0.074 0.097 0.096 0.906 0.904
340 18126 0.97 0.082 0.111 0.067 0.958 0.951
341 18126 0.971 0.077 0.115 0.053 0.957 0.961
342 18126 0.97 0.084 0.098 0.052 0.961 0.963
343 18126 0.972 0.082 0.112 0.051 0.96 0.961
344 18126 0.97 0.081 0.116 0.055 0.952 0.957
345 18126 0.972 0.082 0.111 0.055 0.955 0.964
346 29d48 0.708 0.028 0.62 0.345 0.215 0.422
347 29d48 0.974 0.09 0.313 0.078 0.961 0.938
348 29d48 0.919 0.071 0.257 0.161 0.691 0.775
349 29d48 0.943 0.063 0.254 0.121 0.849 0.865
350 29d48 0.942 0.073 0.246 0.118 0.857 0.85
351 29d48 0.975 0.088 0.057 0.056 0.998 0.95
352 29d48 0.975 0.088 0.057 0.056 0.998 0.95
353 29d48 0.978 0.093 0.102 0.051 0.988 0.965
354 29d48 0.797 0.036 0.264 0.294 0.405 0.477
355 29d48 0.955 0.075 0.161 0.099 0.906 0.902
356 29d48 0.964 0.077 0.265 0.092 0.912 0.919
357 29d48 0.973 0.088 0.192 0.055 0.967 0.96
358 29d48 0.983 0.084 0.121 0.045 0.996 0.965
359 29d48 0.981 0.095 0.139 0.044 0.994 0.967
360 29d48 0.981 0.095 0.139 0.044 0.994 0.967
361 29d48 0.919 0.071 0.257 0.161 0.691 0.775
362 29d48 0.974 0.09 0.107 0.056 0.98 0.953
363 29d48 0.968 0.086 0.209 0.084 0.94 0.931
364 29d48 0.974 0.09 0.107 0.056 0.98 0.953
365 29d48 0.974 0.09 0.107 0.056 0.98 0.953
366 29d48 0.976 0.086 0.13 0.05 0.981 0.963
367 29d48 0.899 0.056 0.242 0.184 0.702 0.736
368 29d48 0.951 0.072 0.197 0.1 0.896 0.896
369 29d48 0.971 0.063 0.102 0.057 0.975 0.957
370 29d48 0.968 0.086 0.127 0.061 0.962 0.958
371 29d48 0.979 0.071 0.15 0.042 0.984 0.963
372 29d48 0.978 0.086 0.144 0.049 0.983 0.964
373 29d48 0.939 0.073 0.247 0.123 0.826 0.859
374 29d48 0.962 0.081 0.217 0.088 0.927 0.926
375 29d48 0.977 0.088 0.142 0.045 0.982 0.966
376 29d48 0.976 0.086 0.158 0.05 0.973 0.965
377 29d48 0.969 0.074 0.136 0.055 0.964 0.961
378 29d48 0.969 0.074 0.136 0.055 0.964 0.961
379 29d48 0.971 0.087 0.14 0.055 0.969 0.964
380 29d48 0.974 0.09 0.162 0.055 0.968 0.967
381 29d48 0.974 0.09 0.162 0.055 0.968 0.967
382 18126 0.971 0.083 0.102 0.062 0.954 0.962
383 18126 0.971 0.083 0.104 0.064 0.953 0.961
384 18126 0.97 0.078 0.098 0.054 0.956 0.963
385 18126 0.97 0.084 0.096 0.051 0.961 0.965
386 18126 0.971 0.082 0.114 0.056 0.956 0.962
387 18126 0.971 0.083 0.104 0.062 0.95 0.961
388 18126 0.975 0.085 0.098 0.053 0.961 0.959
389 18126 0.975 0.085 0.097 0.053 0.961 0.96
390 18126 0.97 0.08 0.096 0.062 0.956 0.96
391 18126 0.975 0.085 0.097 0.053 0.961 0.96
392 18126 0.97 0.085 0.1 0.055 0.961 0.957
393 18126 0.972 0.085 0.099 0.054 0.961 0.959
394 18126 0.976 0.086 0.089 0.051 0.96 0.961
395 18126 0.972 0.082 0.116 0.053 0.958 0.959
396 18126 0.976 0.086 0.089 0.051 0.96 0.961
397 18126 0.842 0.002 0.278 0.158 0.321 0.719
398 18126 0.969 0.078 0.098 0.056 0.954 0.959
399 18126 0.969 0.084 0.113 0.07 0.957 0.93
400 18126 0.972 0.082 0.111 0.061 0.954 0.961
401 18126 0.977 0.087 0.088 0.06 0.958 0.964
402 18126 0.965 0.081 0.122 0.057 0.956 0.953
403 18126 0.971 0.082 0.117 0.054 0.958 0.957
405 18126 0.971 0.082 0.117 0.054 0.958 0.957
407 18126 0.972 0.081 0.113 0.055 0.958 0.959
409 b3cd9 0.966 0.018 0.064 0.042 0.994 0.943
410 18126 0.968 0.081 0.155 0.09 0.911 0.932
411 18126 0.897 0.079 0.152 0.112 0.771 0.88
412 18126 0.784 0.049 0.564 0.299 0.301 0.541
413 18126 0.931 0.08 0.131 0.088 0.956 0.916
414 18126 0.964 0.082 0.12 0.06 0.957 0.951
415 18126 0.975 0.087 0.09 0.06 0.958 0.963
416 18126 0.97 0.082 0.131 0.072 0.934 0.951
417 804fe 0.95 0.081 0.202 0.114 0.883 0.886
418 804fe 0.7 0.027 0.243 0.345 0.117 0.458
419 804fe 0.701 0.034 0.24 0.335 0.111 0.444
420 804fe 0.98 0.092 0.046 0.072 0.989 0.936
421 804fe 0.98 0.092 0.046 0.067 0.983 0.94
422 18126 0.971 0.081 0.128 0.061 0.954 0.96
423 18126 0.976 0.086 0.09 0.061 0.958 0.962
424 18126 0.971 0.082 0.121 0.057 0.95 0.965
425 18126 0.943 0.084 0.116 0.076 0.946 0.931
426 18126 0.5 0.0 0.112 0.483 0.072 0.127
427 18126 0.978 0.087 0.084 0.055 0.96 0.967
428 18126 0.948 0.056 0.074 0.074 0.92 0.96
429 18126 0.977 0.087 0.088 0.06 0.958 0.964
430 18126 0.948 0.056 0.074 0.074 0.92 0.96
431 6655b 0.978 0.09 0.039 0.044 1.0 0.963
435 6655b 0.638 0.026 0.507 0.412 0.139 0.349
436 6655b 0.632 0.022 0.512 0.412 0.14 0.306
437 6655b 0.639 0.026 0.522 0.412 0.139 0.373
438 6655b 0.632 0.022 0.542 0.412 0.139 0.345
439 6655b 0.785 0.072 0.892 0.226 0.0 0.769
440 18126 0.967 0.082 0.117 0.054 0.958 0.957
441 6655b 0.528 0.005 0.412 0.478 0.12 0.089
442 18126 0.966 0.082 0.119 0.056 0.957 0.955
443 18126 0.973 0.082 0.113 0.058 0.959 0.962
444 18126 0.96 0.069 0.079 0.111 0.886 0.791
445 18126 0.979 0.088 0.082 0.054 0.96 0.969
446 18126 0.979 0.088 0.098 0.054 0.96 0.969
447 804fe 0.98 0.092 0.046 0.067 0.978 0.936
448 18126 0.971 0.086 0.081 0.057 0.964 0.96
449 804fe 0.921 0.078 0.093 0.127 0.731 0.852
450 18126 0.971 0.085 0.077 0.06 0.971 0.951
451 18126 0.973 0.086 0.093 0.055 0.963 0.969
452 18126 0.962 0.08 0.104 0.088 0.92 0.915
453 804fe 0.921 0.0 0.058 0.079 0.9 0.898
454 18126 0.973 0.085 0.103 0.055 0.967 0.965
455 804fe 0.98 0.092 0.046 0.072 0.989 0.936
456 804fe 0.981 0.092 0.047 0.066 0.98 0.947
457 804fe 0.928 0.042 0.059 0.072 0.915 0.945
458 804fe 0.981 0.087 0.049 0.066 0.983 0.952
459 804fe 0.982 0.092 0.047 0.066 0.978 0.952
460 804fe 0.982 0.092 0.047 0.066 0.978 0.952
461 18126 0.971 0.086 0.083 0.062 0.97 0.947
462 18126 0.98 0.091 0.055 0.046 0.972 0.972
463 18126 0.91 0.075 0.141 0.157 0.691 0.806
464 18126 0.982 0.091 0.059 0.039 0.973 0.976
465 18126 0.975 0.06 0.1 0.054 0.97 0.967
466 18126 0.981 0.064 0.068 0.049 0.98 0.973
467 18126 0.974 0.087 0.085 0.057 0.961 0.967
468 18126 0.982 0.091 0.058 0.051 0.974 0.975
469 18126 0.973 0.081 0.09 0.05 0.964 0.968
470 18126 0.974 0.077 0.087 0.044 0.967 0.971
471 18126 0.973 0.08 0.099 0.051 0.967 0.967
473 804fe 0.981 0.092 0.047 0.061 0.972 0.952
474 18126 0.903 0.075 0.11 0.164 0.634 0.797
475 804fe 0.982 0.092 0.046 0.061 0.978 0.948
476 804fe 0.981 0.092 0.046 0.067 0.978 0.947
477 804fe 0.981 0.092 0.046 0.07 0.978 0.949
478 804fe 0.982 0.092 0.046 0.067 0.978 0.949
479 804fe 0.982 0.092 0.046 0.062 0.979 0.945
480 804fe 0.98 0.092 0.048 0.067 0.983 0.948
481 804fe 0.98 0.092 0.048 0.067 0.983 0.948
482 804fe 0.981 0.092 0.046 0.067 0.976 0.949
483 804fe 0.981 0.092 0.046 0.067 0.972 0.949
484 804fe 0.982 0.082 0.046 0.066 0.978 0.949
485 804fe 0.982 0.092 0.046 0.066 0.983 0.953
486 804fe 0.981 0.092 0.046 0.067 0.978 0.949
487 804fe 0.983 0.087 0.047 0.062 0.983 0.953
488 804fe 0.979 0.09 0.06 0.066 0.956 0.95
489 804fe 0.982 0.087 0.049 0.062 0.983 0.951
490 804fe 0.982 0.092 0.048 0.06 0.983 0.953
491 804fe 0.983 0.087 0.047 0.062 0.983 0.953
492 18126 0.31 0.002 0.23 0.644 0.007 0.05
493 18126 0.976 0.091 0.064 0.048 0.972 0.97
494 18126 0.969 0.09 0.07 0.051 0.97 0.963
495 18126 0.973 0.083 0.046 0.052 0.966 0.957
496 18126 0.961 0.075 0.068 0.072 0.954 0.942
497 18126 0.974 0.086 0.102 0.055 0.968 0.969
498 18126 0.982 0.091 0.067 0.045 0.974 0.976
499 18126 0.977 0.091 0.062 0.04 0.972 0.973
500 18126 0.978 0.091 0.062 0.04 0.972 0.972
501 18126 0.982 0.091 0.072 0.045 0.974 0.979
502 18126 0.982 0.091 0.059 0.039 0.973 0.976
503 18126 0.98 0.091 0.061 0.04 0.972 0.974
504 18126 0.982 0.091 0.059 0.039 0.973 0.977
505 18126 0.982 0.091 0.059 0.039 0.973 0.977
506 18126 0.967 0.082 0.063 0.068 0.944 0.955
507 18126 0.982 0.091 0.059 0.039 0.973 0.976
508 18126 0.983 0.091 0.058 0.038 0.973 0.977
509 18126 0.983 0.091 0.058 0.038 0.973 0.977
510 18126 0.982 0.091 0.059 0.039 0.973 0.977
511 18126 0.983 0.091 0.058 0.038 0.973 0.977
512 18126 0.983 0.092 0.057 0.038 0.973 0.977
513 18126 0.981 0.091 0.06 0.039 0.972 0.975
514 18126 0.983 0.092 0.057 0.037 0.974 0.978
515 18126 0.984 0.092 0.056 0.037 0.974 0.979
516 18126 0.983 0.092 0.057 0.038 0.974 0.977
517 18126 0.984 0.092 0.055 0.037 0.974 0.979

T2T Generators Overview

The primary goal of the GenAI pilot is to understand system behavior detecting AI-generated vs human-generated content.

The T2T-G task for the generative AI models is: given a topic and a set of about 25 relevant documents, create from the documents a brief, well-organized, fluent summary that answers the need for information expressed in the topic statement. Participants should assume that the target audience of the summary is a supervisory information analyst who needs the summary to inform decision-making.

  • All processing of documents and generation of summaries must be automatic.
  • The summary can be no longer than 250 words (whitespace-delimited tokens).
  • Summaries over the size limit will be truncated.
  • No bonus will be given for creating a shorter summary.
  • No formatting other than linear text is allowed.
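
The length rule above can be checked locally before submission. The following is a minimal sketch, assuming that "whitespace-delimited tokens" corresponds to Python's default `str.split()` behavior; the function names `word_count` and `truncate_summary` are hypothetical and are not part of any NIST tooling.

```python
def word_count(text: str) -> int:
    """Count whitespace-delimited tokens in a summary."""
    return len(text.split())


def truncate_summary(text: str, max_words: int = 250) -> str:
    """Keep at most max_words whitespace-delimited tokens.

    Tokens are re-joined with single spaces, so line breaks and
    repeated spaces are collapsed (an assumption about what
    "linear" formatting means).
    """
    tokens = text.split()
    return " ".join(tokens[:max_words])


draft = "word " * 300
print(word_count(truncate_summary(draft)))  # 250
```

Truncating locally lets a team decide where the cut falls instead of leaving it to the evaluation side, which may cut a summary mid-sentence.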

There will be about 45 topics in the test data for generator teams. This set of summaries from all generator teams will serve as the testing data for discriminator teams, who will work on detecting whether the written content is human-generated or AI-generated.

The summary output will be evaluated by determining how easy or difficult it is to discriminate AI-generated summaries from human-generated summaries, i.e., the goal of generators is to output a summary that is indistinguishable from human-generated summaries.  

For more information and details about the task specifics for generator teams, please refer to the generator data specification.

Data Generation Instructions

NIST human assessors developed topics of interest. Each assessor created a topic and chose a set of 25 relevant documents. The testing dataset documents will come from a corpus comprising a set of newswire articles. NIST will distribute a subset of topics and relevant documents.

Only T2T generator participants who have completed and submitted all required data agreement forms will be allowed access. As the example below shows, each topic includes an id (num), a title, and the required topic statement (narr). The “docs” tag lists the relevant source documents to be used when generating the required summaries. Please check the published schedule for testing data release dates.

Example of topic:
<topic>
  <num> topic_0001 </num>
  <title> North Medical Center  </title>
  
  <narr>
  Describe the activities of John Smith and the North Medical Center. 
  </narr>
  
  <docs>
  article_0000
  article_0001
  article_0002
  </docs>
</topic>
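
A topic in this layout could be read with Python's standard library. The sketch below assumes the distributed topic file is well-formed XML with one document id per whitespace-separated token inside `docs`, as in the example; the helper name `parse_topic` is hypothetical, and the real distribution may wrap several topics in one file.

```python
import xml.etree.ElementTree as ET

# The sample topic from above, reproduced so the sketch is self-contained.
SAMPLE_TOPIC = """<topic>
  <num> topic_0001 </num>
  <title> North Medical Center  </title>

  <narr>
  Describe the activities of John Smith and the North Medical Center.
  </narr>

  <docs>
  article_0000
  article_0001
  article_0002
  </docs>
</topic>"""


def parse_topic(xml_text: str) -> dict:
    """Return the topic fields as a dict with surrounding whitespace removed."""
    root = ET.fromstring(xml_text)

    def field(tag: str) -> str:
        node = root.find(tag)
        return (node.text or "").strip() if node is not None else ""

    return {
        "num": field("num"),
        "title": field("title"),
        # Collapse internal line breaks in the topic statement.
        "narr": " ".join(field("narr").split()),
        # One document id per whitespace-separated token.
        "docs": field("docs").split(),
    }


topic = parse_topic(SAMPLE_TOPIC)
print(topic["num"], len(topic["docs"]))
```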

Submission Guidelines

  • Each team may submit up to 5 runs for a data generation package. Each run should include one summary per topic.
  • Each run should contain summaries for all topics (a run cannot skip a topic or submit summaries for only a subset of the topics). Please refer to the generator data specification for summary generation instructions.
  • Summary content should be free from offensive text or inappropriate remarks. NIST has the right to exclude any summary or whole runs if the content proves to be inappropriate for the general public.
  • Each run should include high-level metadata to characterize the generator system, as required by the run format and DTD file below. As explained in the DTD file, teams need to provide some required information/parameters, such as:
    • trainingData: Name of training dataset or collection of different datasets or source data
    • teamName: The name of the team as registered on the NIST GenAI website
    • priority: The priority of the submitted run (the lower the number, the higher the priority). For any required manual review of submissions, NIST may need to limit effort to only the highest priority runs.
    • trained: A boolean (T or F) to indicate if the run was the output of a trained system by the team specifically for this task (T) or the output of an already existing system that the team used to generate the outputs (F)
    • desc: A high-level description of the system that generated this run
    • link: A link to the model used to generate the run (e.g., GitHub)
    • topic: The topic id (the “num” field in the topic XML file)
    • elapsedTime: The processing time of the model (with hardware specs) to generate the summary after the topic and documents were given to it.
Example of a run:
<!DOCTYPE GeneratorResults SYSTEM "GeneratorResult.dtd"> 
<GeneratorResults teamName="ExampleTeam">
  <GeneratorRunResult trainingData="OpenAI" version="1.0"
      priority="1" trained="T" 
      desc="Short description about generation approach." 
      link="https://hyperlink_to_document_source (if available)">

    <GeneratorTopicResult topic="topic_0001" elapsedTime="5">
    this is a 250-word summary of topic_0001
    </GeneratorTopicResult>

    <GeneratorTopicResult topic="topic_0002" elapsedTime="5">
    this is a 250-word summary of topic_0002
    </GeneratorTopicResult>

    <!-- ... -->
    <GeneratorTopicResult topic="topic_0003" elapsedTime="5">
    this is a 250-word summary of topic_0003
    </GeneratorTopicResult>

  </GeneratorRunResult>
</GeneratorResults>
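
A run file shaped like the example above could be assembled programmatically. This is a minimal sketch, assuming the element and attribute names shown in the example are the ones the DTD expects; `build_run_xml` and its arguments are hypothetical, and the DTD file remains the authority on which attributes are required.

```python
import xml.etree.ElementTree as ET

DOCTYPE = '<!DOCTYPE GeneratorResults SYSTEM "GeneratorResult.dtd">'


def build_run_xml(team_name: str, run_meta: dict, summaries: dict) -> str:
    """Serialize one run.

    run_meta maps attribute names (e.g. trainingData, priority) to strings.
    summaries maps topic id -> (summary text, elapsed time).
    """
    root = ET.Element("GeneratorResults", teamName=team_name)
    run = ET.SubElement(root, "GeneratorRunResult", **run_meta)
    for topic_id, (summary, elapsed) in summaries.items():
        node = ET.SubElement(
            run, "GeneratorTopicResult",
            topic=topic_id, elapsedTime=str(elapsed),
        )
        # ElementTree escapes characters such as & and < on output.
        node.text = summary
    # ElementTree does not emit a DOCTYPE, so it is prepended by hand.
    return DOCTYPE + "\n" + ET.tostring(root, encoding="unicode")


xml_text = build_run_xml(
    "ExampleTeam",
    {
        "trainingData": "OpenAI", "version": "1.0", "priority": "1",
        "trained": "T", "desc": "Short description about generation approach.",
        "link": "https://example.org/model",  # placeholder URL
    },
    {"topic_0001": ("Summary text for topic_0001, costs & benefits.", 5)},
)
print(xml_text)
```

Building the file through an XML library rather than string concatenation avoids malformed output when a summary contains reserved characters.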

Generator Data Submission Validation

  • Prior to the submission dates, NIST will provide participants with a validator script that checks the format of their output XML files as well as task-specific content (e.g., topic ids, empty required attributes). All generator teams should validate their runs before submitting them to NIST.
  • Submission notes: the submission page (form) will be open on the GenAI website according to the published schedule. Please follow the schedule and submit on time, as extending the submission dates may not be possible.
  • Upon submission, NIST will validate the data outputs uploaded and report any errors to the submitter.
  • Please note that submitting your data outputs indicates your agreement to the Rules of Behavior.
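
Before the official validator is available, a team could run a rough local check of its own. The sketch below is not the NIST validator and does not reproduce its rules; it only assumes the element names from the example run and tests three things stated in the guidelines: every topic has a summary, no summary is empty, and no summary exceeds 250 whitespace-delimited tokens. The function name `check_run` is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_run(xml_body: str, expected_topics: list, max_words: int = 250) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    root = ET.fromstring(xml_body)
    if not root.get("teamName"):
        problems.append("missing teamName")
    for run in root.iter("GeneratorRunResult"):
        seen = []
        for result in run.iter("GeneratorTopicResult"):
            topic = result.get("topic", "")
            seen.append(topic)
            words = (result.text or "").split()
            if not words:
                problems.append(f"{topic}: empty summary")
            elif len(words) > max_words:
                problems.append(f"{topic}: {len(words)} words exceeds {max_words}")
        # A run may not skip a topic or cover only a subset.
        for topic in sorted(set(expected_topics) - set(seen)):
            problems.append(f"{topic}: no summary in run")
        for topic in sorted(set(seen) - set(expected_topics)):
            problems.append(f"{topic}: unknown topic id")
    return problems


sample = (
    '<GeneratorResults teamName="ExampleTeam"><GeneratorRunResult priority="1">'
    '<GeneratorTopicResult topic="topic_0001" elapsedTime="5">a short summary'
    "</GeneratorTopicResult></GeneratorRunResult></GeneratorResults>"
)
print(check_run(sample, ["topic_0001", "topic_0002"]))
```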

Note:

  • AUC_DETECTOR_1 & AUC_DETECTOR_2 scores close to 0.5 represent better performance.
  • Higher BRIER_DETECTOR_1 & BRIER_DETECTOR_2 scores represent better performance.
Detector_1 and Detector_2 are baseline detectors, not D-participants. Baseline detectors are not included in Round 2 D-submissions.
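
To illustrate why these directions favor the generator, the sketch below computes both metrics from detector outputs using their standard textbook definitions. It assumes a label of 1 for AI-generated and 0 for human-written text, and detector scores that estimate the probability of "AI-generated"; the exact NIST scoring procedure is not described here, so the function names and the toy data are illustrative only.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random AI-generated item receives a
    higher score than a random human-written item (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier(labels, scores):
    """Mean squared difference between predicted probability and true label."""
    return sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels)


# Toy data: two AI-generated summaries (1) and two human summaries (0).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(auc(labels, scores))    # 0.75
print(brier(labels, scores))
```

An AUC of 0.5 means the detector ranks AI-generated and human-written summaries no better than chance, and a higher Brier score means its probability estimates are further from the truth. Both outcomes indicate summaries that are harder to tell apart from human writing.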




T2T Generators Pilot Round 1

Updated: 2025-04-10 18:32:24 +0000
SUBMISSIONID TEAM AUC_DETECTOR_1 AUC_DETECTOR_2 BRIER_DETECTOR_1 BRIER_DETECTOR_2
108 87a8c 0.768 0.682 0.271 0.183
109 87a8c 0.722 0.663 0.268 0.187
110 87a8c 0.808 0.772 0.249 0.168
111 87a8c 0.495 0.953 0.324 0.095
112 87a8c 0.84 0.965 0.248 0.083
113 87a8c 0.612 0.945 0.288 0.074
114 aa872 0.682 1.0 0.284 0.071
115 aa872 0.813 1.0 0.244 0.131
116 aa872 0.948 1.0 0.21 0.085
148 9de37 0.856 0.993 0.226 0.039
22 6fc49 0.75 1.0 0.257 0.109
23 804fe 0.325 1.0 0.376 0.132
24 804fe 0.325 1.0 0.376 0.132
29 0782f 0.872 0.99 0.229 0.149
30 0782f 0.79 0.988 0.264 0.15
31 0782f 0.862 0.995 0.246 0.162
32 0782f 0.88 0.99 0.227 0.156
33 0782f 0.92 0.992 0.226 0.151
53 804fe 0.69 0.915 0.292 0.19
55 804fe 0.69 0.915 0.292 0.19
58 0dea0 0.368 0.962 0.343 0.164
65 87a8c 0.905 0.918 0.226 0.107
66 87a8c 0.812 0.903 0.245 0.105
67 87a8c 0.035 1.0 0.409 0.07
69 87a8c 0.835 0.895 0.247 0.135
70 87a8c 0.735 0.928 0.265 0.107
75 87a8c 0.482 0.86 0.336 0.127
80 804fe 0.618 0.355 0.29 0.166
90 6fc49 0.942 1.0 0.21 0.046
95 804fe 0.335 0.965 0.378 0.06

T2T Generators Pilot Round 2

Updated: 2025-04-10 18:32:24 +0000
SUBMISSIONID TEAM AUC_DETECTOR_1 AUC_DETECTOR_2 BRIER_DETECTOR_1 BRIER_DETECTOR_2
146 9de37 0.75 0.989 0.257 0.134
147 9de37 0.75 0.989 0.257 0.134
154 804fe 0.562 0.961 0.336 0.228
162 804fe 0.562 0.961 0.336 0.228
186 6fc49 0.858 1.0 0.2 0.0
187 87a8c 0.123 1.0 0.468 0.111
188 87a8c 0.128 1.0 0.469 0.119
189 87a8c 0.12 1.0 0.471 0.11
190 87a8c 0.124 1.0 0.463 0.112
191 87a8c 0.99 0.996 0.166 0.089
192 87a8c 0.949 1.0 0.181 0.011
193 87a8c 0.976 1.0 0.171 0.032
194 87a8c 0.982 1.0 0.166 0.031
195 87a8c 0.993 1.0 0.166 0.04
196 87a8c 0.995 1.0 0.166 0.025
198 87a8c 0.999 0.998 0.165 0.046
199 87a8c 0.991 0.993 0.166 0.088
200 87a8c 0.995 0.998 0.166 0.066
201 87a8c 1.0 0.997 0.165 0.058
202 87a8c 0.942 0.995 0.179 0.094
206 6fc49 0.744 0.952 0.26 0.213
306 87a8c 0.938 0.999 0.179 0.044
312 87a8c 0.869 0.99 0.206 0.061