Dataset Card for "trivia_qa"
 
 
  Dataset Summary
 
 
  TriviaQA is a reading comprehension dataset containing over 650K
question-answer-evidence triples. TriviaQA includes 95K question-answer
pairs authored by trivia enthusiasts and independently gathered evidence
documents, six per question on average, that provide high-quality distant
supervision for answering the questions.
 
 
  Supported Tasks and Leaderboards
 
 
  
   More Information Needed
  
 
 
  Languages
 
 
  English.
 
 
  Dataset Structure
 
 
  Data Instances
 
 rc
 
  - Size of downloaded dataset files: 2.67 GB
  - Size of the generated dataset: 16.02 GB
  - Total amount of disk used: 18.68 GB
  
  An example of 'train' looks as follows.
 
 rc.nocontext
 
  - Size of downloaded dataset files: 2.67 GB
  - Size of the generated dataset: 126.27 MB
  - Total amount of disk used: 2.79 GB
  
  An example of 'train' looks as follows.
 
 unfiltered
 
  - Size of downloaded dataset files: 3.30 GB
  - Size of the generated dataset: 29.24 GB
  - Total amount of disk used: 32.54 GB
  
  An example of 'validation' looks as follows.
 
 unfiltered.nocontext
 
  - Size of downloaded dataset files: 632.55 MB
  - Size of the generated dataset: 74.56 MB
  - Total amount of disk used: 707.11 MB
  
  An example of 'train' looks as follows.
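
  The sizes listed for the four configurations suggest that the total disk usage is the download size plus the generated size. The following is a minimal Python sketch that checks this relationship using only the numbers in this card and picks the configurations that fit a given disk budget. The names `CONFIG_SIZES_MB`, `total_mb`, and `configs_within_budget` are illustrative and not part of any library, and treating 1 GB as 1024 MB is an assumption about how the card's figures were rounded.

```python
# Sizes copied from this card. Assumption: 1 GB = 1024 MB.
GB = 1024.0

CONFIG_SIZES_MB = {
    # config name: (downloaded, generated, total as listed in the card)
    "rc": (2.67 * GB, 16.02 * GB, 18.68 * GB),
    "rc.nocontext": (2.67 * GB, 126.27, 2.79 * GB),
    "unfiltered": (3.30 * GB, 29.24 * GB, 32.54 * GB),
    "unfiltered.nocontext": (632.55, 74.56, 707.11),
}


def total_mb(config):
    """Return download size plus generated size for a configuration, in MB."""
    downloaded, generated, _listed_total = CONFIG_SIZES_MB[config]
    return downloaded + generated


def configs_within_budget(budget_gb):
    """Return the configuration names whose computed total fits the budget."""
    return [name for name in CONFIG_SIZES_MB if total_mb(name) <= budget_gb * GB]


if __name__ == "__main__":
    for name, (_, _, listed) in CONFIG_SIZES_MB.items():
        print(f"{name}: computed {total_mb(name) / GB:.2f} GB, "
              f"listed {listed / GB:.2f} GB")
    print(configs_within_budget(5))
```

  The computed totals agree with the listed ones up to rounding. Assuming the Hugging Face `datasets` library is installed, a configuration would typically be loaded with something like `load_dataset("trivia_qa", "rc.nocontext", split="train")`; the dataset identifier is taken from this card's title and may differ on the Hub.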
 
 
  Data Fields
 
 
  The data fields are the same among all splits.
 
 rc
 
  - question: a string feature.
  - question_id: a string feature.
  - question_source: a string feature.
  - entity_pages: a dictionary feature containing:
    - doc_source: a string feature.
    - filename: a string feature.
    - title: a string feature.
    - wiki_context: a string feature.
  - search_results: a dictionary feature containing:
    - description: a string feature.
    - filename: a string feature.
    - rank: an int32 feature.
    - title: a string feature.
    - url: a string feature.
    - search_context: a string feature.
  - aliases: a list of string features.
  - normalized_aliases: a list of string features.
  - matched_wiki_entity_name: a string feature.
  - normalized_matched_wiki_entity_name: a string feature.
  - normalized_value: a string feature.
  - type: a string feature.
  - value: a string feature.
  
 rc.nocontext
 
  - question: a string feature.
  - question_id: a string feature.
  - question_source: a string feature.
  - entity_pages: a dictionary feature containing:
    - doc_source: a string feature.
    - filename: a string feature.
    - title: a string feature.
    - wiki_context: a string feature.
  - search_results: a dictionary feature containing:
    - description: a string feature.
    - filename: a string feature.
    - rank: an int32 feature.
    - title: a string feature.
    - url: a string feature.
    - search_context: a string feature.
  - aliases: a list of string features.
  - normalized_aliases: a list of string features.
  - matched_wiki_entity_name: a string feature.
  - normalized_matched_wiki_entity_name: a string feature.
  - normalized_value: a string feature.
  - type: a string feature.
  - value: a string feature.
  
 unfiltered
 
  - question: a string feature.
  - question_id: a string feature.
  - question_source: a string feature.
  - entity_pages: a dictionary feature containing:
    - doc_source: a string feature.
    - filename: a string feature.
    - title: a string feature.
    - wiki_context: a string feature.
  - search_results: a dictionary feature containing:
    - description: a string feature.
    - filename: a string feature.
    - rank: an int32 feature.
    - title: a string feature.
    - url: a string feature.
    - search_context: a string feature.
  - aliases: a list of string features.
  - normalized_aliases: a list of string features.
  - matched_wiki_entity_name: a string feature.
  - normalized_matched_wiki_entity_name: a string feature.
  - normalized_value: a string feature.
  - type: a string feature.
  - value: a string feature.
  
 unfiltered.nocontext
 
  - question: a string feature.
  - question_id: a string feature.
  - question_source: a string feature.
  - entity_pages: a dictionary feature containing:
    - doc_source: a string feature.
    - filename: a string feature.
    - title: a string feature.
    - wiki_context: a string feature.
  - search_results: a dictionary feature containing:
    - description: a string feature.
    - filename: a string feature.
    - rank: an int32 feature.
    - title: a string feature.
    - url: a string feature.
    - search_context: a string feature.
  - aliases: a list of string features.
  - normalized_aliases: a list of string features.
  - matched_wiki_entity_name: a string feature.
  - normalized_matched_wiki_entity_name: a string feature.
  - normalized_value: a string feature.
  - type: a string feature.
  - value: a string feature.
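
  Since the field list is the same for all four configurations, one helper can be reused across them. The following is a minimal Python sketch of how the answer fields might be used to score a prediction. It assumes the flat field layout listed in this card; the loaded dataset may nest or batch these fields differently, so it is worth inspecting the actual features first. The record values are invented for illustration, and `normalize`, `missing_fields`, and `is_correct` are hypothetical helpers, not the official TriviaQA evaluation code.

```python
# Field names copied from the list in this card (top-level fields only).
EXPECTED_FIELDS = [
    "question", "question_id", "question_source",
    "entity_pages", "search_results",
    "aliases", "normalized_aliases",
    "matched_wiki_entity_name", "normalized_matched_wiki_entity_name",
    "normalized_value", "type", "value",
]


def normalize(text):
    """Hypothetical normalizer: lowercase and collapse whitespace.

    The official normalization may differ (this is an assumption).
    """
    return " ".join(text.lower().split())


def missing_fields(record):
    """Return the expected top-level fields that are absent from a record."""
    return [name for name in EXPECTED_FIELDS if name not in record]


def is_correct(prediction, record):
    """Return True if the normalized prediction matches a normalized alias."""
    return normalize(prediction) in set(record["normalized_aliases"])


# Invented record that follows the flat layout listed in the card.
# Evidence sub-fields are left empty, as one might expect without context.
example = {
    "question": "Which planet is known as the Red Planet?",
    "question_id": "example_0",
    "question_source": "http://example.com/",
    "entity_pages": {"doc_source": [], "filename": [], "title": [],
                     "wiki_context": []},
    "search_results": {"description": [], "filename": [], "rank": [],
                       "title": [], "url": [], "search_context": []},
    "aliases": ["Mars", "The Red Planet"],
    "normalized_aliases": ["mars", "the red planet"],
    "matched_wiki_entity_name": "Mars",
    "normalized_matched_wiki_entity_name": "mars",
    "normalized_value": "mars",
    "type": "WikipediaEntity",
    "value": "Mars",
}

if __name__ == "__main__":
    print(missing_fields(example))
    print(is_correct("  MARS ", example))
    print(is_correct("Venus", example))
```

  Matching against `normalized_aliases` rather than `value` alone lets a prediction count as correct when it uses any accepted surface form of the answer.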
  
  Data Splits
 
 
  | name                 | train  | validation | test  |
  |----------------------|--------|------------|-------|
  | rc                   | 138384 | 18669      | 17210 |
  | rc.nocontext         | 138384 | 18669      | 17210 |
  | unfiltered           | 87622  | 11313      | 10832 |
  | unfiltered.nocontext | 87622  | 11313      | 10832 |
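
  The split sizes in the table can be turned into totals and proportions with a few lines of Python. This is a small sketch that uses only the counts listed in this card; `SPLIT_SIZES`, `total_examples`, and `split_fractions` are illustrative names, not part of any library.

```python
# Example counts copied from the Data Splits table in this card.
SPLIT_SIZES = {
    "rc": {"train": 138384, "validation": 18669, "test": 17210},
    "rc.nocontext": {"train": 138384, "validation": 18669, "test": 17210},
    "unfiltered": {"train": 87622, "validation": 11313, "test": 10832},
    "unfiltered.nocontext": {"train": 87622, "validation": 11313, "test": 10832},
}


def total_examples(config):
    """Return the number of examples across all splits of a configuration."""
    return sum(SPLIT_SIZES[config].values())


def split_fractions(config):
    """Return each split's share of the configuration's examples."""
    total = total_examples(config)
    return {split: count / total for split, count in SPLIT_SIZES[config].items()}


if __name__ == "__main__":
    for config in SPLIT_SIZES:
        shares = ", ".join(f"{split} {share:.1%}"
                           for split, share in split_fractions(config).items())
        print(f"{config}: {total_examples(config)} examples ({shares})")
```

  By these counts, the training split holds roughly four fifths of the examples in every configuration.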
  
 
 
  Dataset Creation
 
 
  Curation Rationale
 
 
  
   More Information Needed
  
 
 
  Source Data
 
 Initial Data Collection and Normalization
 
  
   More Information Needed
  
 
 Who are the source language producers?
 
  
   More Information Needed
  
 
 
  Annotations
 
 Annotation process
 
  
   More Information Needed
  
 
 Who are the annotators?
 
  
   More Information Needed
  
 
 
  Personal and Sensitive Information
 
 
  
   More Information Needed
  
 
 
  Considerations for Using the Data
 
 
  Social Impact of Dataset
 
 
  
   More Information Needed
  
 
 
  Discussion of Biases
 
 
  
   More Information Needed
  
 
 
  Other Known Limitations
 
 
  
   More Information Needed
  
 
 
  Additional Information
 
 
  Dataset Curators
 
 
  
   More Information Needed
  
 
 
  Licensing Information
 
 
  The University of Washington does not own the copyright of the questions and documents included in TriviaQA.
 
 
  Citation Information
 
 
@article{2017arXivtriviaqa,
       author = {{Joshi}, Mandar and {Choi}, Eunsol and {Weld},
                 Daniel and {Zettlemoyer}, Luke},
        title = "{TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension}",
      journal = {arXiv e-prints},
         year = 2017,
          eid = {arXiv:1705.03551},
        pages = {arXiv:1705.03551},
archivePrefix = {arXiv},
       eprint = {1705.03551},
}
 
  Contributions
 
 
  Thanks to @thomwolf, @patrickvonplaten, and @lewtun for adding this dataset.