Dataset Card for GPT4All-J Prompt Generations
 
 
  Dataset Description
 
 
  Dataset used to train GPT4All-J and GPT4All-J-LoRA.
 
 
  We release several versions of the dataset:
 
 
  - v1.0: The original dataset we used to fine-tune GPT-J.
  
 
  - v1.1-breezy: A filtered dataset from which we removed all instances containing "AI language model".
  
 
  - v1.2-jazzy: A filtered dataset from which we also removed instances like "I'm sorry, I can't answer..." in addition to "AI language model".
  
 
  - v1.3-groovy: The v1.2 dataset with ShareGPT and Dolly added, and with ~8% of semantic duplicates removed using Atlas.
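
  The phrase-based filtering described for v1.1-breezy and v1.2-jazzy can be illustrated with a minimal sketch. This is not the filter actually used to build those versions: the `response` field name, the marker tuples, the case-insensitive matching, and the helper names `keep` and `filter_examples` are all assumptions made for illustration.

```python
# Hypothetical marker phrases; the card names only these two strings.
MARKERS_BREEZY = ("AI language model",)
MARKERS_JAZZY = ("AI language model", "I'm sorry, I can't answer")


def keep(example, markers):
    """Return True if the example's response contains none of the markers.

    Assumes each example is a dict with a "response" key (an assumption,
    not confirmed by the card). Matching is case-insensitive.
    """
    lowered = example.get("response", "").lower()
    return not any(marker.lower() in lowered for marker in markers)


def filter_examples(examples, markers):
    """Keep only the examples whose responses are free of every marker."""
    return [example for example in examples if keep(example, markers)]


rows = [
    {"prompt": "a", "response": "As an AI language model, I cannot do that."},
    {"prompt": "b", "response": "I'm sorry, I can't answer that question."},
    {"prompt": "c", "response": "Paris is the capital of France."},
]
breezy_like = filter_examples(rows, MARKERS_BREEZY)
jazzy_like = filter_examples(rows, MARKERS_JAZZY)
```

  With a loaded `datasets.Dataset`, the same predicate could be passed to `Dataset.filter`, for example `ds.filter(lambda ex: keep(ex, MARKERS_JAZZY))`.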
   
  
 
 
 
  The dataset defaults to main, which is v1.0. To download a specific version, pass its name as the revision keyword argument to load_dataset:
 
from datasets import load_dataset

# Load the v1.2-jazzy version instead of the default (main, i.e. v1.0)
jazzy = load_dataset("nomic-ai/gpt4all-j-prompt-generations", revision="v1.2-jazzy")
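
If you switch between versions often, a small wrapper can guard against typos in the revision name. The sketch below is only an illustration: `REVISIONS`, `resolve_revision`, and `load_version` are hypothetical names, and mapping v1.0 to `main` follows the statement above that the default branch is v1.0 rather than any confirmed tag name.

```python
# Version names listed in this card, mapped to the revision passed to the Hub.
# Assumption: v1.0 is reached through "main", as the card states.
REVISIONS = {
    "v1.0": "main",
    "v1.1-breezy": "v1.1-breezy",
    "v1.2-jazzy": "v1.2-jazzy",
    "v1.3-groovy": "v1.3-groovy",
}


def resolve_revision(version):
    """Translate a version name from the card into a revision string."""
    try:
        return REVISIONS[version]
    except KeyError:
        known = ", ".join(sorted(REVISIONS))
        raise ValueError(f"Unknown version {version!r}; expected one of: {known}")


def load_version(version="v1.0"):
    """Load one version of the dataset (needs the `datasets` package and network access)."""
    # Imported lazily so resolve_revision can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(
        "nomic-ai/gpt4all-j-prompt-generations",
        revision=resolve_revision(version),
    )
```

Calling `load_version("v1.3-groovy")` would then download that version, while a misspelled name fails early with a `ValueError` before any network request is made.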