Model: theblackcat102/pythia-1b-deduped-sft

This model card aims to be a base template for new models. It was generated using the raw model-card template.
Users (both direct users and downstream applications) should be made aware of the model's risks, biases, and limitations. More information is needed for further recommendations.
Use the code below to get started with the model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "theblackcat102/pythia-1b-deduped-sft"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).half().eval().cuda()

# Prompts use the <human>...<bot> markers from training
# (see dialogue_collator.py, line 36, in the training repo).
input_text = "<human>What's the earth population?<bot>"
inputs = tokenizer(input_text, return_tensors="pt").to(0)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,   # example value; tune to your use case
    do_sample=True,
    top_k=50,             # example value
    temperature=0.7,      # example value
    pad_token_id=tokenizer.eos_token_id,
)
output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(output)
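
For multi-turn use, a small helper along these lines may be convenient. The sketch below is hypothetical: build_prompt and the exact turn concatenation are assumptions, so check dialogue_collator.py for the authoritative format.

# Hypothetical helper: assembles a multi-turn prompt in the <human>/<bot>
# style shown above. The exact layout is an assumption, not taken from the
# training code.
def build_prompt(turns):
    # turns: list of (speaker, text) pairs, speaker is "human" or "bot"
    prompt = "".join(f"<{speaker}>{text}" for speaker, text in turns)
    return prompt + "<bot>"  # end with <bot> so the model answers as the assistant

prompt = build_prompt([
    ("human", "What's the earth population?"),
    ("bot", "Roughly 8 billion people."),
    ("human", "How fast is it growing?"),
])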
The model was trained for 1000 iterations. Training was launched with:

deepspeed trainer_sft.py --configs defaults pythia-1b --deepspeed

The configuration below shows the defaults section followed by the pythia-1b section passed to --configs:
defaults:
  learning_rate: 1e-5
  gradient_checkpointing: false
  gradient_accumulation_steps: 32
  per_device_train_batch_size: 2
  per_device_eval_batch_size: 2
  weight_decay: 0.00
  warmup_steps: 600
  eval_steps: 250
  save_steps: 250
  max_length: 512
  num_train_epochs: 2
  logging_steps: 10
  max_grad_norm: 2.0
  save_total_limit: 4
  fp16: true
  eval_accumulation_steps:
  freeze_layer:
  datasets:
    - gsm8k_hard
    - webgpt
    - squad_v2
    - adversarial_qa
    - private_tuning
    - oa_translated
    - prosocial_dialogue
    - math_qa
    - wikihow
    - joke
    - gsm8k
    - ted_trans_en-hi
    - ted_trans_de-ja
    - ted_trans_nl-en
    - ted_trans_en-ja
    - ted_trans_en-es
    - ted_trans_en-ms
    - xsum:
        fraction: 0.5
    - cnn_dailymail:
        fraction: 0.5
    - multi_news:
        fraction: 0.5
    - tldr_news:
        fraction: 0.5
    - scitldr:
        fraction: 0.5
    - samsum:
        fraction: 0.5
    - debate_sum:
        fraction: 0.5
    - billsum:
        fraction: 0.5
    - wmt2019_zh-en:
        fraction: 0.9
    - wmt2019_ru-en:
        fraction: 0.9
    - wmt2019_de-en:
        fraction: 0.9
    - wmt2019_fr-de:
        fraction: 0.9
    - essay_instruction
    - reddit_eli5
    - reddit_askh
    - reddit_asks
  cache_dir: /fsx/home-theblackcat02/.cache
  loss_fn: CrossEntropyLoss
  eval_size:
  log_dir: "base"
  quantization: false
  seq2seqmodel: false
  poly_eps: 1.0
  fuse_gelu: true
  log_wandb: true
  samples_mixing: true  # uses a collator that mixes samples in the batch, so one sequence may contain multiple tasks
  verbose: false

pythia-1b:
  learning_rate: 5e-6
  model_name: EleutherAI/pythia-1b-deduped
  weight_decay: 0.01
  max_length: 540
  fp16: true
  warmup_steps: 1000
  gradient_accumulation_steps: 20
  per_device_train_batch_size: 20
  per_device_eval_batch_size: 2
  eval_steps: 500
  save_steps: 500
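
Since --configs names two sections, the trainer presumably merges them in order, with pythia-1b overriding overlapping keys in defaults (so the effective learning rate is 5e-6, not 1e-5). A minimal sketch of that merge, assuming the sections above are saved in a configs.yaml file; effective_config and the file name are illustrative, not the repo's actual loader:

import yaml

# Sketch of how "--configs defaults pythia-1b" could resolve to one effective
# config: later sections override earlier ones. This is an assumption about
# trainer_sft.py's behaviour, not its actual implementation.
def effective_config(all_sections, names):
    merged = {}
    for name in names:
        merged.update(all_sections[name])
    return merged

with open("configs.yaml") as f:  # hypothetical path to the YAML above
    sections = yaml.safe_load(f)

cfg = effective_config(sections, ["defaults", "pythia-1b"])
print(cfg["learning_rate"])  # 5e-6 (the pythia-1b value overrides defaults)
# Effective sequences per optimizer step per GPU:
# per_device_train_batch_size (20) * gradient_accumulation_steps (20) = 400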
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
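
As a rough illustration of that approach (emissions = energy used x grid carbon intensity), the numbers below are hypothetical placeholders, not measured values for this run:

# Back-of-the-envelope CO2 estimate in the spirit of Lacoste et al. (2019).
# All inputs are hypothetical placeholders, not measured values for this model.
gpu_power_kw = 0.4      # assumed average draw per GPU, kW
gpu_hours = 8           # assumed total GPU-hours for the 1000-iteration run
pue = 1.1               # assumed data-centre power usage effectiveness
carbon_intensity = 0.4  # assumed grid intensity, kg CO2eq per kWh

kwh = gpu_power_kw * gpu_hours * pue
print(f"~{kwh * carbon_intensity:.2f} kg CO2eq")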