Model:
philschmid/t5-11b-sharded
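This is a fork of t5-11b that implements a custom handler.py as an example of how to use t5-11b with inference-endpoints on a single NVIDIA T4.
Hugging Face Inference Endpoints can be used with an HTTP client in any language. We will use Python and the requests library to send our requests (make sure it is installed: pip install requests).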
import json
import requests as r
ENDPOINT_URL = ""  # URL of your endpoint
HF_TOKEN = ""  # your Hugging Face access token
# payload samples
regular_payload = { "inputs": "translate English to German: The weather is nice today." }
parameter_payload = {
    "inputs": "translate English to German: Hello my name is Philipp and I am a Technical Leader at Hugging Face",
    "parameters": {
        "max_length": 40,
    },
}
# HTTP headers for authorization
headers = {
    "Authorization": f"Bearer {HF_TOKEN}",
    "Content-Type": "application/json",
}
# send request
response = r.post(ENDPOINT_URL, headers=headers, json=parameter_payload)
generated_text = response.json()
print(generated_text)
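
Since the endpoint accepts plain HTTP, the same request can also be sent without the requests library. The following is a minimal sketch using only the Python standard library, not part of the original example. The helper names build_request and query and the 60-second timeout are assumptions chosen for illustration; the headers and payload shape mirror the snippet above.

```python
import json
import urllib.request


def build_request(endpoint_url: str, hf_token: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying the JSON payload and a bearer token.

    Hypothetical helper for illustration; it only constructs the request
    object and does not contact the endpoint.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Authorization": f"Bearer {hf_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query(endpoint_url: str, hf_token: str, payload: dict, timeout: float = 60.0):
    """Send the payload and return the decoded JSON response.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses, so a failed
    call surfaces as an exception instead of an unparsed error body.
    The timeout value is an assumption, not a documented requirement.
    """
    request = build_request(endpoint_url, hf_token, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With ENDPOINT_URL and HF_TOKEN filled in as above, calling query(ENDPOINT_URL, HF_TOKEN, parameter_payload) should return the same parsed JSON that response.json() gives in the requests version.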