Multilingual BERT Bengali Named Entity Recognition
mBERT-Bengali-NER is a Transformer-based named entity recognition model for Bengali, built by fine-tuning the bert-base-multilingual-uncased model on the Wikiann dataset.
How to Use
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load the tokenizer and the fine-tuned token-classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained("sagorsarker/mbert-bengali-ner")
model = AutoModelForTokenClassification.from_pretrained("sagorsarker/mbert-bengali-ner")

# Build an NER pipeline; grouped_entities=True merges sub-word pieces into whole entities
nlp = pipeline("ner", model=model, tokenizer=tokenizer, grouped_entities=True)

example = "আমি জাহিদ এবং আমি ঢাকায় বাস করি।"
ner_results = nlp(example)
print(ner_results)
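The pipeline returns a list of dictionaries, one per detected entity. With grouped entities enabled, each dictionary carries an entity_group label from the mapping below, a confidence score, and the matched text span. A minimal sketch of reading the results (the field names follow the standard transformers NER pipeline output; the example entity shown in the comment is illustrative, not an actual model prediction):

# Each item looks roughly like:
# {"entity_group": "PER", "score": 0.99, "word": "জাহিদ", "start": 4, "end": 9}
for entity in ner_results:
    print(f'{entity["word"]}: {entity["entity_group"]} ({entity["score"]:.3f})')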
Label ID Mapping
 
  
   
| Label ID | Label |
|----------|-------|
| 0        | O     |
| 1        | B-PER |
| 2        | I-PER |
| 3        | B-ORG |
| 4        | I-ORG |
| 5        | B-LOC |
| 6        | I-LOC |
   
  
 
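If you run the model directly instead of through the pipeline, the output logits are indexed by these label IDs. Below is a minimal sketch of mapping argmax predictions back to tag strings, reusing the tokenizer, model, and example from the code above. The id2label dictionary is written out by hand from the table as an assumption; the names stored in the model's own config may differ.

import torch

# Hand-written mapping taken from the table above (assumed to match the model head's ordering)
id2label = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-ORG", 4: "I-ORG", 5: "B-LOC", 6: "I-LOC"}

inputs = tokenizer(example, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits           # shape: (1, seq_len, num_labels)
pred_ids = logits.argmax(dim=-1)[0].tolist()  # best label ID for each token
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, pred_ids):
    print(token, id2label[pred])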
Training Details
Evaluation Results
 
  
   
| Model             | F1      | Precision | Recall  | Accuracy | Loss    |
|-------------------|---------|-----------|---------|----------|---------|
| mBert-Bengali-NER | 0.97105 | 0.96769   | 0.97443 | 0.97682  | 0.12511 |