Model:

avichr/heBERT_NER

HeBERT: Pre-trained BERT for Polarity Analysis and Emotion Recognition

HeBERT is a Hebrew pretrained language model. It is based on Google's BERT architecture and uses the BERT-Base configuration.
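
A quick way to confirm the BERT-Base configuration (12 layers, hidden size 768, 12 attention heads) is to inspect the checkpoint's config; a minimal sketch using transformers:

    from transformers import AutoConfig

    # Fetches only config.json, not the model weights.
    config = AutoConfig.from_pretrained("avichr/heBERT_NER")

    # BERT-Base values: 12 layers, hidden size 768, 12 attention heads.
    print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)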

HeBERT was trained on three datasets:

  • A Hebrew version of OSCAR: ~9.8 GB of data, including 1 billion words and over 20.8 million sentences.
  • A Hebrew dump of Wikipedia: ~650 MB of data, including over 63 million words and 3.8 million sentences.
  • Emotion User Generated Content (UGC) data that was collected for the purpose of this study (described below).
Named-entity recognition (NER)

    The ability of the model to classify named entities in text, such as person names, organizations, and locations; tested on a labeled dataset from Ben Mordecai and M. Elhadad (2005), and evaluated with F1-score.
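
    For reference, entity-level F1 over BIO-tagged sequences can be computed with the seqeval library; the card does not say which tooling the authors used, so seqeval here is an illustrative assumption:

        from seqeval.metrics import f1_score

        # Entity-level F1: an entity counts as correct only if its full span
        # and type match the gold annotation.
        y_true = [['B-PER', 'O', 'B-LOC', 'I-LOC']]
        y_pred = [['B-PER', 'O', 'B-LOC', 'O']]
        print(f1_score(y_true, y_pred))  # 0.5: one of the two entities matched exactly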

    How to use

        from transformers import pipeline
        
        # Token-classification pipeline for Hebrew NER
        NER = pipeline(
            "token-classification",
            model="avichr/heBERT_NER",
            tokenizer="avichr/heBERT_NER",
        )
        NER('דויד לומד באוניברסיטה העברית שבירושלים')
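
    The pipeline returns one dictionary per sub-word token, with fields such as entity, score, word, and start/end character offsets. With a recent transformers release, sub-word pieces can also be merged into whole entity spans via the aggregation_strategy parameter; a short sketch:

        from transformers import pipeline

        # Group sub-word pieces into whole entity spans (requires a
        # transformers release that supports aggregation_strategy).
        NER_grouped = pipeline(
            "token-classification",
            model="avichr/heBERT_NER",
            tokenizer="avichr/heBERT_NER",
            aggregation_strategy="simple",
        )
        for entity in NER_grouped('דויד לומד באוניברסיטה העברית שבירושלים'):
            print(entity["entity_group"], entity["word"], round(entity["score"], 3))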
    

    Other tasks

    • Emotion Recognition Model. An online model can be found at huggingface spaces or as a colab notebook.
    • Sentiment Analysis.
    • masked-LM model (can be fine-tuned to any downstream task).
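
    As an illustration, the sister checkpoints can be driven through the same pipeline API. A minimal sketch, assuming the avichr/heBERT_sentiment_analysis and avichr/heBERT model IDs published by the same project:

        from transformers import pipeline

        # Sentiment analysis head (model ID assumed from the HeBERT project page).
        sentiment = pipeline(
            "sentiment-analysis",
            model="avichr/heBERT_sentiment_analysis",
            tokenizer="avichr/heBERT_sentiment_analysis",
        )
        print(sentiment('אני אוהב את המודל הזה'))

        # Masked-LM base model: fill in the [MASK] token.
        fill_mask = pipeline(
            "fill-mask",
            model="avichr/heBERT",
            tokenizer="avichr/heBERT",
        )
        print(fill_mask('הילד הלך ל[MASK]'))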

    Contact us

    Avichay Chriqui
    Inbal Yahav
    The Coller Semitic Languages AI Lab

    Thank you, תודה, شكرا

    If you use this model, please cite us as:

    Chriqui, A., & Yahav, I. (2021). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. arXiv preprint arXiv:2102.01909.

    @article{chriqui2021hebert,
      title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
      author={Chriqui, Avihay and Yahav, Inbal},
      journal={arXiv preprint arXiv:2102.01909},
      year={2021}
    }
    
