Dataset:
blinoff/kinopoisk
Kinopoisk movie reviews dataset (TOP250 & BOTTOM100 rank lists).
It contains 36,591 reviews in total, dating from July 2004 to November 2012,
with the following distribution along the 3-point sentiment scale:
Each sample contains the following fields:
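import pandas as pd
# Load the JSON Lines file (one review per line) into a DataFrame.
df = pd.read_json('kinopoisk.jsonl', lines=True)
# Show five randomly chosen reviews.
df.sample(5)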
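
The distribution along the sentiment scale can be recomputed from the file itself. The following is a minimal sketch using only the standard library; the helper `label_distribution` is hypothetical, and the name of the sentiment field is an assumption that should be checked against the actual records before use.

```python
import json
from collections import Counter


def label_distribution(path, field):
    """Return {label: (count, share)} for `field` in a JSON Lines file.

    `field` is the name of the sentiment column; its actual name is an
    assumption and must be taken from the data.
    """
    counts = Counter()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Records without the field are counted under the key None.
            counts[record.get(field)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: (n, n / total) for label, n in counts.items()}
```

For example, `label_distribution('kinopoisk.jsonl', 'grade3')` would return the count and share of each label, assuming the sentiment field is called `grade3`.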
@article{blinov2013research,
title={Research of lexical approach and machine learning methods for sentiment analysis},
author={Blinov, PD and Klekovkina, Maria and Kotelnikov, Eugeny and Pestov, Oleg},
journal={Computational Linguistics and Intellectual Technologies},
volume={2},
number={12},
pages={48--58},
year={2013}
}