Dataset:
zeroshot/twitter-financial-news-topic
Read this BLOG to see how I fine-tuned a sparse transformer on this dataset.
The Twitter Financial News dataset is an English-language, annotated corpus of finance-related tweets. It is used to classify such tweets by topic, with each tweet assigned one of the 20 labels below.
topics = {
"LABEL_0": "Analyst Update",
"LABEL_1": "Fed | Central Banks",
"LABEL_2": "Company | Product News",
"LABEL_3": "Treasuries | Corporate Debt",
"LABEL_4": "Dividend",
"LABEL_5": "Earnings",
"LABEL_6": "Energy | Oil",
"LABEL_7": "Financials",
"LABEL_8": "Currencies",
"LABEL_9": "General News | Opinion",
"LABEL_10": "Gold | Metals | Materials",
"LABEL_11": "IPO",
"LABEL_12": "Legal | Regulation",
"LABEL_13": "M&A | Investments",
"LABEL_14": "Macro",
"LABEL_15": "Markets",
"LABEL_16": "Politics",
"LABEL_17": "Personnel Change",
"LABEL_18": "Stock Commentary",
"LABEL_19": "Stock Movement",
}
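
The mapping above can be wrapped in a small lookup helper. This is a minimal sketch: the `label_to_topic` function and the `id2label` dictionary are illustrative names, not part of the dataset or any library, and it assumes that integer class ids 0 to 19 line up with the `LABEL_n` keys.

```python
# Label mapping copied from the card: classifier label -> topic name.
topics = {
    "LABEL_0": "Analyst Update",
    "LABEL_1": "Fed | Central Banks",
    "LABEL_2": "Company | Product News",
    "LABEL_3": "Treasuries | Corporate Debt",
    "LABEL_4": "Dividend",
    "LABEL_5": "Earnings",
    "LABEL_6": "Energy | Oil",
    "LABEL_7": "Financials",
    "LABEL_8": "Currencies",
    "LABEL_9": "General News | Opinion",
    "LABEL_10": "Gold | Metals | Materials",
    "LABEL_11": "IPO",
    "LABEL_12": "Legal | Regulation",
    "LABEL_13": "M&A | Investments",
    "LABEL_14": "Macro",
    "LABEL_15": "Markets",
    "LABEL_16": "Politics",
    "LABEL_17": "Personnel Change",
    "LABEL_18": "Stock Commentary",
    "LABEL_19": "Stock Movement",
}


def label_to_topic(label):
    """Return the topic name for an integer class id or a 'LABEL_n' string.

    Hypothetical helper; assumes integer id n corresponds to 'LABEL_n'.
    """
    key = label if isinstance(label, str) else f"LABEL_{int(label)}"
    return topics[key]


# Integer-indexed view of the same mapping (assumed id -> name alignment).
id2label = {int(key.split("_")[1]): name for key, name in topics.items()}

print(label_to_topic(5))
print(label_to_topic("LABEL_19"))
```

An unknown label raises `KeyError`, which is usually preferable to silently returning a wrong topic.
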
The data was collected using the Twitter API. The dataset currently supports multi-class classification.
It has two splits, train and validation, with the following sizes:
| Dataset Split | Number of Instances in Split |
|---|---|
| Train | 16,990 |
| Validation | 4,118 |
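
As a quick sanity check on the table above, the split proportions can be computed directly from the two counts. The sketch below is illustrative only. The optional `load_splits` function is a hypothetical helper that assumes the Hugging Face `datasets` package is installed, that network access is available, and that the Hub copy exposes the same `train` and `validation` split names. It is not called here.

```python
# Split sizes as stated in the table above.
split_sizes = {"train": 16_990, "validation": 4_118}

total = sum(split_sizes.values())
fractions = {name: count / total for name, count in split_sizes.items()}

for name, count in split_sizes.items():
    print(f"{name}: {count} examples ({fractions[name]:.1%} of {total})")


def load_splits():
    """Hypothetical helper: download the dataset and return actual split sizes.

    Assumes `pip install datasets` and network access; imported lazily so the
    arithmetic above runs without the dependency.
    """
    from datasets import load_dataset

    dataset = load_dataset("zeroshot/twitter-financial-news-topic")
    return {name: split.num_rows for name, split in dataset.items()}
```

Comparing the result of `load_splits()` with `split_sizes` is one way to confirm that a downloaded copy matches the statistics reported here.
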
The Twitter Financial Dataset (topic) version 1.0.0 is released under the MIT License.