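Dataset:
mstz/adult
The Adult dataset from the UCI ML repository, derived from census data. The dataset contains personal features of individuals and whether their income exceeds a threshold.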
| Configuration | Task | Description |
|---|---|---|
| encoding | | Encoding dictionary showing original values of encoded features. |
| income | Binary classification | Classify the person's income as over or under the threshold. |
| income-no race | Binary classification | As income, but the race feature is removed. |
| race | Multiclass classification | Predict the race of the individual. |
from datasets import load_dataset
dataset = load_dataset("mstz/adult", "income")["train"]
The target feature changes according to the selected configuration and is always in the last position of the dataset.
| Feature | Type | Description |
|---|---|---|
| age | [int64] | Age of the person. |
| capital_gain | [float64] | Capital gained by the person. |
| capital_loss | [float64] | Capital lost by the person. |
| education | [int8] | Education level: the higher, the more educated the person. |
| final_weight | [int64] | |
| hours_worked_per_week | [int64] | Hours worked per week. |
| marital_status | [string] | Marital status of the person. |
| native_country | [string] | Native country of the person. |
| occupation | [string] | Job of the person. |
| race | [string] | Race of the person. |
| relationship | [string] | |
| is_male | [bool] | True if the person is a man, False if a woman. |
| workclass | [string] | Type of job of the person. |
| over_threshold | [int8] | 1 for income >= 50k$, 0 otherwise. |
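
Because the target feature is always the last column, features and target can be separated by position rather than by name. The following is a minimal sketch under that assumption only; `split_features_target` is a hypothetical helper, not part of the `datasets` library, and the toy rows are illustrative values, not records from the dataset.

```python
from typing import Any, Dict, List, Sequence, Tuple


def split_features_target(
    column_names: Sequence[str],
    rows: Sequence[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Any]]:
    """Split rows into (features, targets), treating the last column as the target."""
    target_name = column_names[-1]
    feature_names = list(column_names[:-1])
    # Keep every column except the last one as a feature.
    features = [{name: row[name] for name in feature_names} for row in rows]
    # Collect the last column as the target.
    targets = [row[target_name] for row in rows]
    return features, targets


# Illustrative toy rows (made-up values) shaped like the "income" configuration.
toy_columns = ["age", "is_male", "over_threshold"]
toy_rows = [
    {"age": 39, "is_male": True, "over_threshold": 0},
    {"age": 52, "is_male": False, "over_threshold": 1},
]

X, y = split_features_target(toy_columns, toy_rows)
print(y)  # [0, 1]
```

With the real data, the same helper should work as `split_features_target(dataset.column_names, dataset)`, since iterating over a `datasets.Dataset` yields one dictionary per row.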