Dataset:
DISCOX/DISCO-10K-random
License:
You can download the dataset using the Hugging Face datasets library:
from datasets import load_dataset
ds = load_dataset("DISCOX/DISCO-10K-random")
The dataset contains 10,000 samples drawn at random from the DISCO-10M dataset.
The dataset contains the following features:
{
'video_url_youtube',
'video_title_youtube',
'track_name_spotify',
'video_duration_youtube_sec',
'preview_url_spotify',
'video_view_count_youtube',
'video_thumbnail_url_youtube',
'search_query_youtube',
'video_description_youtube',
'track_id_spotify',
'album_id_spotify',
'artist_id_spotify',
'track_duration_spotify_ms',
'primary_artist_name_spotify',
'track_release_date_spotify',
'explicit_content_spotify',
'similarity_duration',
'similarity_query_video_title',
'similarity_query_description',
'similarity_audio',
'audio_embedding_spotify',
'audio_embedding_youtube',
}
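As an illustration of how these features might be combined, the sketch below keeps only rows whose YouTube video and Spotify track look like the same recording. It rests on assumptions that the card does not confirm: that `video_duration_youtube_sec` is in seconds and `track_duration_spotify_ms` is in milliseconds (inferred from the column names), and that `similarity_audio` is a numeric score where higher means more similar. The helper names, the thresholds, and the example rows are hypothetical and not taken from the dataset.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def duration_gap_sec(row: Row) -> float:
    """Absolute difference between the YouTube and Spotify durations, in seconds."""
    youtube_sec = float(row["video_duration_youtube_sec"])
    # Assumption: the "_ms" suffix means milliseconds, so convert to seconds.
    spotify_sec = float(row["track_duration_spotify_ms"]) / 1000.0
    return abs(youtube_sec - spotify_sec)


def select_close_matches(
    rows: Iterable[Row],
    min_audio_similarity: float,
    max_gap_sec: float,
) -> List[Row]:
    """Keep rows with a high enough audio similarity and a small duration gap."""
    return [
        row
        for row in rows
        # Assumption: a higher similarity_audio value means a closer match.
        if float(row["similarity_audio"]) >= min_audio_similarity
        and duration_gap_sec(row) <= max_gap_sec
    ]


# Hypothetical rows that only mimic the column names listed above.
example_rows = [
    {"track_name_spotify": "Track A", "video_duration_youtube_sec": 200.0,
     "track_duration_spotify_ms": 198500, "similarity_audio": 0.91},
    {"track_name_spotify": "Track B", "video_duration_youtube_sec": 600.0,
     "track_duration_spotify_ms": 210000, "similarity_audio": 0.95},
    {"track_name_spotify": "Track C", "video_duration_youtube_sec": 181.0,
     "track_duration_spotify_ms": 180000, "similarity_audio": 0.40},
]

# Illustrative thresholds only; suitable values depend on the real score range.
matches = select_close_matches(example_rows, min_audio_similarity=0.8, max_gap_sec=5.0)
print([row["track_name_spotify"] for row in matches])
```

Because iterating over a loaded split yields one dictionary per row, the same function could in principle be applied to the downloaded data by passing a split of `ds` as `rows`; the split name would need to be checked on the loaded object first.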
More details about the dataset can be found here.