Dataset:

musabg/wikipedia-oscar-tr

Wikipedia and OSCAR Turkish Dataset

👋 Welcome to the "Wikipedia and OSCAR Turkish" Hugging Face repo!

📚 This repo contains a Turkish-language dataset created by merging Wikipedia with OSCAR, a cleaned Common Crawl corpus. The dataset contains over 13 million examples, each with a single feature: text.

🔍 This dataset can be useful for natural language processing tasks in Turkish, such as training language models.

📥 To download the dataset, you can use the Hugging Face Datasets library. Here's some sample code to get started:

from datasets import load_dataset

dataset = load_dataset("musabg/wikipedia-oscar-tr")
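With over 13 million examples, a full download can take a while, so it may be convenient to look at a few records first. The sketch below uses the streaming mode of the Datasets library for that. The helper names iter_texts and stream_turkish_corpus are hypothetical, and the split name "train" is an assumption that is not stated in this card.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_texts(examples: Iterable[dict], limit: int) -> Iterator[str]:
    """Yield the "text" field of at most `limit` examples."""
    for example in islice(examples, limit):
        yield example["text"]


def stream_turkish_corpus(limit: int = 3, split: str = "train") -> List[str]:
    """Return the first `limit` texts without downloading the full dataset.

    Assumption: the repo exposes a split named "train". If this raises an
    error, check which splits the dataset actually provides.
    """
    # Imported lazily so the helper above can be used without the library.
    from datasets import load_dataset

    # streaming=True returns an iterable that fetches examples on demand.
    streamed = load_dataset(
        "musabg/wikipedia-oscar-tr", split=split, streaming=True
    )
    return list(iter_texts(streamed, limit))
```

Calling stream_turkish_corpus(limit=3) should then return the first three documents as plain strings, assuming network access and that the split name is correct.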

🤖 Have fun exploring this dataset and training language models on it!