数据集:
OATML-Markslab/ProteinGym
预印本库:
arxiv:2205.13760ProteinGym is an extensive set of Deep Mutational Scanning (DMS) assays curated to enable thorough comparisons of various mutation effect predictors indifferent regimes. It is comprised of two benchmarks: 1) a substitution benchmark which consists of the experimental characterisation of ∼1.5M missense variants across 87 DMS assays 2) an indel benchmark that includes ∼300k mutants across 7 DMS assays.
Each processed file in each benchmark corresponds to a single DMS assay, and contains the following three variables:
Additionally, we provide two reference files (ProteinGym_reference_file_substitutions.csv and ProteinGym_reference_file_indels.csv) that give further details on each assay and contain in particular:
If you use ProteinGym in your work, please cite the following paper:
Notin, P., Dias, M., Frazer, J., Marchena-Hurtado, J., Gomez, A., Marks, D.S., Gal, Y. (2022). Tranception: Protein Fitness Prediction with Autoregressive Transformers and Inference-time Retrieval. ICML.