INTRODUCTION:

dbNSFP is a database developed for functional prediction and annotation of all potential non-synonymous single-nucleotide variants (nsSNVs) in the human genome. Its current version is based on Gencode release 29 / Ensembl version 94 and includes a total of 84,013,490 nsSNVs and ssSNVs (splicing-site SNVs). It compiles prediction scores from 37 prediction algorithms (SIFT, SIFT4G, Polyphen2-HDIV, Polyphen2-HVAR, LRT, MutationTaster2, MutationAssessor, FATHMM, MetaSVM, MetaLR, CADD, CADD_hg19, VEST4, PROVEAN, FATHMM-MKL coding, FATHMM-XF coding, fitCons x 4, LINSIGHT, DANN, GenoCanyon, Eigen, Eigen-PC, M-CAP, REVEL, MutPred, MVP, MPC, PrimateAI, DEOGEN2, BayesDel_addAF, BayesDel_noAF, ClinPred, LIST-S2, ALoFT), 9 conservation scores (PhyloP x 3, phastCons x 3, GERP++, SiPhy and bStatistic) and other related information including allele frequencies observed in the 1000 Genomes Project phase 3 data, UK10K cohorts data, ExAC consortium data, gnomAD data and the NHLBI Exome Sequencing Project ESP6500 data, various gene IDs from different databases, functional descriptions of genes, gene expression and gene interaction information, etc.

Some dbNSFP contents (possibly not up to date) can also be accessed through variant tools, ANNOVAR, KGGSeq, VarSome, UCSC Genome Browser's Variant Annotation Integrator, Ensembl Variant Effect Predictor, SnpSift and HGMD. Please cite our papers (see below) if you used dbNSFP contents through those tools. Please note that some component scores/contents of dbNSFP have specific requirements or licences for non-academic usage. dbNSFP does not grant non-academic usage of those scores/contents, so please contact the original score/content providers for that purpose.

Please join our Email group for news and updates from dbNSFP.

For whole genome annotation, we recommend our whole genome annotation pipeline WGSA, in which dbNSFP is a component resource.

We thank Dr. CS (Jonathan) Liu from Softgenetics for providing hosting space.

We welcome developers of functional prediction methods to provide their predictions and scores to the database. Please contact Dr. Xiaoming Liu (xmliu.uth{at}gmail.com).

DATABASE:

Select a database version below to be directed to its query page:




CITATION:

1. Liu X, Jian X, and Boerwinkle E. 2011. dbNSFP: a lightweight database of human non-synonymous SNPs and their functional predictions. Human Mutation. 32:894-899.

2. Liu X, Jian X, and Boerwinkle E. 2013. dbNSFP v2.0: A Database of Human Non-synonymous SNVs and Their Functional Predictions and Annotations. Human Mutation. 34:E2393-E2402.

3. Liu X, Wu C, Li C and Boerwinkle E. 2016. dbNSFP v3.0: A One-Stop Database of Functional Predictions and Annotations for Human Non-synonymous and Splice Site SNVs. Human Mutation. 37:235-241.

4. Liu X, Li C, Mou C, Dong Y, and Tu Y. 2020. dbNSFP v4: a comprehensive database of transcript-specific functional predictions and annotations for human nonsynonymous and splice-site SNVs. Genome Medicine. 12:103.

If you used dbNSFP v1.x, please cite our paper 1. If you used dbNSFP v2.x, please cite our papers 1 & 2. If you used dbNSFP v3.x, please cite our papers 1 & 3. If you used dbNSFP v4.x, please cite our papers 1 & 4.

If you used our ensemble scores (MetaSVM and MetaLR), which are based on 10 component scores (SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR, GERP++, MutationTaster, Mutation Assessor, FATHMM, LRT, SiPhy, PhyloP) and the maximum frequency observed in the 1000 genomes populations, please cite:

1. Dong C, Wei P, Jian X, Gibbs R, Boerwinkle E, Wang K* and Liu X*. (2015) Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies. Human Molecular Genetics 24(8):2125-2137. *corresponding authors [preprint]

VERSIONS:

UPDATE (June 16, 2020): dbNSFP v4.1 is released. BayesDel (https://doi.org/10.1002/humu.23158), ClinPred (https://doi.org/10.1016/j.ajhg.2018.08.005) and LIST-S2 (https://doi.org/10.1093/nar/gkaa288) scores have been added. CADD has been updated to v1.6, and a CADD score based on the hg19 model has been added. gnomAD genomes have been updated to r3.0: populations AMI (Amish) and SAS (South Asian) have been added; controls have been removed. ClinVar and GTEx have been updated. HPO terms have been added to dbNSFP_gene. The search_dbNSFP programs now support searching SpliceAI (https://doi.org/10.1016/j.cell.2018.12.015) as an attached database; please refer to the readme files of the search_dbNSFP programs for details.

Two branches of dbNSFP are provided: dbNSFP4.1a suitable for academic use, which includes all the resources, and dbNSFP4.1c suitable for commercial use, which does not include Polyphen2, VEST, REVEL, ClinPred, CADD, LINSIGHT, and GenoCanyon.

dbNSFP4.1a can be downloaded from softgenetics ftp or googledrive. The md5sum of the zip file can be found here. A README file is here. dbNSFP4.1c can be downloaded from softgenetics ftp or googledrive. The md5sum of the zip file can be found here. A README file is here.
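As an illustration of how a downloaded zip file could be checked against the published md5sum, here is a minimal Python sketch. The function names and the chunk size are assumptions made for this example, and the published value is assumed to be in the usual md5sum output form (digest followed by a file name).

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use low for multi-gigabyte zip files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published):
    """Compare a local file against a published md5sum line.

    Assumption: `published` is either a bare digest or a line of the form
    "<hex digest>  <file name>"; only the first field is used.
    """
    expected = published.split()[0].lower()
    return md5_of_file(path) == expected
```

If the comparison returns False, the download is probably incomplete or corrupted and should be repeated.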

Columns added (dbNSFP_variant): BayesDel_addAF_score, BayesDel_addAF_rankscore, BayesDel_addAF_pred, BayesDel_noAF_score, BayesDel_noAF_rankscore, BayesDel_noAF_pred, LIST-S2_score, LIST-S2_rankscore, LIST-S2_pred, CADD_raw_hg19, CADD_raw_rankscore_hg19, CADD_phred_hg19, gnomAD_genomes_AMI_AC, gnomAD_genomes_AMI_AN, gnomAD_genomes_AMI_AF, gnomAD_genomes_AMI_nhomalt, gnomAD_genomes_SAS_AC, gnomAD_genomes_SAS_AN, gnomAD_genomes_SAS_AF, gnomAD_genomes_SAS_nhomalt

Column name changes (dbNSFP_variant): GTEx_V7_gene changed to GTEx_V8_gene, GTEx_V7_tissue changed to GTEx_V8_tissue

Columns deleted (dbNSFP_variant): gnomAD_genomes_controls_AC, gnomAD_genomes_controls_AN, gnomAD_genomes_controls_AF, gnomAD_genomes_controls_nhomalt, gnomAD_genomes_controls_AFR_AC, gnomAD_genomes_controls_AFR_AN, gnomAD_genomes_controls_AFR_AF, gnomAD_genomes_controls_AFR_nhomalt, gnomAD_genomes_controls_AMR_AC, gnomAD_genomes_controls_AMR_AN, gnomAD_genomes_controls_AMR_AF, gnomAD_genomes_controls_AMR_nhomalt, gnomAD_genomes_controls_ASJ_AC, gnomAD_genomes_controls_ASJ_AN, gnomAD_genomes_controls_ASJ_AF, gnomAD_genomes_controls_ASJ_nhomalt, gnomAD_genomes_controls_EAS_AC, gnomAD_genomes_controls_EAS_AN, gnomAD_genomes_controls_EAS_AF, gnomAD_genomes_controls_EAS_nhomalt, gnomAD_genomes_controls_FIN_AC, gnomAD_genomes_controls_FIN_AN, gnomAD_genomes_controls_FIN_AF, gnomAD_genomes_controls_FIN_nhomalt, gnomAD_genomes_controls_NFE_AC, gnomAD_genomes_controls_NFE_AN, gnomAD_genomes_controls_NFE_AF, gnomAD_genomes_controls_NFE_nhomalt, gnomAD_genomes_controls_POPMAX_AC, gnomAD_genomes_controls_POPMAX_AN, gnomAD_genomes_controls_POPMAX_AF, gnomAD_genomes_controls_POPMAX_nhomalt

Columns added (dbNSFP_gene): HPO_id, HPO_name
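Scripts that select dbNSFP_variant columns by name under v4.0 may need adjusting for v4.1. The following Python sketch shows one possible way to do that using only the renames and deletions listed above. The function and constant names are hypothetical, and the prefix test relies on the observation that every column deleted in v4.1 begins with gnomAD_genomes_controls_.

```python
# Renames listed in the v4.1 notes for dbNSFP_variant.
RENAMED_IN_V41 = {
    "GTEx_V7_gene": "GTEx_V8_gene",
    "GTEx_V7_tissue": "GTEx_V8_tissue",
}

# Every column deleted in v4.1 shares this prefix (gnomAD genomes controls).
DELETED_PREFIX_V41 = "gnomAD_genomes_controls_"


def migrate_columns(wanted):
    """Translate a list of v4.0 column names to their v4.1 equivalents.

    Returns (kept, dropped): `kept` holds the v4.1 names in the original
    order, `dropped` holds the requested columns that no longer exist.
    """
    kept, dropped = [], []
    for name in wanted:
        if name.startswith(DELETED_PREFIX_V41):
            dropped.append(name)
        else:
            kept.append(RENAMED_IN_V41.get(name, name))
    return kept, dropped
```

For example, a request for GTEx_V7_gene and gnomAD_genomes_controls_AF would keep GTEx_V8_gene and report gnomAD_genomes_controls_AF as dropped.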

UPDATE (May 15, 2020): A minor bug is fixed in dbNSFP v4.0. In the previous release, the column Primate_AI_pred was not 100% correct. We thank Alex Kouris for reporting this issue. If you want to use Primate_AI_pred please download it again.

Two branches of dbNSFP are provided: dbNSFP4.0a suitable for academic use, which includes all the resources, and dbNSFP4.0c suitable for commercial use, which does not include Polyphen2, VEST, REVEL, CADD, LINSIGHT, and GenoCanyon.

dbNSFP4.0a can be downloaded from softgenetics ftp or googledrive. The md5sum of the zip file can be found here. A README file is here. dbNSFP4.0c can be downloaded from softgenetics ftp or googledrive. The md5sum of the zip file can be found here. A README file is here.

UPDATE (December 5, 2019): A minor bug is fixed in dbNSFP v4.0. In the previous release, the contents of the following columns were compressed, i.e. if the annotations for all transcripts were identical, only one annotation was presented: genename, cds_strand, refcodon, codonpos, codon_degeneracy, FATHMM_score, FATHMM_pred, Interpro_domain. In this new release, those columns are decompressed, i.e. they have the same number of annotations as the number of transcripts. A Java-based graphical user interface (GUI) search program (search_dbNSFP40a.jar or search_dbNSFP40c.jar) has been added. Users can double-click the jar file to launch the GUI (it also supports the command line; please check the search_dbNSFP readme pdf for details).
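For users who still hold a copy of the earlier v4.0 files, the difference between the compressed and decompressed presentation can be illustrated with a small Python sketch. It assumes that per-transcript annotations within a column are separated by ";" (please confirm this against the README of your release); the function name is hypothetical.

```python
def expand_field(value, n_transcripts, sep=";"):
    """Expand a possibly compressed column value to one entry per transcript.

    Assumption: annotations for different transcripts are joined by `sep`.
    A single annotation is treated as the compressed form and repeated
    for every transcript; a full-length value is returned unchanged.
    """
    parts = value.split(sep)
    if len(parts) == n_transcripts:
        return parts
    if len(parts) == 1:
        return parts * n_transcripts
    raise ValueError(
        f"expected 1 or {n_transcripts} annotations, found {len(parts)}"
    )
```

With three transcripts, a compressed value such as "T" expands to three identical entries, while "T;D;T" is returned as is.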

Two branches of dbNSFP are provided: dbNSFP4.0a suitable for academic use, which includes all the resources, and dbNSFP4.0c suitable for commercial use, which does not include Polyphen2, VEST, REVEL, CADD, LINSIGHT, and GenoCanyon.

dbNSFP4.0a can be downloaded from softgenetics ftp or googledrive. The md5sum of the zip file can be found here. A README file is here. dbNSFP4.0c can be downloaded from softgenetics ftp or googledrive. The md5sum of the zip file can be found here. A README file is here.

NEW VERSION (May 3, 2019): dbNSFP v4.0 is formally released. HGVS c. and p. presentations from ANNOVAR, SnpEff and VEP have been added. search_dbNSFP now supports search based on HGVS c. and p. presentations. Please refer to search_dbNSFP40a.readme.pdf or search_dbNSFP40c.readme.pdf for details. MedGen ID, OMIM ID and Orphanet ID from clinvar have been added.

Two branches of dbNSFP are provided: dbNSFP4.0a suitable for academic use, which includes all the resources, and dbNSFP4.0c suitable for commercial use, which does not include Polyphen2, VEST, REVEL, CADD, LINSIGHT, and GenoCanyon. Please contact Dr. Xiaoming Liu (xmliu.uth{at}gmail.com) for commercial usage of dbNSFP.

dbNSFP4.0a can be downloaded from softgenetics ftp or googledrive. The md5sum of the zip file can be found here. A README file is here. dbNSFP4.0c can be downloaded from softgenetics ftp or googledrive. The md5sum of the zip file can be found here. A README file is here.

UPDATE (February 20, 2019): dbNSFP v4.0b2 is released for beta testing. Uniprot sprot_varsplic was included in the mapping from Uniprot to Ensembl. Fixed column title inconsistency between the README file and data file. (We thank Kevin Xin and Julius Jacobsen for pointing out the inconsistency.) dbMTS was added as an attached database. search_dbNSFP added support for searching dbMTS with option '-m'.

Two branches of dbNSFP are provided: dbNSFP4.0b2a suitable for academic use, which includes all the resources, and dbNSFP4.0b2c suitable for commercial use, which does not include Polyphen2, VEST, REVEL, CADD, LINSIGHT, and GenoCanyon. Please contact Dr. Xiaoming Liu (xmliu.uth{at}gmail.com) for commercial usage of dbNSFP.

dbNSFP4.0b2a can be downloaded from softgenetics ftp or googledrive. The md5sum of the zip file can be found here. A README file is here. dbNSFP4.0b2c can be downloaded from softgenetics ftp or googledrive. The md5sum of the zip file can be found here. A README file is here.

UPDATE (December 30, 2018): A bug causing an id mapping issue from Uniprot to Ensembl, which in turn caused increased missing rates for Polyphen2, MutationAssessor and DEOGEN2, has been found and fixed (we thank Dr. Daniele Raimondi). If you downloaded dbNSFP v4.0b1 before December 30, please download it again.

NEW VERSION (December 8, 2018): dbNSFP v4.0b1 is released for beta testing. The core set of nsSNVs and ssSNVs has been rebuilt based on Gencode 29/Ensembl 94 with human reference sequence hg38. Eight deleteriousness prediction scores (ALoFT, DEOGEN2, FATHMM-XF, MPC, MVP, PrimateAI, LINSIGHT, SIFT4G) have been added. Three conservation scores (phyloP17way_primate, phastCons17way_primate, bStatistic) have been added. Allele frequencies from the gnomAD controls subsets, eQTLs from the Geuvadis project, and genotypes of a Vindija33.19 Neanderthal have been added. Some resources have been updated, including VEST (We thank Dr. Karchin), CADD, M-CAP, ancestral alleles, dbSNP, ClinVar, GTEx and InterPro. The presentation of the prediction scores has been further improved by adding the correspondence to transcript/protein ids in a systematic way. APPRIS, GENCODE_basic, TSL and VEP_canonical have been added to facilitate the choice of appropriate transcripts. dbNSFP_gene has also been completely rebuilt using the up-to-date resources. HIPred, gene constraint scores from the gnomAD data, essential genes predictions based on CRISPR, gene-trap and gene networks have been added.

Two branches of dbNSFP are provided: dbNSFP4.0b1a suitable for academic use, which includes all the resources, and dbNSFP4.0b1c suitable for commercial use, which does not include Polyphen2, VEST, REVEL, CADD, LINSIGHT, and GenoCanyon. Please contact Dr. Xiaoming Liu (xmliu.uth{at}gmail.com) for commercial usage of dbNSFP.

dbNSFP4.0b1a can be downloaded from softgenetics ftp or googledrive. The md5sum of the zip file can be found here. A README file is here. dbNSFP4.0b1c can be downloaded from softgenetics ftp or googledrive. The md5sum of the zip file can be found here. A README file is here.

Columns added (dbNSFP_variant): VindijiaNeandertal, Uniprot_acc, Uniprot_entry, APPRIS, GENCODE_basic, TSL, VEP_canonical, MVP_score, MVP_rankscore, MPC_score, MPC_rankscore, PrimateAI_score, PrimateAI_rankscore, PrimateAI_pred, DEOGEN2_score, DEOGEN2_rankscore, DEOGEN2_pred, fathmm-XF_coding_score, fathmm-XF_coding_rankscore, fathmm-XF_coding_pred, bStatistic, bStatistic_rankscore, Aloft_Fraction_transcripts_affected, Aloft_prob_Tolerant, Aloft_prob_Recessive, Aloft_prob_Dominant, Aloft_pred, Aloft_Confidence, UK10K_AC, UK10K_AF, gnomAD_exomes_controls_AC, gnomAD_exomes_controls_AN, gnomAD_exomes_controls_AF, gnomAD_exomes_controls_nhomalt, gnomAD_exomes_controls_AFR_AC, gnomAD_exomes_controls_AFR_AN, gnomAD_exomes_controls_AFR_AF, gnomAD_exomes_controls_AFR_nhomalt, gnomAD_exomes_controls_AMR_AC, gnomAD_exomes_controls_AMR_AN, gnomAD_exomes_controls_AMR_AF, gnomAD_exomes_controls_AMR_nhomalt, gnomAD_exomes_controls_ASJ_AC, gnomAD_exomes_controls_ASJ_AN, gnomAD_exomes_controls_ASJ_AF, gnomAD_exomes_controls_ASJ_nhomalt, gnomAD_exomes_controls_EAS_AC, gnomAD_exomes_controls_EAS_AN, gnomAD_exomes_controls_EAS_AF, gnomAD_exomes_controls_EAS_nhomalt, gnomAD_exomes_controls_FIN_AC, gnomAD_exomes_controls_FIN_AN, gnomAD_exomes_controls_FIN_AF, gnomAD_exomes_controls_FIN_nhomalt, gnomAD_exomes_controls_NFE_AC, gnomAD_exomes_controls_NFE_AN, gnomAD_exomes_controls_NFE_AF, gnomAD_exomes_controls_NFE_nhomalt, gnomAD_exomes_controls_SAS_AC, gnomAD_exomes_controls_SAS_AN, gnomAD_exomes_controls_SAS_AF, gnomAD_exomes_controls_SAS_nhomalt, gnomAD_exomes_controls_POPMAX_AC, gnomAD_exomes_controls_POPMAX_AN, gnomAD_exomes_controls_POPMAX_AF, gnomAD_exomes_controls_POPMAX_nhomalt, gnomAD_exomes_nhomalt, gnomAD_exomes_AFR_nhomalt, gnomAD_exomes_AMR_nhomalt, gnomAD_exomes_ASJ_nhomalt, gnomAD_exomes_EAS_nhomalt, gnomAD_exomes_FIN_nhomalt, gnomAD_exomes_NFE_nhomalt, gnomAD_exomes_SAS_nhomalt, gnomAD_exomes_POPMAX_AC, gnomAD_exomes_POPMAX_AN, gnomAD_exomes_POPMAX_AF, 
gnomAD_exomes_POPMAX_nhomalt, gnomAD_exomes_flag, gnomAD_genomes_flag, gnomAD_genomes_nhomalt, gnomAD_genomes_AFR_nhomalt, gnomAD_genomes_AMR_nhomalt, gnomAD_genomes_ASJ_nhomalt, gnomAD_genomes_EAS_nhomalt, gnomAD_genomes_FIN_nhomalt, gnomAD_genomes_NFE_nhomalt, gnomAD_genomes_POPMAX_nhomalt, gnomAD_genomes_controls_AC, gnomAD_genomes_controls_AN, gnomAD_genomes_controls_AF, gnomAD_genomes_controls_nhomalt, gnomAD_genomes_controls_AFR_AC, gnomAD_genomes_controls_AFR_AN, gnomAD_genomes_controls_AFR_AF, gnomAD_genomes_controls_AFR_nhomalt, gnomAD_genomes_controls_AMR_AC, gnomAD_genomes_controls_AMR_AN, gnomAD_genomes_controls_AMR_AF, gnomAD_genomes_controls_AMR_nhomalt, gnomAD_genomes_controls_ASJ_AC, gnomAD_genomes_controls_ASJ_AN, gnomAD_genomes_controls_ASJ_AF, gnomAD_genomes_controls_ASJ_nhomalt, gnomAD_genomes_controls_EAS_AC, gnomAD_genomes_controls_EAS_AN, gnomAD_genomes_controls_EAS_AF, gnomAD_genomes_controls_EAS_nhomalt, gnomAD_genomes_controls_FIN_AC, gnomAD_genomes_controls_FIN_AN, gnomAD_genomes_controls_FIN_AF, gnomAD_genomes_controls_FIN_nhomalt, gnomAD_genomes_controls_NFE_AC, gnomAD_genomes_controls_NFE_AN, gnomAD_genomes_controls_NFE_AF, gnomAD_genomes_controls_NFE_nhomalt, gnomAD_genomes_controls_POPMAX_AC, gnomAD_genomes_controls_POPMAX_AN, gnomAD_genomes_controls_POPMAX_AF, gnomAD_genomes_controls_POPMAX_nhomalt, Geuvadis_eQTL_target_gene, clinvar_hgvs, clinvar_var_source, Eigen-raw_coding_rankscore, SIFT4G_score, SIFT4G_pred, SIFT4G_converted_rankscore, phyloP17way_primate, phyloP17way_primate_rankscore, phastCons17way_primate, phastCons17way_primate_rankscore

Column name changes (dbNSFP_variant): MutationAssessor_score_rankscore to MutationAssessor_rankscore, VEST3_score to VEST4_score, VEST3_rankscore to VEST4_rankscore, GenoCanyon_score_rankscore to GenoCanyon_rankscore, integrated_fitCons_score_rankscore to integrated_fitCons_rankscore, GM12878_fitCons_score_rankscore to GM12878_fitCons_rankscore, H1-hESC_fitCons_score_rankscore to H1-hESC_fitCons_rankscore, HUVEC_fitCons_score_rankscore to HUVEC_fitCons_rankscore, phyloP20way_mammalian to phyloP30way_mammalian, phyloP20way_mammalian_rankscore to phyloP30way_mammalian_rankscore, phastCons20way_mammalian to phastCons30way_mammalian, phastCons20way_mammalian_rankscore to phastCons30way_mammalian_rankscore, clinvar_golden_stars to clinvar_review, GTEx_V6p_gene to GTEx_V7_gene, GTEx_V6p_tissue to GTEx_V7_tissue, Eigen-raw to Eigen-raw_coding, Eigen-phred to Eigen-phred_coding, Eigen-PC-raw to Eigen-PC-raw_coding, Eigen-PC-phred to Eigen-PC-phred_coding, Eigen-PC-raw_rankscore to Eigen-PC-raw_coding_rankscore, rs_dbSNP150 to rs_dbSNP151, clinvar_rs to clinvar_id.

Columns deleted (dbNSFP_variant): Uniprot_acc_Polyphen2, Uniprot_id_Polyphen2, Uniprot_aapos_Polyphen2, MutationAssessor_UniprotID, MutationAssessor_variant, Transcript_id_VEST3, Transcript_var_VEST3, gnomAD_exomes_OTH_AC, gnomAD_exomes_OTH_AN, gnomAD_exomes_OTH_AF, gnomAD_genomes_OTH_AC, gnomAD_genomes_OTH_AN, gnomAD_genomes_OTH_AF, Eigen_coding_or_noncoding

Columns added (dbNSFP_gene): gnomAD_pLI, gnomAD_pRec, gnomAD_pNull, HIPred_score, HIPred, Essential_gene_CRISPR, Essential_gene_CRISPR2, Essential_gene_gene-trap, Gene_indispensability_score, Gene_indispensability_pred.

REMINDER: For whole genome annotation, we recommend our whole genome annotation pipeline WGSA. Currently it supports SNP and indel annotation using hg19 and hg38 coordinates. dbNSFP v2.9.3 (the last dbNSFP native on hg19) is a component resource.

REMINDER: If your SNP coordinates are based on hg19, remember to add the option "-v hg19" when using the search program, because the default coordinates are now hg38.
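To make the hg19 reminder harder to forget in scripted runs, a wrapper could add the option automatically. The Python sketch below only assembles the command line. The "-v hg19" option comes from the reminder above; launching the jar with "java -jar" and passing any further options through unchanged are assumptions of this example, so please take the remaining options from the search_dbNSFP readme.

```python
def build_search_command(jar_path, coords_build, extra_args=()):
    """Assemble an argument list for the search program.

    Assumptions: the program is started with "java -jar <jar_path>", and
    `extra_args` holds any other options copied from the search_dbNSFP
    readme. Only the "-v hg19" option is taken from the reminder above.
    """
    command = ["java", "-jar", jar_path]
    if coords_build == "hg19":
        # hg38 is the default, so the option is only needed for hg19 input.
        command += ["-v", "hg19"]
    elif coords_build != "hg38":
        raise ValueError(f"unsupported genome build: {coords_build!r}")
    command += list(extra_args)
    return command
```

The returned list can be handed to subprocess.run once the remaining options have been filled in from the readme.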

CONTACT US:

Xiaoming Liu, Ph.D.
Associate Professor, USF Genomics,
University of South Florida,
3720 Spectrum Boulevard, Suite 304,
Tampa, FL 33612

Phone: 813-974-9865
Email: xiaomingliu@health.usf.edu
Lab Page: http://liulab.science




