Introduction

This database characterizes the allelic binding of 95,886 common human single nucleotide polymorphisms (SNPs; minor allele frequency, MAF, > 1% in European and Asian populations) to 270 distinct transcription factors (TFs). The SNPs were chosen from regions within 500 kb of 83 type 2 diabetes (T2D) risk loci identified in several genome-wide association studies (GWAS). The data were generated using SNP-SELEX.

For SNPs not included in the SNP-SELEX experiments, we also built deltaSVM models for 94 TFs; these models can be used to predict allelic TF binding for any SNP.

For details about SNP selection and the experimental and computational pipelines, please refer to the paper under review, “Systematic Analysis of Differential Transcription Factor Binding to Non-Coding Variants in the Human Genome.”

Items

The measurements are described below:

OBS

Oligo Binding Score, defined as the area under the curve (AUC) of oligo enrichment for each TF. Reads from each 40-bp oligo (regardless of allele) were counted in the input pool and in each of the 6 SELEX cycles. The odds ratio (odds = reads of the oligo / reads of all other oligos) between each cycle and the input was computed and plotted against the SELEX cycle number (x-axis: SELEX cycle; y-axis: log10 odds ratio). Adjacent points were connected to form a curve, and the AUC of that curve is used as a score for TF binding to the 40-bp sequence.
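The OBS computation described above can be sketched as follows. This is a minimal illustration, not the pipeline used to build the database: the function names, the trapezoidal rule for the area, the use of cycle numbers 1 to 6 as x-values, and the read counts are all assumptions, and details such as pseudocounts or whether the input pool contributes a point at x = 0 are not specified here.

```python
import math

def log10_odds_ratio(oligo, total, input_oligo, input_total):
    """log10 of the oligo-vs-others odds in one cycle relative to the input pool."""
    cycle_odds = oligo / (total - oligo)
    input_odds = input_oligo / (input_total - input_oligo)
    return math.log10(cycle_odds / input_odds)

def trapezoid_auc(xs, ys):
    """Area under the piecewise-linear curve through the points (xs[i], ys[i])."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))

def oligo_binding_score(cycle_counts, input_counts):
    """cycle_counts: one (oligo_reads, total_reads) pair per SELEX cycle.
    input_counts: (oligo_reads, total_reads) for the input pool."""
    input_oligo, input_total = input_counts
    cycles = list(range(1, len(cycle_counts) + 1))  # assumed x-values: 1..6
    log_ors = [log10_odds_ratio(o, t, input_oligo, input_total)
               for o, t in cycle_counts]
    return trapezoid_auc(cycles, log_ors)

# Made-up counts, for illustration only.
input_counts = (10, 1010)          # odds = 10 / 1000
cycle_counts = [(100, 1100)] * 6   # odds = 100 / 1000 in every cycle
obs = oligo_binding_score(cycle_counts, input_counts)
```

With these made-up counts the odds ratio is tenfold in every cycle, so the curve is flat at a log10 odds ratio of 1 and the area over the five intervals between cycles 1 and 6 is about 5.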

p-value (OBS)

P-values for the OBS were obtained from 25,000 Monte Carlo randomizations. A p-value cutoff of 0.05 is used to define ‘Binding’ of the TF to the genomic fragment.
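As a rough illustration of how a Monte Carlo p-value is typically derived from randomized scores, the sketch below compares an observed OBS against a set of null scores. What is randomized in the actual pipeline is not described here, so the null scores are simply drawn from a standard normal distribution as a stand-in; the function name, the one-sided test, the +1 correction, and the strict `<` comparison against the cutoff are all assumptions.

```python
import random

def empirical_p_value(observed, null_scores):
    """One-sided Monte Carlo p-value: share of null scores >= observed.

    The +1 in numerator and denominator is a common convention that keeps
    the estimate from being exactly zero (an assumption, not the
    documented method).
    """
    n_extreme = sum(1 for s in null_scores if s >= observed)
    return (n_extreme + 1) / (len(null_scores) + 1)

# Stand-in null distribution: 25,000 draws from a standard normal.
rng = random.Random(0)
null_scores = [rng.gauss(0.0, 1.0) for _ in range(25_000)]

p = empirical_p_value(10.0, null_scores)  # 10.0 is a made-up observed OBS
is_binding = p < 0.05                     # cutoff from the text above
```

Under this convention, the smallest p-value that 25,000 randomizations can produce is 1/25,001, roughly 4e-5.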

PBS

Preferential Binding Score, defined as the area under the curve (AUC) of differential allelic enrichment for each TF. The enrichment curves of the two alleles (constructed as for the OBS, see above) were drawn in the same plot, and the area between the two curves was computed to quantify the differential binding of the TF to the two alleles. The alternative-allele value is always subtracted from the reference-allele value, so a negative PBS means that the TF prefers the alternative allele.
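The sign convention can be made concrete with a short sketch. It assumes that the area between the curves is a signed area, computed with the trapezoidal rule on the pointwise difference (reference minus alternative) of the per-allele log10 odds ratio curves; the function name and the curve values are hypothetical.

```python
def preferential_binding_score(cycles, ref_curve, alt_curve):
    """Signed area between two per-allele enrichment curves (ref minus alt)."""
    diffs = [r - a for r, a in zip(ref_curve, alt_curve)]
    return sum((cycles[i + 1] - cycles[i]) * (diffs[i] + diffs[i + 1]) / 2.0
               for i in range(len(cycles) - 1))

# Made-up log10 odds ratios per SELEX cycle, for illustration only.
cycles = [1, 2, 3, 4, 5, 6]
ref_curve = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
alt_curve = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]

pbs = preferential_binding_score(cycles, ref_curve, alt_curve)
prefers_alt = pbs < 0  # negative PBS: the TF prefers the alternative allele
```

In this example the alternative allele is enriched more strongly in every cycle, so the score is negative (about -1.75).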

p-value (PBS)

P-values for the PBS were obtained from 25,000 Monte Carlo randomizations. A p-value cutoff of 0.01 is used to define ‘Differential Binding’ of the TF to the two alleles.
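For users filtering records, the two cutoffs can be applied as in the sketch below. The function name, the argument names, and the returned keys are hypothetical and do not reflect the database's actual column names or file layout; treating the cutoffs as strict inequalities and evaluating the two calls independently of each other are also assumptions.

```python
OBS_P_CUTOFF = 0.05  # 'Binding' cutoff stated for p-value (OBS)
PBS_P_CUTOFF = 0.01  # 'Differential Binding' cutoff stated for p-value (PBS)

def call_snp_tf_pair(obs_p, pbs, pbs_p):
    """Summarize one SNP-TF record using the documented p-value cutoffs."""
    binding = obs_p < OBS_P_CUTOFF
    differential = pbs_p < PBS_P_CUTOFF
    preferred = None
    if differential:
        if pbs > 0:
            preferred = "Ref"   # positive PBS: reference allele preferred
        elif pbs < 0:
            preferred = "Alt"   # negative PBS: alternative allele preferred
    return {
        "binding": binding,
        "differential_binding": differential,
        "preferred_allele": preferred,
    }

# Made-up values, for illustration only.
call = call_snp_tf_pair(obs_p=0.001, pbs=-1.75, pbs_p=0.004)
```

Here `call` flags the pair as bound and differentially bound, with the alternative allele preferred.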

Alt

Nucleotide sequence of the alternative allele of the SNP (hg19).

Ref

Nucleotide sequence of the reference allele of the SNP (hg19).