Putative transcription factor binding sites (TFBSs) can be predicted using binding site profiles represented as position weight matrices (PWMs), also known as position specific scoring matrices (PSSMs). Applying a PWM at a particular position on a sequence yields a score for that position. This score represents the predicted binding affinity of the transcription factor (TF) at that position. Mutations within the binding site may disrupt this binding affinity, and any such change is reflected in the PWM score.
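As an illustration of PWM scoring, the following is a minimal sketch and not the ABC4DE implementation. It assumes a log-odds style PWM stored as one base-to-score mapping per column; the matrix `TOY_PWM` and the function names are hypothetical, and only the forward strand is scanned.

```python
from typing import Dict, List

# Hypothetical toy PWM of width 4 (not a JASPAR profile): one dict per column.
TOY_PWM: List[Dict[str, float]] = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
]


def score_at(pwm: List[Dict[str, float]], seq: str, pos: int) -> float:
    """Sum the per-column scores of the window of PWM width starting at pos."""
    window = seq[pos:pos + len(pwm)]
    if len(window) < len(pwm):
        raise ValueError("window runs past the end of the sequence")
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm: List[Dict[str, float]], seq: str) -> List[float]:
    """Score every position of seq at which the PWM fits completely."""
    return [score_at(pwm, seq, i) for i in range(len(seq) - len(pwm) + 1)]
```

With this toy matrix, the consensus `ACGT` receives the maximum score and a single mismatch lowers it, which is the effect the impact predictions rely on.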
ABC4DE is a tool which predicts the impact of SNVs on putative TFBSs. TFBSs are predicted by scoring all the peak sequences from a large set of ChIP-seq experiments with PWMs for the corresponding TFs from the JASPAR database. For every TFBS predicted within a ChIP-seq peak sequence which scores above a given threshold, each position in the TFBS sequence is mutated to each of the 3 possible alternate alleles. For each of these mutations, the best-scoring PWM match containing the SNV position is computed. The unmutated TFBS score is compared to the score of the best-scoring TFBS containing the mutation and an impact score is computed. This information is stored in a database.
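The mutation procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the ABC4DE code: the PWM representation and all function names are hypothetical, only the forward strand is considered, and because the impact score formula is not given here, the plain difference between the mutated and unmutated scores is used as a stand-in.

```python
from typing import Dict, Iterator, List, Tuple

Pwm = List[Dict[str, float]]  # hypothetical representation: one dict per column


def window_score(pwm: Pwm, seq: str, start: int) -> float:
    """Score the window of PWM width that starts at index start."""
    return sum(col[base] for col, base in zip(pwm, seq[start:start + len(pwm)]))


def best_score_containing(pwm: Pwm, seq: str, pos: int) -> float:
    """Best score over all full-width windows that contain index pos."""
    width = len(pwm)
    first = max(0, pos - width + 1)
    last = min(pos, len(seq) - width)
    return max(window_score(pwm, seq, s) for s in range(first, last + 1))


def mutation_impacts(
    pwm: Pwm, seq: str, tfbs_start: int
) -> Iterator[Tuple[int, str, str, float, float, float]]:
    """Yield (pos, ref, alt, ref_score, alt_score, impact) for every SNV in a TFBS.

    impact = alt_score - ref_score is an assumed stand-in for the impact score.
    """
    ref_score = window_score(pwm, seq, tfbs_start)
    for pos in range(tfbs_start, tfbs_start + len(pwm)):
        ref = seq[pos]
        for alt in "ACGT":
            if alt == ref:
                continue  # only the 3 alternate alleles are tested
            mutated = seq[:pos] + alt + seq[pos + 1:]
            alt_score = best_score_containing(pwm, mutated, pos)
            yield pos, ref, alt, ref_score, alt_score, alt_score - ref_score
```

A TFBS of width w yields 3w candidate SNVs. Taking the best window containing the mutated position, rather than rescoring only the original window, allows for a mutation that shifts the best match by a few bases.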
The web interface allows a user to provide a set of SNVs using any one of a set of standard bioinformatics file formats. Each SNV in the user's file is compared to SNVs predicted to impact TFBSs from the MANTA database. The SNVs are also compared to the DECRES database of predicted regulatory elements. For each provided SNV, if a match is found in the MANTA TFBS database, the corresponding SNV, unmutated TFBS and mutated TFBS information is written to the results along with any overlapping DECRES predicted regulatory region. Results are provided as both a web page and a tab-delimited text file.
This page allows you to enter SNVs and search the MANTA and DECRES databases for putative TFBSs which are impacted by these SNVs, as well as for any DECRES predicted regulatory elements which overlap them.
ABC4DE allows you to specify SNVs with several file formats including VCF, GFF and BED. You may either paste the SNVs directly into the text area, or upload a file using the file upload button. The formats and how they are used by ABC4DE are described below.
chr1 808631 rs11240779 G A
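The example line above appears to follow the standard leading VCF columns (CHROM, POS, ID, REF, ALT). A minimal parsing sketch under that assumption is shown here. The function name is hypothetical, and splitting multi-allelic ALT values into separate SNVs is an assumption rather than documented ABC4DE behaviour.

```python
from typing import List, Tuple

BASES = {"A", "C", "G", "T"}


def parse_vcf_snvs(line: str) -> List[Tuple[str, int, str, str]]:
    """Return (chrom, pos, ref, alt) for each single-nucleotide ALT on a VCF line.

    Header lines and non-SNV records (e.g. indels) yield an empty list.
    """
    if line.startswith("#"):
        return []
    fields = line.rstrip("\n").split("\t")  # VCF input is tab-separated
    if len(fields) < 5:
        raise ValueError("expected at least 5 tab-separated fields")
    chrom, pos, _id, ref, alt = fields[:5]
    if ref.upper() not in BASES:
        return []  # reference allele is not a single nucleotide
    return [
        (chrom, int(pos), ref.upper(), a.upper())
        for a in alt.split(",")
        if a.upper() in BASES
    ]
```

For example, `parse_vcf_snvs("chr1\t808631\trs11240779\tG\tA")` returns a single `("chr1", 808631, "G", "A")` record.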
ABC4DE expects GFF file format version 2 which is described here. The chromosome should be specified in the seqname field (the 1st field), either with a chr prefix or not. The source and feature fields are ignored. The start and end fields define the position of the SNV. The score, strand and frame fields are ignored. The attributes field is used to specify the reference and alternate alleles using the tags ref_allele and alt_allele. As with the VCF format, the GFF input is expected to be tab-separated. The same caveats relating to tab versus space separated fields for VCF input also apply to GFF input. Similarly, for the GFF fields that are not required, please use a dot (".") instead of leaving them blank.
An example line is as follows:
chr1 dbSNP SNP 808631 808631 . . . ref_allele=G; alt_allele=A
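A GFF line of this form could be read as sketched below. This is a hypothetical helper, not ABC4DE code. It assumes tab-separated input with nine fields and attributes written as `tag=value` pairs separated by semicolons, as in the example line.

```python
from typing import Dict, Optional


def parse_gff_snv(line: str) -> Optional[Dict[str, object]]:
    """Parse one GFF2 line into an SNV record, or return None if it is not an SNV."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        raise ValueError("expected 9 tab-separated GFF fields")
    seqname, start, end = fields[0], int(fields[3]), int(fields[4])
    if start != end:
        return None  # coordinates do not describe a single nucleotide

    # Collect tag=value pairs from the attributes field.
    attrs: Dict[str, str] = {}
    for item in fields[8].split(";"):
        if "=" in item:
            tag, value = item.split("=", 1)
            attrs[tag.strip()] = value.strip()

    if "ref_allele" not in attrs or "alt_allele" not in attrs:
        raise ValueError("ref_allele and alt_allele tags are required")
    return {
        "chrom": seqname,
        "pos": start,
        "ref": attrs["ref_allele"],
        "alt": attrs["alt_allele"],
    }
```

The source, feature, score, strand and frame fields are read but never used, matching the description above.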
The BED file format is described here. ABC4DE only requires the first 4 fields. Additional fields are ignored. As there are no BED fields explicitly designed to specify allele information, the name field (4th field) is used to specify the alternate allele (the reference allele is assumed). Note that the BED format specifies 0-based coordinates and that the end is not included in the display of the feature. A simpler way of thinking about it for single nucleotide variations is that the end is the actual position of the variant (in the usual 1-based coordinate system) and the start coordinate is the nucleotide position before the variant. BED format may use either tabs or spaces (or any whitespace character) as field separators.
An example line is as follows:
chr1 808630 808631 A
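The BED coordinate convention can be made concrete with a short sketch. The function name is hypothetical. It assumes whitespace-separated fields and treats a line as an SNV only when the end coordinate is exactly one greater than the start coordinate.

```python
from typing import Optional, Tuple


def parse_bed_snv(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (chrom, 1-based position, alt allele), or None if not an SNV."""
    fields = line.split()  # BED may use tabs, spaces or any whitespace
    if len(fields) < 4:
        raise ValueError("expected at least 4 BED fields")
    chrom, start, end, alt = fields[0], int(fields[1]), int(fields[2]), fields[3]
    if end - start != 1:
        return None  # interval spans more or less than one nucleotide
    # BED start is 0-based and end is exclusive, so end is the 1-based position.
    return chrom, end, alt
```

Applied to the example line, this gives position 808631 on chr1 with alternate allele A, the same position as in the VCF and GFF examples.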
ABC4DE only predicts the impact of SNVs on TFBSs and their overlap with DECRES predicted regulatory elements. It does not currently predict the impact of indels or other types of variants. For formats which specify both a start and end coordinate, if the coordinates specified in a particular line of the file do not indicate an SNV, this variant is ignored (this line is skipped). Future versions of ABC4DE may also implement indel impact predictions.
Once you have pasted or uploaded your SNVs, press the Submit button to search the MANTA and DECRES databases for putative TFBSs which are impacted and their overlap with predicted regulatory regions. Once the search is complete, the results page will appear showing all SNVs which impacted TFBSs and any DECRES predicted regulatory elements.
The results of the ABC4DE search are provided in a table. For each SNV that impacted one or more TFBSs, it displays the SNV information along with the associated wild-type (reference) and mutated (alternate) TFBS information. If the SNV falls within a DECRES predicted regulatory region, this information is also provided. Please note that information in output lines may be duplicated. For example, if an SNV impacts a putative TFBS and also overlaps multiple DECRES features (this will happen if DECRES predicted the same regulatory element for more than one cell type), the output is given as multiple lines where each line contains distinct DECRES information but the TFBS impact information is repeated.
The results table contains the following columns:
The table may be sorted on any column by clicking on the column header. Clicking the same column header a second time will reverse the sort order on that column.
For a more detailed explanation of the DECRES enhancer/promoter prediction methods please refer to: http://biorxiv.org/content/early/2016/02/28/041616