ABC4DE Help

Description

Putative transcription factor binding sites (TFBS) can be predicted using binding site profiles represented as position weight matrices (PWMs) -- also known as position specific scoring matrices (PSSMs). Applying a PWM at a particular position on a sequence yields a score for that position. This score represents the predicted binding affinity of the transcription factor (TF) at that position. Mutations within the binding site may disrupt this binding affinity, and any such disruption is reflected in the PWM score.
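As a rough illustration of how a PWM yields a score, the sketch below sums one weight per position for a candidate site. The matrix values, the 4-base width and the function names are invented for this example; they are not taken from JASPAR or from the ABC4DE implementation, and only one strand is scored.

```python
# Hypothetical 4-position log-odds PWM; the weights are invented for illustration.
PWM = [
    {"A": 1.2, "C": -0.8, "G": -1.5, "T": -0.3},
    {"A": -1.0, "C": 1.4, "G": -0.6, "T": -1.1},
    {"A": -0.9, "C": -1.2, "G": 1.5, "T": -0.7},
    {"A": -0.4, "C": -1.3, "G": -0.8, "T": 1.1},
]


def pwm_score(pwm, site):
    """Sum the per-position weights for a site as wide as the PWM."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal PWM width")
    return sum(column[base] for column, base in zip(pwm, site.upper()))


def scan(pwm, sequence):
    """Score every window of the sequence; returns (0-based start, score) pairs."""
    width = len(pwm)
    return [
        (start, pwm_score(pwm, sequence[start:start + width]))
        for start in range(len(sequence) - width + 1)
    ]


reference_score = pwm_score(PWM, "ACGT")  # site matching the preferred bases
mutated_score = pwm_score(PWM, "ACTT")    # same site with G>T at the third base
```

With these invented weights, the mutated site scores lower than the reference site, which is the kind of change an SNV inside a binding site can produce.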

ABC4DE is a tool which predicts the impact of SNVs on putative TFBSs. TFBSs are predicted by scoring all the peak sequences from a large set of ChIP-seq experiments with PWMs for the corresponding TFs from the JASPAR database. For every TFBS predicted within a ChIP-seq peak sequence which scores above a given threshold, each position in the TFBS sequence is mutated with each of the 3 possible alternate alleles. For each of these mutations, the best scoring PWM match containing the SNV position is computed. The unmutated TFBS score is compared to the score of the best scoring TFBS containing the mutation and an impact score is computed. This information is stored in a database.
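The mutation step described above can be sketched as follows. This is a simplified illustration rather than the ABC4DE code: the PWM weights are invented, only the forward strand is scanned, and because the impact score formula is not given here, the last value in each row is just the difference between the reference and alternate scores, used as a placeholder.

```python
# Hypothetical 4-position PWM; the weights are invented for illustration.
PWM = [
    {"A": 1.2, "C": -0.8, "G": -1.5, "T": -0.3},
    {"A": -1.0, "C": 1.4, "G": -0.6, "T": -1.1},
    {"A": -0.9, "C": -1.2, "G": 1.5, "T": -0.7},
    {"A": -0.4, "C": -1.3, "G": -0.8, "T": 1.1},
]


def pwm_score(pwm, site):
    """Sum the per-position weights for a site as wide as the PWM."""
    return sum(column[base] for column, base in zip(pwm, site))


def best_window(pwm, sequence, pos):
    """Return (start, score) of the best window that contains index pos."""
    width = len(pwm)
    first = max(0, pos - width + 1)
    last = min(pos, len(sequence) - width)
    scored = [
        (pwm_score(pwm, sequence[start:start + width]), start)
        for start in range(first, last + 1)
    ]
    score, start = max(scored)
    return start, score


def mutate_site(pwm, sequence, site_start):
    """Try all 3 alternate alleles at each position of one predicted site."""
    width = len(pwm)
    ref_score = pwm_score(pwm, sequence[site_start:site_start + width])
    rows = []
    for pos in range(site_start, site_start + width):
        ref = sequence[pos]
        for alt in "ACGT":
            if alt == ref:
                continue
            mutated = sequence[:pos] + alt + sequence[pos + 1:]
            alt_start, alt_score = best_window(pwm, mutated, pos)
            # Placeholder difference; NOT the impact score formula used by the tool.
            rows.append((pos, ref, alt, alt_start, alt_score, ref_score - alt_score))
    return rows


rows = mutate_site(PWM, "TTACGTAA", site_start=2)
```

Note that the best scoring site on the mutated sequence may start at a different position from the reference site, which is why the results report separate reference and alternate TFBS coordinates.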

The web interface allows a user to provide a set of SNVs using any one of a set of standard bioinformatics file formats. Each SNV in the user's file is compared to SNVs predicted to impact TFBSs from the MANTA database. The SNVs are also compared to the DECRES database of predicted regulatory elements. For each provided SNV, if a match is found in the MANTA TFBS database, the corresponding SNV, unmutated TFBS and mutated TFBS information is included in the results, along with any overlapping DECRES predicted regulatory region. Results are provided as both a web page and a tab-delimited text file.

Specify SNVs

This page allows you to enter SNVs and search the MANTA and DECRES databases for putative TFBSs which are impacted by these SNVs as well as any DECRES predicted regulatory elements.

SNV File Formats

ABC4DE allows you to specify SNVs with several file formats including VCF, GFF and BED. You may either paste the SNVs directly into the text area, or upload a file using the file upload button. The formats and how they are used by ABC4DE are described below.

VCF Format

The VCF file format is described here. ABC4DE ignores all lines starting with '#' and only requires the first 5 fields: CHROM, POS, ID, REF and ALT. The chromosome may be specified with or without the chr prefix. PLEASE NOTE that the VCF format specification requires the columns to be tab-separated. If you supply space-separated fields, MANTA will give an error complaining that the number of columns is less than 5. Depending on your computer system, cutting and pasting VCF lines into the input text box may automatically convert tabs to spaces and cause this error, so please double check that your input really is tab-separated. Also, PLEASE NOTE that if you do not have a specific value for the ID field, you should use a dot (".") instead of leaving it blank.

An example line is as follows:


  chr1    808631    rs11240779    G    A
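To check your own VCF input before submitting it, something like the following sketch may help. It applies the rules described above (skip '#' lines, split on tabs only, require 5 fields); the function name and the returned dictionary are assumptions made for this example and do not reflect how ABC4DE parses input internally.

```python
def parse_vcf_line(line):
    """Return the first 5 VCF fields as a dict, or None for comment/blank lines."""
    if line.startswith("#") or not line.strip():
        return None
    # VCF columns must be tab-separated, so split on tabs only.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 5:
        raise ValueError(
            "expected at least 5 tab-separated columns, found %d" % len(fields)
        )
    chrom, pos, snv_id, ref, alt = fields[:5]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": snv_id,
        "ref": ref.upper(),
        "alt": alt.upper(),
    }


record = parse_vcf_line("chr1\t808631\trs11240779\tG\tA")
```

A line whose fields are separated by spaces is seen as a single column and raises the error, mirroring the "number of columns is less than 5" message described above.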
  

GFF Format

ABC4DE expects GFF file format version 2 which is described here. The chromosome should be specified in the seqname field (the 1st field), either with a chr prefix or not. The source and feature fields are ignored. The start and end fields define the position of the SNV. The score, strand and frame fields are ignored. The attributes field is used to specify the reference and alternate alleles using the tags ref_allele and alt_allele. As with the VCF format, the GFF input is expected to be tab-separated. The same caveats relating to tab versus space-separated fields for VCF input also apply to GFF input. Similarly, for the GFF fields that are not required, please use a dot (".") instead of leaving them blank.

An example line is as follows:


  chr1    dbSNP    SNP    808631    808631    .    .    .    ref_allele=G; alt_allele=A
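The sketch below shows one way to read such a GFF line, following the tag=value layout of the example above. The function name and return structure are hypothetical, and returning None for lines whose start and end differ is an assumption based on the statement that non-SNV lines are skipped.

```python
def parse_gff_line(line):
    """Return SNV details from one GFF line, or None if it is blank or not an SNV."""
    if not line.strip():
        return None
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 9:
        raise ValueError(
            "expected 9 tab-separated columns, found %d" % len(fields)
        )
    seqname, start, end, attributes = fields[0], int(fields[3]), int(fields[4]), fields[8]
    if start != end:
        return None  # spans more than one base, so not an SNV
    tags = {}
    for part in attributes.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip()] = value.strip()
    if "ref_allele" not in tags or "alt_allele" not in tags:
        raise ValueError("attributes must contain ref_allele and alt_allele")
    return {
        "chrom": seqname,
        "pos": start,
        "ref": tags["ref_allele"].upper(),
        "alt": tags["alt_allele"].upper(),
    }


record = parse_gff_line(
    "chr1\tdbSNP\tSNP\t808631\t808631\t.\t.\t.\tref_allele=G; alt_allele=A"
)
```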
  

BED Format

The BED file format is described here. ABC4DE only requires the first 4 fields. Additional fields are ignored. As there are no BED fields explicitly designed to specify allele information, the name field (4th field) is used to specify the alternate allele (the reference allele is assumed). Note that the BED format specifies 0-based coordinates and that the end is not included in the display of the feature. A simpler way of thinking about it for single nucleotide variations is that the end is the actual position of the variant (in the usual 1-based coordinate system) and the start coordinate is the nucleotide position before the variant. BED format may use either tabs or spaces (or any whitespace character) as field separators.

An example line is as follows:


  chr1    808630    808631    A
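The coordinate convention can be made concrete with a short sketch. It splits on any whitespace, treats a line as an SNV only when end minus start equals 1, and reports the end coordinate as the 1-based variant position. The function name and return structure are assumptions for illustration only.

```python
def parse_bed_line(line):
    """Return the 1-based SNV position and alternate allele from one BED line."""
    fields = line.split()  # BED fields may be separated by any whitespace
    if not fields:
        return None
    if len(fields) < 4:
        raise ValueError("expected at least 4 columns, found %d" % len(fields))
    chrom, start, end, alt = fields[0], int(fields[1]), int(fields[2]), fields[3]
    if end - start != 1:
        return None  # interval does not describe a single nucleotide
    # BED start is 0-based and end is exclusive, so end is the 1-based position.
    return {"chrom": chrom, "pos": end, "alt": alt.upper()}


record = parse_bed_line("chr1    808630    808631    A")
```

For the example line, the reported position is 808631, the same position given in the VCF and GFF examples.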
  

ABC4DE only predicts the impact of SNVs on TFBSs and their overlap with DECRES predicted regulatory elements. It does not currently predict the impact of indels or other types of variants. For formats which specify both a start and end coordinate, if the coordinates specified in a particular line of the file do not indicate an SNV, this variant is ignored (this line is skipped). Future versions of ABC4DE may also implement indel impact predictions.

Searching ABC4DE

Once you have pasted or uploaded your SNVs, press the Submit button to search the MANTA and DECRES databases for putative TFBSs which are impacted and their overlap with predicted regulatory regions. Once the search is complete, the results page will appear showing all SNVs which impacted TFBSs and any DECRES predicted regulatory elements.

MANTA Results

The results of the ABC4DE search are provided in a table. For each SNV that impacted one or more TFBSs, it displays the SNV information along with the associated wild-type (reference) and mutated (alternate) TFBS information. If the SNV falls within a DECRES predicted regulatory region, this information is also provided. PLEASE NOTE that information in output lines may be duplicated. For example, if an SNV impacts a putative TFBS and also overlaps multiple DECRES features (this will happen if DECRES predicted the same regulatory element for more than one cell type), the output is given as multiple lines where each line contains distinct DECRES information but the TFBS impact information is repeated.

The results table contains the following columns:
  1. Chromosome - Name of the chromosome on which the SNV/TFBS appear
  2. Position - Chromosomal location of the SNV
  3. Ref Allele - The reference allele at this chromosomal location
  4. Alt Allele - The specified alternate allele for this chromosomal location
  5. SNV ID - ID/name of the SNV if the input file format allows for it. Otherwise displayed as '.'.
  6. TF - The name of the transcription factor whose binding site is impacted by this SNV
  7. JASPAR ID - The JASPAR ID of the position weight matrix used to predict the binding site for this transcription factor
  8. Ref TFBS Start - The start position of the predicted TFBS on the reference sequence
  9. Ref TFBS End - The end position of the predicted TFBS on the reference sequence
  10. Ref Strand - The strand of the predicted TFBS on the reference sequence
  11. Ref Abs Score - The absolute (raw) score of the predicted TFBS at this position on the reference sequence
  12. Ref Rel Score - The relative score of the predicted TFBS at this position on the reference sequence given as a percentage
  13. Alt TFBS Start - The start position of the best scoring predicted TFBS on the SNV mutated sequence
  14. Alt TFBS End - The end position of the best scoring predicted TFBS on the SNV mutated sequence
  15. Alt Strand - The strand of the best scoring predicted TFBS on the SNV mutated sequence
  16. Alt Abs Score - The absolute (raw) score of the predicted TFBS at the alternate position on the SNV mutated sequence
  17. Alt Rel Score - The relative score of the predicted TFBS at the alternate position on the SNV mutated sequence
  18. Impact Score - A measure of the impact of the SNV on the predicted TFBS
  19. Cell Line - The cell line for which the DECRES prediction was made
  20. DECRES Feature Type - The DECRES predicted feature type, given as either 'A-E' (active enhancer) or 'A-P' (active promoter)
  21. DECRES Feature Start - The start of the DECRES predicted feature
  22. DECRES Feature End - The end of the DECRES predicted feature
  23. DECRES Prediction Score - The DECRES prediction score. This is the largest posterior probability p(y|x) versus the background.

The table may be sorted on any column by clicking on the column header. Clicking the same column header a second time will reverse the sort order on that column.
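If you work with the tab-delimited results file, the repeated TFBS impact information can be collapsed with a sketch like the one below. It assumes the file columns follow the order listed above (columns 1-18 for SNV and TFBS information, columns 19-23 for DECRES information) and that the first line is a header; neither assumption is confirmed here, so check your downloaded file first. The function name is hypothetical.

```python
import csv
import io

N_TFBS_COLUMNS = 18  # assumed: columns 1-18 are SNV/TFBS fields, 19-23 are DECRES


def collapse_decres(tsv_text, has_header=True):
    """Group result rows by SNV/TFBS fields, collecting their DECRES features."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    rows = [row for row in reader if row]
    if has_header:
        rows = rows[1:]
    grouped = {}
    for row in rows:
        key = tuple(row[:N_TFBS_COLUMNS])
        decres = tuple(row[N_TFBS_COLUMNS:])
        features = grouped.setdefault(key, [])
        # Keep only rows that actually carry DECRES information.
        if any(field not in ("", ".") for field in decres):
            features.append(decres)
    return grouped
```

Each key in the returned dictionary is one SNV/TFBS impact, and its value lists the distinct DECRES features (one per cell line) that overlap the SNV.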

For a more detailed explanation of the DECRES enhancer/promoter prediction methods please refer to: http://biorxiv.org/content/early/2016/02/28/041616