
LDPlotter Help


Introduction
The LDPlotter Tool allows conversion of a Nickerson Lab prettybase format file into a plot showing pairwise LD of various types (r², r, D' and abs(D)). r² values are calculated using an iterative EM algorithm taken from "Estimation of LD in randomly mating populations", WG Hill, Heredity:33(2), 229-239 (1974). I think the implementation is correct, but caveat emptor. This software is made available without any kind of guarantee of fitness for any purpose whatsoever. It may not work correctly. If you miss the Nobel prize because of my mistake, tough. On the other hand, if you win one, an acknowledgement during your acceptance speech would be appreciated. If you detect reproducible errors, please immediately

Example
This example will walk through the generation of a plot like the one below:
I. Genename

The gene name can automatically be included in the plot title, and is also used to label the optional SNP Map which is drawn along the diagonal of the LD Plot. You can change the gene name in the gray box below to anything you like:




II. Prettybase

Input is a standard prettybase file. Use the file dialog to locate the prettybase file you would like to use on your computer.



Alternatively, for this exercise, you can use the following data (or modify it if you would like). The data represent a set of three SNPs (located at positions 100, 200, and 300 in the gene). Each SNP has been genotyped in six individuals from two different populations (H001, H002, H003, M001, M002, M003), giving a total of 18 data points:
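To make the layout of such input concrete, here is a minimal Python sketch of reading prettybase-style data. It assumes each line holds four whitespace-separated columns (SNP position, sample ID, and the two alleles); that column layout, the function name parse_prettybase, and the genotypes shown are illustrative assumptions, not the data or the parser used by this page.

```python
from collections import defaultdict


def parse_prettybase(text):
    """Group genotypes by SNP position: {position: {sample_id: (allele1, allele2)}}.

    Assumes four whitespace-separated columns per line (an assumption about
    the format, not something this help page specifies).
    """
    genotypes = defaultdict(dict)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError("expected 4 columns, got: %r" % line)
        position, sample, allele1, allele2 = fields
        genotypes[int(position)][sample] = (allele1, allele2)
    return dict(genotypes)


# Made-up genotypes for illustration only (NOT the example data from this page).
example = """\
100 H001 A G
100 M001 G G
200 H001 C C
200 M001 C T
"""
snps = parse_prettybase(example)
```

Grouping by position first keeps all genotypes for one SNP together, which is the shape a pairwise LD calculation would need.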




III. LD Type

Indicates which measure of LD you would like to plot:
r² = D² / (PA * PB * (1 - PA) * (1 - PB))
r = sqrt(r²)
D' = D/Dmax
abs(D) = | PAB - (PA * PB) |
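The four measures can be sketched in Python as follows. This is a minimal illustration computed directly from known haplotype and allele frequencies; it does not include the EM estimation step mentioned in the Introduction. The function name ld_measures and the sign-dependent choice of Dmax (the usual Lewontin normalisation) are assumptions, since this page does not define Dmax.

```python
import math


def ld_measures(p_ab, p_a, p_b):
    """Return D, abs(D), r2, r and D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of alleles A and B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * p_b * (1 - p_a) * (1 - p_b))
    # Assumption: Dmax depends on the sign of D (Lewontin's normalisation).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # D' = D / Dmax as written above, so the sign of D is kept.
    d_prime = d / d_max if d_max > 0 else 0.0
    return {"D": d, "abs(D)": abs(d), "r2": r2, "r": math.sqrt(r2), "D'": d_prime}


complete_ld = ld_measures(0.5, 0.5, 0.5)   # A always travels with B
no_ld = ld_measures(0.25, 0.5, 0.5)        # loci are independent
```

With these inputs, complete_ld gives r2 and D' of 1, while no_ld gives 0 for every measure.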



IV. Configure Populations

This text area allows you to configure how the LD Plotter will split the samples in the prettybase file into different populations. The field should have one line for each population represented in your dataset. Each line should be a population identifier, followed by a colon, followed by a description of the population. Leaving this text area empty indicates that you do not wish to partition your sample set, and would instead like all of the samples to be treated as a single population.
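As an illustration of the line format described above, the following Python sketch parses "identifier: description" lines into a dictionary. The function name parse_populations and the identifiers "H" and "M" are hypothetical; how the tool matches identifiers to sample IDs is not stated here and is not modelled.

```python
def parse_populations(config_text):
    """Parse 'identifier: description' lines into {identifier: description}.

    An empty or whitespace-only field yields an empty dict, which a caller
    could treat as "all samples form a single population".
    """
    populations = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        identifier, sep, description = line.partition(":")
        if not sep:
            raise ValueError("missing colon in population line: %r" % line)
        populations[identifier.strip()] = description.strip()
    return populations


# Hypothetical identifiers and descriptions, for illustration only.
config = "H: Hypothetical population one\nM: Hypothetical population two"
populations = parse_populations(config)
unpartitioned = parse_populations("")
```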




V. Plot Title

The plot title has two modes of operation: a simple mode, and an advanced mode for users who are familiar with Python format strings and would like to specify exactly what the plot title should look like.

  • Simple: With a simple plot title, we begin with an arbitrary string such as: and then add information about the plot to the heading:
    1. Append Gene name to plotTitle: The gene name entered above will be appended to the title of the plot.
    2. Append Population to plotTitle: Each plot will be labeled with the population. If you are splitting your dataset by populations using the Population Configuration above, it is highly recommended that you keep this option checked. Otherwise, you will not know which plot belongs to which population.
    3. Append LD type to plotTitle: The type of LD measure calculated will be displayed in the plot title.
    4. Append minraf to plotTitle: The minraf used for the run will be displayed in the plot title.
  • Advanced: The advanced title option allows you to input the plot title as a legal Python format string. This format string is evaluated against a dictionary of variables/values (plotTitle % variables). At the current time, the dictionary is populated with the variables minraf, population, and ldtype. These are the same variables which are available in the "Simple" option above, but you have the flexibility of arranging the items however you would like in the title. Errors in the title will be displayed in the title itself if at all possible.
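For readers less familiar with Python format strings, here is a short sketch of how an advanced title might be evaluated. The values in the dictionary are illustrative, and render_title is a hypothetical helper showing one way errors could end up in the title itself; it is not the tool's own code.

```python
# Illustrative values; the tool fills these in from the current run.
variables = {"minraf": 0.05, "population": "H", "ldtype": "r2"}

plot_title = "%(ldtype)s for population %(population)s (minraf %(minraf)s)"
title = plot_title % variables


def render_title(template, variables):
    """Evaluate the template, returning the error text as the title on failure."""
    try:
        return template % variables
    except (KeyError, ValueError, TypeError) as exc:
        return "Title error: %s" % exc


# "gene" is not among the documented variables, so this falls back to an error title.
bad_title = render_title("%(gene)s: %(ldtype)s", variables)
```

Each %(name)s placeholder is looked up by name in the dictionary, so the items can appear in any order and any number of times.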



VI. Miscellaneous

  • Minraf: You can set an arbitrary threshold for the minimum rare allele frequency (minraf). The default value will not exclude SNPs based on rare allele frequency. If you specify a higher value, SNPs whose rare allele frequency falls below this threshold in any population will NOT appear in the plot. More information ...

  • Color Scheme: Several color schemes can be used to indicate the extent of LD between two loci:
    Blue
    Blue/Red
    Grayscale
    Red
  • Circles: The extent of LD between two loci is drawn by default as a colored square at the intersection point of the two loci. Checking this box will plot colored circles instead.
  • SNP Map: The SNP Map is a representation of the gene running along the diagonal of the half-matrix plot which shows the relative position of each SNP locus in the gene. It is only useful if you are using SNP IDs which correspond directly to the position of the SNP in the gene.
  • Full-matrix:
  • Numerical: This option allows you to download a spreadsheet of LD values instead of a graphical plot. This is useful if you would like to process the information using another program, or if you have another plotting program.
  • Download: Instead of displaying the results in the browser window, this option prompts you for a location on your computer where the file will be saved.
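
The Minraf rule in this section (a SNP is dropped if its rare allele frequency is below the threshold in any population) can be sketched in Python as follows. The function names and the frequencies are hypothetical, and the sketch assumes the rare allele frequencies per population have already been computed.

```python
def passes_minraf(rare_allele_freqs, minraf):
    """rare_allele_freqs: {population: rare allele frequency for one SNP}.

    The SNP is kept only if no population falls below the threshold.
    """
    return all(freq >= minraf for freq in rare_allele_freqs.values())


def filter_snps(snp_freqs, minraf):
    """Return the positions of SNPs that would still appear in the plot."""
    return [
        snp
        for snp, freqs in sorted(snp_freqs.items())
        if passes_minraf(freqs, minraf)
    ]


# Hypothetical rare allele frequencies per SNP position and population.
snp_freqs = {
    100: {"H": 0.30, "M": 0.25},
    200: {"H": 0.02, "M": 0.40},  # rare in H only, but still excluded
    300: {"H": 0.10, "M": 0.10},
}
kept = filter_snps(snp_freqs, 0.05)
kept_all = filter_snps(snp_freqs, 0.0)
```

Because the test is applied per population, a SNP that is common in one population can still be excluded when it is rare in another.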