Prettybase

The following is a brief discussion of the Prettybase format, developed by the SeattleSNPs project for easily representing SNP information for a group of subjects over a given span of reference sequence. This format is used for input to most of the tools on this website.

  • Each row has 4 fields separated by whitespace: SNP offset, subject ID, allele 1, allele 2.
  • Missing alleles are written as N or -
  • Indel alleles can be written as "-" and "+", or as "-" and the actual inserted allele (e.g. ATGTC)
  • If subject IDs begin with a letter, it will be assumed that this letter indicates the population to which this subject belongs. If a particular tool is configured to group subjects by population, this character will be used to perform the grouping.
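
The rules above can be sketched as a small parser. This is only an illustration, not the code used by the tools on this website: the names Genotype and parse_prettybase are hypothetical, and skipping blank lines and rejecting rows that do not have exactly 4 fields are assumptions of this sketch.

```python
from typing import Iterable, Iterator, NamedTuple


class Genotype(NamedTuple):
    """One Prettybase row (hypothetical name, for illustration only)."""
    offset: str    # SNP offset, kept as text so leading zeros survive
    subject: str   # subject ID, e.g. "A001"
    allele1: str
    allele2: str


def parse_prettybase(lines: Iterable[str]) -> Iterator[Genotype]:
    """Yield one Genotype per non-blank line of whitespace-separated text."""
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 4:
            raise ValueError(
                f"line {lineno}: expected 4 fields, got {len(fields)}"
            )
        yield Genotype(*fields)


records = list(parse_prettybase(["000111 A001 A C", "", "000111\tA002\tC\tC"]))
```

The offset is kept as a string here so that it round-trips exactly as written; a tool that needs to sort or compare positions would convert it to an integer.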

Example:

The following example shows a Prettybase file with 3 loci (000111, 000222, and 000333). The names of these loci correspond to the relative offsets of the SNP loci. Whether these offsets are relative to an absolute chromosome location, a transcription start site, or some reference sequence is irrelevant, as long as they are all within the same coordinate system.

Each locus has been genotyped in 3 subjects (A001, A002, and A003). Subject names must begin with a single letter that indicates the population to which the subject belongs. Here, "A" indicates that these subjects belong to our "Asthmatic" population. Some of the tools can use this identifier to split the subjects by population and analyze each population separately. Other than this splitting, the identifier only affects the labels on certain plots and tables. Population identifiers include:

  • A: Asthmatic
  • D: African American
  • E: European American
  • H: Hispanic American
  • S: Cases
  • L: Controls
  • F: Combined
  • C: Cell Line
  • X: Unknown
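
As an illustration of how a tool might use these identifiers, the sketch below groups subject IDs by their leading letter. The names POPULATIONS, population_of, and group_by_population are hypothetical, and placing IDs that do not start with a letter under None is an assumption of this sketch rather than documented behavior.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Population codes copied from the list above.
POPULATIONS = {
    "A": "Asthmatic",
    "D": "African American",
    "E": "European American",
    "H": "Hispanic American",
    "S": "Cases",
    "L": "Controls",
    "F": "Combined",
    "C": "Cell Line",
    "X": "Unknown",
}


def population_of(subject_id: str) -> Optional[str]:
    """Return the leading letter of a subject ID, or None if there is none."""
    first = subject_id[:1]
    return first if first.isalpha() else None


def group_by_population(subject_ids: Iterable[str]) -> Dict[Optional[str], List[str]]:
    """Group subject IDs by population code, preserving input order."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for subject_id in subject_ids:
        groups[population_of(subject_id)].append(subject_id)
    return dict(groups)


groups = group_by_population(["A001", "E002", "A003", "123"])
```

A label for a plot or table could then be looked up with POPULATIONS.get(code, code), which falls back to the raw code for letters not in the list.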

The prettybase file described above would look like this:

000111    A001    A    C
000111    A002    C    C
000111    A003    A    A
000222    A001    T    G
000222    A002    G    G
000222    A003    T    T
000333    A001    C    A
000333    A002    A    A
000333    A003    A    A
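
To show how the rows above might be consumed, the following sketch tallies allele counts per locus from the example file. The function name allele_counts is hypothetical. Treating only "N" as missing is an assumption made here, because "-" can also denote a deletion allele; rows without exactly 4 fields are skipped.

```python
from collections import Counter, defaultdict
from typing import Dict

EXAMPLE = """\
000111    A001    A    C
000111    A002    C    C
000111    A003    A    A
000222    A001    T    G
000222    A002    G    G
000222    A003    T    T
000333    A001    C    A
000333    A002    A    A
000333    A003    A    A
"""


def allele_counts(text: str) -> Dict[str, Counter]:
    """Count how often each allele appears at each SNP offset."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # assumption: skip blank or malformed rows
        offset, _subject, allele1, allele2 = fields
        for allele in (allele1, allele2):
            if allele != "N":  # assumption: only "N" is treated as missing
                counts[offset][allele] += 1
    return dict(counts)


counts = allele_counts(EXAMPLE)
```

For the example file, locus 000333 has five copies of A and one of C, while loci 000111 and 000222 each have three copies of each of their two alleles.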

An example of a Prettybase document "in the wild" can be seen here