

  1. General Questions
  2. Bioinformatics
  3. Laboratory Methods
  4. Statistical Methods
  5. Epidemiology
  6. SNPper
  7. Citation
  8. Haplotypes
    1. Q: What is a PGA?

      A
      On September 30, 2000, the NHLBI launched the Programs for Genomic Applications (PGAs). This program is a major initiative to advance functional genomic research related to heart, lung, blood, and sleep health and disorders.

      The goals of the PGAs include developing information, tools, and resources to link genes to biological function on a genomic scale. All the information, reagents, and tools developed in the PGAs will be freely available in a timely manner to the research community.

      More information


    2. Q: What are the aims of this PGA?

      A
      The goal of the IIPGA is to discover and model the associations between nucleotide sequence variations, primarily Single Nucleotide Polymorphisms (SNPs) and Insertion Deletion polymorphisms (Indels), in the genes of the innate immunity pathway in humans.
    3. Q: Why innate immunity?

      A
      Asthma, chronic obstructive pulmonary disease, myocardial infarction, and deep venous thrombosis are common, complex diseases associated with significant morbidity and mortality. An important factor in the pathogenesis of all four conditions is the development of a local inflammatory process in which the innate immune system plays a major role.
    4. Q: Why a private site?

      A
      While many of the major deliverables for the IIPGA project are to be made public over the WWW, there is a range of collaborative processes which we believe can be supported using internet technologies. The IIPGA participants are principally located in two distant cities, and physical meetings are costly in both travel expense and inconvenience. The internet offers opportunities for collaborative activities which go well beyond the already well established electronic mail traffic among the investigators.

      In addition, an internet site facilitates the archiving of documents in searchable form, viewable from any internet-enabled PC by authorised users. Email messages are often hard to locate after a month or two, whereas an archived discussion forum is a more or less permanent record of discussions and of decision making. Similarly, while a document which needs to be shared can be emailed to all participants, it is arguably more efficient and certainly more secure to make it available from a secured web site such as this.

      One of the facilities available on this web site is a new form of collaborative document which allows viewers to edit the pages and add new pages. There is a link to the main collaborative document on the menu bar at the right of this page.
    5. Q: What facilities are available?

      A
      This site is relatively new and will evolve to suit the needs of participants. We have prepared a number of facilities which we believe will prove useful. We will respond to demand from users - if there is a facility you would like to use, please let the webmaster know.
    6. Q: What technologies are in use?

      A

      This is an SSL-protected web site using OpenSSL, mod_ssl and the Apache web server. All communication between your PC and the server is encrypted using 128-bit keys, making eavesdropping virtually impossible.

      The pages you are viewing are all generated dynamically by an open-source web application framework called Zope. Silly name, powerful software. It enables the entire site to be scripted so that a consistent UI is easy to impose. Zope is mostly Python with some C code for speed. It comes with its own web server which talks to Apache via FastCGI. It has an object database and its own ORB. It's easy to connect to LDAP for authentication and to Oracle or MySQL databases for RDBMS support.

      Extending Zope is trivial using Python scripts. With a little more effort, an entirely new Zope Product can be built which literally drops into place as a new Zope extension. Zope has a very active developer community and is professionally supported by Zope Corporation (Zope is free to use, but of course you have to pay them for consulting or projects).



    1. Q: What is the Bioinformatics group?

      A
      The Bioinformatics Group for the IIPGA is supported within the larger context of the Channing Laboratory IT infrastructure and is serviced by a mix of dedicated and part time personnel, including support from The Children's Hospital Bioinformatics group. The general role of bioinformatics within the IIPGA is to add value to IIPGA data. The goal of the group is to automate as much as possible of the SNP discovery data pipeline so as to improve the quality, speed and efficiency of the whole process. The Bioinformatics Group makes use of tools and resources developed elsewhere and in return offers some of its own freely for use by other groups.
    2. Q: What are the major responsibilities of the group?

      A
      • Development, management and maintenance of the core LIMS (OraGen) used for specimen tracking, genotyping and phenotype management in the Genotyping Laboratory.
      • Data curation and data transformation for analysis.
      • Oracle RDBMS management, backup and disaster recovery planning.
      • Public and private Web based resource development, management and maintenance.
      • Related systems and application programming.

    3. Q: What hardware infrastructure is in place to support the group?

      A
      Our philosophy is to make use of existing Sun based Channing infrastructure while developing experience and expertise in the use of commodity PC hardware for clustering and related high performance and high availability computing applications. Client machines are a mix of Windows PCs, Sun workstations and servers, and Linux PCs. Backend machines include Oracle production and development servers, a main number crunching server and various fileservers.
    4. Q: What software is being used?

      A
      In general, open source software is preferred over proprietary or closed source software wherever the two offer similar functionality. Reusable objects are preferred for all new programming tasks. System software is a mix of Sun Solaris, Windows 95 through 2000 and RedHat Linux. Infrastructure and services software includes Oracle for all database requirements; Apache, OpenSSL, Zope and Python for all Web applications; and mainly Python, with some Perl and SAS, for scripting.
    5. Q: What specific tools are being used for this PGA?

      A
      SNP discovery software from Deborah Nickerson's Laboratory (Perl scripts, Phred, Phrap, Polyphred and Consed) is used in the sequencing operation. A wide range of freely available genetic statistical tools such as Arlequin, Phase, Solar and Pedcheck are used. Where necessary, special purpose scripts are written (eg for converting Nickerson Lab. 'prettybase' files into Arlequin and Phase input files). A substantial dynamic web infrastructure has been built with Zope and Python to support collaboration and dissemination activities.
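
The kind of conversion script mentioned above might look roughly like the following Python sketch. It is not the IIPGA script. It assumes each prettybase row holds four whitespace-separated fields (site position, sample ID, allele 1, allele 2) and that the target is a PHASE-style layout (number of individuals, number of loci, a "P" line of positions, a line of locus types, then an ID line and two allele lines per individual). Both layouts, the function name `prettybase_to_phase`, and the treatment of "N" as a missing call are assumptions to check against the documentation of the actual tools.

```python
from collections import defaultdict


def prettybase_to_phase(lines, missing="?"):
    """Convert prettybase-style rows into PHASE-style input text.

    Assumed row layout (hypothetical): site, sample ID, allele 1, allele 2.
    An allele of 'N' is treated as a missing call (an assumption).
    """
    genotypes = defaultdict(dict)  # sample ID -> {site: (allele1, allele2)}
    sites = set()
    for line in lines:
        fields = line.split()
        # Skip blank lines, headers and anything else that is not a data row.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        site, sample, a1, a2 = fields
        site = int(site)
        sites.add(site)
        calls = tuple(missing if a.upper() == "N" else a.upper() for a in (a1, a2))
        genotypes[sample][site] = calls

    ordered_sites = sorted(sites)
    samples = sorted(genotypes)
    out = [
        str(len(samples)),
        str(len(ordered_sites)),
        "P " + " ".join(str(s) for s in ordered_sites),
        "S" * len(ordered_sites),  # every locus declared as a biallelic SNP
    ]
    for sample in samples:
        # Sites with no row for this sample are written as missing.
        calls = [genotypes[sample].get(s, (missing, missing)) for s in ordered_sites]
        out.append("#" + sample)
        out.append(" ".join(c[0] for c in calls))
        out.append(" ".join(c[1] for c in calls))
    return "\n".join(out) + "\n"


rows = ["100 D001 A G", "200 D001 C C", "100 D002 A A"]
phase_text = prettybase_to_phase(rows)
```

Automating the conversion this way keeps the hand-off between discovery and haplotyping tools free of manual editing, which fits the group's stated preference for automation.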
    6. Q: What measures are in place for quality control?

      A
      Automation is used whenever possible to avoid human error, which is particularly problematic for boring or repetitive tasks.
    7. Q: What is the difference between Phase and Phase 2?

      A
      Dr. Ross Lazarus: Phase 2 is a more recent release. The algorithm has changed somewhat - in theory, it's better. Unfortunately, with 23 or 24 subjects, there's considerable uncertainty for any statistical haplotyping method. That's one reason I (personally) am exploring other, less variable, tagging methods...
    8. Q: I noticed that the htSNPs selected by the programs have changed over the last few months, why is this and will the dataset be continually updated and if so how often?

      A
      Dr. Ross Lazarus: They shouldn't change, but I know they sometimes do. When we rerun our pipeline after fixing bugs, the whole process is rerun over every gene. I hope this doesn't break any of your work ...
    9. Q: When you follow the link for 'Phase output for Europeans only' it gives different tagging SNPs for the European American group, so what exactly is the population used for this output?

      A
      Dr. Ross Lazarus: The reports which show haplotypes for both European and African Americans use all SNPs at 10% or greater MAF in either sample - so the SNPs are different (and so are the tags!) from SNPs at 10% or greater MAF in Europeans only. This is one of the many nasty wrinkles we have to deal with - in trying to make the haplotypes comparable between races, we use a common set of SNPs, but for designing experiments it's better to use only SNPs at the specified MAF from the target race.
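
To see why the two reports can list different SNPs, here is a minimal Python sketch of the selection rule described in the answer above. The function names, the sample labels "EA" and "AA", and the toy genotypes are all hypothetical, and the code assumes biallelic SNPs; it is an illustration, not the IIPGA pipeline.

```python
def minor_allele_frequency(genotypes):
    """MAF of one biallelic SNP, given a list of (allele, allele) genotypes."""
    counts = {}
    for a1, a2 in genotypes:
        for allele in (a1, a2):
            counts[allele] = counts.get(allele, 0) + 1
    if len(counts) < 2:
        return 0.0  # monomorphic, or no genotypes at all
    return min(counts.values()) / sum(counts.values())


def select_snps(data, samples, threshold=0.10):
    """Return SNPs whose MAF reaches the threshold in at least one named sample.

    data maps SNP name -> {sample label: list of genotypes}.
    """
    chosen = []
    for snp, by_sample in data.items():
        mafs = [minor_allele_frequency(by_sample.get(s, [])) for s in samples]
        if any(m >= threshold for m in mafs):
            chosen.append(snp)
    return sorted(chosen)


# Toy data: snp2 is common only in the second sample.
data = {
    "snp1": {"EA": [("A", "G")] + [("A", "A")] * 4, "AA": [("A", "A")] * 5},
    "snp2": {"EA": [("C", "C")] * 5, "AA": [("C", "T")] * 2 + [("C", "C")] * 3},
    "snp3": {"EA": [("G", "G")] * 5, "AA": [("G", "G")] * 5},
}

combined = select_snps(data, ["EA", "AA"])  # SNPs common in either sample
target_only = select_snps(data, ["EA"])     # SNPs common in one sample only
```

Here `combined` includes snp2 while `target_only` does not, so any tagging step run on the two sets starts from different SNPs.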




          1. Q: What is SNPper?

            A

            SNPper is a program for the retrieval, analysis and display of human SNPs extracted from public databases (dbSNP, TSC, goldenPath).


          2. Q: What requirements are there to run SNPper?

            A

            SNPper is accessible via a web interface, and is therefore usable from any relatively modern web browser.


          3. Q: Where is SNPper?

            A

            http://snpper.chip.org/


          4. Q: What input is required by SNPper?

            A

            One or more genes, a list of SNP names, or a region of the genome.


          5. Q: What is the output?

            A
            Extensive information on all the SNPs in the region of interest. The output also includes information about genes, annotated sequences, graphics and data export in several formats.
          6. Q: Who should I contact for more information about SNPper?

            A
            Alberto Riva
            Children's Hospital Informatics Program
            Enders Building, room 561
            320 Longwood Ave, Boston MA 02115
            Email: [email protected]

          7. Q: Is a license required?

            A

            No. Registration is suggested but optional.


          8. Q: How should I cite SNPper?

            A
            Riva A and Kohane IS. A Web-Based Tool to Retrieve Human Genome Polymorphisms from Public Databases. AMIA 2001 Annual Conference, Washington DC, November 2001.
          9. Q: Is the source made available?

            A
            No.

          1. Q: Is there a general policy on data release?

            A
            Yes. The NIH policy mandates a maximum of 60 days between discovery and publication. In practice, the delay between discovery (dated from completion of quality control) and publication on the public website is currently of the order of 10 to 20 days for both SNP and genotype data.
          2. Q: Are there specific policies for release of SNP discovery information?

            A
            Yes. Following quality control review, raw data from sequencing and SNP identification is transferred from the sequencing facility in Tucson to the Bioinformatics group in Boston, where it is processed through the IIPGA Bioinformatics pipeline and published in detail on the private website for review by all IIPGA investigators.
            The reference sequence, raw genotype data for each subject sequenced and SNP flanking sequences, sample specific tables of allele and genotype frequencies and counts are prepared and published for review within a few hours of transfer of the raw data from the Sequencing facility. Haplotype inference and pairwise LD estimation involve substantial computing time and may take several days to complete. These are published for review as soon as they become available.
            Within one week of internal review release, all material is made available on the public website.
            To ensure that visitors can quickly locate newly published material, the public site home page is automatically updated to show all SNPs released within the previous 30 days. A wide range of prepared reports is available for all published SNPs, but to ensure that visitors can obtain SNP data in formats which suit their individual needs, we also provide a dynamic web application which lets site visitors tailor their own reports. Visitors may choose from a variety of report formats, limit the output to one or more samples, and exclude SNPs below a chosen minimum allele frequency. Like the prepared reports, these visitor-specified reports may be viewed as web pages or downloaded as text or spreadsheet files.
            Software to prepare material for submission of SNP discoveries to dbSNP is nearing completion. SNP discoveries will be submitted to dbSNP as soon as practicable.
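
For readers unfamiliar with pairwise LD, the following Python sketch shows the standard D, D' and r² calculations for two biallelic loci. It assumes phased haplotypes are already available (for example from a haplotype inference step); which LD statistics the IIPGA pipeline actually reports is not stated here, and the function name `pairwise_ld` is hypothetical.

```python
def pairwise_ld(haplotypes):
    """Return (D, D', r^2) for two biallelic loci.

    haplotypes is a list of (allele_at_locus_1, allele_at_locus_2) tuples,
    one per phased chromosome. The alleles on the first chromosome are
    used as the reference alleles.
    """
    n = len(haplotypes)
    ref_a, ref_b = haplotypes[0]
    p_a = sum(1 for h in haplotypes if h[0] == ref_a) / n
    p_b = sum(1 for h in haplotypes if h[1] == ref_b) / n
    p_ab = sum(1 for h in haplotypes if h == (ref_a, ref_b)) / n

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("LD is undefined when either locus is monomorphic")

    d = p_ab - p_a * p_b
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r_squared = d * d / denom
    return d, d_prime, r_squared


# Two loci in complete LD: only two of the four possible haplotypes occur.
complete = pairwise_ld([("A", "C")] * 5 + [("G", "T")] * 5)
# Two loci in linkage equilibrium: all four haplotypes are equally common.
independent = pairwise_ld([("A", "C"), ("A", "T"), ("G", "C"), ("G", "T")])
```

The calculation must be repeated for every pair of SNPs, so the work grows with the square of the number of SNPs in a gene.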
          3. Q: Are there specific policies for release of genotyping data?

            A
            Yes. As soon as genotyping quality control activities are completed, genotype and phenotype data are extracted from the Laboratory Information System Oracle database in the form of SAS and other analysis format files. SAS and other statistical analysis programs are immediately commenced and the outputs sent to the Statisticians for review as soon as they are available.
            Case-control genotyping summary and detailed reports are immediately released on the public site after review and approval.
          4. Q: How do I cite data from this website?

            A
            Published data are freely available, but users are asked to cite the IIPGA as the source if they make use of the material. The public website links to a document which sets out in detail the style of citation and acknowledgement requested for use of any data published on the IIPGA public web site. The contents of that document were developed from a model document provided by the PGA Coordinating Committee.
          5. Q: Is there anything else I should know?

            A
            Yes. You should be aware of our "Disclaimer of Liability for IIPGA published data".
            Every web page on the IIPGA public site has a link (at the end of the page) to a formal disclaimer of liability. This document indicates that while every care is taken to ensure that the material is accurate and complete, no guarantees are given and no liability will be accepted for damages arising from any use of the material.

          1. Q: I am new to the concept of haplotypes, and would like some introductory information on their significance with respect to genotype and allele frequencies.

            A

            While we do look at allele and genotype frequency distributions, loss of statistical power from the need to adjust Type I error control for multiple statistical tests is an increasingly important problem. In theory, haplotypes may provide better statistical power in situations where many alleles have been typed in a gene. There are many recent publications worth reviewing on this topic:

            Genet Epidemiol. 2002 Oct; 23(3):221-233
            Pharmacogenomics. 2001 Feb; 2(1):11-24
            Genome Res. 1999 Aug; 9(8):720-723
            Hum Hered. 2003; 56(1-3):18-31

            -- Ross Lazarus
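
The multiple testing problem mentioned in this answer can be made concrete with a short Python sketch. It uses the Bonferroni correction purely as a familiar example (the answer does not say which adjustment is used) and assumes the tests are independent; the function names are hypothetical.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across n independent tests,
    each run at the unadjusted significance level alpha."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the familywise error at or below alpha."""
    return alpha / n_tests


# Testing 10 SNPs one at a time at alpha = 0.05:
unadjusted_risk = familywise_error(0.05, 10)   # roughly 0.40
per_test_level = bonferroni_threshold(0.05, 10)  # 0.005
```

Each additional single-SNP test pushes the per-test threshold lower, which is the loss of power described above. Testing a smaller number of haplotypes in place of many individual SNPs is one way to reduce the number of tests.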
          2. Q: Why is analysis of haplotypes important in association studies?

            A

            Remember that transcription takes place at the haplotype level, not the genotype level. If a high risk haplotype encodes a defective protein, possesses a defective transcription factor binding site, or creates or removes a splice site, then of course it can have direct biological significance.

            -- Ross Lazarus
          3. Q: Is it possible to determine one's haplotype from genotype information alone?

            A

            It is sometimes possible - but only if an individual has a genotype compatible with only two haplotypes.

            For example, an individual homozygous at 10 loci has two obvious and unambiguous haplotypes! If the same individual were heterozygous at an 11th locus, the situation remains unchanged.

            However, with two or more heterozygous loci, more than one pair of haplotypes is compatible with the genotype, and the number of candidate pairs doubles with each additional heterozygous locus. Statistical inference over a sample of individuals is then used to estimate the most likely set of haplotypes given the genotypes, and the most likely pair of haplotypes for each individual. Bayesian (eg Phase) and EM (eg SNPHAP) methods are the two most common approaches. Like most statistical methods, they only apply to a sample of subjects - one individual is not enough unless that individual is unambiguous as described above.

            -- Ross Lazarus
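
The ambiguity described in this answer can be checked by brute force. The Python sketch below lists every pair of haplotypes compatible with one unphased genotype; it is only an enumeration, not what Phase or SNPHAP do, and the function name and single-letter allele coding are assumptions.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype is a list of (allele, allele) tuples, one per locus, with
    single-character alleles.
    """
    het = [i for i, (a, b) in enumerate(genotype) if a != b]
    pairs = set()
    # The first heterozygous locus is held fixed so that mirror-image
    # pairs (h1, h2) and (h2, h1) are not counted twice.
    for flips in product([False, True], repeat=max(len(het) - 1, 0)):
        flip_at = dict(zip(het[1:], flips))
        h1, h2 = [], []
        for i, (a, b) in enumerate(genotype):
            if flip_at.get(i, False):
                a, b = b, a
            h1.append(a)
            h2.append(b)
        pairs.add(("".join(h1), "".join(h2)))
    return sorted(pairs)


# One heterozygous locus: a single, unambiguous pair.
one_het = compatible_haplotype_pairs([("A", "A"), ("C", "T"), ("G", "G")])
# -> [('ACG', 'ATG')]

# Two heterozygous loci: two candidate pairs, so phase is ambiguous.
two_het = compatible_haplotype_pairs([("A", "G"), ("C", "T")])
# -> [('AC', 'GT'), ('AT', 'GC')]
```

With k heterozygous loci the list holds 2^(k-1) pairs, which is why inference over a sample is needed to decide which pair is most likely.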




         