HGDP Selection Browser Help


Introduction

The HGDP Selection Browser is a tool for exploring the genetic signatures of natural selection in the human genome. The underlying data were generated by Li et al. (2008) and consist of 938 individuals from 53 populations genotyped on the Illumina 650K platform. Details about the populations in this panel are available in Li et al. (2008) and Rosenberg et al. (2002); we refer the reader to these publications for further information about these aspects of the data.

We have calculated summary statistics of haplotype structure and population differentiation on these data; the browser contains these summaries as well as various tools for visualizing the data.


Navigating the browser

To find a genomic region/gene of interest, use the search box. The data are searchable by three landmarks: chromosomal region, SNP, or gene. To search by chromosomal region, enter the chromosome number preceded by "chr" and the genomic coordinates separated by two periods--for example, "chr12:13000000..13100000". To search by gene name, simply enter the name in the search box; for example, "KITLG". To search by SNP, enter the rs number.
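
As an illustration of the chromosomal region format described above, the following minimal Python sketch splits a query such as "chr12:13000000..13100000" into its parts. The function name parse_region and the regular expression are assumptions made for this example; they are not part of the browser.

```python
import re

# Hypothetical helper for illustration only; the browser's own parser may differ.
# Format: "chr" + chromosome label, a colon, then start and end separated by two periods.
REGION_RE = re.compile(r"^chr(\w+):(\d+)\.\.(\d+)$")


def parse_region(query):
    """Split a 'chrN:start..end' query into (chromosome, start, end)."""
    match = REGION_RE.match(query.strip())
    if match is None:
        raise ValueError(f"not a chromosomal region: {query!r}")
    chrom = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("start coordinate must not exceed end coordinate")
    return chrom, start, end


print(parse_region("chr12:13000000..13100000"))  # ('12', 13000000, 13100000)
```

A query that does not match this pattern, such as a gene name or an rs number, raises ValueError in this sketch; in the browser such queries are simply handled as gene or SNP searches.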

This search will bring up the region requested, along with the summary statistics. By default, these are Fst, XP-EHH, and iHS. Fst was calculated at the level of the population groupings identified by Rosenberg et al. (2002); that is, if a SNP has high Fst, most of the variance in allele frequencies is captured by the seven labels identified in that paper. The browser plots the -log10 of the empirical p-value for each SNP--the higher the value, the more extreme (high) the Fst value is compared to the rest of the genotyped SNPs. iHS was calculated as in Voight et al. (2006) and smoothed across windows. The plotted value is the -log10 of the p-value for a window centered at the SNP--high values again indicate potential signals of positive selection. XP-EHH was calculated as in Sabeti et al. (2007). Again, the plotted value measures how extreme a SNP is with regard to the rest of the genome, and high values indicate outliers potentially due to the action of natural selection.
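
To make the plotted quantity concrete, here is a minimal Python sketch of a rank-based empirical p-value and its -log10 transform. It assumes that the empirical p-value of a SNP is the fraction of genotyped SNPs whose statistic is at least as high; the exact definition and tie handling used for the browser are not stated here, and the function name neg_log10_empirical_p is hypothetical.

```python
import math
from bisect import bisect_left


def neg_log10_empirical_p(values):
    """Return -log10 of a rank-based empirical p-value for each value.

    Assumed definition (not confirmed by the browser documentation):
    p_i = (number of values >= values[i]) / (total number of values).
    """
    ordered = sorted(values)
    n = len(ordered)
    scores = []
    for v in values:
        at_least = n - bisect_left(ordered, v)  # how many values are >= v
        scores.append(-math.log10(at_least / n))
    return scores


# Toy Fst values for ten SNPs; the last one is the most extreme.
fst = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.60]
scores = neg_log10_empirical_p(fst)
print(scores[-1])  # highest score belongs to the SNP with the highest Fst
```

Under this definition the score cannot exceed log10 of the number of SNPs, so the height of a peak reflects rank within the genome rather than a model-based significance level.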

iHS and XP-EHH have been calculated in the groupings noted in the caption--these are Bantu-speaking populations, Europeans, Middle Easterners, Central Asians, East Asians, Americans, and Oceanians. For the same statistics calculated on the level of individual populations, click on the track. This will pull up plots of the statistic calculated in individual populations.


Allele frequencies

The browser contains two tracks showing the positions of single nucleotide polymorphisms (SNPs)--those genotyped on the Illumina platform and those present in the HapMap. By default, only the positions of genotyped SNPs are shown; to display HapMap SNPs as well, check the box labeled "HapMap SNPs" and update the view. SNPs are color-coded according to their annotation in dbSNP build 128--nonsynonymous SNPs are red, synonymous SNPs green, intronic SNPs yellow, SNPs that fall in UTRs orange, those within 2kb of a gene blue, and intergenic SNPs black. SNPs for which no information was available are white. Note that some SNPs appear as genic despite not falling in a gene according to our database; these are likely SNPs annotated as falling in predicted genes that are not present in the Entrez database used here.
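
The color scheme can be summarized as a simple lookup table. The Python sketch below is only a restatement of the legend; the category keys (for example "near_gene") and the function name snp_color are labels invented for this example and are not the identifiers used by dbSNP or by the browser.

```python
# Hypothetical category keys; only the colors come from the legend above.
SNP_COLORS = {
    "nonsynonymous": "red",
    "synonymous": "green",
    "intronic": "yellow",
    "utr": "orange",
    "near_gene": "blue",      # within 2kb of a gene
    "intergenic": "black",
}


def snp_color(annotation):
    """Return the display color for an annotation, or white if none is available."""
    if annotation is None:
        return "white"
    return SNP_COLORS.get(annotation, "white")


print(snp_color("nonsynonymous"))  # red
print(snp_color(None))             # white
```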

Clicking on a SNP will pull up a pie chart and bar chart of the allele frequencies worldwide. If the ancestral state of the SNP can be inferred, the alleles are labeled as such; if not, one allele is starred. Note that allele frequencies of SNPs present in the HapMap but not present on the Illumina chip have been imputed using fastPHASE. The error rates of these imputed allele frequencies vary greatly across SNPs and populations, so we advise treating them with caution.


Haplotype plots

In the panel below the tracks are buttons for the generation of haplotype plots like those seen in Conrad et al. (2006). Clicking on the button for "Continents" or "Populations" will bring up haplotype plots either in the seven groupings identified by Rosenberg et al. (2002) or by individual population. The plots are a way of visualizing haplotype patterns in a genomic region--each row in the plot is a haplotype, and each column is a SNP. The rows are colored so that all haplotypes of the same color are identical. The algorithm for generating the plots is described in Conrad et al. (2006), and requires the definition of a "core" region. The plots are somewhat sensitive to the choice of the core region; the user can alter the choice of core by means of the pull-down menu.
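
To illustrate why the choice of core matters, the following Python sketch assigns a color index to each haplotype based only on its alleles within a chosen core. This is a deliberately simplified stand-in and NOT the algorithm of Conrad et al. (2006); the function name color_by_core, the string encoding of haplotypes, and the half-open column range are all assumptions made for this example.

```python
from collections import Counter


def color_by_core(haplotypes, core_start, core_end):
    """Give each haplotype a color index based on its alleles in the core.

    Simplified illustration only, not the Conrad et al. (2006) procedure.
    haplotypes: list of equal-length strings, one character per SNP.
    core_start, core_end: half-open column range [core_start, core_end).
    Index 0 goes to the most common core pattern, 1 to the next, and so on.
    """
    cores = [h[core_start:core_end] for h in haplotypes]
    counts = Counter(cores)
    # Rank patterns by descending count; break ties alphabetically.
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    index = {core: i for i, core in enumerate(ranked)}
    return [index[c] for c in cores]


haps = ["00110", "00111", "10110", "00110"]
print(color_by_core(haps, 0, 3))  # [0, 0, 1, 0]
print(color_by_core(haps, 2, 5))  # [0, 1, 0, 0]
```

In the toy data the same four haplotypes are grouped differently depending on which columns form the core, which is the kind of sensitivity the pull-down menu lets the user explore.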

For a view of the haplotypes that is less sensitive to the choice of core region, use the "Raw data" button. This will draw haplotype plots without the addition of colors. The haplotypes within each population are roughly ordered with respect to identity, and plotted with the derived allele in red and the ancestral allele in black. SNPs where no ancestral information is available are plotted in white and grey.


References

Conrad et al. (2006). A worldwide survey of haplotype variation in the human genome. Nature Genetics 38:1251-1260.

Li et al. (2008). Worldwide human relationships inferred from genome-wide patterns of variation. Science 319(5866):1100-1104.

Rosenberg et al. (2002). Genetic structure of human populations. Science 298(5602):2381-2385.

Sabeti et al. (2007). Genome-wide detection and characterization of positive selection in human populations. Nature 449:913-918.

Voight et al. (2006). A map of recent positive selection in the human genome. PLoS Biology 4(3):e72.