Panzea - FAQs


Why conserve genetic diversity?

Genetic diversity is essential for continued progress in breeding as well as for adaptation to future environmental challenges. Future environmental challenges include the need to adapt to new pest and disease strains or species, climate change, and pollutants. Maize is the most genetically diverse crop species, and is the most economically important in the U.S. We owe it to future generations to pass on the rich genetic legacy of maize and other crop species largely intact. The Panzea project will catalogue at the genomic scale the genetic diversity present both in maize and in its wild ancestor, teosinte.

What do we mean by "functional diversity"?

The majority of the diversity in the genome of a higher organism like maize is not expected to have any relationship to the fitness or even to the observable phenotypes of individuals. In other words, most DNA polymorphisms are genetically neutral. The overarching goal of the Panzea project is to identify the small minority of DNA polymorphisms in the maize genome that actually cause differences in phenotypic traits of agronomic, developmental, or evolutionary importance. It is the DNA variation at these sites that we consider to be "functional diversity".

What is teosinte?

Teosinte is the wild relative of maize. Together, teosinte and maize compose the genus Zea, which has four species: Z. luxurians, Z. diploperennis, Z. perennis, and Z. mays. Zea mays is in turn divided into four subspecies: ssp. mexicana, ssp. huehuetenangensis, ssp. parviglumis, and ssp. mays (maize). Molecular marker evidence has clearly shown that maize was domesticated from Z. mays ssp. parviglumis (Doebley 1990, Matsuoka et al. 2002).

What are maize landraces?

Maize landraces are forms of maize domesticated by the indigenous peoples of Latin and North America, and adapted to local growing conditions. The initial domestication of maize from teosinte occurred about 9000 years ago in southern Mexico (Matsuoka et al. 2002). After that, cultivation of maize spread throughout the Americas, and through artificial selection, indigenous peoples developed landraces adapted to their local growing conditions and crop uses. After the arrival of Columbus, maize cultivation spread to the Old World and additional landraces were subsequently developed in Europe, Africa, and Asia.

What is the germplasm base of maize?

The maize germplasm pool includes wild teosintes, indigenous maize landraces of Latin and North America, Old World landraces, and the inbred lines used in modern maize breeding.

What is a SNP?

The acronym SNP stands for Single Nucleotide Polymorphism. This refers to a particular nucleotide (or "base") in a DNA sequence that is variable within a species (or between related species). For example, at a certain position in a DNA sequence there may be a C (cytosine) present in some individuals but a T (thymine) present in others. SNPs represent the most basic form of genetic polymorphism. There are tens of millions of SNPs present in the genome of a typical organism. However, usually only a very small subset of these will be developed into genetic markers (SNP markers). Although it is somewhat confusing, the term "SNP" can refer both to a particular polymorphism in the genome (for which a genetic marker may or may not have been developed) or to a marker that has been developed to evaluate (or "genotype") a particular SNP polymorphism.
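
As an illustration of the definition above, the following minimal Python sketch scans a set of aligned DNA sequences and reports the positions at which more than one nucleotide is observed. The function name `find_snps` and the toy alignment are hypothetical and are not part of the Panzea tools or data; positions are counted from zero, and gaps or ambiguous bases are simply treated as missing.

```python
from collections import Counter

def find_snps(aligned_seqs):
    """Return (position, allele_counts) for each variable column of an alignment.

    Assumes the sequences are already aligned to equal length. Characters
    other than A, C, G, T (gaps, N, ambiguity codes) are ignored as missing.
    """
    if len({len(s) for s in aligned_seqs}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos, column in enumerate(zip(*aligned_seqs)):
        counts = Counter(base.upper() for base in column if base.upper() in "ACGT")
        if len(counts) > 1:  # more than one nucleotide observed at this site
            snps.append((pos, dict(counts)))
    return snps

# Hypothetical toy alignment of four individuals (not Panzea data)
toy_alignment = [
    "ACGTTCGA",
    "ACGTTCGA",
    "ACGCTCGA",
    "ACGCTCAA",
]
snps = find_snps(toy_alignment)
print(snps)
```

In this toy alignment, two columns are polymorphic, and each has only two alleles, which is the typical situation for SNPs.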

Why use SNPs?

SNPs are one of many possible genetic markers. Available types of genetic markers include isozymes, RFLPs, RAPDs, CAPS, PCR-indels, AFLPs, microsatellites (SSRs), SNPs and DNA sequence. Each type of marker has its advantages and disadvantages. The main advantages of SNPs are: (1) they are common and evenly distributed in the genome, and (2) methods of detecting (or "assaying") SNPs can be easily automated. This ease of automation is what makes SNPs "high-throughput" markers. High-throughput means that large numbers of markers can be quickly assayed in a large number of DNA samples for a small cost per assay. This property is essential in the current genomics age. The main disadvantage of SNPs is the small number of alleles typically present. Although, in theory, each SNP marker can have up to four possible alleles (A, C, G, and T), in practice, only two alleles usually are present at any given SNP (e.g., C or T). This is a consequence of the low rate of mutation or base substitution (an analogy is that lightning rarely strikes the same person twice). Microsatellites, in contrast, typically have numerous alleles (from 5 to 40 in maize; Liu et al. 2003). However, the large number of available SNPs and their low assay costs (in large-scale experiments) overcome the disadvantage of their low variability per marker.

Random vs. candidate genes?

The distinction between "random" and "candidate" genes is of great importance to our project. By random genes we refer to genes which we have chosen to study without any prior knowledge or consideration of the function of the proteins (or RNAs) that they encode. These were selected from a random set of expressed DNA sequences (DNA sequences that are copied, or transcribed, into RNA), so the only thing that we knew for certain at the time of choosing was that the sequences in fact came from genes (as opposed to intergenic sequences). By candidate genes we refer to genes of known or suspected function that are likely to be involved in the control of agronomic or evolutionary traits of interest. Traits of interest to our project include flowering time, inflorescence architecture, cob development, kernel quality, leaf development, plant architecture, and traits that differ markedly between domesticated maize and its wild progenitor species, teosinte. Candidate genes are like "hunches" or educated guesses that we follow up on with additional "detective work" (experimental verification via association mapping). Random genes provide controls for these experiments since they have very little chance of affecting particular traits of interest. In addition, random genes also provide us (by definition) with a random sample of genes from across the genome that we can use in QTL mapping studies or to answer questions such as "What proportion of genes in the genome were subjected to artificial selection during the domestication of maize?" (Wright et al. 2005).

How can our results be used to develop markers?

In the SNP discovery phase of the Panzea project, sequence alignments were produced for more than 3000 randomly chosen genes and for more than 1000 candidate genes. All of these sequence alignments are available via Panzea in a variety of formats (try a molecular diversity search using marker type 'Sequencing'). The PCR primers of the amplicons corresponding to these sequence alignments are provided, so researchers can amplify and sequence additional plants if they wish. Researchers also can utilize our alignments to develop their own SNP, indel, or CAPS markers, or they can download the 'context sequences' for the SNP markers that we have developed and validated. The context sequences were derived from the consensus sequence of our sequence alignments, and show the sequence surrounding the SNP in question, with the target SNP in square brackets and other flanking polymorphisms shown either in curly brackets or with IUPAC ambiguity codes. Currently, context sequences for all of our validated, functional SNPs that we have successfully assayed can be downloaded from our datasets page. PCR primer sequences for the SSRs (microsatellites) that we use in our project are also available via the molecular diversity search page, using marker type 'SSR'.
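
For readers who want to process context sequences programmatically, here is a minimal Python sketch of a parser. It assumes a layout such as `ACGTR[C/T]GG{A/G}TTA`, in which alleles inside the brackets are separated by a slash; that separator, the example string, and the function name `parse_context_sequence` are assumptions made for illustration, so check the downloaded files for the exact format before relying on it. The IUPAC table covers only the standard two-base ambiguity codes.

```python
import re

# Standard IUPAC ambiguity codes for two-base polymorphisms
IUPAC_TWO_BASE = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}

def parse_context_sequence(context):
    """Split a context sequence into its target SNP and flanking polymorphisms.

    Assumed (hypothetical) layout: target SNP as [C/T], flanking
    polymorphisms as {A/G} or as single IUPAC ambiguity letters.
    """
    target = re.search(r"\[([^\]]+)\]", context)
    if target is None:
        raise ValueError("no bracketed target SNP found")
    # Remove bracketed and braced sites so only plain sequence letters remain
    plain = re.sub(r"[\[{][^\]}]*[\]}]", "", context)
    return {
        "target_alleles": target.group(1).split("/"),
        "left_flank": context[:target.start()],
        "right_flank": context[target.end():],
        "braced_flanking": [m.split("/") for m in re.findall(r"\{([^}]+)\}", context)],
        "iupac_flanking": [c for c in plain if c in IUPAC_TWO_BASE],
    }

# Hypothetical context sequence (not taken from the Panzea datasets)
parsed = parse_context_sequence("ACGTR[C/T]GG{A/G}TTA")
print(parsed)
```

Knowing where the flanking polymorphisms lie matters when designing an assay, because a primer or probe placed over a variable site may fail in some lines.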

What is an inbred line?

Maize and teosinte - like humans - are naturally outcrossing organisms, which means that matings that are not under direct human control usually occur between two unrelated or distantly related parents. This results in offspring in which the two copies of a gene (one from the maternal parent and one from the paternal, or pollen parent) are often different. Such offspring, containing two different alleles at a gene or locus, are said to be "heterozygous" at that gene. Breeders, farmers and researchers value uniformity and predictability. Hence, a commonly used breeding tactic in crop species is the development of inbred lines followed by crossing between certain inbred lines to produce superior seed for planting. The inbred lines are produced by repeated generations of selfing (achieved through controlled pollination) with each subsequent generation descending from a single seed. Over generations, alleles are lost by chance at those loci that were initially heterozygous, with a 50% chance of loss of an allele each generation. Hence, the resulting inbred lines tend to have the same two alleles present at virtually every gene in the genome. In other words, inbred lines are highly "homozygous" and will almost always pass on the same allele to all of their offspring. Crossing of two inbred lines together leads to "hybrid" offspring that are uniformly heterozygous at every gene that differs between the two inbred parental lines. Breeders look for combinations of inbred lines whose offspring display "hybrid vigor" or "heterosis": superior characteristics due to serendipitous combinations of alleles at the heterozygous loci.
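
The effect of repeated selfing described above can be put in numbers. Because each generation of selfing gives a heterozygous locus a 50% chance of becoming homozygous, the expected fraction of loci that remain heterozygous halves every generation. The short Python sketch below tabulates this; the function name is hypothetical, and the calculation ignores selection and mutation.

```python
def expected_heterozygosity(generations, initial=1.0):
    """Expected fraction of initially heterozygous loci that are still
    heterozygous after the given number of generations of selfing."""
    return initial * 0.5 ** generations

# Starting from an F1 that is heterozygous at every locus that differs
# between its two inbred parents (initial = 1.0)
for g in range(0, 9, 2):
    print(g, expected_heterozygosity(g))
```

After six generations of selfing, under these assumptions, less than 2% of the originally heterozygous loci are expected to remain heterozygous.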

What is QTL mapping?

The acronym QTL refers to Quantitative Trait Locus. A QTL is a chromosomal region suspected to contain a gene (or cluster of genes) that contributes to the variation observed in a quantitative trait. QTLs are detected through QTL mapping experiments. In crop plants, these experiments utilize experimental pedigrees, usually produced from crossing two inbred lines. A commonly used QTL mapping pedigree is the F2 pedigree. The first offspring generation (the F1), resulting from the crossing of the two parental inbred lines, is uniformly heterozygous. However, in the second generation (the F2), formed by intermating among the F1, the parental alleles are segregating and most chromosomes will be recombinant mixtures of the parental chromosomes. Genes and genetic markers that are close together on a chromosome will tend to co-segregate in the F2 (the same allele combinations that occurred in one of the parents will tend to occur together in the offspring). The closer together two markers or genes are on a chromosome, the less likely the parental alleles at the two loci will be split up in the F2 as a result of recombination. This will lead to a statistical association between a gene segregating for alleles that have a measurable difference in their effect on a quantitative trait and segregating alleles at closely linked marker loci. QTLs can thus be localized to specific chromosomal segments if the trait is measured in all the F2 offspring and if all of these offspring are genotyped at hundreds of genetic markers covering the whole genome.
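
The statistical association between a marker and a trait can be illustrated with the simplest form of QTL analysis, a single-marker test. The Python sketch below computes a one-way ANOVA F statistic for a trait across the three F2 genotype classes at one marker. This is a simplified stand-in and not necessarily the method used in this project (interval mapping and related approaches are more common in practice); the function name and the toy data are hypothetical.

```python
from statistics import mean

def marker_trait_f_statistic(genotypes, trait_values):
    """One-way ANOVA F statistic of a trait across marker genotype classes.

    A large F suggests the marker is linked to a gene affecting the trait.
    Raises ZeroDivisionError if there is no within-class variation.
    """
    groups = {}
    for genotype, value in zip(genotypes, trait_values):
        groups.setdefault(genotype, []).append(value)
    grand_mean = mean(trait_values)
    n, k = len(trait_values), len(groups)
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    ss_within = sum((y - mean(v)) ** 2 for v in groups.values() for y in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical trait values for six F2 plants
trait = [10, 12, 14, 16, 18, 20]
linked_marker = ["AA", "AA", "AB", "AB", "BB", "BB"]    # genotype tracks the trait
unlinked_marker = ["AA", "AB", "BB", "AA", "AB", "BB"]  # genotype does not

f_linked = marker_trait_f_statistic(linked_marker, trait)
f_unlinked = marker_trait_f_statistic(unlinked_marker, trait)
print(f_linked, f_unlinked)
```

Repeating such a test at every marker across the genome, with an appropriate significance threshold, points to the chromosomal segments most likely to contain QTLs.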

What is association mapping?

As in QTL mapping, the goal of association mapping is to find a statistical association between genetic markers and a quantitative trait. However, in association mapping, the genetic markers usually must lie within (or directly upstream or downstream of) candidate genes suspected to contribute to the variation in that trait, and the goal is to identify the actual genes affecting that trait, rather than just (relatively large) chromosomal segments. Therefore, in order to perform association mapping, you must first make educated guesses as to which genes are likely to have a major effect on the particular trait of interest. In further contrast to QTL mapping, which is performed in the context of a pedigree, association mapping is performed at the population level: the genotypes of the candidate gene markers and the phenotypes of the corresponding trait are determined in a set of unrelated or distantly-related individuals sampled from a population. Association mapping relies on linkage disequilibrium (LD) between the candidate gene markers and the actual causative polymorphism in that gene (i.e., the actual polymorphism that causes the differences in the phenotypic trait). Hence association mapping is also referred to as 'LD mapping'. In natural populations LD will typically extend only short distances - usually less than 1500 bp in maize (Gaut & Long 2003). This is why you must have genetic markers either within or directly upstream or downstream of a candidate gene in order for association mapping to be successful (and the candidate gene must in fact have a measurable effect on the trait). Since population genetic structure (genetic differences that accumulate between isolated populations) can cause LD even at loci that are on different chromosomes, association analyses must account for population genetic structure whenever it is present in the population from which your sample has been drawn (Pritchard et al. 2000; Thornsberry et al. 2001).
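
Linkage disequilibrium between two biallelic sites is commonly summarized by the coefficient D (the difference between the observed haplotype frequency and the frequency expected if the sites were independent) and by the squared correlation r^2. The Python sketch below computes both from phased two-site haplotypes. The function name and the toy haplotypes are hypothetical, and the calculation assumes that phase is known and that both sites are polymorphic in the sample.

```python
def ld_statistics(haplotypes):
    """Return (D, r_squared) for two biallelic sites.

    haplotypes: list of two-character strings, one per sampled chromosome,
    giving the allele at site 1 followed by the allele at site 2.
    Raises ZeroDivisionError if either site is monomorphic in the sample.
    """
    n = len(haplotypes)
    a, b = haplotypes[0][0], haplotypes[0][1]  # reference allele at each site
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h[0] == a and h[1] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared

# Hypothetical samples of four chromosomes each
complete_ld = ld_statistics(["CT", "CT", "TG", "TG"])  # alleles always co-occur
no_ld = ld_statistics(["CT", "CG", "TT", "TG"])        # alleles combine at random
print(complete_ld, no_ld)
```

An r^2 near 1 means a marker is a good proxy for the neighboring site, which is what association mapping depends on; an r^2 near 0 means the marker carries no information about it.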

What are the main distinctions between QTL and association mapping?

The main differences between QTL and association mapping are: (1) the level of resolution (in terms of distance along the DNA or chromosome), and (2) the level of generality (in terms of the number of traits that can be studied with a given set of markers). (1) QTL analyses resolve the locations of genes (or gene clusters) influencing a trait only down to the level of chromosomal segments from one to 20 cM in size (roughly one million to 20 million base pairs). Association analyses, in contrast, can provide roughly three to four orders of magnitude finer resolution on the chromosomal scale, down to the level of the actual causative gene (i.e., within thousands, or even hundreds, of base pairs). (2) QTL experiments are more general than association analyses in the sense that, in a QTL experiment, the same set of marker genotypes from a pedigree can be used to examine a wide variety of traits, the only requirements being that the trait is variable in the offspring and that some of this variation is due to fairly strong genetic effects from a limited number of genes or chromosomal segments. Further, in a QTL study, it is irrelevant which genes the genetic markers come from (or whether the markers come from genes at all): all that matters is that the markers provide near complete coverage of the genome, without any large gaps (usually about 300 markers will suffice). Hence the same set of markers can be used for many different traits in a QTL analysis. Association analysis, in contrast, is "tailor-made" for specific traits, and the markers that we use in an association analysis must come from candidate genes thought to affect that trait. Each trait will have its own specific set of candidate genes. Another contrast between the two approaches is the level of control over extraneous factors that can lead to false-positive or confounding results. In association analysis, extraneous factors such as population genetic structure or population history, if not properly accounted for, can cause false-positive results. In QTL analysis, the use of experimental pedigrees provides much greater control over such factors. However, improvement of statistical approaches to association analysis to better account for such factors is an active area of research, both within this project (see TASSEL) and elsewhere.

What are RILs?

The acronym RIL stands for Recombinant Inbred Line. These are produced to form a permanent and stable QTL mapping resource. In the first step of the development of RILs, two parental inbred lines are crossed (mated) together to form a uniformly heterozygous F1 generation. The F1 are intermated (or selfed) to form an F2 generation; most individuals in the F2 will contain recombinant chromosomes resulting from crossovers between the two purely parental chromosomes present in each F1 plant. The parental alleles are said to be segregating in the F2 generation, since it is a matter of chance just which of the three combinations of parental alleles (A/A, A/B, or B/B) will occur in a given F2 plant. Numerous individuals from the segregating F2 generation then serve as the founders of corresponding RILs. Each subsequent generation of a given RIL is formed by selfing in the previous generation and with single seed descent. In this manner each RIL, after several generations, will contain two identical copies of each chromosome, with most of them being recombinant. Each individual RIL will contain a different mix of recombinant and parental chromosomes, with a unique set of recombination breakpoint locations across the genome. Taken as a group, the set of RILs form a segregant QTL mapping population which can be stably regenerated year after year via single seed descent.

How is the set of RILs generated in this project (the maize NAM population) useful?

The Panzea (Maize Diversity) project has generated a set of 5000 RILs grouped into 25 families. The 25 RIL families were derived from the F2 of crosses between the elite inbred line B73 and 25 other diverse inbred maize lines (a list of the 25 diverse maize inbred parental lines can be obtained here). This set of 5000 RILs (along with the pre-existing IBM RIL population derived from a cross of B73 and Mo17) comprises an immortalized QTL mapping resource for the maize community. We refer to these RILs as our "maize Nested Association Mapping (NAM) population" (see What is NAM? below). We have genotyped these 5000 RILs (and the IBM mapping population) with 1106 SNPs from both randomly chosen and candidate genes. These SNP genotypes (available here) provide comprehensive coverage of the genome for the purpose of gene mapping either via the NAM approach or via conventional QTL mapping. Hence, any maize researcher interested in performing a QTL or NAM analysis in these RIL populations will not need to do any further genotyping. They will only need to acquire the seed, plant it out in the field (in an appropriate experimental design) and then measure their phenotypic traits of interest. SEED ARE NOW AVAILABLE for 4,821 of the 5,000 RILs from the Maize Genetics Cooperation Stock Center. Seed for the remaining 179 RILs will be available within one year (by Spring of 2009). For those who would like to use the NAM population to map genes for their own traits of interest, but who lack the resources to grow the entire population, PUBLIC GROW-OUTS will be held in Ithaca NY in 2008, in Columbia MO in 2009 and in Raleigh NC in 2010 - click here for more details.

What is Nested Association Mapping (NAM)?

Our collection of 5000 RILs in 25 populations (plus the pre-existing IBM RIL population derived from a cross of B73 and Mo17) is collectively referred to as our "maize Nested Association Mapping (NAM) population". NAM is a new approach to the mapping of genes underlying complex traits, in which the statistical power of QTL mapping is combined with the high (potentially gene-level) chromosomal resolution of association mapping. The RILs are "nested" in the sense that they all share a common parent, B73, but each population has a different alternate parent. The NAM strategy consists of genotyping a feasible number of (e.g., about 1000-2000) common parent specific SNPs in the entire mapping population, in combination with much higher resolution genotyping of the 26 parents (e.g., by sequencing the entire genome of all 26 parents). The common parent specific SNPs are either present only in B73 or are present in B73 and rare in the other parents. These are used to classify each chromosomal segment that they define in each RIL according to whether it derives from B73 or from the corresponding alternate parent. In this manner, the high resolution genotypic sequence data can be projected from the parents onto the RIL offspring, without the need to sequence all the offspring. An association analysis can then be performed across the entire population. This takes advantage of historical recombination in the ancestors of the 26 parents in order to map the genes responsible for a given trait, potentially down to gene-level resolution. The NAM concept is further explained and demonstrated via computer simulation in Yu et al. (2008).
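
The projection step described above can be sketched in a few lines of Python. The sketch assumes the simplest possible rule: if the two genotyped markers flanking a site were both inherited from the same parent, the RIL is assigned that parent's allele at the site; if they disagree, a recombination breakpoint lies between them and the allele is left undetermined. This rule, the function name, and the labels 'B73' and 'ALT' are assumptions made for illustration; the actual NAM analysis may weight by genetic distance or use probabilities (see Yu et al. 2008 for the published design).

```python
def project_parental_allele(left_origin, right_origin, b73_allele, alt_allele):
    """Impute a RIL's allele at a site lying between two genotyped markers.

    left_origin / right_origin: 'B73' or 'ALT', the parent from which the RIL
    inherited each flanking marker. Returns None when the flanking markers
    disagree, because a breakpoint lies somewhere between them.
    """
    if left_origin != right_origin:
        return None
    return b73_allele if left_origin == "B73" else alt_allele

# Hypothetical site where B73 carries C and the alternate parent carries T
print(project_parental_allele("B73", "B73", "C", "T"))
print(project_parental_allele("ALT", "ALT", "C", "T"))
print(project_parental_allele("B73", "ALT", "C", "T"))
```

Because only the parents need high-resolution genotypes, the cost of dense genotyping scales with the 26 parents rather than with the 5000 RILs.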

How were the parental lines of the maize NAM population chosen?

The common parental line used in all 25 families is B73, the most important U.S. corn breeding line: descendants of B73 are widely deployed in U.S. production corn agriculture (and the B73 genome is being sequenced). Hence, the Panzea (Maize Diversity) project will provide an unprecedented understanding, via QTL mapping, of the genetic basis of the agronomic superiority of the B73 line. The remaining NAM parental inbred lines were chosen either on the basis of their agronomic importance in the U.S. or to capture as much of the genetic diversity present in maize as possible. The additional constraint of not being maladapted to U.S. environmental conditions was also applied. These choices were made based on DNA marker data across 94 microsatellite (SSR) loci genotyped in a broad set of 260 temperate, subtropical and tropical maize inbred lines, and on the basis of agronomic performance of the same 260 lines in plantings in Florida and North Carolina (Liu et al. 2003). A list of the 25 diverse maize inbred NAM parental lines can be obtained here. The broad sample of diversity captured in our NAM RIL germplasm resource will provide the maize research community with the opportunity to map genes involved in almost any trait of agronomic or scientific interest.

References

Doebley, J.F., 1990 Molecular evidence and the evolution of maize. Economic Botany 44(3, supplement): 6-27.
Gaut, B.S. and A.D. Long, 2003 The lowdown on linkage disequilibrium. Plant Cell 15: 1502-1506.
Liu, K., M.M. Goodman, S. Muse, J.S.C. Smith, E.S. Buckler and J. Doebley, 2003 Genetic structure and diversity among maize inbred lines as inferred from DNA microsatellites. Genetics 165: 2117-2128.
Matsuoka, Y., Y. Vigouroux, M.M. Goodman, J. Sanchez G., E. Buckler and J. Doebley, 2002 A single domestication for maize shown by multilocus microsatellite genotyping. Proceedings of the National Academy of Sciences USA 99: 8060-8064.
Pritchard, J.K., M. Stephens, N.A. Rosenberg and P. Donnelly, 2000 Association mapping in structured populations. American Journal of Human Genetics 67: 170-181.
Thornsberry, J., M. Goodman, J. Doebley, S. Kresovich, D. Nielsen and E. Buckler, 2001 Dwarf8 polymorphisms associate with variation in flowering time. Nature Genetics 28: 286-289.
Wright, S.I., I. Vroh Bi, S.G. Schroeder, M. Yamasaki, J.F. Doebley, M.D. McMullen and B.S. Gaut, 2005 The effects of artificial selection on the maize genome. Science 308: 1310-1314.
Yu, J., J.B. Holland, M.D. McMullen and E.S. Buckler, 2008 Genetic design and statistical power of Nested Association Mapping. Genetics 178: 539-551.
