DDBJ/EMBL/GenBank accession nos Y07727, Y07728
We describe the first haploid minisatellite, the human Y chromosome-specific locus, MSY1. It consists of an array of 48-114 AT-rich 25 bp repeats of at least five different variant types. A minisatellite variant repeat PCR (MVR-PCR) system gives Y-specific DNA codes, with a virtual heterozygosity of 99.9%, making MSY1 by far the single most variable locus on the Y. African populations contain the most diverged MSY1 structures. MSY1 is the only Y-chromosomal system where the characteristics of large numbers of mutations can be studied in detail: it provides a uniquely powerful tool both for the investigation of mutation in a haploid system, and for the dating of paternal lineages.
The Y chromosome is paternally inherited, and for most of its length escapes from recombination. Thus Y-chromosomal DNA polymorphisms contain information about the histories of paternal lineages (reviewed in refs 1,2). Such information should complement studies using maternally inherited mitochondrial DNA and biparentally inherited markers in the rest of the genome, contributing to our understanding of human history and evolution. This potential is now being realized as a wide variety of new markers becomes available on the Y, from base substitutions (3-7) and some indels (4,8), which can be regarded as unique events, to more rapidly mutating loci such as microsatellites (9-11). This variety can be exploited to address evolutionary questions on different timescales: slowly evolving polymorphisms define groups of chromosomes related by descent, while microsatellites can be used to assess diversity within these groups (12-14) and to distinguish between closely related populations (11).
As well as its potential as an evolutionary tool, the Y chromosome is of interest because of the effect that constitutive haploidy must have on its mutation processes. Loci on the Y are exempt from events in which sequence information is exchanged between alleles from different lineages, and therefore the Y provides a system for the study of intra-allelic mutation processes in isolation. An understanding of both inter- and intra-allelic processes is becoming more important as an increasing number of genetic disorders are shown to be associated with alterations in the structures of specific tandemly repeated loci, from trinucleotide repeat expansions (reviewed in ref. 15) to the generation of certain alleles at minisatellites (reviewed in ref. 16).
Although several microsatellites are now available (10,11,13), and large satellite arrays have been known for some time (17-19), no members of the intermediate class of tandemly repeated loci, minisatellites, have been described on the Y chromosome outside the short arm pseudoautosomal region. Questions about both evolution and mutation processes could be addressed using this class of sequences, and their variability might have applications in forensic and genealogical studies.
Minisatellites (20) consist of tandem arrays of short (10-50 bp), usually GC-rich repeat units, found in the genomes of most higher eukaryotes. As well as variation in the numbers of these units, variation in their sequence also exists; this can be assayed conveniently by minisatellite variant repeat PCR (MVR-PCR; ref. 21), where primers homologous to different repeat types generate products extending to a fixed flanking primer, thus mapping the positions of particular variant repeats along the array in a digital code.
MVR-PCR reveals the enormously high degree of polymorphism at autosomal minisatellites (21-26), and has found application from forensic studies (27) to investigations of the origins of modern humans (28). Through experiments in pedigrees and in sperm DNA, it has also allowed the details of germline mutation processes to be investigated (25,29-31). Mutations to new length arrays occur with a frequency of up to 13%, and most minisatellites studied so far show polarity in mutation, with a high frequency of recombination and gene conversion events occurring at one end of the array, leading to the suggestion that these apparently meiotic inter-allelic exchanges are controlled by a cis-acting element in the flanking DNA (29,30).
We describe here the isolation and characterization of the first haploid minisatellite, the human Y-specific minisatellite MSY1, the generation of hypervariable digital DNA codes for Y chromosomes by MVR-PCR, and a global survey of MSY1 diversity.
The probe 50f2 (DYS7; ref. 32) detects five Y-specific EcoRI fragments (Fig. 1a), corresponding to five distinct loci. The third largest fragment, 50f2/C, can be absent as a polymorphism (ref. 13; male 119 in Fig. 1a). The size of the second largest fragment, 50f2/B, mapping to interval 3E in Yp (33), varies between ~7.5 and 8.5 kb, suggesting that it might contain a minisatellite of limited length diversity.
PCR using primers flanking the repeat array amplifies the polymorphic MSY1 product, as well as a product of 203 bp (Fig. 1b). This is not seen in males who lack 50f2/C, suggesting that a smaller, invariant homologue of MSY1 (DYF155S2) lies close to the 50f2/C locus. In support of this, PCR using a 50f2/C-containing cosmid (M43G8) also yields the 203 bp product (Fig. 1b), and this system thus provides a convenient PCR assay for 50f2/C deletion polymorphisms (13).
A 585 bp HindIII-HaeIII fragment at DYF155S2, homologous to the 890 bp fragment at MSY1, was subcloned from cosmid M43G8 and sequenced (data not shown). Instead of a run of 23 A residues, the homologue has the sequence (A)7C(A)13C(A)6, and, in place of an array of 25 bp repeats, a repeat monomer. Outside these repetitive tracts, the two sequences are identical.
Southern and PCR analysis of DNAs from several unrelated male chimps, gorillas and orangutans detects no intra-specific fragment size variability (data not shown). MSY1 appears to have arisen as a result of a repeat expansion since the human-chimp divergence, and the invariant homologue may represent an unamplified progenitor copy of the minisatellite.
The MSY1 repeats are AT rich (75-80%), and have the potential to form almost perfect hairpin structures (Fig. 2b); this hairpin, rather than the first appearance within the sequence of a recognizable repeat unit, is used to define the repeat unit register. A single repeat at one end of the cloned array (arbitrarily designated 3') differs from the other 14 by two base substitutions (Fig. 2a); cloning has resulted in the deletion of 60 repeats with respect to the source chromosome [number 50 of ref. 10, in hybrid J640-51 (34)].
In order to identify further variant repeats for MVR-PCR, and also to provide partial MVR codes as a test for the fidelity of eventual PCR-generated data, nine alleles were amplified and partially sequenced from flanking primers. Five variant repeats, designated types 1-5, were identified (Fig. 2c), including the two types first seen in the clone; in this nomenclature, the monomer at DYF155S2 is a type 4 repeat. Repeat types 1-4 differ at two sites: a C/T transition at position 3, and a C/G transversion at position 13. The single type 5 repeat, which begins the array in the Biaka Pygmy male 47, differs from a type 2 repeat by a transversion at position 21. A transversion polymorphism was also identified just outside the array: the base 5 bp 5' to the first repeat unit is an A in all cases except m38, who instead has a T (Fig. 2a).
An MVR-PCR system was designed to map into both ends of the MSY1 array from the flanking primer sites (Fig. 2d). The discriminator primers have a 3' homologous sequence, slightly longer than a full repeat unit (27-29 bp) to maximize annealing temperature and thus specificity, and a 5' 20mer `tag' sequence identical to that used elsewhere (21). Although the palindromic and AT-rich nature of the repeats makes assay design difficult, one practical advantage of this locus over diploid minisatellites is that its single allele can be mapped directly, without prior isolation or ablation of alleles. Primers specific for repeat types 1, 3 and 4 discriminate well, yielding three-state MVR codes covering entire alleles, and consistent with the partial codes produced from sequencing data (Fig. 3a and b). Those alleles (including that in the Biaka Pygmy male studied by sequencing) which contain tracts of repeats which fail to amplify (`nulls') using type 1, 3 or 4 discriminator primers (Fig. 3b) are analysed additionally with a type 2 discriminator (`four-state' system: Fig. 3c).
To gain a picture of MSY1 diversity, we determined the codes for 465 Y chromosomes from worldwide population samples (populations described in Materials and Methods). Included in this set were chromosomes representing 20 of 23 haplogroups: these are monophyletic groups defined by polymorphisms such as base substitutions which represent unique or very rare events in human evolution (ref. 1; A. Pandya et al., in preparation).
Because all alleles can be mapped in their entirety, we can examine allele length diversity, as well as internal structure, in detail: this is strikingly restricted (Fig. 4a). All 465 alleles have between 48 and 114 repeats, and 83% have between 58 and 77 repeats. Within haplogroups, we see an even tighter distribution of repeat number (Fig. 4b and c): the 11 haplogroup 3 chromosomes, for example, all have between 89 and 93 repeats. This is in contrast to the situation at autosomal minisatellites [e.g. MS32, where alleles have between 12 and 800 or more repeats (35)]. Though it is possible that a functional constraint, such as the presence of a nearby gene, is responsible, it seems more likely to be due to a combination of mutation process and the short time to a common ancestor for Y chromosomes: insertions or deletions of large numbers of repeats must be rare, and the commonest mutations are likely to involve small, possibly single-step, changes in repeat number.
Despite this constrained length diversity, we see a very high degree of internal structural diversity: the 465 chromosomes have 386 different codes, implying a very high mutation rate (µ) at MSY1. Although µ is in principle directly measurable, we can estimate it here from the observed diversity by calculating the parameter θ (36); for a haploid system, this is equal to 2Neµ, where Ne is the effective population size (ref. 37). An Ne value of 4900 has been published for the Y chromosome (4), but this relies on the diversity of slowly mutating markers such as base substitutions. Minisatellites have very much higher mutation rates, and therefore much of the variability observed has been generated recently, when population size was relatively large. For MS32, for example, measurements of θ and µ have allowed Ne values to be calculated which are three to seven times larger than those deduced from conventional loci (38). Based on this argument, we also use a second, ~5-fold increased, value of Ne, calculating µ for Ne values over the range 4900-25 000. From the observed MSY1 diversity, we calculate θ to be 1050, giving a µ of between 2 and 11%; the virtual heterozygosity (He) is then θ/(1 + θ) = 99.9%, a remarkably high value for a chromosome so renowned for its lack of variability (39). Our sample of males contains many individuals belonging to particular populations (e.g. Basques) and particular haplogroups (e.g. haplogroups 16 and 22), and we expect a more random sample to show even higher diversity. If µ is indeed of the order of several per cent, we expect to observe mutations in pedigree analysis; in preliminary experiments, we have indeed found such mutations, and a full study of these will be published elsewhere (P.G. Taylor et al., in preparation).
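The arithmetic behind these estimates can be checked with a short sketch. It assumes only the two relations stated above for a haploid system (theta = 2Neµ and He = theta/(1 + theta)); the function names are illustrative and are not taken from the program used for the original calculation.

```python
# Sketch: mutation rate and virtual heterozygosity implied by theta,
# using the haploid relations quoted in the text.
# Function and constant names are illustrative assumptions.

def mutation_rate(theta: float, ne: float) -> float:
    """Mutation rate implied by theta = 2 * Ne * mu."""
    return theta / (2.0 * ne)


def virtual_heterozygosity(theta: float) -> float:
    """Virtual heterozygosity, He = theta / (1 + theta)."""
    return theta / (1.0 + theta)


THETA = 1050.0             # value reported in the text
NE_VALUES = (4900, 25000)  # published Ne and the ~5-fold larger value

for ne in NE_VALUES:
    print(f"Ne = {ne}: mu = {mutation_rate(THETA, ne):.1%}")
print(f"He = {virtual_heterozygosity(THETA):.1%}")
```

With theta = 1050, the two Ne values give µ of about 11% and 2% respectively, and He of 99.9%, in line with the figures quoted.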
A selection of codes representing the range of observed structures is shown in Figure 5. Repeat types tend to be clustered in blocks of up to 64 identical repeats, as opposed to the highly interspersed patterns seen in autosomal minisatellites; this different organization must reflect differences in mutation processes. Different repeat types are distributed non-randomly throughout alleles. With only three exceptions, alleles have type 4 repeats at their 3' ends, which is consistent with the idea of an original amplification of the locus from a type 4 monomer; type 4 repeats never occur at the 5' end, type 1 repeats are never seen at the 3' end, and type 1 are never seen adjacent to type 4 repeats. This organization may be due to the sequential generation of repeat variants at the 5' ends of ancestral alleles, and a subsequent limitation on the mutation process which does not permit extensive scrambling of arrays.
The concentration of diversity at the 5' ends of alleles suggests that, as in the case of most autosomal minisatellites studied, the mutation process has an element of polarity; however, non-polar processes must also be at work, since heterogeneity is seen in the number of type 4 repeats at the 3' ends of alleles even in closely related chromosomes (e.g. those in haplogroup 3), and also in blocks of repeats deep within arrays. This apparent occurrence of mutational changes throughout the array, and the repeat clustering, makes any attempt at alignment and grouping of codes [as can be done with the autosomal minisatellite MS205 (28)] or the construction of trees from MSY1 codes alone, difficult.
Many alleles (145 of the 465 surveyed here) contain repeats which fail to code at all, or code faintly, in the four-state system, and we assume this is because of the occurrence of additional base substitutions. One sequenced example, the eighth repeat in m121's allele (Fig. 3b), is a type 1 repeat with an additional A -> T transversion at position 11 (data not shown); we have seen no evidence for repeat units other than 25 bp in length. Although these repeats are themselves heterogeneous, as can be seen from their discrete and reproducible differences in intensity, they are here referred to collectively as nulls. Nulls often exist in blocks, presumably deriving from a single mutant repeat by expansion, a phenomenon also seen at minisatellite MS32 (40). The finding of alleles (e.g. m47, see below) with repeat-type content so radically different from the majority is, however, in contrast to the situation at autosomal loci, and reflects the accumulation of base substitutions and thus variant repeats along haploid sub-lineages.
One class of alleles, found in haplogroup 8 chromosomes, shows one or two bands in type 4 coding of normal intensity at their 3' termini (Fig. 3b); other repeats at the 3' and 5' ends of these alleles show very reduced intensity, and the core block of repeats codes as type 3 in reverse, but as null in forward coding, suggesting that these repeats may contain additional mutations: here, we refer to these as null repeats, and a fuller characterization of these alleles is described elsewhere (41).
Where shared codes occur, these are usually within a population and within a haplogroup. Some exceptions, such as a shared 1,3,4 code between a British haplogroup 1, a French haplogroup 22, and an Australian chromosome of unknown haplogroup, seem to be clear instances of homoplasy.
Table 1 summarizes the diversity of codes in eight selected populations; using Nei's unbiased estimator (42,43), all populations have a diversity of >0.95, with the exception of the Surui (0.492 ± 0.081). Populations containing significant numbers of shared codes represent clear cases of geographical or cultural isolation. Nineteen Cook Islanders have only 15 codes, with one trio and two pairs of identical codes. The 35 Basque chromosomes have 23 codes (one set of five, two trios, four pairs) and, in the most extreme case, four out of the 16 Surui males have one code, while 11 have another. For autosomal minisatellites, shared codes within populations have provided evidence for mutation rate heterogeneity (28,30). The code sharing we see at MSY1 is instead likely to be a reflection of population sub-structuring of Y chromosomes.
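As an illustration of how such diversity values follow from code-sharing patterns, the sketch below applies Nei's unbiased estimator, h = n/(n - 1) × (1 - Σp²), to the three populations described above. The group sizes are reconstructed from the text, with singletons filling the remainder of each sample; the assumption that the 16th Surui male carries a third, unique code is ours. Standard errors (ref. 43) are not computed here.

```python
# Sketch: Nei's unbiased diversity from the sizes of groups of identical codes.
# Group sizes are reconstructed from the text; names are illustrative.

def nei_diversity(group_sizes):
    """h = n / (n - 1) * (1 - sum(p_i ** 2)), p_i being each code's frequency."""
    n = sum(group_sizes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    sum_p2 = sum((k / n) ** 2 for k in group_sizes)
    return n / (n - 1) * (1.0 - sum_p2)


# One trio and two pairs among 19 chromosomes (15 codes in total).
cook_islanders = [3, 2, 2] + [1] * 12
# One set of five, two trios and four pairs among 35 chromosomes (23 codes).
basques = [5, 3, 3, 2, 2, 2, 2] + [1] * 16
# ASSUMPTION: the remaining Surui male has a third, unique code.
surui = [11, 4, 1]

for name, sizes in [("Cook Islanders", cook_islanders),
                    ("Basques", basques),
                    ("Surui", surui)]:
    print(f"{name}: n = {sum(sizes)}, codes = {len(sizes)}, "
          f"h = {nei_diversity(sizes):.3f}")
```

Under these assumptions the Surui value comes out at 0.492, matching the reported figure, while the other two populations exceed 0.95.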
Basing estimates of diversity simply on the number of codes in a group of chromosomes is often not very informative, and a more useful method is required. We use a second and more general characteristic of codes, modular structure, i.e. the order of different repeat blocks along alleles. For example, the modular structure 1,3,4 is a 5' block of type 1 repeats, followed by a central block of type 3 and a 3' block of type 4. In assessing Y chromosome diversity in this way, we are assuming that the generation of a new block of repeats is less likely than the expansion or contraction of an existing block, an assumption supported by observations of MSY1 diversity within sets of chromosomes whose relatedness is judged using an independent set of loci, multiple microsatellites (data not shown). The number of modular structures within a group is likely to be an underestimate of the true number of classes (MSY1 lineages), since it is clear from the positions of MSY1 codes on the haplogroup tree that identical modular structures (in particular simple structures such as 1,3,4 and 3,1,3,4) have arisen more than once, and these independent occurrences can exist within one population, and perhaps within a single haplogroup. Also, nulls are considered in this scheme as a single repeat class, when it is clear that they are not. The total number of different modular structures we observe is 45; by far the commonest classes are 1,3,4 (169 chromosomes) and 3,1,3,4 (122 chromosomes), and 19 modular structures occur as singletons.
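The reduction of a full MVR code to its modular structure amounts to collapsing each run of identical repeats into a single block. A minimal sketch follows; it assumes codes are written 5' to 3' as strings of repeat-type digits, with nulls pooled as "0". The example alleles are hypothetical and are not taken from the survey data.

```python
# Sketch: derive modular structure (ordered blocks) from an MVR code.
# Codes are read 5' -> 3'; '0' stands for a null repeat (pooled class).
from itertools import groupby


def modular_structure(code):
    """Collapse runs of identical repeat types, e.g. '11334' -> ('1', '3', '4')."""
    return tuple(repeat_type for repeat_type, _ in groupby(code))


def block_lengths(code):
    """Return (repeat type, number of repeats) for each block along the allele."""
    return [(repeat_type, sum(1 for _ in run)) for repeat_type, run in groupby(code)]


# HYPOTHETICAL alleles, invented for illustration only.
allele_a = "1" * 20 + "3" * 30 + "4" * 15
allele_b = "1" * 22 + "3" * 29 + "4" * 15
allele_c = "3" * 5 + "1" * 18 + "3" * 28 + "4" * 14

for allele in (allele_a, allele_b, allele_c):
    print(",".join(modular_structure(allele)), block_lengths(allele))
```

Alleles a and b differ in code but share the structure 1,3,4, which is the sense in which modular structure groups alleles that differ only by expansion or contraction of existing blocks.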
When we examine the diversity of the eight populations in terms of modular structures, we see a greater range of diversity values, with the Basques showing much lower diversity than any other population (0.111 ± 0.049).
How do MSY1 codes correspond with the haplogroups? Table 1 lists the diversities of six haplogroups in terms of codes (all values are >0.9) and of modular structures. Haplogroup 3 chromosomes are geographically quite widespread, with examples from India, Mongolia, Britain and Finland. However, all 11 MSY1 alleles examined here are of the 1,3,4 class, with very low diversity in repeat type number within blocks. In other haplogroups, such as haplogroup 2, where 24 alleles comprise six different classes (h = 0.649 ± 0.071), chromosomes are much more diverse. Haplogroup 1 chromosomes have intermediate modular structural diversity (0.297 ± 0.050; examples are in Fig. 5), and there is geographical structure here, with European haplogroup 1 examples being much less diverse (0.083 ± 0.043) than Indian ones (0.818 ± 0.051), a finding which agrees with microsatellite data on these chromosomes (44). The general picture of MSY1 diversity within haplogroups is consistent with other measures of diversity, such as alphoid block size (10) and microsatellite haplotypes (refs. 14,44; and unpublished observations), and may reflect the relative ages of the different haplogroups; there is no evidence from these comparisons of intra-haplogroup diversities for mutation rate heterogeneity at MSY1.
A comparison of the MSY1 codes of haplogroup 16 chromosomes with published 10-locus microsatellite haplotype data (14) shows that MSY1 has the higher diversity: 56 haplogroup 16 chromosomes have 21 different microsatellite haplotypes; these same chromosomes have 31 different MSY1 codes, belonging to seven different modular classes.
Although they provide robust definitions of related groups of chromosomes, the `unique event' markers suffer from ascertainment bias. Multiallelic markers such as MSY1 offer a better opportunity to assess true Y chromosome diversity, and we have examined this in an intercontinental comparison (Fig. 6), using the criterion of modular structure. Sample sizes for the continents are not equivalent, and for this discussion we set aside singletons, which occur mostly in the larger samples. All five continents share the two commonest modular classes, 1,3,4 and 3,1,3,4, though these are both much rarer by proportion in Africa than elsewhere. Africa, Oceania and the Americas have common modular structures not present elsewhere, and this may reflect the effects of founding lineages and drift. Oceania also has the highest diversity value for modular structures (Fig. 6; h = 0.853 ± 0.021), and this again is likely to reflect pronounced drift.
The intermediate value of the modular diversity of the African sample is influenced by the large number of chromosomes regarded here as identical (structure: 0,4; see ref. 41). However, it stands out as having a set of modular structures strikingly distinct from those of the other continents (Fig. 6), and these include the sets of alleles (found in haplogroups 6 and 7) lacking type 1 and 3 repeats, and instead containing type 2 repeats and unknown nulls. Trees of Y chromosomes are rooted by determining the ancestral states of the unique and rare event markers which define the haplogroups. Ancestral state information is not available for all of the markers used to construct the tree to which we refer, but, from the data which are available, haplogroup 7 is likely to lie closest to the root. The status of haplogroup 6 is uncertain, but the two haplogroups clearly have related sets of MSY1 structures. The highly diverged structures of MSY1 alleles in these African-specific haplogroups further suggest that they may represent deep branches in the Y chromosome tree. The large proportions of nulls in African alleles may conceal yet greater diversity. The distinctness of African Y chromosomes is in contrast to the picture obtained with some autosomal systems, where non-African diversity is a subset of African diversity (28,45), and which has been taken as evidence for an African origin for modern humans.
Searches for restriction fragment length polymorphisms (RFLPs) (46,47) and sequence variation (3-5,7,39) have demonstrated the low nucleotide diversity of the Y chromosome relative to other parts of the nuclear genome. Base substitutions on the Y may be rare, but they are very useful, defining groups of related chromosomes and allowing trees to be constructed (1,4,5,7). Although reliable, these trees have low resolution. More variable markers, such as micro- and minisatellites, reveal the diversity within haplogroups and can be used to address microevolutionary issues. Since they are variable in all populations studied, they are relatively free from the ascertainment bias which often afflicts other polymorphisms. Here we have described the most variable single locus yet described on the Y chromosome, and carried out a global survey of diversity.
Our survey shows a sharing of modular structures between all five continents, but also shows the presence of specific lineages within continents, for example Africa and Oceania. This geographical specificity has been observed previously with base substitutions (e.g. refs. 3,6,7) and its confirmation here with a hypervariable system emphasizes this phenomenon. It may be an indication of bottlenecks during the founding of populations, the enhanced effect of drift on the Y chromosome with respect to other parts of the genome (through mating practices, for instance) and, in the case of Africa, the persistence of ancient lineages. We do not observe strong evidence for substantial Asian input into the African Y chromosome pool, as has been suggested on the basis of the global distribution of YAP+ chromosomes (48); this might, however, be a reflection of the very high mutation rate of MSY1, which could obscure such relatively deep connections.
If mutation rates at the more variable Y loci could be measured, the ages of particular haplogroups could be estimated provided assumptions are made about population size. Microsatellites have been used in this way in an attempt to date the origin of two base substitutions (DYS199: ref. 6; Tat: ref. 14), but relied upon uncertain estimates of mutation rate; indeed, there is little practical prospect of measuring mutation rates and defining mutation processes accurately at these kinds of loci. Pedigree analysis can be done (49), but, because mutation rates are well below 1% (49,50), this yields very few mutants, and gives no information about the mutational characteristics of individual alleles. Mutation rates and processes at MSY1 should, in principle, be analysable directly by SP-PCR studies in sperm DNA (29), a method which yields allele-specific information and which is ideally suited to this locus, since, unlike autosomal minisatellites, it never passes through female meiosis. Our estimate of µ at MSY1 suggests that these studies should be productive.
A knowledge of mutation rate could then be applied to estimate the ages of particular haplogroups, and also to provide a rationale for the grouping of MSY1 codes independently of other haplotypic information. It should also be possible to model the evolution of MSY1 from a range of starting structures and to ask, for instance, whether we expect to see the very restricted length diversity which we observe. Without direct information about mutation processes, phylogenetic analysis over diverse sets of chromosomes using MSY1 is unwise. We do not know the ancestral state, the rate of contraction or expansion of existing blocks of repeats, the rate of generation of new blocks of existing repeat types, the rate of generation of novel repeat types or the rate of any homogenization processes which might be at work in MSY1 arrays. However, where chromosomes are more closely related, for instance within a haplogroup, and especially where modular structure is invariant, matters should be more straightforward, and informative analyses relating codes will be possible (P.G. Taylor et al., in preparation). This emphasizes the importance of a hierarchical approach, combining rapidly mutating systems such as this one with very slowly mutating polymorphisms.
What mutation processes do we expect to act? Studies at autosomal minisatellites have demonstrated the importance of interallelic events (25,29), which are clearly precluded at MSY1. However, intra-allelic processes are also active at such diploid loci: 15/19 germline mutations analysed at the hypermutable GC-rich minisatellite CEB1 (25) were intra-allelic, and similar events have also been observed in the germline at minisatellites MS32, MS31A and MS205 (29). These processes may include unequal sister chromatid exchange (USCE) and replication slippage. The kinds of structures we observe at MSY1 are completely different from those seen at autosomal loci, and reflect the exclusive action of these processes. The predominant feature appears to be the linear diffusion of variants, probably through slippage within homogeneous blocks of repeats, which is reminiscent of microsatellite mutation. The relative contributions of slippage and USCE are not clear, but the structures we see indicate that USCE must be constrained: exchanges between distant parts of an allele are rare.
Most minisatellites which have been studied, and all those which are detectable using traditional DNA fingerprinting approaches (51), are of the `classical' GC-rich variety; MSY1 is atypical in that it is AT rich. Three other loci of this kind, COL2A1 (52), ApoB (53-55) and FRA16B (56), have been described on autosomes. They share some features with MSY1: a predicted tendency to form hairpin structures, and a domain organization, with like repeats commonly existing as blocks within arrays. ApoB shows evidence of polarized variability (54). All of these loci may also share some mechanisms of mutation, with transiently single-stranded DNA forming stable secondary structures which promote inter-strand misalignment and subsequent expansions or contractions in repeat number. Allele structures at both COL2A1 (52) and ApoB (54,55) are consistent with this model, and do not show evidence of frequent inter-allelic events.
Highly polymorphic Y-specific markers have potential applications in forensic studies (44) in the analysis of male-specific DNA. Such markers would be useful exclusion tools, and, if adequate population data were available, might also have a role in identification. MSY1 can be typed from very small amounts of DNA. Another application is likely to be in the field of genealogy: patronymic surnames are co-inherited with Y chromosomes, and the high variability of MSY1 may give it sufficient resolution to detect genealogical relationships reliably.
These were carried out as described (57,58). Filters hybridized with 50f2 were washed to a stringency of 2× SSC/0.1% (w/v) SDS at 65°C. Cosmid subcloning was into BlueScript II in the host XL1-Blue (both Stratagene).
DNAs from males numbered m1-m120 have been described previously (10,19). DNA from hybrid J640-51 (ref. 34) was a gift from Carol Jones (Denver, CO). Other DNAs were gifts from Jaume Bertranpetit, Anne Cambon-Thomsen, Christine Disteche, Mike Hammer, John Mitchell, Antti Sajantila, Bryan Sykes and Chris Tyler-Smith, or from collections of the authors. Population origins of samples were as follows (two or fewer individuals from a population are grouped under `other'): 162 Europeans: 40 British, 35 Basque, 21 Finnish, 12 French, 11 Spanish (non-Basque), seven Saami, four Russian, three German, 23 other; 171 Asians: 47 Mongolian, 31 Chinese, 30 Indian, 18 Yakut, 17 Indonesian, 16 Japanese, five Altai, seven other; 53 Africans: 13 Kenyan, nine E. Bantu, eight Gambian, four Central African Republic Pygmy, four Cameroon, three San, three Zimbabwean, nine other; 41 Oceanians: 20 Australian, 19 Cook Islander, two other; 21 native Americans: 16 Surui, five other; 17 of unknown origin.
The arrayed Y-specific cosmid library LL0YNC03 was a gift from Pieter de Jong (LLNL, CA). Filters bearing DNA from 12 480 clones were made according to ref. 59 and screened with 50f2 using standard methods.
Flanking amplifications used 100 ng of genomic or 1 ng of cosmid DNA in the buffer system of ref. 35, with 1 µM primers Y1A+ and Y1B+ (sequences given below) and 0.5 U of Taq polymerase (Advanced Biotechnologies), and were carried out in a 10 µl volume in a Perkin-Elmer-Cetus 4800 or MJR PTC-200 thermocycler. Amplification conditions were 95°C 1 min, 66°C 3.5 min, for 25 cycles (18 cycles for cosmids). The product obtained from the DYF155S2 locus is larger (203 bp) than that described previously (196 bp; refs. 1,13) because the Y1A+/Y1B+ primers have 5' extensions with respect to the original primers.
Taq cycle sequencing of gel-purified MSY1-flanking PCR products was carried out as described (23), using 33P-end-labelled (Amersham) primers. Products from nine males were sequenced, including m38 (Chinese), m119 and m120 (Australian), m47 (Biaka Pygmy) and m101 (German). Sequence data were obtained on up to 15 repeats into the 5' and 12 repeats into the 3' end of the array.
PCR primer sequences were as follows (`tag' moieties shown in lower case): Y1A+, 5'-ACA GAG GTA GAT GCT GAA GCG GTA TAG C-3'; Y1B+, 5'-GCA ACT CAA GCT AGG ACA AAG GGA AAG G-3'; TAG1, 5'-tca tgc gtc cat ggt ccg gaT Gtg tat aat ata cat cat gta tat tg-3'; TAG2, 5'-tca tgc gtc cat ggt ccg gaC ATC ATG TAT ATT ATG TAT AAT ATA CAT C-3'; TAG3, 5'-tca tgc gtc cat ggt ccg gaT Gtg tat aat ata cat Gat gta tat tg-3'; TAG4, 5'-tca tgc gtc cat ggt ccg gaC ATG ATG TAT ATT ATG TAT AAT ATA CAT G-3'; TAG1R, 5'-tca tgc gtc cat ggt ccg gaC ATG ATG TAT ATT ATA CAC AAT ATA CAT G-3'; TAG3R, 5'-tca tgc gtc cat ggt ccg gaC ATC ATG TAT ATT ATA CAC AAT ATA CAT C-3'; and TAG4R, 5'-tca tgc gtc cat ggt ccg gaC ATC ATG TAT ATT ATA CAT AAT ATA CAT C-3'.
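The discriminator design described for the MVR-PCR system (a 20mer 5' tag followed by a 27-29 nt repeat-homologous 3' portion) can be checked against the listed sequences with a small sketch. The sequences are copied from the list above; the helper names are illustrative, and the tag is identified by its shared 20 nt sequence rather than by letter case.

```python
# Sketch: split each discriminator primer into its 5' tag and 3' repeat-homologous
# portion, then report the length and A+T content of the 3' portion.
TAG = "tcatgcgtccatggtccgga"  # 20mer tag shared by every discriminator

DISCRIMINATORS = {
    "TAG1": "tca tgc gtc cat ggt ccg gaT Gtg tat aat ata cat cat gta tat tg",
    "TAG2": "tca tgc gtc cat ggt ccg gaC ATC ATG TAT ATT ATG TAT AAT ATA CAT C",
    "TAG3": "tca tgc gtc cat ggt ccg gaT Gtg tat aat ata cat Gat gta tat tg",
    "TAG4": "tca tgc gtc cat ggt ccg gaC ATG ATG TAT ATT ATG TAT AAT ATA CAT G",
    "TAG1R": "tca tgc gtc cat ggt ccg gaC ATG ATG TAT ATT ATA CAC AAT ATA CAT G",
    "TAG3R": "tca tgc gtc cat ggt ccg gaC ATC ATG TAT ATT ATA CAC AAT ATA CAT C",
    "TAG4R": "tca tgc gtc cat ggt ccg gaC ATC ATG TAT ATT ATA CAT AAT ATA CAT C",
}


def three_prime_portion(primer):
    """Return the sequence downstream of the tag (upper case, spaces removed)."""
    bases = primer.replace(" ", "").upper()
    tag = TAG.upper()
    if not bases.startswith(tag):
        raise ValueError("primer does not begin with the expected tag")
    return bases[len(tag):]


def at_fraction(seq):
    """Fraction of A and T bases in an upper-case sequence."""
    return sum(base in "AT" for base in seq) / len(seq)


for name, primer in DISCRIMINATORS.items():
    portion = three_prime_portion(primer)
    print(f"{name}: {len(portion)} nt, A+T = {at_fraction(portion):.0%}")
```

For the sequences as listed, every 3' portion is 27 or 29 nt long and at least three-quarters A+T, consistent with the stated design and with the AT-rich nature of the repeats.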
Three-state MVR-PCR was performed as follows: 50-100 ng of genomic DNA were used as template in an initial flanking amplification with primers at 100 nM, and MSY1 product was isolated from an agarose gel as a gel slice. The slice was soaked overnight in 500 µl of water, and 2 µl of this was used in each of four (sometimes five, including type 1 reverse coding) 10 µl PCR reactions with either Y1A+ (forward) or Y1B+ (reverse) at 100 nM, plus one of four (or five) 33P-end-labelled (Amersham) discriminator primers (TAG1, TAG3, TAG3R, TAG4R and sometimes TAG1R), also at 100 nM, and 0.5 U of Thermus brockianus (Tbr) DNA polymerase (NBL). The buffer was 10 mM Tris-HCl pH 8.8, 50 mM KCl, 1.5 mM MgCl2, 0.1% (v/v) Triton X-100, plus 200 µM dNTPs and 200 µg/ml bovine serum albumin (Boehringer). Reactions were sometimes supplemented with 0.02 U of cloned Pfu DNA polymerase (Stratagene). Amplification was carried out in an MJR PTC-200 thermocycler (run in `calculated' mode) using an initial denaturation step of 96°C 40 s followed by a primary phase of 94°C 8 s, 64°C 1 min, 68°C 3 min, for three cycles; then by a secondary phase: 94°C 8 s, 68°C 4 min, increasing by 4 s per cycle, for 21-28 cycles. PCR products were run out on 50 cm long 2.5% denaturing polyacrylamide gels (Sequagel, National Diagnostics); autoradiography was at room temperature for 4 h to 3 days.
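The two-phase cycling programme can be written out step by step, which makes the auto-extending secondary phase explicit. The sketch below is illustrative only: it assumes that the first secondary cycle uses the unextended 4 min step and that each later cycle adds a further 4 s, and it sums programmed hold times without ramping. All names are hypothetical.

```python
# Sketch: programmed hold times (seconds) for the MVR-PCR cycling profile.
# ASSUMPTION: the first secondary cycle uses 240 s; each later cycle adds 4 s.

INITIAL_DENATURATION_S = 40                      # 96 C
PRIMARY_CYCLE = [(94, 8), (64, 60), (68, 180)]   # (temperature C, seconds)
PRIMARY_CYCLES = 3


def secondary_phase(n_cycles, base_s=240, increment_s=4, denature_s=8):
    """Return a list of [(94, denature_s), (68, extension_s)] steps per cycle."""
    return [[(94, denature_s), (68, base_s + increment_s * i)]
            for i in range(n_cycles)]


def total_hold_seconds(n_secondary_cycles):
    """Sum of all programmed holds, excluding ramp times."""
    primary = PRIMARY_CYCLES * sum(seconds for _, seconds in PRIMARY_CYCLE)
    secondary = sum(seconds
                    for cycle in secondary_phase(n_secondary_cycles)
                    for _, seconds in cycle)
    return INITIAL_DENATURATION_S + primary + secondary


for n in (21, 28):
    final_step = secondary_phase(n)[-1][1][1]
    print(f"{n} secondary cycles: final 68 C step {final_step} s, "
          f"total hold time {total_hold_seconds(n) / 60:.0f} min")
```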
The concentrations of discriminator primers required here are high (100 nM, cf. as low as 1 nM at CEB1; ref. 25), suggesting that they function inefficiently, probably because the nature of the repeats makes them partly self-complementary. These high concentrations make the use of the 20mer `tag' primer itself (cf. ref. 21) unnecessary. Here, decoupling between detection and amplification, necessary to avoid PCR collapse, is done by increasing annealing temperature after the first three cycles. During these lower temperature (64°C) cycles, the discriminator primers give product extending into the flanking DNA from their specific repeats. Annealing temperature is then raised to 68°C for the remaining cycles, during which the flanking primer and the long discriminator plus tag primer, which at the elevated temperature cannot bind internally within the products of previous cycles, drive the reaction.
The complete set of 465 MSY1 codes is available from the authors on request.
θ was calculated using a program written in Microsoft QuickBasic by Alec Jeffreys.
The unbiased estimate of diversity, equivalent to heterozygosity, was calculated according to ref. 42, and standard errors according to ref. 43.
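As a hedged illustration, the sketch below assumes that the unbiased estimate is the standard gene diversity estimator, h = n(1 - Σp_i²)/(n - 1), where n is the number of chromosomes and p_i the frequency of the i-th code. The source identifies the method only by citation, so this form is an assumption, and the function name is hypothetical. The standard error calculation is not reproduced here.

```python
def unbiased_diversity(counts):
    """Unbiased diversity h = n / (n - 1) * (1 - sum(p_i ** 2)).

    `counts` holds the number of chromosomes carrying each distinct code.
    Assumed form of the estimator; the source gives it only by reference.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two chromosomes are needed")
    sum_sq_freq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq_freq)


# When every chromosome carries a different code (e.g. 24 codes among
# 24 chromosomes), each count is 1 and h is 1.
all_distinct = unbiased_diversity([1] * 24)

# Hypothetical counts, for illustration only (not data from the study).
example = unbiased_diversity([3, 2, 1])

print(round(all_distinct, 3), round(example, 3))
```

The all-distinct case agrees with the tabulated h (codes) of 1.000 for samples in which the number of different codes equals the number of chromosomes.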
Codes are represented in a font written in Macromedia Fontographer 4.1.4 PM.
We thank John Armour, Pete Corish, Gabby Dover, Neale Fretwell, Matt Hurles, Alec Jeffreys, Turi King, Celia May, Arpita Pandya, Fabricio Santos and Chris Tyler-Smith for help and advice, Jaume Bertranpetit, Christine Disteche, Mike Hammer, Carol Jones, John Mitchell, Bryan Sykes and others who kindly gave cell lines and DNA samples, and Pieter de Jong for the arrayed cosmid library LL0YNC03, which was constructed at the Biomedical Sciences Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA, under the auspices of the National Laboratory Gene Library Project sponsored by the U.S. Department of Energy. M.A.J. was supported by an MRC Training Fellowship and a Wellcome Career Development Fellowship (grant no. 044910), N.B. by the EC [as part of the Network Project: `The Biological History of European Populations' (EC Contract 92-0032)], and P.G.T. by the Wellcome Trust.
Human Molecular Genetics
| | No. of chromosomes | No. of different codes | h (codes) | No. of modular structures | h (modular structures) |
| --- | --- | --- | --- | --- | --- |
| Population | | | | | |
| Basque | 35 | 23 | 0.967 ± 0.011 | 2 | 0.111 ± 0.049 |
| British | 40 | 39 | 0.999 ± 0.003 | 6 | 0.513 ± 0.058 |
| Chinese | 31 | 28 | 0.993 ± 0.005 | 7 | 0.711 ± 0.055 |
| Cook Islander | 19 | 15 | 0.983 ± 0.021 | 7 | 0.790 ± 0.048 |
| Finnish | 21 | 19 | 0.990 ± 0.037 | 6 | 0.638 ± 0.068 |
| Indian | 30 | 29 | 0.997 ± 0.005 | 7 | 0.722 ± 0.043 |
| Indonesian | 17 | 16 | 0.992 ± 0.012 | 4 | 0.684 ± 0.049 |
| Surui | 16 | 3 | 0.492 ± 0.081 | 3 | 0.492 ± 0.081 |
| Haplogroup | | | | | |
| 1 (all) | 67 | 52 | 0.989 ± 0.004 | 6 | 0.297 ± 0.050 |
| 1 (Asia) | 11 | 11 | 1.000 ± 0.019 | 4 | 0.818 ± 0.051 |
| 1 (Europe) | 47 | 37 | 1.000 ± 0.009 | 2 | 0.083 ± 0.043 |
| 2 | 24 | 24 | 1.000 ± 0.006 | 6 | 0.649 ± 0.071 |
| 3 | 11 | 10 | 0.982 ± 0.025 | 1 | 0 |
| 10 | 10 | 10 | 1.000 ± 0.035 | 4 | 0.711 ± 0.080 |
| 16 | 57 | 31 | 0.936 ± 0.015 | 7 | 0.607 ± 0.031 |
| 22 | 28 | 20 | 0.950 ± 0.020 | 2 | 0.138 ± 0.059 |
Last modification: 14 Mar 1998
Copyright © Oxford University Press, 1998.