DnaSP Information

DnaSP v5
Julio Rozas et al.

DnaSP is a software package for Windows that performs extensive population genetics analyses from DNA sequence data. DnaSP estimates several measures of DNA sequence variation within and between populations (in noncoding, synonymous or nonsynonymous sites), linkage disequilibrium, recombination, gene flow and gene conversion. DnaSP can also carry out several tests of neutrality: those of Hudson, Kreitman and Aguadé (1987), Tajima (1989a), McDonald and Kreitman (1991), Fu and Li (1993), and Fu (1997). Additionally, DnaSP can estimate the confidence intervals of some test statistics by the coalescent.

System Requirements

Hardware:
IBM-compatible PC; 32 MB RAM
Operating System:
Windows 95 / 98 / NT / XP / 2000 / Vista

DnaSP can also run on Apple Macintosh platforms (using VirtualBox, VMware Fusion, Parallels Desktop or Virtual PC), and on Linux and Unix-based operating systems (using VirtualBox, VMware or Wine).

DnaSP User Interface

DnaSP provides a standard Microsoft Windows user interface with several commands to import, export, view, print and save data files, results or graphs. DnaSP also provides a standard Microsoft Windows help file with instructions and descriptions of all programs and commands.

Input and Output

DnaSP can automatically read or write (export) five file formats: MEGA (Kumar et al. 1994), NBRF/PIR (Sidman et al. 1988), NEXUS (Maddison et al. 1997), FASTA, and PHYLIP (Felsenstein 1993). In all cases one or more homologous aligned nucleotide (DNA or RNA) sequences should be included in a text file. The number of sequences and the sequence length that DnaSP can handle depend mainly on the available memory; DnaSP can analyze data files with a large number (thousands) of sequences of thousands of nucleotides each. The output is displayed in windows with text, tables, grids (the output data are laid out in rows and columns as on a spreadsheet) and graphs. The output can either be sent to the printer (any Windows printer driver) or be saved in a file.

DnaSP modules and commands

DnaSP allows the analysis of a subset of sites, or of a subset of sequences, of the data file. DnaSP also allows analyses in synonymous and nonsynonymous sites (there are nine predefined genetic codes), or in various sorts of codon positions (in zero-, two- and four-fold degenerate codon positions; in the first, second and third codon positions). Additionally, DnaSP can perform several analyses by the sliding window method (the option can be used to obtain a graphic representation of the pattern of change of a specific parameter along the sequence).
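The sliding window method mentioned above can be illustrated with a minimal Python sketch. It is not DnaSP's own code: the function names (pi_per_site, sliding_window), the window and step sizes, and the toy alignment are all assumptions made for illustration, and the sequences are assumed to be aligned and free of gaps.

```python
from itertools import combinations

def pi_per_site(seqs):
    """Average proportion of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / len(pairs) / len(seqs[0])

def sliding_window(seqs, window, step):
    """Return (window midpoint, pi) for each window along the alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start + window / 2, pi_per_site(chunk)))
    return results

# Hypothetical toy alignment (four sequences, eight sites).
toy_alignment = ["ATGCCGTA", "ATGTCGTA", "ATGTCGTA", "ACGCCGTA"]
profile = sliding_window(toy_alignment, window=4, step=2)
for midpoint, value in profile:
    print(midpoint, round(value, 4))
```

Plotting the second value against the first gives the kind of along-the-sequence profile the text describes.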

Polymorphic Sites
This command displays general information about the polymorphisms in the data file: the number of sites with alignment gaps (or missing data), the number of monomorphic sites, the number of polymorphic sites segregating for two, three, or four nucleotides, the number of parsimony informative sites, the number of synonymous and nonsynonymous polymorphisms, etc.
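As a rough illustration of the site categories listed above, the following Python sketch classifies the columns of a small alignment. The function name classify_sites and the toy data are hypothetical, and the sketch treats any character other than A, C, G or T as a gap or missing data, which is a simplifying assumption rather than DnaSP's documented behavior.

```python
from collections import Counter

def classify_sites(seqs):
    """Count gapped, monomorphic, polymorphic and parsimony-informative sites."""
    counts = {"gaps_or_missing": 0, "monomorphic": 0,
              "polymorphic": 0, "parsimony_informative": 0}
    for column in zip(*seqs):
        # Sites with alignment gaps or missing data are set aside.
        if any(base not in "ACGT" for base in column):
            counts["gaps_or_missing"] += 1
            continue
        freq = Counter(column)
        if len(freq) == 1:
            counts["monomorphic"] += 1
            continue
        counts["polymorphic"] += 1
        # Parsimony informative: at least two variants, each present at least twice.
        if sum(1 for c in freq.values() if c >= 2) >= 2:
            counts["parsimony_informative"] += 1
    return counts

# Hypothetical toy alignment with one gapped site.
alignment = ["ATGCCGTA", "ATGTCGTA", "ATGTCG-A", "ACGCCGTA"]
site_counts = classify_sites(alignment)
print(site_counts)
```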

DNA Polymorphism
This command computes several measures of the extent of DNA polymorphism and their variances. DnaSP estimates i) the average number of nucleotide differences per site between two sequences, or nucleotide diversity, Pi (Nei 1987, equations 10.5 or 10.6), and its sampling variance and standard error (Nei 1987, equation 10.7); ii) the nucleotide diversity using the Jukes and Cantor correction (Jukes and Cantor 1969; Lynch and Crease 1990, equations 1-2); iii) the nucleotide diversity by pairwise deletion; iv) the average number of nucleotide differences, k (Tajima 1983, equation A3) and its stochastic and sampling variances (Tajima 1993, equations 13-18); v) Theta = 4Nu, where N is the effective population size, and u is the mutation rate per nucleotide (or per sequence) and per generation (Nei 1987, equation 10.3; Tajima 1993, equation 3) and its variance for free and for no recombination (Tajima 1993, equations 4 and 8); vi) Theta per nucleotide under the finite sites model (Tajima 1996, equations 9-10, 16).
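The basic quantities above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DnaSP's implementation: the function name diversity and the sample data are made up, the sequences are assumed to be aligned and gap-free, Pi is taken as the average over all distinct pairs, and Theta is estimated from the number of segregating sites S divided by a1 = 1 + 1/2 + ... + 1/(n-1). The variances and the Jukes and Cantor correction are not covered.

```python
from itertools import combinations

def diversity(seqs):
    """Return (k, pi, S, theta_per_site) for gap-free aligned sequences."""
    n, length = len(seqs), len(seqs[0])
    pairs = list(combinations(seqs, 2))
    # k: average number of nucleotide differences between two sequences.
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # pi: the same quantity expressed per site.
    pi = k / length
    # S: number of segregating (polymorphic) sites.
    S = sum(len(set(column)) > 1 for column in zip(*seqs))
    a1 = sum(1 / i for i in range(1, n))
    theta_per_site = S / a1 / length
    return k, pi, S, theta_per_site

# Hypothetical sample of four sequences.
sample = ["ATGCCGTA", "ATGTCGTA", "ATGTCGTA", "ACGCCGTA"]
k, pi, S, theta = diversity(sample)
print(f"k = {k:.4f}, pi = {pi:.4f}, S = {S}, theta = {theta:.4f}")
```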

DNA Divergence Between Populations
This module allows computation of several measures of the extent of DNA divergence between populations. DnaSP computes the nucleotide diversity of each population, the average number of nucleotide substitutions per site between populations, Dxy (Nei 1987, equation 10.20), and the number of net nucleotide substitutions per site between populations, Da (Nei 1987, equation 10.21). DnaSP can estimate these parameters and their variances using the Jukes and Cantor method (Nei 1987, equations 10.20 - 10.24).
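A minimal Python sketch of Dxy and Da follows, assuming uncorrected proportions of differences (no Jukes and Cantor correction) and taking Da as Dxy minus the mean of the two within-population diversities. The function names and the two toy populations are hypothetical.

```python
from itertools import combinations, product

def p_distance(a, b):
    """Proportion of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def mean_within(pop):
    """Average pairwise distance within one population."""
    pairs = list(combinations(pop, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

def dxy_da(pop_x, pop_y):
    """Return (Dxy, Da) for two populations of aligned sequences."""
    between = [p_distance(a, b) for a, b in product(pop_x, pop_y)]
    dxy = sum(between) / len(between)
    # Net divergence: subtract the average within-population diversity.
    da = dxy - (mean_within(pop_x) + mean_within(pop_y)) / 2
    return dxy, da

# Hypothetical populations, two sequences each.
pop_x = ["AAAAAAAAAA", "AAAAAAAAAC"]
pop_y = ["GGAAAAAAAA", "GGAAAAAAAC"]
dxy, da = dxy_da(pop_x, pop_y)
print(f"Dxy = {dxy:.3f}, Da = {da:.3f}")
```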

Synonymous and Nonsynonymous Substitutions
This program estimates Ka (the number of nonsynonymous substitutions per nonsynonymous site), and Ks (the number of synonymous substitutions per synonymous site) for any pair of sequences (Nei and Gojobori 1986, equations 1-3). DnaSP can estimate the nucleotide diversity for synonymous, nonsynonymous and silent (both synonymous and noncoding positions) sites. Nine pre-defined genetic codes can be used, among others: the universal nuclear code, and the mitochondrial code of Drosophila, mammals and yeast.
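The final step of this kind of estimate, converting a proportion of differences into a number of substitutions per site with the Jukes and Cantor formula d = -(3/4) ln(1 - 4p/3), can be sketched as follows. The counts of synonymous and nonsynonymous differences and sites are hypothetical inputs; counting them from codons is the harder part and is not shown here.

```python
from math import log

def jukes_cantor(p):
    """Jukes and Cantor distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * log(1 - 4 * p / 3)

# Hypothetical counts for one pair of sequences.
syn_diffs, syn_sites = 6.0, 60.0
nonsyn_diffs, nonsyn_sites = 4.0, 200.0

ks = jukes_cantor(syn_diffs / syn_sites)        # synonymous substitutions per synonymous site
ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)  # nonsynonymous substitutions per nonsynonymous site
print(f"Ks = {ks:.4f}, Ka = {ka:.4f}")
```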

Polymorphism and Divergence
This module allows the analysis of the extent of DNA polymorphism and divergence in synonymous, nonsynonymous and silent (both synonymous and noncoding positions) sites. The analysis can be performed separately for noncoding, exonic or intronic regions (Jukes and Cantor 1969; Nei 1987; Nei and Gojobori 1986).

Codon Usage Bias
This module estimates some measures of the extent of the nonrandom usage of synonymous codons. DnaSP computes the RSCU, Relative Synonymous Codon Usage (Sharp et al. 1986), the ENC, Effective Number of Codons (Wright 1990), the CBI, Codon Bias Index (Morton 1993), and the Scaled Chi Square (Shields et al. 1988). Additionally, DnaSP can estimate the G+C content at coding and noncoding positions.

Gene Conversion
DnaSP incorporates the algorithm developed by Betrán et al. (1997) to detect gene conversion tracts from two differentiated populations (or subpopulations). These subpopulations could be, for example, two different chromosomal gene arrangements (Rozas and Aguadé 1994), or two sets of paralogous sequences. DnaSP also estimates the parameter Psi (Betrán et al. 1997), which measures the probability per site of detecting a conversion event between two subpopulations; from this information the true number and length of the gene conversion tracts can be estimated (Betrán et al. 1997).

Gene Flow
DnaSP computes different measures of the extent of DNA divergence between populations, and from these measures it computes the average level of gene flow, assuming the island model of population structure (Wright 1951). DnaSP estimates the following measures: deltaST, gammaST and Nm (Nei 1982), NST and Nm (Lynch and Crease 1990), FST and Nm (Hudson et al. 1992).
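One of these measures can be sketched in Python under explicit assumptions: FST is taken as 1 - Hw/Hb, where Hw and Hb are the average pairwise differences within and between populations, and Nm is derived from the island-model relation FST = 1/(4Nm + 1). Both formulas are stated here as assumptions about the sequence-based estimator, and the function names and data are hypothetical.

```python
from itertools import combinations, product

def mean_diff(pairs):
    """Average proportion of differing sites over an iterable of sequence pairs."""
    values = [sum(a != b for a, b in zip(x, y)) / len(x) for x, y in pairs]
    return sum(values) / len(values)

def fst_and_nm(pop_x, pop_y):
    """Return (Fst, Nm) for two populations of aligned sequences."""
    h_within = (mean_diff(combinations(pop_x, 2))
                + mean_diff(combinations(pop_y, 2))) / 2
    h_between = mean_diff(product(pop_x, pop_y))
    fst = 1 - h_within / h_between
    # Island model assumption: Fst = 1 / (4Nm + 1), so Nm = (1/Fst - 1) / 4.
    nm = (1 / fst - 1) / 4 if fst > 0 else float("inf")
    return fst, nm

# Hypothetical populations, two sequences each.
north = ["AAAAAAAAAA", "AAAAAAAAAC"]
south = ["GGAAAAAAAA", "GGAAAAAAAC"]
fst, nm = fst_and_nm(north, south)
print(f"Fst = {fst:.3f}, Nm = {nm:.3f}")
```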

Linkage Disequilibrium
DnaSP estimates the degree of linkage disequilibrium (or nonrandom association between variants of different polymorphic sites) with the following parameters: D (Lewontin and Kojima 1960), D' (Lewontin 1964), R and R2 (Hill and Robertson 1968). For the purposes of analysis, gametes with the most or the least common variants are considered in the coupling phase (Langley et al. 1974). Both the two-tailed Fisher's exact test and the chi-square test are computed to determine whether the associations between polymorphic sites are, or are not, significant.
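For a single pair of biallelic sites, D, D' and R2 can be computed as in the Python sketch below. It is a simplification: the reference alleles are simply those of the first haplotype rather than the most or least common variants, so the sign of D and D' depends on that arbitrary labeling (their absolute values and R2 do not). The function name ld_stats and the haplotype list are hypothetical.

```python
def ld_stats(haplotypes):
    """Return (D, D', r^2) for a list of two-site haplotypes such as 'AC'.

    Both sites must be polymorphic, otherwise the ratios are undefined.
    """
    n = len(haplotypes)
    a, b = haplotypes[0][0], haplotypes[0][1]  # arbitrary reference alleles
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h == a + b for h in haplotypes) / n
    d = p_ab - p_a * p_b
    # D' rescales D by the largest magnitude it could reach given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical sample of six haplotypes.
haplotypes = ["AC", "AC", "AC", "GT", "GT", "AT"]
d, d_prime, r2 = ld_stats(haplotypes)
print(f"D = {d:.4f}, D' = {d_prime:.4f}, r2 = {r2:.4f}")
```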

Population Size Changes
This module analyzes the distribution of pairwise differences (mismatch distribution) and the frequency of segregating sites (frequency spectrum). DnaSP shows a graphic representation of the observed and expected values for expanding and stationary populations (Slatkin and Hudson 1991; Rogers and Harpending 1992; Harpending et al. 1993; Rogers 1995; Tajima 1989a; Tajima 1989b).
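The observed mismatch distribution is simply the relative frequency of each number of pairwise differences. A minimal Python sketch follows; the function name and data are hypothetical, the sequences are assumed to be gap-free, and the expected curves for expanding or stationary populations are not computed.

```python
from collections import Counter
from itertools import combinations

def mismatch_distribution(seqs):
    """Relative frequency of each number of pairwise differences."""
    diffs = Counter(sum(a != b for a, b in zip(x, y))
                    for x, y in combinations(seqs, 2))
    total = sum(diffs.values())
    # Include classes with zero observations up to the largest observed value.
    return {i: diffs[i] / total for i in range(max(diffs) + 1)}

# Hypothetical sample of four sequences.
mismatch_sample = ["ATGCCGTA", "ATGTCGTA", "ATGTCGTA", "ACGCCGTA"]
observed = mismatch_distribution(mismatch_sample)
for differences, frequency in observed.items():
    print(differences, round(frequency, 4))
```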

Recombination
This module computes the recombination parameter R = 4Nr, where N is the population size and r is the recombination rate per sequence (or between adjacent sites) (Hudson 1987). DnaSP also includes the algorithm (the four-gametic test) described in Hudson and Kaplan (1985) to estimate RM, the minimum number of recombination events in the history of the sample.
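The four-gametic test can be sketched in Python as follows. The first function flags pairs of biallelic sites at which all four gametic types occur; the second counts a maximal set of non-overlapping intervals between such pairs, which is offered as a simple lower bound in the spirit of Hudson and Kaplan (1985) and not as DnaSP's exact procedure. Function names and data are hypothetical.

```python
from itertools import combinations

def four_gamete_pairs(seqs):
    """Return pairs (i, j) of sites at which all four gametic types are present."""
    columns = list(zip(*seqs))
    # Only sites with exactly two variants are considered.
    biallelic = [i for i, col in enumerate(columns) if len(set(col)) == 2]
    return [(i, j) for i, j in combinations(biallelic, 2)
            if len(set(zip(columns[i], columns[j]))) == 4]

def minimum_recombination_events(seqs):
    """Count non-overlapping intervals that each require a recombination event."""
    intervals = sorted(four_gamete_pairs(seqs), key=lambda pair: pair[1])
    events, last_end = 0, -1
    for start, end in intervals:
        # Intervals that share only an endpoint are treated as disjoint.
        if start >= last_end:
            events += 1
            last_end = end
    return events

# Hypothetical sample of four sequences, four sites each.
recomb_sample = ["AAAA", "ATTA", "TATT", "TTAT"]
incompatible = four_gamete_pairs(recomb_sample)
rm = minimum_recombination_events(recomb_sample)
print(incompatible, rm)
```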

Hudson, Kreitman and Aguadé's Test
The Hudson, Kreitman and Aguadé (1987) test (HKA test) is based on the neutral theory of molecular evolution (Kimura 1983), which predicts that, for a particular region of the genome, the rate of evolution is correlated with the level of polymorphism within species. The test requires data from at least two regions of the genome, both for an interspecific comparison and for the intraspecific polymorphism of at least one species. DnaSP performs the HKA test either i) using the sequence information included in the data files, or ii) by entering the data (the number of nucleotide differences between species and the number of segregating sites within species) in a dialog box. This latter option allows autosomal and sex-linked regions to be compared, and the HKA test to be performed when sample sizes for the two regions being compared are different, or when the number of analyzed sites differs between the intraspecific and the interspecific comparison.

Fu and Li's Tests
DnaSP computes the D, D*, F and F* test statistics proposed by Fu and Li (1993) to test various predictions made by the neutral theory of molecular evolution (Kimura 1983). The test statistics D and F require data from intraspecific polymorphism and from an outgroup (a sequence from a related species), whereas D* and F* require only intraspecific data. DnaSP uses the critical values obtained by Fu and Li (1993) to determine the statistical significance of the D, F, D* and F* test statistics. DnaSP can also compute the Fs test statistic (Fu 1997).

Tajima's Test
This command calculates the D test statistic proposed by Tajima (1989a) to test the neutral theory of molecular evolution (Kimura 1983). This test is based on the fact that under the neutral model estimates of the number of segregating sites and of the average number of nucleotide differences are correlated. DnaSP calculates the confidence limits of D (two-tailed test) assuming that this statistic follows a beta distribution (Tajima 1989a).
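The D statistic itself can be written out in a few lines. The Python sketch below uses the standard constants (a1, a2, b1, b2, c1, c2, e1, e2) associated with Tajima's test; the function name and the input values (sample size n, number of segregating sites S, average number of pairwise differences k) are hypothetical, and the confidence limits from the beta distribution are not computed.

```python
from math import sqrt

def tajimas_d(n, S, k):
    """Tajima's D from sample size n, segregating sites S and mean pairwise differences k."""
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: difference between the two estimates of theta (k and S / a1).
    # Denominator: estimated standard deviation of that difference.
    return (k - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))

# Hypothetical input values.
d_stat = tajimas_d(n=10, S=16, k=3.888889)
print(round(d_stat, 4))
```

A negative value indicates fewer pairwise differences than expected from the number of segregating sites, and a positive value the reverse.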

McDonald and Kreitman Test
This command performs the test proposed by McDonald and Kreitman (1991). That test compares the synonymous and nonsynonymous variation within and between species. Under neutrality, the ratio of nonsynonymous to synonymous fixed substitutions between species should be the same as the ratio of nonsynonymous to synonymous polymorphism within species.
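The comparison reduces to a 2x2 table of counts (nonsynonymous and synonymous; fixed between species and polymorphic within species). The text above does not say which statistical test is applied to the table, so the Python sketch below assumes a two-sided Fisher's exact test; the function name and the counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lowest, highest = max(0, row1 - (n - col1)), min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical counts: rows are fixed and polymorphic, columns are
# nonsynonymous and synonymous.
fixed_nonsyn, fixed_syn = 7, 17
poly_nonsyn, poly_syn = 2, 42

p_value = fisher_exact_two_sided(fixed_nonsyn, fixed_syn, poly_nonsyn, poly_syn)
print("fixed ratio:", fixed_nonsyn / fixed_syn)
print("polymorphic ratio:", poly_nonsyn / poly_syn)
print("p-value:", p_value)
```

Under neutrality the two ratios are expected to be similar, so a small p-value suggests a departure from that expectation.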

Coalescent simulations
DnaSP can perform computer simulations based on the coalescent process for a neutral infinite-sites model assuming a large constant population size (Hudson 1990). The simulations can be run for different levels of intragenic recombination (no recombination, intermediate levels and free recombination). DnaSP conducts the simulations either (i) fixing the value of theta (i.e. assuming a value of theta), or (ii) fixing S, the number of segregating sites (mutations) on the genealogy.
DnaSP can generate the empirical distributions of some test statistics. From these distributions DnaSP can provide the confidence limits for a given interval. Both one-sided and two-sided tests can be conducted. DnaSP can generate the empirical distribution of the following statistics: haplotype diversity (Nei 1987), the number of haplotypes (Nei 1987), the nucleotide diversity (Nei 1987), theta (Watterson 1975), the ZnS test statistic for linkage disequilibrium (Kelly 1997), RM, the minimum number of recombination events (Hudson and Kaplan 1985), Tajima's D (Tajima 1989a), the D*, F*, D and F statistics (Fu and Li 1993), Fs (Fu 1997), and the raggedness statistic (Harpending 1994).
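The idea of building an empirical distribution by coalescent simulation can be illustrated for the simplest case: no recombination, a fixed value of theta, and the number of segregating sites as the statistic. The Python sketch below is a minimal illustration, not DnaSP's simulator; time is measured in units of 4N generations, and the function names, the seed, the sample size and the value of theta are all assumptions.

```python
import random

def simulate_segregating_sites(n, theta, rng):
    """One coalescent replicate: number of mutations on a genealogy of n sequences."""
    total_length = 0.0
    for lineages in range(n, 1, -1):
        # Waiting time while this many lineages remain (rate lineages * (lineages - 1)),
        # multiplied by the number of branches present during that time.
        total_length += lineages * rng.expovariate(lineages * (lineages - 1))
    # Mutations fall on the tree as a Poisson process with rate theta per unit length.
    mutations, position = 0, rng.expovariate(theta)
    while position < total_length:
        mutations += 1
        position += rng.expovariate(theta)
    return mutations

def empirical_interval(values, level=0.95):
    """Two-sided interval read from the sorted simulated values."""
    ordered = sorted(values)
    tail = (1 - level) / 2
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[min(len(ordered) - 1, int((1 - tail) * len(ordered)))]
    return lower, upper

rng = random.Random(1)  # fixed seed so that the run is repeatable
replicates = [simulate_segregating_sites(10, 5.0, rng) for _ in range(2000)]
mean_s = sum(replicates) / len(replicates)
lower, upper = empirical_interval(replicates)
print("mean S:", mean_s, "interval:", lower, upper)
```

An observed value falling outside the simulated interval would be judged unusual under the simulated model; the other statistics listed above would need the full genealogy and the simulated sequences, which this sketch does not generate.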

... And more

References

Betrán, E., Rozas, J., Navarro, A. and Barbadilla, A. (1997). The estimation of the number and the length distribution of gene conversion tracts from population DNA sequence data. Genetics, 146, 89-99.
Felsenstein, J. (1993). Phylogeny Inference Package (PHYLIP). Version 3.5. University of Washington, Seattle.
Fu, Y.-X. (1997). Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics, 147, 915-925.
Fu, Y.-X., and Li, W.-H. (1993). Statistical tests of neutrality of mutations. Genetics, 133, 693-709.
Harpending, H. (1994). Signature of ancient population growth in a low-resolution mitochondrial DNA mismatch distribution. Human Biology, 66, 591-600.
Hill, W. G. and Robertson, A. (1968). Linkage disequilibrium in finite populations. Theor. Appl. Genet., 38, 226-231.
Hudson, R. R. (1987). Estimating the recombination parameter of a finite population model without selection. Genet. Res., 50, 245-250.
Hudson, R. R. and Kaplan, N. L. (1985). Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics, 111, 147-164.
Hudson, R. R., Kreitman, M. and Aguadé, M. (1987). A test of neutral molecular evolution based on nucleotide data. Genetics, 116, 153-159.
Hudson, R. R., Slatkin, M. and Maddison, W. P. (1992). Estimation of levels of gene flow from DNA sequence data. Genetics, 132, 583-589.
Jukes, T. H. and Cantor, C. R. (1969). Evolution of protein molecules. In Munro, H. N. (ed.), Mammalian Protein Metabolism. Academic Press, New York, NY, pp. 21-132.
Kelly, J. K. (1997). A test of neutrality based on interlocus associations. Genetics, 146, 1197-1206.
Kimura, M. (1983). The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, MA.
Kumar, S., Tamura, K. and Nei, M. (1994). MEGA: Molecular Evolutionary Genetics Analysis software for microcomputers. Comput. Applic. Biosci., 10, 189-191.
Langley, C. H., Tobari, Y. N. and Kojima, K. (1974). Linkage disequilibrium in natural populations of Drosophila melanogaster. Genetics, 78, 921-936.
Lewontin, R. C. (1964). The interaction of selection and linkage. I. General considerations: heterotic models. Genetics, 49, 49-67.
Lewontin, R. C. and Kojima, K. (1960). The evolutionary dynamics of complex polymorphisms. Evolution, 14, 458-472.
Lynch, M. and Crease, T. J. (1990). The analysis of population survey data on DNA sequence variation. Mol. Biol. Evol., 7, 377-394.
McDonald, J. H. and Kreitman, M. (1991). Adaptive protein evolution at the Adh locus in Drosophila. Nature, 351, 652-654.
Maddison, W. P. and Maddison, D. R. (1992). MacClade: analysis of phylogeny and character evolution. Version 3. Sinauer Associates, Sunderland, MA.
Maddison, W. P., Swofford, D. L. and Maddison, D. R. (1997). NEXUS: an extendible file format for systematic information. System. Biol., 46, 590-621.
Morton, B. R. (1993). Chloroplast DNA codon use: Evidence for selection at the psb A locus based on tRNA availability. J. Mol. Evol., 37, 273-280.
Nei, M. (1982). Evolution of human races at the gene level, pp. 167-181. In Bonne-Tamir, B., Cohen, T. and Goodman, R. M. (eds.), Human genetics, part A: The unfolding genome. Alan R. Liss, New York, NY.
Nei, M. (1987). Molecular Evolutionary Genetics. Columbia University Press, New York, NY.
Nei, M. and Gojobori, T. (1986). Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol., 3, 418-426.
Rogers, A. R. (1995). Genetic evidence for a Pleistocene population explosion. Evolution, 49, 608-615.
Rogers, A. R. and Harpending, H. (1992). Population growth makes waves in the distribution of pairwise genetic differences. Mol. Biol. Evol., 9, 552-569.
Rozas, J. and Aguadé, M. (1994). Gene conversion is involved in the transfer of genetic information between naturally occurring inversions of Drosophila. Proc. Natl. Acad. Sci. USA, 91, 11517-11521.
Sharp, P. M., Tuohy, T. M. F. and Mosurski, K. R. (1986). Codon usage in yeast: Cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res., 14, 5125-5143.
Shields, D. C., Sharp, P. M., Higgins, D. G. and Wright, F. (1988). "Silent" sites in Drosophila genes are not neutral: Evidence of selection among synonymous codons. Mol. Biol. Evol., 5, 704-716.
Sidman, K. E., George, D. G., Barker, W. C. and Hunt, L. T. (1988). The protein identification resource (PIR). Nucleic Acids Res., 16, 1869-1871.
Slatkin, M. and Hudson, R. R. (1991). Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129, 555-562.
Swofford, D. L. (1991). PAUP: phylogenetic analysis using parsimony, version 3.0. Illinois Natural History Survey, Champaign, IL.
Tajima, F. (1983). Evolutionary relationship of DNA sequences in finite populations. Genetics, 105, 437-460.
Tajima, F. (1989a). Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics, 123, 585-595.
Tajima, F. (1989b). The effect of change in population size on DNA polymorphism. Genetics, 123, 597-601.
Tajima, F. (1993). Measurement of DNA polymorphism. In Takahata, N. and Clark, A. G. (eds), Mechanisms of Molecular Evolution. Sinauer Associates, Inc., Sunderland, MA, pp. 37-59.
Tajima, F. (1996). The amount of DNA polymorphism maintained in a finite population when the neutral mutation rate varies among sites. Genetics, 143, 1457-1465.
Watterson, G. A. (1975). On the number of segregating sites in genetical models without recombination. Theor. Pop. Biol., 7, 256-276.
Wright, F. (1990). The "effective number of codons" used in a gene. Gene, 87, 23-29.
Wright, S. (1951). The genetical structure of populations. Ann. Eugenics, 15, 323-354.

DnaSP References

Rozas, J. and Rozas, R. 1995. DnaSP, DNA sequence polymorphism: an interactive program for estimating Population Genetics parameters from DNA sequence data. Comput. Applic. Biosci. 11: 621-625.

Rozas, J. and Rozas, R. 1997. DnaSP version 2.0: a novel software package for extensive molecular population genetics analysis. Comput. Applic. Biosci. 13: 307-311.

Rozas, J. and Rozas, R. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15: 174-175.

Rozas, J., Sánchez-DelBarrio, J. C., Messeguer, X. and Rozas, R. 2003. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19: 2496-2497.

Rozas, J. 2009. DNA Sequence Polymorphism Analysis using DnaSP. Pp. 337-350. In Posada, D. (ed.) Bioinformatics for DNA Sequence Analysis. Methods in Molecular Biology Series Vol. 537. Humana Press, NJ, USA.

Librado, P. and Rozas, J. 2009. DnaSP v5: A software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25: 1451-1452 | doi: 10.1093/bioinformatics/btp187.

February 13, 2009