Analytica Chimica Acta 2009, in press
Joaquim Jaumot1*, Ramon Eritja2, Susana Navea3
and Raimundo Gargallo1
1. Department of Analytical Chemistry, Universitat de Barcelona, Diagonal 647,
Barcelona, E-08028 Spain.
2. Department of Structural Biology, IBMB-CSIC, Jordi Girona 18-26, Barcelona,
E-08034 Spain
3. Acciona Agua, Av. de les Garrigues 22, El Prat de Llobregat, E-08820, Spain
* Author to whom correspondence should be addressed: Tel: +34-934034445;
Fax: +34-934021233; E-mail: joaquim@apolo.qui.ub.es
DNA can adopt structures in solution other than the well-known Watson-Crick double helix, ranging from disordered single strands to higher-order structures such as triplexes or quadruplexes. Moreover, different topologies can be adopted depending on the polarity of the DNA strands. The structure and topology adopted by a DNA sequence are usually elucidated by means of spectroscopic techniques, such as circular dichroism.
In this work, the ability of several chemometric methods to efficiently classify
DNA structures from circular dichroism data is tested. With this objective in
mind, a data set including 50 experimental spectra
corresponding to different DNA structures (random coil, duplex, hairpin,
reversed and normal triplex, parallel and antiparallel G-quadruplex, and
i-motif) has been analyzed by means of unsupervised Hierarchical Clustering
Analysis, Principal Component Analysis and Partial Least Squares Discriminant
Analysis. The results show that these methods efficiently classify DNA
structures from circular dichroism spectra. Moreover, they identify the
wavelengths that are most characteristic for the classification.
Keywords
DNA structure, classification, Principal Component Analysis, Clustering, Partial
Least Squares Discriminant Analysis, Circular Dichroism spectroscopy

Dendrogram obtained by Hierarchical Cluster Analysis using Ward’s linkage method and Euclidean distance. Two main branches can be clearly distinguished. The branch on the left contains the disordered DNAs together with several triplex, i-motif and antiparallel G-quadruplex structures, whereas the branch on the right includes most of the parallel G-quadruplex and duplex structures in the analyzed data set.
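
As an illustration of how such a dendrogram can be computed, the following minimal Python sketch applies Ward's linkage with Euclidean distance to a matrix of spectra (one spectrum per row) using SciPy. It is not the authors' implementation: the function name ward_clusters, the wavelength grid, the band positions and the noise level are assumptions chosen only for illustration, and the spectra are synthetic stand-ins rather than the experimental data set.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def ward_clusters(spectra, n_clusters=2):
    """Cluster spectra (rows) with Ward's linkage on Euclidean distances.

    Returns the SciPy linkage matrix and one cluster label per spectrum.
    """
    spectra = np.asarray(spectra, dtype=float)
    # Ward's method merges, at each step, the pair of clusters that gives
    # the smallest increase in within-cluster variance.
    Z = linkage(spectra, method="ward", metric="euclidean")
    # Cut the tree so that at most n_clusters groups remain.
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return Z, labels


# Synthetic stand-in data (assumption): two groups of five noisy
# single-band "spectra" with arbitrary band positions.
wavelengths = np.linspace(220.0, 320.0, 51)
rng = np.random.default_rng(0)
band_a = np.exp(-((wavelengths - 265.0) / 10.0) ** 2)
band_b = np.exp(-((wavelengths - 295.0) / 10.0) ** 2)
spectra = np.vstack(
    [band_a + 0.02 * rng.standard_normal(wavelengths.size) for _ in range(5)]
    + [band_b + 0.02 * rng.standard_normal(wavelengths.size) for _ in range(5)]
)

Z, labels = ward_clusters(spectra, n_clusters=2)
```

The linkage matrix Z can be passed to scipy.cluster.hierarchy.dendrogram to draw the tree. Whether the spectra should be normalized or mean-centered before clustering is a preprocessing choice that this sketch leaves open.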