Analytica Chimica Acta 2009, in press

Classification of nucleic acids structures by means of the chemometric analysis of circular dichroism spectra

Joaquim Jaumot1*, Ramon Eritja2, Susana Navea3 and Raimundo Gargallo1

1. Department of Analytical Chemistry, Universitat de Barcelona, Diagonal 647, Barcelona, E-08028, Spain.
2. Department of Structural Biology, IBMB-CSIC, Jordi Girona 18-26, Barcelona, E-08034, Spain.
3. Acciona Agua, Av. de les Garrigues 22, El Prat de Llobregat, E-08820, Spain.
 

* Author to whom correspondence should be addressed: Tel: +34-934034445; Fax: +34-934021233; E-mail: joaquim@apolo.qui.ub.es
 

Abstract

In solution, DNA can adopt structures other than the well-known Watson-Crick double helix, ranging from disordered single strands to high-order structures such as triplexes or quadruplexes. Moreover, different topologies can be adopted depending on the polarity of the DNA strands. The structure and topology adopted by a DNA sequence are usually elucidated by means of spectroscopic techniques, such as circular dichroism.


In this work, the ability of several chemometric methods to efficiently classify DNA structures from circular dichroism data is tested. With this objective in mind, a data set including 50 experimental spectra corresponding to different DNA structures (random coil, duplex, hairpin, reversed and normal triplex, parallel and antiparallel G-quadruplex, and i-motif) has been analyzed by means of unsupervised Hierarchical Clustering Analysis, Principal Component Analysis and Partial Least Squares Discriminant Analysis. The results have shown that these methods allow the efficient classification of DNA structures from circular dichroism spectra. Moreover, these classification methods also identified the wavelengths that were most characteristic in the classification procedures.
 


Keywords

DNA structure, classification, Principal Component Analysis, Clustering, Partial Least Squares Discriminant Analysis, Circular Dichroism spectroscopy
 

 

 

Dendrogram obtained by Hierarchical Cluster Analysis using Ward’s linkage method and Euclidean distance. Two main branches can be clearly distinguished. The branch on the left contains the disordered DNAs together with several triplex, i-motif and antiparallel G-quadruplex structures, whereas the branch on the right includes most of the parallel G-quadruplex and duplex structures included in the analyzed data set.
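
As an illustration of the clustering step named in the caption, the following minimal sketch agglomerates spectra with Ward's criterion computed from Euclidean distances between cluster centroids. It is not the authors' implementation: the function names (`ward_merge_cost`, `ward_clustering`, `synthetic_spectrum`) are hypothetical, and the ten spectra are synthetic Gaussian bands with noise, used only as placeholders for real circular dichroism data. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would normally be preferred.

```python
import math
import random


def ward_merge_cost(centroid_a, size_a, centroid_b, size_b):
    """Increase in within-cluster sum of squares if the two clusters merge."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
    return size_a * size_b / (size_a + size_b) * sq_dist


def ward_clustering(spectra, n_clusters=2):
    """Agglomerate the rows of `spectra` until `n_clusters` groups remain.

    Returns (groups, merges): `groups` is a list of sorted row-index lists,
    `merges` records (members_a, members_b, cost) in merge order.
    """
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(row)) for i, row in enumerate(spectra)]
    merges = []
    while len(clusters) > n_clusters:
        # Find the pair of clusters whose merge is cheapest under Ward's criterion.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (mi, ci), (mj, cj) = clusters[i], clusters[j]
                cost = ward_merge_cost(ci, len(mi), cj, len(mj))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        (mi, ci), (mj, cj) = clusters[i], clusters[j]
        n = len(mi) + len(mj)
        # The merged centroid is the size-weighted mean of the two centroids.
        centroid = [(len(mi) * x + len(mj) * y) / n for x, y in zip(ci, cj)]
        merges.append((sorted(mi), sorted(mj), cost))
        del clusters[j]  # j > i, so index i remains valid
        clusters[i] = (mi + mj, centroid)
    return [sorted(members) for members, _ in clusters], merges


def synthetic_spectrum(center_nm, rng, wavelengths):
    """Placeholder spectrum: one Gaussian band plus noise (NOT real CD data)."""
    return [math.exp(-((w - center_nm) / 10.0) ** 2) + rng.gauss(0.0, 0.02)
            for w in wavelengths]


rng = random.Random(0)
wavelengths = list(range(220, 321, 2))  # assumed 220-320 nm grid
spectra = ([synthetic_spectrum(260, rng, wavelengths) for _ in range(5)]
           + [synthetic_spectrum(290, rng, wavelengths) for _ in range(5)])

groups, merges = ward_clustering(spectra, n_clusters=2)
print(groups)
```

Stopping the agglomeration at two groups corresponds to reading off the two main branches of a dendrogram; the recorded merge costs give the heights at which successive branches join.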