Homepage of Michael Y. Tolstorukov


In my research I use methods of bioinformatics and computer modeling to address questions of genome biology related to epigenetic gene regulation and genome packaging. How can a limited number of proteins find their binding sites in a genome of billions of nucleotides packaged in a tiny cell? How is gene regulation managed with the reliability of a well-tuned machine in a myriad of epigenomes across multiple tissues, conditions, and developmental stages? With the development of new high-throughput techniques such as tiling arrays and next-generation sequencing, these intriguing questions can be tackled.

The main focus of my research is on the analysis of the primary structure of chromatin, which includes nucleosome positioning, distribution of histone variants and modifications in the genome, and regulatory protein binding.

Completed projects:

Impact of chromatin structure on sequence variability in the human genome

(with Peter Park, Harvard Medical School; Natalia Volfovsky, Robert M. Stephens, Advanced Biomedical Computing Center, NCI at Frederick)

DNA sequence variations in individual genomes give rise to different phenotypes within the same species. One mechanism in this process is the alteration of chromatin structure due to sequence variation that impacts gene regulation. We composed a high-confidence collection of human SNPs and indels based on analysis of publicly available sequencing data and investigated whether the DNA loci associated with stable nucleosome positions are protected against mutations. We also addressed how sequence variation is reflected in the genome-wide occupancy profiles of nucleosomes bearing different epigenetic modifications. We find that indels are depleted around nucleosome positions of all considered types, while SNPs are enriched around the positions of bulk nucleosomes but depleted around the positions of epigenetically modified nucleosomes. These findings indicate an increased level of conservation for the sequences associated with epigenetically modified nucleosomes, highlighting the complex organization of human chromatin.
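The kind of profile described above can be illustrated with a minimal sketch. This is not the pipeline used in the study; the function name `variant_profile`, the window size, and the toy coordinates are all hypothetical, and a single chromosome with point-like variant positions is assumed.

```python
from collections import Counter

def variant_profile(nucleosome_centers, variant_positions, half_width=100):
    """Average number of variants at each offset from a nucleosome center.

    Hypothetical helper: positions are plain integers on one chromosome.
    """
    variants = set(variant_positions)
    counts = Counter()
    for center in nucleosome_centers:
        for offset in range(-half_width, half_width + 1):
            if center + offset in variants:
                counts[offset] += 1
    n = len(nucleosome_centers)
    # One value per offset, normalized by the number of nucleosomes.
    return {off: counts[off] / n for off in range(-half_width, half_width + 1)}

# Toy data (made-up coordinates, for illustration only).
centers = [1000, 2000, 3000]
snps = [1000, 1090, 2090, 3090, 2950]
profile = variant_profile(centers, snps, half_width=100)
```

Depletion or enrichment would then be read off by comparing the values near offset 0 with those in the flanks, ideally against a suitable background model.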

Interplay of chromatin-mediated mutation bias and selection can shape the sequence variation profile (cf. the schematic illustration in Semple & Taylor, Science, 2009). (a) Bulk and epigenetically modified nucleosomes are represented with blue and red ovals. Green and orange lines represent the mutation rates of SNPs and indels, respectively, and the black line represents selection pressure acting on the DNA sequence. (b) The significant difference in the indel rate inside and outside nucleosomes mainly determines the indel density profile observed in the genome (orange), while the SNP density profile (green) is mainly affected by selection. Our results do not exclude the possibility that natural selection can affect the distribution of indels and that alteration of the mutation rate affects the distribution of SNPs. Rather, they indicate that these mechanisms are not the major factors shaping the resulting profiles.

(with Peter Park, Peter Kharchenko, Harvard Medical School; Robert Kingston, J. Aaron Goldman, Massachusetts General Hospital)

Eukaryotic DNA is wrapped around a histone protein core to constitute the fundamental repeating units of chromatin, the nucleosomes. The affinity of the histone core for DNA depends on the nucleotide sequence; however, it is unclear to what extent DNA sequence determines nucleosome positioning in vivo, and if the same rules of sequence-directed positioning apply to genomes of varying complexity. Using the data generated by high-throughput DNA sequencing combined with chromatin immunoprecipitation, we have identified positions of nucleosomes containing the H2A.Z histone variant and histone H3 trimethylated at lysine 4 in human CD4+ T-cells. We find that the 10-bp periodicity observed in nucleosomal sequences in yeast and other organisms is not pronounced in human nucleosomal sequences. This result was confirmed for a broader set of mononucleosomal fragments that were not selected for any specific histone variant or modification. We also find that human H2A.Z nucleosomes protect only about 120 bp of DNA from MNase digestion and exhibit specific sequence preferences, suggesting a novel mechanism of nucleosome organization for the H2A.Z variant. (Genome Res 2009. 19: 967-977)


Periodograms showing spectral density for the WW and SS dinucleotide autocorrelation functions for human and yeast H2A.Z nucleosomes (data from Barski et al., Cell 2007 and Albert et al., Nature 2007). The red lines represent the power spectral density for nucleosomal sequences, and the solid and dashed blue lines represent the statistical significance levels P = 0.001 and P = 0.05, respectively.
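A periodogram of this type can be sketched as follows. This is a simplified illustration under stated assumptions, not the published analysis: it uses a single toy sequence rather than a large set of aligned nucleosomal fragments, omits the significance levels, and the helper names (`ww_indicator`, `autocorrelation`, `power_at_period`) are hypothetical.

```python
import cmath
import math

WW = {"AA", "AT", "TA", "TT"}  # W = A or T

def ww_indicator(seq):
    """1 at every position where a WW dinucleotide starts, else 0."""
    seq = seq.upper()
    return [1 if seq[i:i + 2] in WW else 0 for i in range(len(seq) - 1)]

def autocorrelation(signal, max_lag):
    """Mean-subtracted autocorrelation for lags 1..max_lag."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]

def power_at_period(values, period):
    """Squared magnitude of the Fourier component of `values` at `period`."""
    total = sum(v * cmath.exp(-2j * math.pi * k / period)
                for k, v in enumerate(values))
    return abs(total) ** 2 / len(values)

# Toy sequence with an AA dinucleotide every 10 bp (illustration only).
toy = "AACGCGCGCG" * 15
ac = autocorrelation(ww_indicator(toy), max_lag=50)
power_10 = power_at_period(ac, 10)   # strong for this toy sequence
power_7 = power_at_period(ac, 7)     # weak: no 7-bp periodicity present
```

For real data one would average the autocorrelation over many fragments before computing the spectrum and estimate significance thresholds from shuffled or background sequences.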

Fragment of nucleosome core particle structure containing the H2A.Z variant (PDB_ID 1f66 (Suto et al., Nat Struct Biol 2000), H2A.Z is shown in magenta) with the superimposed major H2A histone (dark blue) from the best resolved nucleosome structure (PDB_ID 1kx5 (Davey et al., JMB, 2002)). Base pair at position -43 (yellow) marks the 30-bp shortening of the protected DNA fragment.

Sequence-directed nucleosome positioning in CpG islands

(with Wilma Olson, Rutgers, the State University of New Jersey; Victor Zhurkin, NCI)

Unlike most of the genome, the CpG islands remain unmethylated and are associated with the open, transcriptionally competent form of chromatin rather than the closed, inactive form. Malfunctions of the gene regulatory machinery that affect the state of chromatin in CpG islands may result in various cancers and developmental disorders. We hypothesize that the sequence-dependent structural properties of DNA in the CpG islands are crucial for maintaining the open state of chromatin.

In our recent study on the role of different degrees of freedom in the formation of the superhelical nucleosomal trajectory (Tolstorukov, Olson, Zhurkin et al., submitted), a novel structural approach has been developed to map potential nucleosome locations on genomic sequences. Since there are very few (if any) direct sequence-specific interactions between the histones and DNA bases, the affinity of the histone core for DNA is determined primarily by the energy needed to wrap DNA on the surface of the nucleosome. Hence, our algorithm is based on the calculation of the deformation energy required for a duplex of a given sequence to follow the nucleosomal DNA trajectory. The developed algorithm was successfully tested on a set of sequences for which nucleosome positions were mapped to high resolution.
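The idea of threading a sequence on a fixed nucleosomal template and scoring each position by its deformation energy can be sketched as below. This is a toy version under strong assumptions, not the algorithm from the study: only two step parameters (Roll and Slide) are used, the energy is a simple uncoupled harmonic function, the template has three steps instead of a full nucleosomal path, and every number in `STEP_PARAMS`, `DEFAULT`, and `template` is made up for illustration.

```python
# Hypothetical per-step parameters:
# (rest Roll in degrees, rest Slide in angstroms, Roll stiffness, Slide stiffness).
# Illustrative numbers only; energies are in arbitrary units.
DEFAULT = (2.0, -0.2, 0.02, 2.0)
STEP_PARAMS = {
    "TA": (3.0, 0.0, 0.01, 1.0),
    "CA": (4.0, 0.1, 0.015, 1.2),
    "TG": (4.0, 0.1, 0.015, 1.2),
}

def step_energy(step, roll, slide):
    """Harmonic cost of forcing one dimeric step into the given Roll and Slide."""
    roll0, slide0, k_roll, k_slide = STEP_PARAMS.get(step, DEFAULT)
    return 0.5 * k_roll * (roll - roll0) ** 2 + 0.5 * k_slide * (slide - slide0) ** 2

def threading_energy(seq, start, template):
    """Total cost of placing seq[start:] on the template of (Roll, Slide) targets."""
    return sum(
        step_energy(seq[start + i:start + i + 2], roll, slide)
        for i, (roll, slide) in enumerate(template)
    )

def energy_profile(seq, template):
    """Deformation energy for every possible placement of the template."""
    n_windows = len(seq) - len(template)  # each window needs len(template) + 1 bases
    return [threading_energy(seq, s, template) for s in range(n_windows)]

# Toy template: one strongly deformed step (negative Roll, positive Slide)
# flanked by two relaxed steps.
template = [(2.0, -0.2), (-8.0, 0.8), (2.0, -0.2)]
profile = energy_profile("GGCGTACGCC", template)
best_start = profile.index(min(profile))  # placement with the lowest energy
```

In this toy case the lowest-energy placement puts the flexible TA step at the deformed template position. A realistic treatment would use all six step parameters with their couplings and a template derived from a nucleosome crystal structure.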

Analysis of human genome sequences showed that the “concentration” of nucleosome-attracting sites (characterized by a lower-than-average DNA deformation energy) is noticeably lower and the “concentration” of nucleosome-repelling sites (characterized by a higher-than-average DNA deformation energy) is noticeably higher in CpG islands. The observed non-trivial distribution of nucleosome-positioning sites provides new insight into the well-documented phenomenon of nucleosome depletion in CpG islands (these results are currently in preparation for publication).

Distributions of nucleosome-attracting (A) and nucleosome-repelling (B) sites near gene starts. Data points represent the average numbers of such sites occurring in a 0.5-kb (kilobase) running window as a function of the distance, dTSS, between the window center and the transcription start site (denoted by hooked green arrows). Results for two groups of aligned genes are shown: red, 10,773 genes with CpG islands; blue, 15,642 genes without CpG islands.

Effect of base-pair shear deformations on formation of nucleosomal DNA path

(with Victor Zhurkin, NCI; Wilma Olson, Rutgers, the State University of New Jersey)

The bending of DNA in nucleosomes is accompanied by lateral displacements of adjacent base pairs, the effect of which on the overall DNA folding is generally neglected. We demonstrate, however, that these displacements play a much more important structural role than ever imagined. Specifically, the Slide deformations imposed on DNA by the histones at sites of local anisotropic bending appear to govern both the superhelical trajectory of DNA and the positioning of nucleosomes. Furthermore, the computed cost of deforming DNA on the nucleosome is sequence specific: in optimally positioned sequences the most easily deformed base-pair steps (CA:TG and TA) occur at sites of large positive Slide and negative Roll (where the DNA bends into the minor groove). These conclusions rest upon a treatment of DNA that goes beyond the conventional ‘elastic-rod’ model, incorporating all essential degrees of freedom of ‘real’ duplexes in the estimation of DNA deformation energies. Indeed, only after lateral Slide displacements are considered, are we able to account for the sequence-specific folding of DNA found in nucleosome structures. The close correspondence between the predicted and observed nucleosome locations demonstrates the potential advantage of our ‘structural’ approach in the assessment of nucleosome positioning.

Effect of base-pair Slide on the superhelical path of DNA in the best resolved nucleosome core particle structure (NCP147, Davey et al., 2002). A. DNA model (red) superimposed at the initial base pair on the superhelical path of the real nucleosomal DNA (NCP147, white). The model structure is constructed from the structural parameters of NCP147 with Slide equated to zero at each dimeric step. B. DNA model with negative and positive Slide at dimeric steps separated by 5 bp (colored blue and red respectively). The DNA helical axis is represented by yellow sticks. Because of the ~180° net helical twisting between base pairs, the sliding occurs in the same direction (red and blue arrows), i.e., the overall effect of sliding is cumulative.

Non-random distribution of A-tracts facilitates bacterial genome packaging

(with Sankar Adhya, Victor Zhurkin, Konstantin Virnik, NCI)

Molecular mechanisms of bacterial chromatin packaging are still unclear, as bacteria lack nucleosomes or other apparent basic elements of DNA compaction. It is known that correlations in the genomic DNA sequence may constitute a structural code that facilitates DNA folding. We elaborated this concept by analyzing the distributions of A-tracts (the sequence motifs that introduce the most pronounced local curvature of DNA). We have observed that their distribution is highly non-random: (i) A-tracts are phased with the DNA pitch, i.e. positioned in such a way that the individual bends they introduce produce a curvature build-up; (ii) the phased A-tracts are organized in clusters about 100 bp long, as revealed by a specially designed algorithm based on the Fourier formalism. Such clusters are present throughout the genome, including the coding sequences. The clustering of A-tracts greatly increases the local curvature of DNA and therefore appears critical for the formation of DNA loops and coils. Moreover, the clusters of A-tracts may serve as binding sites for nucleoid-associated proteins that have propensities for binding curved DNA (e.g., HU, H-NS, Hfq). Thus, for the first time we have observed a clear structural signal in the DNA sequences that can facilitate DNA folding genome-wide, introducing DNA intrinsic curvature and increasing the stability of the DNA complexes with architectural proteins, so-called “compactosomes.”
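The notion of A-tracts being phased with the DNA pitch can be illustrated with a small Fourier-style score. This is only a sketch, not the specially designed algorithm mentioned above: the A-tract definition (runs of at least four A or four T), the default period of 10.5 bp, and the names `a_tract_centers` and `phasing_score` are assumptions made for the example.

```python
import cmath
import math
import re

# Assumed definition of an A-tract: a run of >= 4 A's or >= 4 T's.
A_TRACT = re.compile(r"A{4,}|T{4,}")

def a_tract_centers(seq):
    """Center coordinate of every A-tract found in the sequence."""
    return [(m.start() + m.end() - 1) / 2 for m in A_TRACT.finditer(seq.upper())]

def phasing_score(centers, period=10.5):
    """Normalized Fourier amplitude of tract positions at the given period.

    1.0 means all tracts sit at the same helical phase; values near 0 mean
    their bends point in different directions and do not add up.
    """
    if not centers:
        return 0.0
    total = sum(cmath.exp(2j * math.pi * x / period) for x in centers)
    return abs(total) / len(centers)

# Toy sequences (illustration only).
phased = ("AAAAA" + "GCGCG") * 6            # tracts every 10 bp
antiphased = ("AAAAA" + "GCGCGCGCGC") * 6   # tracts every 15 bp

score_phased = phasing_score(a_tract_centers(phased), period=10)
score_antiphased = phasing_score(a_tract_centers(antiphased), period=10)
```

Evaluating such a score in a sliding window along a genome is one simple way to look for clusters of phased tracts; window length and thresholds would have to be chosen and validated separately.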

Indirect sequence readout

(with Victor Zhurkin, NCI; Robert Jernigan, Iowa State University)

The energy of protein-induced structural deformations of a DNA duplex in complexes is sequence-dependent, which provides another way of recognition (indirect information readout) in addition to the specific patterns of hydrogen bonding. To explore this phenomenon we have built a unique database of protein-DNA complexes and developed a novel algorithm for analyzing the protein-DNA interactions separately in the DNA major and minor grooves. As a result, for the first time we have observed hydrophobicity-structure correlations in protein-DNA complexes, namely, that the hydrophobic and polar amino acids interacting in the minor groove induce distinct DNA structural deformations. These results are currently used in my recently started project on MD simulation of the evolution of DNA structural deformations during formation and dissolution of protein-DNA complexes, which aims to shed additional light on the problem of recognition of degenerate DNA sequences.

Large nucleoprotein complexes

(with Sankar Adhya, Victor Zhurkin, Szabolcs Semsey, Mofang Liu, NCI)

Details of the spatial organization of large nucleoprotein complexes are unknown in many cases. To calculate the minimum-energy 3D trajectory of the DNA under the structural constraints in a particular nucleoprotein assembly, we applied a knowledge-based elastic model of DNA suitable for mesoscopic simulations (DNA fragments ~100 bp). This approach allowed us to determine the trajectory of the repression loop and the relative positioning of the binding sites of regulatory proteins (GalR, HU) in the gal repressosome in the E. coli cell (the higher-order nucleoprotein complex similar to that shown in the left panel of the above figure). The computer modeling helped us to reveal specific pathways of gal operon repression.

Minimal energy configuration of the repression loop model. Two GalR dimers (purple and teal blue) are shown as ribbons. The OE operator is highlighted in red and the OI operator in yellow. The -10 element of the P2 promoter is colored blue and orange. The six base pairs of the experimentally observed HU binding site (hbs, position +6.5) are colored magenta and orange.

Substantial DNA structural deformations are also known to be crucial for transcription initiation. In collaboration with an experimental group (Lab of Mol. Biol., NCI), I study the dependence of bacterial promoter strength on the sequence of the spacer region. In particular, we have observed that the presence of a “soft” AT-rich sequence in the spacer region can increase the promoter strength up to 100-fold. The structural analysis has shown that the interactions of the beta subunit of RNA polymerase in the DNA minor groove account for the effect. Based on our results, we predicted mutations in the promoter spacer region that would increase the promoter strength, and these predictions were confirmed experimentally.

B-to-A transition in DNA:

Propensity scales based on trimeric and dimeric models

(with Victor Zhurkin, NCI; Robert Jernigan, Iowa State University)

Experimental data on the sequence-dependent B↔A conformational transition in 24 oligo- and polymeric duplexes yield optimal dimeric and trimeric scales for this transition. The ten sequence dimers and the 32 trimers of the DNA duplex were characterized by the free energy differences between the B- and A-forms in water solution. In general, the trimeric scale describes the sequence-dependent DNA conformational propensities more accurately than the dimeric scale, which is likely related to the trimeric model accounting for the two interfaces between adjacent base pairs on both sides (rather than only one interface in the dimeric model). In particular, the exceptional preference of the B-form for the AA:TT dimers and AAN:N’TT trimers is consistent with the cooperative interactions in both grooves. In the minor groove, this is the hydration spine that stabilizes adenine runs in B-form. In the major groove, these are hydrophobic interactions between the thymine methyls and the sugar methylene groups from the preceding nucleotides, occurring in B-form. This interpretation is in accord with the key role of hydration in the B↔A transition in DNA. Importantly, our trimeric scale is consistent with the relative occurrences of the DNA trimers in A-form in protein-DNA cocrystals. The B/A-scales developed here can be used for analyzing genome sequences in search of A-philic motifs, putatively operative in protein-DNA recognition.
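The counts of ten dimers and 32 trimers follow from treating a sequence and its reverse complement as the same duplex step, which can be checked with a short sketch. The scoring function below additionally assumes that a sequence is scored by simply summing per-step values, and the scale values are placeholders, not the published B/A scale; `canonical`, `unique_kmers`, and `b_to_a_cost` are hypothetical names.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Representative of a duplex step: the smaller of the k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)

def unique_kmers(k):
    """All distinct duplex steps of length k."""
    return sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})

def b_to_a_cost(seq, scale, k=2):
    """Sum of per-step values along the sequence (assumed additive model)."""
    return sum(scale[canonical(seq[i:i + k])] for i in range(len(seq) - k + 1))

n_dimers = len(unique_kmers(2))    # 10
n_trimers = len(unique_kmers(3))   # 32

# Placeholder scale: every dimer gets 1.0 except AA:TT, which is given a
# higher value to mimic a stronger B-form preference. Not real data.
toy_scale = {dimer: 1.0 for dimer in unique_kmers(2)}
toy_scale["AA"] = 2.0

cost_aaaa = b_to_a_cost("AAAA", toy_scale)
cost_tttt = b_to_a_cost("TTTT", toy_scale)   # same duplex, same cost
cost_gcgc = b_to_a_cost("GCGC", toy_scale)
```

Scanning a genome with such a function in a sliding window, using a measured scale in place of the placeholder, is one straightforward way to look for A-philic or B-philic stretches.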

Modeling nucleic acid conformational transitions with hysteresis over the hydration-dehydration cycle

(with Vladimir Maleev, V. Karazin Kharkov National University)

A mathematical model of the conformational transitions of nucleic acids, mainly the DNA molecule, during the water adsorption-desorption cycle has been proposed. The nucleic acid-water system is considered as an open system. The model describes the transitions between three main conformations of wet DNA samples: A-, B- and unordered forms. The analysis of the kinetic equations shows non-trivial bifurcation behavior of the system, which leads to multistability. This fact allows one to explain the hysteresis phenomena observed experimentally in the DNA-water system. It was shown that hysteresis phenomena appear only in the case of cooperative conformational transitions. Microgravimetric experiments were performed to test the model. The model and experimental results are in good agreement with each other. A distributed-parameter model describes conformational junctions in heterogeneous DNA sequences.
