IMM Report Number 30

In conjunction with Foresight Update 48

Recent Progress: Steps Toward Nanotechnology

By Jim Lewis

The past several years have seen rapidly accelerating progress in nanoscale science and technology. Advances in various research topics have exploited novel molecules to create designed molecular arrays with intriguing properties, and paths to practical applications have started to appear. Particularly striking has been the progress in molecular electronics (see Update 47). It seems likely such progress will lead to powerful molecular scale computational devices. It is much less clear that such progress will lead to molecular machine systems that could be used to build better molecular machine systems, leading eventually to a generalized molecular manufacturing capacity.

Different paths to molecular manufacturing can be imagined. Zyvex is progressing along the top-down path, proposed in 1959 by Richard Feynman, of creating machine systems to build smaller machine systems (see nanodot.org, 26 February 2002). Another path takes inspiration from biology’s molecular machine systems (Drexler 1981, Drexler 1999) to envision polymers designed to fold into specific shapes and perform specified functions. Simple devices built from folded polymers could be used to build crude molecular machine systems, which could be used to build better molecular machine systems.

Important goals along the folded polymer path to molecular manufacturing include producing a variety of molecular building blocks with diverse properties, designing molecular recognition systems to enable complex three-dimensional arrangements of such building blocks, providing molecular effectors and motors to position components and substrates, and developing tools to effect molecular changes.

This column will attempt to highlight developments that seem to advance these goals. This restricted focus is due in part to the fact that progress in the broader area of nanotechnology is now far too rapid to pretend to provide a representative snapshot in a quarterly column, and in part to the belief of this writer that molecular machine systems will provide the best route to achieving the wide promise of molecular manufacturing.

A Better DNA Motor

H Yan, X Zhang, Z Shen & NC Seeman “A robust DNA mechanical device controlled by hybridization topology” Nature 415:62-65, 3 January 2002.

On nanodot
For more information on the NYU DNA nano-device, see the post on Nanodot from 2 January 2002

Three years ago Nadrian Seeman and his colleagues at New York University reported a nanomechanical device based upon conformation change in a rigid DNA structure (see Update 36). In that earlier device, a small molecule added to the solution caused a change in the basic geometry (changing from the B to the Z form of DNA) of a section of the DNA molecule, causing the separation between two points on the device to increase by about 2 nm. Although it was an important advance, this demonstration suffered from the fact that the amount of movement could not be easily modulated, and from the fact that movement was caused by a non-specific small molecule so that there was no way to restrict movement to specific devices within an array of similar devices.

In this latest advance from Seeman’s group, the movement is produced in an entirely different fashion. Instead of using a small molecule to influence the gross geometry of the two strands of a single double helical DNA segment, specific DNA strands are added to a four-strand DNA molecule to “set” the number of times that the four strands of two parallel double helical DNA segments cross over each other. In one “topoisomer” (PX) the DNA strands cross over four times; in the other topoisomer (JX₂), only twice. Converting between the two topoisomers rotates one end of the four-strand molecule with respect to the other end by 180 degrees. The sequence of the set strands determines which of the two topoisomers is produced, and any of a large number of different set strands can be designed to target any of a large number of four-strand molecules. Thus, in a large array of devices, specific strands of DNA can be added to target specific devices within the population.

The actual conversion of one topoisomer to the other is accomplished by adding a “fuel” strand, a technique used earlier by Bernard Yurke and colleagues to power their DNA actuator design (see Update 42). Each set strand includes an extension on one end of the strand that does not participate in base pairing with the four-strand DNA molecule. The fuel strand, however, base pairs with the entire set strand, so adding the fuel strand “pulls” the set strand out of the complex, creating an unstructured intermediate. Addition of the appropriate different set strand drives the intermediate into the opposite topoisomer. Addition of the appropriate fuel strand drives that topoisomer into another unstructured intermediate, which in turn can be driven into the original topoisomer by adding the original set strand. This sequence of additions of set and fuel strands produces a four-step motor that rotates 180 degrees from one stable configuration to another stable configuration, and back again.
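The four-step cycle can be pictured as a small state machine. The sketch below is only an illustration of the sequence of additions described above; the state and strand names (`fuel_PX`, `set_JX2`, and so on) are hypothetical labels, not names used by the authors.

```python
# Hypothetical labels for the device states and the strands added to solution.
# Each entry reads: (current state, strand added) -> resulting state.
TRANSITIONS = {
    ("PX", "fuel_PX"): "intermediate",       # fuel strips the PX set strands
    ("intermediate", "set_JX2"): "JX2",      # JX2 set strands fix the new form
    ("JX2", "fuel_JX2"): "intermediate",     # fuel strips the JX2 set strands
    ("intermediate", "set_PX"): "PX",        # original set strands restore PX
}


def add_strand(state, strand):
    """Return the state after adding a strand; a non-matching strand does nothing."""
    return TRANSITIONS.get((state, strand), state)


def run(state, strands):
    """Apply a series of strand additions and record every state visited."""
    history = [state]
    for strand in strands:
        state = add_strand(state, strand)
        history.append(state)
    return history


cycle = run("PX", ["fuel_PX", "set_JX2", "fuel_JX2", "set_PX"])
print(" -> ".join(cycle))
```

Because a strand that does not match the current state leaves the device unchanged, the same table also captures the idea that only the intended strands drive a given device.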

Each of the topoisomers consists of 252 nucleotides, implying a molecular mass of about 75,000 daltons. Although the two topoisomers have similar molecular masses, their different shapes enable separation by a bulk separation technique known as non-denaturing gel electrophoresis. This technique was used to demonstrate that additions of the appropriate strands converted a population of one topoisomer cleanly into a population of the other, and back, exactly as expected. The authors also used this technique to demonstrate the striking purity of each molecular population.
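The mass figure follows from simple arithmetic. As a rough check, assuming a round-number average of about 300 daltons per nucleotide (an assumption made here for illustration; commonly quoted averages are somewhat higher):

```python
# Assumed average mass of one nucleotide in a DNA strand, in daltons.
AVG_NUCLEOTIDE_MASS_DA = 300


def estimate_mass_da(n_nucleotides, avg_mass=AVG_NUCLEOTIDE_MASS_DA):
    """Estimate molecular mass as nucleotide count times average nucleotide mass."""
    return n_nucleotides * avg_mass


mass = estimate_mass_da(252)
print(mass)  # 75600
```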

To prove unambiguously that the altered electrophoretic mobility actually represented the intended change in molecular conformation (a rotation of 180 degrees), the authors used atomic force microscopy (AFM) of individual molecular devices. For this purpose, the authors created a linear array in which three of the PX-JX₂ devices alternate with four half-hexagon markers, each constructed from three DNA triangles; that is, marker-device-marker-device-marker-device-marker. Each half-hexagon marker comprises nearly 250 nucleotides, so that each marker is comparable in mass to one of the mechanical devices. When the three devices are in the PX configuration, the four markers are all aligned along one side of the complex (cis arrangement); when the devices are in the JX₂ configuration, the markers form a zig-zag pattern with alternate markers on opposite sides of the complex (trans arrangement). AFM images show that individual molecular complexes are in either the cis or trans arrangement, as expected according to whether DNA strands are added to convert the devices to the PX or to the JX₂ configuration. Visualization of the arrangement of the markers in the AFM images thus proves that the devices function as expected.

The array of markers and devices appears to be about 200 nm in length, and the farthest points of the half-hexagon markers are rotated through a distance of about 35 nm. Further, the mass of the marker that is rotated is comparable to the mass of the topoisomer that produces the rotation. The authors point out that by using N different device sequences of the type that they have demonstrated, they can produce 2- or 3-dimensional arrays capable of 2^N different structural states. They also note that further work will be required to demonstrate that such devices can transmit forces “capable of performing useful work on other chemical species.” Nevertheless, it appears that they have developed a robust way to program a diversity of molecular shapes, which could conceivably form the basis of crude molecular machine systems.
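The 2^N count arises because each independently addressable device can sit in either of two configurations. A minimal sketch that enumerates the combinations for three devices (the device names are hypothetical):

```python
from itertools import product


def structural_states(device_names):
    """List every assignment of PX or JX2 to each independently addressed device."""
    configurations = ("PX", "JX2")
    return [
        dict(zip(device_names, combo))
        for combo in product(configurations, repeat=len(device_names))
    ]


# Three hypothetical devices, each with its own set and fuel strand sequences.
states = structural_states(["dev_A", "dev_B", "dev_C"])
print(len(states))  # 8
for state in states:
    print(state)
```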

Building Complexes of Proteins

Although the proteins in living cells comprise a plethora of molecular machines that inspired Drexler’s proposal for generalized molecular engineering, progress in designing and building with proteins has lagged far behind the achievements of Seeman and others in building with DNA. One reason for this is that the molecular recognition rules for binding one strand of DNA to another are simple and well understood: of the four nucleotide components of DNA, A binds to T and G binds to C. By contrast, the molecular recognition rules for building with proteins are far more complex and very poorly understood. The 20 amino acids commonly used in biological proteins embody several different chemical characteristics to varying degrees so that they fold to form surfaces that can bind to other surfaces that are complementary in terms of electrical polarization, van der Waals interactions, etc. There is no simple way to predict which protein sequence will bind to which other protein sequence, so protein sequences cannot easily be used to design molecular complexes from scratch, as can DNA sequences.
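The simplicity of the DNA rules is easy to see in code: a few lines suffice to compute the partner of any strand. This sketch assumes the standard antiparallel arrangement of paired strands, so the partner is the reverse complement; the function names are illustrative.

```python
# Watson-Crick pairing rules: A binds T, G binds C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the strand that pairs with seq, read in its own 5' to 3' direction."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def can_hybridize(strand_a, strand_b):
    """True if strand_b is the exact antiparallel complement of strand_a."""
    return strand_b.upper() == reverse_complement(strand_a)


print(reverse_complement("ATGC"))     # GCAT
print(can_hybridize("ATGC", "GCAT"))  # True
```

No comparably short function exists for proteins, which is the point of the contrast drawn above.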

Although a general set of rules for molecular recognition in proteins still seems very far away, it is possible that recent progress in genomics will provide a set of valuable clues for decoding protein recognition rules. It is generally accepted that the myriad biochemical functions of each cell occur as a result of specific complexes formed among the proteins encoded by the genes expressed in that cell. Decoding the genome of an organism provides the sequence of each of the thousands of proteins used by the organism. If we knew which proteins formed complexes with which other proteins, we could look more closely for information on which protein sequences bind to which other protein sequences. The two papers considered here present different approaches to identifying many of the protein complexes produced in yeast cells.

AC Gavin et al. (38 authors in all) “Functional organization of the yeast proteome by systematic analysis of protein complexes” Nature 415:141-147, 10 January 2002.

On nanodot
For more information on this research, including links to extensive online resources, see the Nanodot post from 9 January 2002

These authors exploited the ability to add to the ends of various genes a synthetic “gene cassette” encoding a synthetic protein sequence that could be used to very efficiently purify the protein encoded by the gene to which the tag cassette had been added. Using a semi-automated technology, they were able to add this cassette to 1739 genes, generating a library of 1548 yeast strains, each of which should express one tagged protein. Tagged proteins were actually detected in 1167 strains, and in 589 cases the authors were able to purify the tagged protein. About 78% of these were purified in a complex with other proteins. From these complexes they generated 20,946 samples for mass spectrometry and identified 16,830 protein fragments present in one or more of the 589 complexes. These fragments came from 1440 distinct protein gene products, representing about 25% of the protein encoding sequences in the yeast genome. Thus these complexes represent a significant fraction of the protein-protein interactions present in yeast cells.

After eliminating redundancies and possible artifacts, the authors were left with 232 unique protein complexes: 98 known plus 134 newly identified complexes. The size of the complexes varied from 2 to 83 proteins, with an average of 12.

The authors are interested in their results as clues to the high-level functional organization of gene products in cells, and for their potential application to drug discovery programs. They do not address protein molecular recognition codes, and clearly much additional work would be needed to extract from such complexes information about which protein sequences bind to each other, since the binding domains within each protein are not identified.

AHY Tong et al. (16 authors in all) “A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules” Science 295:321-324, 11 January 2002.

Some information on protein molecular recognition codes is available in terms of a number of peptide recognition modules, each of which is a family of similar peptide sequences that recognize a class of peptide sequences with certain specific characteristics. For example, the SH3 module is a family of protein sequences of about 50 amino acid residues each, found in certain signal transduction and cytoskeleton proteins, that recognizes certain proline-rich peptide sequences. For such information to become useful in designing protein interactions, it is necessary to move from approximate consensus sequences to information on which specific residues, at which positions within a protein sequence, will enable strong binding to which specific partner sequences.

The authors developed a strategy to elucidate the specific interactions of the SH3 domain found in yeast proteins. They first queried the yeast genome DNA sequence to identify yeast proteins with sequences similar to a known yeast SH3 protein sequence, and found 24 SH3 proteins among the predicted yeast proteome (the collection of proteins predicted to be encoded by a genome) containing a total of 28 SH3 domains (a few proteins had more than one SH3 domain). They expressed 25 of these predicted SH3 domains as soluble proteins and used these to screen random collections of peptides for peptides that bind to each SH3 domain.

The technique used for this screening is phage display. A library of bacteriophage (viruses that infect bacterial cells) is prepared in which a portion of one bacteriophage gene is a random sequence coding for nine amino acid residues. Each phage particle in the resulting pool thus displays on its coat one of the roughly 5 × 10¹¹ (20⁹) possible sequences nine amino acids long. After several cycles of selecting the phage that bound to each SH3 domain, growing up more of those phage, and then repeating the selection, it was possible to identify the peptides that bound to each SH3 domain.
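Two pieces of arithmetic sit behind this procedure: the size of the nine-residue sequence space, and the reason that repeated cycles of selection and regrowth are effective. The sketch below is a toy model, not the authors' protocol; the starting binder fraction and the per-round retention probabilities are hypothetical numbers chosen only to show the trend.

```python
# Sequence space: 20 possible amino acids at each of 9 positions.
N_SEQUENCES = 20 ** 9  # 512,000,000,000


def enrich(binder_fraction, keep_binder, keep_nonbinder, rounds):
    """Track the fraction of binding phage through rounds of selection and regrowth.

    keep_binder and keep_nonbinder are assumed probabilities that a binding or
    non-binding phage survives one selection step. Regrowth is assumed to
    amplify all surviving phage equally, so it leaves the fraction unchanged.
    """
    fractions = [binder_fraction]
    for _ in range(rounds):
        kept_binders = binder_fraction * keep_binder
        kept_others = (1.0 - binder_fraction) * keep_nonbinder
        binder_fraction = kept_binders / (kept_binders + kept_others)
        fractions.append(binder_fraction)
    return fractions


history = enrich(binder_fraction=1e-6, keep_binder=0.5,
                 keep_nonbinder=0.001, rounds=4)
print(f"possible nine-residue peptides: {N_SEQUENCES:.2e}")
for round_number, fraction in enumerate(history):
    print(f"after round {round_number}: binder fraction = {fraction:.3g}")
```

Under these assumed numbers, binders that start as one phage in a million dominate the pool within a few rounds, which is why a handful of cycles is enough to pull out the sequences of interest.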

Four SH3 domains did not bind any peptide, suggesting that they do not bind to a simple linear sequence (for example, they might bind nine residues of a longer sequence folded to bring those nine residues into close proximity). From the others, analysis of the sequences that bound to each SH3 domain allowed the identification of “consensus sequences” spotlighting the common features of the sequences that bound to each SH3 domain.
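A consensus sequence can be thought of as a position-by-position summary of the peptides that bound. The following sketch assumes the peptides are already aligned and of equal length, and marks a position with `x` when no single residue reaches a chosen threshold; the peptides and the 75% threshold are made up for illustration and are not data from the paper.

```python
from collections import Counter


def consensus(peptides, min_fraction=0.75):
    """Summarize aligned, equal-length peptides as a consensus string.

    A position keeps its most common residue if that residue appears in at
    least min_fraction of the peptides; otherwise it is written as 'x'.
    """
    summary = []
    for column in zip(*peptides):
        residue, count = Counter(column).most_common(1)[0]
        summary.append(residue if count / len(column) >= min_fraction else "x")
    return "".join(summary)


# Hypothetical peptides selected against a single domain.
selected = ["RALPPLP", "RPLPALP", "KSLPPLP", "RTLPSRP"]
print(consensus(selected))  # RxLPxLP
```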

The consensus sequences were then used to screen the yeast proteome database to identify potential natural ligands (binding partners) for each SH3 domain. Using computer programs to analyze interactions in networks, the authors identified 394 interactions among 206 proteins.
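Screening a proteome with a consensus sequence amounts to pattern matching over protein sequences. The sketch below uses a deliberately simple proline-rich pattern (P, any two residues, P) and a made-up three-protein "proteome"; the consensus sequences derived for each domain would be more specific, and the protein names and sequences here are hypothetical.

```python
import re

# Illustrative pattern only: P, any two residues, P. The lookahead lets
# overlapping occurrences be reported separately.
MOTIF = re.compile(r"(?=(P..P))")

# Hypothetical toy proteome; names and sequences are invented.
PROTEOME = {
    "protA": "MSTAPPLPPRNRG",
    "protB": "MKKLLEDGSWQ",
    "protC": "GGPAAPKL",
}


def find_ligands(proteome, motif):
    """Map each protein containing the motif to the 0-based start positions of its matches."""
    hits = {}
    for name, sequence in proteome.items():
        positions = [match.start() for match in motif.finditer(sequence)]
        if positions:
            hits[name] = positions
    return hits


hits = find_ligands(PROTEOME, MOTIF)
print(hits)  # {'protA': [4, 5], 'protC': [2]}
```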

Not satisfied with one technology to identify protein interaction networks, the authors also used yeast two-hybrid technology to identify protein interactions. This technology uses molecular genetic techniques to split into two pieces a gene whose function in yeast can be easily assayed or selected for. The gene will then function in a cell only if the proteins produced by the two pieces bind to each other within that cell. By cloning one target sequence on one part of the gene, and a collection of many different sequences on the other part of the gene, one can identify binding partners for the target sequence by isolating the few cells in which the gene functions. In this way 233 interactions among 145 proteins were identified. Because phage display and two-hybrid technologies identify binding of sequences in different environments, under different conditions, and subject to different artifacts, it is not expected that an identical array of interactions will be identified. For example, phage display uses short peptides and binding in a test tube, while two-hybrid tests use whole domains of native proteins inside the cell. A total of 59 interactions were found in both networks. The authors did further experiments with some of these interactions to verify that the interactions were biologically significant.
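Comparing the two networks is, in essence, a set intersection over pairs of proteins. The sketch below treats each interaction as an unordered pair, which is a simplifying assumption, and uses invented protein names; only the three counts at the end (394, 233, and 59) come from the results described above.

```python
def normalize(pairs):
    """Store each interaction as an unordered pair so (A, B) equals (B, A)."""
    return {frozenset(pair) for pair in pairs}


# Hypothetical toy networks; protein names are invented.
phage_display = normalize([("A", "B"), ("A", "C"), ("D", "E")])
two_hybrid = normalize([("B", "A"), ("D", "F")])

shared = phage_display & two_hybrid
print(sorted(tuple(sorted(pair)) for pair in shared))  # [('A', 'B')]

# Overlap fractions using the counts reported for the two networks.
N_PHAGE, N_TWO_HYBRID, N_SHARED = 394, 233, 59
print(f"share of phage display network: {N_SHARED / N_PHAGE:.1%}")
print(f"share of two-hybrid network:    {N_SHARED / N_TWO_HYBRID:.1%}")
```

The modest overlap is consistent with the point made above that the two methods probe binding under different conditions and are subject to different artifacts.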

As with the work of Gavin et al., Tong et al. appear interested in answering biological questions rather than deriving molecular recognition information useful for designing molecular machine systems, but they demonstrate the wide array of powerful technologies now available for defining molecular interactions.
