Brief Overview of Current Research

Monte Carlo Methods

Monte Carlo (MC) methods as we understand them date back almost to the beginning of the computer age, around the time ENIAC, one of the first general-purpose electronic computers, was unveiled in 1946. Buffon's needle was perhaps the first attempt to use random experiments to approximate a deterministic quantity (in this case $\pi$). However, Monte Carlo methods only really took off with the advent of the computer and its rapid development. They were first developed and used by physicists, and they have since found use in a variety of disciplines, such as chemistry, biology, engineering, finance, econometrics and statistics. When direct numerical methods are infeasible, as in high-dimensional integration, MC may be a good alternative.

One fundamental step of nearly all MC methods is to generate (pseudo-)random numbers from a distribution that is usually complicated and known only up to a normalizing constant. There are roughly two classes of methods for doing this. The first is to devise a clever, easy-to-sample trial distribution and use properly weighted samples. Examples are rejection sampling, importance sampling and, more generally, the more sophisticated Sequential Monte Carlo (SMC). The second is to generate dependent samples by evolving a stochastic process (e.g., a Markov chain) whose stationary distribution is the desired one. The large class of methods known as Markov Chain Monte Carlo (MCMC) falls into this category. A thorough account of Monte Carlo methods and their applications in science can be found in [1], and a comprehensive treatment of MCMC in [2].
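As a minimal sketch of the first class of methods, the Python code below uses samples drawn from a trial distribution, either accepted or weighted. It targets an unnormalized standard normal $\tilde p(x) = e^{-x^2/2}$ with a wider normal trial distribution $q = N(0, 2^2)$. The target, the trial distribution and all function names are illustrative assumptions, chosen so that the results can be checked by hand. Rejection sampling should accept about $1/\sigma_q = 1/2$ of the proposals. Both the accepted samples and the self-normalized importance-sampling estimate should give $E[X^2] \approx 1$.

```python
import math
import random

SIGMA_Q = 2.0  # std. dev. of the trial distribution q = N(0, SIGMA_Q^2); must be >= 1 here


def p_tilde(x):
    """Unnormalized target: standard normal density without its 1/sqrt(2*pi) factor."""
    return math.exp(-0.5 * x * x)


def q_density(x):
    """Normalized trial density N(0, SIGMA_Q^2)."""
    return math.exp(-0.5 * (x / SIGMA_Q) ** 2) / (SIGMA_Q * math.sqrt(2.0 * math.pi))


def rejection_sample(n, rng):
    """Draw n exact samples from p_tilde; also return the empirical acceptance rate."""
    m = SIGMA_Q * math.sqrt(2.0 * math.pi)  # max of p_tilde / q, attained at x = 0
    samples, proposals = [], 0
    while len(samples) < n:
        x = rng.gauss(0.0, SIGMA_Q)
        proposals += 1
        # Accept x with probability p_tilde(x) / (m * q(x)), which is <= 1 by choice of m.
        if rng.random() < p_tilde(x) / (m * q_density(x)):
            samples.append(x)
    return samples, n / proposals


def importance_estimate(f, n, rng):
    """Self-normalized importance-sampling estimate of E_p[f(X)]."""
    num = den = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, SIGMA_Q)
        w = p_tilde(x) / q_density(x)  # weight; p's unknown normalizer cancels in num/den
        num += w * f(x)
        den += w
    return num / den


if __name__ == "__main__":
    rng = random.Random(0)
    samples, acc = rejection_sample(50_000, rng)
    print("acceptance rate:", acc)  # should be close to 1 / SIGMA_Q = 0.5
    print("E[X^2] from rejection samples:", sum(x * x for x in samples) / len(samples))
    print("E[X^2] by importance sampling:", importance_estimate(lambda x: x * x, 100_000, rng))
```

The importance weights use only $\tilde p$, because its unknown normalizing constant cancels in the ratio. This is what makes the method suitable for distributions known only up to a constant. Rejection sampling instead needs a bound $M$ with $\tilde p \le M q$. Such a bound is trivial here but often hard to find, or very loose, in high dimensions.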

In molecular simulation, the distribution of interest is typically the Boltzmann distribution $\pi(\mathbf{x}) \propto \exp(-U(\mathbf{x})/k_B T)$. Here the configuration $\mathbf{x}$ consists of the coordinates of the individual atoms, $U$ is the potential energy, $k_B$ is Boltzmann's constant and $T$ is the temperature. In this context, MC tries to sample the equilibrium configurations described by $\pi(\mathbf{x})$. Thermodynamic quantities such as the average potential energy, the specific heat and the entropy can then be estimated from the samples.

Another approach, Molecular Dynamics (MD), is limited to particle simulations and is not as widely applicable as MC, but it has proven quite popular in the molecular simulation community. The basic idea of MD is to evolve the system in time by integrating the equations of motion. The trajectory then stays on a constant-energy hypersurface of the Hamiltonian. Thermodynamic quantities of interest can again be calculated as time averages. The ergodic theorem states that, for an ergodic system, time averages converge to ensemble (statistical) averages. Results from an MD simulation therefore often agree with those from an MC simulation.

Several software packages, such as GROMACS and NAMD, are dedicated to MD simulation. They have steadily incorporated new High Performance Computing (HPC) technology into their design to make simulations run faster, and this trend partly explains why MD is popular in the field. By contrast, few MC packages exist, and researchers interested in MC typically write their own programs (as I do). Because Monte Carlo methods are so flexible, researchers have to determine which approach is more effective for the particular problem they are studying.
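The sketch below is a toy illustration of the two routes to thermodynamic averages. Instead of a molecular force field, it uses a single coordinate in a harmonic potential $U(x) = \tfrac{1}{2} k x^2$. The potential, the reduced units with $k = k_B T = 1$, and all function names are assumptions for illustration.

The MC part samples $\pi(x) \propto e^{-U(x)/k_B T}$ with the Metropolis rule. Its estimate of $\langle U \rangle$ can be checked against the equipartition value $k_B T/2$. The MD part integrates the equations of motion with the velocity Verlet scheme, a common choice in MD codes. It reports how well the total energy is conserved, that is, how closely the trajectory stays on its constant-energy hypersurface.

```python
import math
import random

K_SPRING = 1.0  # force constant of a toy 1D harmonic potential (assumed, reduced units)
KT = 1.0        # thermal energy k_B * T in the same reduced units


def potential(x):
    """U(x) = K_SPRING * x^2 / 2."""
    return 0.5 * K_SPRING * x * x


def metropolis_mean_energy(n_steps, step=2.0, seed=0):
    """MC: sample pi(x) ~ exp(-U(x)/kT) by random-walk Metropolis; return the mean of U."""
    rng = random.Random(seed)
    x, u = 0.0, potential(0.0)
    total = 0.0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)  # symmetric proposal
        u_new = potential(x_new)
        # Metropolis rule: accept with probability min(1, exp(-(U_new - U_old) / kT)).
        if u_new <= u or rng.random() < math.exp(-(u_new - u) / KT):
            x, u = x_new, u_new
        total += u  # a rejected move counts the current state again
    return total / n_steps


def verlet_max_energy_error(n_steps, dt=0.01, mass=1.0, x0=1.0, v0=0.0):
    """MD: integrate Newton's equations by velocity Verlet; return max |E(t) - E(0)| / E(0)."""
    def force(pos):
        return -K_SPRING * pos

    x, v = x0, v0
    e0 = 0.5 * mass * v * v + potential(x)
    f = force(x)
    worst = 0.0
    for _ in range(n_steps):
        v += 0.5 * dt * f / mass  # half kick with the old force
        x += dt * v               # drift
        f = force(x)
        v += 0.5 * dt * f / mass  # half kick with the new force
        e = 0.5 * mass * v * v + potential(x)
        worst = max(worst, abs(e - e0) / e0)
    return worst


if __name__ == "__main__":
    # Equipartition gives <U> = kT / 2 for one harmonic degree of freedom.
    print("MC estimate of <U>:", metropolis_mean_energy(200_000))
    print("MD max relative energy error:", verlet_max_energy_error(10_000))
```

A single 1D oscillator integrated at constant energy is not ergodic with respect to the Boltzmann distribution. For that reason the MD part here only checks energy conservation. Comparing MD time averages with MC ensemble averages requires a many-particle system or a thermostat.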

MD simulations provide a comprehensive dynamical picture of the system and hence contain more information than MC simulations. However, they are limited by timescale, because many events of interest occur over longer times than a practical MD trajectory can cover. My current work focuses on a lattice protein model with implicit membrane and water regions [3]. One advantage of MC in this case concerns complex membrane-protein systems in which protein-protein interactions are an essential ingredient. For such systems, atomistic simulation is unlikely to capture protein-protein behavior within a reasonable amount of time.
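To make the lattice idea concrete, here is a minimal sketch of Metropolis MC for a single chain on a cubic lattice with a slab-shaped implicit membrane. It is not the model of [3]. The following are all hypothetical choices for illustration:

- the two-letter hydrophobic/polar (H/P) alphabet
- the slab half-width `HALF_WIDTH` and the energy scale `EPS_MEM`
- the move set: end moves, corner flips and rigid translations along $z$

Residue-residue contact energies are omitted entirely.

```python
import math
import random

# Hypothetical parameters for illustration only; not taken from [3].
HALF_WIDTH = 3  # membrane = lattice planes with |z| <= HALF_WIDTH; the rest is water
EPS_MEM = 1.0   # implicit-membrane energy per bead, in units of kT
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def membrane_energy(seq, coords):
    """H beads gain -EPS_MEM inside the slab; P beads pay +EPS_MEM there."""
    e = 0.0
    for kind, (_, _, z) in zip(seq, coords):
        if abs(z) <= HALF_WIDTH:
            e += -EPS_MEM if kind == "H" else EPS_MEM
    return e


def propose(coords, rng):
    """Return a trial conformation (a new list), or None if the chosen move is impossible."""
    if rng.random() < 0.1:  # rigid translation along z lets the chain enter or leave the slab
        dz = rng.choice((-1, 1))
        return [(x, y, z + dz) for x, y, z in coords]
    i = rng.randrange(len(coords))
    if i == 0 or i == len(coords) - 1:
        # End move: place a terminal bead on a random site next to its only chain neighbor.
        anchor = coords[1] if i == 0 else coords[-2]
        dx, dy, dz = rng.choice(STEPS)
        new = (anchor[0] + dx, anchor[1] + dy, anchor[2] + dz)
    else:
        # Corner flip: move bead i to the opposite corner of the square spanned by i-1, i, i+1.
        a, c, b = coords[i - 1], coords[i], coords[i + 1]
        if sorted(abs(b[k] - a[k]) for k in range(3)) != [0, 1, 1]:
            return None  # beads i-1, i, i+1 are collinear: no corner to flip
        new = tuple(a[k] + b[k] - c[k] for k in range(3))
    if new in set(coords):
        return None  # site already occupied: reject to keep the chain self-avoiding
    trial = list(coords)
    trial[i] = new
    return trial


def simulate(seq, n_steps, kt=1.0, seed=0):
    """Metropolis MC for a chain of len(seq) >= 2 beads, starting straight in the midplane."""
    rng = random.Random(seed)
    coords = [(i, 0, 0) for i in range(len(seq))]
    energy = membrane_energy(seq, coords)
    for _ in range(n_steps):
        trial = propose(coords, rng)
        if trial is None:
            continue  # impossible move: the current state is kept
        d = membrane_energy(seq, trial) - energy
        if d <= 0 or rng.random() < math.exp(-d / kt):
            coords, energy = trial, energy + d
    return coords, energy


if __name__ == "__main__":
    coords, energy = simulate("PPHHHHHHPP", 20_000)
    print("final membrane energy:", energy)
```

Each move type has a symmetric proposal probability, so the plain Metropolis acceptance rule satisfies detailed balance. Local move sets of this kind are, however, known not to be ergodic for longer self-avoiding chains. Practical lattice simulations therefore usually add nonlocal moves such as pivots.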

The Twin-Arginine Translocation (Tat) Pathway

The Tat pathway [4, 5] is an important mechanism of protein transport across membranes in bacteria and plants. It exports certain proteins across the bacterial cytoplasmic membrane and imports proteins into the thylakoids of chloroplasts. The substrate proteins carry a unique signal peptide characterized by a twin-arginine motif, hence the name. The distinctive feature of the Tat mechanism is that proteins are transported in a folded state. It has been suggested that a pore of variable size forms to accommodate the substrates being transported. This pore is thought to consist mainly of TatA, whose copy number changes to match the size of the substrate protein [6].

Protein Aggregation

For the translocation pore to form, individual TatA molecules presumably have to aggregate in some way. Of particular interest is what drives this aggregation. Is it a property of TatA itself, or must it be mediated by other components of the Tat apparatus?

Protein aggregation is in fact a large and active area of research. Aggregation is often linked to pathology and is believed to be responsible for many neurodegenerative diseases, such as Alzheimer's and Parkinson's diseases [7]. The biotechnology and pharmaceutical industries stand to benefit from the study of protein aggregation, since computer simulations can reduce the experimental burden [8].

[1] J. S. Liu. Monte Carlo Strategies in Scientific Computing. Springer, 2001.

[2] C. P. Robert and G. Casella. Monte Carlo Statistical Methods, 2nd ed. Springer, 2004.

[3] Y. Chen, et al. Construction of an implicit membrane environment for the lattice Monte Carlo simulation of transmembrane protein. Biophysical Chemistry, 2010.

[4] P. A. Lee, et al. The bacterial twin-arginine translocation pathway. Annu. Rev. Microbiol., 2006.

[5] W. Wickner and R. Schekman. Protein translocation across biological membranes. Science, 2005.

[6] U. Gohlke, et al. The TatA component of the twin arginine protein transport system forms channel complexes of variable diameter. Proc. Natl. Acad. Sci. USA, 2005.

[7] E. H. Koo, P. T. Lansbury and J. W. Kelly. Amyloid diseases: abnormal protein aggregation in neurodegeneration. Proc. Natl. Acad. Sci. USA, 1999, 96, 9989–9990.

[8] D. Bratko, T. Cellmer, J. M. Prausnitz and H. W. Blanch. Molecular simulation of protein aggregation. Biotechnol. Bioeng., 2007, 96(1), 1–8.