Research Interests

Our current areas of interest include:

Parallel Computing

Our work on parallel computing focuses on scaling life sciences applications to emerging hardware architectures. We work with applications such as AMBER, AutoDock, BLAST, ClustalW, DOCK, Gaussian, HMMER, LAMMPS, and NAMD, among others.

Parallel processing has long been recognized as a potentially powerful tool for faster computing. As single processors approach physical limitations, it becomes more difficult to improve performance on a single processor alone. Massively parallel machines such as the Cray XK6 address this with a hybrid approach that combines CPUs with accelerators.

Cray XK6

Some of our early work involved enabling the first parallel version of the Gaussian 94 series of electronic structure programs on a distributed-memory massively parallel supercomputer [C. P. Sosa, J. Ochterski, J. Carpenter, and M. J. Frisch, J. Comp. Chem. 19, 1053-1063(1998)]. We analyzed the scalability of methods such as Hartree-Fock and density functional theory (DFT), including first and second derivatives. In addition, we explored scalability for CIS, MP2, and MCSCF calculations. We are also interested in hybrid models. We reported the first OpenMP implementation of Gaussian 98 [C. P. Sosa, G. Scalmani, R. Gomperts, and M. J. Frisch, Parallel Computing 26, 843-856(2000)]. OpenMP is a standard for parallel programming on shared-memory computers.

More recently, we have worked on bioinformatics applications such as HMMER and mpiBLAST-PIO. Bioinformatics databases used for sequence comparison and sequence alignment are growing exponentially, which has popularized programs that carry out database searches. Current implementations of sequence alignment methods based on hidden Markov models (HMMs) have proven to be computationally intensive and, hence, well suited to architectures with multiple processors. In our work, we started by porting the parallel virtual machine (PVM) versions of hmmsearch and hmmpfam, two programs available as part of the HMMER suite. We then introduced techniques such as alternate sequence file indexing, multiple-master configuration, dynamic data collection and, finally, load balancing via the indexed sequence files. This set of optimizations constitutes our modified version for massively parallel systems. Our results show parallel performance improvements of more than one order of magnitude (16 times) for hmmsearch and hmmpfam [K. Jiang, O. Thorsen, A. Peters, B. Smith, and C. P. Sosa, IEEE Transactions on Parallel and Distributed Systems, 19, 15-23(2008)].
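To make the idea of load balancing via an indexed sequence file more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the implementation described in the cited paper: the function names `index_fasta` and `balance` are hypothetical, the index is assumed to hold one byte offset and residue count per FASTA record, and the greedy longest-first assignment is a generic heuristic chosen for brevity.

```python
import heapq
import io


def index_fasta(handle):
    """Return a list of (byte_offset, residue_count) pairs, one per record.

    `handle` must be a binary file-like object positioned at its start.
    """
    index = []
    offset = 0      # byte position of the line currently being read
    start = None    # byte position of the current record's header line
    length = 0      # residues counted so far in the current record
    for line in handle:
        if line.startswith(b">"):
            if start is not None:
                index.append((start, length))
            start, length = offset, 0
        elif start is not None:
            length += len(line.strip())
        offset += len(line)
    if start is not None:
        index.append((start, length))
    return index


def balance(index, n_workers):
    """Assign records to workers, longest record first (greedy heuristic).

    Returns one list of byte offsets per worker, so that each worker can
    seek directly to its records instead of scanning the whole file.
    """
    heap = [(0, w) for w in range(n_workers)]  # (assigned residues, worker id)
    heapq.heapify(heap)
    plan = [[] for _ in range(n_workers)]
    for start, length in sorted(index, key=lambda rec: rec[1], reverse=True):
        load, worker = heapq.heappop(heap)     # least-loaded worker so far
        plan[worker].append(start)
        heapq.heappush(heap, (load + length, worker))
    return plan


def demo():
    # Tiny in-memory FASTA file with made-up sequences, for illustration only.
    data = b">a\nMKV\nLLA\n>b\nMK\n>c\nMKVLLAGG\n>d\nM\n"
    index = index_fasta(io.BytesIO(data))
    return index, balance(index, 2)
```

Because the index stores byte offsets, a worker can open the sequence file, seek to each assigned offset, and read only its own records. Using residue count as the cost estimate is an assumption; a real search may need a cost model that also accounts for the profile HMM being scored.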

Scalability plot

Scaling chart for queries run against the nr database. From the top: the thick solid line corresponds to ideal scaling, the thin solid line to the large query, the dashed line to the medium query, and the dotted line to the small query.
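A scaling chart of this kind is typically built by comparing measured speedup with the ideal line. The Python sketch below shows one common way to compute both, assuming speedup is measured relative to the smallest run. The function name `scaling_table` is hypothetical and the timings are invented purely to keep the arithmetic easy to follow; they are not results from the work described here.

```python
def scaling_table(timings):
    """Compute speedup and parallel efficiency relative to the smallest run.

    `timings` maps processor count -> wall-clock seconds.
    Returns a list of (processors, speedup, ideal_speedup, efficiency).
    """
    base_p = min(timings)        # smallest processor count is the baseline
    base_t = timings[base_p]
    rows = []
    for p in sorted(timings):
        speedup = base_t / timings[p]
        ideal = p / base_p       # ideal scaling: time halves as processors double
        rows.append((p, speedup, ideal, speedup / ideal))
    return rows


# Hypothetical timings, not measurements.
example = {64: 1000.0, 128: 500.0, 256: 312.5}
for p, s, ideal, eff in scaling_table(example):
    print(f"{p:5d} procs  speedup {s:5.2f}  ideal {ideal:5.2f}  efficiency {eff:4.0%}")
```

An efficiency that falls as the processor count grows, as in the last hypothetical row, is what appears on such a chart as a measured curve bending away from the ideal line.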

Also, in collaboration with H. Lin (North Carolina State University), Dr. P. Balaji (Argonne National Laboratory), Prof. X. Ma (North Carolina State University), and Prof. W. Feng (Virginia Tech), we have extensively optimized mpiBLAST-PIO [O. Thorsen, K. Jiang, A. Peters, B. Smith, H. Lin, W. Feng, C. P. Sosa, Proceedings of the 4th International Conference on Computing Frontiers, 59-68(2007) and H. Lin, P. Balaji, R. Poole, C. P. Sosa, X. Ma and W. Feng, IEEE/ACM International Conference for High-Performance Computing, Networking, Storage and Analysis (SC), 2008].

Molecular Simulations

In collaboration with institutions such as the Hormel Institute and the Mayo Clinic, we are carrying out molecular simulations. Understanding cancer at the cellular and molecular level is important in structure-based drug discovery. We are working together, combining state-of-the-art supercomputer modeling with molecular and cellular biology, to identify novel high-affinity chemical scaffold inhibitors and further develop them into nanomolar-affinity small-molecule inhibitors. Similarly, we are collaborating with the Mayo Clinic to identify viral replication inhibitors. This work has three parts: applying current in silico docking methodologies to the discovery of human immunodeficiency virus (HIV) and Ebola virus replication inhibitors; evaluating and validating in silico results through laboratory-based in vitro high-throughput screening; and improving in silico docking through theoretical research that refines docking algorithms.

We primarily use the DOCK6 (developed at UCSF) and AutoDock (developed at The Scripps Research Institute) software packages to carry out molecular docking and in silico screening. Our work in collaboration with the DOCK6 developers allowed us to implement a massively parallel version of DOCK6 [A. Peters, M. Lundberg, T. Lang, C. P. Sosa, Redpaper 4410, Poughkeepsie, NY, April 16, 2008].

parallel docking


We are interested in identifying secreted proteins in silico. Proteins that are processed through the secretory pathway and included in the secretome are a subset of the proteome. Unfortunately, experimental determination of localization is available for only ~30% of the human proteome, and researchers often rely on localization-prediction programs to annotate gene products of interest. Deciding which in silico tool to use is difficult because more programs are created, published, and made readily available each year. These programs use different computational methods, predict localization to different locations and, often, are neither independently benchmarked nor easily compared. Thus, it is important to understand the capabilities and limitations of these in silico tools. We are interested in developing in silico techniques for classifying secreted proteins [E. W. Klee and C. P. Sosa, Invited Review Article, Drug Discovery Today, 12, 234-239(2007)].

Last modified: April 10, 2012

