Roland L. Dunbrack, Jr., PhD


Director, Organic Synthesis Facility

Director, Molecular Modeling Facility

Adjunct Professor, University of Pennsylvania School of Medicine

Adjunct Associate Professor, Drexel University College of Medicine

Head and Neck Cancer TRDG Member

Kidney, Bladder, and Prostate Cancer TRDG Member

Research Program

Image gallery (captions only):

  • Computational design of an antibody against HIV gp120
  • New clustering and nomenclature for beta turns in proteins
  • Cluster of similar interfaces among protein kinases, including BRAF, CSK, ITK, and RIPK2
  • Dunbrack group
  • Clusters 1, 2, and 3 of antibody CDR L3 loops (length 8) and the outliers (right)
  • Structures of kinases in the process of autophosphorylation


Education, Training & Credentials

Educational Background

  • PhD, Biophysics, Harvard University, Cambridge, MA, 1993
  • AB, Chemistry, Harvard College, Cambridge, MA, 1985

Research Profile

Research Program

Research Interests

Statistical Analysis of Protein Structures

We have several ongoing projects in structural bioinformatics, whose purpose is twofold: understanding the determinants of the structures and dynamics of proteins and protein complexes; and providing information that is useful for improving structure prediction.

  1. Backbone-Dependent Rotamer Library
    We are continuing to develop our backbone-dependent rotamer library, which expresses the frequency of side-chain rotamers as a function of the backbone dihedral angles phi and psi. The most recent publicly available version of the rotamer library was developed using Bayesian statistical analyses of a set of 850 protein chains. Our rotamer libraries are used in most side-chain conformation prediction programs as well as in most protein design methods.

    More recently, we have been developing a new backbone-dependent rotamer library in which the rotamer probabilities and the means and variances of the dihedral angles vary smoothly with phi and psi. This library is built with nonparametric statistics, specifically kernel density estimates and kernel regressions. It is designed primarily for programs that require smoothly varying functions for local minimization, such as Rosetta. It will be available in the near future (Shapovalov and Dunbrack, in preparation).
  2. Neighbor-Dependent Ramachandran Maps
    Statistics of protein backbone conformations have been studied for over 40 years. Although many studies have been published, only a handful of distributions are publicly available for use in structure validation and structure prediction methods. The available distributions differ in several important ways that determine their usefulness for various purposes: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of the input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities, ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest-neighbor effects on the distribution of conformations in different regions of the Ramachandran map.

    We have developed Ramachandran probability distributions for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Maps for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor and 420 right-neighbor dependent maps. The neighbor-independent and neighbor-dependent maps have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between maps for a particular residue type and different neighbor residue types. The resulting maps were tested in a loop modeling benchmark with the program Rosetta and were shown to improve protein loop conformation prediction significantly. The distributions will be made available on publication (Ting et al., submitted).
  3. Protein Complexes
    Much of the software and results from molecular modeling have focused on the prediction of the structure of a single protein molecule. Proteins act on other molecules, including other proteins, DNA, and ligands or substrates, and our recent focus has been on developing methods for predicting these kinds of structures. The rapidly growing number of Protein Data Bank (PDB) entries with increasing complexity and diversity provides rich information for structure prediction and modeling of protein interactions.

    Many proteins function as homo-oligomers and are regulated via their oligomeric state. For some proteins, the stoichiometry of homo-oligomeric states under various conditions has been studied using gel filtration or analytical ultracentrifugation experiments. The interfaces involved in these assemblies may be identified using cross-linking and mass spectrometry, solution-state NMR, and other experiments. However, for most proteins, the actual interfaces that are involved in oligomerization are inferred from X-ray crystallographic structures using assumptions about interface surface areas and physical properties.

    Examination of interfaces across different Protein Data Bank (PDB) entries in a protein family reveals several important features. First, similarities in space group, asymmetric unit size, and cell dimensions and angles (within 1%) do not guarantee that two crystals are actually the same crystal form, containing similar relative orientations and interactions within the crystal. Conversely, two crystals in different space groups may be quite similar in terms of all the interfaces within each crystal. Second, NMR structures and an existing benchmark of PDB crystallographic entries consisting of 126 dimers as well as larger structures and 132 monomers were used to determine whether the existence or lack of common interfaces across multiple crystal forms can be used to predict whether a protein is an oligomer or not. Monomeric proteins tend to have common interfaces across only a minority of crystal forms, whereas higher-order structures exhibit common interfaces across a majority of available crystal forms. The data can be used to estimate the probability that an interface is biological if two or more crystal forms are available. Finally, the Protein Interfaces, Surfaces, and Assemblies (PISA) database available from the European Bioinformatics Institute is more consistent in identifying interfaces observed in many crystal forms than either the PDB or the European Bioinformatics Institute's Protein Quaternary Server (PQS). The PDB, in particular, is missing highly likely biological interfaces in its biological unit files for about 10% of PDB entries.
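
The idea of checking whether an interface recurs across crystal forms can be illustrated with a small sketch. It is not the group's method or software: it assumes that interfaces have already been grouped into clusters by some upstream comparison, and the crystal-form labels, the interface ids, and the "more than half" cutoff are all hypothetical choices made here for illustration. The source describes only a minority-versus-majority tendency, not a specific threshold or probability model.

```python
from collections import Counter


def common_interface_fractions(crystal_forms):
    """Return {interface_id: fraction of crystal forms that contain it}.

    crystal_forms maps a crystal-form label to the collection of interface
    cluster ids observed in that form (all labels here are hypothetical).
    """
    if not crystal_forms:
        return {}
    counts = Counter()
    for interfaces in crystal_forms.values():
        # Count each interface at most once per crystal form.
        counts.update(set(interfaces))
    n_forms = len(crystal_forms)
    return {iface: count / n_forms for iface, count in counts.items()}


def has_majority_interface(crystal_forms, cutoff=0.5):
    """True if any interface occurs in more than `cutoff` of the forms.

    The 0.5 cutoff is an assumption used only for this illustration.
    """
    fractions = common_interface_fractions(crystal_forms)
    return any(fraction > cutoff for fraction in fractions.values())


# Toy data: interface "A" recurs in every form of the first protein,
# but in only half of the forms of the second protein.
dimer_like = {"CF1": {"A"}, "CF2": {"A", "B"}, "CF3": {"A", "C"}}
monomer_like = {"CF1": {"A"}, "CF2": {"B"}, "CF3": {"C"}, "CF4": {"A"}}

print(has_majority_interface(dimer_like))
print(has_majority_interface(monomer_like))
```

In this toy input, the first protein has one interface shared by all three crystal forms, while no interface of the second protein appears in more than half of its forms. A real analysis would also need a rule for deciding when two crystals are the same crystal form, which this sketch does not address.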

Programs for Protein Structure Prediction

We have developed several programs for protein structure prediction and made them publicly available via our website.

  1. SCWRL4
    Determination of side-chain conformations is an important step in protein structure prediction and protein design. Many such methods have been presented, although only a small number are in widespread use. SCWRL is one such method, and the SCWRL3 program (2003) has remained popular because of its speed, accuracy, and ease of use for the purpose of homology modeling. However, higher accuracy at comparable speed is desirable. This has been achieved in a new program, SCWRL4, through: (1) a new backbone-dependent rotamer library based on kernel density estimates (described above); (2) averaging over samples of conformations about the positions in the rotamer library; (3) a fast anisotropic hydrogen bonding function; (4) a short-range, soft van der Waals atom–atom interaction potential; (5) fast collision detection using k-discrete oriented polytopes; (6) a tree decomposition algorithm to solve the combinatorial problem; and (7) optimization of all parameters by determining the interaction graph within the crystal environment using symmetry operators of the crystallographic space group. Accuracies as a function of electron density of the side chains demonstrate that side chains with higher electron density are easier to predict than those with low electron density and presumed conformational disorder. For a testing set of 379 proteins, 86% of chi1 angles and 75% of chi1+chi2 angles are predicted correctly within 40° of the X-ray positions. Among side chains with higher electron density (25–100th percentile), these numbers rise to 89% and 80%. The new program maintains its simple command-line interface, designed for homology modeling, and is now available as a dynamic-linked library for incorporation into other software programs.

    Ongoing work in SCWRL4 is focused on two efforts. First, we have developed a method for predicting multiple side-chain conformations for each residue, and compared these predictions with electron density calculations (Shapovalov and Dunbrack, Proteins 2007). Second, we are incorporating protein design algorithms into SCWRL4 to achieve fast and hopefully useful design capability.
  2. MolIDE
    MolIDE 1.6 (Molecular Interactive Development Environment) provides a graphical interface for basic comparative (homology) modeling using SCWRL and other programs. MolIDE takes an input target sequence and uses PSI-BLAST to identify and align templates for comparative modeling of the target. The sequence alignment to any template can be modified manually in a graphical window that shows the target–template alignment together with a visualization of the alignment on the template structure. MolIDE builds the model of the target structure from the template backbone, side-chain conformations predicted with SCWRL, and a loop-modeling program applied to insertion–deletion regions over user-selected sequence segments.

    We are currently working on a new version of MolIDE (MolIDE2) that will enable modeling of biologically relevant protein complexes, including homo-oligomers, heterodimers, and larger structures. The input can be more than one protein sequence, and the first step is to determine the protein domain content of the input sequences using Pfam. The program then searches the PDB for any structure with one or more of the Pfam families present in the query sequences, presenting the results with the largest overlap to the queries first. For instance, if the query consists of two proteins, protein complexes with homologues to both proteins will be presented first. With a few clicks, it is then possible to find all the biologically relevant complexes that can be built from the available templates, and to perform homology modeling from these complexes.
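
The template search described for MolIDE2 (find structures that share Pfam families with the queries, then list those with the largest overlap first) can be sketched as a simple ranking. This is an illustration under stated assumptions, not the program's actual code. The function name `rank_templates`, the entry labels, and the family ids are hypothetical, and "overlap" is taken here to mean the number of query proteins that share at least one family with a structure.

```python
def rank_templates(query_families, templates):
    """Rank candidate template structures by how many queries they cover.

    query_families: list of sets of family ids, one set per query sequence.
    templates: dict mapping an entry label to the set of family ids found
               in that structure (labels and ids are hypothetical).
    Returns a list of (entry_label, queries_covered), best first; entries
    that share no family with any query are omitted.
    """
    ranked = []
    for entry, families in templates.items():
        # A query counts as covered if it shares at least one family.
        covered = sum(1 for query in query_families if query & families)
        if covered > 0:
            ranked.append((entry, covered))
    # Largest overlap first; ties are broken alphabetically by entry label.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Two query proteins, each assigned one (hypothetical) family.
queries = [{"FAM_A"}, {"FAM_B"}]
templates = {
    "entry_a": {"FAM_A"},
    "entry_b": {"FAM_A", "FAM_B"},
    "entry_c": {"FAM_C"},
}

for entry, covered in rank_templates(queries, templates):
    print(entry, covered)
```

With this toy input, the entry containing homologues of both query proteins is listed before the entry covering only one, matching the two-protein example in the text. The entry with no shared family is dropped.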

Lab Overview


Mark Andrake

Research Assistant Professor; Manager, Molecular Modeling Facility

Room: R428

Qifang Xu, PhD

Research Assistant Professor

Room: R462

Maxim Shapovalov, MS

Senior Programmer Analyst

Room: R462

Vivek Modi, PhD

Postdoctoral Associate

Room: R462

Peter Huwe

Postdoctoral Research Associate

Room: R462

Simon Kelow

Graduate Student, University of Pennsylvania School of Medicine

Room: R462

Cynthia Myers

Manager, Organic Synthesis Facility

Room: R458

Selected Publications

Adolf-Bryfogle J, Xu Q, North B, Lehmann A, Dunbrack RL Jr. PyIgClassify: A database of antibody CDR structural classifications. Nucleic Acids Research (database issue) 43:D432-438, 2015. DOI: 10.1093/nar/gku1106; PMCID: PMC4383924.

Berkholz DS, Driggers CM, Shapovalov MV, Dunbrack RL Jr., Karplus PA. Nonplanar peptide bonds in proteins are common and conserved but not biased toward active sites. Proc Natl Acad Sci USA 109:449-453, 2012. PMCID: PMC3258596.

Huwe PJ, Xu Q, Shapovalov MV, Modi V, Andrake MD, Dunbrack RL Jr. Biological function derived from predicted structures in CASP11. Proteins 84 Suppl. 1:370-391, 2016. DOI: 10.1002/prot.24997; PMCID: PMC4963311.

Krivov GG, Shapovalov MV, Dunbrack RL Jr. Improved prediction of protein side-chain conformations with SCWRL4. Proteins 77:778-795, 2009. PMCID: PMC2885146.

Lehmann AK, Wixted JHF, Shapovalov MV, Roder H, Dunbrack RL Jr.,* Robinson MK.* Stability engineering of anti-EGFR scFv antibodies by rational design of a λ-to-κ swap of the VL framework using a structure-guided approach. mAbs 7:1058-1071, 2015. DOI: 10.1080/19420862.2015.1088618; PMCID: PMC4966335.

North B, Lehmann A, Dunbrack RL Jr. A new clustering of antibody CDR loop conformations. J Molec Biol 406:228-256, 2011. PMCID: PMC3065967.

Shapovalov MV, Dunbrack RL Jr. A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure (Cell Press) 19:844-858, 2011. PMCID: PMC3118414.

Shapovalov MV, Wang Q, Xu Q, Andrake MD, Dunbrack RL Jr. BioAssemblyModeler (BAM): User-friendly homology modeling of protein homo- and heterooligomers. PLOS ONE 9:e98309, 2014. PMCID: PMC4055448.

Ting D, Wang G, Shapovalov MV, Mitra R, Jordan MI, Dunbrack RL Jr. Neighbor-dependent Ramachandran probability distributions of amino acids developed from a hierarchical Dirichlet process model. PLOS Comp Biol 6:e1000763, 2010. PMCID: PMC2861699.

Wei Q, Dunbrack RL Jr. The role of balanced training and testing data sets for binary classifiers in bioinformatics. PLOS ONE 8:e67863, 2013. PMCID: PMC3706434.

Wei Q, Xu Q, Dunbrack RL Jr. Prediction of phenotypes of missense mutations in human proteins from the structures of biological assemblies. Proteins 81:199-213, 2013. PMCID: PMC3552143.

Weitzner BD, Dunbrack RL Jr.,* Gray JJ*. The origin of CDR H3 structural diversity. Structure 23:302-311, 2015. *Corresponding authors. PMCID: PMC4318709.

Weitzner BD, Jeliazkov JR, Lyskov S, Marze N, Kuroda D, Frick R, Adolf-Bryfogle J, Biswas N, Dunbrack RL Jr, Gray JJ. Modeling and docking of antibodies with Rosetta. Nature Protocols 12:401-416, 2017. PMCID: in process.

Xu Q, Dunbrack RL Jr. Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB. Bioinformatics 28:2763-2772, 2012. PMCID: PMC3476341.

Xu Q, Dunbrack RL Jr. ProtCid: the Protein Common Interface Database. Nucleic Acids Res 39:D761-770, 2011. PMCID: PMC3013667.

Xu Q, Malecka KL, Fink L, Jordan JJ, Duffy E, Kolander S, Peterson J, Dunbrack RL Jr. Three-dimensional structures of autophosphorylation complexes in crystals of protein kinases. Science Signaling 8:rs13, 2015. DOI: 10.1126/scisignal.aaa6711; PMCID: PMC4766099.

This Fox Chase professor participates in the Undergraduate Summer Research Fellowship