
Protein Flexibility¶
The protein flexibility workflow is designed for fast simulations of protein conformational flexibility using the CABS coarse-grained model. It samples the conformational space around an input structure and generates representative near-native models (10 by default) that can be used to analyze local and global protein flexibility.
CABS-flex is a well-established approach for modeling protein flexibility and structure–dynamics–function relationships. The method has been validated against all-atom molecular dynamics simulations in CABS-flex MD validation (JCTC 2013), and against NMR ensembles in CABS-flex NMR validation (Bioinformatics 2014). It was later developed as a standalone and web-server tool, with applications summarized in CABS-flex applications review (Protein Science 2024), and the current web-server implementation described in CABS-flex 3.0 web server (Nucleic Acids Research 2025). Recent updates include pLDDT-guided restraints and support for rigid-like restraint schemes for stable, well-folded globular proteins, introduced in pLDDT-guided flexibility modeling (CSBJ 2024).
In CABS-flex standalone 3, the protein flexibility workflow provides command-line control over restraint-generation modes, simulation length, temperature, clustering, analysis, and all-atom reconstruction of representative models.
For more information about the underlying model, see CABS Model. For distance restraints and manual restraint definitions, see Restraints. For choosing predefined restraint-generation modes, see Flexibility Modes.
Input and basic run¶
To run a standard protein flexibility simulation, you only need an input protein structure in PDB format.
Minimal command¶
CABSflex -i <pdb_code_or_file>
Example:
CABSflex -i 2gb1
This command will:
- Preparation: Download the structure (if a 4-letter code is provided) and assign secondary structure using DSSP.
- Restraints: Set up flexible restraints, which stabilize helices and sheets while allowing loops to move.
- Simulation: Perform Monte Carlo sampling (default: 1 replica, 20 annealing cycles).
- Analysis: Cluster the results and reconstruct the top 10 representative models to all-atom representation.
Key options¶
Flexibility Control (-g)¶
The --protein-restraints (or -g) option controls the automatic generation of distance restraints.
- flexible (default): Restrains residues in secondary structure elements; loops and coils are free to fluctuate.
- rigid: Restrains all residues within a certain distance, leading to smaller fluctuations.
- plddt: Adjusts restraint strength based on AlphaFold pLDDT scores (requires pLDDT input).
- unleashed: Disables all automatic restraints, allowing for very broad sampling.
See Flexibility Modes for a detailed comparison.
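As a minimal sketch of scripting the mode choice, the Python helper below assembles a CABSflex argument list and rejects mode names that are not in the list above. The helper name `build_command` is hypothetical and not part of CABS-flex; only the `-i` and `-g` options and the four mode names come from this page.

```python
import shlex

# Mode names as listed above for --protein-restraints / -g.
RESTRAINT_MODES = ("flexible", "rigid", "plddt", "unleashed")


def build_command(structure, mode="flexible"):
    """Return a CABSflex argument list for one input and one restraint mode.

    Hypothetical helper: it only assembles arguments and does not run anything.
    """
    if mode not in RESTRAINT_MODES:
        raise ValueError(f"unknown restraint mode: {mode!r}")
    return ["CABSflex", "-i", structure, "-g", mode]


command = build_command("2gb1", mode="rigid")
print(shlex.join(command))  # CABSflex -i 2gb1 -g rigid
```

The resulting list can be passed to `subprocess.run` on a machine where CABSflex is installed.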
Simulation Length (-a, -y, -s)¶
The simulation depth and length can be adjusted to ensure convergence.
- Default Configuration: 20 cycles (-a), 50 snapshots per cycle (-y), and 50 MC steps between snapshots (-s).
- Snapshots vs Length: Adjust -y for more trajectory resolution (more snapshots) or -s for longer sampling between snapshots.
For a full explanation of how cycles and steps are calculated to determine simulation length, see the Sampling and Temperature guide.
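To get a feel for how the three options combine, the sketch below multiplies them out. It assumes that the number of saved snapshots per replica is the product of -a and -y, and that -s Monte Carlo steps separate consecutive snapshots. The function name is illustrative, and the Sampling and Temperature guide remains the authoritative description.

```python
def sampling_summary(cycles=20, snapshots_per_cycle=50, steps_between_snapshots=50):
    """Return (snapshots, mc_steps) per replica under the assumptions stated above."""
    snapshots = cycles * snapshots_per_cycle          # -a times -y
    mc_steps = snapshots * steps_between_snapshots    # each snapshot follows -s MC steps
    return snapshots, mc_steps


snapshots, mc_steps = sampling_summary()
print(snapshots, mc_steps)  # 1000 50000 with the default settings
```

Under these assumptions, doubling -y doubles both the number of snapshots and the total sampling, while doubling -s doubles the sampling without adding snapshots.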
Advanced options¶
Custom Restraints & Flexibility¶
- Manual Restraints: Add specific distance constraints between residues using --ca-rest-add (CA-CA) or --sc-rest-add (SC-SC). You can also provide a file with multiple restraints. Example: --ca-rest-add 5:A 15:A 8.7 1.0
- Local Flexibility (-f): Manually override the flexibility of specific regions (e.g., loops) using a file. Example: CABSflex -i 1kl3 -f flexibility.inp
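When restraints are generated by a script, it can help to format them in one place. The sketch below mirrors the --ca-rest-add example above and reads its four values as residue, residue, distance, and weight. That reading of the two numbers is an assumption, and the helper `ca_restraint_args` is hypothetical; see the Restraints page for the exact definition.

```python
def ca_restraint_args(res_i, res_j, distance, weight=1.0):
    """Format one CA-CA restraint as command-line tokens.

    Assumption: the last two values are the target distance and the weight.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return ["--ca-rest-add", res_i, res_j, str(float(distance)), str(float(weight))]


args = ["CABSflex", "-i", "2gb1"] + ca_restraint_args("5:A", "15:A", 8.7, 1.0)
print(" ".join(args))  # CABSflex -i 2gb1 --ca-rest-add 5:A 15:A 8.7 1.0
```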
Temperature (-t)¶
Controls the thermal noise (reduced dimensionless temperature) of the simulation.
- Default: 1.4 1.4 (constant temperature).
- Tuning: Increase the temperature (e.g. 2.0 2.0) for intrinsically disordered proteins (IDPs) or highly mobile loops to better represent their conformational landscape.
Example: CABSflex -i 1z1m --temperature 2.0 2.0
For a detailed guide on the physical interpretation of CABS temperature and simulated annealing, see the Sampling and Temperature guide.
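The two values passed to -t can be read as the temperature at the first and last annealing cycle. The sketch below assumes a linear change between them across the cycles; this linear schedule is an assumption made for illustration, so check it against the Sampling and Temperature guide before relying on it.

```python
def temperature_schedule(t_start, t_end, cycles=20):
    """Return one temperature per annealing cycle, assuming linear interpolation."""
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    if cycles == 1:
        return [t_start]
    step = (t_end - t_start) / (cycles - 1)
    return [t_start + i * step for i in range(cycles)]


# Equal start and end values give a constant temperature, as with the default.
print(set(temperature_schedule(1.4, 1.4)))     # {1.4}
print(temperature_schedule(2.0, 1.0, cycles=5))  # [2.0, 1.75, 1.5, 1.25, 1.0]
```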
Output and analysis¶
All results are saved in the working directory (default: CABSflex_output/):
Structural Ensembles (output_pdbs/)¶
- model_1.pdb to model_10.pdb: The 10 representative all-atom models (centroids of the largest clusters).
- replica_1.pdb: The full simulation trajectory (now contains CA and SC atoms).
- start_*.pdb: Various versions of the starting structure with information mapped to the B-factor column (see below).
Understanding PDB Columns¶
CABS-flex utilizes standard PDB columns to store simulation-specific information:
- Occupancy (occ): Stores numeric secondary structure codes (1: Coil, 2: Helix, 3: Turn, 4: Sheet).
- B-factor (bfac): Used in start_*.pdb files to store:
  - start_bfac.pdb: Original B-factors from the input structure.
  - start_rmsf.pdb: RMSF values calculated from the simulation trajectory.
  - start_plddt.pdb: pLDDT scores (if provided as input).
  - start_category.pdb: Flexibility categories (0: rigid, 4: most flexible).
  - start_secstr.pdb: Secondary structure numeric codes (same as occ).
Example of a PDB file generated by CABS-flex (CA + SC atoms):
MODEL 1
ATOM 1 CA ILE H 16 13.453 28.620 28.758 1.00 14.14
ATOM 2 SC ILE H 16 14.949 27.343 29.585 1.00 14.14
ATOM 3 CA VAL H 17 12.841 28.591 25.098 4.00 14.94
ATOM 4 SC VAL H 17 11.783 29.360 24.236 4.00 14.94
ATOM 5 CA GLY H 18 14.672 26.138 23.286 1.00 14.62
ATOM 6 SC GLY H 18 14.672 26.138 23.286 1.00 14.62
...
ENDMDL
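A minimal sketch of reading these columns in Python is shown below. It assumes the files on disk follow the standard fixed-column PDB layout (occupancy in columns 55-60, B-factor in columns 61-66), whereas the listing above is shown with collapsed whitespace. The function name and the sample line are illustrative only.

```python
# Secondary structure codes stored in the occupancy column, as listed above.
SS_CODES = {1: "coil", 2: "helix", 3: "turn", 4: "sheet"}


def parse_atom_line(line):
    """Extract residue identity, occupancy, and B-factor from one ATOM record."""
    occupancy = float(line[54:60])
    return {
        "atom": line[12:16].strip(),
        "resname": line[17:20].strip(),
        "chain": line[21],
        "resnum": int(line[22:26]),
        "secondary_structure": SS_CODES.get(int(occupancy), "unknown"),
        "bfactor": float(line[60:66]),
    }


# Build a fixed-column sample line from the VAL 17 CA record shown above.
sample = "ATOM  {:5d} {:<4s} {:3s} {}{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
    3, " CA", "VAL", "H", 17, 12.841, 28.591, 25.098, 4.00, 14.94
)
record = parse_atom_line(sample)
print(record["resname"], record["resnum"], record["secondary_structure"])  # VAL 17 sheet
```

What the B-factor value means depends on which start_*.pdb file is read, as described in the list above.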
Data & Plots¶
All plots and data are saved in the plots/ and contact_maps/ directories:
- RMSF Profile: plots/RMSF_seq.svg (and plots/RMSF.csv for raw data). Indicates which parts of the protein are most flexible.
- Energy vs. RMSD: plots/E_RMSD_<chains>_total.svg. A scatter plot showing the relationship between model energy and similarity to the starting structure.
- Trajectory RMSD: plots/RMSD_frame_<chains>_replica_0.svg. RMSD from the starting structure over the course of the simulation.
- Contact Maps: contact_maps/all.svg (average for all models) and contact_maps/top10.svg (for the top 10 models). Frequency of residue-residue contacts during the simulation.
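For readers who want to see what an RMSF profile measures, the sketch below computes the textbook per-residue root-mean-square fluctuation from a list of frames. It assumes the frames are already superimposed on a common reference and is not a description of how CABS-flex itself computes the values in plots/RMSF.csv.

```python
import math


def rmsf(trajectory):
    """Per-residue RMSF for a list of frames, each a list of (x, y, z) tuples."""
    n_frames = len(trajectory)
    n_residues = len(trajectory[0])
    profile = []
    for i in range(n_residues):
        # Mean position of residue i over all frames.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared deviation from that mean position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        profile.append(math.sqrt(msd))
    return profile


# Two frames, two residues: the first never moves, the second shifts by 2 along x.
frames = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
]
print(rmsf(frames))  # [0.0, 1.0]
```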
Raw Data (output_data/)¶
- ss.txt: The secondary structure sequence used for the simulation.
- lowest_rmsds_<chains>.txt: Summary of the lowest RMSD values achieved.
- all_rmsds_<chains>.txt: Full list of RMSD values for every frame.
Related pages¶
- Flexibility Modes: Choosing the right restraint scheme.
- Restraints: How distance restraints govern the simulation.
- Examples: Detailed walkthroughs of protein flexibility cases.
- References: CABS-flex standalone (Bioinformatics 2019).