GitHub | GitLab mirror | Preprint

Protein Flexibility

The protein flexibility workflow is designed for fast simulations of protein conformational flexibility using the CABS coarse-grained model. It samples the conformational space around an input structure and generates representative near-native models (10 by default) that can be used to analyze local and global protein flexibility.

CABS-flex is a well-established approach for modeling protein flexibility and structure–dynamics–function relationships. The method has been validated against all-atom molecular dynamics simulations in CABS-flex MD validation (JCTC 2013), and against NMR ensembles in CABS-flex NMR validation (Bioinformatics 2014). It was later developed as a standalone and web-server tool, with applications summarized in CABS-flex applications review (Protein Science 2024), and the current web-server implementation described in CABS-flex 3.0 web server (Nucleic Acids Research 2025). Recent updates include pLDDT-guided restraints and support for rigid-like restraint schemes for stable, well-folded globular proteins, introduced in pLDDT-guided flexibility modeling (CSBJ 2024).

In CABS-flex standalone 3, the protein flexibility workflow provides command-line control over restraint-generation modes, simulation length, temperature, clustering, analysis, and all-atom reconstruction of representative models.

For more information about the underlying model, see CABS Model. For distance restraints and manual restraint definitions, see Restraints. For choosing predefined restraint-generation modes, see Flexibility Modes.

Input and basic run

To run a standard protein flexibility simulation, you only need an input protein structure. This can be either a local file in PDB format or a 4-letter PDB code.

Minimal command

CABSflex -i <pdb_code_or_file>

Example:

CABSflex -i 2gb1

This command will:

  1. Preparation: Download the structure (if a 4-letter code is provided) and assign secondary structure using DSSP.
  2. Restraints: Set up flexible restraints, which stabilize helices and sheets while allowing loops to move.
  3. Simulation: Perform Monte Carlo sampling (default: 1 replica, 20 annealing cycles).
  4. Analysis: Cluster the results and reconstruct the top 10 representative models at all-atom resolution.
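If you drive CABS-flex from a Python script, you can wrap the minimal call above with the standard `subprocess` module. This is only a sketch. `build_cabsflex_cmd`, `run_cabsflex` and `looks_like_pdb_code` are hypothetical helpers. The 4-character test only approximates how CABSflex itself might tell a PDB code from a file name.

```python
import os
import shlex
import subprocess
from typing import List


def looks_like_pdb_code(arg: str) -> bool:
    # Hypothetical heuristic: 4 alphanumeric characters and no local file of that name.
    # CABSflex's own decision rule may differ.
    return len(arg) == 4 and arg.isalnum() and not os.path.exists(arg)


def build_cabsflex_cmd(pdb: str, *extra: str) -> List[str]:
    # Minimal documented call (-i); any extra options are appended unchanged.
    return ["CABSflex", "-i", pdb, *extra]


def run_cabsflex(pdb: str, *extra: str) -> subprocess.CompletedProcess:
    # Runs the command and raises CalledProcessError on a non-zero exit status.
    return subprocess.run(build_cabsflex_cmd(pdb, *extra), check=True)


cmd = build_cabsflex_cmd("2gb1")
print(shlex.join(cmd))  # CABSflex -i 2gb1
```

Calling `run_cabsflex("2gb1")` requires `CABSflex` to be on your `PATH`. Keeping the argument list separate from the call lets you log or test the exact command before running it.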

Key options

Flexibility Control (-g)

The --protein-restraints (or -g) option controls the automatic generation of distance restraints.

  • flexible (Default): Restrains residues in secondary structure elements; loops and coils are free to fluctuate.
  • rigid: Restrains all residue pairs within a distance cutoff, regardless of secondary structure, which leads to smaller fluctuations.
  • plddt: Adjusts restraint strength based on AlphaFold pLDDT scores (requires pLDDT input).
  • unleashed: Disables all automatic restraints, allowing for very broad sampling.

See Flexibility Modes for a detailed comparison.
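To make the difference between the flexible and rigid modes concrete, the toy function below selects which residue pairs would receive a distance restraint. It is a hypothetical illustration, not the CABS-flex implementation. The following are all assumptions:

  • the distance window (`MIN_DIST`, `MAX_DIST`);
  • the minimum sequence separation (`MIN_SEQ_GAP`);
  • the rule that flexible mode requires both residues to be in a helix (`H`) or strand (`E`).

The plddt mode is not modeled.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Hypothetical parameters, chosen only for illustration.
MIN_DIST, MAX_DIST = 3.8, 8.0  # CA-CA distance window in Angstrom
MIN_SEQ_GAP = 3                # skip pairs closer than this along the chain


def select_restraint_pairs(
    ca: Sequence[Coord], ss: str, mode: str
) -> List[Tuple[int, int, float]]:
    """Return (i, j, distance) for residue pairs a toy restraint generator would keep.

    flexible:  both residues in a helix (H) or strand (E), within the window
    rigid:     every pair within the window, regardless of secondary structure
    unleashed: no automatic restraints
    """
    if mode == "unleashed":
        return []
    if mode not in ("flexible", "rigid"):
        raise ValueError(f"mode not covered by this sketch: {mode!r}")
    pairs = []
    for i, j in combinations(range(len(ca)), 2):
        if j - i < MIN_SEQ_GAP:
            continue
        d = math.dist(ca[i], ca[j])
        if not MIN_DIST <= d <= MAX_DIST:
            continue
        if mode == "flexible" and not (ss[i] in "HE" and ss[j] in "HE"):
            continue
        pairs.append((i, j, d))
    return pairs
```

For the same coordinates, rigid mode in this sketch always returns a superset of the flexible-mode pairs. That is why it damps fluctuations more strongly.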

Simulation Length (-a, -y, -s)

The length of the simulation can be adjusted, for example to improve convergence.

  • Default Configuration: 20 cycles (-a), 50 snapshots per cycle (-y), and 50 MC steps between snapshots (-s).
  • Snapshots vs. Length: Increase -y for finer trajectory resolution (more snapshots), or -s for longer sampling between snapshots.

For a full explanation of how cycles, snapshots, and steps combine to determine the simulation length, see the Sampling and Temperature guide.
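As a rough bookkeeping aid, the snippet below multiplies the three options. It assumes that the number of snapshots per replica is -a times -y, and that the number of MC steps is that product times -s. Treat it as an approximation, since the Sampling and Temperature guide is the authoritative description. The function name `simulation_budget` is hypothetical.

```python
from typing import Tuple


def simulation_budget(
    cycles: int = 20,                   # -a
    snapshots_per_cycle: int = 50,      # -y
    steps_between_snapshots: int = 50,  # -s
) -> Tuple[int, int]:
    """Return (snapshots, MC steps) per replica, assuming simple products."""
    snapshots = cycles * snapshots_per_cycle
    return snapshots, snapshots * steps_between_snapshots


print(simulation_budget())                             # (1000, 50000)
print(simulation_budget(snapshots_per_cycle=100))      # (2000, 100000)
print(simulation_budget(steps_between_snapshots=100))  # (1000, 100000)
```

Under this assumption, doubling either -y or -s doubles the sampling length. Only -y also doubles the number of frames written to the trajectory.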

Advanced options

Custom Restraints & Flexibility

  • Manual Restraints: Add specific distance restraints between residues using --ca-rest-add (CA-CA) or --sc-rest-add (SC-SC). Example: --ca-rest-add 5:A 15:A 8.7 1.0 restrains residue 5 of chain A and residue 15 of chain A to a distance of 8.7 Å with weight 1.0. You can also provide a file with multiple restraints.
  • Local Flexibility (-f): Manually override the flexibility of specific regions (e.g., loops) using a file. Example: CABSflex -i 1kl3 -f flexibility.inp
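When you generate restraints programmatically, a small parser helps catch malformed specifications before a run. The sketch below assumes four fields, as in the --ca-rest-add example: residue:chain, residue:chain, target distance in Å, and weight. It also assumes plain integer residue numbers without insertion codes. `Restraint`, `parse_restraint` and `restraint_to_args` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Restraint:
    res1: int
    chain1: str
    res2: int
    chain2: str
    distance: float  # target distance, assumed to be in Angstrom
    weight: float


def _parse_residue(token: str) -> Tuple[int, str]:
    # "15:A" -> (15, "A"); insertion codes are not handled in this sketch.
    number, chain = token.split(":")
    return int(number), chain


def parse_restraint(spec: str) -> Restraint:
    """Parse 'RES:CHAIN RES:CHAIN DISTANCE WEIGHT', e.g. '5:A 15:A 8.7 1.0'."""
    fields = spec.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {spec!r}")
    res1, chain1 = _parse_residue(fields[0])
    res2, chain2 = _parse_residue(fields[1])
    return Restraint(res1, chain1, res2, chain2, float(fields[2]), float(fields[3]))


def restraint_to_args(r: Restraint, flag: str = "--ca-rest-add") -> List[str]:
    # Rebuild the command-line form; pass flag="--sc-rest-add" for SC-SC restraints.
    return [flag, f"{r.res1}:{r.chain1}", f"{r.res2}:{r.chain2}", str(r.distance), str(r.weight)]
```

Round-tripping a specification through `parse_restraint` and `restraint_to_args` normalizes it and raises `ValueError` on missing fields or a missing chain separator.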

Temperature (-t)

Sets the reduced (dimensionless) temperature of the simulation, which controls the level of thermal noise.

  • Default: 1.4 1.4. The two values are the initial and final temperatures; equal values keep the temperature constant.
  • Tuning: For intrinsically disordered proteins (IDPs) or highly mobile loops, raising the temperature (e.g. 2.0 2.0) can give a better representation of the conformational landscape.

Example: CABSflex -i 1z1m --temperature 2.0 2.0

For a detailed guide on the physical interpretation of CABS temperature and simulated annealing, see the Sampling and Temperature guide.
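Two textbook pieces illustrate the effect of the temperature values: the Metropolis acceptance rule in reduced units, and a per-cycle temperature schedule. Both are sketches under assumptions. First, that CABS-flex accepts uphill moves with probability exp(-ΔE/T). Second, that the temperature moves linearly from the first value to the second over the annealing cycles. The function names are hypothetical.

```python
import math
from typing import List


def temperature_schedule(t_init: float, t_final: float, n_cycles: int) -> List[float]:
    # Assumed linear interpolation; equal values give a constant temperature.
    if n_cycles == 1:
        return [t_init]
    step = (t_final - t_init) / (n_cycles - 1)
    return [t_init + k * step for k in range(n_cycles)]


def acceptance_probability(delta_e: float, temperature: float) -> float:
    # Metropolis rule: downhill moves are always accepted,
    # uphill moves with probability exp(-dE / T).
    if delta_e <= 0:
        return 1.0
    return math.exp(-delta_e / temperature)


print(temperature_schedule(1.4, 1.4, 20)[:3])      # [1.4, 1.4, 1.4]
print(round(acceptance_probability(1.0, 1.4), 2))  # 0.49
print(round(acceptance_probability(1.0, 2.0), 2))  # 0.61
```

Raising T from 1.4 to 2.0 makes the same uphill move noticeably more likely to be accepted. This is why higher temperatures broaden the sampled ensemble.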

Output and analysis

All results are saved in the working directory (default: CABSflex_output/):

Structural Ensembles (output_pdbs/)

  • model_1.pdb to model_10.pdb: The 10 representative all-atom models (centroids of the largest clusters).
  • replica_1.pdb: The full simulation trajectory, containing CA and side-chain (SC) pseudo-atoms.
  • start_*.pdb: Copies of the starting structure with per-residue information mapped to the B-factor column (see below).
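A quick way to confirm that a run finished is to check for the files listed above. This sketch uses only the default directory name and the file names from this page. `missing_outputs` is a hypothetical helper, and it does not validate file contents.

```python
from pathlib import Path
from typing import List


def missing_outputs(workdir: str = "CABSflex_output", n_models: int = 10) -> List[Path]:
    """Return the expected files under output_pdbs/ that do not exist."""
    pdb_dir = Path(workdir) / "output_pdbs"
    expected = [pdb_dir / f"model_{k}.pdb" for k in range(1, n_models + 1)]
    expected.append(pdb_dir / "replica_1.pdb")
    return [path for path in expected if not path.is_file()]


missing = missing_outputs()
print("complete" if not missing else f"{len(missing)} file(s) missing")
```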

Understanding PDB Columns

CABS-flex uses standard PDB columns to store simulation-specific information:

  • Occupancy (occ): Stores numeric secondary structure codes (1: Coil, 2: Helix, 3: Turn, 4: Sheet).
  • B-factor (bfac): Used in start_*.pdb files to store:

    • start_bfac.pdb: Original B-factors from the input structure.
    • start_rmsf.pdb: RMSF values calculated from the simulation trajectory.
    • start_plddt.pdb: pLDDT scores (if provided as input).
    • start_category.pdb: Flexibility categories (0: rigid, 4: most flexible).
    • start_secstr.pdb: Secondary structure numeric codes (same as occ).

Example of a PDB file generated by CABS-flex (CA + SC atoms):

MODEL        1
ATOM      1  CA  ILE H  16      13.453  28.620  28.758  1.00 14.14 
ATOM      2  SC  ILE H  16      14.949  27.343  29.585  1.00 14.14 
ATOM      3  CA  VAL H  17      12.841  28.591  25.098  4.00 14.94 
ATOM      4  SC  VAL H  17      11.783  29.360  24.236  4.00 14.94 
ATOM      5  CA  GLY H  18      14.672  26.138  23.286  1.00 14.62 
ATOM      6  SC  GLY H  18      14.672  26.138  23.286  1.00 14.62 
...
ENDMDL
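The records above follow the fixed-column PDB layout, so you can read per-residue values by column position rather than by splitting on whitespace. The sketch below decodes the occupancy column with the secondary structure codes listed above. It also reads the B-factor column, where values such as RMSF or pLDDT appear in the start_*.pdb files. The helper names are hypothetical.

```python
from typing import Dict, Tuple

SS_CODES = {1: "Coil", 2: "Helix", 3: "Turn", 4: "Sheet"}


def parse_atom(line: str) -> dict:
    # Standard PDB ATOM columns (0-based slices of the 1-based column ranges).
    return {
        "atom": line[12:16].strip(),
        "resname": line[17:20].strip(),
        "chain": line[21],
        "resnum": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "ss": SS_CODES.get(int(float(line[54:60])), "unknown"),
        "bfactor": float(line[60:66]),
    }


def per_residue_bfactor(pdb_text: str) -> Dict[Tuple[str, int], float]:
    # One value per residue, taken from its CA record (e.g. RMSF in start_rmsf.pdb).
    values = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            rec = parse_atom(line)
            values[(rec["chain"], rec["resnum"])] = rec["bfactor"]
    return values


EXAMPLE = [
    "ATOM      1  CA  ILE H  16      13.453  28.620  28.758  1.00 14.14 ",
    "ATOM      2  SC  ILE H  16      14.949  27.343  29.585  1.00 14.14 ",
    "ATOM      3  CA  VAL H  17      12.841  28.591  25.098  4.00 14.94 ",
    "ATOM      4  SC  VAL H  17      11.783  29.360  24.236  4.00 14.94 ",
]
print(parse_atom(EXAMPLE[2])["ss"])              # Sheet
print(per_residue_bfactor("\n".join(EXAMPLE)))  # {('H', 16): 14.14, ('H', 17): 14.94}
```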

Data & Plots

All plots and data are saved in the plots/ and contact_maps/ directories:

  • RMSF Profile: plots/RMSF_seq.svg (and plots/RMSF.csv for raw data). Indicates which parts of the protein are most flexible.
  • Energy vs. RMSD: plots/E_RMSD_<chains>_total.svg. A scatter plot showing the relationship between model energy and similarity to the starting structure.
  • Trajectory RMSD: plots/RMSD_frame_<chains>_replica_0.svg. RMSD from the starting structure over the course of the simulation.
  • Contact Maps: contact_maps/all.svg (average for all models) and contact_maps/top10.svg (for the top 10 models). Frequency of residue-residue contacts during the simulation.
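The two main per-residue quantities in these plots can be sketched from raw coordinates:

  • RMSF: the root-mean-square deviation of each residue from its mean position.
  • Contact frequency: the fraction of frames in which a residue pair lies within a cutoff.

The code assumes the frames are already superimposed. The 8 Å cutoff and the minimum sequence separation of 3 are hypothetical values, not necessarily those CABS-flex uses. The function names are illustrative.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Frame = Sequence[Coord]  # one coordinate per residue (e.g. CA atoms)


def rmsf(frames: Sequence[Frame]) -> List[float]:
    """sqrt(mean over frames of |r_i(t) - <r_i>|^2) for each residue i."""
    n_frames, n_res = len(frames), len(frames[0])
    result = []
    for i in range(n_res):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(math.dist(f[i], mean) ** 2 for f in frames) / n_frames
        result.append(math.sqrt(msd))
    return result


def contact_frequency(
    frames: Sequence[Frame], cutoff: float = 8.0, min_seq_gap: int = 3
) -> Dict[Tuple[int, int], float]:
    """Fraction of frames in which residues i and j are within `cutoff`."""
    n_frames, n_res = len(frames), len(frames[0])
    freq = {}
    for i, j in combinations(range(n_res), 2):
        if j - i < min_seq_gap:
            continue
        hits = sum(1 for f in frames if math.dist(f[i], f[j]) <= cutoff)
        if hits:
            freq[(i, j)] = hits / n_frames
    return freq
```

Computing contact frequencies over all models rather than only the top 10 mirrors the difference between the two contact maps listed above.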

Raw Data (output_data/)

  • ss.txt: The secondary structure sequence used for the simulation.
  • lowest_rmsds_<chains>.txt: Summary of the lowest RMSD values achieved.
  • all_rmsds_<chains>.txt: Full list of RMSD values for every frame.
