Protein Structure Design¶
Protein structure design is the process of creating novel protein structures with desired functions. Combined with Protein Sequence Design, it is a powerful tool for the following applications:
- Motif scaffolding: Design a de novo protein bearing a specific motif (for binding, catalysis, antigenicity, etc.) copied from the initial structure.
- Interface redesign: Redesign the protein-protein interface to improve the interaction between two proteins.
- De novo binder design: Design a protein binder against a specific target protein from scratch.
Features¶
- Validated algorithm: The RFDiffusion algorithm is published in Nature, validated by multiple design campaigns and widely used for protein structure design.
- Flexible design approach: Fully customize your design regions. Specify whether you would like to design the sequence, structure, or both.
- Cloud-based, Mass-scale: The algorithm runs on our cloud platform and can be executed at scale.
Inputs¶
To submit a Protein Structure Design job, open the Project Editor and click the "New Job" button on the left sidebar. Then select "Protein Structure Design" under the "Protein Design" group to open the job submission page.
- Initial Structure: The template structure that serves as the starting point for your design in PDB/mmCIF format. The structure can be uploaded from your local machine, imported from a cloud database (see Add File to Project for instructions), or generated by another GeoBiologics job.
Requirements for the initial structure
The initial structure must have no insertion codes and no UNK residues.
- Design Contigs: A string that describes the designed structure. Chains in the designed structure are separated by “/0 ”. Each chain is composed of one or more fixed or designed contigs, separated by “/”.
  - Fixed contig: Copy both sequence and structure from a specified segment of the initial structure file. Represented as {chain_id}:{start_resi_id}-{end_resi_id} or simply {chain_id}:{resi_id}.
  - Designed contig: Design both sequence and structure for a protein segment de novo. Represented as {min_len}-{max_len} or simply {len}.
- Annotations: You can add annotations to the design contigs to give hints/instructions to the model.
  - Seq-masked sites: Mask the sequences while fixing the structure for these sites in fixed contigs. Represented as {chain_id}:{start_resi_id}-{end_resi_id}, or simply {chain_id}:{resi_id}.
  - Struct-masked sites: Redesign the structures while fixing the sequence for these sites in fixed contigs. Typically used for ligand-binding regions on the receptor, whose sequence is fixed but whose structure may vary upon ligand binding. Format is the same as above.
  - Interface: Mark residues on a fully-fixed “receptor” chain as interface residues. Format is the same as above.
Legend for design contigs and annotations
- Job Name: Name of the job. The job name must be unique within the project.
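The Design Contigs syntax described above can be checked locally before submitting a job. The following is a minimal sketch, not the platform's own parser: the names `FixedContig`, `DesignedContig`, `parse_contig`, `parse_design_contigs`, and `chain_length_range` are hypothetical, and it assumes single-character chain IDs, non-negative integer residue IDs, and contiguous residue numbering within a fixed contig. The same `{chain_id}:{start_resi_id}-{end_resi_id}` pattern is used by the annotation fields, so `parse_contig` can also be used to sanity-check those entries.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union

CHAIN_SEPARATOR = "/0 "   # separates chains in the design contigs string
CONTIG_SEPARATOR = "/"    # separates contigs within one chain

# Assumption: chain IDs are one alphanumeric character, residue IDs are integers >= 0.
FIXED_RE = re.compile(r"^([A-Za-z0-9]):(\d+)(?:-(\d+))?$")
DESIGNED_RE = re.compile(r"^(\d+)(?:-(\d+))?$")


@dataclass(frozen=True)
class FixedContig:
    """Segment copied from the initial structure."""
    chain_id: str
    start: int
    end: int


@dataclass(frozen=True)
class DesignedContig:
    """Segment designed de novo, with an allowed length range."""
    min_len: int
    max_len: int


Contig = Union[FixedContig, DesignedContig]


def parse_contig(token: str) -> Contig:
    """Parse one contig token such as 'A:1-50', 'A:42', '10-20', or '15'."""
    token = token.strip()
    fixed = FIXED_RE.match(token)
    if fixed:
        start = int(fixed.group(2))
        end = int(fixed.group(3)) if fixed.group(3) else start
        if end < start:
            raise ValueError(f"end before start in fixed contig: {token!r}")
        return FixedContig(fixed.group(1), start, end)
    designed = DESIGNED_RE.match(token)
    if designed:
        min_len = int(designed.group(1))
        max_len = int(designed.group(2)) if designed.group(2) else min_len
        # Assumption: a designed contig must contain at least one residue.
        if min_len < 1 or max_len < min_len:
            raise ValueError(f"invalid length range in designed contig: {token!r}")
        return DesignedContig(min_len, max_len)
    raise ValueError(f"unrecognized contig: {token!r}")


def parse_design_contigs(spec: str) -> List[List[Contig]]:
    """Split a design contigs string into chains, then into contigs."""
    return [
        [parse_contig(token) for token in chain_spec.split(CONTIG_SEPARATOR)]
        for chain_spec in spec.strip().split(CHAIN_SEPARATOR)
    ]


def chain_length_range(chain: List[Contig]) -> Tuple[int, int]:
    """Return the (minimum, maximum) residue count of one designed chain."""
    shortest = longest = 0
    for contig in chain:
        if isinstance(contig, FixedContig):
            # Assumes contiguous numbering between start and end.
            length = contig.end - contig.start + 1
            shortest += length
            longest += length
        else:
            shortest += contig.min_len
            longest += contig.max_len
    return shortest, longest
```

For example, `parse_design_contigs("A:1-50/10-20/A:71-100/0 B:1-120")` yields two chains: the first keeps residues 1-50 and 71-100 of chain A and joins them with a de novo segment of 10 to 20 residues, and the second copies residues 1-120 of chain B unchanged.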
Models & Parameters¶
Currently, RFDiffusion is the recommended model for protein structure design. This model has the following parameters:
- # designs: The number of designs to generate. Default is 10. We recommend generating at least 100 designs for downstream inverse folding and virtual screening.
- Noise scale: Controls the diversity of generated designs. The value ranges from 0 to 1, and the default is 1. Higher values increase diversity at the expense of design quality.
- Diffuse ratio%: When the designs are aligned to the initial structure, you can use partial diffusion (a value below 100%) to keep the designs closer to the initial structure. The default value is 100%, which means full diffusion.
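The parameter ranges above can be captured in a small validation helper. This is an illustrative sketch only: `DesignParameters` and `partial_steps` are hypothetical names, not part of the platform, and the lower bound on the diffuse ratio (greater than 0%) is an assumption. The mapping from diffuse ratio to a number of diffusion steps is likewise an assumption about how partial diffusion might be applied; the total step count is passed in explicitly because this page does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignParameters:
    """Hypothetical container for the three job parameters."""
    num_designs: int = 10                  # "# designs", default 10
    noise_scale: float = 1.0               # 0 to 1, default 1
    diffuse_ratio_percent: float = 100.0   # 100 means full diffusion

    def validate(self) -> None:
        """Raise ValueError if any parameter is outside its documented range."""
        if self.num_designs < 1:
            raise ValueError("num_designs must be at least 1")
        if not 0.0 <= self.noise_scale <= 1.0:
            raise ValueError("noise_scale must be between 0 and 1")
        # Assumption: a ratio of exactly 0% is not meaningful.
        if not 0.0 < self.diffuse_ratio_percent <= 100.0:
            raise ValueError("diffuse_ratio_percent must be in (0, 100]")

    def is_partial_diffusion(self) -> bool:
        """True when the job would use partial rather than full diffusion."""
        return self.diffuse_ratio_percent < 100.0


def partial_steps(params: DesignParameters, total_steps: int) -> int:
    """Assumed mapping: scale the total step count by the diffuse ratio."""
    if total_steps < 1:
        raise ValueError("total_steps must be at least 1")
    params.validate()
    scaled = round(params.diffuse_ratio_percent / 100.0 * total_steps)
    return max(1, scaled)
```

Under these assumptions, `DesignParameters(num_designs=100, noise_scale=0.5, diffuse_ratio_percent=20)` describes a partial-diffusion job that generates 100 designs with reduced noise.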
Results¶
Click the job name in the Files & Jobs panel to view the job results.
The summary table lists the designed structures.
Each design can be visualized in 3D directly in the interface, and structures can be downloaded in PDB format for further analysis or experimental validation.