Protein Structure Prediction
Protein structure prediction is the task of predicting the 3D structure of a protein from its amino acid sequence. As structure determines function, protein structure prediction is of fundamental importance in biology and medicine. Here we support two state-of-the-art protein structure prediction methods: ESMFold and AlphaFold2-Multimer v3.
Features
- Flexible configuration: Supports single-sequence prediction (ESMFold) and prediction with multiple sequence alignments and structure templates (AF2-Multimer v3). The latter can directly predict the structure of multi-chain proteins and may be useful for protein-protein docking.
- Batch prediction: Supports batch prediction of multiple sequences in the cloud, with optional Amber relaxation.
- Graphical interface: No need to configure an environment or use the command line; the whole structure prediction workflow runs from the graphical interface.
ESMFold
Inputs
To submit a Protein Structure Prediction job, open the Project Editor and select "Protein Structure Prediction" from the "Structure Modeling" dropdown menu.
- Proteins: A list of input single-chain protein sequences. You can either enter the sequences in the input box or upload a FASTA file (by clicking the "" button). Each chain is predicted independently.
  - : Add a new protein sequence to the input.
  - Protein Name: Name of the protein. Defaults to "Protein {i}" for the i-th sequence. To change it, hover over the protein name and click the "" button.
- Job Name: Name of the job. Note that the job name must be unique within the project.
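For upload, the sequences can be collected into a FASTA file beforehand. The following is a minimal sketch, assuming the standard FASTA layout of one `>` header line per sequence. The helper name `to_fasta`, the protein names, and the sequences are illustrative placeholders, not part of the platform.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text.

    Each record becomes a '>' header line followed by the sequence,
    wrapped at `width` characters per line.
    """
    lines = []
    for name, sequence in records:
        # Drop whitespace and normalise to upper-case one-letter codes.
        sequence = "".join(sequence.split()).upper()
        lines.append(f">{name}")
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


# Placeholder names and sequences, for illustration only.
records = [("Protein 1", "MKTAYIAKQR"), ("Protein 2", "gshm ledpva")]
fasta_text = to_fasta(records)
print(fasta_text, end="")
```

The resulting text can be saved with a `.fasta` extension and uploaded in place of typing each sequence into the input box.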
Parameters
- # cycles: Number of recycles to run (0-4). Defaults to 4, the number used in training.
- Relax structure: Whether to relax the model-generated protein using Amber. Defaults to false.
- chunk size: Defaults to None, in which case attention is computed without chunking. If set, attention is computed in chunks; lower values reduce memory usage at the cost of speed.
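The memory and speed trade-off behind the chunk size setting can be illustrated with a toy example. The sketch below is not ESMFold's implementation; it is a small pure-Python single-head attention, with hypothetical function names, showing that processing the query rows in chunks gives the same output while only a `chunk_size x len(keys)` block of scores exists at any time.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, chunk_size=None):
    """Toy single-head attention over lists of vectors.

    With chunk_size=None all query rows are scored in one pass; otherwise
    only `chunk_size` rows of the score matrix exist at any time.
    """
    n = len(queries)
    step = n if chunk_size is None else chunk_size
    step = max(step, 1)
    dim = len(values[0])
    output = []
    for start in range(0, n, step):
        chunk = queries[start:start + step]
        # Score matrix for this chunk only: len(chunk) x len(keys).
        scores = [[sum(q * k for q, k in zip(qrow, krow)) for krow in keys]
                  for qrow in chunk]
        for srow in scores:
            weights = softmax(srow)
            output.append([sum(w * v[d] for w, v in zip(weights, values))
                           for d in range(dim)])
    return output


q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

full = attention(q, k, v)                   # chunk_size=None: one pass
chunked = attention(q, k, v, chunk_size=2)  # two passes of two rows each
assert full == chunked  # same result, smaller peak score matrix
```

Smaller chunks mean more passes through the loop, which is why lower values trade speed for memory.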
Results
Click the job name in the Files & Jobs panel to view the job results. The result summary is stored in a CSV file, which can be downloaded by clicking the "" button at the top right of the Summary Table.
The summary table contains the following columns:
- name: FASTA label, as in input.
- sequence: FASTA sequence, as in input.
- plddt: Predicted lDDT (local Distance Difference Test) score for the generated protein. Higher is better. The per-residue lDDT scores are stored in the B-factor column of the output .pdb files. You can view them by changing the color theme of the Cartoon representation (of the Polymer component) to "Atom Properties > Uncertainty/Disorder".
- ptm: Predicted TM (Template modeling) score for the generated protein. Higher is better.
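Once downloaded, the summary CSV can be ranked or filtered offline. The sketch below assumes the header row uses exactly the four column names listed above, that `plddt` and `ptm` hold one number per row, and that pLDDT is on a 0-100 scale. The rows and the cutoff of 70 are made-up placeholders, not real results or a platform default.

```python
import csv
import io

# Placeholder rows with the four documented columns; all values are made up.
summary_csv = """name,sequence,plddt,ptm
Protein 1,MKTAYIAKQR,82.4,0.71
Protein 2,GSHMLEDPVA,55.1,0.38
Protein 3,MSTNPKPQRK,91.0,0.84
"""


def load_summary(text):
    """Parse summary CSV text and convert the score columns to floats."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["plddt"] = float(row["plddt"])
        row["ptm"] = float(row["ptm"])
    return rows


rows = load_summary(summary_csv)

# Highest-confidence predictions first.
ranked = sorted(rows, key=lambda r: r["plddt"], reverse=True)

# Assumed cutoff on an assumed 0-100 scale; adjust to the actual data.
confident = [r["name"] for r in ranked if r["plddt"] >= 70.0]
print(confident)
```

For a real file, replace the inline string with the contents of the downloaded CSV.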
In the rightmost column, click the "" button to view the predicted structure. If you enabled the "relax" option, the dropdown menu lists two files: one for the unrelaxed structure and one for the relaxed structure.
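Because the per-residue scores sit in the B-factor column, they can also be read from a downloaded .pdb file without a viewer. The following is a minimal sketch that relies only on the fixed-width column layout of PDB `ATOM` records. The function names are hypothetical, and the demo records are synthetic lines with zero coordinates and made-up scores, not platform output.

```python
def plddt_from_pdb(pdb_text):
    """Return {(chain_id, residue_number): b_factor}, one CA atom per residue."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns (1-based): atom name 13-16, chain 22,
        # residue number 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            scores[key] = float(line[60:66])
    return scores


def ca_line(serial, resname, chain, resseq, bfactor):
    """Build a synthetic CA ATOM record (zero coordinates) for the demo."""
    return (f"ATOM  {serial:>5}  CA  {resname:>3} {chain}{resseq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}")


# Synthetic three-residue chain with made-up scores.
pdb_text = "\n".join([
    ca_line(2, "MET", "A", 1, 87.5),
    ca_line(10, "LYS", "A", 2, 91.25),
    ca_line(19, "THR", "A", 3, 62.0),
])

scores = plddt_from_pdb(pdb_text)
mean_plddt = sum(scores.values()) / len(scores)
print(scores, mean_plddt)
```

Running the same function on the unrelaxed and relaxed files gives a quick way to confirm that both carry the same per-residue scores, if that is of interest.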
AlphaFold2-Multimer v3
Inputs
To submit a Protein Structure Prediction job, open the Project Editor and select "Protein Structure Prediction" from the "Structure Modeling" dropdown menu. Then, select "Show Parameters" to reveal the model and parameter options. Change the model from "ESMFold" to "AlphaFold2-Multimer v3".
- Proteins: Input protein sequences in FASTA format. You can either enter the sequences in the input box or upload a FASTA file (by clicking the "" button). Each protein, potentially multi-chain, is predicted independently.
  - : Add a new protein sequence to the input.
  - Protein Name: Name of the protein. Defaults to "Protein {i}" for the i-th sequence. To change it, hover over the protein name and click the "" button.
  - Chain ID: A single character identifying the chain in the protein. Defaults to uppercase letters in alphabetical order. You can change it in the "Chain
  - Add Chain: Add a new chain to the protein. AF2-Multimer can predict the structure of multi-chain proteins. We recommend no more than 9 chains and 1000 residues in total.
- Job Name: Name of the job. Note that the job name must be unique within the project.
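Before submitting a large complex, the recommended limits above can be checked locally. This is a small sketch with hypothetical names (`check_complex`, `MAX_CHAINS`, `MAX_RESIDUES`); it only mirrors the stated recommendation of at most 9 chains and 1000 residues in total, plus the single-character chain ID rule, and the sequences are placeholders.

```python
import string

MAX_CHAINS = 9       # recommended upper bound on chains per protein
MAX_RESIDUES = 1000  # recommended upper bound on total residues


def check_complex(chains):
    """Return a list of warnings for a {chain_id: sequence} mapping."""
    warnings = []
    for chain_id in chains:
        if len(chain_id) != 1:
            warnings.append(f"chain ID {chain_id!r} is not a single character")
    if len(chains) > MAX_CHAINS:
        warnings.append(f"{len(chains)} chains exceeds the recommended {MAX_CHAINS}")
    total = sum(len(seq) for seq in chains.values())
    if total > MAX_RESIDUES:
        warnings.append(f"{total} residues exceeds the recommended {MAX_RESIDUES}")
    return warnings


# Placeholder heterodimer: 300 + 200 residues, within both limits.
dimer = {"A": "MKTAYIAKQR" * 30, "B": "GSHMLEDPVA" * 20}
print(check_complex(dimer))

# Placeholder 10-chain assembly of 120 residues each: over both limits.
decamer = {cid: "MKTAYIAKQRGS" * 10 for cid in string.ascii_uppercase[:10]}
print(check_complex(decamer))
```

The limits are recommendations rather than hard errors, so the function returns warnings instead of raising.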
Parameters
- # cycles: Number of recycles to run (1-48). Defaults to 16.
- Relax structure: Whether to relax the model-generated protein using Amber. Defaults to false.
- MSA mode: How the multiple sequence alignment (MSA) is built. MMseqs2 searches the UniRef sequence database and, optionally, environmental sequences. Valid choices are "MMSeqs2 (UniRef + Environmental)" (default), "MMSeqs2 (UniRef)", and "Single sequence (No MSA)".
- Pair mode: MSA setting for multi-chain inputs. Paired: use paired sequences from the same species. Unpaired: use a separate MSA for each chain. Valid choices are "Unpaired", "Paired", and "Unpaired + Paired" (default).
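When tracking many jobs, it can help to record the chosen settings alongside each job name. The sketch below is purely illustrative: the platform is driven from the graphical interface, and the `MultimerSettings` class and its field names are assumptions made for this example. Only the value ranges, choices, and defaults are taken from the parameter list above.

```python
from dataclasses import dataclass

MSA_MODES = (
    "MMSeqs2 (UniRef + Environmental)",
    "MMSeqs2 (UniRef)",
    "Single sequence (No MSA)",
)
PAIR_MODES = ("Unpaired", "Paired", "Unpaired + Paired")


@dataclass(frozen=True)
class MultimerSettings:
    """Hypothetical record of the AF2-Multimer options chosen for a job."""

    cycles: int = 16                       # documented range: 1-48
    relax: bool = False                    # Amber relaxation off by default
    msa_mode: str = MSA_MODES[0]           # documented default
    pair_mode: str = "Unpaired + Paired"   # documented default

    def __post_init__(self):
        if not 1 <= self.cycles <= 48:
            raise ValueError(f"cycles must be in 1-48, got {self.cycles}")
        if self.msa_mode not in MSA_MODES:
            raise ValueError(f"unknown MSA mode: {self.msa_mode!r}")
        if self.pair_mode not in PAIR_MODES:
            raise ValueError(f"unknown pair mode: {self.pair_mode!r}")


defaults = MultimerSettings()
fast = MultimerSettings(cycles=3, msa_mode="Single sequence (No MSA)")
print(defaults)
print(fast)
```

Validating at construction time catches typos in the mode strings before they end up in a lab notebook or a job log.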
Results
Click the job name in the Files & Jobs panel to view the job results. The result summary is stored in a CSV file, which can be downloaded by clicking the "" button at the top right of the Summary Table.
The summary table contains the following columns:
- name: FASTA label, as in input.
- sequence: FASTA sequence, as in input.
- plddt: Predicted lDDT (local Distance Difference Test) scores from five AF2 models for the generated protein. Higher is better. The per-residue lDDT scores are stored in the B-factor column of the output .pdb files. You can view them by changing the color theme of the Cartoon representation (of the Polymer component) to "Atom Properties > Uncertainty/Disorder".
- ptm: Predicted TM (Template modeling) scores from five AF2 models for the generated protein. Higher is better.
In the rightmost column, click the "" button to view the predicted structure. If you enabled the "relax" option, the dropdown menu lists two files: one for the unrelaxed structure and one for the relaxed structure.