Protein Inverse Folding¶
Sequence design is the task of generating a sequence that is compatible with a given protein/antibody structure. It is of great importance in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes).
Features¶
-
Validated algorithm: The ProteinMPNN algorithm is experimentally validated and widely used for protein sequence design. It can also be used to design antibodies after some fine-tuning.
-
Cloud-based, Mass-scale: The algorithm is deployed on our cloud platform and can be run on a massive scale. Results are stored in a CSV file and can be easily downloaded for further analysis.
Inputs¶
To submit a Protein Inverse Folding job, open the Project Editor and select "Protein Inverse Folding" from the "Protein Design" dropdown menu.
- Target: The target structure for sequence design.
- Structure: Target structure in PDB/mmCIF format. The structure can be uploaded from a local machine, imported from a cloud database or generated by a GeoBiologics job.
- Chains to design: IDs of the chains to design. If not specified, all chains will be designed.
- Job Name: Name of the job. Note that the job name must be unique within the project.
Models & Parameters¶
Two models are available for this job: ProteinMPNN, suitable for general protein design, and ProteinMPNN-Ab, more adept at antibody design.
The parameters for the two models are the same, as shown below.
- # designs: The number of sequences to generate. The default value is 10.
Results¶
Click the job name in the Files & Jobs panel to view the job results. The results are stored in a CSV file, which can be downloaded by clicking the "" button to the top-right of the Summary Table.
The summary table contains the following columns:
- name: name of the output sequence
- score: average negative log likelihood of the sampled (designed) amino acids
- global_score: average negative log likelihood of amino acids in all chains
- seq_recovery: average sequence identity between the designed sequence and the input sequence
- sequence_chain_{chain_id}: designed sequence for chain {chain_id}. E.g., if you chose
H,L
as the chains to design, you would have two columns in the summary table,sequence_chain_H
andsequence_chain_L
.