Humanization¶
Humanization is the task of changing the antibody sequence to make it closer to a human antibody, thus reducing its immunogenicity in humans.
Features¶
- State-of-the-art algorithm: Based on an antibody large language model, GeoHumAb balances humanness improvement against sequence preservation. Precision results show that its predictions closely align with humanizations carried out by human experts.
- Intuitive interface: Major results are clear and easy to read. Compare against germline sequences, check the risk sites and CDR regions, and view the suggested mutations all in one place.
- Cloud-based, mass-scale: Supports batch humanization of multiple sequences on our cloud platform.
Inputs¶
To submit a Humanization job, open the Project Editor and select "Humanization" from the "Optimization" dropdown menu.
- Antibodies: A list of antibody sequences in FASTA format. You can either enter the sequences in the input box or upload a FASTA file (by clicking the upload button). Each antibody is humanized independently.
- Chain type: Chain type(s) of the input antibodies. Could be either "VH", "VL" or "VH+VL" (default). If you want to humanize nanobodies, please select "VH" here.
Pending adaptation to nanobody humanization
Currently, we do not fully support nanobody humanization. Specifically, we do not back-mutate (i.e. preserve) key residues in the framework regions of nanobodies. If you use this model to humanize nanobodies, you might want to manually back-mutate these residues.
- Antibody Name: Name of the antibody. Defaults to "Antibody". To change it, hover over the antibody name and click the button that appears.
- Sequence: Sequence of an antibody. Each chain must not exceed 200 residues.
- (Upload Sequence): Upload your own antibodies (a .FASTA file) by clicking the upload button. Each antibody must have 2 chains (if the chain type is "VH+VL") with labels sharing the same prefix and ending with "_H" and "_L" respectively. Above is an example.
- Job Name: Name of the job. Note that the job name must be unique within the project.
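The labeling rule for uploaded "VH+VL" antibodies (shared prefix, "_H" and "_L" suffixes, at most 200 residues per chain) can be checked locally before uploading. The following is a minimal sketch only: the function names `parse_fasta` and `pair_chains` are hypothetical helpers, not part of the platform, and the two sequences are short illustrative fragments rather than full variable domains.

```python
from collections import defaultdict

MAX_CHAIN_LENGTH = 200  # per-chain limit stated in the Inputs list


def parse_fasta(text):
    """Parse FASTA text into a {label: sequence} dict."""
    records = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            label = line[1:].strip()
            records[label] = ""
        elif label is not None:
            records[label] += line
    return records


def pair_chains(records):
    """Group records by label prefix and check the _H/_L pairing rule."""
    pairs = defaultdict(dict)
    for label, seq in records.items():
        if label.endswith("_H"):
            pairs[label[:-2]]["H"] = seq
        elif label.endswith("_L"):
            pairs[label[:-2]]["L"] = seq
        else:
            raise ValueError(f"label {label!r} must end with _H or _L")
        if len(seq) > MAX_CHAIN_LENGTH:
            raise ValueError(f"chain {label!r} exceeds {MAX_CHAIN_LENGTH} residues")
    for prefix, chains in pairs.items():
        if set(chains) != {"H", "L"}:
            raise ValueError(f"antibody {prefix!r} is missing a chain")
    return dict(pairs)


# Illustrative input: one antibody named "mAb1" with a heavy and a light chain.
example = """>mAb1_H
EVQLVESGGGLVQPGG
>mAb1_L
DIQMTQSPSSLSASVG
"""
pairs = pair_chains(parse_fasta(example))
```

Running the check on a file before submission catches unpaired or over-long chains early; the platform may apply further validation that this sketch does not cover.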
Model & Parameters¶
Two models are available for this job: GeoHumAb and CDR Grafting. GeoHumAb is our proprietary, state-of-the-art humanization algorithm. CDR Grafting is a traditional humanization technique and is recommended if you know the exact human germline sequences to use for your antibody.
Parameters of GeoHumAb are as follows.
- scheme: Antibody numbering scheme. Could be either "Kabat", "IMGT", "Chothia" or "AHo".
- CDR def.: Numbering scheme used to define the CDR regions. Could be either "Kabat", "IMGT", "Chothia" or "North". Defaults to the same as scheme. Required when scheme = AHo.
- niter: Number of humanization iterations for the GeoHumAb algorithm. Defaults to 1. A higher niter value results in better humanization but worse sequence preservation.
- Keep CDRs: Whether to keep CDR regions as in the parental sequence to preserve binding.
- Keep Vernier: Whether to keep Vernier regions as in the parental sequence to preserve binding. Only available when CDR def. = Kabat.
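The dependencies between scheme, CDR def. and Keep Vernier described above can be summarized in a small check. This is a sketch under stated assumptions: `resolve_params` and its argument names are hypothetical and only mirror the rules in the list, they are not an API of the platform.

```python
SCHEMES = {"Kabat", "IMGT", "Chothia", "AHo"}
CDR_DEFS = {"Kabat", "IMGT", "Chothia", "North"}


def resolve_params(scheme, cdr_def=None, keep_vernier=False):
    """Apply the defaulting and availability rules from the parameter list."""
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme!r}")
    if cdr_def is None:
        # AHo is not a valid CDR definition, so it cannot be used as the default.
        if scheme == "AHo":
            raise ValueError("CDR def. is required when scheme = AHo")
        cdr_def = scheme  # CDR def. defaults to the numbering scheme
    if cdr_def not in CDR_DEFS:
        raise ValueError(f"unknown CDR def.: {cdr_def!r}")
    if keep_vernier and cdr_def != "Kabat":
        raise ValueError("Keep Vernier is only available when CDR def. = Kabat")
    return {"scheme": scheme, "cdr_def": cdr_def, "keep_vernier": keep_vernier}


params = resolve_params("AHo", cdr_def="Kabat", keep_vernier=True)
```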
Parameters of CDR Grafting are as follows.
- scheme: Antibody numbering scheme. Could be either "Kabat", "IMGT", "Chothia" or "AHo".
- CDR def.: Numbering scheme used to define the CDR regions. Could be either "Kabat", "IMGT", "Chothia" or "North". Defaults to the same as scheme. Required when scheme = AHo.
- VH germline: Germline heavy V gene to use as template for humanization. Defaults to 'Auto', which selects the human germline closest to the parental sequence.
- VL germline: Germline light V gene to use as template for humanization. Defaults to 'Auto', which selects the human germline closest to the parental sequence.
- Keep Vernier: Whether to keep Vernier regions as in the parental sequence to preserve binding. Only available when CDR def. = Kabat.
Results¶
Click the job name in the Files & Jobs panel to view the job results.
Result Summary¶
The result summary is stored in a CSV file, which can be downloaded by clicking the download button at the top-right of the Summary Table.
The summary table contains the following columns:
- Name: Name of the antibody.
- Humanness: Humanness score of the antibody before and after humanization, determined by the average occurrence of the 9-mer peptide fragments among OAS human assays. The higher the score, the more human-like the antibody is.
- Percentile: Percentile of the humanness score among all antibodies in OAS, before and after humanization. The higher the percentile, the more human-like the antibody is.
- Germline content: Germline content of the antibody before and after humanization, determined by the sequence identity with the nearest heavy and light human germline sequences.
- Preservation: Sequence preservation of the antibody during humanization, determined by the sequence identity with the parental antibody.
- Humanness Improvement: Humanness improvement of the antibody during humanization, determined by the difference between the humanness score before and after humanization.
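The column definitions above can be illustrated with a short sketch. It assumes that "average occurrence of the 9-mer peptide fragments" means the mean of per-peptide frequencies, that identity is computed over pre-aligned sequences of equal length, and that peptides missing from the lookup count as frequency 0. The `peptide_freq` dictionary is a hypothetical stand-in for an OAS-derived table with made-up values; the platform's actual computation may differ.

```python
def kmers(seq, k=9):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def humanness(seq, peptide_freq, k=9):
    """Mean frequency of the sequence's k-mers in the lookup table."""
    peptides = kmers(seq, k)
    if not peptides:
        raise ValueError(f"sequence is shorter than {k} residues")
    return sum(peptide_freq.get(p, 0.0) for p in peptides) / len(peptides)


def sequence_identity(a, b):
    """Fraction of identical positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Toy inputs: a 10-residue fragment with one substitution (E6 -> Q).
parental = "EVQLVESGGG"
humanized = "EVQLVQSGGG"
peptide_freq = {  # made-up frequencies, for illustration only
    "EVQLVESGG": 0.25, "VQLVESGGG": 0.25,
    "EVQLVQSGG": 0.5, "VQLVQSGGG": 0.75,
}

before = humanness(parental, peptide_freq)
after = humanness(humanized, peptide_freq)
improvement = after - before  # Humanness Improvement column
preservation = sequence_identity(parental, humanized)  # Preservation column
```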
In the rightmost column, you can click the "Detail" button to view the detailed humanness results of the antibody.
Result Detail¶
The result detail page is divided into 3 sections: Summary, Germline Comparison and Detailed View. If the input chain type is "VH+VL", the Germline Comparison and Detailed View sections will be different for the heavy and the light chain. You can switch between the two chains using the chain switcher just below the Summary section.
Raw data of the result detail is stored in a CSV file, which can be downloaded by clicking the download button at the top-right of the Detailed View section.
Summary¶
The Summary section presents the model and settings used to calculate humanness. It also shows the humanness score (with improvement), percentile, and germline content (with improvement) of the whole antibody, calculated as the average of the heavy and light chain values, if applicable.
Sequence Viewer¶
Right below the Summary section is the Sequence Viewer, which displays the parental and humanized antibody sequences. Positions where the parental and humanized sequences differ are marked with an arrow (↓).
Humanized residues with a germline frequency lower than 1% are marked with a red triangle. If a parental residue has a germline frequency lower than 1% but its humanized counterpart has a germline frequency higher than 10%, it is marked with a green triangle.
Parental/humanized residues that lie in a high-risk peptide fragment are highlighted in red. The more high-risk peptides a residue is in, the darker its color.
CDR residues are underlined with dark gray lines. Due to V(D)J recombination and somatic hypermutation, CDR regions are highly variable and thus are usually highlighted in red. However, this does not necessarily mean that these regions are immunogenic.
Vernier residues are underlined with light gray lines if using the Kabat CDR definition.
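The triangle markers described above follow two threshold rules on germline frequency, which can be written out as a small function. This is only a sketch of the stated rules: `triangle_marker` is a hypothetical name, frequencies are assumed to be fractions between 0 and 1, and the red rule is checked first.

```python
LOW_FREQ = 0.01    # "lower than 1%"
GOOD_FREQ = 0.10   # "higher than 10%"


def triangle_marker(parental_freq, humanized_freq):
    """Return "red", "green" or None for one position in the Sequence Viewer."""
    if humanized_freq < LOW_FREQ:
        return "red"    # humanized residue is rare in the germline family
    if parental_freq < LOW_FREQ and humanized_freq > GOOD_FREQ:
        return "green"  # rare parental residue replaced by a common one
    return None


markers = [
    triangle_marker(0.30, 0.005),  # common residue mutated to a rare one
    triangle_marker(0.002, 0.45),  # rare residue mutated to a common one
    triangle_marker(0.20, 0.20),   # unchanged, common residue
]
```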
Terms
- Germline frequency: The residue frequency in the germline family at the given position in human repertoires from the OAS database. The higher the frequency, the more likely the residue is a valid residue in the germline family.
- High-risk peptide: A 9-mer peptide is considered high-risk if less than 10% of human assays in OAS contain this peptide.
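The red highlighting intensity follows from how many high-risk 9-mers cover each residue. The sketch below counts that coverage per position. It assumes a hypothetical `peptide_freq` lookup (the fraction of OAS human assays containing each peptide, with made-up values here) and treats peptides missing from the lookup as frequency 0.

```python
def risk_coverage(seq, peptide_freq, k=9, threshold=0.10):
    """Count, for each residue, the high-risk k-mers that contain it."""
    counts = [0] * len(seq)
    for start in range(len(seq) - k + 1):
        peptide = seq[start:start + k]
        if peptide_freq.get(peptide, 0.0) < threshold:
            # Every residue inside a high-risk peptide gets one more hit.
            for pos in range(start, start + k):
                counts[pos] += 1
    return counts


# Toy 10-residue fragment: two overlapping 9-mers, only the first is high-risk.
seq = "EVQLVESGGG"
peptide_freq = {"EVQLVESGG": 0.05, "VQLVESGGG": 0.50}  # illustrative values
coverage = risk_coverage(seq, peptide_freq)
```

A residue with a higher count would be drawn in a darker red; a count of 0 means no high-risk peptide contains it.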
Germline Comparison¶
The Germline Comparison section aligns the humanized antibody chain with its closest human germline sequences. The names of these germline sequences are shown on the right. Germline residues that are identical to the humanized sequence are highlighted in blue.
Detailed View¶
The Detailed View section presents the detailed metrics for each position in the antibody chain in a table. The table contains the following columns:
- Region: Antibody region this position belongs to according to the CDR definition. Could be FR1-4 / CDR1-3. If the CDR definition is set to Kabat, Vernier regions are also shown.
- Pos: Position in the antibody chain, numbered by the specified scheme. Shown in the format of "{chain type}{residue number}{insertion code}", e.g. "H100A".
- Sequence: Residue type at this position, before and after humanization. Residues are colored by a specified color scheme. You can change the scheme using the color scheme selector at the top-right of the table.
- LM Score: Language model score of the residue, before and after humanization. The higher the score, the more likely the model thinks this residue is the most suitable one in this context.
- Resi Freq: Residue frequency in the germline family at the given position in human repertoires from the OAS database, before and after humanization. The higher the frequency, the more likely the residue is a valid residue in the germline family.
- Peptides: The 9-mer peptide that starts with this residue, before and after humanization. If the corresponding "Peptide Freq" is below 10%, the peptide is shown in red and is considered a high-risk peptide.
- Peptide Freq: The frequency of the peptide occurring in an OAS human assay, before and after humanization.
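When post-processing the downloaded detail data, the Pos format "{chain type}{residue number}{insertion code}" can be split into its parts. The sketch below assumes the chain type is a single letter "H" or "L" and the insertion code is at most one uppercase letter; `parse_position` is a hypothetical helper, not a platform function.

```python
import re

# Chain letter, residue number, optional one-letter insertion code.
POS_PATTERN = re.compile(r"^([HL])(\d+)([A-Z]?)$")


def parse_position(label):
    """Split a label such as "H100A" into (chain, number, insertion code)."""
    match = POS_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized position label: {label!r}")
    chain, number, insertion = match.groups()
    return chain, int(number), insertion


parsed = [parse_position(label) for label in ["H100A", "L27", "H9"]]
```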