METHOD OF ANALYZING BINDING INTERACTIONS |
Application No. | EP11827086.7 | Filing date | 2011-09-20 | Publication No. | EP2619578B1 | Publication date | 2016-12-14 |
Applicant | Full Spectrum Genetics, Inc. | Inventor | DUBRIDGE, Robert, B. |
Description | Great effort has been directed to understanding and manipulating protein-protein and protein-ligand binding reactions because of the central role such reactions play in living systems and in drug development. In particular, a wide range of techniques has been developed to identify or improve the binding reactions of antibodies for therapeutic, diagnostic, analytical and chromatographic applications.
The strength of the binding interaction between a protein and its ligand is characterized by its binding affinity, a function of the ratio under equilibrium conditions of ligand bound to protein and the product of free ligand and free protein. One way to measure a protein's binding affinity for its ligand is to mix a known quantity of the protein with decreasing concentrations of the ligand, allow these reactions to reach equilibrium and measure the concentrations of bound versus free protein in each reaction. These measurements can then be used to rank the binding affinities of multiple proteins or protein variants that all bind the same ligand. The protein that has the highest percent binding at any given concentration of ligand will have the highest binding affinity.
In view of the above, applications requiring an understanding of protein binding reactions, such as antibody engineering, would be advanced by the availability of efficient techniques for providing statistically significant information on candidate binding molecules despite the large number of candidates that must be assessed in typical protein-ligand and protein-protein interactions.
The present disclosure is directed to methods for analyzing protein-protein and/or protein-ligand binding reactions and for improving such reactions for at least one member of such a binding pair, or for improving other characteristics of at least one member of such a pair, including, but not limited to, stability, specificity, immunogenicity, expressibility, manufacturability, or the like.
Aspects and embodiments of the present invention are exemplified in a number of implementations and applications, some of which are summarized below and throughout the specification. In one aspect the disclosure includes a method of analyzing affinities of a library of binding compounds to one or more ligands, the method comprising the steps of: (a) reacting under binding conditions one or more ligands with a library of binding compounds, each binding compound consisting of or being encoded by a nucleotide sequence; (b) determining the nucleotide sequences of binding compounds forming complexes with the one or more ligands; (c) determining the nucleotide sequences of binding compounds free of ligand; and (d) ordering the nucleotide sequences of the binding compounds in accordance with the affinities of their respective binding compounds for the one or more ligands, wherein the affinities are determined by comparing the number of times a nucleotide sequence is identified among binding compounds forming complexes with the one or more ligands and the number of times the same nucleotide sequence is identified among the binding compounds free of the one or more ligands. 
In another aspect, the disclosure includes a method of identifying binding compounds that have similar or equivalent affinities to a ligand as that of a standard, or reference, binding compound, the method comprising the steps of: (a) reacting under binding conditions a ligand with a library of candidate binding compounds and a standard, or reference, binding compound, each candidate binding compound and the standard, or reference, binding compound consisting of or being encoded by a nucleotide sequence; (b) determining the nucleotide sequences of binding compounds forming complexes with the ligand; (c) determining the nucleotide sequences of binding compounds free of ligand; (d) ordering the nucleotide sequences of the binding compounds in accordance with the affinities of their respective binding compounds for the ligand, wherein the affinities are determined by comparing the number of times a nucleotide sequence is identified among binding compounds forming complexes with the ligand and the number of times the same nucleotide sequence is identified among the binding compounds free of the ligand; and (e) identifying among the ordering of nucleotide sequences those nucleotide sequences that are adjacent to (i.e., have affinity values close to) the nucleotide sequence encoding the standard, or reference, binding compound. 
In another aspect a method of characterizing affinities of a library of binding compounds for one or more ligands is provided by the steps; (a) reacting under binding conditions one or more ligands with a library of candidate binding compounds, each candidate binding compound comprised of or being encoded by a nucleotide sequence; (b) determining the nucleotide sequences of the candidate binding compounds forming complexes with the one or more ligands; and (c) determining for each binding compound an affinity based on a number of times a nucleotide sequence is identified with a binding compound forming a complex with the one or more ligands and a number of times the same nucleotide sequence is identified with the binding compound free of the one or more ligands. In one embodiment the total number of a binding compound may be determined by sequencing a sample of the library prior to the reaction. In another embodiment the total number of a binding compound is determined by determining the nucleotide sequences of candidate binding compounds free of ligand together with the nucleotide sequences of candidate binding compounds forming complexes with the one or more ligands. In this and other aspects an affinity may be a relative affinity of such binding compound with respect to other binding compounds in the same reaction. Also, in this and other aspects each relative affinity may be based on, or be taken as, a ratio of a number of nucleic acid sequences encoding a binding compound that forms a complex with the one or more ligands and a number of the same nucleic acid sequences encoding the same binding compound free of the one or more ligands in the same reaction, or a ratio of a number of nucleic acid sequences encoding a binding compound that forms a complex with the one or more ligands and a total number of the same nucleic acid sequences encoding the same binding compound in the same reaction. 
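The ordering step and the search for candidates with affinity close to a reference can be sketched as follows. This is a minimal illustration, not the method of the disclosure: the function names, the pseudocount, and the 1.5-fold similarity threshold are assumptions introduced here, and the reads are toy data.

```python
from collections import Counter

def rank_by_affinity(bound_reads, free_reads, pseudocount=0.5):
    """Order sequences by bound/free read-count ratio, highest first.

    The pseudocount is an assumption added here to avoid division by zero
    for sequences never seen in the free pool.
    """
    bound = Counter(bound_reads)
    free = Counter(free_reads)
    ratios = {
        seq: (bound[seq] + pseudocount) / (free[seq] + pseudocount)
        for seq in set(bound) | set(free)
    }
    # Sort by ratio descending; ties are broken by sequence for determinism.
    return sorted(ratios.items(), key=lambda kv: (-kv[1], kv[0]))

def similar_to_reference(ranking, reference, fold=1.5):
    """Sequences whose ratio lies within `fold` of the reference ratio.

    The fold threshold is a hypothetical choice, not a value from the text.
    """
    ratios = dict(ranking)
    ref = ratios[reference]
    return [
        seq for seq, r in ranking
        if seq != reference and ref / fold <= r <= ref * fold
    ]

# Toy reads: each string stands for one decoded sequence read.
bound_reads = ["AAA"] * 90 + ["REF"] * 50 + ["CCC"] * 45 + ["GGG"] * 5
free_reads = ["AAA"] * 10 + ["REF"] * 50 + ["CCC"] * 55 + ["GGG"] * 95

ranking = rank_by_affinity(bound_reads, free_reads)
similar = similar_to_reference(ranking, "REF")
```

Here `ranking` places the strongest binder first, and `similar` lists the candidates whose ratio falls near that of the reference sequence.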
In its aspects and various embodiments, the invention permits reliable and exhaustive identification of "bio-similar" and "bio-better" binding compounds without the use of large inefficiently accessed libraries or repeated cycles of binding, selection and amplification. That is, the disclosure provides methods for obtaining novel binding compounds having equivalent or enhanced binding characteristics with respect to a reference (or wild type) binding compound (including affinity, specificity, lack of cross-reactivity, or the like), such as a known therapeutic antibody. In accordance with the methods of the disclosure candidate binding compounds having equivalent or superior affinity are readily obtained in a one-step process, after which such compounds may be further analyzed to identify members having improvements of other properties, such as increased stability, increased aggregation resistance, reduced immunogenicity, reduced cross reactivity, better manufacturability, or the like with respect to the reference binding compound. These above-characterized aspects and embodiments, as well as other aspects and embodiments, of the present invention are exemplified in a number of illustrated implementations and applications, some of which are shown in the figures and characterized in the claims section that follows. However, the above summary is not intended to describe each illustrated embodiment or every implementation of the present invention.
The practice of the present invention may employ, unless otherwise indicated, conventional techniques and descriptions of organic chemistry, molecular biology (including recombinant techniques), cell biology, and biochemistry, which are within the skill of the art. Such conventional techniques include, but are not limited to, preparation of synthetic polynucleotides, monoclonal antibodies, antibody display systems, nucleic acid sequencing and analysis, and the like. Specific illustrations of suitable techniques can be had by reference to the example herein below. However, other equivalent conventional procedures can, of course, also be used. Such conventional techniques and descriptions can be found in standard laboratory manuals.
The disclosure provides a method for obtaining statistically significant information about how structural elements of proteins, e.g. position and identity of amino acid residues in binding domains, relate to functional properties of interest, such as binding affinity, specificity, and the like. Such information is collected by reacting under binding conditions a set of candidate nucleic acid-encoded binding compounds with one or more target molecules, so that complexes form between the one or more target molecules and at least a portion of the candidate binding compounds (referred to herein as "binders"). Sufficient numbers of candidate binders and non-binders are then decoded by high throughput nucleic acid sequencing to give statistically significant data about the binding properties of substantially all the members of the set of candidate binding compounds. In other words, sample sizes are large enough so that the numbers of candidate binders and non-binders decoded and recorded are subject to minimal sampling error.
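One way to relate read depth to sampling error is sketched below. It assumes that the number of reads of a given sequence follows Poisson statistics, so that the coefficient of variation of a count n is about 1/sqrt(n). That model is an assumption made here for illustration and is not stated in the disclosure. The thresholds in the loop are the 10, 5, 2 and 1 percent levels discussed in this section.

```python
import math

def poisson_cv(read_count):
    """Approximate CV of a read count, assuming Poisson sampling: 1/sqrt(n)."""
    if read_count <= 0:
        raise ValueError("read_count must be positive")
    return 1.0 / math.sqrt(read_count)

def reads_needed(target_cv):
    """Smallest read count whose Poisson CV is at or below target_cv."""
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round(1.0 / target_cv ** 2, 6))

for cv in (0.10, 0.05, 0.02, 0.01):
    print(f"CV <= {cv:.0%}: about {reads_needed(cv)} reads per sequence")
```

Under this assumption, halving the sampling error requires roughly four times as many reads per library member, which is one reason small, focused libraries are easier to characterize exhaustively.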
In some embodiments, such sampling error, as measured by coefficient of variation, is less than 10 percent; in some embodiments, it is less than 5 percent; in some embodiments, it is less than 2 percent; and in some embodiments, it is less than 1 percent. As disclosed more fully below, embodiments of particular interest are those in which candidate binding compounds are related to a pre-existing reference binding compound, such as a pre-existing antibody, that binds to a target molecule of interest, such as a therapeutic target. In such embodiments, an object of the invention is to improve one or more characteristics of a reference binding compound by generating a library of candidate binding compounds based on minimal changes or mutations of the reference binding compound, which, in turn, permits large scale repetitive sequencing of each library member from a binding reaction to obtain statistically significant binding information on each candidate binding compound of the library. From such information, binding compounds different from the reference binding compound are obtained which have equivalent or higher affinity and which may be subjected to further selection to reduce cross reactivity, reduce immunogenicity, increase solubility, increase stability, or the like. The statistically significant information is contained in the tabulations of the sequences of nucleic acids encoding the binders and the non-binders. Nucleic acid-encoded binding compounds may be obtained from the various antibody display techniques, aptamers, or the like, such as those described below. In some embodiments, the structural elements that are analyzed are spatially local in the sense that they exert their effects on binding within or near a limited volume of a larger molecule, such as an enzyme active site, antibody binding site, complementarity-determining regions, or the like.
In particular, structural elements analyzed in an antibody binding interaction include CDRs as well as framework regions of antibody variable regions.
Alternatively, such information may be collected by first decoding the sequences of members of the total effective library of candidate nucleic acid-encoded binding compounds (or an adequate sample thereof to ensure nearly complete coverage, e.g. at least 95%, or at least 98%, or at least 99% coverage), prior to carrying out a binding reaction with the one or more target molecules, or ligands. As used herein, "total effective library" means the total library of nucleic acid-encoded binding compounds, subject to any biases in sequence representation that may arise in the course of expression, e.g. in phage, ribosomes, bacteria, yeast, or the like. A binding reaction is carried out as described above, after which the nucleic acid sequences of only the binders are determined. From this information, a ratio may be formed for each candidate nucleic acid-encoded binding compound that consists of the number of sequence reads among the binders over the number of sequence reads in the total library as a measure of its binding strength or affinity. That is, the larger the value of the ratio of a candidate binding compound, the stronger its affinity for the one or more target molecules, and the lower the value of the ratio, the lower its affinity. Generally, such ratios and other ratios, such as ratios of binders to nonbinders, provide relative affinities of each of the binding compounds in the reaction with the one or more ligands. Such measures of relative affinities are applicable to all embodiments of the invention.
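The alternative described above, in which the library is sequenced before the binding reaction and only the binders are sequenced afterwards, can be sketched as follows. The function names and the toy reads are hypothetical. Raw counts are used directly, so the ratios are relative measures only; any normalization for differing sequencing depth between the two samples is omitted from this sketch.

```python
from collections import Counter

def library_coverage(library_reads, designed_sequences):
    """Fraction of designed sequences observed at least once in the library sample."""
    seen = set(library_reads)
    designed = set(designed_sequences)
    return len(seen & designed) / len(designed)

def binder_to_library_ratios(binder_reads, library_reads):
    """Per-sequence ratio: reads among binders / reads in the total library sample."""
    binders = Counter(binder_reads)
    library = Counter(library_reads)
    # Only sequences seen in the library sample get a ratio; others stay undefined.
    return {seq: binders[seq] / n for seq, n in library.items()}

designed = ["V1", "V2", "V3", "V4"]
library_reads = ["V1"] * 100 + ["V2"] * 200 + ["V3"] * 50   # V4 never observed
binder_reads = ["V1"] * 80 + ["V2"] * 40 + ["V3"] * 5

coverage = library_coverage(library_reads, designed)
ratios = binder_to_library_ratios(binder_reads, library_reads)
strongest = max(ratios, key=ratios.get)
```

In this toy case the coverage check would flag the library sample as inadequate, since one designed variant was never observed, well short of the 95% or better coverage mentioned above.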
In regard to binding compounds derived from antibodies, [...]
Nucleic acids encoding the binders and non-binders from the samples may be sequenced using any of a variety of commercially available high-throughput DNA sequence analyzers (110), as described more fully below, to generate sequence data for binders (112) and non-binders (114). Conventional sample preparation procedures are employed that take into account the particular format of the candidate binding compounds. That is, binding compounds may be in phage display, ribosome display, retroviral display, or like formats, and may require different steps to extract their nucleic acids and to prepare them for sequencing. The results of the sequence analysis are typically at least two tabulations of sequences corresponding to the binders (116) and non-binders (118). From such data, relationships between sequence frequency of binding compound and binding compound type may be shown.
In some embodiments of the invention, after binding compounds are ordered with respect to affinity for a desired antigen, [...]
In some embodiments of the invention, the number of candidate binding compounds under consideration may be reduced in cases where improvements are sought to a pre-existing binding compound, i.e., a standard or reference binding compound, such as a pre-existing known antibody, e.g. a known therapeutic antibody. For example, for a pre-existing antibody where the amino acid sequences of both its scaffold and binding regions are known, limited regions, or subregions, of such sequences may be assessed for the effect of every possible single amino acid change in such subregions only, and an estimate of the combinatorial effects of multiple mutations may be obtained by adding the measured effects of the individual single amino acid changes.
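The single-substitution scan and the additive estimate of combined mutations can be sketched as follows for a hypothetical 50-position subregion. The effect values are invented for illustration, and expressing each effect as a log2 relative affinity (so that effects add) is an assumption of this sketch rather than something the text specifies.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def single_site_library(n_positions):
    """All (position, residue) pairs: n_positions x 20 entries.

    This count includes the parent residue at each position.
    """
    return list(product(range(n_positions), AMINO_ACIDS))

def additive_estimate(single_effects, mutations):
    """Estimate a multi-mutant's effect by summing single-mutation effects.

    `single_effects` maps (position, residue) to a log2 relative affinity;
    the log scale is an assumption made here so that effects add.
    """
    return sum(single_effects[m] for m in mutations)

library = single_site_library(50)

# Hypothetical measured effects: log2 of affinity relative to the parent.
effects = {(3, "D"): 1.0, (17, "K"): 0.5, (30, "A"): -2.0}
combined_log2 = additive_estimate(effects, [(3, "D"), (17, "K")])
combined_fold = 2 ** combined_log2
```

The additive estimate treats the two substitutions as independent, which is only an approximation and would need to be confirmed by measuring the double mutant directly.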
In other embodiments, such a process may be generalized by assessing the effect of every possible two-way amino acid change in the subregion, with an increased number of mutants requiring assessment. Such methods require a much smaller library to assess the effects of all the possible amino acid changes. For example, in the former embodiment, in a limited region of 50 amino acid positions, only 50 × 20 = 1000 mutants would need to be analyzed. In addition, the assumption that multiple mutations used in combination have independent effects is a good approximation when working with a small number of positions (<20).
Radioligand studies may be used to assess the above binding compound, but such studies usually are run serially, using multiple protein variants against a single radioligand in separate reactions, because the variant proteins are difficult to distinguish one from another. One could run multiple binding studies simultaneously, in the same reaction vessel, if the variant receptors were readily distinguishable from one another. This situation can be achieved using any of a number of viral, phage, or ribosome display formats, as described below. In these systems the variant receptors are displayed in low numbers (≤10 copies/particle) on the surface of viral, phage or ribosome particles. In these situations the specific nucleic acid that encodes the variant receptor is contained within the cognate virus/phage/ribosomal particle (also referred to herein as a nucleic acid-encoded binding compound). This allows easy identification of each specific protein variant by sequencing the nucleic acid that is attached to it. If this principle is applied to the binding experiment described above, one can easily measure the binding affinities of large numbers of protein variants simultaneously by running an equilibrium binding assay using a virus/phage/ribosomal library (collection of variants) against a single ligand (either bound to a substrate or in solution).
After equilibrium has been reached, the bound receptors (phage/virus/ribosomal particles) can be collected by recovering the ligand molecules via immunoprecipitation or substrate recovery, and the unbound receptors can be recovered from the supernatant. These two samples of phage/virus/ribosome particles can then be sequenced on a massively parallel fragment sequencer (as described below) to determine each clone's contribution to the bound and free pools of receptors. From this sequence information the bound percentage of each receptor in the library can be calculated. Those receptors with the highest percentage of bound phage/virus/ribosomes will have the highest affinities and those with the lowest bound percentages will have the lowest affinities. Using a single ligand concentration near the dissociation constant, KD, of the parent protein, it is possible to rank the affinities of every protein variant for a given ligand. If the parent molecule is encoded in the library, then the affinities of all of the variants in the library can be assessed relative to the parent protein, which serves as an internal standard or reference. If the ligand is in great excess in the binding reaction (so its unbound concentration does not change appreciably during the binding reaction) and several binding reactions are run using varying ligand concentrations, then one is able to use non-linear regression or an equivalent calculation to rapidly calculate the KD for every variant in the population from the equation KD = [A][B]/[AB].
In some embodiments employing protein display systems, such as phage display libraries, affinities may be estimated as follows based on tabulated sequences of nucleic acids encoding binding compounds. Multiple reactions are set up, e.g. in wells of a microtiter plate, or the like, such that the reactions contain a dilution series of ligand, i.e.
a series of lower and lower concentrations or amounts of ligand adsorbed or attached to a solid support, such as the surface of a microwell wall, magnetic bead, or the like. To each reaction is added a fixed number of display organism, such as aliquots of a phage display library, and the reactions are allowed to go to equilibrium. After equilibrium has been reached, bound and free display organisms are harvested and binding-compound encoding nucleic acids are amplified in separate polymerase chain reactions (PCRs) to determine the reaction in which the concentration, or amount, of ligand results in about equal amounts of display organism bound to ligand and free. Under such conditions, affinities of the binding compounds may be estimated as ratios of bound binding compound (determined by counting encoding nucleic acids) and unbound binding compound (also determined by counting encoding nucleic acids). In some embodiments, a similar operation may be used to estimate affinities of binding compounds of a library relative to that of a reference binding compound (as used herein, such values are referred to as "relative affinities" with respect to a selected reference compound). As above, multiple reactions are set up with a dilution series of immobilized ligand. To each reaction is added a fixed amount of reference binding compound (e.g. a single phage displaying the reference binding compound) and the reactions are allowed to go to equilibrium. After equilibrium has been reached, bound and free display organisms are harvested and their encoding nucleic acids are amplified in separate PCRs to determine the reaction in which the concentration, or amount, of ligand results in about equal amounts of reference binding compound bound to ligand and free of ligand. 
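The KD calculation from a ligand dilution series, for the case in which ligand is in great excess, can be sketched as follows. With free ligand [B] approximately equal to total ligand, KD = [A][B]/[AB] rearranges to a bound fraction of [B]/([B] + KD). The sketch fits KD for one variant by least squares using a golden-section search on log10(KD). The search method, its bounds, and the read counts are assumptions chosen to keep the example dependency-free, and bound and free counts are assumed to be directly comparable.

```python
import math

def fraction_bound(ligand_conc, kd):
    """Bound fraction at equilibrium with ligand in large excess.

    Follows from KD = [A][B]/[AB] with free ligand [B] ~ total ligand.
    """
    return ligand_conc / (ligand_conc + kd)

def fit_kd(ligand_concs, observed_fractions, lo=1e-12, hi=1e-3, iters=200):
    """Least-squares KD estimate by golden-section search on log10(KD)."""
    def sse(log_kd):
        kd = 10.0 ** log_kd
        return sum((fraction_bound(c, kd) - f) ** 2
                   for c, f in zip(ligand_concs, observed_fractions))

    a, b = math.log10(lo), math.log10(hi)
    inv_phi = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c) < sse(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    return 10.0 ** ((a + b) / 2)

# Hypothetical read counts for one variant at three ligand concentrations (molar).
ligand_concs = [1e-9, 1e-8, 1e-7]
bound_counts = [100, 500, 1000]
free_counts = [1000, 500, 100]
observed = [b / (b + f) for b, f in zip(bound_counts, free_counts)]

kd_estimate = fit_kd(ligand_concs, observed)
print(f"estimated KD ~ {kd_estimate:.2e} M")
```

The same fit would be repeated for every sequence in the library, each using its own bound and free counts across the dilution series.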
The determined reaction provides conditions for carrying out library-based binding reactions so that ratios of binders to nonbinders for each library member can be computed and compared to that of a reference binding compound to give a measure of the relative affinity of such member to a ligand. This information may be used to create an engineering diagram of the binding site in question (such as a heat map) which can be used to direct the engineering of any amino acid position within the binding site. Thus variants that have higher binding affinities than the parent molecule can be combined to markedly increase the protein's affinity for its ligand. Variants with the same binding affinities as the parent molecule can be used to increase the molecule's stability or solubility, reduce its immunogenicity or alter its serum half-life. In addition if the same protein library is run against multiple ligands, then the resulting heat maps can be overlaid to identify variants that differentially affect the binding of the ligands. Finally variants that reduce the binding affinity of the protein for its ligand(s) can be identified. In general these variants are to be avoided in future engineering projects, but in certain situations reducing a protein's activity by lowering its affinity for its ligand may be desirable. In some embodiments, the 2D maps, or heat maps, described above display relative affinity among candidate binding compounds as a function of position (where amino acid substitutions are made) and the kind of amino acid(s) substituted. For providing binding compounds with increased affinity, mutations (i.e. candidate binding compounds identified by row and column positions) that have the highest relative affinities are identified so that a subset of candidate binding compounds may be identified in which those mutations are fixed. 
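A 2D map of relative affinity by position and substituted residue can be sketched as a plain nested list. The orientation (rows as residues, columns as positions), the log2 scale, and the example ratios are assumptions of this sketch, since the text does not fix them.

```python
import math

def relative_affinity_map(ratios, parent_ratio, positions, residues):
    """Rows = residues, columns = positions; cells = log2(variant / parent ratio).

    Cells for variants with no measurement are None.
    """
    grid = []
    for res in residues:
        row = []
        for pos in positions:
            r = ratios.get((pos, res))
            row.append(None if r is None else math.log2(r / parent_ratio))
        grid.append(row)
    return grid

def improved_variants(ratios, parent_ratio):
    """Variants whose binder/non-binder ratio exceeds the parent's, best first."""
    better = [(k, v) for k, v in ratios.items() if v > parent_ratio]
    return [k for k, _ in sorted(better, key=lambda kv: -kv[1])]

# Hypothetical binder/non-binder ratios keyed by (position, substituted residue).
ratios = {(52, "A"): 2.0, (52, "D"): 0.25, (53, "A"): 1.0, (53, "D"): 4.0}
parent_ratio = 1.0

heat = relative_affinity_map(ratios, parent_ratio, positions=[52, 53], residues=["A", "D"])
best = improved_variants(ratios, parent_ratio)
```

Positive cells mark substitutions that bind better than the parent, cells near zero mark neutral substitutions, and negative cells mark substitutions that weaken binding.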
Members of the subset may then be further assayed to identify mutants with other improved characteristics, along with the higher relative affinities. Also, such an initially identified subset may be used to generate further libraries. For example, a new library may be created from the above subset by fixing the amino acids conferring increased affinity and varying amino acids in the remaining positions, or a fraction of the remaining positions, or in additional positions in the same sequence that were not varied in the original library. Virtually every member of the originally identified subset will have increased affinity relative to wild-type and some will be substantially higher. To increase the solubility of a molecule, neutral mutations (with respect to binding affinity) are identified from the 2D map that replace uncharged surface residues with charged ones and the resultant molecules will have increased solubility. If it is desired to decrease pI (so increase half-life), the 2D map can be used to find neutral mutations in which positively charged surface residues are replaced with negatively or neutrally charged residues. In addition replacing neutrally charged surface residues with negatively charged residues will achieve the same goal. In some embodiments, the above may be implemented in accordance with the invention to increase the solubility of a selected nucleic acid-encoded binding compound (i.e. 
reference binding compound) without loss of affinity for a ligand by the following steps: (a) reacting under binding conditions one or more ligands with a library of candidate binding compounds, each candidate binding compound being comprised of or encoded by a nucleotide sequence; (b) determining the nucleotide sequences of the candidate binding compounds forming complexes with the one or more ligands; (c) determining for each candidate binding compound an affinity based on a number of nucleotide sequences of binding compounds forming a complex to its total number in the library; and (d) selecting at least one candidate binding compound from a subset of candidate binding compounds (i) whose affinity is equal to or greater than that of the selected nucleic acid-encoded binding compound and (ii) whose encoding nucleic acid encodes at least one charged amino acid residue in place of a neutral or hydrophobic amino acid residue occurring in the selected nucleic acid-encoded binding compound, thereby providing a nucleic acid-encoded binding compound with increased solubility with respect to the reference binding compound without loss of affinity. In one embodiment, the library of step (a) may be a first stage library as described above; or step (a) may be carried out in two phases using a first stage library in a first phase and a second stage library as described above in a second phase. In another embodiment, a second stage library as described above may be used in step (d). In some embodiments, the method of the invention may be used to obtain a binding compound with equivalent or better affinity as that of a reference binding compound, but which has superior stability with respect to selected destabilizing agents. A subset of candidate compounds identified as described above based on affinity is separated into at least two portions. 
Members of a first portion are compared to members of a second portion after members of the latter portion have been treated with a destabilizing agent (heat, low pH, proteases, or the like). That is, both portions originated from the same starting subset of candidate binding compounds, except that the members of the second portion are subjected to a destabilizing agent. In other words, its members form a "stressed" library. The candidate binding compounds from such a library that lose binding affinity after being "stressed" contain destabilizing residues. A goal is to identify mutants that bind the antigen at least as well or better than wild type in the "stressed" library. It is expected that several stabilizing mutations could be combined to dramatically increase the stability of the molecule, for example, by forming a second-stage library from such mutants and conducting a second round of selection. In some embodiments, the above may be implemented in accordance with the invention to increase stability of a selected nucleic acid-encoded binding compound (i.e. 
reference binding compound) without loss of affinity for a ligand by the steps of: (a) treating a library of candidate binding compounds with a destabilizing agent to form a treated library of candidate binding compounds, each candidate binding compound being comprised of or encoded by a nucleotide sequence; (b) reacting under binding conditions one or more ligands with the treated library of candidate binding compounds; (c) determining the nucleotide sequences of the candidate binding compounds forming complexes with the one or more ligands; (d) determining for each candidate binding compound an affinity based on a ratio of a number of nucleotide sequences of binding compounds forming a complex to its total number in the treated library; and (e) selecting at least one candidate binding compound from a subset of candidate binding compounds whose affinity is equal to or greater than that of the selected nucleic acid-encoded binding compound (that is, the reference binding compound), thereby providing a nucleic acid-encoded binding compound with increased stability with respect to the reference binding compound without loss of affinity. As above, in one embodiment, the library of step (a) may be a first stage library as described above; or step (a) may be carried out in two phases using a first stage library in a first phase and a second stage library as described above in a second phase. In another embodiment, a second stage library as described above may be used in step (d). In some embodiments, for example, for binding compounds expressed in phage display systems, exemplary conditions for stressing a subset include (i) exposing phage to elevated temperatures, e.g. in the range of 50-70°C for a period of time, e.g. in the range of 15-30 minutes; (ii) exposing phage to low pH, e.g. pH in the range of 1-4, for a period of time, e.g. in the range of 15-30 minutes; (iii) exposing phage to various proteases at various activities over a range for a period of time, e.g. 
15-30 minutes, or 1-4 hours, or 1 hour to 24 hours, depending on the protease and specific activity. Exemplary proteases for stability testing include, but are not limited to, serum proteases; trypsin; chymotrypsin; cathepsins, including but not limited to cathepsin A and cathepsin B; endopeptidases, such as matrix metalloproteinases (MMPs) including, but not limited to, MMP-1, MMP-2, MMP-9; or the like.
In some embodiments, immunogenicity may be altered after the locations of immunogenic peptides within the protein of interest are identified. Immunogenicity, which can be a problem even with fully human antibodies, can make pharmacokinetic assessment more difficult, reduce safety, and inhibit effectiveness, e.g. by stimulating neutralizing host antibodies. Identifying peptides derived from a protein of interest that can stimulate helper T-cells (the first step in the immunogenicity cascade) has been described. In some embodiments of the invention, a method of reducing the immunogenicity of a selected nucleic acid-encoded binding compound (i.e.
reference binding compound) without loss of affinity comprises the following steps: (a) reacting under binding conditions one or more ligands with a library of candidate binding compounds, each candidate binding compound being comprised of or encoded by a nucleotide sequence; (b) determining the nucleotide sequences of the candidate binding compounds forming complexes with the one or more ligands; (c) determining for each candidate binding compound an affinity based on a ratio of a number of nucleotide sequences of binding compounds forming a complex to its total number in the library; and (d) selecting at least one candidate binding compound from a subset of candidate binding compounds (i) whose affinity is equal to or greater than that of the selected nucleic acid-encoded binding compound and (ii) whose encoding nucleic acid encodes at least one amino acid residue that differs from that of the selected nucleic acid-encoded binding compound at the same location(s) and reduces the immunogenicity of such candidate binding compound relative to that of the selected nucleic acid-encoded binding compound. As above, in one embodiment, the library of step (a) may be a first stage library as described above; or step (a) may be carried out in two phases using a first stage library in a first phase and a second stage library as described above in a second phase. In another embodiment, a second stage library as described above may be used in step (d).
In some embodiments, the method of the invention may be used to obtain a binding compound with an affinity for a target antigen equivalent to or better than that of a reference binding compound, but that has reduced cross reactivity, or in some embodiments, increased cross reactivity, with selected substances, such as ligands, proteins, antigens, or the like, other than the substance or epitope for which a reference binding compound is specific, or is designed to be specific.
In regard to the latter, a candidate therapeutic antibody may be more successfully tested in animal models if the antibody reacts with both its human target and the corresponding target of the animal model, e.g. mouse. Thus, in some embodiments, the method of the invention may be employed to increase cross reactivity with selected substances, such as corresponding animal model targets. In other embodiments, the method of the invention is employed to reduce cross reactivity of a candidate therapeutic antibody, for example, to reduce potential side effects in a patient. As above, a subset of candidate compounds is identified based on affinity (i.e. having equivalent or higher affinity than that of the reference compound). Candidate compounds from the subset may then be combined with one or more substances other than the target antigen in one or more binding reactions (e.g. each at different phage concentrations) to determine the affinities of such candidate binding compounds to such substances. The choice of substances may vary widely, and may include tissues, cell lines, selected proteins, tissue arrays, protein microarrays, or other multiplex displays of potentially cross reactive compounds. Guidance for selecting such antibody cross reaction assays may be found in the following exemplary references: In some embodiments, the above may be implemented in accordance with the invention to identify one or more binding compounds with reduced cross reactivity with a selected set of substances compared to that of a reference binding compound without loss of affinity for a ligand.
Such a method may be carried out by the steps of: (a) reacting under binding conditions one or more substances with a subset of candidate binding compounds, each member of the subset having equivalent or greater affinity for a ligand than that of a reference compound; (b) determining the nucleotide sequences of the candidate binding compounds forming complexes with the one or more substances; (c) determining for each candidate binding compound an affinity based on a ratio of a number of nucleotide sequences of binding compounds forming a complex to its total number in the subset; and (d) selecting at least one candidate binding compound from the subset of candidate binding compounds whose affinity is equal to or less than that of the reference binding compound, thereby providing a nucleic acid-encoded binding compound with reduced cross reactivity for the one or more substances with respect to the reference binding compound without loss of affinity. Likewise, a method may be implemented for obtaining a binding compound with increased reactivity to a selected substance or compound or epitope by substituting step (d) with the following step: selecting at least one candidate binding compound from the subset of candidate binding compounds whose ratio is equal to or greater than that of the reference binding compound. Features of any peptide or protein display system are: (1) tight linkage between the expressed proteins and their encoding nucleic acid; and (2) expression of the protein in a format that allows it to be assayed and separated based on some biochemical activity (for example, binding strength, susceptibility to enzymatic action, or the like). For the purposes of this discussion, protein display systems can be separated into two groups based on the number of displayed proteins per display unit, either polyvalent or monovalent.
The polyvalent display systems such as yeast display (references 1 and 2 below), mammalian display systems (references 3 and 4 below) and bacterial display systems (reference 5) express the gene(s) of interest (often diverse antibody libraries) as proteins tethered to the cell surface by means of a membrane anchor, similar to a native surface immunoglobulin found on the plasma membrane of normal B-cells. DNA encoding the library clones is transformed into the cell type of interest such that each cell receives at most one clone from the library. The resultant population of cells will each express tens to tens of thousands of copies of a single protein clone on their cell surfaces. This population of cells can then be exposed to limiting amounts of fluorescently labeled target antigen and the best binding clones will bind the most antigen and they can be identified and isolated using a fluorescence-activated cell sorter (FACS). Unfortunately accurate quantitation in polyvalent display systems is complicated by cooperative binding effects (avidity) between the multiple copies of the displayed molecule on the same cell (reference 6). This problem is especially pronounced if the antigen is polyvalent (TNF, IgG) or bound to a cell surface (e.g. CD 20). Many of the viral and phage-based protein display systems are also polyvalent in nature, but the display units are too small to detect on the FACS, so accurate quantitation is even more difficult. These systems also suffer from avidity problems if multiple binding compounds are expressed simultaneously on the same phage particle. Under such conditions it is difficult to determine whether an observed binding strength is due to the combined effect of two expressed binding compounds versus the effect of a single very high affinity binding compound. Such avidity problems may be minimized by regulating the expression of candidate binding compound in a host using conventional techniques. 
In one embodiment in which a phage display system expresses Fab fragments, e.g. as disclosed in The monovalent phage (reference 7) and viral (reference 8) systems, along with the ribosome display systems (references 9 and 10) express an average of ≤1 molecule of the displayed molecule per display unit. These systems yield accurate measurements of the true affinity of the binding site in question for each clone in the library. Generally these systems are used to display large, diverse libraries of binding elements. Small subpopulations of clones are then selected from these libraries based on their increased ability to bind the target antigen relative to other members of the library. After selection (often multiple rounds of selection) the resultant clones are isolated and characterized (e.g. as disclosed in Further protein display systems for use with the invention include baculoviral display systems, adenoviral display systems, lentivirus display systems, retroviral display systems, SplitCore display systems, as disclosed in the following references: In some embodiments, the invention employs conventional phage display systems for improving one or more properties of an antibody binding compound, particularly a preexisting antibody binding compound. Unlike prior applications of display technologies, which employ repeated cycles of selection, washing, elution and amplification to identify individual phage from a large library, e.g. >10^8 to 10^9 clones, in the present invention a single equilibrium binding reaction is created using a relatively small and focused library, e.g. 10^3 to 10^4 clones, or in some embodiments 10^4 to 10^5 clones, after which binders and non-binders are analyzed by large-scale sequencing. From such analysis, subsets are selected and, optionally, further selected based on other properties of interest, such as solubility, stability, lack of immunogenicity, and the like.
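The sequencing-based analysis of binders and non-binders can be sketched as a per-clone count ratio. The following is a minimal illustration only, assuming that each sequencing read has already been assigned to a clone identifier, that the bound and unbound fractions were sequenced to comparable depth, and that a clone's "total number" can be approximated by its bound plus unbound read counts. The function names and the toy read counts are hypothetical and are not taken from the disclosure.

```python
from collections import Counter

def binding_ratios(bound_reads, unbound_reads):
    """Return {clone: bound / (bound + unbound)} for every clone observed.

    Assumption: the sum of bound and unbound reads stands in for the
    clone's total number in the library.
    """
    bound = Counter(bound_reads)
    unbound = Counter(unbound_reads)
    ratios = {}
    for clone in set(bound) | set(unbound):
        total = bound[clone] + unbound[clone]
        ratios[clone] = bound[clone] / total
    return ratios

def select_at_least_reference(ratios, reference):
    """Clones whose ratio is equal to or greater than the reference's."""
    ref_ratio = ratios[reference]
    return sorted(c for c, r in ratios.items()
                  if c != reference and r >= ref_ratio)

# Toy data (hypothetical): "WT" is the reference binding compound.
bound_reads = ["WT"] * 50 + ["A"] * 80 + ["B"] * 10
unbound_reads = ["WT"] * 50 + ["A"] * 20 + ["B"] * 90

ratios = binding_ratios(bound_reads, unbound_reads)
selected = select_at_least_reference(ratios, "WT")
print(selected)
```

In practice the reliability of each ratio depends on how many reads are observed per clone, which is one reason small, focused libraries are easier to analyze than very large ones.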
Factors affecting such equilibrium reactions are well-known in the art and include: the number of phage to include in the reaction, the stringency of the reaction mixture; the number of target molecules to include in the reaction; presence or absence of blocking agents, such as, bovine serum albumin, gelatin, casein, or the like, to reduce nonspecific binding; the length and stringency of a wash step to separate non-binders; the nature of an elution step to remove binders from the target molecules; the format of target molecules used in the reaction, which, for example, may be bound to a solid support or derivatized with a capture agent, e.g. biotin, and free in solution; the phage protein into which candidate binding compounds are inserted; and the like. In some embodiments, target molecules, such as proteins, are purified and directly immobilized on a solid support such as a bead or microtiter plate. This enables the physical separation of bound and unbound phage simply by washing the support. Numerous supports are available for this purpose, including modified affinity resins, glass beads, modified magnetic beads, plastic supports, and the like. Useful supports are those that have low background for nonspecific phage binding and that present the target molecules in a native configuration and at a desirable concentration. In some embodiments, a nucleic acid-encoded binding compound is an antibody fragment expressed by a phage. In one embodiment, such phage is a filamentous bacteriophage and the antibody fragment is expressed as part of a coat protein. In particular, such phage may be a member of the Ff class of bacteriophages. In a further embodiment, the host of such filamentous bacteriophage is E. coli. In another embodiment, a phagemid-helper phage system is used for displaying antibody fragments. Phagemids may be maintained as plasmids in a host bacteria and phage production induced by further infection with a helper phage. 
Exemplary phagemids include pComb3 and its related family members, e.g. disclosed in As mentioned above, a feature of the invention is the use of focused libraries from which reliable binding statistics can be obtained from a binding reaction. In some embodiments this eliminates the need for successive cycles of selection, elution, and amplification, as required in conventional approaches. The size of such focused libraries of candidate binding compounds is influenced by at least two factors: the scale of sequencing required for analyzing binders and nonbinders and the difficulty of synthesizing polynucleotides that encode library members. That is, a larger library of candidate compounds and a higher degree of confidence desired in the binding statistics of each compound both require that more binders and nonbinders be sequenced. Likewise, a larger library of candidate compounds means that a greater number of polynucleotides must be synthesized. Thus, particular applications may involve conventional design choices between scale of implementation and cost. In some embodiments, focused libraries are obtained by varying amino acids in a limited number of locations, one or two at a time, within a pre-existing binding compound, which may be the same as, or equivalent to, a reference binding compound. Preferably amino acids are varied at different positions one at a time. Thus, for example, members of a library of candidate binding compounds may have nucleotide sequences identical to that encoding the pre-existing binding compound except for a single codon position. At that position, each member will have a codon different from that of the pre-existing binding compound. Such libraries may include members having an amino acid deletion at such location and may not necessarily include members with every possible codon at such location.
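The single-codon library design described above can be sketched as follows. This is an illustrative enumeration only, assuming an in-frame coding sequence and zero-based codon positions; the parent sequence and the function name are hypothetical, and deletions (which the text also permits) are not modeled.

```python
from itertools import product

BASES = "ACGT"
ALL_CODONS = ["".join(c) for c in product(BASES, repeat=3)]  # 64 codons

def single_codon_variants(parent, codon_positions):
    """Enumerate sequences differing from `parent` at exactly one codon.

    Returns a list of (codon_position, new_codon, variant_sequence).
    """
    if len(parent) % 3 != 0:
        raise ValueError("parent must be an in-frame coding sequence")
    variants = []
    for pos in codon_positions:
        start = 3 * pos
        wild_codon = parent[start:start + 3]
        for codon in ALL_CODONS:
            if codon != wild_codon:  # every member differs from the parent
                seq = parent[:start] + codon + parent[start + 3:]
                variants.append((pos, codon, seq))
    return variants

# Hypothetical four-codon parent; vary codon positions 1 and 2.
parent = "GAAGTTCAGCTG"
library = single_codon_variants(parent, [1, 2])
print(len(library))  # 2 positions x 63 non-parent codons = 126
```

A real design would typically restrict the codon set (for example to avoid stop codons) rather than use all 63 alternatives at each position.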
Libraries may contain members corresponding to such substitutions (and deletions) at each of a set of amino acid locations within the pre-existing binding compound. The locations may be contiguous or non-contiguous. In some embodiments, the number of locations where codons are varied is in the range of from 1 to 500; in some embodiments, the number of such locations is in the range of from 1 to 250; in other embodiments, the number of such locations is in the range of from 10 to 100; and in still other embodiments, the number of such locations is in the range of from 10 to 250. A pre-existing binding compound may be any pre-existing antibody for which sequence information is available (or can be obtained). Typically, a pre-existing binding compound is a commercially important binding compound, such as an antibody drug, for which one desires to modify one or more properties, such as solubility, immunogenicity, reduction of cross reactivity, increase in stability, aggregation resistance, or the like, as discussed above. In one embodiment, the locations where codons are varied comprise the VH and VL regions of the antibody, including both codons in framework regions and in CDRs; in another embodiment, the locations where codons are varied comprise the CDRs of the heavy and light chains of the antibody, or a subset of such CDRs, such as solely CDR1, solely CDR2, solely CDR3, or pairs thereof. In another embodiment, locations where codons are varied occur solely in framework regions; for example, a library may comprise single codon changes from a reference binding compound located solely in framework regions of both VH and VL, such changes numbering in the range of from 10 to 250. In another embodiment, the locations where codons are varied comprise the CDR3s of the heavy and light chains of the antibody, or a subset of such CDR3s.
In another embodiment, the number of locations where codons of VH and VL encoding regions are varied is in the range of from 10 to 250, such that up to 100 locations are in framework regions. In another embodiment, nucleic acid encoded binding compounds are derived from a pre-existing binding compound, such as a pre-existing antibody. Exemplary pre-existing binding compounds include, but are not limited to, antibody-targeted drugs or antibody-based drugs such as adalimumab (Humira), bevacizumab (Avastin), cetuximab (Erbitux), efalizumab (Raptiva), infliximab (Remicade), panitumumab (Vectibix), ranibizumab (Lucentis), rituximab (Rituxan), trastuzumab (Herceptin), and the like. In some embodiments, the above codon substitutions are generated by synthesizing coding segments with degenerate codons. The coding segments are then ligated into a vector, such as a replicative form of a phage, to form a library. Many different degenerate codons may be used with the present invention, such as those shown in Table I. In some embodiments, the size of binding compound libraries used in the invention varies from about 1000 members to about 1 x 10^5 members; in some embodiments, the size of libraries used in the invention varies from about 1000 members to about 5 x 10^4 members; and in further embodiments, the size of libraries used in the invention varies from about 2000 members to about 2.5 x 10^4 members. Thus, nucleic acid libraries encoding such binding compound libraries would have sizes in ranges with upper and lower bounds up to 64 times the numbers recited above. As mentioned above, a variety of DNA sequence analyzers are available commercially to determine the nucleotide sequences of binders and non-binders in accordance with the invention. Commercial suppliers include, but are not limited to, 454 Life Sciences, Helicos, Life Technologies Corp., Illumina, Inc.
(which produces sequencing instruments using Solexa-based sequencing techniques), Pacific Biosciences, and the like. Also, DNA sequencing techniques under commercial development may be used for implementing the invention, e.g. techniques disclosed in the following references. In one embodiment, a limited read length sequencing technique, such as that disclosed by Bentley et al (cited above), is employed to identify discrete regions of a longer encoding nucleic acid. As used herein, the term "limited read length" in reference to a sequencing method means that the longest sequence of nucleotides identified in a single sequencing reaction comprises less than about one hundred nucleotides. As described above, nucleic acids of binders and non-binders are sequenced to obtain structural information about a target molecule. Depending on the nature of the binding compounds employed, the sequencing task can vary widely. Generally, the number, sizes and separations of the regions where amino acids are varied in binding compounds will determine how much sequence information is required for identification. Typically, limited read length sequencing methods cannot provide enough sequence information from a single sequencing reaction for identification. However, in the case where binding compounds are antibodies whose CDRs are varied, complete identification may be obtained with a limited read length method if at least three sequencing reactions are performed on a single nucleic acid. Accordingly, in one embodiment of the invention, nucleic acids corresponding to CDRs from antibody-based binding compounds are serially analyzed by performing at least three sequencing reactions on the same target nucleic acid. 
The method is illustrated in Listed below are the sequences of the heavy chain variable region and the light chain variable region of the humanized antibody Avastin (bevacizumab), To gain a complete functional map of all the possible single amino acid substitutions in the binding site of Avastin, two libraries of variant molecules need to be constructed. A complete single amino acid substitution library of the Avastin heavy chain will include 820 proteins (41 positions x 20 amino acids). A complete single amino acid substitution library of the Avastin light chain will include 540 proteins (27 positions x 20 amino acids). Each of these libraries may be constructed in a number of ways, including the use of oligonucleotide-directed mutagenesis to create pools of variant molecules that each carry a randomization codon (NNN) at a different position within the CDR sequences. In this example the Avastin heavy chain library would be composed of 41 pools of genes, each containing a randomization codon (NNN) at a different position in the Avastin heavy chain CDRs. This would yield a redundant library of 2624 genes (41 positions x 64 codons) for the heavy chain library. These 41 pools of sequences, containing 2624 VH genes each differing from the parent by at most a single codon, can be cloned into a standard phagemid display vector either as Fabs or as single-chain Fv's in conjunction with the wild type light chain. (Note that each pool contains a member that is wild type, and numerous silent wild type variants also exist within the larger population.) Likewise the 27 pools of Avastin VL genes, containing 1728 members each differing from the parent by at most one codon, can be cloned into the same vector in conjunction with the wild type heavy chain gene to create the Avastin light chain library. Once created and confirmed, these two libraries can be transformed into an appropriate bacterial strain to create stably transformed bacterial cell libraries.
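The library sizes quoted above follow from simple multiplication, which the short sketch below reproduces. The function name is hypothetical. The third figure, the number of non-parent amino acid substitutions, is an added illustration rather than a number stated in the text; it simply excludes the wild type residue that is counted once per position in the "positions x 20" totals.

```python
def library_sizes(n_positions, n_amino_acids=20, n_codons=64):
    """Sizes of a single-substitution library built from NNN codon pools."""
    return {
        # positions x 20 amino acids, as counted in the text
        "protein_variants": n_positions * n_amino_acids,
        # positions x 64 codons (redundant gene library)
        "gene_variants": n_positions * n_codons,
        # illustration only: substitutions that differ from the parent residue
        "non_parent_substitutions": n_positions * (n_amino_acids - 1),
    }

heavy = library_sizes(41)  # 41 heavy chain CDR positions
light = library_sizes(27)  # 27 light chain CDR positions

print(heavy)  # protein_variants 820, gene_variants 2624, non_parent 779
print(light)  # protein_variants 540, gene_variants 1728, non_parent 513
```

Note that an NNN codon also encodes the three stop codons, so a fraction of the gene variants at each position will encode truncated products.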
In this situation each antibody variant is carried in a separate bacterial cell. These two populations of cells can then be induced to produce phage particles by infecting them with a helper phage. The helper phage carries the phage genes that are missing in the phagemid and allows the cells to start producing one type of phage per cell. Infecting a population of cells carrying the full spectrum of single amino acid variants will produce a full spectrum of phage, each carrying at its tail a variant Fab or scFv that was encoded by the single stranded DNA in its attached genome. The two libraries can then be harvested and used in two ways. First, their diversity can be efficiently characterized using a massively parallel fragment sequencer (454, Illumina, ABI) to make sure that full spectrum libraries have been created. Next, the libraries can be titred and set up in equilibrium binding assays with several concentrations of the VEGF ligand fused to a tag useful for immunoprecipitation (e.g. an Fc fusion). For maximum resolution the differing concentrations of the ligand should center around the KD of the parent antibody and should vary in 2- to 10-fold increments. Care must be taken to scale the reactions to assure that the antigen is in large excess, so that its free concentration will not be reduced during the binding reaction. These reactions are incubated until equilibrium is reached (for example, 22°C for 24 hr in a conventional binding reaction mixture). Once equilibrium has been reached, the two types of phage can be separated. The phage that are bound to the soluble antigen can be immunoprecipitated using a reagent that is specific for the ligand fusion, like protein A or an anti-Fc antibody. The unbound phage can then be isolated from the depleted supernatant from each reaction, e.g.
by precipitating unbound binding-compound-expressing phage with anti-kappa chain antibody, anti-lambda chain antibody, anti-CH1 antibody, anti-tag antibody, such as a myc tag, polyhistidine tag, or the like. Specifically, in one embodiment, human Fab-bearing phage may be isolated either by binding goat anti-kappa chain antibody followed by capture with protein G coated beads, or by binding biotinylated anti-kappa chain antibody followed by capture with streptavidin-coated beads. Alternatively to the above, binders and non-binders may be identified in a competitive binding reaction where, for example, library binding compounds compete with a reference binding compound for binding to an immobilized antigen, either by displacing previously bound reference compound or by being combined with antigen and reference compound at the same time. Guidance for carrying out such reactions is found in
A library of such Avastin-based binding compounds was constructed as follows. Prior to inserting a mixture of synthetic segments to create a phagemid library, two phagemids were constructed with similar structures to the pHEN1 phagemid disclosed by Hoogenboom et al (cited above). Each of the phagemids includes a sequence that encodes an Fab fragment; however, one phagemid is engineered to accept variable light chain encoding sequences with a wild type heavy chain (i.e. the light chain library) and the other phagemid is engineered to accept variable heavy chain encoding sequences with a wild type light chain (i.e. the heavy chain library). The starting phagemid for both constructs was a pBCSK+ (Stratagene, San Diego, CA). Since the phagemids are grown in a conventional f+ E. coli host (XL1 Blue, Stratagene), a bacterial leader sequence (MKYLLPTAAAGLLLLAAQPAMA (SEQ ID NO: 3)) was added to each of the above sequences for the Avastin VH and VL regions. In addition, the following ribosome binding site sequences were appended to the 5' ends of the nucleotide sequences encoding the VH and VL regions: CTAGTTAATTAAaggaggagcaggg (SEQ ID NO: 4) for the light chain (designated Fab-12 LC) and CTAGGCGGCCGCaggaggagcaggg (SEQ ID NO: 5) for the heavy chain (designated Fab-12 HC). The Lac promoter and polylinker elements of the pBCSK vector were rearranged and gene III was inserted, after which the light and heavy chain encoding regions were inserted in several steps to give a construct pBD4 (500), illustrated in Unless otherwise specifically defined herein, terms and symbols of nucleic acid chemistry, biochemistry, genetics, and molecular biology used herein follow those of standard treatises and texts in the field, e.g. 
"Antibody" or "immunoglobulin" means a protein, either natural or synthetically produced by recombinant or chemical means, that is capable of specifically binding to a particular antigen or antigenic determinant, which may be a target molecule as the term is used herein. Antibodies, e.g. IgG antibodies, are usually heterotetrameric glycoproteins of about 150,000 daltons, composed of two identical light (L) chains and two identical heavy (H) chains, as illustrated in "Binding compound" means a compound that is capable of specifically binding to a particular target molecule or group of target molecules. Examples of binding compounds include antibodies, receptors, transcription factors, signaling molecules, viral proteins, lectins, nucleic acids, aptamers, and the like, e.g. "Complementary-determining region" or "CDR" means a short sequence (up to 13 to 18 amino acids) in the variable domains of immunoglobulins. The CDRs (six of which are present in IgG molecules) are the most variable part of immunoglobulins and contribute to their diversity by making specific contacts with a specific antigen, allowing immunoglobulins to recognize a vast repertoire of antigens with a high affinity, e.g. "Complex" as used herein means an assemblage or aggregate of molecules in direct or indirect contact with one another. In some embodiments, "contact," or more particularly, "direct contact" in reference to a complex of molecules, or in reference to specificity or specific binding, means two or more molecules are close enough so that attractive noncovalent interactions, such as Van der Waal forces, hydrogen bonding, ionic and hydrophobic interactions, and the like, dominate the interaction of the molecules. In such an embodiments, a complex of molecules is stable in that under assay conditions, the presence of the complex is thermodynamically favorable. 
As used herein, "complex" may refer to a stable aggregate of two or more proteins, which is equivalently referred to as a "protein-protein complex." A complex may also refer to an antibody bound to its corresponding antigen. Complexes of particular interest in the invention are protein-protein complexes and antibody-antigen complexes. As noted above, various types of noncovalent interactions may contribute to antibody binding of antigen, including electrostatic forces, hydrogen bonds, van der Waals forces, and hydrophobic interactions. The relative importance of each of these depends on the structures of the binding site of the individual antibody and of the antigenic determinant. The strength of the binding between a single combining site of an antibody and an epitope of an antigen, which can be determined experimentally by equilibrium dialysis (e.g. Abbas et al (cited above)), is called the affinity of the antibody. The affinity is commonly represented by a dissociation constant (Kd), which describes the concentration of antigen that is required to occupy the combining sites of half the antibody molecules present in a solution of antibody. A smaller Kd indicates a stronger or higher affinity interaction, because a lower concentration of antigen is needed to occupy the sites. For antibodies specific for natural antigens, the Kd usually varies from about 10^-7 M to 10^-11 M. Serum from an immunized individual will contain a mixture of antibodies with different affinities for the antigen, depending primarily on the amino acid sequences of the CDRs. "Ligand" means a compound that binds specifically and reversibly to another chemical entity to form a complex. Ligands include, but are not limited to, small organic molecules, peptides, proteins, nucleic acids, and the like. Of particular interest are protein-ligand complexes, which include protein-protein complexes, antibody-antigen complexes, enzyme-substrate complexes, and the like.
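The definition of Kd given above (the antigen concentration at which half of the combining sites are occupied) can be illustrated with a small sketch. It assumes simple 1:1 binding at equilibrium with the antigen in large excess, so that the free antigen concentration is approximately the total concentration. The function names, the example Kd of 1 nM, and the 3-fold step are hypothetical choices for illustration, not values from the disclosure.

```python
def fraction_bound(ligand_conc, kd):
    """Fraction of combining sites occupied for 1:1 binding at equilibrium.

    Assumes the ligand is in large excess (free ~= total concentration).
    """
    return ligand_conc / (ligand_conc + kd)

def dilution_series(kd, fold=3.0, steps_each_side=2):
    """Ligand concentrations centered on Kd, spaced by a constant fold."""
    return [kd * fold ** k
            for k in range(-steps_each_side, steps_each_side + 1)]

kd = 1e-9  # hypothetical Kd of 1 nM
series = dilution_series(kd)
for conc in series:
    print(f"{conc:.2e} M -> fraction bound {fraction_bound(conc, kd):.2f}")
```

At a ligand concentration equal to Kd the function returns 0.5, and a variant with a smaller Kd shows a higher fraction bound at every concentration in the series, which is the basis for ranking variants by percent binding.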
"Phage display" is a technique by which variant polypeptides are displayed as fusion proteins to at least a portion of a coat protein on the surface of phage, e.g., filamentous phage, particles. A utility of phage display lies in the fact that large libraries of randomized protein variants can be rapidly and efficiently selected for those sequences that bind to a target molecule with high affinity. Display of peptide and protein libraries on phage has been used for screening millions of polypeptides for ones with specific binding properties. Polyvalent phage display methods have been used for displaying small random peptides and small proteins through fusions to either gene II or gene VIII of filamentous phage. "Phagemid" means a plasmid vector having a bacterial origin of replication, e.g., ColE1, and a copy of an intergenic region of a bacteriophage. The phagemid may be used on any known bacteriophage, including filamentous bacteriophage and lambdoid bacteriophage. The plasmid will also generally contain a selectable marker for antibiotic resistance. Segments of DNA cloned into these vectors can be propagated as plasmids. When cells harboring these vectors are provided with all genes necessary for the production of phage particles, the mode of replication of the plasmid changes to rolling circle replication to generate copies of one strand of the plasmid DNA and package phage particles. The phagemid may form infectious or non-infectious phage particles. This term includes phagemids, which contain a phage coat protein gene or fragment thereof linked to a heterologous polypeptide gene as a gene fusion such that the heterologous polypeptide is displayed on the surface of the phage particle. "Phage" or "phage vector" means a double stranded replicative form of a bacteriophage containing a heterologous gene and capable of replication. The phage vector has a phage origin of replication allowing phage replication and phage particle formation. 
The phage is preferably a filamentous bacteriophage, such as an M13, f1, fd, Pf3 phage or a derivative thereof, or a lambdoid phage, such as lambda, 21, phi80, phi81, 82, 424, 434, etc., or a derivative thereof. "Primer" means an oligonucleotide, either natural or synthetic, that is capable, upon forming a duplex with a polynucleotide template, of acting as a point of initiation of nucleic acid synthesis and being extended from its 3' end along the template so that an extended duplex is formed. Extension of a primer is usually carried out with a nucleic acid polymerase, such as a DNA or RNA polymerase. The sequence of nucleotides added in the extension process is determined by the sequence of the template polynucleotide. Usually primers are extended by a DNA polymerase. Primers usually have a length in the range of from 14 to 40 nucleotides, or in the range of from 18 to 36 nucleotides. Primers are employed in a variety of nucleic acid amplification reactions, for example, linear amplification reactions using a single primer, or polymerase chain reactions, employing two or more primers. Guidance for selecting the lengths and sequences of primers for particular applications is well known to those of ordinary skill in the art, as evidenced by the following references "Polypeptide" refers to a class of compounds composed of amino acid residues chemically bonded together by amide linkages with elimination of water between the carboxy group of one amino acid and the amino group of another amino acid. A polypeptide is a polymer of amino acid residues, which may contain a large number of such residues. Peptides are similar to polypeptides, except that, generally, they are comprised of a lesser number of amino acids. Peptides are sometimes referred to as oligopeptides. There is no clear-cut distinction between polypeptides and peptides. For convenience, in this disclosure and claims, the term "polypeptide" will be used to refer generally to peptides and polypeptides.
The amino acid residues may be natural or synthetic. "Protein" refers to a polypeptide, usually synthesized by a biological cell, folded into a defined three-dimensional structure. Proteins are generally from about 5,000 to about 5,000,000 daltons or more in molecular weight, more usually from about 5,000 to about 1,000,000 molecular weight, and may include posttranslational modifications, such as acetylation, acylation, ADP-ribosylation, amidation, disulfide bond formation, farnesylation, demethylation, formation of covalent cross-links, formation of cystine, glycosylation, hydroxylation, iodination, methylation, myristoylation, oxidation, phosphorylation, prenylation, selenoylation, sulfation, and ubiquitination, e.g. "Specific" or "specificity" in reference to the binding of one molecule to another molecule, such as a labeled target sequence for a probe, means the recognition, contact, and formation of a stable complex between the two molecules, together with substantially less recognition, contact, or complex formation of that molecule with other molecules. In some embodiments, "specific" in reference to the binding of a first molecule to a second molecule means that to the extent the first molecule recognizes and forms a complex with another molecule in a reaction or sample, it forms the largest number of the complexes with the second molecule. Preferably, this largest number is at least fifty percent. Generally, molecules involved in a specific binding event have areas on their surfaces or in cavities giving rise to specific recognition between the molecules binding to each other. Examples of specific binding include antibody-antigen interactions, enzyme-substrate interactions, formation of duplexes or triplexes among polynucleotides and/or oligonucleotides, receptor-ligand interactions, and the like.
As used herein, "contact" in reference to specificity or specific binding means two molecules are close enough that weak noncovalent chemical interactions, such as van der Waals forces, hydrogen bonding, base-stacking interactions, ionic and hydrophobic interactions, and the like, dominate the interaction of the molecules. "Wild type" or "reference" or "pre-existing" in reference to a binding compound are used synonymously to mean a compound which is being analyzed or improved in accordance with the method of the invention. That is, such a compound serves as a starting material from which variant polypeptides are derived through the introduction of mutations. A "wild type" sequence for a given protein is usually the sequence that is most common in nature, but the term is used more broadly here to include compounds that have been engineered. Similarly, a "wild type" gene sequence is typically the sequence for that gene which is most commonly found in nature, but the usage here includes genes that may have been engineered from a natural compound, e.g. a gene which has been engineered to consist of bacterial codons even though it encodes a human protein. Mutations may be introduced into a "wild type" gene (and thus the protein it encodes) through any available process, e.g. site-specific mutation, insertion of chemically synthesized segments, or other conventional means. The products of such processes are "variant" or "mutant" forms of the original "wild type" protein or gene. Exemplary reference (or wild type or pre-existing) sequences include antibody-targeted drugs or antibody-based drugs such as adalimumab (Humira), bevacizumab (Avastin), cetuximab (Erbitux), efalizumab (Raptiva), infliximab (Remicade), panitumumab (Vectibix), ranibizumab (Lucentis), rituximab (Rituxan), trastuzumab (Herceptin), and the like.