What is MFIB?

Mutual Folding Induced by Binding (MFIB) is a repository of protein complexes for which the folding of each constituent protein chain is coupled to the interaction forming the complex. This means that while the complexes are stable enough to have their structures solved by conventional structure determination methods (such as X-ray crystallography or NMR), the proteins or protein regions involved in the interaction do not have a stable structure in their free monomeric form (i.e. they are intrinsically disordered/unstructured).

What counts as being intrinsically disordered/unstructured?

All interacting protein regions in MFIB complexes are required to be intrinsically disordered/unstructured. This means that in their monomeric state these protein regions lack a stable tertiary structure, and thus their structure cannot be determined. Accordingly, structures presented in MFIB were checked to exclude protein regions that have a solved monomeric structure in the PDB. However, this condition still leaves room for proteins with a wide range of structural properties.

Some protein regions, such as the ACTR domain of nuclear receptor coactivator 3, are near-random coils. This means that in their isolated monomeric form these protein segments fluctuate rather freely among a wide range of conformations, representing the 'most disordered' end of the spectrum. In contrast, the binding partner of ACTR (the NCBD domain of CBP) is a molten globule. Although it is not fully stable, it contains a significant amount of residual structure in its unbound form (stable and near-stable secondary structure elements). ACTR and NCBD form an ordered complex in which the two different kinds of disordered protein stabilize each other. Accordingly, the complex is included in MFIB (MF2201001).

Another, quite extreme, example of such non-stable structural elements in the monomeric form is nucleoside diphosphate kinase. The native monomeric enzyme has no stable structure, but it forms a stable hexamer of six identical interacting chains. However, a single amino acid substitution (P105G) affects a loop implicated in subunit contacts, and it yields a protein that reversibly dissociates into folded monomers. This indicates that the monomeric kinase subunits are on the verge of order, and hence mark the other, 'least disordered' end of the spectrum. Nevertheless, the native protein still fits the criteria of unstructured monomers forming an ordered complex, so the native form of nucleoside diphosphate kinase is included in MFIB (MF6110001).

What evidence is needed for complexes to be included?

The primary requirement for the inclusion of a complex in MFIB is experimental data showing that all constituent protein chains taking part in the interaction adopt a stable structure only as a result of complex formation. There are two principal ways this can be shown:

  • First, in some cases all protein chains (or at least their interacting regions) have been shown to be intrinsically disordered in their monomeric form (as for MF2201002). This is usually a challenging task, because disordered proteins are difficult to handle experimentally: they are generally unstable, being preferential targets of proteolysis. Furthermore, this approach can only be applied to heteromeric complexes. The constituent chains of, e.g., a homodimer cannot be studied in their monomeric form under native conditions, as they would dimerize.

  • Second, in some cases the folding of all participating chains was measured simultaneously in the context of the complex. These measurements dissociate the protein complex by varying an external factor, most commonly the temperature or the concentration of a denaturing agent such as urea or guanidine hydrochloride. Throughout the process, the structural content of the protein complex (or of the solution of monomeric proteins after denaturation) is monitored. If the tertiary structure of the monomers disappears exactly when the complex breaks up, the complex is considered to form via a mutually coupled folding-and-binding process, in which the monomers fold only upon interaction. This approach is particularly useful and common for homo-oligomeric complexes (as for MF2110017).
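To illustrate the second approach, the simplest model for a homodimer whose chains fold only upon association is a two-state transition, N2 ⇌ 2U, in which dissociation and unfolding happen in a single step. The sketch below assumes this two-state model, plus the commonly used linear dependence of the unfolding free energy on denaturant concentration. The function names and parameter values are hypothetical and are not MFIB's analysis. One expected signature of such a coupled transition is that its midpoint shifts with total protein concentration, which `midpoint` makes explicit.

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol*K)

def fraction_unfolded(denaturant, dg_water, m_value, total_monomer, temp=298.15):
    """Fraction of chains unfolded (and dissociated) in a two-state N2 <-> 2U model.

    denaturant:    denaturant concentration (M)
    dg_water:      unfolding free energy extrapolated to 0 M denaturant (kJ/mol)
    m_value:       slope of the free energy against denaturant (kJ/mol/M)
    total_monomer: total protein concentration in monomer units (M)
    """
    dg = dg_water - m_value * denaturant  # linear extrapolation model
    # K = [U]^2 / [N2] = exp(-dg / RT); mass balance: total = [U] + 2[N2].
    # Solving the quadratic gives f_U = 2 / (1 + sqrt(1 + 8 * total / K)),
    # written with exp(+dg / RT) = 1 / K to avoid overflow at high denaturant.
    s = math.sqrt(1 + 8 * total_monomer * math.exp(dg / (R * temp)))
    return 2 / (1 + s)

def midpoint(dg_water, m_value, total_monomer, temp=298.15):
    """Denaturant concentration at which half of the chains are unfolded (K = total)."""
    return (dg_water + R * temp * math.log(total_monomer)) / m_value

# Hypothetical parameters: 60 kJ/mol stability, m = 8 kJ/mol/M
for conc in (1e-6, 1e-5, 1e-4):
    print(f"{conc:.0e} M -> midpoint {midpoint(60.0, 8.0, conc):.2f} M denaturant")
```

For a unimolecular transition the midpoint would not depend on concentration. This dependence is therefore one way to tell dimer dissociation coupled to unfolding apart from unfolding of a pre-folded monomer.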

For each entry in MFIB there should be sufficient experimental evidence to justify its inclusion. This means either evidence for the intrinsically unstructured nature of all participating protein chains, or evidence that the structured complex itself arises directly from the interaction of unstructured monomers. In some rare cases both types of evidence are available for a complex, as for MF2201001.

The majority of entries in MFIB have direct experimental evidence supporting the disordered nature of all interacting chains. In some cases, however, evidence of disorder was transferred from proteins with a high level of homology. For ordered proteins, it has been shown that 30% sequence identity indicates true homology in the overwhelming majority of cases and, for sufficiently long alignments, the adoption of the same fold (Rost, 1999, PMID:10195279). No such systematic study has been conducted for protein disorder. However, 30% identity is generally sufficient for two ordered proteins to share the same fold. It is therefore safe to assume that a much higher level of identity/similarity is sufficient for two proteins to belong to the same structural class (ordered or disordered). Such a level is guaranteed by belonging to the same UniRef90 cluster or by sharing the same Pfam entry.
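The transfer rule above can be written as a small predicate. This is a minimal sketch under stated assumptions. The `ProteinRecord` fields, the cluster and Pfam identifiers, and the accessions are hypothetical placeholders, not MFIB's data model. The rule follows the text: disorder evidence may be inherited only from a protein with direct evidence that shares the UniRef90 cluster or a Pfam entry.

```python
from dataclasses import dataclass, field

@dataclass
class ProteinRecord:
    """Minimal per-protein annotation (field names are hypothetical)."""
    accession: str
    uniref90_cluster: str
    pfam_entries: frozenset = field(default_factory=frozenset)
    has_disorder_evidence: bool = False  # direct experimental evidence of disorder

def can_inherit_disorder(target: ProteinRecord, source: ProteinRecord) -> bool:
    """True if experimental disorder evidence on `source` may be transferred to `target`."""
    if not source.has_disorder_evidence:
        return False
    same_cluster = target.uniref90_cluster == source.uniref90_cluster
    shared_pfam = bool(target.pfam_entries & source.pfam_entries)
    return same_cluster or shared_pfam

# Hypothetical records
studied = ProteinRecord("PROT_A", "UniRef90_X", frozenset({"PF_1"}), True)
ortholog = ProteinRecord("PROT_B", "UniRef90_X", frozenset({"PF_1"}))
unrelated = ProteinRecord("PROT_C", "UniRef90_Y", frozenset({"PF_2"}))
print(can_inherit_disorder(ortholog, studied))   # True
print(can_inherit_disorder(unrelated, studied))  # False
```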

Why aren't there complexes with DNA/RNA/other macromolecules?

The primary focus of MFIB is the collection of complexes in which the folding of each participating macromolecule is linked to the interaction that stabilizes the complex. Some proteins only fold upon interaction with DNA, RNA or other molecules (such as lipids or the membrane itself), but such complexes are not included. The primary reason is that protein-protein interactions are markedly different from protein-DNA or protein-RNA interactions, and we opted to keep MFIB specific to the former. Furthermore, consider complexes where proteins fold with the help of DNA (like many dimeric transcription factors) or RNA (like many ribosomal proteins). Although the structure formation of the proteins is linked to the interaction, the DNA/RNA partner usually already has a stable structure before complex formation, so the folding is not mutual.

I know a certain complex fits the above criteria, but it still isn't included in MFIB. Why?

During the construction of MFIB several databases were integrated (like PDB, UniProt, Pfam, IDEAL and DisProt) to provide a means for the systematic collection of protein complexes with mutual synergistic folding. The results of this collection were manually curated and complemented with extensive literature searches to widen the coverage of MFIB as much as possible. However, undoubtedly there are many complexes that would fit MFIB but are not included yet. If you know such a complex, please let us know at mfib(at)ttk.mta.hu so we can include it.

Are proteins in MFIB disordered on their entire length? Or can they contain domains?

Many proteins are modular and contain domains that act mostly independently of each other in a structural sense. Accordingly, inclusion in MFIB only requires that the regions of the proteins that directly take part in the interaction be disordered in their monomeric form. Other regions of the interacting proteins that do not form part of the complex can be either disordered or ordered, as they have no primary effect on the interaction covered by MFIB.

While MFIB focuses only on the intrinsically unstructured regions of the interacting proteins, it also indicates the extent of the other regions of the same proteins. This is given as the 'UniProt coverage' for each protein of every entry. The value is the fraction of the whole protein that directly contributes to the interaction, and hence is visible in the corresponding structure.
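As a rough illustration, the coverage value can be computed from the residue ranges observed in the structure and the length of the full UniProt sequence. This sketch assumes the observed ranges have already been mapped to UniProt numbering (for example via a residue-level mapping such as SIFTS). The function names and example numbers are hypothetical. Overlapping ranges are merged first so that no residue is counted twice.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent 1-based, inclusive residue ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def uniprot_coverage(observed_ranges, sequence_length):
    """Fraction of the UniProt sequence covered by residues seen in the structure."""
    covered = sum(end - start + 1 for start, end in merge_ranges(observed_ranges))
    return covered / sequence_length

# Hypothetical chain: residues 10-59 and 50-80 of a 200-residue protein are modelled
print(uniprot_coverage([(10, 59), (50, 80)], 200))  # 0.355
```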

How are MFIB accessions generated?

Each MFIB entry is assigned a unique accession composed of the letters 'MF' followed by 7 digits. The first two digits mark the oligomeric state of the complex. The first digit equals the total number of interacting protein chains, and the second equals the number of unique proteins ('1' for all homo-oligomers and greater than 1 for all hetero-oligomers). Accordingly, accessions for all homodimers start with MF21, accessions for heterotetramers start with MF4x (where x>1), and so on.
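The first two digits follow directly from the chain composition. This sketch assumes each chain is identified by a UniProt accession, and that a complex has between 2 and 9 chains so the count fits in one digit. The chain and protein identifiers are hypothetical.

```python
def oligomer_digits(chain_to_uniprot):
    """First two accession digits from a {chain ID: UniProt accession} mapping."""
    n_chains = len(chain_to_uniprot)
    n_unique = len(set(chain_to_uniprot.values()))
    if not 2 <= n_chains <= 9:
        raise ValueError("expected a complex of 2 to 9 chains (one digit)")
    return f"{n_chains}{n_unique}"

print(oligomer_digits({"A": "PROT_1", "B": "PROT_1"}))  # 21 (homodimer)
print(oligomer_digits({"A": "PROT_1", "B": "PROT_2",
                       "C": "PROT_1", "D": "PROT_2"}))  # 42 (heterotetramer)
```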

The third and fourth digits encode the taxonomic group(s) from which the interacting chains originate. The third digit shows the highest taxonomic group among all chains:

  • '0' for human
  • '1' for all other eukaryotes
  • '2' for bacteria
  • '3' for archaea
  • '4' for viral proteins

The fourth digit shows the heterogeneity of the source species of the interacting chains:

  • '0' if all interacting proteins are from the same species
  • '1' if they come from more than one species, all within the same taxonomic domain
  • '2' if the proteins in the complex span more than one taxonomic domain

For example, the third and fourth digits of the entry containing the archaeal histone hMfB (MF2130001) are '30', as it only contains proteins from the archaeon Methanothermus fervidus. In contrast, the third and fourth digits of the entry containing the GTPase binding domain of neural Wiskott-Aldrich syndrome protein in complex with E. coli EspF(U) (MF2202001) are '02'. This entry contains a human and an E. coli protein, which belong to different taxonomic domains.
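The two taxonomy digits can be derived from per-chain origin annotations. This is a sketch under two assumptions. First, "highest" group is read as the lowest code, which matches MF2202001: human plus bacteria gives '0'. Second, viruses are treated as a domain of their own. The tuple layout is hypothetical.

```python
# Third-digit codes as listed above; a lower code is taken to be a "higher" group
GROUP_CODE = {"human": 0, "eukaryote": 1, "bacteria": 2, "archaea": 3, "virus": 4}

def taxonomy_digits(chains):
    """Third and fourth accession digits.

    chains: one (group, domain, species) tuple per interacting chain, where
    group is a GROUP_CODE key; treating viruses as their own domain is an assumption.
    """
    third = min(GROUP_CODE[group] for group, _, _ in chains)
    species = {sp for _, _, sp in chains}
    domains = {dom for _, dom, _ in chains}
    if len(species) == 1:
        fourth = 0          # single species
    elif len(domains) == 1:
        fourth = 1          # several species, one domain
    else:
        fourth = 2          # several domains
    return f"{third}{fourth}"

archaeal = [("archaea", "Archaea", "Methanothermus fervidus")] * 2
print(taxonomy_digits(archaeal))  # 30, as for MF2130001
mixed = [("human", "Eukaryota", "Homo sapiens"),
         ("bacteria", "Bacteria", "Escherichia coli")]
print(taxonomy_digits(mixed))     # 02, as for MF2202001
```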

The last three digits form a randomly assigned number that guarantees the uniqueness of the accession.
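Putting the scheme together, an accession can be validated and decoded with a regular expression. This is a sketch of the documented layout, not an official MFIB tool. The regex assumes complexes have 2 to 9 chains, and the field names are hypothetical. The last three digits are returned unchanged because they carry no meaning beyond uniqueness.

```python
import re

# 'MF' + chains, unique proteins, taxonomic group, heterogeneity, 3-digit suffix
ACCESSION = re.compile(r"MF([2-9])([1-9])([0-4])([0-2])(\d{3})")
GROUPS = ["human", "other eukaryote", "bacteria", "archaea", "virus"]
SPREAD = ["single species", "multiple species, one domain", "multiple domains"]

def parse_accession(acc):
    """Split an MFIB accession into its documented fields."""
    m = ACCESSION.fullmatch(acc)
    if m is None:
        raise ValueError(f"not a well-formed MFIB accession: {acc!r}")
    chains, unique, group, spread, suffix = m.groups()
    if int(unique) > int(chains):
        raise ValueError(f"more unique proteins than chains in {acc!r}")
    return {
        "chains": int(chains),
        "unique_proteins": int(unique),
        "homo_oligomer": unique == "1",
        "taxonomic_group": GROUPS[int(group)],
        "species_spread": SPREAD[int(spread)],
        "suffix": suffix,
    }

print(parse_accession("MF2202001"))
```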

Why are certain PDB structures modified?

All protein complexes included in MFIB have a solved structure deposited in the PDB. However, in some cases the original PDB structure does not show (or does not only show) the biologically relevant core interaction. In these cases we generated a modified PDB file, and a description of the transformations applied is given for the entry. These transformations can be the omission of protein chains, the generation of protein chains, or the truncation of protein chains:

  • omission removes redundant copies present in the PDB structure (e.g. for MF2120011)
  • generation is based on the biological assembly matrices given in the PDB file (e.g. for MF3140001)
  • truncation keeps only the protein regions that mediate the highlighted interaction (e.g. for MF2120024)

For each entry, the modified PDB files are available for download and are displayed in the embedded structure viewer.
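Generating chains from biomatrices amounts to applying a rotation and a translation to each atom. This sketch assumes the matrices are the REMARK 350 BIOMT records of the legacy PDB format, with whitespace-separated fields and a single biological assembly. It is not MFIB's own pipeline, and the function names are hypothetical.

```python
from collections import defaultdict

def parse_biomt(lines):
    """Collect REMARK 350 BIOMT operators as {operator id: (rotation, translation)}.

    Assumes whitespace-separated fields and a single biological assembly.
    """
    rows = defaultdict(dict)
    for line in lines:
        f = line.split()
        if len(f) >= 8 and f[:2] == ["REMARK", "350"] and f[2].startswith("BIOMT"):
            row = int(f[2][-1]) - 1          # BIOMT1/2/3 -> matrix row 0/1/2
            rows[int(f[3])][row] = [float(x) for x in f[4:8]]
    return {
        op: ([r[i][:3] for i in range(3)], [r[i][3] for i in range(3)])
        for op, r in rows.items()
    }

def apply_operator(coords, rotation, translation):
    """Apply x' = R x + t to each (x, y, z) coordinate."""
    return [
        tuple(sum(rotation[i][j] * xyz[j] for j in range(3)) + translation[i]
              for i in range(3))
        for xyz in coords
    ]

remark = [
    "REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000",
    "REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000",
    "REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000",
    "REMARK 350   BIOMT1   2 -1.000000  0.000000  0.000000        0.00000",
    "REMARK 350   BIOMT2   2  0.000000 -1.000000  0.000000        0.00000",
    "REMARK 350   BIOMT3   2  0.000000  0.000000  1.000000       10.00000",
]
ops = parse_biomt(remark)
print(apply_operator([(1.0, 2.0, 3.0)], *ops[2]))  # [(-1.0, -2.0, 13.0)]
```

Applying every operator to the deposited chains yields the copies that make up the assembly. Operator 1 is usually the identity.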

What does 'related structures' mean?

For each complex, the PDB was scanned for other highly similar structures, and the PDB IDs of such structures are given for every entry where relevant. Two complexes are deemed related (or highly similar) if they contain the same number of proteins and the proteins from the two structures show a sufficient degree of pairwise similarity. This means two things:

  • the paired proteins belong to the same UniRef90 cluster (the full proteins exhibit at least 90% sequence identity)
  • they contribute roughly the same region to their respective interactions (the two regions overlap by at least 70%)
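The relatedness criterion can be sketched as a search for a one-to-one pairing of proteins between two complexes. This rests on assumptions not stated in the text. Overlap is measured relative to the shorter region, and both regions are assumed to be in comparable residue numbering. The data layout is hypothetical. Trying all permutations is acceptable here because complexes have at most nine chains.

```python
from itertools import permutations

def region_overlap(a, b):
    """Shared length of two (start, end) ranges, relative to the shorter one."""
    shared = min(a[1], b[1]) - max(a[0], b[0]) + 1
    shorter = min(a[1] - a[0] + 1, b[1] - b[0] + 1)
    return max(shared, 0) / shorter

def are_related(complex_a, complex_b, min_overlap=0.7):
    """Each complex is a list of (UniRef90 cluster ID, (start, end)) per protein."""
    if len(complex_a) != len(complex_b):
        return False
    # Look for a one-to-one pairing in which every protein pair matches
    for paired in permutations(complex_b):
        if all(ca == cb and region_overlap(ra, rb) >= min_overlap
               for (ca, ra), (cb, rb) in zip(complex_a, paired)):
            return True
    return False

x = [("UniRef90_X", (10, 60)), ("UniRef90_Y", (100, 150))]
y = [("UniRef90_Y", (105, 150)), ("UniRef90_X", (12, 60))]
z = [("UniRef90_X", (50, 120)), ("UniRef90_Y", (100, 150))]
print(are_related(x, y))  # True
print(are_related(x, z))  # False
```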

How is redundancy treated in MFIB?

MFIB is based on the PDB, which in certain cases exhibits a high degree of redundancy. To reduce this redundancy, MFIB groups complexes that share a high degree of similarity (these are called 'related structures', see above). From each cluster, one complex was chosen as the representative of the interaction based on structure determination method, quality and source organism:

  • NMR structures were selected if available
  • in clusters with only X-ray structures, the structure with the best resolution was selected
  • among structures of the same quality, proteins from higher-order taxonomic groups were favoured over others
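The selection order above maps naturally onto a sort key. This is a minimal sketch that makes two assumptions. First, "higher-order taxonomic group" is read as a lower third-digit code from the accession scheme. Second, ties among NMR structures are broken only by taxonomy. The dictionary keys and structure IDs are hypothetical.

```python
def pick_representative(cluster):
    """Choose one structure per cluster of related structures.

    Each entry is a dict with 'id', 'method' ('NMR' or 'X-ray'),
    'resolution' (Angstrom; None for NMR) and 'tax_code' (accession third digit).
    """
    def preference(entry):
        is_nmr = entry["method"] == "NMR"
        # NMR first; then better (smaller) resolution; then lower taxonomic code
        resolution = 0.0 if is_nmr else entry["resolution"]
        return (not is_nmr, resolution, entry["tax_code"])
    return min(cluster, key=preference)

cluster = [
    {"id": "STRUCT_1", "method": "X-ray", "resolution": 2.1, "tax_code": 0},
    {"id": "STRUCT_2", "method": "X-ray", "resolution": 1.6, "tax_code": 2},
    {"id": "STRUCT_3", "method": "X-ray", "resolution": 1.6, "tax_code": 1},
]
print(pick_representative(cluster)["id"])  # STRUCT_3
```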

As the criteria for being considered 'related' are very stringent, some level of redundancy is inevitable. For example, all histones share a high degree of structural (and some even sequence) similarity, yet histone dimers from various organisms often fall into separate entries. We believe that this amount of redundancy is useful, as it aids the comparison of similar structures emerging from different sequences. To offer an easy way of further reducing this redundancy, each entry is also classified into classes and subclasses (see below).

How are classes and subclasses defined/assigned?

MFIB entries are grouped into classes and subclasses. Currently, the following classes and their constituent subclasses are defined:

Coils and zippers
  • Coiled coil (dimeric)
  • Coiled coil (dimeric, forming a 4-helix bundle)
  • Coiled coil (trimeric)
  • Coiled coil (tetrameric)
  • Coiled coil (tetrameric, 4-helix bundle)
  • Coiled coil (pentameric)
  • Coiled coil (hexameric)
  • Alanine zipper (trimeric)
  • Leucine zipper (dimeric)
  • Leucine zipper (tetrameric)
  • Phenylalanine zipper (dimeric, forming a 4-helix bundle)
  • Other

Histone-like interactions
  • Histones
  • Histone-like complexes

Bulb-type lectin domain
  • Homodimeric lectin
  • Heterodimeric lectin

NGF-like proteins
  • Homodimeric NGF-like proteins
  • Heterodimeric NGF-like proteins

L27 domains
  • L27_1 type
  • L27_2/N type

Transthyretin-like folds
  • Transthyretin
  • HIUase

Homooligomeric enzymes
  • Homodimeric enzymes
  • Homotetrameric enzymes
  • Homohexameric enzymes

Other
  • Basic helix-loop-helix (bHLH)
  • Coiled coil/foldon domain
  • E2 dimers
  • p53 tetramerization
  • Phd antitoxin
  • Ribbon-helix-helix (RHH)
  • Trp repressor-like
  • Other

Each complex in MFIB is assigned a class and a subclass during the manual annotation and curation step. There is no automated categorization and no algorithm that assigns a complex to a class or subclass. The grouping is done by the curators of the given entry. Almost all classes and subclasses represent a structurally well-defined set of complexes, so the grouping is near-trivial in most cases. The only class that does not represent a structurally homogeneous set of complexes is 'Homooligomeric enzymes'. Unlike the other classes, its members cover a wide range of structures; however, the function of the constituent complexes provides a firm basis for classification.

Can I use MFIB for my work?

MFIB is freely available for academic use. We only ask that you cite MFIB if it contributes substantially to your project. Please use the reference below:

Erzsébet Fichó, István Reményi, István Simon and Bálint Mészáros:
MFIB: a repository of protein complexes with mutual folding induced by binding
Bioinformatics. 2017 Nov 15;33(22):3682-3684
PMID: 29036655
doi: 10.1093/bioinformatics/btx486

If you would like to use MFIB in a non-academic environment, please contact us at mfib(at)ttk.mta.hu.