Analogous to genomic sequence alignment, biological network alignment identifies conserved regions between networks of different species. Then, function can be transferred from well- to poorly-annotated species between aligned network regions. Network alignment typically encompasses two algorithmic components: node cost function (NCF), which measures similarities between nodes in different networks, and alignment strategy (AS), which uses these similarities to rapidly identify high-scoring alignments. Different methods use both different NCFs and different ASs. Thus, it is unclear whether the superiority of a method comes from its NCF, its AS, or both. We already showed on state-of-the-art methods, MI-GRAAL and IsoRankN, that combining NCF of one method and AS of another method can give a new superior method. Here, we evaluate MI-GRAAL against a newer approach, GHOST, by mixing-and-matching the methods' NCFs and ASs to potentially further improve alignment quality. While doing so, we approach important questions that have not been asked systematically thus far. First, we ask how much of the NCF information should come from protein sequence data compared to network topology data. Existing methods determine this parameter more-less arbitrarily, which could affect alignment quality. Second, when topological information is used in NCF, we ask how large the size of the neighborhoods of the compared nodes should be. Existing methods assume that the larger the neighborhood size, the better.
Our findings are as follows. MI-GRAAL's NCF is superior to GHOST's NCF, while the performance of the methods' ASs is data-dependent. Thus, for data on which GHOST's AS is superior to MI-GRAAL's AS, the combination of MI-GRAAL's NCF and GHOST's AS represents a new superior method. Also, which amount of sequence information is used within NCF does not affect alignment quality, while the inclusion of topological information is crucial for producing good alignments. Finally, larger neighborhood sizes are preferred, but often, it is the second largest size that is superior. Using this size instead of the largest one would decrease computational complexity.
Taken together, our results represent general recommendations for a fair evaluation of network alignment methods and in particular of two-stage NCF-AS approaches.