A fresh look at protein interactions using data mining

Researchers have created a novel way to define individual protein associations quickly, efficiently, and informatively. The findings, published in the journal Nature Communications, show how the topological scoring (TopS) algorithm combines data sets to identify proteins that come together.

The approach is similar to looking at the activities and interactions of all the individuals in a community and then picking out the most meaningful interactions, some of which may be very rare. The researchers are looking for the biological equivalent of two individuals who may be the only two in the entire community that participate in an important interaction.

Not only does this help researchers identify how proteins perform biological functions or carry out biological processes, but the algorithm can also be applied to previously generated biological data, and potentially to other areas of science, to glean new information.

"It's a form of big data analysis that we are applying to proteomics data to identify and understand protein interaction networks," says the senior author. "It's complementary to a lot of techniques already in use so it can be used to ask and answer new questions."

Protein data sets can be challenging to examine for meaningful information because they are so large. "You have thousands of proteins to look at," says the lead author. Understanding how a wide variety of proteins come together to do something, like repair DNA, is a difficult problem. "We wanted to simplify the problem."

That meant that instead of taking an overall view of everything, they hunted for less common events. The researchers did this by looking at bait (proteins already known to be involved in processes of interest) and prey (proteins that could interact with bait proteins) to see how they interacted in human DNA repair and yeast chromatin remodeling complexes. With TopS, data are analyzed in a parallel fashion, meaning that data from several biologically related baits are considered at the same time. A key attribute of TopS is the ability to evaluate the preference of a prey protein for a bait relative to other baits. "Instead of calculating a score by concentrating only [on] information of a single bait, we now aggregate information from the entire data set," explains the author.
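To make the idea of scoring a prey's preference for one bait relative to other baits more concrete, here is a minimal sketch in Python. It is not the team's published code. It assumes a likelihood-ratio-style score of the form observed × ln(observed / expected), where the expected count for a bait-prey pair is derived from the totals across the whole table; the function name, the bait and prey labels, and the counts are all hypothetical.

```python
import math


def preference_scores(counts):
    """Score each bait-prey pair against the whole data set.

    counts: dict mapping bait name -> dict mapping prey name -> observed count.
    Returns a dict mapping (bait, prey) -> score.

    Assumed scoring form (illustrative, not taken from the article):
        score = observed * ln(observed / expected)
        expected = bait_total * prey_total / grand_total
    """
    baits = list(counts)
    preys = sorted({prey for bait in baits for prey in counts[bait]})

    # Totals that aggregate information from every bait in the data set.
    bait_totals = {b: sum(counts[b].values()) for b in baits}
    prey_totals = {p: sum(counts[b].get(p, 0) for b in baits) for p in preys}
    grand_total = sum(bait_totals.values())

    scores = {}
    for b in baits:
        for p in preys:
            observed = counts[b].get(p, 0)
            if observed == 0:
                # No detection: treat as no evidence either way.
                scores[(b, p)] = 0.0
                continue
            expected = bait_totals[b] * prey_totals[p] / grand_total
            scores[(b, p)] = observed * math.log(observed / expected)
    return scores


# Hypothetical counts: one prey seen with every bait, one seen with a single bait.
example_counts = {
    "bait_A": {"prey_common": 50, "prey_rare": 5},
    "bait_B": {"prey_common": 50},
    "bait_C": {"prey_common": 50},
}

scores = preference_scores(example_counts)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 2))
```

In this toy data set, the rarely detected prey gets a positive score with the one bait it appears with, while the prey detected with every bait scores close to zero or below for that bait, because its presence there is no more than the overall totals would predict.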

The team has also made the algorithm available on GitHub, a computer code repository, because they want to offer other researchers the opportunity to test it and see how they can apply it to their own projects.