Q: Given a social network containing N members and a log file containing M timestamps at which times pairs of members formed friendships, design an algorithm to determine the earliest time at which all members are connected (i.e., every member is a friend of a friend of a friend … of a friend). Assume that the log file is sorted by timestamp and that friendship is an equivalence relation. The running time of your algorithm should be MlogN or better and use extra space proportional to N.
(Java Code on github at the bottom of the post. )
My thoughts and solutions:
When solving dynamic connectivity problems (like whether there is a path between two nodes but we don’t care what the exact path is), we can try using the Union-Find algorithm. And this is a typical application of union-find in social network.
This question asks if we can determine the earliest time at which all members are connected, which means very time we union two objects, we have to check if all points are connected.
Well, if we use the naive Union-Find algorithm as a base, to check if all objects are connected is to check if they have the same group label in the array and that will take O(N) time for each union. So M unions take O(MN), which is unacceptable.
So we move on to a better version of Union-Find, the Weighted Union-Find algorithm. It uses a tree structure to represent the connected components. So if all members are connected, there should be only one root element left, which means there is only one array element at index i such that array[i] == i. OK, so we go and check how many elements satisfy this condition after each union. However, this takes O(N) time for each check too.
Well, all we care is the number of roots left, right? Remember initially there are N objects and each represents its own tree. So we have N roots at the beginning. Each time we union, we get one less root. So, why not use a variable count to track the number of roots and change its value along the way of doing the union operations? And to check if all members are connected, we just check if count == 1? This takes constant time and O(MlogN + M) = O(M(logN + 1)) = O(MlogN).
Some other thoughts:
Maybe we can work from this direction too: we know that Weighted Quick Union-Find takes O(MlogN) time for M union operations. Now we are doing something extra and we have to keep the time cost on the same level, then it has to be something constant. Now where can we fit in something constant and get the job done?
What else can we learn from Union-Find algorithm? Well at least one thing: to represent connected components, we can use an array. Elements in the same component share the same component id (same integer) or we represent them as a tree. This also suggests how to represent a tree in an array or when you have a tree, what data structure you can use to hold it.
Code on github: