Data Storage for Social Networks: A Socially Aware Approach
As the success of Facebook, Twitter, and LinkedIn shows, online social networks (OSNs) have become ubiquitous, offering novel ways for people to access information and communicate with each other. As their popularity continues to grow, scalability becomes an important issue for any OSN that wants to serve a large number of users. Storing user data for the entire network on a single server quickly creates a bottleneck, so more servers are needed to expand storage capacity and lower the data request traffic per server. Adding servers is only the first step toward scalability; the next is to determine how best to distribute the data across them. This problem has been widely studied in the distributed systems and database literature.

OSNs, however, represent a different class of data systems. When a user spends time on a social network, the data she requests most are her own and those of her friends; in Facebook or Twitter, for example, these are the status updates posted by the user and by her friends. This so-called social locality should be taken into account when choosing the servers that store these data, so that when a user issues a read request, all of the relevant data can be returned quickly and efficiently. Social locality is not a design factor in traditional storage systems, where data requests are always processed independently. Even in today’s OSNs, data partitioning schemes do not yet consider social locality. These schemes rely on distributed hash tables (DHTs), using consistent hashing to assign users’ data to servers. The random nature of DHT placement leads to weak social locality, which has been shown to result in poor performance under heavy request loads.
Data Storage for Social Networks: A Socially Aware Approach reviews the current literature on data storage for online social networks and discusses new methods that take social awareness into account when designing efficient data storage.
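
The tension between hash-based placement and social locality described above can be illustrated with a minimal Python sketch. It is an illustration only and not a method taken from the book: the class ConsistentHashRing, the helper functions, the replicas parameter, and the choice of MD5 as the hash are all assumptions made for the example. The sketch places each user's data with consistent hashing, then measures how many friend pairs share a server and how many servers one read request must contact.

```python
import hashlib
from bisect import bisect_right


def _hash(key: str) -> int:
    """Map a string to a large integer position on the hash ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical consistent-hashing ring with virtual nodes per server."""

    def __init__(self, servers, replicas=50):
        # Each server owns `replicas` points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, user: str) -> str:
        # The first ring point clockwise from the user's hash owns the data.
        idx = bisect_right(self._points, _hash(user)) % len(self._ring)
        return self._ring[idx][1]


def social_locality(friendships, placement) -> float:
    """Fraction of friend pairs whose data are stored on the same server."""
    if not friendships:
        return 0.0
    same = sum(1 for u, v in friendships if placement[u] == placement[v])
    return same / len(friendships)


def servers_contacted(user, friends_of, placement) -> int:
    """Number of distinct servers a read for `user` and her friends touches."""
    return len({placement[user]} | {placement[f] for f in friends_of[user]})


if __name__ == "__main__":
    # Assumed toy data: three servers and a small friendship graph.
    ring = ConsistentHashRing(["server-1", "server-2", "server-3"])
    friends_of = {
        "alice": ["bob", "carol"],
        "bob": ["alice"],
        "carol": ["alice", "dave"],
        "dave": ["carol"],
    }
    friendships = [("alice", "bob"), ("alice", "carol"), ("carol", "dave")]
    placement = {user: ring.server_for(user) for user in friends_of}
    print("social locality:", social_locality(friendships, placement))
    print("servers for alice's read:",
          servers_contacted("alice", friends_of, placement))
```

Because the hash ignores the friendship graph, friends share a server only by chance, so a read for a user and her friends tends to be spread over several servers as the number of servers grows. A socially aware scheme would instead choose the placement with the friendship graph in mind, which the same two measurements could be used to evaluate.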