As probably the most well known social network available today, Facebook’s engineers have to manage the challenges that come with the ‘connected world’ they so often speak about. It is Software Engineering principles that enable them to provide us with a fast and intuitive service. At the heart of all systems lies data structures and algorithms and this aim of this post is to highlight just what’s going on ‘under the hood’ of Facebook.
You will often hear Facebook talk about your ‘Social Graph’ or the ‘Open Graph’ framework. This isn’t just some marketing name, this Graph is at the heart of Facebook. We couldn’t connect with and discover friends without it, we couldn’t like our friends posts without it. The Graph is the data structure at the heart of Facebook.
I should note I have no connection to Facebook, nor a bias towards them. The purpose of this blog post is to simply explain in real terms what Facebook really is underneath the various user interfaces. Therefore, there will be sections that don’t relate exactly to Facebook’s implementation, but the ideas behind Graphs will remain true.
Data Structure – Graph
What are Graphs
Look at the graph below, it could represent your social graph on Facebook. It shows who you are friends with, what services you listen to music on, what you like, as well as what you cook! However, no matter what the graph is displaying, it shares two common properties, Vertices and Edges.
So what are Vertices, or in the singular form, a Vertex? In the graph below, you, your friends, the objects you like, these are all Vertices. That leaves Edges, an edge is a connection between two Vertices. In our example, the line between two people is an edge, the line between a person and an object is an edge. It is a connection between two Vertices.
So this Vertex is you, me, the things we like, it is every ‘object’ on Facebook. Vertices can contain information, such as our name, our age, the location of a shop, a service description. Therefore, Facebook see’s you as a Vertex in the software realm, and you contain certain information that cannot be interacted on by the rest of the community. However, there is things about you that can be interacted with by the community, such as where you live, the posts you create, as well as the place you work. Therefore, not everything about you is contained within the Vertex, anything that your friends can interact with are also Vertices, this is the distinction.
We have all these Vertices, but as we know, we can like our friends posts, we can say we work at a certain company, even say that we have visited a certain location, or are listening to some music. A Graph provides just the element to connect the Vertices, they are Edges. Similar to Vertices, Edges contain information that details the connection between the two Vertices, i.e. I am ‘Friends’ with this person, I ‘like’ this persons post, I ‘shared’ this post. They can contain additional information, i.e. I am ‘Friends’ with this Person, they are also my Brother.
So how do we decide the importance of these edges, I may of liked a friends post a year ago, but at this moment in time, I don’t want it to appear on my news feed, but at the same time, I don’t want it to disappear from my graph. I may want to look back in the future at my interactions with that friend. We need some of ranking system.
Facebook Page Rank
So with all of these Vertices and Edges, how does Facebook decide what information to display in our Feed? Well, as we mentioned at the start of the blog, software is made up of data structures and algorithms. We have covered the fact the Facebook uses the Graph data structure, and we will now cover one algorithm they execute on the graph.
Facebook Edge Rank is an algorithm that determines what Vertices to display in our news feed, depending how ‘close’ we are to the vertex (i.e. it’s a post from my brother), how important Facebook feels the Vertex is (i.e. Facebook feels images should be more prominent than traditional posts in your feed), as well as when it was posted. For example, in the graph to the right, you can see that I have two friends, Fr1 and Fr2, I’ve already liked one of my Fr2’s posts (P1), but I’ve had no interaction with Fr1’s posts. Therefore, at an educated guess level, I probably care more about Fr2’s posts than Fr1’s.
As I interact with Fr2’s posts, Edge Rank is improving my ‘Affinity’ score with Fr2. Therefore, when Facebook looks at my friends to see what information it should put in my Feed, it will start with Fr2 because my Affinity with them is greater than Fr1‘s. So we now want to take some information from Fr2 to put in my feed, I can see post 2 (P2) has two comments whereas as P1 has none (obviously it’s only me that liked what they said!), which post should appear top of my feed? If we assigned liking a post the value 1, and someone who I am not friends with commenting on a post the value 0.4, we can sum up the ‘weight’ to determine the weight rank. P1 has a rank of 1, whereas P2 has a rank of 0.8. Ok, so P1 has a greater weight rank than P2, but what if P1 was created 2 years ago, I certainly don’t want it appearing in my news feed! Therefore I will assign it some value so that it is lower than a post created recently (1/Time Since Creation) Lets assign P1 a Time Decay value of 0.005 and P2 a value of 0.5. So if we multiply each individual value together for the sum of the edges, that is the Affinity, Weight, and Time Decay we will get a rank for P1 of 0.005 (1 Affinity * 1 Weight * 0.005 Time Decay) and P2 will get a rank of 0.4 (1Affinity * 0.8(0.4*2) Weight * 0.5 Time Decay). Therefore, P2 has the greater rank and should appear at the top of my news feed.
The algorithm will continue to iterate, assigning vertices a rank and determining where they will appear in my feed.
This is a very basic example, but should give you an idea of how the Graph data structure and implementation of algorithms serves you the information you care most about.
In the next blog post of this series, I will be going over how graph navigation algorithms can help Facebook recommend friends, locations, etc, to you. I will also go into a little more detail about graph Paths and Circuits, Cyclic and Acyclic graphs, as well as two important types of paths and circuits, Hamilton and Eulerian. I will also touch on the idea of connected graphs and how two disconnected graphs in Facebook become connected, and how that correlates to greater activity in your news feed.