Schematizing an Instagram-Style System for Likes, Hashtags, and Followers in SQL
If you’ve been following my previous blog posts, you’ll know I’ve been making my way through Stephen Grinder’s SQL and PostgreSQL: The Complete Developer’s Guide course on Udemy. For the past week, Grinder’s course has described how to model certain features found on the app Instagram in an SQL schema. As many of these ideas were on the theoretical side, I really enjoyed this section and wanted to summarize some of the takeaways I found while completing this section.
A user of Instagram might be surprised that the number of likes a post has is not stored as a number, but instead the number of likes a post has is the end result of an aggregate query, or is derived data. You might be wondering why Likes, whether for a post or for a comment is implemented this way.
Let’s think about the functionality of the Like system on Instagram. A user likes a post only once, the poster cannot like their own post, and a user can unlike a post that they liked. Now imagine if the number of likes a post has were stored as a number. How would the database check if a user only liked a post once? How would the database make sure the poster did not like their own post? In order to add for these checks each ‘like’ would be registered individual by the user_id who liked the post and the post_id of the liked post. This way there can be a verification that there must be unique combinations of user_id and post_id, and, through a join, that a post’s poster id does not match with the liker’s user_id.
A few years ago, Instagram added the functionality of liking comments. One question that arises is whether comment likes should on the same table as photo likes. While using two separate tables would be an easy solution, if we wanted to querying for interactions, to provide user recommendations, may pose some difficulties; such queries would need to perform a join each time.
Having these two types of likes on the same table would remedy this. This can be achieved in two ways,either by having one id column and a type column (polymorphic associations) or by having comment_id and post_id as two separate columns and having a check that at least one of these columns must be NULL, through the following check:
Add CHECK of (COALESCE((post_id)::BOOLEAN::INTEGER, 0)+COALESCE((comment_id)::BOOLEAN::INTEGER, 0))=1
Coalesce is a handy function that returns the first object in a grouping, so if post_id or comment_id is NULL, coalesce would return 0. Grinder’s solution using this option as it allows simpler queries to be run.
Unlike “Likes”, Grinder utilized two separate tables to model this relationship, a table of hashtags comprising the id, created_at timestamp, and the hashtag text and a second table of where these hashtags appear, comprising of the foreign key from the first table and the id of the post in each row. You might wonder as I did why this approach was chosen rather than having a single table with the text of the hashtag. If a single table were employed, the text of the hashtag would need to be repeated every time the hashtag was used, and strings take up more memory than integers do, so to have two separate tables would actually save more memory than a single table would.
Seeing how “Likes’ and “Hashtags” were implemented, you probably already have a good idea of how followers can be implemented. A separate Followers table would be created with the following columns: leader_id and follower_id, each of which would be a foreign key to the user table. A piece of derived data like follower count would be provided by an aggregate query on this data with a filter for the leader id.