Blog @ WhiteHedge

Importance of DevOps – Explained using Instagram Case Study


“It's hard to convince people unless they land in trouble, and the best way to learn is from experience.”

Today we all know about Instagram and are fascinated by the number of people who use the application every day. If you aren't, take a look at the stats below:

In 2015, Instagram had 400 million users all over the world, who uploaded 80 million photos and videos a day.

It is hard to believe how well Instagram scaled, given that it started with just two developers in 2010. Let's dive into their story and try to understand how they learned their lessons.

In 2010, just before Instagram launched, its two developers (founders Mike and Kevin) were wondering how many downloads they would get on the first day.

They had 25,000 downloads on the first day. And it didn't stop there: they reached 100,000 users in their first week, and all they had for infrastructure was a server with less computing power than a MacBook Pro. They soon called their hosting provider to ask for another server, only to learn that it would take around 2-4 days to provide one. Given how unpredictable Instagram's growth had been in its very first week, they knew that waiting that long for every new server would not work. This is when they decided to switch to Amazon Web Services (AWS). With AWS they could bring up new servers as soon as load increased, and the bonus was that whenever load dropped, they could stop servers and reduce costs.

Then, in 2012, came the Android app for Instagram, and it was a much-anticipated one. Over a million new people joined Instagram in the first 12 hours after launch, an incredible response. So Instagram kept growing and making all the noise, until one day Instagram was DOWN. A quick check showed that Amazon Web Services was down: a huge storm had hit Virginia, and half of Instagram's instances had lost power. The next hours were very tedious, as the team had to rebuild the whole infrastructure almost from scratch, one server at a time. This was when the team understood how important it was to automate their infrastructure. Automation would not only save time but also let them work more effectively, since there would be less manual intervention.

So the following year, Instagram automated its infrastructure using Chef. This also made it easier for new team members to get up to speed, as there were fewer fragile shell scripts.

There is one more lesson to take away: never make your infrastructure dependent on a single region, because unpredictable things can always happen.

Written by Anand Karwa – WhiteHedge DevOps Team
