What I learned about Microsoft doing DevOps: Part 1 - Moving to Agile & the Cloud

2017-11-10

What would you do if you had the opportunity to have a look inside the kitchen of a Michelin Star restaurant? Not only would you be able to look around, you would even be able to meet members of the team, who would explain what they did to earn their Michelin Star. I bet that if you have a love for cooking, you would jump at the opportunity to learn from the best.

Recently I had this opportunity. Not with a Michelin Star restaurant but with Microsoft. The team that builds Visual Studio Team Services and Team Foundation Server, Microsoft’s DevOps products, invited me to come to Redmond for a whole week of intense seminars and discussions on how they build VSTS, what they’ve learned and how they’ve grown into a real high-performance organization. As a Microsoft ALM | DevOps MVP, I already had a lot of insider knowledge, but this week blew me away. The content was fantastic and I learned a lot.

This week of training was part of the Microsoft DevOps FastTrack Airlift (yeah, they still love long names). The FastTrack program allows an enterprise to buy all the resources they need to move to DevOps and get two weeks of intense consultancy to help them get started, all in one easy package. As of now I’m qualified to give these two weeks of consultancy and offer enterprises an inside look into the Microsoft kitchen.

In the following series of blog posts I want to share the Microsoft story with you. Of course I won’t be able to cover everything that we discussed during the week but I’ll share some of my own favorite highlights. Hopefully they will inspire you on your DevOps journey.

This is part 1 of a series on DevOps at Microsoft.

The need for change

Have you worked with Team Foundation Server 2005? 2008? 2010? The time between each of those releases was a couple of years. At that time Microsoft used rigorous planning to build their software. Because they were going to put in two or three years of work, it wasn’t unreasonable to spend many months on planning. After this, the first landmark that had to be reached was code complete. That was the moment when all features were implemented and a big party was thrown. After this, the second phase began: test & stabilize. After that, beta releases were shipped, bugs were fixed and finally RTM was hit. The product was now in the hands of the customer and work on the new version had already started.

Imagine being a customer during that period who tried one of the beta releases and didn’t like one of the features. Microsoft would listen to your feedback but since code complete was already done, the only thing they could do was schedule the feature for the next release. Which was in a couple of years.

In the era when this schedule was used, there was nothing wrong with it. Customers were used to real boxed products with a CD or DVD that you installed. However, the world has changed. You can now buy software in App Stores on your mobile, use Software as a Service applications on a subscription basis and do this from everywhere with everyone.

Figure 1 The old way of planning, building and releasing TFS

Maybe you recognize your team or organization in this picture. Planning, building and finally shipping features takes a long time. Customer feedback takes a long time to incorporate. And somehow, your competitors seem to be closing in on you. What did Microsoft do to turn this around and what can you learn from it?

Firms today experience a much higher velocity of business change. Market opportunities appear or dissolve in months or weeks instead of years.

Diego Lo Giudice and Dave West, Forrester February 2011

Moving to a cloud cadence

Team Foundation Server is an on-premises product. Microsoft acknowledged that although certain customers want to use an on-premises product, more and more companies are looking to the Cloud. To serve those customers, Microsoft launched a hosted version of TFS in June 2012 (the name changed a couple of times but let’s stick with VSTS to make it easier). Work on VSTS started in August 2010.

Figure 2 Important milestones in the development of VSTS

In August 2010, the VSTS team also started with Agile. They moved to a three-week sprint cadence and planned on developing VSTS and TFS this way. In the beginning, however, their thinking was still that of a boxed product. The team planned major and minor updates and installed these on the service when ready. In the end, all updates were major and some of them were very hard to deploy.

Take for example December 2011. Three months earlier, the service went public and a lot of customers started playing with it. Now take the following blog post: https://blogs.msdn.microsoft.com/devops/2011/12/08/team-foundation-service-december-update/. The post proudly states that the updates of the previous three months were mostly small bug fixes. This was to be the first big update that delivered new features. In the end, this update went very badly and took a week to complete.

The big lesson from this is: big deployments are hard. If you deploy a lot of code at once, a lot can go wrong. Doing more frequent and smaller deployments makes deployments much easier. But how do you make sure your code is tested and stable enough to actually be released to customers? Is that possible in such a short sprint?

Figure 3 Sprints of three weeks that actually ship code at the end of the sprint

Changing the organization chart

Every customer I speak to who is starting with Agile has questions on how they should organize testing. Making sure that code is planned, developed and tested within a sprint is a challenge for many teams. And Microsoft was no stranger to this.

When starting with Agile, Microsoft was worried about the quality they would deliver. Because of this they scheduled stabilization sprints. Every fifth sprint the teams would work on fixing technical debt and bugs and make sure their code stayed in good shape. This led to the behavior that some teams would postpone all bug fixing to this fifth sprint and would have a growing set of bugs during the first four sprints. Other teams would fix their bugs during each sprint. Unfortunately for those teams, they were assigned to help the other teams during the stabilization sprint.

Microsoft took radical steps to improve. Microsoft was used to having separate roles in their teams:

Program Management
Development
Test

Program Managers are responsible for what the team builds. They interact with customers and set out the roadmap for the team.

This organization structure was keeping them from becoming more Agile. Manual testing was costing a lot of time and fitting manual testing work into a sprint proved to be impossible. Because of this they changed to the organization structure shown in Figure 4 and removed the stabilization sprint. Instead of having separate test and development roles, they introduced a shared engineering role. Every engineer is responsible for the whole feature they develop.

Figure 4 Engineering became a combined test and development role

This meant that both developers and testers had to learn new skills. Developers needed to learn more about testing practices, automated test frameworks and writing high-quality code in general. Testers needed to become better developers. Yes, this meant that some people couldn’t make this change and were assigned to different parts of the company or were even let go. In the end, having a full team responsible for the complete feature set they deliver is key to DevOps.

Having a cross-discipline team is very important. Many teams are organized around specific layers of an application architecture. For example, you have a database team and a UI team. These teams have to work together to deliver a single feature. Handoffs between teams lead to delays and waste. A cross-discipline team has all the required knowledge to develop a feature from start to finish. Having all the required roles to develop a feature avoids handoffs and allows a team to optimize their workflow.

Figure 5 Cross-discipline teams can deliver a feature without any dependencies on others

Each team within the VSTS product group consists of 10-12 people and is fully self-managing. Teams have a clear charter and goals. They know what they are going to work on and the value they’re going to deliver to the customer. To make sure that the team becomes high performing, the team is kept intact for 12-18 months. The team isn’t dismantled after this period. Instead, people get to choose if they want to move to a new team. Each team also shares a physical team room.

Figure 6 A VSTS team room

Bugs are kept under control by a simple rule: the number of engineers on the team * 5 = the maximum number of bugs you’re allowed to have. If your bug count exceeds your bug cap, stop working on new features and fix your bugs first.
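To make the rule concrete, here is a minimal sketch in Python. This is my own illustration, not something Microsoft showed: the function names bug_cap and may_start_new_features are hypothetical, and only the multiplier of 5 and the "stop when you exceed the cap" behavior come from the rule above.

```python
# Multiplier taken from the bug cap rule described above.
BUGS_PER_ENGINEER = 5


def bug_cap(engineers: int) -> int:
    """Return the maximum number of open bugs a team may carry."""
    if engineers < 0:
        raise ValueError("engineers must be non-negative")
    return engineers * BUGS_PER_ENGINEER


def may_start_new_features(engineers: int, open_bugs: int) -> bool:
    """Return True while the open bug count does not exceed the bug cap."""
    return open_bugs <= bug_cap(engineers)


if __name__ == "__main__":
    # Hypothetical team of 10 engineers with 51 open bugs.
    print(bug_cap(10))
    print(may_start_new_features(10, 51))
```

For a team of 10 engineers the cap works out to 50 bugs, so with 51 open bugs the team would stop feature work and fix bugs first.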

Conclusion

The first big steps in Microsoft’s transformation were the adoption of Agile, a true cloud cadence and the implementation of cross-discipline teams consisting of a program manager and engineers. I think the move to cross-discipline teams is one of the most important changes to make. Having a cross-discipline team be fully responsible for the code they write and deliver removes silos, improves quality and increases speed. But of course the story is not over yet. In part 2 I want to show you how Microsoft does its planning and how they make sure so many teams stay aligned with each other.

You can now find some of the original video content at https://www.visualstudio.com/learn/devops-at-microsoft/