Monday, January 5, 2015

What do you mean "it's in production"?

Short Story: 

Too many groups use the word "production", and its meaning and risk change depending on the group in question.

Long Story:

Our usage of the term "production" leads to some issues because its meaning changes with the audience. Operations can look at "production" as a matter of state, whereas Development may see it as a function or environment. To make matters more confusing, Ops may also refer to it as an environment, given its history of working with Dev.

An example of the issue is demonstrated by a common exchange.
Jane: I show server prodX is down, what's going on?
John: It's ok, server prodX is not in production.
Jane may be reasonably confused by John's statement. What does John mean when he says the server is not in production?

  1. "prodX" is not in the production environment. (Maybe the node name is mislabeled or misunderstood.)
  2. "prodX" is in the production environment but is not in a production state.
  3. "prodX" is not in a production state and is not in the production environment.

This also applies to a simple statement such as:
"The code has been deployed to production."
This could mean:

  1. The code is servicing customer requests.
  2. The code is located in the production environment.
    1. It is servicing customer requests.
    2. It is not servicing customer requests.
  3. The code is not in a production environment but it is taking requests.
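
One way to make the ambiguity visible is to record environment and state as two separate fields instead of a single "production" flag. The sketch below is only an illustration: the names `Environment`, `State`, and `Deployment` are hypothetical, and it assumes that "Active" means "servicing customer requests".

```python
from dataclasses import dataclass
from enum import Enum


class Environment(Enum):
    PROD = "Prod"
    NON_PROD = "Non-Prod"


class State(Enum):
    # Assumption: "Active" means the thing is servicing customer requests.
    ACTIVE = "Active"
    NOT_ACTIVE = "Not Active"


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record that keeps the two meanings of "production" apart."""
    name: str
    environment: Environment
    state: State

    def describe(self) -> str:
        # Spell out both axes so nobody has to guess which one is meant.
        return (f"{self.name}: environment={self.environment.value}, "
                f"state={self.state.value}")


# Reading 2 of John's statement: in the production environment,
# but not in a production state.
prodx = Deployment("prodX", Environment.PROD, State.NOT_ACTIVE)
print(prodx.describe())
```

With both fields stated, John's answer to Jane would have to say which of the two he means.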

From an Ops perspective there are three options for any given service outage:

  • SEV1/2/3: Drop everything (Severity determines response time)
  • SEV4: Don't wake me; I will get to it when I get in.
  • REQ#: Nothing is broken; you should send in a request.

Operations service response for State \ Environment
Not Active \ Non-Prod => REQ#
Not Active \ Prod => SEV4
Active \ Non-Prod => SEV4
Active \ Prod => SEV1/2/3
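
The Ops matrix above can be written as a small lookup table. This is a minimal sketch, not a real ticketing integration; the names `OPS_RESPONSE` and `ops_response` are made up for illustration, and the keys are simply the (state, environment) pairs from the matrix.

```python
# (state, environment) -> Ops response, copied from the matrix above.
OPS_RESPONSE = {
    ("Not Active", "Non-Prod"): "REQ#",
    ("Not Active", "Prod"): "SEV4",
    ("Active", "Non-Prod"): "SEV4",
    ("Active", "Prod"): "SEV1/2/3",
}


def ops_response(state: str, environment: str) -> str:
    """Return the Ops response class for a given state and environment."""
    try:
        return OPS_RESPONSE[(state, environment)]
    except KeyError:
        # Refuse to guess: an unknown combination is a labeling problem.
        raise ValueError(f"unknown combination: {state!r} / {environment!r}")


print(ops_response("Not Active", "Prod"))
```

Note that the lookup needs both values; the single word "production" is not enough to pick a row.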

Developers, on the other hand, have a nearly reversed perspective.

  • P1: Project is in active development.
  • P2: Project is waiting on resources.
  • SEV#: Help to make sure the application keeps working. There are constraints on what we can do.

Developer response for State \ Environment
Not Active \ Non-Prod => P2
Not Active \ Prod => P1
Active \ Non-Prod => P1
Active \ Prod => SEV#

When you merge the views you will see that there is a conflict for Not Active \ Prod, Active \ Non-Prod, and Active \ Prod.
In the case of "Not Active \ Prod" and "Active \ Non-Prod", the Ops teams will give low priority to supporting resources for the Development teams. This can slow the delivery of fixes and features to production, but raising that priority conflicts with Ops' immediate role of keeping things working. Likewise, because the hands of the Dev teams are usually tied in "Active \ Prod" environments, the Dev teams are slow to help, seeing it as Ops' job to control those environments, even though it is the previous chain of work that feeds production.
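
The merge of the two views can be sketched in a few lines. The two matrices are copied from the text; the urgency flags are an assumption made for this illustration (Ops treats only SEV1/2/3 as urgent, Dev treats only P1 as urgent), and all names are hypothetical.

```python
# (state, environment) -> response, copied from the two matrices in the text.
OPS = {
    ("Not Active", "Non-Prod"): "REQ#",
    ("Not Active", "Prod"): "SEV4",
    ("Active", "Non-Prod"): "SEV4",
    ("Active", "Prod"): "SEV1/2/3",
}
DEV = {
    ("Not Active", "Non-Prod"): "P2",
    ("Not Active", "Prod"): "P1",
    ("Active", "Non-Prod"): "P1",
    ("Active", "Prod"): "SEV#",
}

# Assumption for this sketch: which labels each team treats as urgent work.
OPS_URGENT = {"REQ#": False, "SEV4": False, "SEV1/2/3": True}
DEV_URGENT = {"P2": False, "P1": True, "SEV#": False}


def conflicting_cells():
    """List the (state, environment) cells where the teams disagree on urgency."""
    return [
        cell for cell in OPS
        if OPS_URGENT[OPS[cell]] != DEV_URGENT[DEV[cell]]
    ]


for state, environment in conflicting_cells():
    print(f"{state} \\ {environment}: Ops={OPS[(state, environment)]}, "
          f"Dev={DEV[(state, environment)]}")
```

Under that assumption the only cell where both teams agree is Not Active \ Non-Prod; the other three are the contested ones.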

How does DevOps resolve this issue?

There are two issues with understanding "what is production".

  • How do you deal with scope and work priority?
  • How do you deal with semantics?

How do you deal with scope and work priority?

In some ways DevOps flips the priority of both Dev and Ops. The problem area is what gets DevOps' focus, and it is where Developers and Operations must meet. The mission for each group stays the same; however, structure needs to be added so that the groups work together in those contentious areas.
The Ops team needs to understand the work of the Devs. They need to see the features and be active in understanding why a function is or is not monitored, what the impact of a missing function is, and what the business drivers for a service are. All of those things help in determining risk, which Ops deals with regularly. The Ops team doesn't change how it responds to SLAs for Prod and Non-Prod, but it should work with the Developers on seeing what is happening in those spaces.
Devs need visibility into what Ops is doing and dealing with. From a service perspective they need access to logs, monitors, and trends, all of which should be jointly reviewed by Ops and Dev, as they may directly impact Dev's mission. Both groups need to create a constant feedback loop that pushes each team toward better work quality and ultimately better service for the business.

How do you deal with semantics?

The issue of semantics is difficult. The Sami languages of Norway, Sweden, and Finland have 180+ words related to snow and ice. They are needed because the distinctions are important. The more I see of companies dealing with this issue, the more I think the same is true of ITSM and DevOps. However, I do not know what that word should be or how it should be structured, as both state and environment are important to IT but carry different context for different groups.