Thursday, March 9, 2017

I am a terrible blogger... that is all

I really am terrible at blogging. Apparently I have ignored my blog for over a year. Now I am debating whether to delete it all or pick it up again...

My first inclination is to leave it and learn from it; if anything, it makes me laugh.

Wednesday, July 29, 2015

Fighting with Java SSL and Confluence

The Error:

Connection test failed. Response from the server:
ldaps.example.com:636; nested exception is javax.naming.CommunicationException: ldaps.example.com:636 [Root exception is javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No subject alternative names matching IP address 172.0.0.20 found]

Some notes:
* Does not happen when using java 1.8.0_45 (Java 8 u45)
* I ran into the problem when using java 1.8.0_51 (Java 8 u51)
* Running Atlassian Confluence 5.8.4 on EL6
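
My working theory (unconfirmed) is that u51 started enforcing endpoint identification on LDAPS connections, so a certificate that lacks an IP subject alternative name fails once Confluence connects by IP address. Two ways out: point Confluence at a hostname covered by a DNS SAN, or reissue the cert with the IP as a SAN. A quick Ruby sketch of what the cert needs (the hostname and IP are this post's examples, not real hosts):

```ruby
require 'openssl'

# Build a self-signed cert whose SANs cover both the hostname and the raw IP.
# Java only matches an IP-based URL against an "IP:" SAN entry, so connecting
# to ldaps://172.0.0.20 needs the IP entry below.
key  = OpenSSL::PKey::RSA.new(2048)
cert = OpenSSL::X509::Certificate.new
cert.version    = 2  # X.509 v3, required for extensions
cert.serial     = 1
cert.subject    = OpenSSL::X509::Name.parse('/CN=ldaps.example.com')
cert.issuer     = cert.subject
cert.public_key = key.public_key
cert.not_before = Time.now
cert.not_after  = Time.now + 365 * 24 * 3600

ef = OpenSSL::X509::ExtensionFactory.new
ef.subject_certificate = cert
ef.issuer_certificate  = cert
cert.add_extension(
  ef.create_extension('subjectAltName', 'DNS:ldaps.example.com,IP:172.0.0.20')
)
cert.sign(key, OpenSSL::Digest.new('SHA256'))

# Print the SAN entries the directory's cert would need to carry.
san = cert.extensions.find { |e| e.oid == 'subjectAltName' }
puts san.value
```

Swapping the real directory cert in for this throwaway one, the same SAN check shows whether the IP entry is actually present.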

Updates To Follow:




Monday, January 5, 2015

What do you mean "it's in production"?

Short Story: 

Too many groups use the word "production", and that word changes meaning and risk depending on the group in question.

Long Story:

Our usage of the term "production" leads to some issues, as it changes context based on audience. Operations can look at "production" as a matter of state, whereas Development may see it as a function or environment. To confuse matters further, Ops may also refer to it as an environment given its history of working with Dev.

An example of the issue is demonstrated by a common exchange:
Jane: I show server prodX is down, what's going on?
John: It's OK, server prodX is not in production.
Jane may be reasonably confused by John's statement. What does John mean by "the server is not in production"?


  1. "prodX" is not in the production environment. (Maybe the node name is mislabeled or misunderstood.)
  2. "prodX" is in the production environment but is not in a production state.
  3. "prodX" is not in a production state and is not in a production environment.

This also applies to the simple statement:
The code has been deployed to production.
This could mean:

  1. The code is servicing customer requests.
  2. The code is located in the production environment.
    1. It is servicing customer requests.
    2. It is not servicing customer requests.
  3. The code is not in a production environment but it is taking requests.

From an Ops perspective there are three options for any given service outage:

  • SEV1/2/3: Drop everything (severity determines response time).
  • SEV4: Don't wake me; I will get it when I get in.
  • REQ#: Nothing is broken; you should send in a request.

Operations service response (State \ Environment):
  Not Active \ Non-Prod => REQ#
  Not Active \ Prod     => SEV4
  Active     \ Non-Prod => SEV4
  Active     \ Prod     => SEV1/2/3

Developers on the other hand have a near reverse perspective.

  • P1: Project is in active development.
  • P2: Project is waiting on resources.
  • SEV#: Help to make sure the application keeps working. There are constraints on what we can do.

Developers' response (State \ Environment):
  Not Active \ Non-Prod => P2
  Not Active \ Prod     => P1
  Active     \ Non-Prod => P1
  Active     \ Prod     => SEV#

When you merge the views you will see that there is a conflict for Not Active \ Prod, Active \ Non-Prod, and Active \ Prod.
In the case of "Not Active \ Prod" and "Active \ Non-Prod", the Ops teams give low priority to supporting the Development teams' resources. This can slow the delivery of fixes and features to production, but it conflicts with Ops' immediate role of keeping things working. Likewise, because the Dev teams' hands are usually tied in "Active \ Prod" environments, the Dev teams are slow to help, seeing it as Ops' job to control those environments, even though it is the preceding chain of work that feeds production.
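
The merged view can be sketched mechanically. A small Ruby sketch (the response codes come from the two matrices above; the numeric urgency ranks are my own reading of them, not anything formal):

```ruby
# Response matrices from the post, keyed by [state, environment].
OPS = {
  [:not_active, :non_prod] => 'REQ#',
  [:not_active, :prod]     => 'SEV4',
  [:active,     :non_prod] => 'SEV4',
  [:active,     :prod]     => 'SEV1/2/3',
}
DEV = {
  [:not_active, :non_prod] => 'P2',
  [:not_active, :prod]     => 'P1',
  [:active,     :non_prod] => 'P1',
  [:active,     :prod]     => 'SEV#',
}

# Urgency rank per response code (1 = drop everything, 3 = queue it).
RANK = {
  'SEV1/2/3' => 1, 'SEV4' => 2, 'REQ#' => 3,   # Ops
  'P1'       => 1, 'SEV#' => 2, 'P2'   => 3,   # Dev
}

# A cell conflicts when the two groups assign it different urgency.
conflicts = OPS.keys.select { |k| RANK[OPS[k]] != RANK[DEV[k]] }
conflicts.each do |state, env|
  puts "#{state} \\ #{env}: ops=#{OPS[[state, env]]} dev=#{DEV[[state, env]]}"
end
```

Only Not Active \ Non-Prod comes out clean; the other three cells are exactly the contested ground.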

How does DevOps resolve this issue?

There are two issues with understanding "what is production".

  • How do you deal with scope and work priority?
  • How do you deal with semantics?

How do you deal with scope and work priority?

In some ways DevOps flips the priorities of both Dev and Ops. The problem areas are what get DevOps focus, and they are where Developers and Operations must meet. The mission for each group stays the same; however, structure needs to be added so the groups work together in those contentious areas.
The Ops team needs to understand the work of the Devs. They need to see the features and be active in understanding why a function is or is not monitored, what the impact of a missing function is, and what the business drivers for a service are. All of those things help in determining risk, which Ops deals with regularly. The Ops team doesn't change how they respond to SLAs for Prod and Non-Prod, but they should work with the Developers on seeing what is happening in those spaces.
Devs need visibility into what Ops is doing and dealing with. From a service perspective they need access to logs, monitors, and trends, which should all be jointly reviewed by Ops and Dev, as they may directly impact Dev's mission. Both groups need to create a constant feedback loop that pushes each team toward better work quality and ultimately better service for the business.

How do you deal with semantics?

The issue of semantics is difficult. The Sami languages of Norway, Sweden, and Finland have 180+ snow- and ice-related words. This is needed because the distinctions are important. The more I see of companies dealing with this issue, the more I think the same is true of ITSM and DevOps. However, I do not know what that word should be or how it should be structured, as both state and environment are important to IT, but each has a different context for different groups.

Thursday, September 4, 2014

Learned something new on my way to testing https posts

This is a quick post on testing HTTPS posts. If you were doing something simple and just sending HTTP posts, then you could use netcat (nc) with something like
$ sudo nc -l 80 < resp_200.txt
The file "resp_200.txt" simply has a line "HTTP 200 OK". Netcat will open port 80 and respond to whatever connects to it with the text from that resp_200.txt file. It will dump the HTTP post it received to the screen. Nice way to test your post. Ah, but what do you do when you are sending a post to HTTPS?

OpenSSL can be used to provide some netcat type functionality. You can see a detailed view of this from Wsec "USING OPENSSL AS A NETCAT REPLACEMENT".

Quick how to is:

Create self signed cert
$  sudo openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout mycert.pem -out mycert.pem

Now use openssl to make a listener on port 443

$ sudo openssl s_server -accept 443 -cert mycert.pem

In my case I'm using Ruby to post to https similar to this example on Stackoverflow but with HTTPS instead of HTTP.
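
A minimal version of that Ruby post, pointed at the local openssl listener (the URL path and form fields here are made up for the example, and VERIFY_NONE is only acceptable because the listener is using a throwaway self-signed cert):

```ruby
require 'net/http'
require 'openssl'
require 'uri'

uri  = URI('https://localhost:443/test')      # the openssl s_server listener
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl     = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE  # self-signed test cert only!

# Build a form-encoded POST; set_form_data fills in the body and Content-Type.
req = Net::HTTP::Post.new(uri.path)
req.set_form_data('name' => 'value')          # made-up form fields

# res = http.request(req)                     # uncomment with the listener running
# puts res.code
```

With the s_server window open you will see the request line, headers, and `name=value` body scroll by as soon as the request is uncommented and sent.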

After you post you will see a bunch of text showing you the HTTP post information you sent.

This is a nice way to test your post code before you start hitting your production site.

Friday, August 8, 2014

Time to look at Ansible

I think it is time I took a look at Ansible. From the ad-hoc perspective it seems to fit in nicely and probably works a little better than the "batchcmd" bash script I use to run commands across a list of hosts.

I am not sure I agree with Ansible's design concept. The marriage of configuration management and ad-hoc execution is prone to problems. Puppet Labs and R.I. Pienaar, creator of Mcollective, take a pretty strong stance against ad-hoc execution of scripts or code because some crazy things can happen. It's not so bad on one or two systems, but when you do something on hundreds and they all run into a problem, that could be an issue.

The nice thing about automation is it enables you to do good things faster across many more systems.
The problem with automation is it enables you to do bad things faster across many more systems.

Both Puppet and Ansible are declarative in nature, so they do not require the item to change, only that the item "becomes" a finished state. However, given Ansible's "push" philosophy, it is also looking for "immediate consistency". This may work for small deployments, but in larger systems it becomes problematic, as I can break everything from the start. Puppet follows the "eventual consistency" model, which, when properly accounted for, leads to large-scale services that deploy, as opposed to small-scale ones that for whatever reason will not reach the state you want when you think you want it. It also gives me an "idio-second" to change something back because I broke the first 5 nodes that checked in and not all 100.

Ansible does have a "pull" option which does allow for "eventual consistency", but this raises the question of why have another configuration management system? That brings us back to: what does Ansible give me that Puppet does not? At this point it gives me the ability to run ad-hoc commands across multiple systems. Puppet already gives me configuration management, and Mcollective gives me safer orchestration.

After trying Ansible out I may change my mind, and there is nothing that fundamentally says you cannot use both tools. Ansible being agentless has its advantages. However, the Puppet agent has saved me from my own stupidity when I broke SSH and OpenSSL. It was nice to have Puppet correct my screw-up after locking everyone out of SSH. I could see where it would be nice to have Ansible save me when I inevitably break Puppet doing something silly.

UPDATE: 2014-08-11T19:33-50
Ansible allows you to pass --ask-sudo-pass to prompt for your sudo password on the target systems. This means that as long as your user has sudo rights to run the command in question, you can do what you like. I am not sure how --ask-sudo-pass stores your password, though.

Thursday, July 24, 2014

Too many or too few STEM degrees?

Are there too many or too few STEM degrees and are they being utilized correctly? Infoworld's Patrick Thibodeau and Sharon Machlis note that "For 74 percent, STEM degrees lead to non-STEM jobs"

In my opinion geography plays a huge part in many people's decision to stay put. Some of this depends on specialty, but for the most part the coasts are looking to decrease their costs and are trying to find degrees in other locations. However, there are many cases where people simply do not want to move to the coasts. People with STEM degrees are not stupid and generally weigh the benefits of moving to California, Texas, or New York / DC.

I have had several colleagues choose to move to the east or west coast. Some have even moved back after a stint because they simply didn't like something about those places. There is also the cost of living, which can change drastically. Try finding 10 or even 5 acres of land near San Francisco, Seattle, or New York. You won't, not without a sizable commute. You can do that in the Midwest (well, not Chicago) and to some degree in Texas, but even Texas is starting to show strain due to traffic. As much as I complain about traffic in Kansas City, it is nothing compared to Texas, San Francisco, New York, or Seattle.

I have seen a number of tech workers wishing to work remotely. This has a strong pull for many, but it is a new paradigm that many companies are not equipped culturally or technically to handle. However, that is changing. I have seen startups and some large-name places coming to understand that work can be done remotely, and this will remove the geographic issue to some extent. In the past I have told several companies that I would love to work with them but am in no position, or have no desire, to move to their city / state.

This of course has its own trade-offs. Companies in the Midwest are just now realizing the career options that companies like Google, Linkedin, Facebook, or Amazon offer STEM employees. Otherwise they have been traditionally limited in advancement options, and frankly most people I know who are successful in STEM are generally driven to advance.

In short, STEM will see better utilization when geography is removed from the equation, as geographic markets play a huge role in the decision to move to the various STEM "meccas" in the US.

Monday, June 23, 2014

RedHat Subscription model and why OEL is easier.

DISCLAIMER: Some colleagues got me started on this rant, and make no mistake this is a "rant", so I fully intend to blame them for this lengthy deluge of vehemently worded information.

I often work on keeping our repository of various Linux distributions available for my job. Tools like cobblerd are a great help in this effort. Having our repositories readily available and being able to automate much of our deployment makes deploying Ubuntu, CentOS, and OEL systems a snap. I can have a new system built from scratch in less than 30 minutes, and I can do many of them in parallel. (NOTE to self: really, really need to start looking at RAZOR)

Now this is all fine and dandy, but there is something missing from this setup: Redhat (RHEL). Why, you ask? RHEL repositories require you to be subscribed to them in order to get updates. You cannot mirror them the way you can mirror Ubuntu, CentOS, or OEL. If you want a local mirror you are ardently encouraged to use Satellite server. Which is great until you start figuring out the cost of the privilege of mirroring Redhat's package services. There are other problems with this process, mainly that the mechanics of registering a license on a server with Redhat is an automation headache. It is very possible that I am missing something in this process, but frankly I do not see anything that allows this to be automated, because I have about 6 billion licensing combinations to choose from. (Yes, that is an exaggeration, because this is a rant.) The astute among you will think, "Well, Redhat has come up with using the answers file to make this process work automatically..." Please note this requires Satellite.
Satellite sucks.

"But why do you think Satellite is so bad?" First, the cost of Satellite is something like $10K. You might say this isn't much for an enterprise, and you would be right, it is not. I have no problem with the cost of the server itself, but wait, there is more. $10K gets you the server, but you need another subscription for each server managed by it. It's not a little bit, either. The last quote I had was close to $200 to $250 per server. I hope the pricing has changed, and I know the advent of the virtual machine and having RHEL be your Hypervisor has mucked with how things are, but really? $200 per server + $10K for Satellite + hardware needed to run it locally + subscription = I can pay for a good Sysadmin and do it all with CentOS or Ubuntu or Debian. This has not even touched on its usability, or lack thereof. Redhat would do well to hire a UI designer and process engineer to streamline the workflow for managing its systems.

I want to like Redhat. I really, really do, but OEL hosts the current repo for you to mirror. Its licenses are cheaper and it's basically the same as RHEL. Yes, Redhat made it first and Oracle just added their secret sauce for Oracle stuff, but OEL is like CentOS with support. Oracle's Linux engineers are some decent guys and I find them fairly easy to work with. (Oracle, your application support sucks. Sucks so bad I would probably only call support if I was drunk. Regretfully I don't drink, so it makes your support process very hard.)

Redhat has great people, but they really need to figure out this pricing and licensing/subscription game. As it stands now, it is more pain than it is worth.