Docker Deployment Nightmares
Earlier this week, a client of ours began a 36-hour deployment. Maintenance pages went up and the site went dark. This particular overhaul took down the internal billing and CRM systems for over 24 hours, during which no sales or customer account changes could take place. The deployment, handled by another of our client's IT vendors, is powered by Docker and containerized services. It made me wonder what they could possibly be doing behind the scenes that would take that long. For a quick comparison, Google, AWS, and DigitalOcean all support rolling and blue/green deployments that complete in minutes, not days. Google Kubernetes Engine, for instance, gives you a console in which you can deploy an update to a portion of your servers to test prior to a full rollout. Tools like our Perspective settings management tool allow for near-instantaneous feature rollout and updates without code deployments.
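For a sense of what near-zero-downtime looks like in practice, here is a minimal sketch of a rolling update policy using Docker Swarm's Compose `deploy` settings. The service and image names are placeholders I made up for illustration; the keys themselves (`replicas`, `update_config`, `parallelism`, `delay`, `order`) are standard Compose/Swarm deploy options.

```yaml
# Hypothetical service definition, deployed with `docker stack deploy`.
services:
  billing-api:                       # placeholder service name
    image: example/billing-api:2.0   # assumed image tag
    deploy:
      replicas: 4                    # several copies stay up at all times
      update_config:
        parallelism: 1               # replace one replica at a time
        delay: 10s                   # wait between batches
        order: start-first           # start the new container before stopping the old
```

With a policy like this, pushing a new image tag swaps containers one at a time while the remaining replicas keep serving traffic, so the maintenance page never needs to go up.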
In the case of this particular deployment, I like to imagine what this vendor could possibly be doing to deploy Docker changes over 36 hours. Perhaps they are hosting their Docker instances on the backs of a herd of large, majestic, artificial elephants. These nomadic, graceful, yet elusive creatures must first be corralled from their home in the desert and led to a nearby charging station. This can take upwards of 12 hours. Upon arrival, the extremely delicate work of disassembling their hide to expose their server racks begins: another 6 hours. Next, the physical server blades can each be replaced with new blades containing the updated software versions, over 12 hours. Then the beautiful beasts can be reassembled and released back into the wild, totaling 36 hours. Rollback plans? These are magical creatures. There are no rollback plans.
More seriously, that last part is completely true. Even in the event of catastrophic system failures, the plan is to troubleshoot on the spot until resolved. Major errors are prioritized into another release within a week or so. Minor errors get backlogged and prioritized as if they were any other feature request. This is the case even though the client owns the code, servers, and DevOps delivery pipeline. They simply outsource it to this vendor.
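For contrast, the rollback capability this vendor lacks is largely built into the tooling they are already using. A hedged sketch of Docker Swarm's automatic rollback settings follows; the service name is hypothetical, but `failure_action`, `monitor`, and `rollback_config` are standard Swarm deploy keys.

```yaml
# Hypothetical Swarm deploy config with automatic rollback on failed updates.
services:
  billing-api:                       # placeholder service name
    image: example/billing-api:2.0   # assumed image tag
    deploy:
      update_config:
        failure_action: rollback     # revert automatically if the update fails
        monitor: 30s                 # watch each new task this long before calling it healthy
      rollback_config:
        parallelism: 2               # roll back two tasks at a time
```

Swarm also supports reverting a service to its previous definition manually with `docker service rollback <service>`, so "troubleshoot on the spot until resolved" is a policy choice, not a technical limitation.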
In the case of the mythical AI elephants, I can start to comprehend why it would take 36 hours to deploy an update on a technology stack designed for zero-downtime deployments. And that leads to the critical message of this post, one that all leaders inside and outside of IT should be aware of: if your deployments are not near zero downtime, you are not doing it right. In addition, if your vendor is not being straightforward, or claims "proprietary" when you ask how code you paid for is being deployed to servers you own, it should raise some eyebrows.
Reducing deployment times comes down to basic DevOps principles that have been well established for years. If you have any questions on how to get there, feel free to reach out and we would be glad to provide some guidance. Contact us here.