Discover tools and frameworks in the DevOps landscape


I’m going to need you to learn all of these by Monday morning. Thanks!


Awesome new tools and frameworks are released every day. This is an open and transparent attempt at aggregating them. The entire source code and list of tools can be found at github.com/devopsbookmarks, and you are encouraged to contribute anything you come across.



AWS Best Practices – Enterprise Perspective


Amazon AWS has changed the IaaS game for startups and growth companies. Several practices deserve attention when adopting and implementing AWS services, and clients piloting cloud initiatives do best to avoid the common pitfalls. Below is a top-ten list of best practices we have learned during our AWS deployments; it is a good first step in that effort, and we look forward to expanding it. Readers are more than welcome to comment, suggest modifications, and add practices they follow during their own rollouts and implementations.

1.   Choose your VPC infrastructure carefully

a. As you move into the AWS environment, often the first thing firms want to do is create a VPC. The question is what type of architecture is appropriate for you, and the answer is driven by the intended use case. Internet-based organizations often opt for a VPC containing public and private subnets. This keeps services that don’t require direct internet access in a segregated private subnet while placing public-facing services in the public subnet.

b.   Existing enterprise firms intending to burst into the cloud often don’t intend to use public-facing services. They may opt instead for a private-only subnet and create a secure tunnel between their enterprise and the public cloud.

c.   The architecture with the greatest flexibility tends to be a public/private subnet design: the public subnet is available if necessary and can remain unpopulated if not needed.

2.   CIDR block selection matters because a VPC’s primary CIDR block can’t be changed after it is created.

a. Ensure your VPC address space doesn’t overlap with your corporate address space. For internet-based firms this is less of an issue, but for enterprise firms it can prevent the need to rebuild the environment after a significant investment in time and resources.

b. While AWS allows VPC CIDR blocks as large as /16, don’t waste space: several very large subnets reduce your flexibility as needs arise to support multi-AZ services or other eventual requirements. On the other hand, don’t overly constrain your environment.
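As a rough illustration of the two checks above, the following sketch uses Python’s standard ipaddress module to test a candidate VPC range against corporate ranges and to carve it into equally sized subnets. The function names and the example address ranges are assumptions made for illustration; they do not come from any AWS tooling.

```python
import ipaddress
from itertools import islice


def overlapping_ranges(vpc_cidr: str, corporate_cidrs: list[str]) -> list[str]:
    """Return the corporate ranges that overlap the candidate VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return [
        cidr
        for cidr in corporate_cidrs
        if vpc.overlaps(ipaddress.ip_network(cidr))
    ]


def carve_subnets(vpc_cidr: str, new_prefix: int, count: int) -> list[str]:
    """Return the first `count` subnets of size /new_prefix inside the VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    # subnets() yields lazily, so only the requested number is materialized.
    return [str(subnet) for subnet in islice(vpc.subnets(new_prefix=new_prefix), count)]
```

For example, with hypothetical corporate ranges 10.0.0.0/8 and 192.168.0.0/16, a candidate VPC of 10.20.0.0/16 is reported as overlapping 10.0.0.0/8, while 172.31.0.0/16 is not. Carving 172.31.0.0/16 into /20 subnets leaves room for one subnet per availability zone and tier without consuming the whole range.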

3.   Environment isolation is an important consideration: the same isolation that’s present in the physical environment should be present in the cloud environment. The cloud gives firms more flexibility in implementing this isolation. To control cost, development, integration/test, and production environments can be registered under separate accounts, ensuring accurate billing and cost controls.

4.   Security: Use security groups to isolate services and management rules

a. Separate public traffic from private using subnets

b. Create an SSH gateway as an entry point for SSH communication

c. Create a VPN tunnel for management access

5.   VPC/Enterprise Integration

When integrating AWS into an existing enterprise environment, planning will ensure minimal rework. Often the key consideration is assigning IP ranges to AWS that do not overlap with the corporate space, so that routing between the corporate LAN and AWS is seamless.

6. IAM: The primary AWS account should be treated like the traditional root user in Linux systems.

a. Create a separate account specifically for administration, with the minimum permissions necessary to perform the task at hand.

b. Create separate accounts for each user or service to ensure audit traceability.

c. Assign permissions to groups, and add users to those groups; this minimizes duplication of effort.

d. All users should use strong passwords; refer to your corporate password policy. Cloud passwords should be as secure as those protecting the corporate infrastructure.

e. Don’t share credentials; sharing wrecks audit traceability and violates corporate policy. Use roles to assign permissions.

f.  Rotate credentials on a regular basis, the same way corporate credentials are rotated.

7.   Disaster Recovery: Remember that disaster recovery is designed to get you back up after a failure. Traditional enterprise environments are often limited by the question of whether to build a duplicate system for recovery or to rely on vendor SLAs for data recovery. In the cloud, enterprise customers are able (at a lower cost) to duplicate their environments across availability zones, regions, and vendors to create an exceptionally resilient service offering.

a. Do deploy across multiple availability zones.

b. Do use an automated configuration management (CM) tool to maintain your configurations.

c. Do create snapshots of your volumes.

d. Do run drills to ensure everything operates as it was architected.

e. Do run drills more than once.

f. Do ensure everyone relevant is aware of the procedure.

g. Do architect your environment with sufficient redundancy to minimize the need for recovery efforts.

8.   Security Groups: Security groups, like traditional firewalls, should be set with the least permission necessary to accomplish the mission.

a. Decide on a security group methodology, for example creating groups to manage access to services (e.g., allow access to ports 80 and 443 for nodes that will be web servers).

b. Create an SSH gateway (bastion) node and group:

i. Only allow external SSH access to the SSH gateway.

ii. Only allow local SSH access from within a security group or from the gateway node; this minimizes an attacker’s ability to traverse the networks between zones.

c. Define an enterprise naming convention and stick to it.

d. Use the ability to assign multiple security groups to a single asset (up to 5 groups per asset).
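One way to keep a security group methodology honest is to review the rules as data. The sketch below is a minimal illustration, not an AWS API: the IngressRule type, the find_violations function, and the group names are all hypothetical. It flags any rule that is open to the whole internet on a port the policy does not explicitly allow for that group, which covers the "external SSH only to the gateway" rule above.

```python
from dataclasses import dataclass

ANYWHERE = "0.0.0.0/0"  # CIDR meaning "any source on the internet"


@dataclass(frozen=True)
class IngressRule:
    """Hypothetical, simplified view of one inbound rule."""
    group: str   # security group the rule belongs to
    port: int    # destination port being opened
    source: str  # CIDR block or name of another security group


def find_violations(rules, allowed_public_ports):
    """Return rules open to ANYWHERE on a port the policy does not allow.

    `allowed_public_ports` maps a group name to the set of ports that
    may be exposed publicly; groups not listed may expose nothing.
    """
    violations = []
    for rule in rules:
        if rule.source != ANYWHERE:
            continue  # traffic restricted to a CIDR or group is out of scope here
        if rule.port not in allowed_public_ports.get(rule.group, set()):
            violations.append(rule)
    return violations
```

With a policy that allows ports 80 and 443 on a web server group and port 22 on the bastion group only, a database group rule opening port 22 to 0.0.0.0/0 is reported, while the same port opened to the bastion group is not.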

9.   Naming Conventions: AWS assets should be named so they can be readily identified. We suggest standardizing on the purpose of the instance, its environment, its region, and its sequence number. An example might be hadoop_dn_prod_usw1_001, which represents a production Hadoop data node in the US-West 1 region.

a. When naming an AMI, it’s good practice to include the creation date as part of its name, along with a complete description. This will minimize confusion when selecting an AMI for new instances.

b. When naming security groups and key pairs, continue to use the convention of purpose, environment, region, and function. For example, db_prod_usw1_key or ws_prod_usw1_sg.
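A naming convention is easier to stick to when names are generated rather than typed. The helpers below are a small sketch of the purpose, environment, region, sequence pattern described above. The function names are hypothetical, and the YYYYMMDD date suffix for AMI names is an assumption, since the text only says the creation date should be included.

```python
from datetime import date


def instance_name(purpose: str, environment: str, region: str, sequence: int) -> str:
    """Build a name such as hadoop_dn_prod_usw1_001."""
    return f"{purpose}_{environment}_{region}_{sequence:03d}"


def parse_instance_name(name: str) -> tuple[str, str, str, int]:
    """Split a name back into (purpose, environment, region, sequence).

    The purpose may itself contain underscores, so the split is taken
    from the right-hand side.
    """
    purpose, environment, region, sequence = name.rsplit("_", 3)
    return purpose, environment, region, int(sequence)


def ami_name(purpose: str, environment: str, region: str, created: date) -> str:
    """Build an AMI name ending in the creation date (assumed YYYYMMDD format)."""
    return f"{purpose}_{environment}_{region}_{created:%Y%m%d}"
```

Parsing from the right keeps multi-word purposes such as hadoop_dn intact, at the cost of requiring that environment and region codes never contain underscores.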

10. Elastic Load Balancing: Use multiple availability zones to balance traffic in the event of an environment failure (this can still happen within the cloud).

a. Use Route53 to balance traffic between regions; this is not supported by ELB.

b. Use ELBs for more than just web traffic. Most TCP-based protocols can be supported by the ELB; ensure your service can support it.

c. ELB closes idle connections after a timeout (60 seconds by default); ensure your application sends data over the socket before then to prevent the session from timing out.
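To make the idle-timeout advice concrete, the sketch below derives an application-level heartbeat interval from whatever idle timeout the load balancer is configured with. The function names and the default safety factor of 0.5 are assumptions chosen for illustration; the point is only that the application should send data comfortably before the timeout elapses.

```python
def heartbeat_interval(idle_timeout_s: float, safety_factor: float = 0.5) -> float:
    """Return how often (in seconds) the application should send data.

    The interval is a fraction of the load balancer's idle timeout so
    that a delayed heartbeat still arrives before the connection is cut.
    """
    if idle_timeout_s <= 0:
        raise ValueError("idle_timeout_s must be positive")
    if not 0 < safety_factor < 1:
        raise ValueError("safety_factor must be between 0 and 1 (exclusive)")
    return idle_timeout_s * safety_factor


def is_connection_at_risk(seconds_since_last_byte: float, idle_timeout_s: float) -> bool:
    """Return True once the connection has been silent for the full timeout."""
    return seconds_since_last_byte >= idle_timeout_s
```

The heartbeat itself needs to carry application data, such as a protocol-level ping message, rather than relying on the operating system to keep the socket alive.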




Immutable Infrastructure: 6 questions to 6 experts


Immutable Infrastructure is not a new concept. While there are many examples of successful implementations, it would be a lie to say that the recent hype around containers in general, and Docker in particular, hasn’t made the concept more widespread.

As it often happens in these cases, we see a lot of different interpretations of its meaning, benefits, challenges, and adoption paths. We asked 6 experts who have been thinking, writing and implementing Immutable Infrastructures to share their experience by answering 6 questions on the topic:

1) What does Immutable Infrastructure mean to you?
2) What’s your position on Immutable Infrastructure and why?
3) What are the main benefits you see/care about?
4) Biggest adoption challenges/things that are not there yet in your opinion?
5+6) Starting from scratch is (relatively) easy. What about those with existing systems? Any hints on how others could get started moving towards an Immutable Infrastructure?

The 6 experts

  • Kief Morris: Continuous Delivery Practice Lead for Europe with ThoughtWorks in London, specializing in tools, practices, and processes for the Continuous Delivery of software
  • Andrew Phillips: heads up product management at XebiaLabs, building tools to support DevOps and Continuous Delivery
  • Florian Motlik: CTO and Founder of Codeship where he makes sure the System is up and running as well as getting it into as many hands as possible
  • Julian Dunn: engineering manager at Chef, where he helps to build products on top of the core Chef product portfolio
  • Matthew Skelton: Continuous Delivery specialist, DevOps enthusiast, and an Operability nut. He set up and co-runs both LondonCD and PipelineConf. He is co-founder and Principal Consultant at Skelton Thatcher Consulting
  • Ben Butler-Cole: likes to build systems rather than software. He has spent the last twelve years looking for ways to avoid unnecessary work. When all else fails he likes to write code in languages that haven’t been designed to hurt him. He currently works for Neo Technology.

8 Characteristics of a DevOps Organization


A great view of how CenturyLink Cloud division talks to the 8 characteristics of a DevOps organization. I like some of the points they added to what we have already been discussing. I especially like the VP roles 😉 — but seriously, point 8 is key.

How does CenturyLink Cloud Division do DevOps

  1. Simple reporting structure. Pretty much everyone is one step away from our executive leadership. We avoid complicated fiefdoms that introduce friction and foster siloed thinking. How are we arranged? Something like this:
    Business functions like marketing and finance are part of this structure as well. Obviously as teams continue to grow, they get carved up into disciplines, but the hierarchy remains as simplistic as possible.
  2. Few managers, all leaders. This builds on the above point. We don’t really have any pure “managers” in the cloud organization. Sure, there are people with direct reports. But that person’s job goes well beyond people management. Rather, everyone on EVERY team is empowered to act in the best interest of our product/service. Teams have leaders who keep the team focused while being a well-informed representative to the broader organization. “Managers” are encouraged to build organizations to control, while “leaders” are encouraged to solve problems and pursue efficiency.
  3. Development and Operations orgs are partners. This is probably the most important characteristic I see in our division. The leaders of Engineering (that contains development) and Service Engineering (that contains operations) are close collaborators who set an example for teamwork. There’s no “us versus them” tolerated, and issues that come up between the teams – and of course they do – are resolved quickly and decisively. Each VP knows the top priorities and pain points of the other. There’s legitimate empathy between the leaders and organizations.
  4. Teams are co-located. Our Cloud Development Center in Bellevue is the cloud headquarters. A majority of our Engineering resources not only work there, but physically sit together in big rooms with long tables. One of our developers can easily hit a support engineer with a Nerf bullet. Co-location makes our daily standups easier, problem resolution simpler, and builds camaraderie among the various teams that build and support our global cloud. Now, there are folks distributed around the globe that are part of this Engineering team. I’m remote (most of the time) and many of our 24×7 support engineers reside in different time zones. How do we make sure distributed team members still feel involved? Tools like Slack make a HUGE difference, and regular standups and meetups make a big difference.
  5. Everyone looks for automation opportunities. No one in this division likes doing things manually. We wear custom t-shirts that say “Run by Robots” for crying out loud! It’s in our DNA to automate everything. You cannot scale if you do not automate. Our support engineers use our API to create tools for themselves, developers have done an excellent job maturing our continuous integration and continuous delivery capability, and even product management builds things to streamline data analysis.
  6. All teams responsible for the service. Our Operations staff is not responsible for keeping our service online. Wait, what? Our whole cloud organization is responsible for keeping our service healthy and meeting business need. There’s very little “that’s not MY problem” in this division. Sure, our expert support folks are the ones doing 24×7 monitoring and optimization, but developers wear pagers and get the same notifications if there’s a blip or outage. Anyone experiencing an issue with the platform – whether it’s me doing a demo, or a finance person pulling reports – is expected to notify our NOC. We’re all measured on the success of our service. Our VP of Engineering doesn’t get a bonus for shipping code that doesn’t work in production, and our VP of Service Engineering doesn’t get kudos if he maintains 100% uptime by disallowing new features. Everyone buys into the mission of building a differentiating, feature-rich product with exceptional uptime and support. And everyone is measured by that criteria.
  7. Knowledge resides in team and lightweight documentation. I came from a company where I wrote beautiful design documentation that is probably never going to be looked at again. By having long-lived teams built around a product/service, the “knowledge base” is the team! People know how things work and how to handle problems because they’ve been working together with the same service for a long time. At the same time, we also maintain a documented public (and internal) Knowledge Base where processes, best practices, and exceptions are noted. Each internal KB article is simple and to the point. No fluff. What do I need to know? Anyone on the team can contribute to the Knowledge Base, and it’s teeming with super useful stuff that is actively used and kept up to date. How refreshing!
  8. We’re not perfect, or finished! There’s so much more we can do. Continuous improvement is never done. There are things we still have to get automated, further barriers to break down between team handoffs, and more. As our team grows, other problems will inevitably surface. What matters is our culture and how we approach these problems. Is it an excuse to build up a silo or blame others? Or is it an opportunity to revisit existing procedures and make them better?

DevOps can mean a lot of things to a lot of people, but if you don’t have the organizational culture set up, it’s only a superficial implementation.


Continuous Delivery is a Competitive Advantage


“Non-contributing zero.” That’s how Louis C.K. referred to the guy next to him on the airplane who griped about the Wi-Fi going out only two minutes after he found out the plane even HAD Wi-Fi.

It’s funny and it’s sad and it’s true. Customers expect awesome things all the time and right away, and it doesn’t take long for the awesome to become commonplace.

That’s why, now more than ever, the ability to turn an idea into working software in short order is your competitive advantage. Even when you’re way ahead of the competition, faster delivery on software projects means you capitalize more quickly on market trends and customer desires, and even fix security issues faster.

Imagine if you could innovate and deliver software to your customers in weeks rather than months. Now imagine if your competition could, and you couldn’t.

Companies that are taking on the challenge of moving to a Continuous Delivery model are finding it easier than ever before to deliver innovation and value quickly, but Continuous Delivery is neither the end, nor the beginning of the journey.

What is Continuous Delivery?

The term continuous delivery refers to a set of practices that make it possible to rapidly, reliably and repeatedly release software to customers with low risk and with minimal manual intervention. These practices include automated regression testing, automated build integration, configuration management and continuous deployment.

Companies that seek to implement Continuous Delivery will have some significant obstacles to overcome in the process. In fact, they are the very reason you need Continuous Delivery. Obstacles like long, complicated testing and release processes, low code-confidence and tightly coupled architectures all work together to perpetuate longer-than-necessary software delivery cycles.

In many organizations, releasing software to production is a big, monthly event that is the culmination of weeks of planning, meetings and coordination. Because there are so many integration points and so much that can go wrong, every precaution is taken to make sure that the software to be deployed not only works, but that it won’t break something else that was already working. These can be long, stressful days of error-prone, manual processes even when nothing goes wrong.

Continuous Delivery, on the other hand, makes releases to production much easier, faster and less risky because you’ve automated nearly everything about your build, testing and deployment processes. The repeatability that this automation provides helps to ensure that new code works, existing code still works, environments are configured correctly and that your applications and services still play well together.

This is possible because, with a good Continuous Integration and Configuration Management system in place, every time a developer checks code in to source control, that code is verified against a suite of new and existing automated tests. You’ll know within minutes (not days) if the code would not have worked in production.

However, any Continuous Integration system is only as valuable as its automated tests: they must be well written and must adequately exercise the right areas of your code. The right combination of different types of automated (and manual) tests is fundamental to the success of Continuous Delivery. But how do you ensure your automated regression tests are reliable enough?

This is where Test-Driven Development (TDD) enters the picture. TDD is the software development practice in which developers write automated tests before writing any production code. Although the practice has been around for over a decade, it is still the focus of much confusion and debate. However, when done properly, the many benefits of TDD pay off. One of those benefits is a reliable set of automated regression tests…another important step on the path to Continuous Delivery.
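As a small illustration of the test-first rhythm, the sketch below uses Python’s standard unittest module. The tests appear first in the file, as they would be written first, followed by the minimal production code that makes them pass. The apply_discount function and its rules are a hypothetical example, not something taken from a real project.

```python
import unittest


# Step 1: the tests are written first and describe the desired behavior.
class ApplyDiscountTest(unittest.TestCase):
    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(200.0, 10), 180.0)

    def test_zero_percent_returns_original_price(self):
        self.assertEqual(apply_discount(50.0, 0), 50.0)

    def test_rejects_percent_out_of_range(self):
        with self.assertRaises(ValueError):
            apply_discount(50.0, 150)


# Step 2: the simplest production code that makes the tests pass.
def apply_discount(price: float, percent: float) -> float:
    """Return the price reduced by the given percentage (0 to 100)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (100 - percent) / 100


def run_suite() -> unittest.TestResult:
    """Run the tests above and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Once the suite passes, it stays in the repository and runs on every check-in, which is how the regression suite accumulates as a side effect of normal development.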

Getting your projects on the path to Continuous Delivery is not a quick fix. It requires a commitment to incremental improvements to your project life cycle management, developers’ skills, cross departmental communication and organizational culture. Thankfully, these challenges have been met and overcome time and again. The solutions engineers at AIM Consulting have the skill and experience it takes to get your organization “continuously delivering” innovative and valuable software solutions to your customers, so you’ll never be accused of being a zero.