Eight lessons learned hacking on GitHub Pages for six months

by github

Believe it or not, just over a year ago, GitHub Pages, the documentation hosting service that powers nearly three-quarters of a million sites, was little more than a 100-line shell script. Today, it’s a fully independent, feature-rich OAuth application that effortlessly handles well over a quarter million requests per minute. We wanted to take a look back at what we learned from leveling up the service over a six-month period.

What’s GitHub Pages

GitHub Pages is GitHub’s static-site hosting service. It’s used by government agencies like the White House to publish policy, by big companies like Microsoft, IBM, and Netflix to showcase their open source efforts, and by popular projects like Bootstrap, D3, and Leaflet to host their software documentation. Whenever you push to a specially named branch of your repository, the content is run through the Jekyll static site generator, and served via its own domain.

Eating our own ice cream

At GitHub, we’re big fans of eating our own ice cream (some call it dogfooding). Many of us have our own personal sites hosted on GitHub Pages, and many GitHub-maintained projects like Hubot and Electron, along with sites like help.github.com, take advantage of the service as well. This means that when the product slips below our own heightened expectations, we’re the first to notice.

We like to say that there’s a Venn diagram of things that each of us is passionate about, and things that are important to GitHub. Whenever there’s significant overlap, it’s win-win, and GitHubbers are encouraged to find time to pursue their passions. The recent improvements to GitHub Pages, a six-month sprint by a handful of Hubbers, were one such project. Here’s a quick look back at eight lessons we learned:

Lesson one: Test, test, and then test again

Before touching a single line of code, the first thing we did was create integration tests to mimic and validate the functionality experienced by users. This included things you might expect, like making sure a user’s site built without throwing an error, but also specific features like supporting different flavors of Markdown rendering or syntax highlighting.

This meant that as we made radical changes to the code base, like replacing the shell script with a fully-fledged Ruby app, we could move quickly with confidence that everyday users wouldn’t notice the change. And as we added new features, we continued to do the same thing, relying heavily on unit and integration tests, backed by real-world examples (fixtures) to validate each iteration. Like the rest of GitHub, nothing got deployed unless all tests were green.

Lesson two: Use public APIs, and when they don’t exist, build them

One of our goals was to push the Pages infrastructure outside the GitHub firewall, such that it could function like any third-party service. Today, if you view your OAuth application settings, you’ll notice an entry for GitHub Pages. Internally, we use the same public-facing Git clone endpoints to grab your site’s content that you use to push it, and the same public-facing repository API endpoints to grab repository metadata that you might use to build locally.

For us, that meant adding a few public APIs, like the inbound Pages API and outbound PageBuildEvent webhook. There are a few reasons why we chose to use exclusively public APIs and to deny ourselves access to “the secret sauce”. For one, security and simplicity. Hitting public-facing endpoints with untrusted user content meant all page build requests were routed through existing permission mechanisms. When you trigger a page build, we build the site as you, not as GitHub. Second, if we want to encourage a strong ecosystem of tools and services, we need to ensure the integration points are sufficient to do just that, and there’s no better way to do that than to put your code where your mouth is.
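To make the webhook side of this concrete, here is a minimal sketch of how a third-party tool might summarize a page build notification. It assumes the payload carries a `build` object with `status`, `commit`, and `error.message` fields, which is how the page build webhook is commonly described; check the current webhook documentation before relying on those names. The function name `summarize_page_build` is made up for this example.

```python
from typing import Any, Dict


def summarize_page_build(payload: Dict[str, Any]) -> str:
    """Turn a page build webhook payload into a one-line summary.

    Assumed payload shape (verify against the webhook docs):
    {"build": {"status": ..., "commit": ..., "error": {"message": ...}}}
    """
    build = payload.get("build") or {}
    status = build.get("status", "unknown")
    # Shorten the commit SHA for display; tolerate a missing value.
    commit = (build.get("commit") or "")[:7]
    error = (build.get("error") or {}).get("message")

    if status == "errored" and error:
        return f"build {commit} failed: {error}"
    return f"build {commit} finished with status {status}"
```

A receiver would call this with the decoded JSON body of each delivery and forward the summary to chat, email, or a dashboard.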

Lesson three: Let the user make the breaking change

Developing a service is vastly different than developing an open source project. When you’re developing a software project, you have the luxury of semantic versioning and can implement radical, breaking changes without regret, as users can upgrade to the next major version at their convenience (and thus ensure their own implementation doesn’t break before doing so). With services, that’s not the case. If we implement a change that’s not backwards compatible, hundreds of thousands of sites will fail to build on their next push.

We made several breaking changes. For one, the Jekyll 2.x upgrade switched the default Markdown engine, meaning if users didn’t specify a preference, we chose one for them, and that choice had to change. In order to minimize this burden, we decided it was best for the user, not GitHub, to make the breaking change. After all, there’s nothing more frustrating than somebody else “messing with your stuff”.

For months leading up to the Jekyll 2.x upgrade, users who didn’t specify a Markdown processor would get an email on each push, letting them know that Maruku was going the way of the dodo, and that they should upgrade to Kramdown, the new default, at their convenience. There were some pain points, to be sure, but it’s preferable to set an hour aside to perform the switch and verify the output locally, rather than to push a minor change, only to find that your entire site won’t publish and to lose hours to frustration as you try to diagnose the issue.
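The check behind such a warning can be sketched roughly as follows. This is an illustration only, not the code GitHub ran: it scans the text of a site’s `_config.yml` for a top-level `markdown:` key using a naive line scan rather than a real YAML parser, and the function names are hypothetical.

```python
from typing import Optional


def configured_markdown_engine(config_text: str) -> Optional[str]:
    """Return the value of a top-level `markdown:` key, or None if absent.

    Deliberately naive: a line scan, not a YAML parser. Indented keys
    and commented-out lines are ignored.
    """
    for raw_line in config_text.splitlines():
        # Drop trailing comments, then trailing whitespace.
        line = raw_line.split("#", 1)[0].rstrip()
        if line.startswith("markdown:"):
            value = line[len("markdown:"):].strip().strip("'\"")
            return value or None
    return None


def needs_markdown_warning(config_text: str) -> bool:
    """True when the site relies on the default Markdown engine."""
    return configured_markdown_engine(config_text) is None
```

A site whose configuration contains `markdown: kramdown` has already made its choice and would get no email; a site with no such key would be warned on each push.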

Lesson four: In every communication, provide an out

We made a big push to improve the way we communicated with GitHub Pages users. First, we began pushing descriptive error messages when users’ builds failed, rather than an unhelpful “page build failed” error, which would require the user to either build the site locally or email GitHub support for additional context. Each error message let you know exactly what happened, and exactly what you needed to do to fix it. Most importantly, each error included a link to a help article specific to the error you received.

Errors were a big step, but still weren’t a great experience. We wanted to prevent errors before they occurred. We created the GitHub Pages Health Check and silently ran automated checks for common DNS misconfigurations on each build. If your site’s DNS wasn’t optimally configured, such as being pointed to a deprecated IP address, we’d let you know before it became a problem.
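As a rough illustration of one such check, the sketch below compares the addresses a custom domain resolves to against a list of deprecated IPs. The addresses are placeholders from the reserved documentation range, not real GitHub Pages IPs, and `dns_warnings` is a hypothetical name; the actual Health Check covers more cases than this.

```python
import ipaddress
from typing import Iterable, List

# Placeholder addresses from the RFC 5737 documentation range.
# These are NOT real GitHub Pages IPs; substitute the real list.
DEPRECATED_IPS = frozenset({
    ipaddress.ip_address("192.0.2.10"),
    ipaddress.ip_address("192.0.2.11"),
})


def dns_warnings(domain: str, resolved: Iterable[str]) -> List[str]:
    """Return one warning per resolved address that is deprecated."""
    warnings = []
    for text in resolved:
        addr = ipaddress.ip_address(text)
        if addr in DEPRECATED_IPS:
            warnings.append(f"{domain} points to deprecated IP address {addr}")
    return warnings
```

Running the check on each build and attaching any warnings to the build notification means the user hears about the problem while the site still works.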

Finally, we wanted to level up our documentation to prevent the misconfiguration in the first place. In addition to overhauling all our GitHub Pages help documentation, we reimagined pages.github.com as a tutorial quick-start, lowering the barrier for getting started with GitHub Pages from hours to minutes, and published a list of dependencies, and what version was being used in production.

This meant that every time you got a communication from us, be it an error, a warning, or just a question, you’d immediately know what to do next.

Lesson five: Optimize for your ideal use case, not the most common

While GitHub Pages is used for all sorts of crazy things, the service is all about creating beautiful user, organization, and project pages to showcase your open source efforts on GitHub. Lots of users were doing just that, but ironically, it used to be really difficult to do so. For example, to list your open source projects on an organization site, you’d have to make dozens of client-side API calls, and hope your visitor didn’t hit the API limit, or leave the site while they waited for it to load.

We exposed repository and organization metadata to the page build process, not because it was the most commonly used feature, but because it was at the core of the product’s use case. We wanted to make it easier to do the right thing — to create great software, and to tell the world about it. And we’ve seen a steady increase in open source marketing and showcase sites as a result.

Lesson six: Successful efforts are cross-team efforts

If we did our job right, you didn’t notice a thing, but the GitHub Pages backend has been completely replaced. Whereas before, each build would occur in the same environment as part of a worker queue, today, each build occurs in its own Docker-backed sandbox. This ensured greater consistency (and security) between builds.

Getting there required a cross-team effort between the GitHub Pages, Importer, and Security teams to create Hoosegow, a Ruby Gem for executing untrusted Ruby code in a disposable Docker sandbox. No one team could have created it alone, nor would the solution have been as robust without the vastly different use cases, but both products and the end-user experience are better as a result.

Lesson seven: Match user expectations, then exceed them

Expectations are a powerful force. Everywhere on GitHub you can expect @mentions and emoji to “just work”. For historical reasons, that wasn’t the case with GitHub Pages, and we got many confused support requests as a result. Rather than embark on an education campaign or otherwise go against user expectations, we implemented emoji and @mention support within Jekyll, ensuring an expectation-consistent experience regardless of what part of GitHub you were on.

The only thing better than meeting expectations is exceeding them. Traditionally, users expected about a ten to fifteen minute lag between the time a change was pushed and when that change would be published. Through our improvements, we were able to significantly speed up page builds internally, and by sending a purge request to our third-party CDN on each build, users could see changes reflected in under ten seconds in most cases.

Lesson eight: It makes business sense to support open source

Jekyll may have been originally created to power GitHub Pages, but since then, it has become its own independent open source project with its own priorities. GitHubbers have always been part of the Jekyll community, but if you look at the most recent activity, you’ll notice a sharp uptick in contributions, and many new contributors from GitHub.

If you use open source, whether it’s the core of your product or a component that you didn’t have to write yourself, it’s in your best interest to play an active role in supporting the open source community, ensuring the project has the resources it needs, and shaping its future. We’ve started “open source Fridays” here at GitHub, where the entire company takes a break from the day-to-day to give back to the open source community that makes GitHub possible. Today, despite their beginnings, GitHub Pages needs Jekyll, not the other way around.

The numbers

Throughout all these improvements, the number of GitHub Pages sites has grown exponentially, with just shy of three-quarters of a million user, organization, and project sites being hosted by GitHub Pages today.

GitHub Pages sites over time

But the number of sites tells only half the story. Day-to-day use of GitHub Pages has also seen similar exponential growth over the past three years, with about 20,000 successful site builds completing each day as users continuously push updates to their site’s content.

GitHub Pages builds per day

Last, you’ll notice that when we introduced page build warnings in mid-2014, to proactively warn users about potential misconfigurations, users took the opportunity to improve their sites, with the percentage of failed builds (and number of builds generating warnings) decreasing as we enter 2015.

GitHub Pages is a small but powerful service tied to every repository on GitHub. It’s deceptively simple, and I encourage you to create your first GitHub Pages site today, or if you’re already a GitHub Pages expert, tune in this Saturday to level up your GitHub Pages game.

Happy publishing!

What DevOps Needs to Know About Microservices

by Andrew Phillips

The term “microservices” is popping up everywhere in software development circles. Indeed, microservices are frequently touted as The Next Big Thing in terms of how to write applications to avoid the problems of complex, monolithic architectures.

From a DevOps perspective, the microservices discussion is usually related to Docker or other containers. Both share an emphasis on lightweight individual units that can be managed independently, so containers are often regarded as a natural implementation choice for a microservice architecture. But what are microservices actually?

Microservices constitute a software architectural style that approaches a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, such as an HTTP-based API. The services are small, highly decoupled and focus on providing a single “useful” business capability. Typically, microservices involve very little centralized management, may be written in different programming languages, and use different data storage technologies.

A good way to understand the microservice style is to compare it to the monolithic style, in which an application is built as a single unit. A classic example of this approach is the “standard” three-tier enterprise application, consisting of a client-side user interface, a database, and a server-side application. Changes made to monolithic applications can be both painful and costly as any change made to a small part of the application requires the entire monolith to be rebuilt and deployed.

Microservices clearly offer many advantages — notably the ability to make small changes to code quickly and efficiently, and with minimum risk. The services are organized around business capabilities, and they lend themselves to Continuous Delivery.

Some of these advantages include the following:

Small, Easy-to-understand Code

As a microservice app is responsible for only one thing, it requires little code, is easy to understand, and involves minimal risk when changes are made.

Easy-to-scale, Easy-to-deploy and Easy-to-discard

Size is important. Microservices technology makes it easy for developers to scale, deploy and discard apps and features. In contrast, a single change to a monolithic app requires changing the whole application.

Smaller Teams

It’s much easier and faster to work with a small team than a large one. Each small team can own a microservice and access other services via a high-level API.   

Facilitates System Resilience

If a monolithic application stops working, a lot of functionality will stop working. In contrast, if a microservices app stops working, only a small, specific piece of functionality will be lost.

However, microservices are not without serious challenges, some of them especially relevant from a DevOps perspective, such as coordinating interdependent deployments, fusing developer skills with technologies, and confronting disagreements between developers.

As developers break down monolithic applications and move complexity from within to between applications, obvious challenges come to the fore. Some of these include the following:

Coordinating Interdependent Deployments

Individual microservices are supposed to be independently deployable, but in order to provide any kind of useful functionality from the perspective of the user you typically need to get a whole set of microservices running.

Deploying microservices requires a new generation of tools that can manage a multitude of simple deployments, while tracking and managing their dependencies. These tools need to be able to bring everything together on time to deliver an app without being too rigid, for instance by supporting version ranges for services rather than fixing on individual versions. These tools should help developers to deploy different bundles of services easily, without having to ask an administrator to reconfigure the deployment tool.
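The idea of accepting a version range rather than pinning a single version can be sketched in a few lines. This is a simplified illustration, not any particular tool’s behavior: versions are assumed to be plain dotted integers with the same number of components, ranges are half-open (minimum inclusive, upper bound exclusive), and all names are hypothetical.

```python
from typing import Dict, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Parse a dotted numeric version such as "1.10.0" into (1, 10, 0)."""
    return tuple(int(part) for part in text.split("."))


def in_range(version: str, minimum: str, below: str) -> bool:
    """True if minimum <= version < below, compared numerically."""
    return parse_version(minimum) <= parse_version(version) < parse_version(below)


def unmet_dependencies(
    required: Dict[str, Tuple[str, str]],
    deployed: Dict[str, str],
) -> Dict[str, str]:
    """Map each service whose requirement is not met to a reason."""
    problems = {}
    for service, (minimum, below) in required.items():
        actual = deployed.get(service)
        if actual is None:
            problems[service] = "not deployed"
        elif not in_range(actual, minimum, below):
            problems[service] = f"{actual} outside [{minimum}, {below})"
    return problems
```

With a requirement such as `{"billing": ("1.2.0", "2.0.0")}`, any deployed billing release from 1.2.0 up to but not including 2.0.0 satisfies the dependency, so a patch release of one service does not force a redeploy of everything that depends on it.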

Planning and configuration should be done through a single interface with a real-time view of app deployment. Essentially, developers should seek to attain the same balance of speed and efficiency that Continuous Delivery offers.

Microservices Need Skilled Developers and Operators

While there are many potential productivity boosts to separating services, tracking all of the relationships between them is vital. The team needs to know the entire development landscape and be able to test live components together.

Testing isolated services is a simple enough proposition, but testing the entire user-facing business service can be problematic.

Developer Disagreements

There will be a lot of time-consuming debate and friction when developers butt heads about the pros and cons of keeping a monolithic stack versus embracing microservices: which applications to refactor, which to rewrite, which to leave alone, etc.

The worst-case scenario: developers lose sight of the overarching system goals, focus on the service they are working on, refuse to be flexible, and resist being part of an overall team.

Finding the right set of technologies for a microservices implementation will not be a quick or simple task. As this is a fast-moving, immature space, expect to spend a lot of time assessing where the greatest value lies for each team and the business as a whole. Delivering the greatest benefits may mean adopting a hybrid system that draws on and integrates different technologies.

Welcome to the ADC (After DevOps Connect) era of DevOps and Security


I was speaking with Britta Glade of RSA Conference after our DevOps Connect conference at RSA Conference Monday.  She congratulated us on putting together a great day of tracks and sessions (kudos to Gene Kim and Josh Corman).  But then she said something else that really struck me to my core.  She said after today there is no longer any question about security working with DevOps.

Think about it.  Three years after Josh and Gene first presented Rugged DevOps at RSA Conference. After so many papers, articles, presentations and tracks, it has finally sunk in.  Security will embrace DevOps, Security will be better because of DevOps and as importantly, DevOps will be better because of Security.

It is very fulfilling to know that this event we produced (along with Mark Miller and the Nexus Community) was what finally pushed this over the finish line. For me personally it really represents one of the fundamental reasons I was so attracted to DevOps.

It was 4 or so years ago that I first met Gene Kim and over a bottle or two of wine he explained to me what DevOps was about and what he was working on with the Phoenix Project.  While DevOps may seem primarily about Dev and Ops to some, for me it was about making security work better. It seemed like such a no brainer that of course security should embrace DevOps.

But it was not that easy.  The besieged security industry faced with a never ending barrage of breaches and a continually escalating threat environment could not accept that automation, acceleration, velocity would also allow us to shift security left, leaving us more secure and more compliant. Many security people dug in their heels and said no, this wouldn’t work and they didn’t take the time to really explore it.

Monday all of that changed.  Security folk came out in droves to see people like Jez Humble. Many people in the audience weren’t really familiar with Jez when he first took the stage, but they were enthralled with him by the time he stepped off. Jez finished to a standing-room-only crowd of close to 700. They heard Damon Edwards and Alex Honor talk. Though they were also unfamiliar to the crowd, their message resonated to the core.

When Julie Tsai of Walmart took the stage, the audience heard how a “unicorn” like Walmart used Agile and DevOps to be more secure. Terri Pots of Raytheon and Jessica Davita of Microsoft reinforced the message that security needs to embrace this approach. I loved Jessica’s security org chart.  Chris Corriere, who writes here on DevOps.com, and Dr. Aaron Cois of Carnegie-Mellon also had great sessions.

Then names familiar to the crowd, Gene Kim and Josh Corman, kicked it off with a great talk on Software Supply Chains.  Nick Galbreath had a great session, David Mortman delivered a terrific talk, Dan Cornell spoke on web app security, and more.  For me the perfect ending was when my friend Rich Mogull took the stage and demonstrated his Squirrel Monkey toolset. Rich’s scripts that he wrote himself showed how we could use automation to make our security better and easier.  People were stunned.

Throughout the day, the buzz that was coming down the big hallway was that there were some great things happening over at the DevOps tracks. People from the Cloud Security Alliance meeting next door were hopping over to check it out. Other people joined in.  By the end of the day we had given out all of the materials we had prepared.

By now I know you are wishing you were there.  The good news is the entire event was videotaped. We will have videos and slides of all of the presentations shortly so stay tuned.

We are already starting to plan next year’s event. Also, the call for speakers is now open for DevOps Connect: Rugged DevOps event @InfoSecurity Europe June 4th.  But after Monday DevOps and security will never be the same again.  No longer can security deny that they must be part of DevOps. We are and should be. We are now in the ADC (After DevOps Connect) era.

The new AWS DevOps Certification

AWS launched the AWS DevOps Engineer Professional Certification

DevOps (a combination of “development” and “operations”) is a software development methodology focusing on communication, information sharing, integration, and automation between software developers and other IT departments.

The goal of DevOps is to leverage this connectivity to speed up and improve a company’s software production. Good examples of software packages that support and promote DevOps implementation across data centers and cloud infrastructures are orchestration tools like Puppet, Distelli, Chef, and Ansible.

The AWS DevOps certification

As more organizations planning cloud deployments adopt the philosophies and practices of DevOps, AWS launched the AWS DevOps Engineer Professional Certification.

By requiring significant knowledge and practical experience to pass the exam, the AWS DevOps certification encourages a higher standard of practice at every stage of application development and deployment to the AWS platform.

The AWS DevOps Engineer – Professional exam page outlines the concepts you’ll need to master for this exam:

  • Implement and manage continuous delivery systems and methodologies on AWS.
  • Understand, implement, and automate security controls, governance processes, and compliance validation.
  • Define and deploy monitoring, metrics, and logging systems on AWS.
  • Implement systems that are highly available, scalable, and self-healing on the AWS platform.
  • Design, manage, and maintain tools to automate operational processes.

The AWS DevOps exam is obviously not for you if you’re new to AWS in general. They won’t even let you take the exam if you’re not already certified as an AWS Developer – Associate or AWS SysOps Administrator – Associate. And all of those certifications are built on the AWS Solutions Architect – Associate certification.

Beyond that, Amazon also strongly advises AWS DevOps candidates to ensure they possess the following general AWS knowledge:

  • AWS Services: Compute and Network, Storage and CDN, Database, Analytics, Application Services, Deployment, and Management.
  • Minimum of two years hands-on experience with production AWS systems.
  • Effective use of Auto Scaling.
  • Monitoring and logging.
  • AWS security features and best practices.
  • Design of self-healing and fault-tolerant services.
  • Techniques and strategies for maintaining high availability.

AWS DevOps, other prerequisites

You should also have broad general IT knowledge and experience. These areas, in particular, should be very familiar to you:

  • Networking concepts.
  • Strong system administration (Linux/Unix or Windows).
  • Strong scripting skills.
  • Multi-tier architectures: load balancers, caching, web servers, application servers, databases, and networking.
  • Templates and other configurable items to enable automation.
  • Deployment tools and techniques in a distributed environment.
  • Basic monitoring techniques in a dynamic environment.

As with all their other certifications, the AWS DevOps exam has its own guide. You can also download sample questions. While, like other AWS exams, the questions are either multiple choice or multiple answer, note how long and complicated they are. They’re a very good indication of the kind of scenario-based problems you’ll face on the real exam.

The AWS DevOps exam costs USD 300 and, rather than 80 minutes, you’ll have a total of 170 minutes to complete it. So if it isn’t already obvious, before you sit for this one, you’d better be really well prepared and have loads of AWS experience.

While Cloud Academy does not yet have courses or learning paths specifically tailored to this particular certification, some of our AWS Certification Prep programs aimed at either the AWS Certified Developer – Associate or AWS SysOps Administrator – Associate will be very helpful.

If you haven’t had a lot of exposure to DevOps practices within AWS, this exam can seem daunting. However, as more and more companies choose to deploy on the cloud, they often also adopt DevOps practices to benefit from stable, secure, and predictable IT environments and increased IT efficiency.

These companies are consequently on the lookout for IT professionals with cloud computing and DevOps skills. Even if you’re not quite ready yet, keep this certification in mind. Cloud Computing DevOps skills currently seem to be a particularly rare and valuable combination whose demand is sure to grow.

Why Chef Delivery Is a Big Step Forward

By Andrew Phillips

As a company building tools for the Continuous Delivery and DevOps space, we decided early on that the real value lies in managing the application layer. This was the driver for the launch, almost 18 months ago now, of XL Release: the first Continuous Delivery Management tool designed to allow teams and businesses to focus on getting code from development out to your users faster without compromising quality.

As such, we’ve been keen observers of the evolution of the Continuous Delivery space, and have been looking forward to the much-rumoured arrival of Chef Delivery for a long time now. Finally, it’s been announced!

Chef Delivery will likely end up competing with XL Release in some scenarios, but that’s just healthy. More importantly, a new tool will bring new ideas and new users to the Continuous Delivery arena. That means more learning, improved best practices and better tooling all round, from which we all benefit. The delivery and pricing model for Chef Delivery doesn’t seem to be clear yet, but I certainly hope there will eventually be a fully-featured free version similar to XL Release’s Community Edition.

I’m also curious to see how quickly the Chef community will be able to respond to the new Chef CD story. Many of our users that are now using XL Release to improve their delivery process on top of Chef added pipeline orchestration only after trying to do CD with just Chef, before concluding that an additional end-to-end layer would help. The Chef Delivery launch makes it very clear that a tool dedicated to the pipeline/release process is needed on top of the underlying automation provided by tools like Chef. This will hopefully eliminate a lot of the confusion we see in the community today.

In short, I think this new emphasis on the end-to-end process is a big and timely step that will really help teams and organizations get value out of CD. But one important lesson that we learned from our XL Release users was that trying to improve the ability to ship code more quickly really doesn’t work without a greatly increased focus on testing. And that was causing problems that they weren’t able to handle with existing tooling.

More specifically, what our users were seeing is that running a growing number of automated tests more and more frequently significantly increased the challenge of visualizing and analyzing all the test results. They explained that this was often their biggest bottleneck: moving code from Dev to Prod was now a piece of cake, but it was very difficult for them to determine with confidence whether the code they were shipping was actually usable.

This is why we created XL Test, which is the first tool designed to address the problem of visualizing and analyzing all the test results that you end up with in a CD environment quickly and effectively, on a very frequent basis. If you’re trying to figure out how to make sense of all your automated test results, it’s definitely worth a try.

And of course you can get started orchestrating your Chef, Docker, Azure, mobile etc. pipelines with XL Release today too!