## Choria Server and Broker 0.12.0

The next releases will start coming in over the next week or three, we’re getting going with quite a major release for the Choria Server and Broker and a few related packages, I’ll introduce some of the changes here today.

Choria Release 0.12.0 is available today, you can get it by updating your Hiera data choria::version.

## New pkcs11 Security Provider

The latest release of Choria will have a new security provider, called pkcs11! This blog post will go over how to use it in various configurations. But first, a review of what pkcs11 is and how it’s useful.

## What is pkcs11?

pkcs11 stands for “Public Key Cryptography Standard #11”. It’s a set of standards for how to interact with a cryptographic token. You may have heard of HSMs or smart cards. pkcs11 is how software interacts with these things.

## Why should I use pkcs11?

You may be compelled to use it due to the environment you work in. Yubikeys and CACs are being used more and more in large-scale environments. But it’s a good idea to investigate the use of these things if you already aren’t. The power of HSMs is that the sensitive cryptographic material is generated on the hardware and never leaves it. So instead of opening your private key file and signing hashes with it, you’re handing the hash to your Yubikey, which signs it and returns the data. There are compliance advantages too (because of the stronger security). Some HSMs are FIPS-compliant, which some computing environments require.

## NATS 2.0 Based Broker

It’s been a few months since our last releases - usually we release monthly but there’s been a delay, today we’ll give you some background on why.

We’ve been very hard at work on adopting NATS 2.0 for the Choria Network Broker. NATS 2.0 was released 5th of June 2019 and represents a huge improvement in capabilities.

The major feature is around security, NATS is now multi tenant capable which means you can create secure isolated collectives in a Choria Network Broker. This is very exciting since our subcollective, while still relevant, was always quite hard to secure.

Additionally there have been a huge shift in networking capabilities allowing new ways to form super clusters and to extend your broker foot print globally. This will allow us to look toward other models of federation rather than our own Federation Brokers.

If you read further you will find much have changed and many new features are available, however for the typical use case nothing has to change. Everything keeps working and even old clients and agents will continue to function with no configuration of behaviour change. If you do not use these features or do not need them, there is nothing new.

The challenge for us in integrating NATS 2.0 into the Choria Network Broker is to map these new capabilities onto Choria use cases and make them configurable in a way that make it comfortable within Choria.

Completing this and making everything aware of these new features is a big undertaking, we’ve been focussing on the network broker in the present push. The code is successfully running on some of my smallest clusters - around 10k nodes - and where we identified problems the NATS team have been quick to help stabilize.

As always you can expect when we release this we’ll certify it to work out of the box on 50 000 nodes. As a bonus we have found that the new Network Broker is quite a bit faster than the existing one.

Read on for the gory details!

## May 2019 Releases

This months releases come a bit late as things have been moving slow while I worked on a major new feature called Choria Autonomous Agents which releases in MVP today.

Keep an eye out for a follow up blog post that details those. Apart from that it’s just general house keeping releases.

One thing is worth pointing out: This is the last release of Choria modules that support Puppet prior to version 6

## Autonomous Agents

Today we’re launching a significant new feature that allow you to create a kind of automation that run on your nodes and do not need RPC interactions to initiate actions. We call it Choria Autonomous Agents or Choria Machine, it’s available as a preview feature in Choria Server 0.11.0

These run forever and continuously interact with your node, they keep working if the node is disconnected from the middleware and do not require a central component to function.

This release is a feature preview release, there are significant shortcomings and missing features but it’s already functional. We are launching this feature very early to solicit feedback and ideas to help us prioritize future work.

## Overview

The typical orchestrations that people have done with MCollective or Choria has always taken the form of a conductor that tells the fleet what to do every step of the way.

This works fine for a lot of things especially if you use features like Sub Collectives to create isolated network-near groups where you run a daemon that orchestrates just the little cluster. It’s work so well in fact that this has always just been acceptable.

Unfortunately there are number of draw backs to this:

• It requires a lot of network traffic as one entity communicate constantly with the fleet
• It does not scale to complex tasks
• The orchestrator, network and brokers are all single point of failures, any failure anywhere means the managed component is not managed anymore
• The central orchestrator can get very complex as it might need to maintain lots of state for every node

Mark Burgess has a little anecdote about this, the Mayor of a city does not constantly tell every street sweeper where and how to do their job, the sweepers are trained to do their thing on their own and so a city scales by applying this concept on every level.

For years I have tried to build some form of autonomous agent that let us describe a system being managed and it will constantly be managed. The conceptual component is a Finite State Machine - nothing new about this - but I always had concerns about visibility and operability. Recent advances in tools like Prometheus, but also my own work in events from the Choria daemons, have made this much more viable.

I think of this a bit like a Kubernetes Operator but for anything in any environment.

## February 2019 Releases

I typically release around the 20th of the month, this one was a bit delayed while I worked with the NATS project on some problems we encountered. Nothing major in these releases as I have been traveling and working on a large implementation.

Some work that is not mentioned here is that I am reworking my Choria network load tester tool, this essentially allow you to use lets say 20 AWS instances to run a Choria network of 15 000 nodes. It does this by starting multiple Choria Servers on a single node in Go routines and connecting them to the network in various formations. This is ongoing, reach out to me if anyone has interest in this tool. This focus is mainly to assist me in testing the upcoming NATS 2.0 release for uptake into the Choria Broker.

For Puppet users there is a potential big change to look out for, Choria has a stated goal of:

Choria sets up the popular Action Policy based authorization and does so in a default deny mode which means by default, no-one can make any requests


There was a problem though in that any modules that had no explicit policies would end up being in default allow mode, this addressed across a few of these updates so you might need to keep an eye on this in your environment.

Special thanks to Romain Tartière and Konrad Scherer for their contributions during this cycle.

## Centralised AAA

Choria is a very loosely coupled system with no central controller and in fact no shared infrastructure other than a middleware that is completely “dumb”. What this means is there is no per request processing anywhere centrally other than just to shift the packets. No inventory databases, user databases or other shared infrastructure to scale or maintain - though several integration options exist should you choose to do so.

There are many reasons for this - in a large scale environment there are always things broken and automation systems should do their best to keep working even in the face of uncertainty. This design extends from the servers, middleware all the way to the client code. The loosely coupled design ensures that what can be managed will be managed.

This is generally fine and works within my design parameters and goals. For the client though in enterprise environments this is problematic:

• Enterprises are heavily invested in SSO and entitlement based flows for permissions
• Enterprises and regulated environments have strong requirements for auditing to centralized systems
• Certificate management for individual users is a nearly impossible hurdle to scale

So today I would like to present a new extension point that allow you to fully centralize AAA for the Choria CLI.

## Limiting Clients to IP Ranges

The upcoming set of releases have a strong focus on security. We will introduce a whole new way to build centralized AAA if a site desires and a few smaller enhancements. One of these enhancements is the ability to limit where clients can be used on your network.

Today the security model allow anyone with the correctly issued and signed certificates to make client requests from anywhere on your network. This is generally fine as the certificates are not to be shared, however there are concerns that there might be rogue clients on your network perhaps outside of your update strategy or just as a form of shadow orchestration system. You could also have concerns about the fact that using just a server certificate one can read all replies the entire network sends that might contain sensitive information.

If you have this concern the upcoming version 0.10.0 of the Choria Broker will include the ability to limit what networks clients can come from.

## Choria Lifecycle Events

Events are small JSON documents that describe an event that happens in a system. Events come in many forms but usually they indicate things like startup, shutdown, aliveness, problems or major completed tasks. They tend to be informational and so should be considered lossy - in other words do not expect to get a shutdown event for every shutdown that happens, some kinds of shutdown can prevent it from reaching you. Likewise startups where the middleware connection is flakey.

These events come in many flavours and there are not really many standards around for this stuff, one effort cloudevents from the CNCF looks to be on a good path and once things mature we’ll look to adopt them as the underlying format for our lifecycle messages too.

In Choria we call these Lifecycle Events. I recently released an initial version 1.0.0 of the package that manages these, this post will introduce what we have today and what we use them for.

These kinds of event allow other tools to react to events happening from Choria components, some uses:

• Create a dashboard of active versions of a component by passively observing the network - use startup, shutdown and alive events.
• React to nodes starting up by activating other orchestration systems like continuous delivery systems
• React to a specific component starting up and provision them asap

There are many other cases where an event flow is useful and in time we will add richer event types.

Today Choria Server, Choria Backplane and Choria Provisioner produce events while Choria Provisioner consumes them. We are a bit conservative with when and where we emit them as the clusters we support can be in the 50k node range we need to consider each type of event and the need for it carefully.

Read on for full details.