## Writing agents in any language

For a long time MCollective could only be extended using Ruby. This was fine in an earlier age, when it seemed Ruby was going to rule the ops space, but that isn’t how things turned out. Even then there was a lot of real interest in extending it using other languages, Python for example.

Over the last 2 weeks this conversation came up again, and in many respects the situation is now worse. The new Choria Server is written in Go and cannot be extended by external plugins. We supported the old Ruby agents, but that was it. We had plans to support Tengo and Yaegi as ways to add agents to Choria, but neither got past the proof-of-concept stage.

There were major hurdles to actually doing this in the new Go system:

• no action_policy type authorization system
• no data plugins
• no data aggregators
• limited DDLs
• inputs and outputs not properly mapped
• no DDL validators
• no way to set default data in requests

These features all worked fine for the Ruby agents, though, since the MCollective compatibility layer would start up a subset of the old MCollective on demand and so gain access to its versions of all of the above. The Go daemon did not need any of it.

Over the last 3 weeks we have addressed most of these missing features in the pure Go daemon:

• We have shellsafe, ipv4address, ipv6address, ipaddress and regex validators. Analysis showed these cover 90%+ of what was ever used
• We have data aggregators for average, summary (including booleans) and a new chart aggregator. Analysis showed these cover 90%+ of what was ever used
• We have greatly extended the DDLs with full input, output and aggregate awareness
• We can set defaults in both inputs and outputs, and it’s all type-aware
• We can do full validation of requests based on the DDLs
• We have an action_policy plugin that’s 1:1 compatible, except for compound statements (planned)
• We can generate Ruby DDLs from JSON ones
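The validators and aggregators listed above are simple enough to sketch. The Go below is a minimal illustration under stated assumptions, not Choria’s code: the function names (ValidateShellSafe, ValidateIPv4, Average, Summary and so on) are hypothetical, and the set of characters treated as shell-unsafe is an assumed list of common shell metacharacters rather than Choria’s exact rule.

```go
package main

import (
	"fmt"
	"net/netip"
	"regexp"
	"strings"
)

// shellUnsafe lists sequences rejected by this hypothetical shellsafe
// check; the exact set is an assumption, not Choria's list.
var shellUnsafe = []string{"`", "$", ";", "|", "&&", ">", "<"}

// ValidateShellSafe rejects values containing common shell metacharacters.
func ValidateShellSafe(v string) error {
	for _, s := range shellUnsafe {
		if strings.Contains(v, s) {
			return fmt.Errorf("value should not contain %q", s)
		}
	}
	return nil
}

// ValidateIPv4 accepts only plain IPv4 addresses (not IPv4-mapped IPv6).
func ValidateIPv4(v string) error {
	a, err := netip.ParseAddr(v)
	if err != nil || !a.Is4() {
		return fmt.Errorf("%q is not an IPv4 address", v)
	}
	return nil
}

// ValidateIPv6 accepts only IPv6 addresses (IPv4-mapped ones included).
func ValidateIPv6(v string) error {
	a, err := netip.ParseAddr(v)
	if err != nil || !a.Is6() {
		return fmt.Errorf("%q is not an IPv6 address", v)
	}
	return nil
}

// ValidateIPAddress accepts an address of either family.
func ValidateIPAddress(v string) error {
	if _, err := netip.ParseAddr(v); err != nil {
		return fmt.Errorf("%q is not an IP address", v)
	}
	return nil
}

// ValidateRegex checks the value against a user-supplied pattern.
func ValidateRegex(v, pattern string) error {
	ok, err := regexp.MatchString(pattern, v)
	if err != nil {
		return err
	}
	if !ok {
		return fmt.Errorf("%q does not match %q", v, pattern)
	}
	return nil
}

// Average aggregates numeric replies into their mean.
func Average(vals []float64) float64 {
	if len(vals) == 0 {
		return 0
	}
	sum := 0.0
	for _, v := range vals {
		sum += v
	}
	return sum / float64(len(vals))
}

// Summary counts how often each distinct reply value occurs; booleans and
// other types are keyed by their fmt.Sprint form.
func Summary(vals []any) map[string]int {
	counts := map[string]int{}
	for _, v := range vals {
		counts[fmt.Sprint(v)]++
	}
	return counts
}

func main() {
	fmt.Println(ValidateShellSafe("ls -l") == nil)        // true
	fmt.Println(ValidateShellSafe("ls; rm -rf /") == nil) // false
	fmt.Println(ValidateIPv4("192.0.2.1") == nil)         // true
	fmt.Println(ValidateIPv6("2001:db8::1") == nil)       // true
	fmt.Println(Average([]float64{1, 2, 3}))              // 2
	fmt.Println(Summary([]any{true, false, true}))        // map[false:1 true:2]
}
```

Keeping these as plain functions in the daemon, rather than plugin hooks, mirrors the batteries-included approach: adding a validator is a code change, not a new extension point.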

You’ll notice we are making deliberate choices about what we support here rather than exposing 100 extension points to the user; over-extensibility was a real problem in the past. We now favour a batteries-included approach where what people need is always available. New validators or aggregators would come via PR submissions to the Go code, so that everyone benefits.

The points above represent a huge push in features. At this point any new way to write agents would get all these features for free, and suddenly the prospect of doing just that is a lot more palatable. But why stop at supporting a specific language like Tengo? Why not support all languages, especially with new movement in things like WebAssembly?

That’s exactly what I did in a new feature called External Agents, which will be available in the next release. Read on for the full details.

## Generating DDL Files

DDL files have been the utter bane of existence for MCollective users and, since Choria provides a compatibility layer, likewise for Choria users. It’s not limited to Choria, though: most remote invocation systems have these kinds of files, and honestly they are all horrible. Think WSDL, OpenAPI/Swagger and so on; editing and maintaining any of these is really suboptimal.

I have some plans to improve the situation as much as we reasonably can. The first of these will land in our next release, 0.12.1, and revolves around generating these files interactively.

## Choria Server and Broker 0.12.0

The next releases will start arriving over the next week or three. We’re starting with quite a major release of the Choria Server and Broker and a few related packages, and I’ll introduce some of the changes here today.

Choria Release 0.12.0 is available today; you can get it by updating choria::version in your Hiera data.

## New pkcs11 Security Provider

The latest release of Choria will have a new security provider, called pkcs11! This blog post will go over how to use it in various configurations. But first, a review of what pkcs11 is and how it’s useful.

## What is pkcs11?

pkcs11 stands for “Public Key Cryptography Standard #11”. It’s a standard API for interacting with cryptographic tokens. You may have heard of HSMs or smart cards; pkcs11 is how software interacts with these things.

## Why should I use pkcs11?

You may be compelled to use it due to the environment you work in. YubiKeys and CACs are being used more and more in large-scale environments. But even if you aren’t using these things already, it’s a good idea to investigate them. The power of HSMs is that the sensitive cryptographic material is generated on the hardware and never leaves it. So instead of opening your private key file and signing hashes with it, you hand the hash to your YubiKey, which signs it and returns the signature. There are compliance advantages too, thanks to the stronger security: some HSMs are FIPS-compliant, which some computing environments require.

## NATS 2.0 Based Broker

It’s been a few months since our last releases. We usually release monthly, but there’s been a delay, and today we’ll give you some background on why.

We’ve been very hard at work on adopting NATS 2.0 for the Choria Network Broker. NATS 2.0 was released on 5th June 2019 and represents a huge improvement in capabilities.

The major feature is around security: NATS is now multi-tenant capable, which means you can create secure, isolated collectives in a Choria Network Broker. This is very exciting since our Sub Collectives, while still relevant, were always quite hard to secure.

Additionally, there has been a huge shift in networking capabilities, allowing new ways to form super clusters and to extend your broker footprint globally. This will allow us to look toward models of federation other than our own Federation Brokers.

If you read further you will find that much has changed and many new features are available; however, for the typical use case nothing has to change. Everything keeps working, and even old clients and agents will continue to function with no configuration or behaviour change. If you do not use or need these features, there is nothing new for you to do.

The challenge for us in integrating NATS 2.0 into the Choria Network Broker is to map these new capabilities onto Choria use cases and make them configurable in a way that fits comfortably within Choria.

Completing this and making everything aware of these new features is a big undertaking, so in the present push we’ve been focusing on the network broker. The code is running successfully on some of my smallest clusters (around 10k nodes), and where we identified problems the NATS team have been quick to help stabilize things.

As always, you can expect that when we release this we’ll have certified it to work out of the box on 50 000 nodes. As a bonus, we have found that the new Network Broker is quite a bit faster than the existing one.

Read on for the gory details!

## May 2019 Releases

This month’s releases come a bit late, as things have been moving slowly while I worked on a major new feature called Choria Autonomous Agents, which is released as an MVP today.

Keep an eye out for a follow-up blog post that details it. Apart from that, these are just general housekeeping releases.

One thing is worth pointing out: this is the last release of the Choria modules that supports Puppet versions prior to 6.

## Autonomous Agents

Today we’re launching a significant new feature that allows you to create a kind of automation that runs on your nodes and does not need RPC interactions to initiate actions. We call it Choria Autonomous Agents, or Choria Machine, and it’s available as a preview feature in Choria Server 0.11.0.

These run forever and continuously interact with your node; they keep working if the node is disconnected from the middleware and do not require a central component to function.

This is a feature preview release; there are significant shortcomings and missing features, but it’s already functional. We are launching this feature very early to solicit feedback and ideas to help us prioritize future work.

## Overview

The typical orchestrations that people have done with MCollective or Choria have always taken the form of a conductor that tells the fleet what to do every step of the way.

This works fine for a lot of things, especially if you use features like Sub Collectives to create isolated, network-near groups where you run a daemon that orchestrates just that little cluster. It works so well, in fact, that this has always just been acceptable.

Unfortunately there are a number of drawbacks to this:

• It requires a lot of network traffic as one entity communicates constantly with the fleet
• It does not scale to complex tasks
• The orchestrator, network and brokers are all single points of failure; any failure anywhere means the managed component is no longer managed
• The central orchestrator can get very complex as it might need to maintain lots of state for every node

Mark Burgess has a little anecdote about this: the Mayor of a city does not constantly tell every street sweeper where and how to do their job. The sweepers are trained to do their thing on their own, and a city scales by applying this concept at every level.

For years I have tried to build some form of autonomous agent that lets us describe a system, which will then be managed constantly. The conceptual component is a Finite State Machine (nothing new about this), but I always had concerns about visibility and operability. Recent advances in tools like Prometheus, but also my own work on events from the Choria daemons, have made this much more viable.

I think of this a bit like a Kubernetes Operator but for anything in any environment.
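To make the Finite State Machine idea concrete, here is a minimal, hypothetical Go sketch: a current state plus a transition table, driven by events from a local check. The states, events and type names are invented for illustration and are not the Choria Machine definition format. Note how an event that is invalid in the current state is rejected rather than acted on, and how each transition is logged, which is where the visibility concern comes in.

```go
package main

import "fmt"

// Machine is a tiny finite state machine: a current state plus a table
// mapping (state, event) to the next state. All names are hypothetical.
type Machine struct {
	Current     string
	transitions map[string]map[string]string // state -> event -> next state
}

// NewMachine builds a machine in its initial state.
func NewMachine(initial string) *Machine {
	return &Machine{Current: initial, transitions: map[string]map[string]string{}}
}

// AddTransition allows event to move the machine from one state to another.
func (m *Machine) AddTransition(event, from, to string) {
	if m.transitions[from] == nil {
		m.transitions[from] = map[string]string{}
	}
	m.transitions[from][event] = to
}

// Fire applies event, logging the transition, and refuses events that are
// not valid in the current state.
func (m *Machine) Fire(event string) error {
	next, ok := m.transitions[m.Current][event]
	if !ok {
		return fmt.Errorf("event %q is not valid in state %q", event, m.Current)
	}
	fmt.Printf("%s --%s--> %s\n", m.Current, event, next)
	m.Current = next
	return nil
}

func main() {
	// A hypothetical service keeper: a local health check emits events and
	// the machine reacts without any central orchestrator.
	m := NewMachine("unknown")
	m.AddTransition("healthy", "unknown", "running")
	m.AddTransition("unhealthy", "running", "restarting")
	m.AddTransition("healthy", "restarting", "running")

	for _, ev := range []string{"healthy", "unhealthy", "healthy"} {
		if err := m.Fire(ev); err != nil {
			fmt.Println(err)
		}
	}
	fmt.Println("final state:", m.Current)
}
```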

## February 2019 Releases

I typically release around the 20th of the month; this one was a bit delayed while I worked with the NATS project on some problems we encountered. There is nothing major in these releases, as I have been traveling and working on a large implementation.

One piece of work not mentioned here is that I am reworking my Choria network load tester tool. This essentially allows you to use, let’s say, 20 AWS instances to run a Choria network of 15 000 nodes. It does this by starting multiple Choria Servers on a single node in goroutines and connecting them to the network in various formations. This is ongoing; reach out to me if you are interested in this tool. The focus is mainly to assist me in testing the upcoming NATS 2.0 release for uptake into the Choria Broker.
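The goroutine approach can be sketched roughly as follows. This is a hypothetical simplification, not the load tester’s code: each simulated instance only registers an identity instead of connecting to a broker, and the split of 20 hosts x 750 instances is an assumption chosen to reach 15 000 nodes.

```go
package main

import (
	"fmt"
	"sync"
)

// runInstances starts n simulated server instances on one host, each in its
// own goroutine, and returns the identities that reported in. A real load
// tester would connect each instance to the middleware; here each one just
// registers itself so the sketch stays self-contained.
func runInstances(host string, n int) []string {
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		ids []string
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			id := fmt.Sprintf("%s-%d", host, i) // hypothetical per-instance identity
			// ... a real instance would connect to the broker here ...
			mu.Lock()
			ids = append(ids, id)
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return ids
}

func main() {
	// 20 hosts x 750 instances = 15 000 simulated nodes (assumed split).
	total := 0
	for h := 0; h < 20; h++ {
		total += len(runInstances(fmt.Sprintf("loadtest%02d", h), 750))
	}
	fmt.Println("simulated nodes:", total)
}
```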

For Puppet users there is a potentially big change to look out for. Choria has a stated goal of:

> Choria sets up the popular Action Policy based authorization and does so in a default deny mode which means by default, no-one can make any requests.

There was a problem, though: any modules that had no explicit policies would end up in default allow mode. This has been addressed across a few of these updates, so you might need to keep an eye on this in your environment.
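The difference between default deny and default allow comes down to what happens when no rule applies. The sketch below is a hypothetical, heavily simplified policy evaluator, not the action_policy plugin: rules only match caller and action (real policies also match facts and classes), the first matching rule wins, and both a missing policy and an unmatched request are denied.

```go
package main

import "fmt"

// Rule is a hypothetical, simplified policy line: allow or deny a caller for
// an action, where "*" matches anything.
type Rule struct {
	Allow  bool
	Caller string
	Action string
}

func matches(pattern, value string) bool { return pattern == "*" || pattern == value }

// Authorize applies the first matching rule for the agent. An agent with no
// policy at all, or a request that no rule matches, is denied.
func Authorize(policies map[string][]Rule, agent, caller, action string) bool {
	rules, ok := policies[agent]
	if !ok {
		return false // default deny: no explicit policy means no access
	}
	for _, r := range rules {
		if matches(r.Caller, caller) && matches(r.Action, action) {
			return r.Allow
		}
	}
	return false // default deny: nothing matched
}

func main() {
	policies := map[string][]Rule{
		"puppet": {{Allow: true, Caller: "choria=admin.mcollective", Action: "*"}},
	}
	fmt.Println(Authorize(policies, "puppet", "choria=admin.mcollective", "runonce")) // true
	fmt.Println(Authorize(policies, "puppet", "choria=bob.mcollective", "runonce"))   // false
	fmt.Println(Authorize(policies, "service", "choria=admin.mcollective", "status")) // false
}
```

The third call is the case the updates above address: an agent with no explicit policy must fall through to deny, not allow.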

Special thanks to Romain Tartière and Konrad Scherer for their contributions during this cycle.

## Centralised AAA

Choria is a very loosely coupled system with no central controller and, in fact, no shared infrastructure other than a middleware that is completely “dumb”. This means there is no per-request processing anywhere centrally other than shifting the packets. There are no inventory databases, user databases or other shared infrastructure to scale or maintain, though several integration options exist should you choose to use them.

There are many reasons for this: in a large-scale environment there are always things broken, and automation systems should do their best to keep working even in the face of uncertainty. This design extends from the servers and middleware all the way to the client code. The loosely coupled design ensures that what can be managed will be managed.

This is generally fine and works within my design parameters and goals. For the client in enterprise environments, though, this is problematic:

• Enterprises are heavily invested in SSO and entitlement based flows for permissions
• Enterprises and regulated environments have strong requirements for auditing to centralized systems
• Certificate management for individual users is a nearly impossible hurdle to scale

So today I would like to present a new extension point that allows you to fully centralize AAA for the Choria CLI.

If you have this concern, the upcoming version 0.10.0 of the Choria Broker will include the ability to limit which networks clients can connect from.
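Limiting where clients may connect from typically comes down to checking the client’s source address against a list of allowed networks. The Go below sketches that check with the standard net/netip package. The function name clientAllowed and the example networks are hypothetical, and this is not how the Choria Broker is configured or implemented.

```go
package main

import (
	"fmt"
	"net/netip"
)

// clientAllowed reports whether a connecting client's address falls inside
// any of the permitted networks (CIDR notation). Malformed input is an error.
func clientAllowed(remote string, allowed []string) (bool, error) {
	addr, err := netip.ParseAddr(remote)
	if err != nil {
		return false, err
	}
	for _, cidr := range allowed {
		p, err := netip.ParsePrefix(cidr)
		if err != nil {
			return false, fmt.Errorf("bad network %q: %w", cidr, err)
		}
		if p.Contains(addr) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Hypothetical allow-list, e.g. the subnets a central AAA service uses.
	allowed := []string{"10.1.0.0/24", "2001:db8:1::/48"}
	for _, ip := range []string{"10.1.0.17", "10.2.0.5", "2001:db8:1::42"} {
		ok, err := clientAllowed(ip, allowed)
		fmt.Println(ip, ok, err)
	}
}
```

Returning an error for a malformed network, rather than skipping it, keeps a typo in the allow-list from silently widening or narrowing access.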