Centralised AAA

Choria is a very loosely coupled system with no central controller and, in fact, no shared infrastructure other than a middleware that is completely “dumb”. This means there is no central per-request processing beyond shifting packets: no inventory databases, user databases, or other shared infrastructure to scale or maintain - though several integration options exist should you choose to use them.

There are many reasons for this - in a large scale environment there are always things broken, and automation systems should do their best to keep working even in the face of uncertainty. This design extends from the servers and middleware all the way to the client code. The loosely coupled design ensures that what can be managed will be managed.

This is generally fine and works within my design parameters and goals. For the client in enterprise environments, though, this is problematic:

  • Enterprises are heavily invested in SSO and entitlement-based flows for permissions
  • Enterprises and regulated environments have strong requirements for auditing to centralized systems
  • Certificate management for individual users is a nearly impossible hurdle to scale

So today I would like to present a new extension point that allows you to fully centralize AAA for the Choria CLI.

[Read More]

Limiting Clients to IP Ranges

The upcoming set of releases has a strong focus on security. We will introduce a whole new way to build centralized AAA, should a site desire it, along with a few smaller enhancements. One of these is the ability to limit where clients can be used on your network.

Today the security model allows anyone with correctly issued and signed certificates to make client requests from anywhere on your network. This is generally fine as the certificates are not to be shared; however, there may be rogue clients on your network, perhaps outside of your update strategy or acting as a form of shadow orchestration system. You might also be concerned that, with just a server certificate, one can read every reply the network sends - replies that might contain sensitive information.

If this concerns you, the upcoming version 0.10.0 of the Choria Broker will include the ability to limit what networks clients can connect from.
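The excerpt does not show the configuration syntax, but the check underneath such a feature is a plain CIDR match. Here is a minimal sketch in Go using only the standard library - the function name and rule format are my own for illustration, not Choria’s actual code or configuration:

```go
package main

import (
	"fmt"
	"net"
)

// clientAllowed reports whether a client IP falls inside any of the
// allowed CIDR ranges, e.g. the management networks clients may use.
func clientAllowed(clientIP string, allowedCIDRs []string) bool {
	ip := net.ParseIP(clientIP)
	if ip == nil {
		return false
	}

	for _, cidr := range allowedCIDRs {
		_, network, err := net.ParseCIDR(cidr)
		if err != nil {
			continue // skip malformed rules
		}
		if network.Contains(ip) {
			return true
		}
	}

	return false
}

func main() {
	allowed := []string{"10.1.0.0/16", "192.168.88.0/24"}

	fmt.Println(clientAllowed("10.1.4.9", allowed))   // true
	fmt.Println(clientAllowed("172.16.0.1", allowed)) // false
}
```

A broker would apply a check like this to the remote address of each connecting client before completing the TLS handshake and session setup.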

[Read More]

Choria Lifecycle Events

Events are small JSON documents that describe something happening in a system. Events come in many forms but usually they indicate things like startup, shutdown, aliveness, problems or major completed tasks. They tend to be informational and so should be considered lossy - in other words, do not expect to get a shutdown event for every shutdown that happens, as some kinds of shutdown can prevent the event from reaching you. Likewise, startup events can be lost when the middleware connection is flaky.

These events come in many flavours and there are not really many standards around them. One effort, CloudEvents from the CNCF, looks to be on a good path, and once it matures we’ll look to adopt it as the underlying format for our lifecycle messages too.

In Choria we call these Lifecycle Events. I recently released initial version 1.0.0 of the package that manages them; this post introduces what we have today and what we use them for.

These kinds of events allow other tools to react to what Choria components are doing. Some uses:

  • Create a dashboard of active versions of a component by passively observing the network - use startup, shutdown and alive events.
  • React to nodes starting up by activating other orchestration systems like continuous delivery systems
  • React to a specific component starting up and provision it asap

There are many other cases where an event flow is useful and in time we will add richer event types.

Today Choria Server, Choria Backplane and Choria Provisioner produce events, while Choria Provisioner also consumes them. We are a bit conservative about when and where we emit events: the clusters we support can be in the 50k node range, so we need to consider each type of event, and the need for it, carefully.

Read on for full details.

[Read More]

CfgMgmtCamp 2019

I will be giving a talk at the 2019 installment of CfgMgmtCamp in Ghent, held 4 to 6 February 2019.

The talk will be focused on Choria Data Adapters, NATS Streaming and metadata, and will discuss the design of the Choria Stream Replicator.

I’ll hopefully also show off something new I’ve been hacking on, on and off!

The CFP submission can be seen below the fold, I hope to see many Choria users there!

[Read More]

Choria Server 0.9.0

Today I released version 0.9.0 of the Choria Server along with an update to the Ruby plugin for MCollective.

This is a significant milestone release that gives us full support for custom Certificate Authorities, including chains of intermediates. The Choria Provisioner supports requesting CSRs from nodes and supplying those nodes with signed certificates, and you can integrate it with any CA of your choosing that exposes an API.
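As a sketch of what “chains of intermediates” means in practice, here is a self-contained Go program - standard library only, not Choria’s actual code - that builds a root CA, an intermediate CA and a leaf certificate, then verifies the leaf through the full chain:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// sign creates a certificate from tmpl, signed by parent using parentKey.
func sign(tmpl, parent *x509.Certificate, pub *ecdsa.PublicKey, parentKey *ecdsa.PrivateKey) (*x509.Certificate, error) {
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, pub, parentKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// buildAndVerifyChain creates root -> intermediate -> leaf and verifies
// the leaf against the root with the intermediate in the chain.
func buildAndVerifyChain() error {
	now := time.Now()
	caTmpl := func(serial int64, cn string) *x509.Certificate {
		return &x509.Certificate{
			SerialNumber:          big.NewInt(serial),
			Subject:               pkix.Name{CommonName: cn},
			NotBefore:             now,
			NotAfter:              now.Add(time.Hour),
			IsCA:                  true,
			BasicConstraintsValid: true,
			KeyUsage:              x509.KeyUsageCertSign,
		}
	}

	rootKey, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	rootTmpl := caTmpl(1, "root-ca")
	root, err := sign(rootTmpl, rootTmpl, &rootKey.PublicKey, rootKey) // self-signed
	if err != nil {
		return err
	}

	interKey, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	inter, err := sign(caTmpl(2, "intermediate-ca"), root, &interKey.PublicKey, rootKey)
	if err != nil {
		return err
	}

	leafKey, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	leaf, err := sign(&x509.Certificate{
		SerialNumber: big.NewInt(3),
		Subject:      pkix.Name{CommonName: "node1.example.net"},
		NotBefore:    now,
		NotAfter:     now.Add(time.Hour),
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}, inter, &leafKey.PublicKey, interKey)
	if err != nil {
		return err
	}

	roots, inters := x509.NewCertPool(), x509.NewCertPool()
	roots.AddCert(root)
	inters.AddCert(inter)

	_, err = leaf.Verify(x509.VerifyOptions{
		Roots:         roots,
		Intermediates: inters,
		KeyUsages:     []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	})
	return err
}

func main() {
	if err := buildAndVerifyChain(); err != nil {
		fmt.Println("verification failed:", err)
		return
	}
	fmt.Println("leaf verified via intermediate to root")
}
```

The key point is the `Intermediates` pool in `x509.VerifyOptions`: a node presenting only its leaf certificate cannot be verified against the root unless the verifier also knows the intermediates in between.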

We’ve also fixed some bugs, tweaked some things and generally iterated ever forward.

[Read More]

Puppet 6 Support

Back in July 2018 Puppet Inc officially announced that The Marionette Collective was being deprecated and would not be included in future Puppet Agent releases.

This presented a problem for us as we relied on that packaging to install MCollective, its services and its libraries. We would now have to do all of this ourselves.

At the same time I was working on the Choria Server and on giving it backward-compatibility capabilities (still in progress toward 100%), so we could not support Puppet 6 on its release day.

Today we published a number of releases, and as of version 0.12.0 of the choria/choria module we support Puppet 6 out of the box.

[Read More]

Mass Provisioning Choria Servers

The Choria Server is the agent component of the Choria Orchestrator system; it runs on every node and maintains a connection to the middleware.

Traditionally we’ve configured it using Puppet along with its mcollective compatibility layer, and we intend to keep this model for the foreseeable future. Choria Server, though, has many more uses - it’s embeddable, so it can be used in IoT, in tools like our go-backplane, as sidecars in Kubernetes, and more. In these and other cases the Puppet model does not work:

  • You do not have CM at all
  • You do not own the machines where Choria runs; you provide an orchestration service to other teams
  • You are embedding the Choria Server in your own code, perhaps in an IoT device where Puppet does not make sense
  • Your scale makes using Puppet not an option
  • You wish to have very dynamic decision making about node placement
  • You wish to integrate Choria into your own Certificate Authority system

In all these cases there are real, complex problems to solve in configuring Choria Server. We’ve built a system that can help solve them; it’s called the Choria Server Provisioner and this post introduces it.

[Read More]

50 000 node network

I’ve been saying for a while now that my aim with Choria is that someone can get a 50 000 node Choria network that just works without tuning - by default, that should be the minimum scale it supports.

I started working on a set of emulators to let you confirm this yourself - and for me to use during development to ensure I do not break this promise - though that got a bit sidetracked as I wanted to do less emulation and more running of 50 000 instances of actual Choria; more on that in a future post.

Today I want to talk a bit about an actual deployment of 50 000 real nodes and how I got there - the good news is that it’s terribly boring since, as promised, it just works.

[Read More]