February 2024 Releases

After a few months’ break for a sabbatical and other changes, we’re back with some new releases.

The main focus here is Puppet 8 and Ruby 3.2 support, and this has largely been a community effort, which was very good to see.

While we’re moving our Puppet support forward, we are also working on a future independent of Puppet. You might be aware that we have a Provisioner that can guide unconfigured Choria instances to being fully managed instances. The Provisioner, though, is focussed on configuration only, not on the deployment of agents and other plugins.

Over the last year or so we have been designing ways to deliver Autonomous Agents at runtime, and we now also support delivering External Agents in this manner. By placing a manifest of agents in a Key-Value bucket, Choria Server can completely deploy itself and its plugins. Next we’ll either turn our Ruby agents into External Agents or add the same ability to Ruby agents.
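
As a purely illustrative sketch of the idea (the bucket name, key and manifest format below are assumptions, not the finalised design), the workflow could look roughly like this:

# create a Key-Value bucket to hold the plugin manifest (names are illustrative)
$ choria kv add PLUGINS

# publish a manifest describing which agents the servers should deploy;
# the manifest file and its schema are placeholders, not the real format
$ choria kv put PLUGINS manifest "$(cat agents-manifest.json)"

Choria Servers consuming that bucket would then fetch and activate the listed agents without any Puppet involvement.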

This is a big milestone on the road to being Puppet free, as there is no longer any single point that requires Puppet to go from zero to a full Choria environment. We have a bit to go here, but it’s conceivable that we might switch our standard deployment to using these mechanisms to deploy plugins and configure Choria.

In the end we will still support a Puppet deployment method but it will become much simpler as we will only need to do a small part of the overall heavy lifting using Puppet. We will also unlock full featured environments backed by Kubernetes and other hosting environments.

Thanks to Jeff McCune, Pieter Loubser, Trey Dockendorf, Ryan Dill, Mark Dechiaro, Romain Tartière and Vincent Janelle for their support in making these releases possible.

[Read More]

March 2023 Releases

It’s been a long time since our previous release in November as we have been working on some major architectural changes and new features.

This is quite a big release but the majority of changes will not affect Puppet users today.

New Core Contributor

We have a new Core Contributor who some of you might recognise from MCollective days - please welcome Pieter Loubser.

New Docker Registry

Due to recent changes at the official Docker Hub we now have a new Docker Registry. Please read the announcement. No more containers will be pushed to the official Docker Hub.

While implementing this change we also activated automated nightly container builds for most components.

New Project Websites

We have documentation for our overall distribution of Choria at choria.io/docs but this documentation is slanted heavily to the Puppet user - in essence choria.io documents a distribution of Choria for Puppet users. For more in-depth looks into what the components can do and features not exposed to Puppet users we launched a number of new websites:

App Builder Experiments

App Builder is our little no-code tool for building admin command line tools. We are experimenting with a project-level task mode that allows you to place a file in a directory and then run abt, which becomes a different command depending on where you are.

This is an experimental command and we’d love some feedback. It’s documented in the new Experiments page.

New Security Model and Protocol

We have introduced a new security model based on JWT files and ed25519 signatures that provides huge improvements for deployments at scale:

  • Based on ed25519 signed JWT files
  • A chain of trust between a new component called an Organization Issuer and various components allowing full delegation of credential issuance
  • Strong set of role based permissions for clients, servers, provisioners and unprovisioned nodes
  • Policy embedded in the JWT files for lighter touch server configuration - no more policy files on servers
  • Strict deny-all security on the Choria Brokers for much improved security and privacy of traffic
  • Vastly improved choria jwt command that includes monitoring of tokens and early integration with HashiCorp Vault; a brief sketch follows this list
  • A redesigned protocol, moving on from some legacy decisions made in The Marionette Collective
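
As a very rough sketch of working with these tokens (the subcommand names here are from memory and may differ between releases, check choria jwt --help):

# create an ed25519 keypair for use with tokens (assumed subcommand)
$ choria jwt keys client.seed client.public

# inspect a token, its permissions and validity (assumed subcommand)
$ choria jwt view client.jwt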

Longer term this will allow us to not rely on Certificate Authorities for identity and give us far greater control in how, when and for how long clients are enrolled.

In the process we had to do a lot of internal refactoring to the main Choria Framework and related systems like AAA Service, Provisioners, Replicators and more.

In time, as we complete some other efforts around delivery of RPC Agents into this system, we’ll slowly move all users over to this model. For now this should not concern Puppet users.

Minor Server Features

External Agents can now have multi-arch binaries, allowing the same tar file to be deployed to a mix of servers when the agents require compilation.

We have a new output format in the choria req command, enabled using the --jsonl flag, which produces JSON Lines output for every major event. Using this, high quality wrapper libraries can be created in any language quite rapidly: a Ruby wrapper that supported progress bars, discovery and all other behaviors was less than 200 lines. We look forward to seeing what the Python users in our community do with this!
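
As a rough illustration of consuming this output, here we use the rpcutil ping action as a convenient test and simply pretty print each event with jq; the exact event schema is best inspected this way:

# every line is a standalone JSON document describing a discovery, reply or summary event
$ choria req rpcutil ping --jsonl | jq .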

When a Choria Server is managed by the Choria Provisioner it now supports in-place over-the-air upgrades of itself at provisioning time.

We also landed many bug fixes and UX improvements to various choria commands.

Thanks to Romain Tartière and Pieter Loubser for their contributions to this release

[Read More]

Docker Changes

We have published our releases to Docker Hub for some time now; we use those containers with our Docker Compose demos and our Helm charts.

Docker have, over the years, been clawing back their free tiers: first they started deleting old containers, which killed off many of our old artifacts, and now they are stopping the free Teams tier entirely.

We thus had to make alternate plans. We approached Container Registry about their dedicated plans, and they graciously offered us a handsome discount over their usual fees which we gladly accepted.

Container Registry is run by maintainers of the Harbor Project, so I am happy to support them in their efforts.

A significant feature of their offering is the ability to host our registry on a custom domain. This means that, should we have to move again in future, we will hopefully do so with a smaller impact on our users, as we can take our name and paths with us.

As of today, 2023-03-20, we publish all our containers to our new registry at registry.choria.io and will soon move our Helm charts to this registry as well, since it is a full OCI registry.

Today we publish the following releases:

We also publish nightly builds - these are now all automatically built and published every day and retained for around 30 days (an example pull follows the list):

  • registry.choria.io/choria-nightly/choria
  • registry.choria.io/choria-nightly/aaasvc
  • registry.choria.io/choria-nightly/provisioner
  • registry.choria.io/choria-nightly/stream-replicator
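
For example, to pull the nightly Choria Server build (the nightly tag shown here is an assumption, check the registry for the tags that are actually published):

$ docker pull registry.choria.io/choria-nightly/choria:nightly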

Once again huge shout out to Container Registry for the discounted offer.

November 2022 Releases

It’s time for a release of the Choria Server and a few related components. We are hard at work on upgrading the core security layer and network protocol but managed to slip a few things in while that large piece of work is progressing.

The security and protocol work is part of a long term goal to not rely so much on Certificate Authorities. One of the goals is to make Choria more accessible to non-Puppet users. Choria does not require Puppet, but we only really support a Puppet-based distribution for general use. Long term we hope this will not be the case.

For those who pay attention to such things, you might be surprised to find the change set between these releases is in the order of 12,000 lines of code, but this is largely an under-the-covers refactor and new features behind feature flags. For the typical Puppet user nothing should be changing apart from the new features we’ll call out here.

Distributed Goss Validations

We have had support for running Goss manifests as health checks for a while and we’re expanding that a bit in this release.

First, here’s some Hiera data to create a Goss manifest and then run it as a health check on some nodes; this will integrate with Prometheus for alerting and more:

choria::scout_checks:
  check_vpnhosts:
    builtin: goss
    gossfile: /etc/choria/vpnhosts.yaml

choria::scout_gossfile:
  /etc/choria/vpnhosts.yaml:
    addr:
      tcp://10.1.1.1:80:
        reachable: true
        timeout: 500
      tcp://10.1.2.1:80:
        reachable: true
        timeout: 500

Goss is great for automated testing in CI and for monitoring, but we think it’s underappreciated as a debugging tool.

Let’s say you are working an outage of a service: you know the service spans many compute nodes and is made up of many components. What you want is an immediate, deep health check of the entire service, deeper than individual health checks tend to go.

$ choria scout validate /etc/service/goss.yaml -W service=acme
Discovering nodes .... 10

10 / 10    0s [==========================================================] 100%

example.net: Count: 25, Failed: 1, Duration: 0.549s

X Addr: tcp://10.1.2.1:80: reachable:
Expected
bool: false
	to equal
bool: true

Nodes: 10, Failed: 1, Skipped: 0, Success: 250, Duration: 1.251s

Here we ran the /etc/service/goss.yaml manifest on all machines offering the acme service; it found 10 nodes and noted that port 80 on one of the machines isn’t reachable. It ran 250 checks as this is quite a small Goss file; I have manifests with hundreds of resources that execute in under a second.

With the abilities we are adding today you can store a Goss manifest on each machine and then invoke it in an ad-hoc manner for immediate feedback. You could store different sets of checks in the manifest for different kinds of node - databases, processors, API servers etc - and when invoking the check the right deep validation will be done for that kind of machine.

You can mix in variables either from your shell or from the individual nodes and more.

Manifests can be on the individual nodes, as described here, or sent from your local shell for truly ad-hoc inspections. Manifests sent from local shells can mix in per-node variables to cater for node-specific differences and more.

To facilitate sharing between teams you can also store manifests in our Key-Value store, reliably stored in Choria Streams, so that any team member can access the same set of manifests without any shell setup.
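
A rough sketch of that sharing workflow using the choria kv command follows; the bucket and key names are examples, and the exact way to reference a KV-stored manifest from choria scout validate should be checked against its --help:

# create a bucket for shared Goss manifests
$ choria kv add GOSS_MANIFESTS

# store the manifest so the whole team uses the same copy
$ choria kv put GOSS_MANIFESTS acme "$(cat /etc/service/goss.yaml)"

# fetch it back on any workstation
$ choria kv get GOSS_MANIFESTS acme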

It’s early days but I am quite excited for the possibilities here and will look to expand this in time with more features both client and server side.

Removal of the Security Cache

We have had a concept of a security cache: we would store the public keys of known users and, should a key change, the cache would deny the request. Initially we thought this would be an extra layer of security, but it was largely just a huge UX failure. It made life really awkward for individuals using multiple machines, resulting in keys being copied between machines. Most users disabled this feature.

We have entirely removed this concept in this release as it was not useful.

Special thanks to Romain Tartière, Vincent Janelle and Alexander Olofsson for their contributions to this release!

[Read More]

August 2022 Releases

It’s time for a minor release of Choria and a few related components. Not a huge release, as we are preparing a big refactor, but there are still some user-facing improvements - especially to the recently added App Builder component.

New Election features

We’ve used leader elections internally for various components for a while; these are backed by Choria Streams.

In this release we added a choria election command; using this command you can:

  • Manage, view, evict elections and leaders
  • Designate a particular node as a leader by managing a file under election using choria election file
  • Run a command under leader election using choria election run

See the Election documentation.
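
For example, to run a script only on whichever node currently holds a given election (the election name and argument layout are assumptions, see choria election run --help):

# only the current leader of the "backups" election will execute the script
$ choria election run backups -- /usr/local/bin/nightly-backup.sh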

New Governor features

Governors could traditionally only control maximum concurrent executions; use this to limit, for example, how many nodes are actively running Puppet at the same time.

With a small change we were able to make them support a mode of maximum executions per period.

Imagine you have 10 nodes running the same cron job; if you need the job to run on only one of the servers every hour - essentially creating a failover pool - you can use the new --max-per-period flag.

See the Governor documentation.
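
As a purely illustrative sketch (the flag value format and argument layout are assumptions, consult choria governor run --help and the Governor documentation):

# invoked from cron on all 10 nodes; the Governor allows only one execution per hour across the pool
$ choria governor run HOURLY_JOB --max-per-period 1h -- /usr/local/bin/hourly-job.sh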

Go Versions

The Go team released 1.19 last night; this means going forward we will support only Go 1.18 and 1.19. The github.com/choria-io/go-choria tag v0.26.1 will be the last to work on Go 1.17.

Special thanks to Tim Meusel for his contributions to this release!

[Read More]

June 2022 Releases

It’s time for our next release and this one is packed full of goodies after quite a long development period. While our previous release was more about internal plumbing this one brings exciting user visible quality of life improvements and significant new features.

App Builder

We recently introduced App Builder, an operations tool for building custom CLI commands that wrap many tools into one single app.

In Choria we extend App Builder with the ability to interact with Key-Value Stores, initiate RPC requests and perform discovery.

To use this, simply change your app symlink from appbuilder to choria.
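
For example, assuming an application definition called myapp.yaml on your definition path (names and paths here are illustrative):

# the binary name determines which app definition is loaded
$ ln -s /usr/bin/choria /usr/local/bin/myapp
$ myapp --help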

See the App Builder Documentation for full reference of the extensions Choria adds.

choria Command UX

We forked the Kingpin CLI parser into a Choria project called Fisk, which extends Kingpin in various ways and introduces some breaking changes. We now use Fisk in all our tools.

As part of moving to Fisk, the help output from choria has changed significantly to be shorter and easier to navigate, with less superfluous information on every page.

We also added a new cheat feature to the CLI that gives you access to cheat sheet style help; here’s an example:

$ choria cheat req
# further documentation https://choria.io/docs/concepts/cli/

# request the status of a service
choria req service status service=httpd

...

Run choria cheat for a list of available cheats; contributions to these would be greatly appreciated!

Improved Testing

We now have full integration testing where real running clusters are built and real network round trips are tested; this is a significant step forward in achieving reliability in the long run.

We also now do daily tests of installing Choria using Puppet on every supported Linux distribution.

Operating Systems

We now publish packages for EL9 and Ubuntu 22.04. Debian packages are tagged by their distribution name to help with mirroring.

We removed support for Ubuntu Xenial, Debian Stretch and EL6.

Contributors

Special thanks to Jonathan Matthews, Vincent Janelle, Nicolas Le Gaillart, Lena Schneider and Romain Tartière for their contributions.

[Read More]

Introducing App Builder

Today I am pleased to announce a new operations tool. It’s a small little thing, but it can be a huge help; I hope you will love it.

Operations teams tend to use a large selection of shell scripts, piped commands, kubectl invocations and more in their day to day job.

To a large extent these are tribal knowledge and a big hurdle for new members of the team. The answer is often to write wiki pages capturing runbooks that document these commands.

This does not scale well and does not stay up to date.

What if there was a CLI tool that encapsulated all of these commands in a single, easy to use and easy to discover command?

The appbuilder project lets you build exactly that by specifying a model for your CLI application in a YAML file and then building custom interfaces on the fly.
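
A minimal sketch of such a definition is shown below; the field names are based on my reading of the appbuilder documentation and the exact schema should be confirmed against the project documentation:

# ops.yaml - illustrative App Builder definition
name: ops
description: Acme operations tool
version: 0.0.1
author: ops@example.net

commands:
  - name: logs
    description: Tail the acme service logs
    type: exec
    command: kubectl -n acme logs -f deployment/acme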

There will be two versions of this: the standalone version mentioned above, which has only non-Choria-related features, and a version embedded as a library in the normal Choria release, where it gains discovery, RPC and KV support. You will see that in the next Choria release (0.26.0).

There is an optional video introducing the idea behind it.

See the full entry for an example of this tool in use and what it is all about.

View the Documentation and GitHub Repo for more

[Read More]

Supported OS and Go versions update

Just a small heads-up to alert users of a few changes to the Operating Systems we support.

Operating Systems

We support Operating System packages primarily for our Puppet users, and so we tend to follow along with their deprecations.

Yesterday Puppet announced they will drop support for Ubuntu Xenial, which prompted me to do the same here. Xenial has been a long and painful distribution to support, and I am eager to not have to deal with it anymore. At the same time we are removing support for Debian Stretch and EL6 (though we have not done packages for EL6 for a long time). We will not build packages for these Operating Systems, and future releases will not have any RPMs or DEBs published for them.

We will, in the near future, archive the repositories that were used to serve these Operating Systems.

Golang

The Go team announced Go 1.18 this week; in line with Go support policies we have therefore dropped support for Go 1.16 and made some breaking changes that prevent Go 1.16 from compiling the code base. Nightly Choria builds are already done using 1.18 to give us ample time to test the new compiler.

These updates will slowly ripple through Puppet modules and other related projects also.

Stream Replicator 0.5.0

In the past we’ve had a project called Stream Replicator that was used to copy data between independent NATS Streaming Server instances. I’ve needed an updated version of this; see the full text for links to a brand-new, ground-up rewrite of this tool that supports JetStream.

At a basic level the system simply takes all data in one Stream found in a cluster and copies it all to another Stream in, potentially, another cluster. We maintain order, and since it’s a long-running process the two Streams are kept up to date.

The streams can have different configurations - different storage types, different retention periods, different replication factors and more, even different subject spaces.

That’s the easy part; the harder part is where we meet some Choria-specific needs. Choria Servers can send chunky packets of metadata containing lots of metrics and metadata about the nodes. In a single data center, sending this data frequently is fine, but when you are operating hundreds of thousands of nodes no central metadata store can realistically keep up with the demand this places on it. At medium scale this can amount to many MB/sec.

The Choria Stream Replicator supports inspecting the data streams, tracking individual senders and sampling data out of the stream - sending data for any given node once per hour, while the node itself publishes every 5 minutes.

Using this method one can construct tree structures - city-level data centers feeding regional aggregators, which in turn feed a large central data store - with 5 minute freshness at the city level, 30 minutes at the region and hourly at the central store.

Further, while replicating and sampling data the Stream Replicator will track nodes and send small advisories about new nodes, nodes not recently seen and nodes that are deemed retired. By ingesting these advisories regionally or centrally a real time view of global node availability can be built without the cost of actually processing node level data.

Choria Fleet Streams

Given the diagram above, the Replicator supports the following flow:

  1. Choria Fleet Nodes publish their metadata every 300 seconds
  2. Choria Data Adapters place the data in the CHORIA_REGISTRATION stream with per-sender identifying information
  3. Stream Replicator reads all messages in the CHORIA_REGISTRATION Stream
  4. Sampling is applied and advisories are sent to the CHORIA_REGISTRATION_ADVISORIES stream about node movements and health
  5. Sampled Fleet Node metadata is replicated to central into the CHORIA_REGISTRATION stream
  6. All advisories are replicated to central into the CHORIA_REGISTRATION_ADVISORIES stream without sampling

I gave a talk detailing this pattern at Cfgmgmt Camp 2019 that might explain the concept further.

This is quite niche stuff; though the Replicator would be generically useful, it’s tailored to the needs of our Large Scale Choria Deploy reference architecture.

[Read More]

February 2022 Releases

It’s been almost 5 months since our last release - not because nothing has been happening but because so much has been happening. Good problems to have!

So this is a bit of a massive release; however, I think the bulk of the changes will not affect our typical Puppet-based users.

Choria Registry

This release introduces the first work on a new Choria Registry. We have a long-standing pain point around managing DDL files on clients: they are a technical requirement to describe remote services, but they are just a pain to maintain. Puppet helps, but for clients in CI, on desktops and so on, the DDL requirement is just too much.

Choria Server now has an option to act as a Registry where it can read its local DDL directory and serve that up to clients on demand. When a client tries to access an agent it has never accessed before, it will ask the Registry for the DDL describing that agent. It will also do so regularly to ensure its local cache is still accurate.

This means that we can now have truly single-file client deployments. With just the choria binary and a running Registry, that choria client can interact with the entire fleet and do everything it needs to. This is a great improvement for the deployment of client machines and makes Choria more generally useful without Configuration Management.

The Choria Server can be a Registry; running multiple Servers with the registry enabled will create a failure-tolerant HA cluster of registry servers.

This is a brand-new feature, so I am not yet documenting it publicly, but I am keen to talk to users who wish to help validate this before we look to supporting it more widely.

Non mTLS communications

The major work here that contributed to the 20 000 line code change in Choria Server is that we now support a secure non-mTLS mode of communication. This is of no consequence to Puppet users, so if that’s you, feel free to skip this section.

With a typical deployment we use the Puppet CA to create a fully managed and closed mTLS-based network. For some enterprises, replicating that with their internal PKI infrastructure is nearly impossible. So we looked to, optionally, move away from a pure mTLS mode to a mixed setup where we use ed25519 keypairs and signed JWTs to provide equivalent security.

Essentially, we have now formalized our use of JWTs in a new tokens package where servers and clients each have their own JWT. We hope to move entirely over to this model in time, as we were able to create a greatly enhanced security model:

  • Servers are restricted to certain collectives; attempts to enter collectives that are not defined will be denied by the broker
  • Servers are restricted to server traffic flows only; a server token cannot make a request to any other server, enforced by the broker
  • Servers have a default-deny permission set, allowing specific access to Streams, Governors, Hosting Services and the ability to act as a Submission Server
  • Clients have private reply channels; clients cannot view each other’s replies
  • In addition to Open Policy Agent, clients have a set of default-deny permissions allowing access to use Streams, administer Streams, use Elections, view Events, use Governors etc

Using these settings moves us to a much more secure and private setup where traffic is isolated and secure even between two Choria users, and this introduces the first part of a security model around our adoption of Choria Streams. We cannot replicate these policies using just certificates. We hope to move even Puppet users to this model in future, but that’s a big undertaking to get right without additional services.

To enable these features one needs to deploy the AAA Service and the Provisioner - both of those had recent releases supporting this mode.

As mentioned, this is not really something Puppet users should worry about, however those in large enterprises who deploy in non-Puppet ways should keep an eye out for incoming documentation around this feature.

Package Repository Changes

As notified back in September, we are moving away from Packagecloud to our own package hosting infrastructure. I am keeping the Packagecloud infrastructure up for a while, but this release and all future ones will not be uploaded there, to encourage users to move to the new infrastructure.

Thanks to Romain Tartière, Steffy Fort, Tim Meusel and Alexander Olofsson for their contributions to this release

[Read More]