choria.io/blog

Stream Replicator 0.5.0

Posted on March 16, 2022 | 3 minutes | R.I.Pienaar

In the past we’ve had a project called Stream Replicator that was used to copy data between independent NATS Streaming Server instances. I needed an updated version of this; see the full text for links to a brand new ground-up rewrite of this tool that supports JetStream.

At a basic level the system simply takes all data in one Stream found in a cluster and copies it all to another stream in, potentially, another cluster. We maintain order and, because it’s a long-running process, the 2 streams are kept up to date.

The streams can have different configurations - different storage types, different retention periods, different replication factors and more, even different subject spaces.

That’s the easy part, the harder part is where we meet some Choria specific needs. Choria Servers support sending chunky packets of metadata containing lots of metrics and other information about the nodes. In a single Data Center sending this data frequently is fine, but when you are operating 100s of thousands of nodes no central metadata store can realistically keep up with the demand this places on it. At a medium scale this can amount to many MB/sec.

The Choria Stream Replicator supports inspecting the data streams, tracking individual senders and sampling data out of the stream - sending data for any given node once per hour, while the node itself publishes every 5 minutes.
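As a rough illustration of the sampling idea, and not the Stream Replicator’s actual code, a per-sender sampler could be sketched in Python as follows. The Sampler class, its field names and the node name are hypothetical; only the 5 minute publish interval and the hourly forwarding come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Sampler:
    """Forwards at most one message per sender per interval (hypothetical helper)."""

    interval_seconds: float
    last_forwarded: Dict[str, float] = field(default_factory=dict)

    def should_forward(self, sender: str, now: float) -> bool:
        previous = self.last_forwarded.get(sender)
        if previous is not None and now - previous < self.interval_seconds:
            return False  # sampled out: this sender was copied recently
        self.last_forwarded[sender] = now
        return True


# One node publishing every 5 minutes for 2 hours, sampled down to hourly.
sampler = Sampler(interval_seconds=3600)
publish_times = range(0, 2 * 3600, 300)
forwarded = [t for t in publish_times if sampler.should_forward("node1.example.net", t)]
print(forwarded)  # [0, 3600]
```

Of the 24 messages the node publishes in this example only 2 are copied onward, which is where the reduction in load on the central store comes from.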

Using this method one can construct tree structures - city-level data centers feeding regional aggregators which in turn feed a large central data store. 5 minute freshness at the city level, 30 minutes at the region and hourly at the central.

Further, while replicating and sampling data the Stream Replicator will track nodes and send small advisories about new nodes, nodes not recently seen and nodes that are deemed retired. By ingesting these advisories regionally or centrally a real time view of global node availability can be built without the cost of actually processing node level data.
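The advisory tracking described above might look something like this minimal sketch. The NodeTracker class, the advisory names and the thresholds are assumptions made for illustration; the real Replicator defines its own advisory format and timings.

```python
from typing import Dict, List, Set, Tuple


class NodeTracker:
    """Emits advisories as nodes appear, go quiet and are retired (hypothetical)."""

    def __init__(self, warn_after: float, retire_after: float) -> None:
        self.warn_after = warn_after
        self.retire_after = retire_after
        self.last_seen: Dict[str, float] = {}
        self.warned: Set[str] = set()

    def observe(self, node: str, now: float) -> List[Tuple[str, str]]:
        """Record a message from a node and return any advisory it causes."""
        advisories = []
        if node not in self.last_seen:
            advisories.append(("new", node))
        self.warned.discard(node)
        self.last_seen[node] = now
        return advisories

    def sweep(self, now: float) -> List[Tuple[str, str]]:
        """Check all known nodes for ones that have gone quiet."""
        advisories = []
        for node, seen in list(self.last_seen.items()):
            age = now - seen
            if age >= self.retire_after:
                del self.last_seen[node]
                self.warned.discard(node)
                advisories.append(("retired", node))
            elif age >= self.warn_after and node not in self.warned:
                self.warned.add(node)
                advisories.append(("not_recently_seen", node))
        return advisories


tracker = NodeTracker(warn_after=900, retire_after=86400)
events = tracker.observe("node1.example.net", now=0)
events += tracker.sweep(now=600)    # still fresh, nothing to report
events += tracker.sweep(now=1000)   # quiet for longer than warn_after
events += tracker.sweep(now=2000)   # already warned, no duplicate advisory
events += tracker.sweep(now=90000)  # quiet for longer than retire_after
print(events)
```

A consumer of these small advisories only needs the node name and the event type, so it can keep a view of fleet availability without ever reading the full node metadata.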

Choria Fleet Streams

Given the above diagram, the Replicator supports:

Choria Fleet Nodes publish their metadata every 300 seconds
Choria Data Adapters place the data in the CHORIA_REGISTRATION stream with per-sender identifying information
Stream Replicator reads all messages in the CHORIA_REGISTRATION Stream
Sampling is applied and advisories are sent to the CHORIA_REGISTRATION_ADVISORIES stream about node movements and health
Sampled Fleet Node metadata is replicated to central into the CHORIA_REGISTRATION stream
All advisories are replicated to central into the CHORIA_REGISTRATION_ADVISORIES stream without sampling

I gave a talk detailing this pattern at Cfgmgmt Camp 2019 that might explain the concept further.

This is quite niche stuff: though the Replicator would be generically useful, it’s tailored to the needs of our Large Scale Choria Deploy reference architecture.

[Read More]

releases streams

February 2022 Releases

Posted on February 23, 2022 | 6 minutes | R.I.Pienaar

It’s been almost 5 months since our last release, not because nothing has been happening but because so much has been happening, good problems to have!

So this is a bit of a massive release, however I think the bulk of the changes will not affect our typical Puppet based users.

Choria Registry

This release introduces the first work on a new Choria Registry. We have a long-standing pain point around managing DDL files on clients. They are a technical requirement to describe remote services but are a pain to maintain; Puppet helps, but for clients in CI, on desktops etc, the DDL requirement is just too much.

Choria Server now has an option to act as a Registry where it can read its local DDL directory and serve that up to clients on demand. When a client tries to access a new agent it has never accessed before it will ask the registry for the DDL describing that agent. It will also do so regularly to ensure the local cache is still accurate.
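The client side of this behaviour can be pictured as a small cache, sketched here in Python under stated assumptions. DDLCache, fake_registry, the max_age value and the shape of the returned DDL are all hypothetical stand-ins and not the Choria client API.

```python
from typing import Callable, Dict, List, Tuple


class DDLCache:
    """Asks the registry on first use and again once an entry is stale (hypothetical)."""

    def __init__(self, fetch: Callable[[str], dict], max_age: float) -> None:
        self.fetch = fetch
        self.max_age = max_age
        self.entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, agent: str, now: float) -> dict:
        cached = self.entries.get(agent)
        if cached is not None and now - cached[0] < self.max_age:
            return cached[1]  # fresh enough, no registry round trip
        ddl = self.fetch(agent)
        self.entries[agent] = (now, ddl)
        return ddl


registry_calls: List[str] = []


def fake_registry(agent: str) -> dict:
    # Stand-in for a request to a Registry; records each lookup
    registry_calls.append(agent)
    return {"agent": agent, "actions": ["status"]}


cache = DDLCache(fetch=fake_registry, max_age=3600)
cache.get("package", now=0)     # never seen before: asks the registry
cache.get("package", now=60)    # served from the local cache
cache.get("package", now=4000)  # entry is stale: asks the registry again
print(registry_calls)  # ['package', 'package']
```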

This means that we can now have truly single-file client deployments. With just the choria binary and a running Registry that choria client can interact with the entire fleet and do everything it wants. This is a great improvement for deployment of client machines and making Choria more generally useful without Configuration Management.

The Choria Server can be a Registry, and running multiple Servers with registry enabled will create a failure-tolerant HA cluster of registry servers.

This is a brand-new feature, so I am not yet documenting it publicly, but I am keen to talk to users who wish to help in validating this before we look to supporting this more widely.

Non mTLS communications

The major work here that contributed to the 20 000 line code change in Choria Server is that we now support a secure non mTLS mode of communication. This is of no consequence for Puppet users, so if that’s you, feel free to skip this section.

With a typical deployment we use the Puppet CA to create a fully managed and closed mTLS based network. For some enterprises replicating that with their internal PKI is nearly impossible. So we looked to, optionally, move away from a pure mTLS mode to a mixed setup where we use ED25519 keypairs and signed JWTs to provide equivalent security.

Essentially we now have formalized our use of JWT into a new tokens package where servers and clients have their own JWT. We hope to move entirely over to this model in time as we were able to create a greatly enhanced security model:

Servers are restricted to only certain collectives, attempting to enter non defined collectives will be denied by the broker
Servers are restricted to only server traffic flows. A server token cannot make a request to any other server, enforced by the broker
Servers have a default deny permission set that allows specific access to Streams, Governors, Hosting Services and being able to be a Submission Server
Clients have private reply channels, clients cannot view each other’s replies
In addition to Open Policy Agent, Clients have a set of default deny permissions allowing access to use Streams, administer Streams, use Elections, view Events, use Governors etc
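To make the default deny idea concrete, here is a minimal Python sketch of how such a permission set could be modelled. The class names, field names and helper functions are hypothetical and do not reflect the actual claims in Choria JWTs or the checks the broker performs.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass(frozen=True)
class ServerPermissions:
    """Every permission defaults to False, so anything not granted is denied."""

    streams: bool = False
    governors: bool = False
    service_host: bool = False
    submission: bool = False


@dataclass(frozen=True)
class ServerToken:
    identity: str
    collectives: Tuple[str, ...]
    permissions: ServerPermissions = field(default_factory=ServerPermissions)


def may_join(token: ServerToken, collective: str) -> bool:
    # Only collectives listed in the token are allowed
    return collective in token.collectives


def may_use(token: ServerToken, feature: str) -> bool:
    # Unknown feature names are denied as well
    return getattr(token.permissions, feature, False) is True


token = ServerToken(
    identity="node1.example.net",
    collectives=("choria",),
    permissions=ServerPermissions(streams=True),
)
print(may_join(token, "choria"), may_join(token, "other"))     # True False
print(may_use(token, "streams"), may_use(token, "governors"))  # True False
```

The point of the design is that forgetting to set a permission results in a denial rather than in accidental access.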

Using these settings moves us to a much more secure and private setup where even between 2 Choria Users traffic is now isolated and secure. This also introduces the beginnings of a security model around our adoption of Choria Streams. We cannot replicate these policies using just certificates. We hope to move even Puppet users to this model in future but that’s a big undertaking to get right without additional services.

To enable these features one needs to deploy AAA Service and Provisioner - and both of those had recent releases supporting this mode.

As mentioned, this is not really a thing that Puppet users should worry about. However, those in large enterprises who deploy in non-Puppet ways should keep an eye out for incoming documentation around this feature.

Package Repository Changes

As notified back in September we are moving away from Packagecloud to our own package hosting infrastructure. I am keeping the Packagecloud infrastructure up for a while, but this release and all future ones will not be uploaded there, to encourage users to move to the new infrastructure.

Thanks to Romain Tartière, Steffy Fort, Tim Meusel and Alexander Olofsson for their contributions to this release

[Read More]

releases

Provisioning Improvements

Posted on December 8, 2021 | 3 minutes | R.I.Pienaar

The typical Choria Deployment method is to use Puppet to provision everything on the managed nodes. This works fine for those users, however at Large Scale this just does not really work.

Large Enterprises have a vastly varied infrastructure, and you simply do not find Puppet in use across all tiers. We therefore support provisioning Choria in a way that’s entirely configuration management free.

Essentially this is the “IoT Light-bulb” mode: you start a Choria Server and in a short period of time it has figured out how to provision itself, connected to the provisioning infrastructure and been on-boarded.

The Choria Provisioner can provision thousands of nodes a minute, is highly available and extendible and can integrate with enterprise CAs.

In August we blogged about some enhancements to make this process better, today we follow up with further improvements. Read on for full details.

[Read More]

operations provisioning

AAA Improvements

Posted on December 7, 2021 | 6 minutes | R.I.Pienaar

Choria supports a distributed authentication model as well as a centralised model using our Choria AAA Service. A Puppet user uses the distributed model by default.

In distributed mode every client has a certificate, signs its requests with it and the certificate becomes the identity. The servers will verify using their RPC Authorization system if that certificate (id) can perform an action.

In the centralised setup each client does not have a certificate but has a JWT obtained from a sign-in service, often using choria login. The JWT holds the identity, policies, permissions and more. The AAA Service signs requests using its certificate allowing clients to publish signed requests. Effectively the signing step gets outsourced to a trusted 3rd party. Before signing a request a policy is evaluated on the AAA Service to determine if the request should be allowed.

The AAA Service was introduced in 2019 and we improved on it in 2020 by allowing client certificate free operation.

The Certificate Free operation was a big win, however it came at a considerable cost of requiring additional Choria Brokers to take client connections.

We made a number of improvements in Release 0.6.0, read the full entry for details.

[Read More]

security

September 2021 Releases

Posted on September 22, 2021 | 4 minutes | R.I.Pienaar

Today we’re releasing the next Choria Server and a few Puppet modules. Primarily this is a bug fix and general improvement release with few real big ticket user facing items.

We have a major breaking change relating to our Package Repositories. For most people who use our public repositories nothing will change, but those using internal mirrors should probably read the full post for details. In short, we are moving from Package Cloud to our own infrastructure hosted in EU, UK and US. Our packages and repositories are now signed using our own keys.

We’ve had some great feedback on Choria Governors and we’ve improved the CLI tooling a bit, we’ve also added a new Puppet Type and Provider to manage these. Thanks to users who have been testing these new features.

We have a new opt-in feature that should significantly improve the default broadcast based discovery system. Usually we wait for 2 seconds for discovery results, but in most cases most discovery results come in within the first few 100ms. By setting plugin.choria.discovery.broadcast.windowed_timeout=1 in your client configuration file we now do a windowed discovery that will terminate if no more results were received within 300ms of the last received result. In most cases this will be a massive improvement in UX. Please test it, we aim to flip this to default on in the near future.

We’ve had a big set of refactors on the Debian packaging and should have functioning Debian Bullseye packages for this release. There have also been a few improvements to the Debian packages in general.
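The effect of the windowed timeout can be shown with a small Python simulation. This is a sketch and not the Choria client code: the function name and the reply times are made up, and it assumes that the 300ms window only starts once the first result has arrived, which the description above does not spell out.

```python
from typing import List, Tuple


def windowed_discovery(
    arrivals_ms: List[Tuple[int, str]],
    timeout_ms: int = 2000,
    window_ms: int = 300,
) -> Tuple[List[str], int]:
    """Returns the discovered nodes and the time in ms at which discovery stopped."""
    found: List[str] = []
    stop_at = timeout_ms  # assumption: full timeout applies until the first result
    for arrived_at, node in sorted(arrivals_ms):
        if arrived_at > stop_at:
            break  # the window closed before this reply arrived
        found.append(node)
        # Each result restarts the window, capped by the overall timeout
        stop_at = min(timeout_ms, arrived_at + window_ms)
    return found, stop_at


# Hypothetical reply times: three quick replies and one very late one
replies = [(50, "node1"), (80, "node2"), (210, "node3"), (900, "node4")]
print(windowed_discovery(replies))  # (['node1', 'node2', 'node3'], 510)
```

In this example discovery ends after 510ms instead of the full 2 seconds. The trade-off is that the late reply from node4 is missed, which is presumably why the feature is opt-in for now.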

We have started the process of supporting a new style of agent called a Choria Service. These services will be used to perform AAA signing over the NATS protocol, to facilitate DDL free clients thanks to central Schema Registries and more. Today this is mainly under the cover improvements but expect big changes coming soon in areas of client deployment simplification.

Thanks to Romain Tartière, Romuald Conty and Tim Meusel for their contributions to this release

[Read More]

releases

August 2021 Releases

Posted on August 24, 2021 | 5 minutes | R.I.Pienaar

This is the first release since April, and it’s a massive release bringing many enhancements and new features.

We are introducing Choria Streams - a Stream Processing framework built into the Choria Broker powered by NATS JetStream. I wrote a blog post about this, Introducing Choria Streams, that’s worth a read.

Additionally, we added Choria Key-Value Store, Choria Governor and Choria Message Submit all powered by Choria Streams and each in their own right a big feature.

Other major enhancements are that we now support Websockets for the network connections between Servers, Broker and Go clients.

Autonomous Agents now have a data layer, meaning within an Autonomous Agent data can be fetched from stores like Key-Value stores and this data can be accessed by Watchers at run time. We expose node facts to Autonomous Agents in the data layer. Additionally, we support watching Choria Key-Value Store for changes, which updates the data layer and triggers transitions. Exec Watchers also support Governors to create orchestration-free rolling upgrades etc.

We made huge improvements to Provisioning, we blogged about this in Provisioning HA and Security. There you can also see we support Leader Election against Choria Streams as a library feature.

On the documentation front we added a big section about Choria Streams but also received permission to Open Source some documentation that shows how a very large - millions of nodes - Choria deployment might look. This is a proven design in active use in production for a few years already. We are busy building another such network at the moment, and a lot of the enhancements in Provisioning are a result of this work. Find the document at Large Scale Design.

Thanks to Chris Boulton, Romain Tartière, Tim Meusel, Dominic Vallejo, Vincent Janelle and Franciszek Klajn for their contributions to this release

[Read More]

releases

Provisioning HA and Security

Posted on August 13, 2021 | 5 minutes | R.I.Pienaar

The Choria Provisioner is a niche component that can onboard Choria Servers into a Choria environment without needing Puppet or other CM. I often refer to this as light-bulb mode, i.e. an IoT device style on-boarding rather than traditional CM.

I’ve written in the past about this in Mass Provisioning Choria Servers for background.

Today I want to talk about upcoming changes to significantly improve this process from a security and reliability perspective and talk a bit about what is next.

Read on for more details.

[Read More]

operations provisioning streams

Introducing Choria Streams

Posted on August 5, 2021 | 7 minutes | R.I.Pienaar

Choria Broker is based on the excellent NATS Server technology. This technology has been instrumental in moving Choria from its MCollective roots, where 1 000 managed nodes required a big hardware investment, to where we are today with a $40 Linode being enough to manage 50 000 nodes in an easy to manage and run single binary package.

NATS Server recently introduced a new capability called NATS JetStream and today I want to show a bit where we are with making that available to Choria users as Choria Streams.

JetStream is a Streaming Server that uses a WAL to create an append-only log of messages. Messages get stored to disk or memory, can be replicated within a cluster and can later be consumed by different consumers using any of the 40+ programming languages supported by NATS.
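The core idea of an append-only log read by independent consumers can be sketched in a few lines of Python. This is a toy in-memory model for illustration only; the Stream class, its methods and the subject name are hypothetical and are not the JetStream API, and nothing here is persisted or replicated.

```python
from typing import Dict, List, Optional, Tuple


class Stream:
    """Toy append-only log where each consumer keeps its own position."""

    def __init__(self) -> None:
        self._log: List[Tuple[str, str]] = []
        self._cursors: Dict[str, int] = {}

    def publish(self, subject: str, data: str) -> int:
        # Messages are only ever appended; sequence numbers start at 1
        self._log.append((subject, data))
        return len(self._log)

    def next(self, consumer: str) -> Optional[Tuple[int, str, str]]:
        """Returns (sequence, subject, data) or None when the consumer is caught up."""
        position = self._cursors.get(consumer, 0)
        if position >= len(self._log):
            return None
        self._cursors[consumer] = position + 1
        subject, data = self._log[position]
        return position + 1, subject, data


stream = Stream()
stream.publish("example.metadata", "first")
stream.publish("example.metadata", "second")

# Two consumers read the same messages independently and in order
print(stream.next("archiver"))  # (1, 'example.metadata', 'first')
print(stream.next("archiver"))  # (2, 'example.metadata', 'second')
print(stream.next("metrics"))   # (1, 'example.metadata', 'first')
```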

By embedding this technology in the Choria Broker we enable a number of use cases around our Metadata processing features, Autonomous Agents, CloudEvents as produced by Choria Scout, and we also introduce 2 major new features: Choria Key-Value Store and Choria Concurrency Governor.

This will all be available in our upcoming 0.23.0 release.

Read the full entry for an overview of where we are.

[Read More]

streams

New Project Visuals

Posted on May 6, 2021 | 2 minutes | R.I.Pienaar

Till recently our documentation had a mix of visual styles for diagrams - mixing icons from Cisco, AWS etc - I recently wanted to document the supported network topologies and realised I needed a more unified visual style for the documentation.

For some time now I have been using diagrams.net to generate diagrams for blog posts and such. This tool is ok for diagrams, but what really sets it apart for me is that even when exporting a PNG file it can embed the diagram vector source in the resulting PNG image.

This means any image on the website can simply be loaded and edited as a vector in the diagram editor, this is huge for ease of maintenance of the website, docs etc.

After some googling I found the Affinity symbol set - a public domain icon set in SVG format. Using these I came up with a set of on-brand colored icons for our various components you can see below.

See the full post for links to assets and libraries for diagrams.net.

[Read More]

documentation

Reducing connection overhead for branch office scenarios

Posted on April 26, 2021 | 3 minutes | Romain Tartière

Because Choria allows you to manage nodes spread all around the world, and because you might be working from your laptop, far away from the (bad) Wi-Fi access points that connect you through (bad) PLC to the (bad) internet connection from the (not bad) island you are on, you may experience inconvenient latency and unreliability.

The reason is quite simple: while the Choria servers maintain a permanent connection with the message broker, the Choria client has to establish a new connection with the middleware for each request. Latency and packet loss do not help with establishing TLS encrypted connections in a timely fashion.

But good news everyone! NATS — the messaging system Choria is built on — has built-in support for so-called leaf nodes which offer a solution to this problem.

[Read More]

leafnodes