streams

Stream Replicator 0.5.0

Posted on March 16, 2022 | 3 minutes | R.I.Pienaar

In the past we’ve had a project called Stream Replicator that was used to copy data between independent NATS Streaming Server instances. I’ve needed an updated version of this; see the full text for links to a brand new ground-up rewrite of this tool that supports JetStream.

At a basic level the system simply takes all data in one Stream found in a cluster and copies it all to another Stream in, potentially, another cluster. Order is maintained and, because it is a long-running process, the 2 streams are kept up to date.

The streams can have different configurations - different storage types, different retention periods, different replication factors and more, even different subject spaces.

That’s the easy part; the harder part is where we meet some Choria-specific needs. Choria Servers support sending chunky packets of metadata containing lots of metrics and other data about the nodes. In a single Data Center sending this data frequently is fine, but when you are operating 100s of thousands of nodes no central metadata store can realistically keep up with the demand this places on it. At a medium scale this can amount to many MB/sec.

The Choria Stream Replicator supports inspecting the data streams, tracking individual senders and sampling data out of the stream - sending data for any given node once per hour, while the node itself publishes every 5 minutes.
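The sampling idea can be illustrated with a minimal sketch. This is not the Stream Replicator's actual code; the `Sampler` class, its `should_forward` method and the sender name are hypothetical, and the only values taken from the text are the 5 minute publish interval and the hourly sample.

```python
from dataclasses import dataclass, field


@dataclass
class Sampler:
    """Hypothetical per-sender sampler: forwards at most one message
    per sender per interval and drops the rest."""

    interval: float = 3600.0  # seconds between forwarded copies for one sender
    last_forwarded: dict = field(default_factory=dict)  # sender -> last forward time

    def should_forward(self, sender: str, now: float) -> bool:
        last = self.last_forwarded.get(sender)
        # Forward the first message from a sender, then again only once
        # a full interval has passed since the last forwarded copy.
        if last is None or now - last >= self.interval:
            self.last_forwarded[sender] = now
            return True
        return False


sampler = Sampler(interval=3600.0)

# One node publishing every 300 seconds for two hours (24 messages).
forwarded = [
    t
    for t in range(0, 7200, 300)
    if sampler.should_forward("node1.example.net", float(t))
]
print(forwarded)  # [0, 3600]
```

In this sketch only 2 of the 24 published messages are copied onward, which is where the reduction in load on the downstream store comes from.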

Using this method one can construct tree structures - city-level data centers feeding regional aggregators which in turn feed a large central data store. 5 minute freshness at the city level, 30 minutes at the region and hourly at the central.

Further, while replicating and sampling data the Stream Replicator will track nodes and send small advisories about new nodes, nodes not recently seen and nodes that are deemed retired. By ingesting these advisories regionally or centrally a real time view of global node availability can be built without the cost of actually processing node level data.
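As a rough illustration of how such advisories could be derived from per-sender tracking, here is a small sketch. The `NodeTracker` class, the advisory names (`new`, `not_seen`, `retired`) and both thresholds are assumptions made for this example and are not the Replicator's real advisory format or defaults.

```python
from dataclasses import dataclass, field


@dataclass
class NodeTracker:
    """Hypothetical tracker that turns sender activity into small advisories."""

    warn_after: float = 900.0      # assumed: silence before a "not_seen" advisory
    retire_after: float = 86400.0  # assumed: silence before a node is "retired"
    last_seen: dict = field(default_factory=dict)  # sender -> last message time
    warned: set = field(default_factory=set)       # senders already flagged

    def observe(self, sender: str, now: float) -> list:
        """Record a message from sender; report it if it was not known."""
        advisories = []
        if sender not in self.last_seen:
            advisories.append(("new", sender))
        self.last_seen[sender] = now
        self.warned.discard(sender)
        return advisories

    def scan(self, now: float) -> list:
        """Periodic check for senders that have gone quiet."""
        advisories = []
        for sender, seen in list(self.last_seen.items()):
            silent = now - seen
            if silent >= self.retire_after:
                advisories.append(("retired", sender))
                del self.last_seen[sender]
                self.warned.discard(sender)
            elif silent >= self.warn_after and sender not in self.warned:
                advisories.append(("not_seen", sender))
                self.warned.add(sender)
        return advisories


tracker = NodeTracker()
first = tracker.observe("node1.example.net", 0.0)
quiet = tracker.scan(1000.0)
gone = tracker.scan(90000.0)
```

Each advisory here is just a small tuple, so a consumer of the advisory stream can follow node availability without reading the full metadata payloads.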

Choria Fleet Streams

Given the above diagram, the Replicator supports:

Choria Fleet Nodes publish their metadata every 300 seconds
Choria Data Adapters place the data in the CHORIA_REGISTRATION stream with per-sender identifying information
Stream Replicator reads all messages in the CHORIA_REGISTRATION Stream
Sampling is applied and advisories are sent to the CHORIA_REGISTRATION_ADVISORIES stream about node movements and health
Sampled Fleet Node metadata is replicated to central into the CHORIA_REGISTRATION stream
All advisories are replicated to central into the CHORIA_REGISTRATION_ADVISORIES stream without sampling

I gave a talk detailing this pattern at Cfgmgmt Camp 2019 that might explain the concept further.

This is quite niche stuff: though the Replicator would be generically useful, it’s tailored to the needs of our Large Scale Choria Deploy reference architecture.

[Read More]

releases streams

Provisioning HA and Security

Posted on August 13, 2021 | 5 minutes | R.I.Pienaar

The Choria Provisioner is a niche component that can onboard Choria Servers into a Choria environment without needing Puppet or other CM. I often refer to this as light-bulb mode, i.e. an IoT device style on-boarding rather than traditional CM.

For background, see my earlier post Mass Provisioning Choria Servers.

Today I want to talk about upcoming changes to significantly improve this process from a security and reliability perspective and talk a bit about what is next.

Read on for more details.

[Read More]

operations provisioning streams

Introducing Choria Streams

Posted on August 5, 2021 | 7 minutes | R.I.Pienaar

Choria Broker is based on the excellent NATS Server technology. This technology has been instrumental in moving Choria from its MCollective roots, where 1 000 managed nodes required a big hardware investment, to where we are today, with a $40 Linode being enough to manage 50 000 nodes in an easy to manage and run single binary package.

NATS Server recently introduced a new capability called NATS JetStream and today I want to show a bit where we are with making that available to Choria users as Choria Streams.

JetStream is a Streaming Server that uses a WAL to create an append-only log of messages. Messages get stored to disk or memory, can be replicated within a cluster and can later be consumed by different consumers using any of the 40+ programming languages supported by NATS.

By embedding this technology in the Choria Broker we enable a number of use cases around our Metadata processing features, Autonomous Agents, CloudEvents as produced by Choria Scout, and we also introduce 2 major new features: Choria Key-Value Store and Choria Concurrency Governor.

This will all be available in our upcoming 0.23.0 release.

Read the full entry for an overview of where we are.

[Read More]

streams