Introducing Choria Scout

Overview

We’re happy to announce a new project called Choria Scout - a highly scalable system health monitoring framework and monitoring data pipeline released under the Apache 2.0 license.

Initially we support executing Nagios-compatible plugins on Choria-managed nodes, with results sent centrally in the standard CloudEvents format and, optionally, integrated into Prometheus.

These are framework level building blocks that will in time be used to create a full monitoring stack built on Choria technologies. Checks and value overrides can already be configured using our Puppet modules. You can also use these building blocks to build entirely custom solutions for your own needs.

Scout will be a cloud native project with central components capable of being hosted on Kubernetes and using data formats supported by commercial clouds and projects like Knative. It will have a focus on integration, open data exchange and extensibility.

Despite being cloud native, we will of course support monitoring anything where Choria, or the upcoming Scout agent, can run, including traditional bare metal, VMs, containers, pods and small devices.

[Read More]

Choria Server 0.15.0

We have quite a significant release of the Choria Server today that lays the groundwork for an upcoming Choria monitoring pipeline.

Read the full post for the details.

[Read More]

April 2020 Releases

It’s been quite some time since we’ve had releases, and there’s a long list of small improvements.

Thanks to those who contributed to these releases: David Gardner, Mark Frost, Romain Tartière, Yury Bushmelev, @rjd1, Tim Meusel, Alexander Hermes, Vincent Janelle

[Read More]

NATS Messaging - Part 9

Yesterday we made some quite minor changes to our app and got it to use JetStream on both the Producer and Consumer side. These changes solved several problems for us, like being able to restart Consumers without losing any messages.

The last remaining issue was around handling messages that fail to be delivered. Imagine the case where our disk is full on the Consumer: wouldn’t it be great if we could somehow communicate our inability to handle messages to the network and have it retry later?

That’s the role of Acknowledgements, and JetStream supports several modes. Today we’ll look at those.
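As a rough sketch of what explicit acknowledgements look like from Go (this uses today’s nats.go JetStream helpers, which post-date the preview-era API these posts were written against, and assumes a stream already captures the made-up LOGS.> subjects):

package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// A durable consumer with manual acknowledgements: JetStream will
	// redeliver anything we do not acknowledge.
	_, err = js.Subscribe("LOGS.>", func(m *nats.Msg) {
		if err := writeToDisk(m.Data); err != nil {
			// We could not handle the message (disk full, say); negatively
			// acknowledge it so the server retries later.
			m.Nak()
			return
		}
		// Handled successfully, tell the server not to send it again.
		m.Ack()
	}, nats.Durable("SHIPPER"), nats.ManualAck())
	if err != nil {
		log.Fatal(err)
	}

	// Block forever while messages are handled in the callback.
	select {}
}

// writeToDisk stands in for the real file-writing logic.
func writeToDisk(data []byte) error {
	return nil
}

The client also has m.Term() to drop a message permanently and m.InProgress() to ask for more time, which are part of the same acknowledgement protocol.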

[Read More]

NATS Messaging - Part 8

In our previous post, we dived a bit into the JetStream API and how to interact with it; many people would not need to know all of this to get going. The CLI or Terraform management approaches would be perfectly fine, and today we’ll use the CLI rather than the API.

In this post, we’re back on our codebase, and we’ll see how we might need to change the tools to support JetStream well. To be honest, I could have made some better decisions early on about the shipper design, but that gives us more opportunity to see how some apps might need to adapt.

[Read More]

NATS Messaging - Part 7

Yesterday we did a quick intro to JetStream; before we jump in and write some code, we have to talk a bit about how to configure it via its API and how it relates to core NATS.

NATS Streaming Server, while built using the NATS broker for communication, is, in fact, a different protocol altogether. Its relation to NATS is more like that of HTTP to TCP: it uses NATS as transport, but its protocol is entirely custom and uses Protobuf messages. This design presented several challenges to users - authentication and authorization, in particular, were difficult to integrate with NATS.

NATS 2.0 brought a significant rework of Authentication and Authorization in NATS, and integrating the new world with NATS Streaming Server would have been too disruptive. Further, NATS 2.0 is multi-tenant, which NATS Streaming Server couldn’t be without a massive rework.

So JetStream was started to be a much more natural fit in the NATS ecosystem; indeed, as you saw yesterday, the log shipper Producer did not need a single line of code change to gain persistence via JetStream. Additionally, it is a comfortable fit in the multi-tenant land of NATS 2.0. All the communication uses the plain NATS protocol, with JSON payloads in its management API.
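To give a flavour of that (a minimal sketch with the Go client; $JS.API.STREAM.NAMES is the subject the released JetStream uses and may differ slightly in the preview builds these posts were written against), asking JetStream for its stream names is just an ordinary NATS request that returns JSON:

package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// The JetStream management API is plain NATS request/reply with JSON
	// payloads; here we ask the server for the names of all its streams.
	resp, err := nc.Request("$JS.API.STREAM.NAMES", nil, time.Second)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(string(resp.Data))
}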

[Read More]

NATS Messaging - Part 6

Last week we built a tool that supported shipping logs over NATS and consuming them on a central host.

We were able to build a horizontally scalable tool that can consume logs from thousands of hosts. What we could not solve was doing so reliably: if we stopped the Consumer, we would drop messages.

I mentioned that NATS does not have a persistence store today and that one called JetStream was in the works. Today I’ll start a few posts looking into JetStream and show how we can leverage it to solve our problems.
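As a taste of where this is heading, here’s a minimal sketch of defining a stream that retains our log messages even while no Consumer is running (it uses today’s nats.go JetStream helper, which post-dates the preview API of the time, and the LOGS stream and subject names are made up):

package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Define a file-backed stream that captures everything published to
	// LOGS.>, so messages survive Consumer restarts.
	_, err = js.AddStream(&nats.StreamConfig{
		Name:     "LOGS",
		Subjects: []string{"LOGS.>"},
		Storage:  nats.FileStorage,
	})
	if err != nil {
		log.Fatal(err)
	}

	log.Println("LOGS stream is ready")
}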

[Read More]

NATS Messaging - Part 5

Yesterday we wrote our first end-to-end file tail tool and a consumer for it. It was fatally flawed, though, in that we could not scale it horizontally and had to run one consumer per log file. This design won’t work for long.

The difficulty lies in the ordering of messages. We saw that NATS supports creating consumer groups and that it load-shares message handling randomly between available workers. The problem is that with more than one worker there are no longer any ordering guarantees, as many workers handle messages concurrently, each processing them at a different rate. We’d end up with one file having many writers, and it would soon be shuffled.

We’re fortunate in that ordering isn’t required across all of the messages; we only really need to order the messages from a single host. Host #1’s messages can be mixed with host #2’s, as long as all of host #1’s messages are in the right order. This behaviour is a big plus and lets us solve the problem: we can have 100 workers receiving logs, as long as one worker always handles the logs from any given Producer.
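One way to picture that requirement (a hypothetical sketch, not necessarily the approach the post settles on): each Producer deterministically hashes its hostname onto one of a fixed number of partition subjects, so all of a host’s logs always land on the same partition and can be handled by a single worker:

package main

import (
	"fmt"
	"hash/fnv"
	"os"
)

// partitions is the fixed number of worker partitions we shard hosts across.
const partitions = 10

// partitionSubject maps a hostname onto one of the partition subjects, so a
// given host always publishes to the same subject and is therefore always
// handled by whichever worker owns that partition.
func partitionSubject(host string) string {
	h := fnv.New32a()
	h.Write([]byte(host))
	return fmt.Sprintf("logs.p%d", h.Sum32()%partitions)
}

func main() {
	host, _ := os.Hostname()
	fmt.Println(partitionSubject(host))
}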

Read on for the solution!

[Read More]

NATS Messaging - Part 4

Previously we used the nats utility to explore the various patterns of messaging in NATS; today we’ll write a bit of code, and in the following few posts we’ll expand this code to show how to scale it up and make it resilient.

We’ll write a tool that tails a log file and publishes it over NATS to a receiver that writes it to a file. The intent is that several nodes use this log file Publisher while a central node consumes and saves the data to a file per Publisher, with log rotation on the central node.

I should be clear that there are already solutions to this problem; I am not saying you should solve it by writing your own. It’s a good learning experience, though, because it’s quite challenging to do right in a reliable and scalable manner.

  • Producers can be all over the world in many locations
  • It’s a challenging problem to scale, as you do not always control the rate of production and have an inherent choke point at the central receiver
  • You do not want to lose any logs, so we probably need persistence
  • Ordering matters; there’s no point in getting your logs in random order
  • Scaling consumers horizontally while keeping ordering guarantees is difficult; additionally, you need to control who writes to the log files
  • Running a distributed network with all the firewall implications in enterprises is very hard

So we’ll see if we can build a log forwarder and receiver that meet these criteria, and in the process we’ll explore the previous sections in depth.

We’ll use Go for this, but the NATS ecosystem supports almost 40 languages today, so you’re spoiled for choice.
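To give a flavour of the Publisher side, here is a minimal sketch under a couple of assumptions: it reads lines from stdin rather than actually tailing a file, and the logs.<hostname> subject layout is made up for illustration:

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	host, err := os.Hostname()
	if err != nil {
		log.Fatal(err)
	}

	// Publish every line read from stdin to a per-host subject; a real
	// shipper would tail the log file and sanitise the hostname for use
	// in a subject.
	subject := fmt.Sprintf("logs.%s", host)

	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		if err := nc.Publish(subject, scanner.Bytes()); err != nil {
			log.Fatal(err)
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}

	// Ensure any buffered messages reach the server before we exit.
	nc.Flush()
}

Piping something like tail -F /var/log/syslog into this approximates the real tool.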

[Read More]

NATS Messaging - Part 3

In our previous posts, we did a thorough introduction to messaging patterns and why you might want to use them; today, let’s get our hands dirty by setting up a NATS Server and using it to demonstrate these patterns.

Setting up

Grab one of the zip archives from the NATS Server Releases page; inside you’ll find nats-server, which is easy to run:

$ ./nats-server -D
[26002] 2020/03/17 15:30:25.104115 [INF] Starting nats-server version 2.2.0-beta
[26002] 2020/03/17 15:30:25.104238 [DBG] Go build version go1.13.6
[26002] 2020/03/17 15:30:25.104247 [INF] Git commit [not set]
[26002] 2020/03/17 15:30:25.104631 [INF] Listening for client connections on 0.0.0.0:4222
[26002] 2020/03/17 15:30:25.104644 [INF] Server id is NA3J5WPQW4ELF6AJZW5G74KAFFUPQWRM5HMQJ5TBGRBH5RWL6ED4WAEL
[26002] 2020/03/17 15:30:25.104651 [INF] Server is ready
[26002] 2020/03/17 15:30:25.104671 [DBG] Get non local IPs for "0.0.0.0"
[26002] 2020/03/17 15:30:25.105421 [DBG]   ip=192.168.88.41

The -D flag tells it to log verbosely; at this point you’ll see you have port 4222 open on 0.0.0.0.

Also grab a copy of the nats CLI, which can be found on the JetStream Releases page, and place it in your path; you can quickly test the connection:

$ nats rtt
nats://localhost:4222:

   nats://127.0.0.1:4222: 251.289µs
       nats://[::1]:4222: 415.944µs

The above shows all you need to have a NATS Server running in development, and production use is not much more complex, to be honest. Once it’s running, this can happily serve over 50,000 Choria nodes depending on your hardware (a $40 Linode would do).
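If you’d rather verify the connection from code, here is a minimal sketch with the Go client (the demo.hello subject is just an example) that does a full round trip through the server:

package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect to the local server we just started.
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// Subscribe synchronously, publish a message and read it back.
	sub, err := nc.SubscribeSync("demo.hello")
	if err != nil {
		log.Fatal(err)
	}

	if err := nc.Publish("demo.hello", []byte("hello NATS")); err != nil {
		log.Fatal(err)
	}

	msg, err := sub.NextMsg(time.Second)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("received: %s\n", string(msg.Data))
}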

[Read More]