Discovery

Discovery in Choria is the system used to address nodes on the Choria network. For example, when issuing the command choria req service restart service=httpd -C apache, I am saying that all machines tagged with the class apache should have their httpd service restarted.

In the example above we discovered all nodes matching specific criteria. This document will explore discovery in depth.

Reliable or Best Efforts

Conceptually, when managing large, dynamic fleets of nodes or IoT devices, it’s very hard to maintain an up-to-date and richly decorated metadata service that can be queried about the near real-time state of the network. This is because the network is constantly in flux: users turn IoT devices on or off, administrators in other teams do maintenance, hardware fails and disks fill up. There is always some number of transient and emergent behaviours in any sizable fleet.

A key concept of Choria is to be able to manage what is there now, without trying to access things that are not. Contrast this with a webservers.txt style file used to drive discovery: it’s very hard to keep such a file accurate at all times, and when trying to ssh to disconnected hosts many seconds are wasted on machines that will never respond.

The native Choria discovery system is tailored to this dynamic world: it can discover nodes and operate on the ones that are there now. This promotes a style of administration where you build a backplane based on a continuous control loop of discover -> query -> remediate. A down machine is down, but if it ever comes back your loop will pick it up and remediate anything that requires it. It’s done efficiently and without wasting time on down machines.

In other scenarios where you do know and do care for a known set of machines - like when deploying a new version of your software that requires careful orchestration with database upgrades and more - it’s vital to know what should be there and to know when a machine that should be there is not, or failed. Choria supports this by querying databases like PuppetDB or flat files or its own fleet inventory format.

In a sense this is like UDP and TCP. Dynamic discovery does the best it can with what it has and lets you build resilient long-term management backplanes that are stable in the face of uncertainty, while the ability to query a data store for what should be there lets you build management systems with appropriate feedback when needed.

Choria supports a rich feature set in both these modes of discovery.

Node Metadata

Choria maintains 2 sets of metadata about any machine. Nodes can be tagged using a list of items, which we call classes as borrowed from Puppet. This is simply a free-form list of words like webserver, database, app_server or Puppet classes like profile::acme_server. In the typical case this is supplied by Puppet, but it is simply a text file with 1 word per line, so you can source it from anywhere.

In addition to the classes we support facts. Facts are YAML or JSON data, for example {"country":"uk"}. We support arbitrarily nested data such as that produced by facter; you can supply any JSON or YAML structure.

Every single node maintains its own metadata store internally, requests directed at the fleet are evaluated against this metadata store before they are acted on.

Choria supports publishing the metadata it holds on a regular basis using a feature called registration, we have data adapters for NATS Streaming Server and NATS JetStream to make this data available to 3rd parties for analysis.

Querying node metadata

Tip

Most examples shown were made using our Vagrant Demo environment

Given a fleet of Choria nodes we provide CLI tools and RPC APIs to query the metadata the network holds.

First let’s look at one specific node using the choria inventory command:

$ choria inventory choria0.choria
Inventory for choria0.choria

  Choria Server Statistics:

                    Version: 0.99.0.20210115
                 Start Time: 2021-01-15 14:19:17 +0000 UTC
                Config File: /etc/choria/server.conf
                Collectives: mcollective
            Main Collective: mcollective
...

  Configuration Management Classes:

    settings                          default
    roles::managed                    profiles::common
    mcollective                       mcollective::plugin_dirs
    mcollective::config               mcollective::facts
    mcollective_data_sysctl           mcollective_agent_shell
    mcollective_agent_process         mcollective_agent_nettest
    mcollective_agent_bolt_tasks      mcollective_choria
    mcollective_agent_puppet          mcollective_agent_service
    mcollective_agent_package         mcollective_agent_filemgr
    mcollective_util_actionpolicy     choria
    choria::repo                      choria::install
    choria::config                    choria::scout_checks
    choria::service                   choria::scout_metrics
    prometheus                        prometheus::node_exporter
    systemd                           systemd::systemctl::daemon_reload
    systemd::journald

  Facts:

    {
      "aio_agent_version": "6.19.1",
      "architecture": "x86_64",
      "os": {
        "architecture": "x86_64",
        "family": "RedHat",
        "hardware": "x86_64",
        "name": "CentOS",
        "release": {
          "full": "7.8.2003",
          "major": "7",
          "minor": "8"
        },
      },
    }

Above you see a truncated view of a node. It shows all tagged classes and all facts about this specific node. In this case the data is from Facter and Puppet.

We can do a fleet wide report of a specific fact using the choria facts command, this is a real time view of the network, effectively treating the network as a data source:

$ choria facts os.name --nodes
Report for fact: os.name

  CentOS found 3 times

    choria0.choria
    choria1.choria
    puppet.choria


Finished processing 3 / 3 hosts in 2.456s

Here --nodes asks it to show the matching node identities, --table will show tabular markdown format data and --json will output structured data.

Under the covers this uses the get_fact action of the rpcutil agent; the raw data can be seen using choria req rpcutil get_fact fact=os.name.

Note the os.name, this is a GJSON Path Syntax query across the nested facts, you can go deep into arrays and hashes using this format.

Discovering nodes using basic attributes

Armed with knowledge of what is out there, and a way to see a fleet wide report of available values we can now look at how we can select nodes as targets for orchestration tasks.

In this section I will use choria discover to show matching nodes. These arguments are accepted on almost all choria and mco commands that operate on groups of nodes.

Basic tagged class discovery uses -C and matches any node with exactly that class.

$ choria discover -C choria
choria1.choria
choria0.choria
puppet.choria

Class discovery supports regular expressions, so -C /choria|mcollective/ would work too.

Basic fact discovery uses -F and matches any node with exactly that fact.

$ choria discover -F os.name=centos
puppet.choria
choria0.choria
choria1.choria

Fact discovery supports a number of operators, !=, <=, >=, >, <, =~ and is data type aware. Regular expressions are supported as os.name=~/entos/, and matching is not case-sensitive.

Boolean AND discovery, combining classes and facts, uses -W. This is a quick way to combine multiple class and fact discoveries:

$ choria discover -W "choria os.name=centos"
choria1.choria
choria0.choria
puppet.choria

This also supports regular expressions: -W "/^c/ os.name=~c.nt.s".

These queries are all supported by our native Choria network based discovery and PuppetDB based discovery, meaning you can use them on a reliable or best-efforts basis.
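
As a rough illustration of the matching rules above, the following Python sketch evaluates -C, -F and -W style filters against one node's classes and facts. It is not Choria's implementation: the function names are hypothetical, only the =, != and =~ fact operators are handled, and case-insensitive comparison is an assumption based on os.name=centos matching the CentOS nodes in the output above.

```python
import re


def lookup(facts, path):
    """Resolve a dotted path such as 'os.name' in nested dicts; None if absent."""
    current = facts
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def is_regex(text):
    """True when the text is written as /pattern/."""
    return len(text) > 1 and text.startswith("/") and text.endswith("/")


def match_class(pattern, classes):
    """-C style: a plain word must equal a class, /pattern/ is a regex search."""
    if is_regex(pattern):
        return any(re.search(pattern[1:-1], c, re.IGNORECASE) for c in classes)
    return any(pattern.lower() == c.lower() for c in classes)


def match_fact(expression, facts):
    """-F style: handles '=', '!=' and '=~' only (assumed subset)."""
    for op in ("!=", "=~", "="):
        name, found, wanted = expression.partition(op)
        if found:
            break
    else:
        return False  # no operator present
    value = lookup(facts, name)
    if value is None:
        return False
    value = str(value)
    if op == "=~":
        regex = wanted[1:-1] if is_regex(wanted) else wanted
        return re.search(regex, value, re.IGNORECASE) is not None
    equal = wanted.lower() == value.lower()
    return equal if op == "=" else not equal


def match_combined(filter_text, classes, facts):
    """-W style: every space separated term must match (boolean AND)."""
    for term in filter_text.split():
        is_fact = not is_regex(term) and "=" in term
        matched = match_fact(term, facts) if is_fact else match_class(term, classes)
        if not matched:
            return False
    return True


classes = ["choria", "mcollective"]
facts = {"os": {"name": "CentOS"}}
selected = match_combined("choria os.name=centos", classes, facts)
```

Every node runs this kind of check against its own metadata, which is why no central database is needed for the network based method.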

PuppetDB PQL

PuppetDB has a rich query language called PQL, we support executing PQL queries as discovery as long as those queries return just lists of certname.

$ choria discover -I 'pql:nodes[certname] { certname ~ ".choria" }' --dm=puppetdb

Compound Filters

Things get more interesting when we look at something called Compound Filters. This is a new feature in the latest Choria Server. Previously MCollective had Compound Filters, but we’ve had to change the language to one that’s more extendable and will grow with us.

We use a library called expr with its own Language Definition, we augment this with GJSON based lookup for nested data to create something that can really go deep into your infrastructure.

These do not support querying PuppetDB - they only work with the network based discovery and the inventory file based discovery.

First a basic case - we want all machines for a certain customer’s staging environment or all machines with prometheus installed.

$ choria discover -S '(with("customer=acme") && with("environment=staging")) || with("/prometheus/")'

Within the expressions we have defined some variables:

Variable    Description
agents      List of known agents
classes     List of classes this machine belongs to
facts       Facts for the machine as raw JSON

And we made a few functions available:

Function    Description
with        Equivalent of a -W filter - class and fact matches combined with regular expression support
fact        Retrieves a fact from the nested fact data using GJSON path syntax
include     Checks if an array includes a specific element

We can go really deep as here:

with('apache') and                              # class or agent 'apache'
  with('/t.sting/') and                         # class or agent regex match 't.sting'
  with('fnumber=1.2') and                       # fact fnumber with a float value equals 1.2
  fact('nested.string') matches('h.llo') and    # lookup a fact 'nested.string' and regex match it with 'h.llo'
  include(fact('sarray'), '1') and              # check if the 'sarray' fact - an array of strings - includes the value '1'
  include(fact('iarray'), 1)                    # check if the 'iarray' fact - an array of ints - includes the value 1

Here the include command is a basic function to check if an array contains something and fact() just looks up a fact using GJSON.

Let’s dig deep into some data. Here’s a typical facter networking fact (truncated):

networking:
  interfaces:
    eth0:
      bindings:
        - address: 10.0.2.15
          netmask: 255.255.255.0
          network: 10.0.2.0
      bindings6:
        - address: fe80::5054:ff:fe4d:77d3
          netmask: 'ffff:ffff:ffff:ffff::'
          network: 'fe80::'
      dhcp: 10.0.2.2
      ip: 10.0.2.15
      ip6: fe80::5054:ff:fe4d:77d3
      mac: 52:54:00:4d:77:d3
      mtu: 1500
      netmask: 255.255.255.0
      netmask6: 'ffff:ffff:ffff:ffff::'
      network: 10.0.2.0
      network6: 'fe80::'
      scope6: link
    eth1:
      bindings:
        - address: 192.168.190.5
          netmask: 255.255.255.0
          network: 192.168.190.0
      bindings6:
        - address: fe80::a00:27ff:fe2b:7d42
          netmask: 'ffff:ffff:ffff:ffff::'
          network: 'fe80::'
      ip: 192.168.190.5
      ip6: fe80::a00:27ff:fe2b:7d42
      mac: '08:00:27:2b:7d:42'
      mtu: 1500
      netmask: 255.255.255.0
      netmask6: 'ffff:ffff:ffff:ffff::'
      network: 192.168.190.0
      network6: 'fe80::'
      scope6: link

The problem is that machines can all have different NIC names - even multiple NICs and bonds - and any NIC can have many IP addresses bound. We would still love to discover all machines that belong to a specific network on any binding:

$ choria discover -S 'include(fact("networking.interfaces.*.bindings.#.network"), "10.0.2.0")'
choria1.choria
choria0.choria
puppet.choria

We get all the networks across all NICs and all bindings using GJSON, and then we check if any of them equals the 10.0.2.0 network address.
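
To make the intent of that filter concrete, here is the same check written out longhand in Python. This is only a sketch of what the query is meant to compute, not of how GJSON evaluates the path; the function name bound_networks is hypothetical and the facts are a cut-down copy of the networking fact shown above.

```python
def bound_networks(facts):
    """Collect the 'network' value of every binding on every interface."""
    interfaces = facts.get("networking", {}).get("interfaces", {})
    networks = []
    for nic in interfaces.values():
        for binding in nic.get("bindings", []):
            if "network" in binding:
                networks.append(binding["network"])
    return networks


# Cut-down version of the facter networking fact shown earlier
facts = {
    "networking": {
        "interfaces": {
            "eth0": {"bindings": [{"address": "10.0.2.15", "network": "10.0.2.0"}]},
            "eth1": {"bindings": [{"address": "192.168.190.5", "network": "192.168.190.0"}]},
        }
    }
}

networks = bound_networks(facts)
# Equivalent of include(<all networks>, "10.0.2.0")
on_network = "10.0.2.0" in networks
```

Because the loop never names an interface, the check is independent of how the NICs are called on a given machine.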

It can be a bit tricky to get this right, we’ll add some tooling to help try out a few queries in a future release.

Data Providers

Data Providers expose real time state from the running node to the discovery system. Today we support the choria, config_item and scout Data Providers; in time users will be able to provide their own.

A Data Provider can be queried in a Compound filter when using the broadcast or mc discovery methods, here we find all machines where a specific Scout check is not OK:

$ choria req service restart service=foo -S 'scout("check_foo").state != "OK"'

We can also discover all nodes connected to a specific broker and restart them (if provisioning is enabled):

$ choria req choria_provision restart token=s3cret splay=10 -S 'choria().connected_broker matches "choria1.example.net"'

The data that a Data Provider provides can be seen using the CLI:

$ choria req rpcutil get_data source=choria -I choria1.choria
Discovering nodes using the mc method .... 1

1 / 1    0s [====================================================================] 100%

dev1.devco.net

        agents_count: 13
       classes_count: 95
         config_file: /etc/choria/server.conf
    connected_broker: nats://puppet.choria:4222
    expired_messages: 0
   filtered_messages: 1
    invalid_messages: 0
      machines_count: 17
     passed_messages: 15
        provisioning: false
      reply_messages: 14
      total_messages: 16
              uptime: 6010
      valid_messages: 15


Finished processing 1 / 1 hosts in 780ms

In time users will be able to add their own Data Providers, for the moment these are all that is available.

Some plugins, like the scout one, take a query. Here we find all nodes that had a CRITICAL state at any time in their last 10 checks:

$ choria discover -S '"CRITICAL" in scout("check_puppet_run").history'

Using RPC queries for discovery

Also in the latest Choria release, the choria req command supports some PowerShell-inspired chaining of queries. MCollective also had this feature, but it required the jgrep utility to be installed; Choria uses its new expr based infrastructure to avoid this extra dependency.

Let’s say we have a scenario where a specific version of PHP causes a problem, and we need to restart Apache nightly to deal with that leak while a better solution is found. We don’t want to restart the entire fleet and while we could query the fleet it would be a bunch of awkward jq to get this all working into a flat file of affected nodes.

We can, however, do an RPC query, filter its results using expr and then use the filtered result set as the discovery source for a follow-up RPC call.

$ choria req package status package=php --json --filter-replies 'ok() && data("ensure")=="5.3.3-49.el6"' | \
     choria req service restart service=httpd

Here we use expr and the --filter-replies option to select, out of all the replies received, those where the response indicated that the package status was successfully obtained and where the returned data key ensure matches a specific version.

We then pipe that result set as JSON into choria req service restart, which restarts httpd on the matching machines.

Within the expression we have a few variables:

Variable    Description
msg         The Statusmsg of the RPC reply
code        The Statuscode of the RPC reply as an integer

Within the expression we have a few functions, more might be added later:

Function                Description
ok()                    If the status code is mcorpc.OK (0)
aborted()               If the status code is mcorpc.Aborted (1)
unknown_action()        If the status code is mcorpc.UnknownAction (2)
missing_data()          If the status code is mcorpc.MissingData (3)
invalid_data()          If the status code is mcorpc.InvalidData (4)
unknown_error()         If the status code is mcorpc.UnknownError (5)
data(query)             Queries the reply data using GJSON Path Syntax
include(hay, needle)    Looks for needle in an array
sender()                The sender id where the reply is from
time()                  The timestamp of the reply
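
The reply filtering step can be pictured with a small Python sketch. It mirrors the expression ok() && data("ensure")=="5.3.3-49.el6" against a list of replies and keeps the senders as targets for the next request. The reply layout used here (sender, statuscode, statusmsg, data) is a simplified assumption for illustration, and the host names and the second version string are made up.

```python
# Status codes as listed in the function table above
OK, ABORTED, UNKNOWN_ACTION, MISSING_DATA, INVALID_DATA, UNKNOWN_ERROR = range(6)


def filter_replies(replies, predicate):
    """Keep only the replies for which the predicate is true."""
    return [reply for reply in replies if predicate(reply)]


def php_is_affected(reply):
    """Mirrors: ok() && data("ensure")=="5.3.3-49.el6"."""
    return reply["statuscode"] == OK and reply["data"].get("ensure") == "5.3.3-49.el6"


# Hypothetical replies to a package status request
replies = [
    {"sender": "web1.example.net", "statuscode": 0, "statusmsg": "OK",
     "data": {"ensure": "5.3.3-49.el6"}},
    {"sender": "web2.example.net", "statuscode": 0, "statusmsg": "OK",
     "data": {"ensure": "7.4.33-1.el7"}},
    {"sender": "web3.example.net", "statuscode": 5, "statusmsg": "lookup failed",
     "data": {}},
]

# The senders of the surviving replies become the targets of the follow-up call
targets = [reply["sender"] for reply in filter_replies(replies, php_is_affected)]
```

Failed replies are dropped before the data is inspected, so a node that could not report its package status is never restarted by accident.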

Today only the choria req command supports this behaviour, once we have it just right we’ll extend it to all other choria commands.

Discovery Methods

In Choria the various backends that implement discovery are called Discovery Methods, this section provides detail of each of the core ones. Most CLI tools support the --dm or --discovery-method option to pick a backend.

The following configuration options impact discovery backend selection and settings.

Configuration Flag           Valid Options                                Description
default_discovery_method     mc, broadcast, puppetdb, choria, external    When not specified on the CLI, this will be used
discovery_timeout            Integer in seconds                           How long discovery is allowed to run, meaning might differ between methods
default_discovery_options                                                 Options to pass to discovery plugins unless --do is set on the CLI

broadcast or mc

This is the default method of discovery, and the only one that is supported without any external dependencies. The Choria client sends an empty message with just a filter attached, and all nodes that match the filter respond. We gather the replies, and those are the discovered nodes.

The discovery_timeout is how long the client waits for responses from the fleet after publishing the message asking for responses.

Supported Filters: Class, Agent, Identity, Facts, Compound and Combined

puppetdb or choria

This method makes a request to PuppetDB with a PQL query structured to find nodes matching the filter query.

Configuration Flag             Valid Options         Description
plugin.choria.puppetdb_host    puppet.example.net    The hostname where PuppetDB can be found
plugin.choria.puppetdb_port    8080                  The port the PuppetDB server listens on
plugin.choria.srv_domain       example.net           When SRV lookups are enabled, the domain to find PuppetDB in
plugin.choria.use_srv          true                  Enable or disable SRV lookups

When SRV lookups are enabled PuppetDB is resolved using a _x-puppet-db._tcp.example.net query.

flatfile

The flatfile discovery method supports a number of file related discovery sources. You enable it by passing --nodes to commands that support it.

If an -I filter is supplied on the CLI only nodes matching that filter will be picked out of the file, regular expressions are supported.

It accepts the following run-time options, for example choria req rpcutil ping --do 'filter=groups.#(name=="linux").targets'

Option    Description
filter    GJSON path syntax query to dig into complex JSON or YAML
file      Sets the file to use
format    Sets the format - json, yaml, yml, rpc, txt

Text files

The most basic format is just a file with names like this:

web1.example.net
web2.example.net

A command like choria req rpcutil ping --nodes list.txt will parse a file like above and discover those nodes.
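
A minimal Python sketch of this behaviour, including the -I filtering described earlier in this section, could look like the following. The function names are hypothetical and the exact matching rules of the real plugin are assumed: a value written as /pattern/ is treated as a regular expression, anything else must equal the node name.

```python
import re


def parse_node_list(text):
    """One node name per line; blank lines are skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def filter_identities(nodes, identity=None):
    """Apply an optional -I style filter to the parsed node names."""
    if identity is None:
        return list(nodes)
    if len(identity) > 1 and identity.startswith("/") and identity.endswith("/"):
        pattern = re.compile(identity[1:-1])
        return [node for node in nodes if pattern.search(node)]
    return [node for node in nodes if node == identity]


nodes = parse_node_list("web1.example.net\nweb2.example.net\n")
only_web1 = filter_identities(nodes, "/web1/")
```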

JSON and YAML Files

When the --nodes argument ends in json, yaml or yml this format is automatically chosen, below are valid files.

---
- web1.example.net
- web2.example.net

[
  "web1.example.net",
  "web2.example.net"
]

These files are not particularly useful, but let’s look at a more complex example:

groups:
  - name: linux
    targets:
      - target1.example.com
      - target2.example.com
    config:
      transport: ssh
  - name: windows
    targets:
      - target3.example.com
      - target4.example.com
    config:
      transport: winrm

We can discover all the targets in the linux group using this command:

$ choria discover --nodes bolt.yaml --do 'filter=groups.#(name=="linux").targets'
target1.example.com
target2.example.com

The filter here uses GJSON path syntax.
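
For readers unfamiliar with GJSON, the selection that groups.#(name=="linux").targets performs can be written longhand in Python as below. This is a sketch of the intended result only, run against a JSON rendering of the file above; targets_for_group is a hypothetical helper, not part of Choria.

```python
import json


def targets_for_group(document, group_name):
    """Return the targets of the first group whose name matches."""
    for group in document.get("groups", []):
        if group.get("name") == group_name:
            return list(group.get("targets", []))
    return []


# JSON equivalent of the YAML file shown above
inventory = json.loads("""
{
  "groups": [
    {"name": "linux",
     "targets": ["target1.example.com", "target2.example.com"],
     "config": {"transport": "ssh"}},
    {"name": "windows",
     "targets": ["target3.example.com", "target4.example.com"],
     "config": {"transport": "winrm"}}
  ]
}
""")

linux_targets = targets_for_group(inventory, "linux")
```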

inventory

Choria supports inventory files that hold full facts, agent lists, class lists and collective membership information, enough to build rich discovery supporting our full feature set including Compound filters (without Data Providers).

Additionally, and uniquely, inventory files can hold named searches, allowing you to save an often-used set of discovery filters by name and reuse it.

Inventory files can be very large, an inventory of 10 000 nodes can take 180MB on disk as JSON data. While we think this feature is good for inventories with a few thousand nodes, a 10 000 node inventory works and can perform full compound search across all nodes in a few seconds.

Usage

General usage of Choria remains the same, all types of filter work including -S or Compound filters. There’s one additional behaviour: if you filter for -I group:named_group the plugin will execute the stored discovery query against its data.

Configuration

Configuration Flag                          Valid Options         Description
plugin.choria.discovery.inventory.source    ~/choria-inventory    Path to the Inventory file

It accepts the following run-time options, for example choria req rpcutil ping --dm inventory --do file=inventory.yaml

Option      Description
file        The file to load the inventory from, overriding the configured setting
noverify    When set to any value will disable JSON Schema based verification
noverify When set to any value will disable JSON Schema based verification

Inventory Format

The inventory format is a YAML or JSON file that is strictly validated before use. File names must end in .yaml, .yml or .json.

Tip

We publish a JSON Schema of the inventory, you can configure your editor to validate and assist your editing of this file

$schema: https://choria.io/schemas/choria/discovery/v1/inventory_file.json

groups:
  - name: acme
    filter:
      facts:
        customer=acme
  - name: all
    filter:
      identities:
        - /./

nodes:
  - name: a1.example.net
    collectives:
      - acme
      - mcollective
    facts:
      customer=acme
    classes:
      apache
  - name: w1.example.net
    collectives:
      - widgets
      - mcollective
    facts:
      customer=widgets
    classes:
      mysql

With this file essentially all discovery features will work, allowing for commands like:

  • all nodes in the widgets collective - choria discover -T widgets
  • all nodes for customer widgets - choria discover -F customer=widgets
  • all nodes for customer acme by using the node group acme - choria discover -I group:acme

When maintaining this file, I suggest running choria tool inventory --validate inventory.yaml after editing to make sure it’s well formed.

Creating the inventory

Creating this file can be a pain, especially keeping it up to date. We have some initial tooling to help you with this, with more to come.

You can ask Choria to create or update this file for you, for example, if we just want to build the file based on an initial query we can do:

$ choria tool inventory inventory.yaml -W customer=acme
Discovering nodes .... 19

19 / 19    0s [====================================================================] 100%

Wrote 19 nodes to /home/rip/inventory.yaml

This will write a full inventory for all the discovered nodes and add an all group. Once you’ve made some changes to the file, like add more groups for example, Choria can update the file for you:

$ choria tool inventory inventory.yaml --update

This will do a fresh network query for node metadata, update the file and leave your groups alone.

Note this tool is doing discovery and then a rpcutil#inventory RPC request. Because it is discovery driven, you can discover nodes from any supported source, for example:

$ cat nodes.txt
a1.example.net
a2.example.net
$ choria tool inventory inventory.yaml --update --nodes nodes.txt

And, for full inception, you can use the current file as discovery - meaning it will try to update all the nodes in the file with new data:

$ choria tool inventory inventory.yaml --update --dm inventory --do file=inventory.yaml

Choria Results

Finally, this discovery method is also the one implementing the RPC Request Chaining for data on the CLI.

external

The external method allows you to implement a Discovery Method using any programming language, and the Choria Client will execute your plugin when needed.

Configuration Flag                          Valid Options    Description
plugin.choria.discovery.external.command    /some/command    The command to run for discovery

It accepts the following run-time options, for example choria req rpcutil ping --do command=/another/command

Option     Description
command    Path to an alternative command overriding what is configured

Additionally, any other options, excluding command, that get passed this way will be sent to the external plugin in the request.

When run the command will be executed as command <request file> <reply file> io.choria.choria.discovery.v1.external_request, the following environment variables will be set:

Variable                    Description
CHORIA_EXTERNAL_REQUEST     Where the request in JSON format can be found
CHORIA_EXTERNAL_REPLY       Where to write the response
CHORIA_EXTERNAL_PROTOCOL    io.choria.choria.discovery.v1.external_request

The request will look like this:

{
  "$schema": "https://choria.io/schemas/choria/discovery/v1/external_request.json",
  "protocol": "io.choria.choria.discovery.v1.external_request",
  "filter": {
    "fact": [{"fact": "country", "operator": "==","value": "mt"}],
    "cf_class": [],
    "agent": ["rpcutil"],
    "compound": [],
    "identity": []
  },
  "options": {},
  "collective": "mcollective",
  "timeout": 2
}

And a successful response looks like this:

{
  "protocol": "io.choria.choria.discovery.v1.external_reply",
  "nodes": ["n1.example.net"]
}

If there is a failure you can return:

{
  "error": "Error shown to user"
}

For Golang the External package, and for Python the py-mco-agent package, can be used to easily implement a discovery source.
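
Without either helper package, a plugin only needs to read the request file and write the reply file. The Python sketch below follows the request and reply formats shown above, but everything else is an assumption for illustration: KNOWN_NODES is a made-up data source, only fact filters with the == operator are honoured, and the other filter types in the request are ignored.

```python
import json
import os
import sys

REPLY_PROTOCOL = "io.choria.choria.discovery.v1.external_reply"

# Hypothetical data source: node name -> flat facts
KNOWN_NODES = {
    "n1.example.net": {"country": "mt"},
    "n2.example.net": {"country": "uk"},
}


def discover(request, known_nodes):
    """Build a reply for a request; only '==' fact filters are supported."""
    fact_filters = request.get("filter", {}).get("fact", [])
    for item in fact_filters:
        if item.get("operator") != "==":
            return {"error": "unsupported fact operator: %s" % item.get("operator")}
    nodes = []
    for name, facts in sorted(known_nodes.items()):
        if all(i["fact"] in facts and str(facts[i["fact"]]) == str(i["value"])
               for i in fact_filters):
            nodes.append(name)
    return {"protocol": REPLY_PROTOCOL, "nodes": nodes}


def main(argv, environ):
    """Read the request and write the reply, using the env vars or arguments."""
    request_path = environ.get("CHORIA_EXTERNAL_REQUEST") or (argv[1] if len(argv) > 1 else None)
    reply_path = environ.get("CHORIA_EXTERNAL_REPLY") or (argv[2] if len(argv) > 2 else None)
    if not request_path or not reply_path:
        return False  # not invoked with request and reply files, nothing to do
    with open(request_path) as handle:
        request = json.load(handle)
    with open(reply_path, "w") as handle:
        json.dump(discover(request, known_nodes=KNOWN_NODES), handle)
    return True


if __name__ == "__main__":
    main(sys.argv, os.environ)
```

Returning an error reply for operators the sketch does not understand is a deliberate choice: silently ignoring part of a filter would discover more nodes than the user asked for.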