Node Management
Each Choria managed node expose a RPC API accessible over the Choria network for managing the checks on the node.
Available actions include:
- checks - listing checks
- trigger - trigger an instant check
- maintenance - pause regular checks for a specific check
- resume - resume previously paused regular checks
- goss_validate - performs a goss validation on a node
In time, we will include a CLI for interacting with these APIs, today we publish a Golang API and it’s usable from the CLI.
CLI Utilities
A number of utilities have been developed to assist with interacting with the Scout nodes. These utilities are built using the API documented below.
Node Status
$ choria scout status dev1.example.net
+-----------------------+-------+------------+-------------------------------+
| NAME | STATE | LAST CHECK | HISTORY |
+-----------------------+-------+------------+-------------------------------+
| mailq | OK | 1m20s | OK OK OK OK |
| ntp_peer | OK | 1m32s | OK OK OK OK OK OK OK OK OK OK |
| pki | OK | 2m28s | OK OK OK OK OK OK OK OK OK OK |
| puppet_failures | OK | 2m3s | OK OK OK OK WA WA CR CR OK OK |
| puppet_run | OK | 24s | OK OK OK |
| swap | OK | 4m23s | OK OK OK OK OK OK OK |
| zombieprocs | OK | 2m23s | OK OK OK OK OK OK OK OK OK OK |
| goss | OK | 3m12s | OK OK OK |
| heartbeat | OK | 57s | OK OK OK OK OK OK OK OK OK OK |
+-----------------------+-------+------------+-------------------------------+
This retrieves the live status from the dev1.example.net node and shows up to 10 historical values for each check, these are retrieved directly from the node and does not require any central storage. The oldest check status is on the left of the History column.
Here we can see the puppet_failures check went into WARNING then CRITICAL before recovering.
Trigger, Maintenance and Resume checks
Checks can be triggered for immediate check, placed in maintenance which prevents further checks and scheduled for new checks after maintenance from the CLI.
$ choria scout maintenance --check check_pki
Discovering nodes .... 27
27 / 27 0s [====================================================================] 100%
Placed 27 checks into maintenance mode on 27 nodes
Finished processing 27 / 27 hosts in 847.020668ms
Options exist to select nodes based on Choria filters, limit which checks and so forth, see --help
of these commands.
RPC CLI
On the CLI the API can be accessed using the normal choria req command:
Listing checks
This is a list of running checks and their status:
$ choria req scout checks -I dev1.example.net
Discovering nodes .... 1
1 / 1 0s [====================================================================] 100%
dev1.example.net
Checks: [
{
"name": "mailq",
"start_time": 1594911784,
"state": "OK",
"status": {.....}
}
]
Finished processing 1 / 1 hosts in 1s
The status
field - not shown here - holds the full io.choria.machine.watcher.nagios.v1.state document for each check.
Triggering checks
One or all checks can be triggered on a matched node, be careful when running this without a filter as it will trigger all nodes concurrently.
$ choria req scout trigger check=mailq -I dev1.example.net
The check
argument is optional, when not given all checks will be triggered.
Maintenance
Checks may be put into maintenance mode, they will not change state or be regularly checked until resumed. We will in future support a timeout setting which will auto resume checks after a period.
$ choria req scout maintenance check=mailq -I dev1.example.net
Checks can be resumed later:
$ choria req scout resume check=mailq -I dev1.example.net
The check
argument is optional, when not given all checks will be affected
Invoking Goss validations
We support a goss
builtin for running a specific goss validation regularly, but we also support adhoc validations
via the Node API:
$ choria req goss_validate file=/etc/goss.yaml vars=/etc/goss-vars.yaml
Here we invoke goss against /etc/goss.yaml
with variables loaded from /etc/goss-vars.yaml
, both these files must
exit on the node already.
In time we will add support for sending files as a byte stream from a central location for truely adhoc validations.
Go API
We have a Golang API that can be used to create custom automation tools to manage Scout checks, here’s an example using it to trigger a check.
package main
import (
"context"
"fmt"
scoutagent "github.com/choria-io/go-choria/scout/agent/scout"
scoutclient "github.com/choria-io/go-choria/scout/client/scout"
)
func main() {
scout := scoutclient.Must()
scout.OptionIdentityFilter("dev1.example.net")
res, err := scout.Trigger().Checks([]interface{}{"mailq"}).Do(context.Background())
if err != nil {
panic(err)
}
res.EachOutput(func(r *scoutclient.TriggerOutput) {
data := &scoutagent.TriggerReply{}
err = r.ParseTriggerOutput(data)
if err != nil {
fmt.Printf("Invalid result from: %s: %s\n", r.ResultDetails().Sender(), err)
return
}
fmt.Printf("%s:\n", r.ResultDetails().Sender())
fmt.Printf("\tTransitioned: %v\n", data.TransitionedChecks)
fmt.Printf("\tSkipped: %v\n", data.SkippedChecks)
fmt.Printf("\tFailed: %v\n", data.FailedChecks)
})
}
This is the equivalent of choria req scout trigger check=mailq -I dev1.example.net.
When run this produce the following output:
$ ~/trigger
dev1.example.net:
Transitioned: [mailq]
Skipped: []
Failed: []