Harvest Templates and Metrics¶
Harvest collects ONTAP counter information, augments it, and stores it in a time-series DB. Refer to ONTAP Metrics for details about the ONTAP metrics exposed by Harvest.
flowchart RL
Harvest[Harvest<br>Get & Augment] -- REST<br>ZAPI --> ONTAP
id1[(Prometheus<br>Store)] -- Scrape --> Harvest
Three concepts work in unison to collect ONTAP metrics data, prepare it and make it available to Prometheus.
- ZAPI/REST
- Harvest templates
- Exporters
We're going to walk through an example from a running system, focusing on the disk
object.
At a high level, Harvest templates describe which ZAPIs to send to ONTAP and how to interpret the responses.
- ONTAP defines two ZAPIs to collect disk info:
    - Config information is collected via storage-disk-get-iter
    - Performance counters are collected via disk:constituent
- These ZAPIs are found in their corresponding object template files, conf/zapi/cdot/9.8.0/disk.yaml and conf/zapiperf/cdot/9.8.0/disk.yaml. These files also describe how to map the ZAPI responses into a time-series-friendly format. Prometheus uniquely identifies a time series by its metric name and optional key-value pairs called labels.
Handy Tools¶
- dasel is useful to convert between XML, YAML, JSON, etc. We'll use it to make displaying some of the data easier.
ONTAP ZAPI disk example¶
We'll use the bin/harvest zapi
tool to interrogate the cluster and gather information about the counters. This is one way you
can send ZAPIs to ONTAP and explore the return types and values.
bin/harvest zapi -p u2 show attrs --api storage-disk-get-iter
Output edited for brevity and line numbers added on left
The hierarchy and return type of each counter is shown below. We'll use this hierarchy to build a matching Harvest
template.
For example, line 3
is the bytes-per-sector
counter, which has an integer value, and is the child
of storage-disk-info > disk-inventory-info
.
To capture that counter's value as a metric in Harvest, the ZAPI template must use the same hierarchical path. The matching path can be seen below.
building tree for attribute [attributes-list] => [storage-disk-info]
1 [storage-disk-info] - *
2 [disk-inventory-info] -
3 [bytes-per-sector] - integer
4 [capacity-sectors] - integer
5 [disk-type] - string
6 [is-shared] - boolean
7 [model] - string
8 [serial-number] - string
9 [shelf] - string
10 [shelf-bay] - string
11 [disk-name] - string
12 [disk-ownership-info] -
13 [home-node-name] - string
14 [is-failed] - boolean
15 [owner-node-name] - string
16 [disk-raid-info] -
17 [container-type] - string
18 [disk-outage-info] -
19 [is-in-fdr] - boolean
20 [reason] - string
21 [disk-stats-info] -
22 [average-latency] - integer
23 [disk-io-kbps] - integer
24 [power-on-time-interval] - integer
25 [sectors-read] - integer
26 [sectors-written] - integer
27 [disk-uid] - string
28 [node-name] - string
29 [storage-disk-state] - integer
30 [storage-disk-state-flags] - integer
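To make the idea of a hierarchical path concrete, here is a minimal Python sketch. It assumes a ZAPI response has already been parsed into nested dictionaries; the `record` contents and the `lookup` helper are hypothetical and are not part of Harvest. Only the nesting mirrors the tree above.

```python
# Hypothetical, hand-built stand-in for one parsed storage-disk-info record.
# The values are made up for illustration; only the nesting mirrors the tree.
record = {
    "disk-name": "1.1.23",
    "disk-inventory-info": {
        "bytes-per-sector": 512,
        "disk-type": "SSD",
    },
}


def lookup(node, path):
    """Follow a list of keys down a nested dict; return None if a hop is missing."""
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# storage-disk-info > disk-inventory-info > bytes-per-sector
bytes_per_sector = lookup(record, ["disk-inventory-info", "bytes-per-sector"])

# A path that is not present in the record yields None rather than an error.
missing = lookup(record, ["disk-stats-info", "average-latency"])

print(bytes_per_sector, missing)
```

A template entry listed under the wrong parent behaves like the second lookup: the path does not resolve, so no value is found.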
Harvest Templates¶
To understand templates, there are a few concepts to cover:
There are three kinds of information included in templates that define what Harvest collects and exports:
1. Configuration information is exported into the _labels metric (e.g. disk_labels, see below).
2. Metrics data is exported as disk_"metric name", e.g. disk_bytes_per_sector, disk_sectors, etc. Metrics are leaf nodes that are not prefixed with a ^ or ^^. Metrics must be one of the number types: float or int.
3. Plugins may add additional metrics, increasing the number of metrics exported in #2.
A resource will typically have multiple instances. Using disk as an example, that means there will be one disk_labels
row and one row for each metric per instance. If we have 24 disks and the disk template lists seven metrics to capture,
Harvest will export a total of 192 rows of Prometheus data.
24 instances * (7 metrics per instance + 1 label per instance) = 192 rows
Sum of disk metrics that Harvest exports
curl -s 'http://localhost:14002/metrics' | grep ^disk | cut -d'{' -f1 | sort | uniq -c
24 disk_bytes_per_sector
24 disk_labels
24 disk_sectors
24 disk_stats_average_latency
24 disk_stats_io_kbps
24 disk_stats_sectors_read
24 disk_stats_sectors_written
24 disk_uptime
# 192 rows
Read on to see how we control which labels from #1 and which metrics from #2 are included in the exported data.
Instance Keys and Labels¶
- Instance key - An instance key defines the set of attributes Harvest uses to construct a key that uniquely identifies an object. For example, the disk template uses the node + disk attributes to determine uniqueness. Using node or disk alone wouldn't be sufficient since disks on separate nodes can have the same name. If a single label does not uniquely identify an instance, combine multiple keys for uniqueness. Instance keys must refer to attributes that are of type string.

    Because instance keys define uniqueness, these keys are also added to each metric as a key-value pair (see Control What Labels and Metrics are Exported for examples).
- Instance label - Labels are key-value pairs used to gather configuration information about each instance. All of the key-value pairs are combined into a single metric named disk_labels. There will be one disk_labels for each monitored instance. Here's an example reformatted so it's easier to read:
disk_labels{
datacenter="dc-1",
cluster="umeng-aff300-05-06",
node="umeng-aff300-06",
disk="1.1.23",
type="SSD",
model="X371_S1643960ATE",
outage="",
owner_node="umeng-aff300-06",
shared="true",
shelf="1",
shelf_bay="23",
serial_number="S3SENE0K500532",
failed="false",
container_type="shared"
}
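The reason node and disk are combined can be illustrated with a short Python sketch. The instance list and helper names below are hypothetical; this shows the composite-key idea only and is not how Harvest is implemented.

```python
# Hypothetical instances: two nodes that each own a disk named "1.1.23".
instances = [
    {"node": "node-a", "disk": "1.1.23"},
    {"node": "node-b", "disk": "1.1.23"},
    {"node": "node-a", "disk": "1.1.24"},
]


def instance_key(instance, key_attrs):
    """Build a composite key from the chosen attributes, in order."""
    return tuple(instance[attr] for attr in key_attrs)


def count_unique(items, key_attrs):
    """Count how many distinct keys the chosen attributes produce."""
    return len({instance_key(item, key_attrs) for item in items})


by_disk = count_unique(instances, ["disk"])          # disk names collide across nodes
by_node = count_unique(instances, ["node"])          # a node owns several disks
by_both = count_unique(instances, ["node", "disk"])  # one key per instance

print(by_disk, by_node, by_both)
```

With three instances, only the combined key yields three distinct keys; either attribute alone merges two instances into one.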
Harvest Object Template¶
Continuing with the disk example, below is the conf/zapi/cdot/9.8.0/disk.yaml
that tells Harvest which ZAPI to send to
ONTAP (storage-disk-get-iter
) and describes how to interpret and export the response.
- Line 1 defines the name of this resource and is an exact match to the object defined in your default.yaml or custom.yaml file, e.g.
# default.yaml
objects:
Disk: disk.yaml
- Line 2 is the name of the ZAPI that Harvest will send to collect disk resources.
- Line 3 is the prefix used to export metrics associated with this object, i.e. all metrics will be of the form disk_*.
- Line 5: the counter section is where we define the metrics, labels, and what constitutes instance uniqueness.
- Line 7: the double hat prefix ^^ means this attribute is an instance key used to determine uniqueness. Instance keys are also included as labels. UUIDs are good choices for uniqueness.
- Line 13: the single hat prefix ^ means this attribute should be stored as a label. That means we can include it in the export_options section as one of the key-value pairs in disk_labels.
- Rows 10, 11, 23, 24, 25, 26, 27: these are the metric rows. Metrics are leaf nodes that are not prefixed with a ^ or ^^. If you refer back to the ONTAP ZAPI disk example above, you'll notice each of these attributes is an integer type.
- Line 43 defines the set of labels to use when constructing the disk_labels metrics. As mentioned above, these labels capture config-related attributes per instance.
Output edited for brevity and line numbers added for reference.
1 name: Disk
2 query: storage-disk-get-iter
3 object: disk
4
5 counters:
6 storage-disk-info:
7 - ^^disk-uid
8 - ^^disk-name => disk
9 - disk-inventory-info:
10 - bytes-per-sector => bytes_per_sector # notice this has the same hierarchical path we saw from bin/harvest zapi
11 - capacity-sectors => sectors
12 - ^disk-type => type
13 - ^is-shared => shared
14 - ^model => model
15 - ^serial-number => serial_number
16 - ^shelf => shelf
17 - ^shelf-bay => shelf_bay
18 - disk-ownership-info:
19 - ^home-node-name => node
20 - ^owner-node-name => owner_node
21 - ^is-failed => failed
22 - disk-stats-info:
23 - average-latency
24 - disk-io-kbps
25 - power-on-time-interval => uptime
26 - sectors-read
27 - sectors-written
28 - disk-raid-info:
29 - ^container-type => container_type
30 - disk-outage-info:
31 - ^reason => outage
32
33 plugins:
34 - LabelAgent:
35 # metric label zapi_value rest_value `default_value`
36 value_to_num:
37 - new_status outage - - `0` #ok_value is empty value, '-' would be converted to blank while processing.
38
39 export_options:
40 instance_keys:
41 - node
42 - disk
43 instance_labels:
44 - type
45 - model
46 - outage
47 - owner_node
48 - shared
49 - shelf
50 - shelf_bay
51 - serial_number
52 - failed
53 - container_type
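The prefix conventions in the counters section can be summarized with a small Python sketch. The `parse_counter` function is hypothetical and only mirrors the rules described above (^^ for instance keys, ^ for labels, no prefix for metrics, => for renames); it is not Harvest's parser and it expects the reference line numbers to be stripped first.

```python
def parse_counter(line):
    """Split one counter entry into (kind, ontap_name, display_name).

    kind is "key" for ^^, "label" for ^, "parent" for a nested section,
    and "metric" otherwise. display_name is None when there is no "=>" rename.
    """
    text = line.split("#", 1)[0]            # drop any trailing comment
    text = text.strip().lstrip("-").strip()  # drop the YAML list dash
    if text.endswith(":"):
        return "parent", text[:-1], None
    if "=>" in text:
        name, display = (part.strip() for part in text.split("=>", 1))
    else:
        name, display = text, None
    if name.startswith("^^"):
        return "key", name[2:], display
    if name.startswith("^"):
        return "label", name[1:], display
    return "metric", name, display


for entry in [
    "- ^^disk-name => disk",
    "- ^is-shared => shared",
    "- capacity-sectors => sectors",
    "- average-latency",
    "- disk-stats-info:",
]:
    print(parse_counter(entry))
```

The sketch deliberately leaves display_name empty for entries without a rename, since the exported name in that case is derived by Harvest and is not reproduced here.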
Control What Labels and Metrics are Exported¶
Let's continue with disk
and look at a few examples. We'll use curl
to examine the Prometheus wire format that
Harvest uses to export the metrics from conf/zapi/cdot/9.8.0/disk.yaml
.
The curl below shows all exported disk metrics. There are 24 disks on this cluster, and Harvest is collecting seven
metrics + one disk_labels + one plugin-created metric, disk_new_status,
for a total of 216 rows.
curl -s 'http://localhost:14002/metrics' | grep ^disk | cut -d'{' -f1 | sort | uniq -c
24 disk_bytes_per_sector # metric
24 disk_labels # labels
24 disk_new_status # plugin created metric
24 disk_sectors # metric
24 disk_stats_average_latency # metric
24 disk_stats_io_kbps # metric
24 disk_stats_sectors_read # metric
24 disk_stats_sectors_written # metric
24 disk_uptime # metric
# sum = ((7 + 1 + 1) * 24 = 216 rows)
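If you prefer to do the same tally outside the shell, the pipeline above can be approximated in Python. This is a minimal sketch that works on text already fetched from the metrics endpoint; the `count_by_metric` helper and the sample lines are hypothetical.

```python
from collections import Counter


def count_by_metric(exposition_text, prefix="disk"):
    """Count rows per metric name, like grep ^disk | cut -d'{' -f1 | sort | uniq -c."""
    counts = Counter()
    for line in exposition_text.splitlines():
        if not line.startswith(prefix):
            continue
        name = line.split("{", 1)[0]  # everything before the first "{"
        counts[name] += 1
    return counts


# Made-up sample in the Prometheus text format, for illustration only.
sample = "\n".join([
    'disk_sectors{node="n1",disk="1.1.1"} 100',
    'disk_sectors{node="n1",disk="1.1.2"} 100',
    'disk_labels{node="n1",disk="1.1.1"} 1.0',
    "# comment lines and other objects are skipped",
    'volume_size{volume="v1"} 5',
])

counts = count_by_metric(sample)
for name in sorted(counts):
    print(counts[name], name)
```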
Here's a disk_labels
for one instance, reformatted to make it easier to read.
curl -s 'http://localhost:14002/metrics' | grep ^disk_labels | head -1
disk_labels{
datacenter = "dc-1", # always included - value taken from datacenter in harvest.yml
cluster = "umeng-aff300-05-06", # always included
node = "umeng-aff300-06", # node is in the list of export_options instance_keys
disk = "1.1.13", # disk is in the list of export_options instance_keys
type = "SSD", # remainder are included because they are listed in the template's instance_labels
model = "X371_S1643960ATE",
outage = "",
owner_node = "umeng-aff300-06",
shared = "true",
shelf = "1",
shelf_bay = "13",
serial_number = "S3SENE0K500572",
failed = "false",
container_type = "",
} 1.0
Here's the disk_sectors
metric for a single instance.
curl -s 'http://localhost:14002/metrics' | grep ^disk_sectors | head -1
disk_sectors{ # prefix of disk_ + metric name (line 11 in template)
datacenter = "dc-1", # always included - value taken from datacenter in harvest.yml
cluster = "umeng-aff300-05-06", # always included
node = "umeng-aff300-06", # node is in the list of export_options instance_keys
disk = "1.1.17", # disk is in the list of export_options instance_keys
} 1875385008 # metric value - number of sectors for this disk instance
Number of rows for each template = number of instances * (number of metrics + 1 (for <name>_labels row) + plugin additions)
Number of metrics = number of counters which are not labels or keys, those without a ^ or ^^
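The row formula can be checked against the two disk examples in this document with a few lines of Python. The function name `expected_rows` is hypothetical; the numbers come from the examples above.

```python
def expected_rows(instances, metrics, plugin_metrics=0):
    """instances * (metrics + 1 labels row + plugin-created metrics)."""
    return instances * (metrics + 1 + plugin_metrics)


# 24 disks, seven metrics, no plugin-created metrics
without_plugin = expected_rows(24, 7)

# 24 disks, seven metrics, plus the plugin-created disk_new_status
with_plugin = expected_rows(24, 7, plugin_metrics=1)

print(without_plugin, with_plugin)  # 192 216
```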
Common Errors and Troubleshooting¶
1. Failed to parse any metrics¶
You add a new template to Harvest, restart your poller, and get an error message:
WRN ./poller.go:649 > init collector-object (Zapi:NetPort): no metrics => failed to parse any
This means the collector, Zapi NetPort, was unable to find any metrics. Recall that metrics are lines
without prefixes. In cases where you don't have any metrics, but still want to collect labels, add
the collect_only_labels: true
key-value to your template. This flag tells Harvest to ignore the absence of metrics and continue.
2. Missing Data¶
- What happens if an attribute is listed in the list of instance_labels (line 43 above), but that label is missing from the list of counters captured at line 5?

    The label will still be written into disk_labels, but the value will be empty since it's missing, e.g. if line 29 were deleted, container_type would still be present in disk_labels{container_type=""}.
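The behaviour described above, where a listed label with no collected value is exported as an empty string, can be mimicked with a small Python sketch. The `build_labels` helper and the sample data are hypothetical and only illustrate the outcome, not Harvest's internals.

```python
def build_labels(collected, instance_labels):
    """Return one entry per requested label; missing values become ""."""
    return {name: collected.get(name, "") for name in instance_labels}


# container_type is requested in instance_labels but was never collected.
collected = {"type": "SSD", "shelf": "1"}
labels = build_labels(collected, ["type", "shelf", "container_type"])

print(labels)
```

An empty label value in disk_labels is therefore a hint to check that the matching counter is still present in the counters section.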
Prometheus Wire Format¶
https://prometheus.io/docs/instrumenting/exposition_formats/
Keep in mind that Prometheus does not permit dashes (-) in metric or label names. That's why Harvest templates use name replacement
with => to convert dashed names to underscored names, e.g. bytes-per-sector => bytes_per_sector
converts bytes-per-sector into the Prometheus-accepted bytes_per_sector.
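A quick way to see why the rename is needed is to test names against the classic Prometheus label-name pattern, [a-zA-Z_][a-zA-Z0-9_]*. The Python sketch below is illustrative only: the helper names are hypothetical, and the `underscore` function simply stands in for the explicit => renames written in a template.

```python
import re

# Classic Prometheus label-name pattern: letters, digits, and underscores,
# not starting with a digit. Dashes are not part of the allowed set.
LABEL_NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_label_name(name):
    """True when the whole name matches the classic label-name pattern."""
    return LABEL_NAME.fullmatch(name) is not None


def underscore(name):
    """Illustrative stand-in for a template rename such as a-b => a_b."""
    return name.replace("-", "_")


dashed = "bytes-per-sector"
print(is_valid_label_name(dashed))              # False
print(is_valid_label_name(underscore(dashed)))  # True
```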
Every time series is uniquely identified by its metric name and optional key-value pairs called labels.
Labels enable Prometheus's dimensional data model: any combination of labels for the same metric name identifies a particular dimensional instantiation of that metric (for example: all HTTP requests that used the method POST to the /api/tracks handler). The query language allows filtering and aggregation based on these dimensions. Changing any label value, including adding or removing a label, will create a new time series.
<metric_name>{<label_name>=<label_value>, ...} value [ timestamp ]
- metric_name and label_name carry the usual Prometheus expression language restrictions
- label_value can be any sequence of UTF-8 characters, but the backslash (\), double-quote ("), and line feed (\n) characters have to be escaped as \\, \", and \n, respectively.
- value is a float represented as required by Go's ParseFloat() function. In addition to standard numerical values, NaN, +Inf, and -Inf are valid values representing not a number, positive infinity, and negative infinity, respectively.
- timestamp is an int64 (milliseconds since epoch, i.e. 1970-01-01 00:00:00 UTC, excluding leap seconds), represented as required by Go's ParseInt() function
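To tie the format rules together, here is a minimal Python sketch that renders one sample line with the label-value escaping described above. The function names are hypothetical, the optional timestamp is omitted, and this is not the code Harvest uses to export metrics.

```python
def escape_label_value(value):
    """Escape backslash, double-quote, and line feed in a label value.

    The backslash is handled first so the backslashes added by the later
    replacements are not escaped a second time.
    """
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(metric_name, labels, value):
    """Render <metric_name>{<label_name>="<label_value>",...} value."""
    pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels.items())
    return f"{metric_name}{{{pairs}}} {value}"


line = format_sample(
    "disk_sectors",
    {"node": "umeng-aff300-06", "disk": "1.1.17"},
    1875385008,
)
print(line)
```

The label names and values are taken from the disk_sectors example earlier in this document; changing any one of them would identify a different time series.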