Labeling Controls | NoSQLBench Project (PREVIEW)

👉 NOTE: The labeling control features are new within NoSQLBench and are subject to changes.

Labeling Standards

In order to make sure that your metrics and annotations have the necessary labels so that the data is traceable, identifiable, and contextual, you may want to set some label validation rules. These rules simply say which label names are required and which are not allowed. If you have a shared system of record for your metrics with other engineers or teams, then some discipline about how these metrics are reported is always required, but not often adhered to, even if there is a shared idea of what the metrics should look like. The net effect is that some of your most useful data gets lost among and indistinguishable from a trove of other metrics data which is extraneous, or worse, your data has this effect on others'.

It is recommended that you have a labeling standard, and that is is somewhere that is easy to share, verify, and enforce. For lack of this, NoSQLBench provides a client-side mechanism to make this possible in practice, although it is opt-in and not enforced from any consuming system. It is expected that NoSQLBench will support loading any provided labeling standards from consuming system automatically in the future.

Some specific examples for why this is important:

You want to look the results of a test which were run 6 months ago, there are 37 other sets of metrics data from the same time frame. Either your data is labeled so you can distinguish the studies, or you're just out of luck. You could have simply required a single label with a unique identifier in your standard to avoid this.
You want to aggregate results across a set of nodes. Either your data is distinguished by node, or you have bad data. You could simply require a node label to be provided, within the context of your other labels. The validity of your aggregating metrics depends directly on this being stable and distinctly addressable.

Filtering and Validation

example label:

alpha=one,beta=two,gamma=three,delta=four

Labeling Specifications

There are two ways to ensure that labels you send are valid:

Adjust the provided labels to conform to the standard before sending.
Validate that the labels provided already conform to the standard before sending.

These two are distinct methods which are complimentary, and thus NB does both. This is because there are label sets which can be aligned to a standard simply by filtering, and there are other cases where this is not possible due to missing data.

Label Specs

These two method are combined in practice into one configuration parameter (for now at least), since the expression of validation and pre-filtering for that validation is effectively one and the same. This is known as the labelspec, and it is simply a one-line pattern which specifies which labels are required, which labels are disallowed, and so on.

Label Filtering

Included labels are indicated as +<labelname> or simply <labelname>.
Excluded labels are indicated as -<labelname>.
The labelname can be any valid regular expression, and is always treated as such. Bare words are simply literal expressions.
If a provided label set includes label names which are not indicated as either, then they are Included by default.

So why would you have a bare label name in lieu of + or vice-versa if they mean the same thing? The reason is that the labelname filter operates as a tri-state filter, going from left to right over the labelspec. You can include, or exclude any label name, and you can also use wildcard patterns simply by using regular expressions. Thus, a spec of lioness,-lion.* indicates that you want to allow lioness labels but disallow any other labels which start with lion. However, this does not say that you require a lioness label, which looks like this +lioness,-lion.*. In both cases, any label which is not mentioned will pass through the filter unchanged. If you wanted to reject all others, you can always end the labelspec with a -.*, such as +lioness, -lion.*,-.*, which is also duplicitous and can simply be reduced to +lioness,-.*.

Label Filtering Effects

The label filter is applied at the point of label usage on outbound signals. Metrics and Annotations both have the filter applied. The underlying label sources are not affected. When a label set is passed through the label filter, some fields may be removed. Otherwise label filtering is entirely passive. The order of labels is preserved if they came from an ordered map.

Label Validation

Label validation occurs on a label set immediately after filtering. The specification is identical, although it weighs a little more heavily in terms of side-effects for validation.

Required label names are indicated with +<labelname>
Disallowed label names are indicated with -<labelname>
Bare words are disregarded by validation.
Patterns are still applied with the expected behavior.

Label Validation Effects

Label validation occurs immediately after label filtering, and will throw an error in your test, with an error that indicates why the label set could not be validated.

This is because invalid labels will cause pain downstream. In most cases, this will manifest early in a test so that you don't have any wasted bake time. It is recommended that you always run a full end-to-end functional validation of all your test workflows before running the more intensive versions. This helps you shake out any issues with late-validation, such as a bad label set in a later stage of testing.

TBD: conditional validation

There is a very clear need to make the validation work a bit more reasonably with regard to labeling context. For example, requiring an activity label at the top level does not work well for the session annotation, nor does requiring and activity label when the scenario label is not present. There are some simple forms which can support these coherently and these will be added shortly.

sketch on conditional validation

+appname,+instance:\w+,+session,+node,activity->k,activity->dataset,activity->dimensions

appname is a required label name
instance is a required label name and its value must match regex pattern \w+, word chars only
session is a required label
node is a required label
given activity is present as a label name, then k must also be present
given activity is present as a label name, then dataset must also be present
given activity is present as a label name, then dimension must also be present

if a then b a -> b if b then c b -> c if a then b and c a -> b -> c A, thus b! and c!

A! ->b ->c a->b->c +a->b->c

Special cases

assertions:
conjunctions like tag filtering: none(), any(), all()
require a named label
require a named label with a specific value
require a named label with a pattern
require a pattern matching label name with a pattern matching value pattern
conditional

Back to top