Validation configuration (admin only)

Every in-situ measurement file submitted to the OCDB system is validated against a list of rules before it is accepted. Admin users can freely configure these validation rules in the configuration file “validation_config.json”.

The validation system checks both header fields and measurement records using the rules defined in the configuration file. Each rule belongs to one section of the configuration file. Error and warning messages can also be freely configured and associated with the rules.

The configuration file contains four major sections:

{
  "header": [],
  "record": [],
  "errors": [],
  "warnings": []
}

which are explained in detail in the following sections.
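As a minimal sketch of the layout above, the following Python snippet parses a configuration text and checks that the four sections are present as lists. The function name load_validation_config and the error wording are hypothetical and not part of OCDB.

```python
import json

# The four top-level sections named in this document.
REQUIRED_SECTIONS = ("header", "record", "errors", "warnings")


def load_validation_config(text):
    """Parse configuration text and check that all four sections are lists.

    Hypothetical helper for illustration; not the OCDB implementation.
    """
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError("configuration must be a JSON object")
    for section in REQUIRED_SECTIONS:
        if not isinstance(config.get(section), list):
            raise ValueError(f"section '{section}' is missing or not a list")
    return config


config = load_validation_config(
    '{"header": [], "record": [], "errors": [], "warnings": []}'
)
```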

Records

This set of rules performs checks on the record section covering two entities:

  • field unit(s)

  • field value ranges

The list of variable names to be validated is derived from the header field “fields” in the submitted file, which defines the columns of the record section. The list of acceptable units for each variable is defined in the header section “units”, as a comma-separated list of unit names.

After these global checks have passed, every value is checked against the value range defined for its variable. These checks are fill-value-aware, i.e. fill values as defined in the header section “missing” are treated as having passed the check.
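The fill-value-aware range check described above can be sketched in Python as follows. This is an illustration under assumptions, not the OCDB code: the function name out_of_range_lines is hypothetical, the bounds are assumed to be inclusive, and the values are assumed to be already parsed to numbers.

```python
def out_of_range_lines(values, lower_bound=None, upper_bound=None, fill_value=None):
    """Return the 1-based positions of values that violate the bounds.

    Values equal to the fill value are skipped, i.e. treated as passed.
    A bound of None means that no bound is applied on that side.
    """
    bad_lines = []
    for line, value in enumerate(values, start=1):
        if fill_value is not None and value == fill_value:
            continue  # fill values always pass
        if lower_bound is not None and value < lower_bound:
            bad_lines.append(line)
        elif upper_bound is not None and value > upper_bound:
            bad_lines.append(line)
    return bad_lines


# -999.0 is the fill value here, so only the third value is reported.
bad = out_of_range_lines([0.1, -999.0, -0.5, 2.0], lower_bound=0, fill_value=-999.0)
```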

The check of the record content supports the following data types:

Number record

This set of rules checks numerical measurement variables.
Parameters are:

  • data_type: always set to “number”

  • name: the name of the variable

  • unit: the list of physical units as comma separated values. Set to “none” or “unitless” if the variable is dimensionless.

  • lower_bound: lower numerical bound for the variable values. Skip field if no lower bound is to be applied.

  • upper_bound: upper numerical bound for the variable values. Skip field if no upper bound is to be applied.

  • unit_error: the error message to be displayed in case of error related to the physical units, either direct or as a reference (see below). Unit error messages can contain the tags

    • {field_name}: resolves to the variable name

    • {unit}: resolves to the required unit(s)

    • {bad_unit}: resolves to the unit detected as incorrect

  • value_error: the error message to be displayed in case of error related to the variable values, either direct or as a reference (see below). Value error messages can contain the tags

    • {field_name}: resolves to the variable name

    • {line}: resolves to the record line number where the error was detected

    • {value}: resolves to the erroneous value

    • {lower_bound}: resolves to the lower numerical bound (if defined)

    • {upper_bound}: resolves to the upper numerical bound (if defined)

Example:

{
  "name": "adg",
  "unit": "1/m",
  "lower_bound": 0,
  "data_type": "number",
  "value_error": "@field_out_of_bounds",
  "unit_error": "@field_has_wrong_unit"
} 
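To illustrate how such a rule might be read, the Python sketch below extracts the accepted units and the optional bounds from the example rule. The helper names accepted_units and bounds are hypothetical, and exact, case-sensitive unit matching is an assumption.

```python
# The example rule from above, as a Python dictionary.
rule = {
    "name": "adg",
    "unit": "1/m",
    "lower_bound": 0,
    "data_type": "number",
    "value_error": "@field_out_of_bounds",
    "unit_error": "@field_has_wrong_unit",
}


def accepted_units(rule):
    """Split the comma-separated "unit" entry into a list of unit names."""
    return [unit.strip() for unit in rule["unit"].split(",")]


def bounds(rule):
    """Return (lower, upper); a bound omitted from the rule becomes None."""
    return rule.get("lower_bound"), rule.get("upper_bound")


units = accepted_units(rule)
lower, upper = bounds(rule)
```

Here the rule defines no upper_bound, so only the lower bound would be applied.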

Date record

This set of rules checks measurement variables containing a date value, formatted as “YYYYMMDD”. It checks that the date format is correct, that the month and day values are within the standard boundaries, and that the date is after the lower bound defined by min_year.

Parameters are:

  • data_type: always “date”

  • name: the name of the variable

  • min_year: the earliest year accepted for a date.

Example:

{
  "name": "date_processed",
  "min_year": 1975,
  "data_type": "date"
}
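A date check of this kind could look like the Python sketch below. It is not the OCDB implementation: the function name is_valid_date is hypothetical, and treating min_year itself as an accepted year is an assumption.

```python
from datetime import datetime


def is_valid_date(text, min_year):
    """Check a "YYYYMMDD" string: format, calendar validity, and minimal year."""
    # Require exactly eight ASCII digits before parsing.
    if len(text) != 8 or not (text.isascii() and text.isdigit()):
        return False
    try:
        parsed = datetime.strptime(text, "%Y%m%d")
    except ValueError:
        # Raised for impossible dates such as month 13 or February 30.
        return False
    # Assumption: dates in min_year itself are accepted.
    return parsed.year >= min_year
```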

Time record

This set of rules checks measurement variables defining a time, formatted as “hh:mm:ss”. It checks that the time format is correct and that the values for hours, minutes and seconds are within their ranges.

Parameters are:

  • data_type: always set to “time”

  • name: the name of the variable

  • unit: the list of physical units as comma separated values. Set to “none” or “unitless” if the variable is dimensionless.

  • unit_error: the error message to be displayed in case of error related to the physical units, either direct or as a reference (see below). Unit error messages can contain the tags

    • {field_name}: resolves to the variable name

    • {unit}: resolves to the required unit(s)

    • {bad_unit}: resolves to the unit detected as incorrect

Example:

{
  "name": "time_processed",
  "unit": "hh:mm:ss",
  "data_type": "time",
  "unit_error": "@field_has_wrong_unit"
}
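The time check can be sketched in Python in a similar way. The function name is_valid_time is hypothetical, and the accepted ranges (hours 00 to 23, minutes and seconds 00 to 59, always two digits) are assumptions about what “within their ranges” means.

```python
import re

# Exactly two digits per component, separated by colons.
TIME_PATTERN = re.compile(r"([0-9]{2}):([0-9]{2}):([0-9]{2})")


def is_valid_time(text):
    """Check an "hh:mm:ss" string for correct format and component ranges."""
    match = TIME_PATTERN.fullmatch(text)
    if match is None:
        return False
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours <= 23 and minutes <= 59 and seconds <= 59
```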

String record

This set of rules checks measurement variables containing a string. It checks that the string value is not empty.

Parameters are:

  • data_type: always set to “string”

  • name: the name of the variable

  • error: the error message to be displayed, either direct or as a reference (see below). Error messages can contain the tags

    • {field_name}: resolves to the variable name

    • {line}: resolves to the record line number where the error was detected

Example:

{
  "name": "hplc_gsfc_id",
  "data_type": "string",
  "error": "@field_is_empty"
}

Errors and Warnings

The OCDB validation system allows most error and warning messages to be customized. A message can either be a literal message (i.e. it is displayed “as is”) or refer to a predefined message using the “@” character.

Example:

{
  "name": "adg",
  "unit": "1/m",
  "lower_bound": 0,
  "data_type": "number",
  "value_error": "@field_out_of_bounds",
  "unit_error": "This field has a faulty unit"
} 

Here “value_error” refers to a message template, while “unit_error” uses a literal error message.

The predefined messages are stored in the sections “errors” and “warnings” of the configuration file. A message is defined by a message name and the message text. Validation rules always refer to the name using the pattern “@name”.

Example:

"errors": [
    {
        "name": "south_north_mismatch",
        "message": "South_latitude is larger than north_latitude"
    }
]

Message templates can contain tags, which makes the message system more versatile. A tag is added to a message by writing the tag name in curly braces. Example:

{
  "name": "field_has_wrong_unit",
  "message": "The units of '{field_name}', should be '{unit}', not '{bad_unit}'."
} 

The possible tags for each validation rule are listed above in the descriptions of the rule parameters.
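The lookup of “@name” references and the substitution of tags can be sketched in Python as follows. This is an illustration only: the function name resolve_message is hypothetical, and using str.format for the tag substitution is an assumption about how the tags are resolved.

```python
# Predefined messages as they would appear in the "errors" section.
errors = [
    {
        "name": "field_has_wrong_unit",
        "message": "The units of '{field_name}', should be '{unit}', not '{bad_unit}'.",
    }
]

# Index the predefined messages by name for lookup.
predefined = {entry["name"]: entry["message"] for entry in errors}


def resolve_message(message, predefined, **tags):
    """Resolve an "@name" reference if present, then fill in the tags."""
    if message.startswith("@"):
        message = predefined[message[1:]]  # KeyError for an unknown name
    return message.format(**tags)


text = resolve_message(
    "@field_has_wrong_unit", predefined,
    field_name="adg", unit="1/m", bad_unit="m",
)
```

In this sketch a literal message without tags is returned unchanged, because there is nothing to substitute.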

Modifiers

A variable name can optionally be followed by any number of digits, e.g. to denote the wavelength at which the measurement was taken. The name in the configuration file must be the raw name, without any suffix or modifier; these are stripped automatically before validation. A list of acceptable suffixes and modifiers is given here.
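The stripping of a numeric suffix can be sketched in Python as shown below. The function name raw_name is hypothetical, and the sketch only removes trailing digits; other suffixes and modifiers are not handled because they are not listed in this section.

```python
import re


def raw_name(field_name):
    """Remove any trailing digits, e.g. a wavelength, from a variable name."""
    return re.sub(r"[0-9]+$", "", field_name)


name = raw_name("adg412")
```

With this sketch, a rule named “adg” would apply to a column called “adg412”.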