Creating and using data schemas with SPL2 data types
SPL2 provides flexible data typing, so you can create schemas with data types that add structure, validation, and handling logic to your data.
By design, SPL2 and SPL are loosely and implicitly typed languages that do not define the schema of the data. You don't need to define specific types for your data before working with it, and the language infers types as needed in order to determine if a piece of data is valid for a given operation. For example, when an expression like eval errors = 10 is processed, the errors field is implicitly typed as an integer and considered to be valid input for numerical operations.
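For example, the following search sketch relies entirely on implicit typing. The dataset name main is a placeholder, and the search assumes nothing about the schema of the underlying events:

// Implicit typing: errors is inferred to be an integer, so the
// arithmetic in the second eval expression is considered valid.
from main
| eval errors = 10
| eval error_ratio = errors / 100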
However, when working in SPL2, you can choose to create a schema by using data types. Data types are classifications that specify the allowed format and range of values for a given piece of data. By constraining your data to well-defined types, you can add structure, descriptive metadata, validation logic, and handling logic to your data. Data types serve as the basic building blocks for constructing a data schema.
Flexible data typing in SPL2
SPL2 provides an extensible data typing system that you can use to define the schema of your data on an as-needed basis:
- You can control when to constrain data to specific types and when to leave the data loosely typed.
- You can choose to constrain various levels of data to different types. For example, a dataset can contain records that match the object type, and the individual fields in the records can match other types such as string or number.
- You can expand beyond the default set of supported data types by creating fine-tuned custom data types that match the shape of your actual data.
You can also choose not to use data types, and instead rely on the loose typing logic that the Splunk platform uses by default.
When to use strong typing instead of loose typing
SPL2 is loosely typed by default. You don't need to define specific types for your data before successfully ingesting, searching, and processing it, and you can define functions that allow the input and output to be any type of data. This flexibility reduces the amount of overhead required to work with the language, and supports use cases where the schema of the data is unknown or highly variable.
However, there are also situations where it is beneficial to constrain your data to specific types. Compared to strongly typed data, loosely typed data is relatively ambiguous. For example, if an event field named customer is untyped, it is unclear if the field should contain names, ID numbers, detailed records in JSON object format, or something else. The field might even contain a mix of those values if consistency is not enforced during data entry.
You can use strong typing to eliminate this ambiguity and set logical guidelines for your data. For example, assume that the customer event field is intended to contain ID numbers only. You can make this requirement explicit by constraining the customer field to the int data type, which corresponds to integers or whole numbers.
- You can verify whether all the values in the customer field are ID numbers instead of names or other kinds of text strings. The following expression returns true if the customer value matches the int type, and returns false otherwise:

  ... | eval type_check_results = if(isint(customer), "true", "false")

- You can filter the data so that only records that have ID numbers in the customer field are allowed to continue downstream. The following expression filters the data and only retains records where the value in the customer field matches the int type:

  ... | where isint(customer)
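The same check can also be inverted to surface the problem records instead. The following sketch assumes a placeholder dataset named main and a statement name of your choosing:

// Hypothetical search: retain only the records whose customer value
// fails the integer check, so they can be reviewed and corrected.
$invalid_customers = from main
| where not isint(customer)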
Specifying precise data requirements using custom data types
In addition to using the default data types that are built into SPL2, you can also create custom data types that specify more precise requirements.
For example, assume that a valid customer ID number must be exactly 6 digits, and that the first digit cannot be a 0. In this case, constraining the customer field to the int type will not suffice, since integers include numbers that have fewer or more than 6 digits as well as numbers that have 0 as the first digit. To capture these requirements for customer ID numbers, you can define a custom data type that is based on the int type but restricts the allowed range of values. The following expression defines a custom type named id_number, which corresponds to integers that are between 100000 and 999999, inclusive:
type id_number = int where ($value BETWEEN 100000 AND 999999)
You can then constrain the customer field to this custom type.
- The following expression returns true if the customer value matches the id_number type, and returns false otherwise:

  ... | eval type_check_results = if(customer IS id_number, "true", "false")

- The following expression filters the data and only retains records where the value in the customer field matches the id_number type:

  ... | where customer IS id_number
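Taken together, the type definition and the filter might sit in the same SPL2 module, as in the following sketch. The dataset name main and the statement name valid_customers are placeholders:

// Custom type: 6-digit integers that don't start with 0.
type id_number = int where ($value BETWEEN 100000 AND 999999)

// Keep only the records whose customer value matches the custom type.
$valid_customers = from main
| where customer IS id_number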
You can also combine multiple data types to create an advanced data type that layers together the requirements defined in those individual types.
As another example, assume that the _raw field in your data contains complete customer records, which are JSON objects containing the keys customer, name, email, and vip_member. A valid customer record looks like this:
{"customer": 109351, "name": "Buttercup Games Company", "email": "info@buttercupgames.com", "vip_member": true}
You can use a combination of data types to constrain the values in the _raw field to this customer record format and ensure that the values for each key are valid. To do this, start by identifying data types that describe the validation requirement for each key in the JSON object:
| Validation requirement | Data type |
|---|---|
| The customer key must contain valid customer ID numbers: 6-digit integers that don't start with a 0. | Use the following SPL2 expression to define a custom data type named id_number: type id_number = int where ($value BETWEEN 100000 AND 999999) |
| The name key must contain text strings. | Use the built-in string type. |
| The email key must contain valid email addresses. | Define a custom data type named email_address that is based on the string type and uses a regular expression to restrict the allowed values, as sketched after this table. |
| The vip_member key must contain Boolean values. | Use the built-in boolean type. |
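The exact regular expression is a matter of choice. One plausible way to define the email_address type, assuming that the where clause of a type definition accepts the match evaluation function, is the following sketch. The pattern shown is deliberately simple and illustrative, not a complete validator for the email address format:

// Hedged sketch: accept strings shaped like name@domain.tld.
// The pattern is illustrative only; substitute your own regular
// expression for production use.
type email_address = string where match($value, "^[^@ ]+@[^@ ]+[.][^@ ]+$")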
Then, to specify how these data types must be contained in a JSON object format, combine them in a structured data type named customer_record:
type customer_record = {
customer: id_number,
name: string,
email: email_address,
vip_member: boolean
}
You can then constrain the _raw field to this customer_record type in order to ensure that the customer records are all JSON objects that contain the keys customer, name, email, and vip_member, and that the value in each key is valid according to your requirements.
By constraining a piece of data to a particular type, you can specify and enforce rules about the structure and content of that data.
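Following the same pattern as the earlier id_number checks, a type check against the structured type might look like the following sketch, where main is a placeholder dataset name:

// Hypothetical filter: retain only the events whose _raw value
// matches the customer_record type defined earlier.
from main
| where _raw IS customer_record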
Using schemas to work with your data
When you create a schema for your data using data types, you implement a metadata framework that describes the characteristics of the data, such as the relevant fields and the allowed range of values. This framework allows you to distinguish between different kinds of data, identify data of interest, and selectively filter and process specific subsets of data.
Depending on your particular needs, you might want to use schemas during ingest time, search time, or both. Constraining incoming data to types as it streams through an Edge Processor or Ingest Processor pipeline allows you to categorize that data, process and route different categories differently, and validate and correct the data before writing it to storage. In contrast, constraining search results to types allows you to categorize and selectively process data while investigating, analyzing, and reporting on indexed data.
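For instance, at ingest time a type-based filter can sit inside an Edge Processor pipeline. The following sketch assumes that the customer_record type is defined in the same module and uses the $source and $destination pipeline parameters; treat it as an outline rather than a drop-in pipeline:

// Hypothetical ingest-time pipeline: pass along only the events
// whose _raw value matches the customer_record type.
$pipeline = | from $source
| where _raw IS customer_record
| into $destination;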
Across all contexts, schematizing your data allows you to achieve a broad range of data processing outcomes, including the following:
Standardize your data
Ensure that your data follows a consistent schema by constraining the data to an appropriate type. For more information, see Standardize data using SPL2 data types.
Validate and improve data quality
Assess the quality of your data by checking whether the structure and content of the data meets the requirements specified in the type definitions. Improve the quality of your data by identifying invalid data, preventing it from reaching production environments, and correcting it. For more information, see Validate and improve data quality using type checks.
Implement data handling
Distinguish between different subsets of data based on their schemas so that you can select, process, and route data of interest in different ways. For more information, see Implement data handling logic using SPL2 data types.
See also
Related reference
Conversion functions in the SPL2 Search Reference
Informational functions in the SPL2 Search Reference