JSON standards for the data and partition schemas

When you define the data schema or partition schema manually via the UI using the JSON view, follow these JSON standards for the schemas.

Data schema

The data schema (dataSchema) is the schema you use if you choose to manually define the columns in your dataset and you select JSON view when you define the schema through the UI.

The data schema is expressed as a subset of JSON Schema. Because field order matters and some JSON parsers do not preserve the order of object properties, you must use the required array to ensure the column name order that you want. See JSON Schema.

Data schema OpenAPI spec

CODE
dataSchema:
  type: object
  description: JSON Schema for the dataset
  required:
    - properties
    - required
  properties:
    properties:
      type: object
      description: The definition of the dataset's schema
    required:
      type: array
      items:
        type: string
      description: Defines the order of the fields in the schema

Data schema example

Here is an example of a data schema for a dataset with three columns, one of which is an object type that contains two string fields. The required array sets the order of the columns.

JSON
{
  "properties": {
    "zipcode": {
      "type": "integer"
    },
    "city": {
      "type": "string"
    },
    "room": {
      "type": "object",
      "properties": {
        "roomfloor": { "type": "string" },
        "roomname": { "type": "string" }
      }
    }
  },
  "required": ["zipcode", "city", "room"]
}
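
The ordering rule can be checked before you paste a schema into the JSON view. The following is a minimal sketch in Python and is not part of the product. The helper name ordered_columns is hypothetical, and the sketch assumes that every column in properties is listed exactly once in the required array.

```python
import json


def ordered_columns(schema: dict) -> list:
    """Return column names in the order given by the "required" array.

    Hypothetical helper. Assumption of this sketch: every property must
    appear in "required" exactly once, and nothing else may appear there.
    """
    properties = schema["properties"]
    required = schema["required"]
    if len(set(required)) != len(required):
        raise ValueError("duplicate names in 'required'")
    missing = [name for name in required if name not in properties]
    unlisted = [name for name in properties if name not in required]
    if missing or unlisted:
        raise ValueError(f"mismatch: missing={missing}, unlisted={unlisted}")
    # The column order comes from "required", not from the object key order.
    return list(required)


schema = json.loads("""
{
  "properties": {
    "zipcode": {"type": "integer"},
    "city": {"type": "string"},
    "room": {
      "type": "object",
      "properties": {
        "roomfloor": {"type": "string"},
        "roomname": {"type": "string"}
      }
    }
  },
  "required": ["zipcode", "city", "room"]
}
""")

print(ordered_columns(schema))
```

Reading the order from required rather than from the keys of properties keeps the result stable even when a parser reorders object properties.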

Data types supported by the data schema

The following data types are supported in the data schema (dataSchema). The data schema supports only a single data type per instance of the type keyword. The null type is not required or supported for schema input as all fields are treated as nullable by default.

For CSV and JSON formats, integer and number data types are not statically enforced at read time. Choose the format (precision) that fits the range of your values, or omit format, in which case integer defaults to int64. For Parquet, the schema must exactly match the integer data types of the landed data.

Type | Description | Example
Array | A sequence of arbitrary length where each item matches the same schema. Null values can be included. | {"type": "array", "items": {"type": "any_type_in_this_list"}}
Boolean | Represents logical true or false values. | {"type": "boolean"}
Integer 32 | A 32-bit signed integer. | {"type": "integer", "format": "int32"}
Integer 64 | A 64-bit signed integer (default for integer types). | {"type": "integer"} or {"type": "integer", "format": "int64"}
Float | A single-precision floating-point number. | {"type": "number", "format": "float"}
Double | A double-precision floating-point number. | {"type": "number", "format": "double"}
Object | A collection of key-value pairs defined by properties. | {"type": "object", "properties": {"prop1": {"type": "string"}}}
String | A standard sequence of characters. | {"type": "string"}
Date String | A string formatted with Splunk time variables. | {"type": "string", "format": "%Y"}
BYTE_ARRAY | Binary data (Parquet only). | {"type": "string", "format": "byte_array"}
FIXED_LEN_BYTE_ARRAY | Fixed-length binary data (Parquet only). | {"type": "string", "format": "fixed_len_byte_array"}
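
The type rules above can be turned into a quick pre-check. The following Python sketch is illustrative only: the function find_type_errors and its error messages are hypothetical, and it checks only what the table states (a single type per field, no null type, and the listed integer and number formats). It does not validate string formats, and it assumes that a number without a format is acceptable, which the table does not confirm.

```python
# Types and numeric formats taken from the table above.
SUPPORTED_TYPES = ("array", "boolean", "integer", "number", "object", "string")
NUMERIC_FORMATS = {
    "integer": ("int32", "int64"),
    "number": ("float", "double"),
}


def find_type_errors(field: dict, path: str = "$") -> list:
    """Collect type problems for one field and, recursively, its children."""
    errors = []
    field_type = field.get("type")
    # Only a single type per "type" keyword is supported, so reject lists.
    if not isinstance(field_type, str):
        errors.append(f"{path}: 'type' must be a single string")
        return errors
    # This also rejects "null", which is not supported for schema input.
    if field_type not in SUPPORTED_TYPES:
        errors.append(f"{path}: unsupported type '{field_type}'")
        return errors
    fmt = field.get("format")
    allowed = NUMERIC_FORMATS.get(field_type)
    if allowed is not None and fmt is not None and fmt not in allowed:
        errors.append(f"{path}: unsupported format '{fmt}' for {field_type}")
    # Descend into array items and object properties.
    if field_type == "array" and "items" in field:
        errors.extend(find_type_errors(field["items"], f"{path}[]"))
    if field_type == "object":
        for name, child in field.get("properties", {}).items():
            errors.extend(find_type_errors(child, f"{path}.{name}"))
    return errors


good = {"type": "array", "items": {"type": "integer", "format": "int32"}}
bad = {
    "type": "object",
    "properties": {
        "a": {"type": ["string", "null"]},
        "b": {"type": "number", "format": "int32"},
    },
}

print(find_type_errors(good))
print(find_type_errors(bad))
```

The first call reports no problems. The second reports one problem for each of the two fields: a multi-valued type and an integer format applied to a number.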

Partition schema

The partition schema (dataPartition.PartitionSchema) is the schema you use if you choose to manually define the partition keys in your dataset, and you select JSON view when you define the partition keys through the UI.

The partition schema is expressed as a subset of JSON Schema, using the same format as the data schema (dataSchema). Because field order matters and some JSON parsers do not preserve the order of object properties, you must use the required array to set the order of the partition keys. See JSON Schema.

Partition schema OpenAPI spec

CODE
partition:
  type: object
  description: JSON Schema (same subset as dataSchema) defining the partition fields
  required:
    - properties
    - required
  properties:
    properties:
      type: object
      description: The definition of the partition fields
    required:
      type: array
      items:
        type: string
      description: Defines the order of the partition fields

Partition schema example

Here is an example of a partition schema for a set of date-based partitions: year, month, and day. The required array is used to set the required order of the partition keys.

JSON
{
  "properties": {
    "year": {
      "type": "integer",
      "format": "int32"
    },
    "month": {
      "type": "integer",
      "format": "int32"
    },
    "day": {
      "type": "integer",
      "format": "int32"
    }
  },
  "required": ["year", "month", "day"]
}

Data types supported by the partition schema

The only data types supported by the partition schema (dataPartition.PartitionSchema) are string and integer.
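
As a rough illustration of this restriction, the Python sketch below flags partition keys whose type is neither string nor integer. The function name unsupported_partition_keys is hypothetical and not part of any product API. The sketch inspects only the top-level type of each key.

```python
# The two types the partition schema supports, per the text above.
PARTITION_TYPES = ("string", "integer")


def unsupported_partition_keys(partition_schema: dict) -> dict:
    """Map each partition key with an unsupported type to that type."""
    return {
        name: spec.get("type")
        for name, spec in partition_schema["properties"].items()
        if spec.get("type") not in PARTITION_TYPES
    }


date_partitions = {
    "properties": {
        "year": {"type": "integer", "format": "int32"},
        "month": {"type": "integer", "format": "int32"},
        "day": {"type": "integer", "format": "int32"},
    },
    "required": ["year", "month", "day"],
}

# Hypothetical schema with a boolean key, which is not a supported type.
with_boolean = {
    "properties": {
        "year": {"type": "integer", "format": "int32"},
        "archived": {"type": "boolean"},
    },
    "required": ["year", "archived"],
}

print(unsupported_partition_keys(date_partitions))
print(unsupported_partition_keys(with_boolean))
```

An empty result means that every partition key uses a supported type.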