OCSF data conversion process

The Ingest Processor parses the incoming data, detects its event type, and then maps the data to an OCSF schema.

When configuring the Ingest Processor to convert data to OCSF format, you must specify a source type, which allows the Ingest Processor to do the following:
  • Parse the incoming data according to the data format that is associated with the specified source type.

  • Identify the type of event that the data represents.

  • Map the data to an appropriate OCSF schema based on the source type and event type.

For example, the following is a log emitted by a Cisco Secure Firewall ASA device. This log describes an authentication event, and matches the expected format for data that is associated with the cisco:asa source type:
<166>Jan 05 2024 03:21:14 10.194.183.195 : %ASA-6-611101: User authentication succeeded: Uname: <sasha_patel>
When you configure the Ingest Processor to parse this log as cisco:asa data and convert it to OCSF format, the Ingest Processor identifies the log as an authentication event based on the 611101 message ID, and then maps the log to the Authentication schema from OCSF. The converted data looks like the following:
{
    category_uid: 3,
    metadata: {
        uid: "13003627b465aab8433481239578db50",
        product: {
            name: "ASA",
            vendor_name: "Cisco"
        },
        log_name: "Syslog",
        event_code: "611101",
        profiles: [
            "host"
        ],
        original_time: "Jan 05 2024 03:21:14",
        version: "1.5.0"
    },
    session: {
        is_vpn: true
    },
    message: "User authentication succeeded: Uname: <sasha_patel>",
    unmapped: {
        level: "6",
        facility: 20
    },
    status_id: 1,
    service: {
        name: "ASA"
    },
    activity_id: 1,
    class_uid: 3002,
    dst_endpoint: {
        ip: "10.194.183.195"
    },
    severity_id: 1,
    time: 1704424874000,
    device: {
        type_id: 9,
        ip: "10.194.183.195"
    },
    user: {
        name: "sasha_patel"
    },
    type_uid: 300201
}
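At the pipeline level, the conversion is performed by the ocsf command, which the Convert _raw to OCSF format pipeline action represents, or by the to_ocsf eval function. The following is a minimal sketch that reuses the import path and commands shown in the examples later in this topic; the source type itself is specified when you configure the pipeline action or function, so it is not shown in this snippet:
import { ocsf } from /splunk/ingest/commands

$pipeline = | from $source
| ocsf
| into $destination;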

For information about the Authentication schema, see “Authentication” in the OCSF schema browser: https://schema.ocsf.io/classes/authentication.

Fallback behavior for failed conversions

The Ingest Processor can fail to convert data to OCSF format for reasons including the following:
  • The source type or event type of the data is not supported. For more information, see Supported source types and event types.

  • The data doesn't match the source type specified in the pipeline configuration.

  • The source type is unknown due to a configuration error.

If the conversion fails, then the Ingest Processor maps the data to the generic Application Error schema from OCSF. For information about this schema, see “Application Error” in the OCSF schema browser: https://schema.ocsf.io/classes/application_error.

For example, consider the following cisco:asa log:
<166>Jan 05 2024 03:21:14 10.194.183.195 : %ASA-6-611101: User authentication succeeded: Uname: <sasha_patel>
If you configure the Ingest Processor to parse this log as pan:globalprotect data, the conversion fails and the Ingest Processor produces the following result. Notice that the result includes a message attribute that contains an error message, and a raw_data attribute that contains a copy of the original raw data.
Note: In OCSF, each key in the JSON-formatted data is called an "attribute".
{
    category_uid: 6,
    metadata: {
        uid: "ecb1da171382f0b33d3608eeab4b902a",
        product: {
            path: "/processor",
            feature: {
                name: "to_ocsf eval function"
            },
            name: "Splunk SPL2 Processor",
            vendor_name: "Splunk"
        },
        version: "1.5.0"
    },
    status_id: 2,
    activity_id: 2,
    class_uid: 6008,
    severity_id: 6,
    time: 1748295979566,
    message: "OCSF translation failed: no matched predicate rules and no default rule; source type "pan:globalprotect": no translation",
    raw_data: "<166>Jan 05 2024 03:21:14 10.194.183.195 : %ASA-6-611101: User authentication succeeded: Uname: <sasha_patel>",
    type_uid: 600802
}
Application Error events are identified by the class_uid value 6008. You can configure a pipeline to filter failed OCSF conversion results out of your data by using the following where command:
| where json_extract(_raw, "class_uid") != "6008"
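For example, the following sketch places this filter directly after the ocsf command so that only successfully converted events reach $destination. It reuses the import path from the route example that follows:
import { ocsf } from /splunk/ingest/commands

$pipeline = | from $source
| ocsf
| where json_extract(_raw, "class_uid") != "6008"
| into $destination;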
Additionally, you can use the route command to send failed OCSF conversion results to a different destination than successfully converted data. For example, the following pipeline sends failed conversion results to $destination2, and sends successfully converted data to $destination:
import { ocsf, route } from /splunk/ingest/commands

$pipeline = | from $source 
| ocsf
| route json_extract(_raw, "class_uid") == 6008, [
    | into $destination2
]
| into $destination;

For more information about the route command, see Process a subset of data using Ingest Processor.

Retaining a copy of the original data

You can choose to include a copy of the original data in the OCSF-formatted output by configuring one of the following advanced options:
  • When configuring the Convert _raw to OCSF format pipeline action, which represents the ocsf command, turn on the Include original raw data option.

  • When configuring the to_ocsf function, set the include_raw option to true.

When either of these options is turned on, the OCSF-formatted output includes a raw_data attribute that contains a copy of the original data.
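If you write the SPL2 yourself using the to_ocsf function, the configuration might look like the following sketch. The include_raw option name comes from this topic, but the call shape and argument order shown here are assumptions, so check the to_ocsf function reference for the exact signature:
$pipeline = | from $source
// Assumed call shape: include_raw is the option described in this topic, but
// the exact to_ocsf arguments, and any required import, might differ.
| eval _raw = to_ocsf(_raw, include_raw=true)
| into $destination;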

You can also use the thru command in your pipeline to create a backup copy of the original data and send it to a different data destination than the OCSF-formatted data. For example, the following pipeline sends an unaltered copy of the original data to $destination2, and then sends the OCSF-formatted data to $destination:
import ocsf from /splunk/ingest/commands

$pipeline = | from $source | thru [
    | into $destination2
]
| ocsf
| into $destination;

For more information about the thru command, see Process a copy of data using Ingest Processor.

Including sibling strings for enum attributes

In OCSF, ID values from the data are stored in enum attributes. Some enum attributes are paired with other attributes, known as sibling strings, that provide descriptive labels for the ID values in the enum attributes.

By default, the Ingest Processor doesn't include sibling strings for enum attributes when converting data to OCSF format. However, you can choose to include sibling strings by turning on the Include sibling strings for enum attributes option in the ocsf command or the add_enum_siblings option in the to_ocsf function.

For example, assume that the converted data includes the key-value pair severity_id: 1. According to the Authentication schema in OCSF, severity_id is an enum attribute that has a corresponding sibling string called severity. If you configure the Ingest Processor to include sibling strings, then the converted data also includes this key-value pair: severity: "Informational".
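In the converted event, the enum attribute and its sibling string appear together, similar to the following excerpt:
{
...
    severity_id: 1,
    severity: "Informational",
...
}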

For information about the Authentication schema, see “Authentication” in the OCSF schema browser: https://schema.ocsf.io/classes/authentication.

Including observables

A security observable is any piece of information from your event data that is especially relevant for detecting and analyzing potential security threats. Observables can appear in a variety of attributes in your events. For example, IP addresses are observables, and they might be found in attributes such as device, src_endpoint, and dst_endpoint.

When converting data to OCSF format, you can choose to summarize the attributes that contain observables into an array of objects called observables. You can then verify the presence of a specific observable by checking this observables array instead of checking multiple attributes individually.

The following is an example of an observables array from an OCSF-formatted event:
{ 
...
    observables: [
        {
            type_id: 20,
            name: "dst_endpoint"
        },
        {
            type_id: 2,
            name: "dst_endpoint.ip",
            value: "10.160.0.10"
        },
        {
            type_id: 20,
            name: "device"
        },
        {
            type_id: 2,
            name: "device.ip",
            value: "10.160.0.10"
        },
        {
            type_id: 20,
            name: "src_endpoint"
        },
        {
            type_id: 2,
            name: "src_endpoint.ip",
            value: "10.160.39.123"
        }
    ],
... 
}

By default, the Ingest Processor doesn't include the observables array when converting data to OCSF format. However, you can choose to include it by turning on the Include observables option in the ocsf command or the add_observables option in the to_ocsf function.

See also