Create modular inputs
This topic provides details on creating a modular input script, defining an introspection scheme, and the impact of enabling, disabling, and updating modular input scripts. It also covers overriding default modular input script run behavior for *nix and Windows.
Other aspects of creating modular inputs, such as external validation, streaming, and data checkpoints, are covered elsewhere in this manual.
Create a modular input script
A script that implements modular inputs runs in three scenarios:
- Returns the introspection scheme to splunkd. The introspection scheme defines the behavior and endpoints of the script, as described in Define a scheme for introspection. Splunkd runs the script to determine the behavior and configuration.
- Validates the script's configuration. The script has routines to validate its configuration, as described in Set up external validation.
- Streams data. The script streams event data that can be indexed. The data can be streamed as plain text or as XML, as described in Set up streaming.
The following pseudo-code describes the behavior of a modular input script. This example assumes that there is a valid spec file, as described in Modular inputs spec file. This also assumes that you are checkpointing data to avoid reading from the same source twice, as described in Data checkpoints.
Define an introspection scheme
  Implement --scheme arg to print the scheme to stdout (scenario 1)
Implement routines to validate configuration
  Implement --validate-arguments arg to validate configuration (scenario 2)
  If validation fails, exit writing error code to stdout
Read XML configuration from splunkd
Stream data as text or as XML, using checkpoints (scenario 3)
  If checkpoint exists
    Exit
  Else
    While not done
      Write event data to stdout
    Write checkpoint
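The pseudo-code above could be realized in Python roughly as follows. This is a minimal sketch under stated assumptions, not the implementation of any shipped modular input: the function names run and stream_events, the checkpoint file name done.chk, the placeholder scheme, and the in-memory list of events are all invented for illustration. The sketch streams plain text, and it takes the checkpoint directory as an argument, whereas a real script reads it from the XML configuration that splunkd passes on stdin.

```python
import os
import sys

SCHEME = "<scheme><title>Example</title></scheme>"  # placeholder scheme (assumption)


def stream_events(events, checkpoint_dir, out):
    """Scenario 3: write events as plain text unless a checkpoint exists."""
    checkpoint = os.path.join(checkpoint_dir, "done.chk")  # hypothetical file name
    if os.path.exists(checkpoint):
        return 0  # the source was already read, so there is nothing to do
    for event in events:
        out.write(event + "\n")
    # Record that the source was consumed so it is not read twice.
    with open(checkpoint, "w") as handle:
        handle.write("done")
    return len(events)


def run(argv, events, checkpoint_dir, out=sys.stdout):
    """Dispatch on the first argument, mirroring the three scenarios."""
    if len(argv) > 1 and argv[1] == "--scheme":
        out.write(SCHEME + "\n")  # scenario 1: print the scheme
        return "scheme"
    if len(argv) > 1 and argv[1] == "--validate-arguments":
        return "validate"  # scenario 2: validation routines would go here
    stream_events(events, checkpoint_dir, out)  # scenario 3: stream data
    return "stream"
```

Keeping the dispatch in one function makes each scenario easy to exercise on its own, for example by passing an in-memory stream instead of stdout.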
Architecture-specific scripts
Typically, you use the default bin directory for scripts:
$SPLUNK_HOME/etc/apps/<myapp>/bin/<myscript>
However, you can provide an architecture-specific version of a modular input script by placing the appropriate version of the script in the corresponding architecture-specific bin directory in your Splunk Enterprise installation. Architecture-specific version directories are only available for the following subset of architectures that Splunk Enterprise supports. The architecture-specific directories are all Intel-based.
- Linux
- Windows
- Apple (darwin)
The following bin directories, relative to $SPLUNK_HOME/etc, are available for the corresponding Intel architectures:
/apps/<myapp>/linux_x86/bin/<myscript>
/apps/<myapp>/linux_x86_64/bin/<myscript>
\apps\<myapp>\windows_x86\bin\<myscript>
\apps\<myapp>\windows_x86_64\bin\<myscript>
/apps/<myapp>/darwin_x86/bin/<myscript>
/apps/<myapp>/darwin_x86_64/bin/<myscript>
If you place a script in an architecture-specific directory, Splunk software runs that version of the script when it is installed on the matching platform. Otherwise, the platform-neutral version of the script in the default bin directory runs.
Note: Always have a platform-neutral version of the script in the default bin directory. Only use a platform-specific directory if required for that architecture.
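The directory layout described above can be pictured with a short Python sketch. It only builds the two candidate locations for a script, architecture-specific first and platform-neutral second. How splunkd actually resolves the script is internal and not shown in this topic, so the function name candidate_script_paths and its parameters are assumptions made for illustration.

```python
import os

# Architecture-specific directory names listed in this topic.
ARCH_DIRS = {
    "linux_x86", "linux_x86_64",
    "windows_x86", "windows_x86_64",
    "darwin_x86", "darwin_x86_64",
}


def candidate_script_paths(splunk_home, app, script, arch_dir):
    """Return the paths to try: architecture-specific first, default last.

    Hypothetical helper that illustrates the layout only.
    """
    app_root = os.path.join(splunk_home, "etc", "apps", app)
    paths = []
    if arch_dir in ARCH_DIRS:
        paths.append(os.path.join(app_root, arch_dir, "bin", script))
    # The platform-neutral copy in the default bin directory is the fallback.
    paths.append(os.path.join(app_root, "bin", script))
    return paths
```

For an architecture that has no dedicated directory, the function returns only the default bin path, which is why a platform-neutral version of the script should always be present.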
Executable files recognized for introspection
The following types of executable files are recognized for introspection:
- *Nix platforms
  - filename.sh
  - filename.py
  - filename (executable file without an extension)
- Windows platforms
  - filename.bat
  - filename.cmd
  - filename.py
  - filename.exe
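As a quick way to check a file name against the lists above, the following sketch compares the extension with the recognized set for each platform. The function name is_recognized and the platform keys "nix" and "windows" are assumptions made for illustration. The comparison is an exact, case-sensitive match on the extension, and it does not check whether the file is actually executable.

```python
import os

# Extensions recognized for introspection, per the lists in this topic.
RECOGNIZED = {
    "nix": {".sh", ".py", ""},  # "" stands for a file without an extension
    "windows": {".bat", ".cmd", ".py", ".exe"},
}


def is_recognized(filename, platform):
    """Return True if the file name has a recognized extension (hypothetical helper)."""
    extension = os.path.splitext(filename)[1]
    return extension in RECOGNIZED[platform]
```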
Example scripts
See Modular Inputs examples for listings and descriptions of Modular Inputs example scripts.
General tips on writing scripts
See Tips for writing scripts for modular and scripted inputs in Splunk Cloud Platform or Splunk Enterprise on the Splunk developer portal for tips and best practices for writing scripts.
Define a scheme for introspection
You define both the behavior and endpoints for a script in an XML scheme that the script returns to splunkd.
During introspection, splunkd reads the scheme to implement your script as a modular input. Introspection determines the following:
- The endpoint definition for your script, which includes required and optional parameters to create and modify the endpoint.
- The title and description for the script, which is used in the Settings pages for creating or editing instances of the script.
- Behavior for the script, such as:
  - Streaming in XML or plain text
  - Using a single script instance or multiple script instances
  - Validating your scheme configuration
Introspection defaults
Providing an introspection scheme with your script is optional.
If you do not provide the introspection scheme, the Settings page displays default values, which may or may not be appropriate for your script.
If you do provide an introspection scheme, each element in the scheme is optional. If you do not provide an element, then Splunk software uses the default value for that element, which may or may not be appropriate for your script.
Your script must provide a "--scheme" argument, which when specified, does the following:
- If you implement an introspection scheme, writes the scheme to stdout.
- If you do not provide an introspection scheme, exits with return code 0. Splunk software uses the default introspection scheme in this scenario.
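The two cases above can be handled by one small routine. The following is a minimal sketch in which the function name handle_scheme_request and its parameters are assumptions: it writes the scheme when one is defined, writes nothing otherwise, and returns exit code 0 in both cases.

```python
import sys


def handle_scheme_request(scheme, out=sys.stdout):
    """Respond to --scheme and return the process exit code (hypothetical helper).

    scheme is the scheme XML as text, or None if the script defines no scheme.
    """
    if scheme is not None:
        out.write(scheme)
    # Exiting with 0 and no output tells Splunk software to use the defaults.
    return 0
```

A script would typically call sys.exit(handle_scheme_request(SCHEME)) when its first argument is --scheme.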
Example scheme
The following snippet from a script contains an example XML scheme. It also contains snippets that show the routines to return the scheme for splunkd introspection. The introspection scheme must be UTF-8 encoded.
Note: See also Introspection scheme and Splunk Manager pages to view how this scheme affects the display in Splunk Web.
XML scheme snippets
. . .
SCHEME = """<scheme>
    <title>Amazon S3</title>
    <description>Get data from Amazon S3.</description>
    <use_external_validation>true</use_external_validation>
    <streaming_mode>xml</streaming_mode>
    <endpoint>
        <args>
            <arg name="name">
                <title>Resource name</title>
                <description>An S3 resource name without the leading s3://.  
                   For example, for s3://bucket/file.txt specify bucket/file.txt.  
                   You can also monitor a whole bucket (for example by specifying 'bucket'),
                   or files within a sub-directory of a bucket
                   (for example 'bucket/some/directory/'; note the trailing slash).
                </description>
            </arg>
            <arg name="key_id">
                <title>Key ID</title>
                <description>Your Amazon key ID.</description>
            </arg>
            <arg name="secret_key">
                <title>Secret key</title>
                <description>Your Amazon secret key.</description>
            </arg>
        </args>
    </endpoint>
</scheme>
"""
. . .
def do_scheme():
    # print() with parentheses works in both Python 2 and Python 3
    print(SCHEME)
. . .
if __name__ == '__main__':
    if len(sys.argv) > 1:
        if sys.argv[1] == "--scheme":
            do_scheme()
. . .
Introspection scheme details
Use <scheme> tags to define an introspection scheme. <scheme> can contain the following top-level elements:
Top-level elements for introspection <scheme>
| Tag | Description | 
|---|---|
| <title> | Provides a label for the script. The label appears in the Settings page for Data inputs. | 
| <description> | Provides descriptive text for title in the Settings page for Data inputs. The description also appears on the Add new data inputs page. | 
| <use_external_validation> | true | false. (Default is false.) Enables external validation. | 
| <streaming_mode> | xml | simple (Default is simple, indicating plain text.) Streams inputs as xml or plain text. | 
| <use_single_instance> | true | false (Default is false.) Indicates whether to launch a single instance of the script or one script instance for each input stanza. The default value, false, launches one script instance for each input stanza. | 
| <endpoint> | Contains one or more <arg> elements that can be used to change the default behavior that is defined in the inputs.conf.spec file. The parameters to an endpoint are accessible from the management port to Splunk Enterprise. Additionally, Splunk Web uses the endpoint to display each <arg> as an editable field in the Add new data inputs Settings page. See below for details on specifying <endpoint>. | 
The <endpoint> element allows you to modify the default behavior that is defined in the inputs.conf.spec file. The following table lists the child elements to <endpoint>: 
Details for the <endpoint> element
| Tag | Description | 
|---|---|
| <args> | Can contain one or more <arg> elements, defining the parameters to an endpoint. | 
| <arg> | Defines the details of a parameter. <arg> can contain the following elements: <title>, <description>, <validation>, <data_type>, <required_on_edit>, <required_on_create>. | 
| <title> | Provides a label for the parameter. | 
| <description> | Provides a description of the parameter. | 
| <validation> | Define rules to validate the value of the argument passed to an endpoint create or edit action. See Validation of arguments for details. You can also perform a higher level validation on a script, using the <use_external_validation> tag. See Set up external validation for more information. | 
| <data_type> | Specify the data type for values returned in JSON format. Splunk endpoints can return data in either JSON or Atom (XML) format. To handle data returned in JSON format, use <data_type> to properly define the datatype for the streamed data. Default datatype is string. Valid values are string, number, and boolean. This has no effect for data returned in Atom format. New to Atom? For an introduction go to AtomEnabled.org. | 
| <required_on_edit> | true | false (Default is false.) Indicates whether the parameter is required for edit. Default behavior is that arguments for edit are optional. Set this to true to override this behavior, and make the parameter required. | 
| <required_on_create> | true | false (Default is true.) Indicates whether the parameter is required for create. Default behavior is that arguments for create are required. Set this to false to override this behavior, and make the parameter optional. | 
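To see how the child elements of <arg> and their defaults combine, the following sketch parses a small scheme with Python's standard xml.etree.ElementTree module and fills in the defaults listed in the table above for any element that is omitted. The scheme content, the argument name max_items, and the function name describe_args are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical scheme: "name" relies on defaults, "max_items" overrides two of them.
SCHEME = """<scheme>
    <title>Example input</title>
    <endpoint>
        <args>
            <arg name="name">
                <title>Resource name</title>
            </arg>
            <arg name="max_items">
                <title>Maximum items</title>
                <description>Upper bound on items fetched per run.</description>
                <data_type>number</data_type>
                <required_on_create>false</required_on_create>
            </arg>
        </args>
    </endpoint>
</scheme>"""

# Defaults taken from the table of <endpoint> child elements.
DEFAULTS = {
    "data_type": "string",
    "required_on_edit": "false",
    "required_on_create": "true",
}


def describe_args(scheme_xml):
    """Return {arg name: effective settings}, applying defaults for omitted elements."""
    result = {}
    for arg in ET.fromstring(scheme_xml).iter("arg"):
        settings = {}
        for tag, default in DEFAULTS.items():
            settings[tag] = arg.findtext(tag, default)
        result[arg.get("name")] = settings
    return result
```

In this sketch, name is a required string on create, while max_items is an optional number.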
Built-in arguments and actions
There are several arguments and actions that are always supported by a modular input endpoint.
The following arguments are implicit, and do not need to be defined in an introspection scheme:
source
sourcetype
host
index
disabled
interval
persistentQueue
persistentQueueSize
queueSize
The following actions are also implicit, and do not need to be defined in an introspection scheme:
enable/disable
Disabling an item shuts down a script. Enabling starts it up.
reload
Works on the endpoint level. Scripts that handle all of the enabled input stanzas are restarted.
Validation of arguments
Use the <validation> tag to define validation rules for arguments passed to an endpoint create or edit action. This allows you to provide input validation for users attempting to modify the configuration using the endpoint. 
For example, the following validation rule tests if the value passed for the argument is a boolean value:
<arg name="myParam">
   <validation>is_bool('myParam')</validation>
   . . .
</arg>
You can specify a validation rule for each arg, as shown in the above example for the myParam argument. The parameter passed to the function must match the name of the argument. 
The Splunk platform provides built-in validation functions that you can use. param in each function must match the name specified for <arg>. 
| Validation function | Description | 
|---|---|
| is_avail_tcp_port(param) | Is the value a valid port number, available for TCP listening. | 
| is_avail_udp_port(param) | Is the value a valid port number, available for UDP listening. | 
| is_nonneg_int(param) | Is the value a non-negative integer. | 
| is_bool(param) | Is the value a boolean expression ("true", "false", "yes", "no", "1", "0"). | 
| is_port(param) | Is the value a valid port number (1-65535). | 
| is_pos_int(param) | Is the value a positive integer. | 
You can also define your own validation rules using eval expressions that evaluate to true or false. Place the eval expression within a validate() function. See eval in the Splunk Search Reference for information on creating eval expressions. 
For example, the following validation rule determines if the argument is in the form of a hyphen-separated Social Security number:
<arg name="ssn">
 <validation>
   validate(match('ssn', '^\d{3}-\d{2}-\d{4}$'), "SSN is not in valid format")
 </validation>
 . . .
 </arg>
Another example defining a validation rule:
<arg name="bonus">
 <validation>
   validate(is_pos_int(bonus) AND bonus > 100, "Value must be a number greater than 100.")
 </validation>
 . . .
 </arg>
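Before putting rules like these into a scheme, it can help to try the same conditions locally. The following sketch mirrors the two example rules in plain Python. It is not how the Splunk platform evaluates validate() expressions: the function names are assumptions, and Python's re module stands in for the regular expression engine used by eval, which should behave the same way for a pattern this simple.

```python
import re

# Same pattern as the ssn example: three digits, two digits, four digits.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def ssn_is_valid(value):
    """Mirror of match('ssn', '^\\d{3}-\\d{2}-\\d{4}$') (hypothetical helper)."""
    return SSN_PATTERN.match(value) is not None


def bonus_is_valid(value):
    """Mirror of is_pos_int(bonus) AND bonus > 100 (hypothetical helper)."""
    is_integer = re.fullmatch(r"[0-9]+", value) is not None
    return is_integer and int(value) > 100
```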
Single or multiple instances of a script
The default behavior for a script is to run in one script instance per input stanza mode. This results in multiple instances of the script, one for each input stanza. This default behavior is useful in multi-thread environments or in situations that require different security contexts or access to different databases.
In a single-threaded environment you might want to run in single script instance mode. For example, in a WMI environment you would run a single instance of a script so you can re-use connections.
You can override the default multiple instances of a script behavior by enabling single script instance mode in the introspection scheme:
<use_single_instance>true</use_single_instance>
Introspection scheme and Splunk Manager pages
This section contains screen captures that illustrate how an introspection scheme affects the pages available from Settings.
Compare the screen captures here with the XML tags in the Example scheme listed above:
Figure 1: Settings showing Modular Inputs with other data inputs
Figure 2: Settings showing custom fields to add a modular input
Read XML configuration from splunkd
A modular input script uses stdin to read inputs.conf configuration information from splunkd. The script parses the XML configuration information.
The XML format of the configuration information passed to the script depends on the mode in which the script runs:
- single script instance per input stanza mode
- single script instance mode
Note: Developer tools for modular inputs in this manual shows how you can use the modular inputs utility to preview the configuration and the results returned by the script.
Configuration for single script instance per input stanza mode
In single script instance per input stanza mode, the XML configuration passed to the script looks something like this:
<input>
  <server_host>myHost</server_host>
  <server_uri>https://127.0.0.1:8089</server_uri>
  <session_key>123102983109283019283</session_key>
  <checkpoint_dir>/opt/splunk/var/lib/splunk/modinputs</checkpoint_dir>
  <configuration>
    <stanza name="myScheme://aaa">
        <param name="param1">value1</param>
        <param name="param2">value2</param>
        <param name="disabled">0</param>
        <param name="index">default</param>
    </stanza>
  </configuration>
</input>
| Tag | Description | 
|---|---|
| <server_host> | The hostname for the Splunk Enterprise server. | 
| <server_uri> | The management port for the Splunk Enterprise server, identified by host, port, and protocol. | 
| <session_key> | The session key for the session with splunkd. The session key can be used in any REST session with the local instance of splunkd. | 
| <checkpoint_dir> | The directory used for a script to save checkpoints. This is where the input state from sources from which it is reading is tracked. | 
| <configuration> | The child tags for <configuration> are based on the schema you define in the inputs.conf.spec file for your modular inputs. Splunk software reads all the configurations in the Splunk Enterprise installation and passes them to the script in <stanza> tags. | 
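The top-level elements in the table above can be pulled out of the XML with a few lines of Python. This is a minimal sketch that uses the standard xml.etree.ElementTree module on a sample document. The function name read_input_metadata is an assumption, and a real script would pass the text it reads from stdin instead of the sample.

```python
import xml.etree.ElementTree as ET

# Sample configuration in the shape shown above.
SAMPLE = """<input>
  <server_host>myHost</server_host>
  <server_uri>https://127.0.0.1:8089</server_uri>
  <session_key>123102983109283019283</session_key>
  <checkpoint_dir>/opt/splunk/var/lib/splunk/modinputs</checkpoint_dir>
  <configuration>
    <stanza name="myScheme://aaa">
        <param name="param1">value1</param>
    </stanza>
  </configuration>
</input>"""

TOP_LEVEL_TAGS = ("server_host", "server_uri", "session_key", "checkpoint_dir")


def read_input_metadata(xml_text):
    """Return the top-level settings as a dict; missing tags map to None."""
    root = ET.fromstring(xml_text)
    return {tag: root.findtext(tag) for tag in TOP_LEVEL_TAGS}
```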
Configuration for single script instance mode
The XML configuration information passed when running in single script instance mode varies slightly. When running in single script instance mode, all configuration stanzas have to be included because there is only one instance of the script running.
<input>
  <server_host>myHost</server_host>
  <server_uri>https://127.0.0.1:8089</server_uri>
  <session_key>123102983109283019283</session_key>
  <checkpoint_dir>/opt/splunk/var/lib/splunk/modinputs</checkpoint_dir>
  <configuration>
    <stanza name="myScheme://aaa">
        <param name="param1">value1</param>
        <param name="param2">value2</param>
        <param name="disabled">0</param>
        <param name="index">default</param>
    </stanza>
    <stanza name="myScheme://bbb">
        <param name="param1">value11</param>
        <param name="param2">value22</param>
        <param name="disabled">0</param>
        <param name="index">default</param>
    </stanza>
  </configuration>
</input>
If you are running the modular input script in single script instance mode, and there are no configuration stanzas for your input scheme configured in inputs.conf, Splunk software passes in an empty configuration tag, as illustrated below. Your modular input script must be able to handle the empty configuration tag.
<input>
  <server_host>myHost</server_host>
  <server_uri>https://127.0.0.1:8089</server_uri>
  <session_key>123102983109283019283</session_key>
  <checkpoint_dir>/opt/splunk/var/lib/splunk/modinputs</checkpoint_dir>
  <configuration/>
</input>
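In single script instance mode, a script has to cope with zero, one, or many stanzas. The following sketch shows one way to do that with the standard xml.etree.ElementTree module. The function name read_stanzas and the sample documents are assumptions made for illustration. Because the function iterates over whatever stanzas are present, an empty <configuration/> tag produces an empty result instead of an error.

```python
import xml.etree.ElementTree as ET


def read_stanzas(xml_text):
    """Return {stanza name: {param name: value}}, or an empty dict if there are none."""
    root = ET.fromstring(xml_text)
    stanzas = {}
    for stanza in root.iterfind("configuration/stanza"):
        params = {}
        for param in stanza.iterfind("param"):
            params[param.get("name")] = param.text or ""
        stanzas[stanza.get("name")] = params
    return stanzas


# Sample inputs: no stanzas, and two stanzas.
EMPTY = "<input><configuration/></input>"
TWO = """<input>
  <configuration>
    <stanza name="myScheme://aaa">
        <param name="param1">value1</param>
    </stanza>
    <stanza name="myScheme://bbb">
        <param name="param1">value11</param>
    </stanza>
  </configuration>
</input>"""
```

A script can then loop over the returned dictionary and start one unit of work per stanza, doing nothing when the dictionary is empty.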
Example code reading XML configuration
The following example shows how to read the XML configuration from splunkd. This script has been made cross-compatible with Python 2 and Python 3 using python-future.
# read XML configuration passed from splunkd
import logging
import sys
import xml.dom.minidom
from builtins import str  # supplied by python-future on Python 2


def get_config():
    config = {}
    try:
        # read everything from stdin
        config_str = sys.stdin.read()
        # parse the config XML
        doc = xml.dom.minidom.parseString(config_str)
        root = doc.documentElement
        conf_node = root.getElementsByTagName("configuration")[0]
        if conf_node:
            logging.debug("XML: found configuration")
            stanza = conf_node.getElementsByTagName("stanza")[0]
            if stanza:
                stanza_name = stanza.getAttribute("name")
                if stanza_name:
                    logging.debug("XML: found stanza " + stanza_name)
                    config["name"] = stanza_name
                    params = stanza.getElementsByTagName("param")
                    for param in params:
                        param_name = param.getAttribute("name")
                        logging.debug("XML: found param '%s'" % param_name)
                        if param_name and param.firstChild and \
                           param.firstChild.nodeType == param.firstChild.TEXT_NODE:
                            data = param.firstChild.data
                            config[param_name] = data
                            logging.debug("XML: '%s' -> '%s'" % (param_name, data))
        checkpnt_node = root.getElementsByTagName("checkpoint_dir")[0]
        if checkpnt_node and checkpnt_node.firstChild and \
           checkpnt_node.firstChild.nodeType == checkpnt_node.firstChild.TEXT_NODE:
            config["checkpoint_dir"] = checkpnt_node.firstChild.data
        if not config:
            raise Exception("Invalid configuration received from Splunk.")
        # just some validation: make sure these keys are present (required)
        # validate_conf() is a helper defined elsewhere in the script
        validate_conf(config, "name")
        validate_conf(config, "key_id")
        validate_conf(config, "secret_key")
        validate_conf(config, "checkpoint_dir")
    except Exception as e:
        raise Exception("Error getting Splunk configuration via STDIN: %s" % str(e))
    return config
Enable, disable, and update modular input scripts
As with any other Splunk Enterprise app, you can enable, disable, or update the script that implements modular inputs. These actions produce the following behavior for modular inputs.
- Disabling a modular input script: When a modular input script is in the disabled state, the input is not initialized. The Settings pages do not reference the script. Splunk software ignores any inputs.conf files that reference the disabled modular input script. If the modular input script is enabled, and then disabled while Splunk Enterprise is running, the script is stopped and unregistered. The endpoints for the script cannot be accessed and the Settings pages no longer reference the script.
- Enabling a modular input script: If you enable a modular input script that was previously disabled, the script is registered with the Splunk platform. The endpoints for the script are accessible and the Settings pages for the script are available.
- Updating a modular input script: If you update a modular input script, then when it is enabled the previous version is disabled and the updated version is registered, updating the endpoints and Settings pages.
- Changes to other apps: If other apps are enabled, disabled, or updated, all active modular inputs reload. This is to ensure that updates to inputs.conf files properly reflect the modular inputs.
Override default run behavior for modular input scripts
Adjust the start_by_shell parameter in inputs.conf to override default script running behavior for *nix and Windows. This setting works similarly for scripted inputs and modular inputs. In most cases, the default setting does not need to be adjusted, but it can be set to false for scripts that do not need UNIX shell meta-character expansion.
The default settings for start_by_shell are:
- For *nix: true. Scripts are passed to /bin/sh -c.
- For Windows: false. Scripts are started directly.
If the modular input runs in one-instance-per-stanza mode, override the default start_by_shell setting in the scheme default stanza. This setting is inherited by all of the scheme's input stanzas. You can also change the setting in any individual input stanza for more granular control.
If the modular input runs in single instance mode, override the default start_by_shell parameter setting in the scheme default stanza only. Other individual start_by_shell settings are ignored in this case.
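The precedence rules in this section can be summarized as a small decision function. The following sketch is a model of the rules as described here, not Splunk code: the function name effective_start_by_shell, the platform values "nix" and "windows", and the use of None to mean "not set" are assumptions made for illustration.

```python
def effective_start_by_shell(platform, scheme_default=None, stanza_value=None,
                             single_instance=False):
    """Model which start_by_shell value applies to an input (hypothetical helper).

    platform: "nix" or "windows".
    scheme_default: value set in the scheme default stanza, or None if unset.
    stanza_value: value set in the individual input stanza, or None if unset.
    """
    # Platform defaults: true on *nix, false on Windows.
    value = platform != "windows"
    if scheme_default is not None:
        value = scheme_default
    # Individual stanza settings are ignored in single instance mode.
    if stanza_value is not None and not single_instance:
        value = stanza_value
    return value
```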