Troubleshoot HTTP Event Collector

You can troubleshoot HTTP Event Collector (HEC) by viewing error logs. You can also set up logging using configuration files, investigate instance performance with dashboards included in the Monitoring Console, and detect other scaling problems.

Logging

HTTP Event Collector saves usage data about itself to log files. You can search these usage metrics using Splunk Cloud Platform or Splunk Enterprise to explore usage trends system-wide, per token, per source type, and more, as well as to evaluate HEC performance. Metrics are logged whenever HEC is active. HEC is disabled by default, so it does not log data until you enable it.

You can also view HEC error logs in the splunkd.log log file on Splunk Enterprise. See Enable debug logging in the Troubleshooting Manual for how to enable debugging on your Splunk Enterprise instance.

Log file location and management

Splunk Enterprise writes HTTP Event Collector metrics to the $SPLUNK_HOME/var/log/introspection/http_event_collector_metrics.log file.

The Splunk platform creates a new http_event_collector_metrics.log file when you log out of and back in to Splunk Cloud Platform or when you start your Splunk Enterprise instance. The platform renames any existing file with that name.

You configure the logging frequency of HTTP Event Collector metrics in the limits.conf configuration file. The default frequency is 60 seconds. HEC continues logging system-level metrics even when there is no data input activity. When there is no activity, expect about 200 kilobytes (KB) of metrics log data every 24 hours. The maximum size of a metrics log file is 25 megabytes (MB). If a log file reaches that limit, the Splunk platform renames the log file and creates a new file. The platform stores up to five metrics log files at a time.

The props.conf configuration file defines parameters for reading and indexing the metrics log file.

Searching HTTP Event Collector metrics data

The Splunk platform puts HEC metrics data into the _introspection index. To search the accumulated HEC metrics with the Splunk platform, use the following search command:

index="_introspection" token

Metrics log data format

The Splunk platform records HEC metrics data to the log in JSON format. This means that the log is both human-readable and consistent with other Splunk Cloud Platform or Splunk Enterprise log formats. A single entry consists of both input summary metrics (series = http_event_collector) and per-token metrics (series = http_event_collector_token), as shown in the following example:

{  
   "datetime":"09-01-2016 19:21:19.014 -0700",
   "log_level":"INFO",
   "component":"HttpEventCollector",
   "data":{  
      "series":"http_event_collector",
      "transport":"http",
      "format":"json",
      "total_bytes_received":0,
      "total_bytes_indexed":0,
      "num_of_requests":0,
      "num_of_events":0,
      "num_of_errors":0,
      "num_of_parser_errors":0,
      "num_of_auth_failures":0,
      "num_of_requests_to_disabled_token":0,
      "num_of_requests_to_incorrect_url":0,
      "num_of_requests_in_mint_format":0,
      "num_of_ack_requests":0,
      "num_of_requests_acked":0,
      "num_of_requests_waiting_ack":0
   }
}

{  
   "datetime":"08-22-2016 12:38:04.854 -0700",
   "log_level":"INFO",
   "component":"HttpEventCollector",
   "data":{  
      "token_name":"test",
      "series":"http_event_collector_token",
      "transport":"http",
      "format":"json",
      "total_bytes_received":57000,
      "total_bytes_indexed":44000,
      "num_of_requests":1000,
      "num_of_events":1000,
      "num_of_errors":0,
      "num_of_parser_errors":0,
      "num_of_requests_to_disabled_token":0,
      "num_of_requests_in_mint_format":0
   }
}
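
Because each entry is a JSON object, you can also inspect a copy of the metrics log outside of a search. The following is a minimal Python sketch, not an official tool. It assumes that each entry occupies a single line in the file, and the sample lines are trimmed versions of the entries in the preceding example. The function names are hypothetical.

```python
import json
from collections import defaultdict
from datetime import datetime

# Hypothetical sample lines, trimmed from the example entries in this topic.
SAMPLE_LINES = [
    '{"datetime":"09-01-2016 19:21:19.014 -0700","log_level":"INFO",'
    '"component":"HttpEventCollector","data":{"series":"http_event_collector",'
    '"num_of_requests":0,"num_of_events":0,"num_of_errors":0}}',
    '{"datetime":"08-22-2016 12:38:04.854 -0700","log_level":"INFO",'
    '"component":"HttpEventCollector","data":{"token_name":"test",'
    '"series":"http_event_collector_token","total_bytes_received":57000,'
    '"num_of_requests":1000,"num_of_events":1000,"num_of_errors":0}}',
]


def parse_timestamp(value):
    """Parse the MM-DD-YYYY HH:MM:SS.SSS +/-offset value of the datetime field."""
    return datetime.strptime(value, "%m-%d-%Y %H:%M:%S.%f %z")


def events_per_token(lines):
    """Sum num_of_events for each token, skipping the system-wide summary series."""
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        data = json.loads(line).get("data", {})
        if data.get("series") == "http_event_collector_token":
            totals[data.get("token_name", "unknown")] += data.get("num_of_events", 0)
    return dict(totals)
```

Calling events_per_token(SAMPLE_LINES) returns a dictionary keyed by token name. To process a real file, pass an open file object instead of the sample list.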

HEC summary metrics

The Splunk platform accumulates system-wide summary metrics even if there is no input activity. These metrics are identified by "series":"http_event_collector".

See the following table for a description of the fields for HEC summary metrics:

Field | Description | Value
component | HTTP Event Collector metrics data identifier. | HttpEventCollector
data:format | HTTP Event Collector data format. | json
data:num_of_auth_failures | Total number of authentication failures due to invalid token. | unsigned integer
data:num_of_errors | Total number of per-token errors, which include bad data format, no authorization, bad authorization, and connectivity problems. | unsigned integer
data:num_of_events | Total number of per-token events received by the HTTP Event Collector endpoint. | unsigned integer
data:num_of_parser_errors | Total number of per-token parser errors due to incorrectly formatted event data. | unsigned integer
data:num_of_requests | Total number of valid per-token individual HTTP or HTTPS requests received by an HTTP Event Collector endpoint. Each request can have one or more data events. | unsigned integer
data:num_of_ack_requests | Total number of HEC request indexer status queries received. | unsigned integer
data:num_of_requests_acked | Total number of HEC requests that Splunk successfully indexed and acknowledged. | unsigned integer
data:num_of_requests_waiting_ack | Total number of HEC requests received with indexer acknowledgements enabled. | unsigned integer
data:num_of_requests_to_incorrect_url | Total number of requests to an incorrect URL. | unsigned integer
data:num_of_requests_in_mint_format | Total number of requests from Splunk MINT. | unsigned integer
data:num_of_requests_to_disabled_token | Total number of per-token requests to a disabled token. | unsigned integer
data:series | Metrics data type. | http_event_collector
data:total_bytes_indexed | Total amount of per-token data sent to the indexer. | unsigned integer
data:total_bytes_received | Total amount of per-token data received by calling the receive/token endpoint. | unsigned integer
data:transport | Data transport protocol for HTTP Event Collector data. | http
datetime | Date and time associated with the data. Takes the following format: MM-DD-YYYY HH:MM:SS.SSS +/-GMTDELTA | string
log_level | Log severity level. | INFO

Per-token metrics

In contrast to the system-wide summary metrics, the Splunk platform accumulates per-token metrics only when HEC is active. These metrics are identified by "series":"http_event_collector_token".

The [http_input] stanza in the limits.conf configuration file defines the logging interval and maximum number of tokens logged for these metrics.

See the following table for a description of the fields for per-token metrics:

Field | Description | Value
component | HTTP Event Collector metrics data identifier. | HttpEventCollector
data:format | HTTP Event Collector data format. Always JSON format for metrics logging. | json
data:num_of_errors | Number of errors, which include bad data format, no authorization, bad authorization, and connectivity problems. | unsigned integer
data:num_of_events | Number of events received by the HTTP Event Collector endpoint. | unsigned integer
data:num_of_parser_errors | Number of parser errors due to incorrectly formatted event data. | unsigned integer
data:num_of_requests | Number of valid individual HTTP or HTTPS requests received by an HTTP Event Collector endpoint. Each request can have one or more data events. | unsigned integer
data:num_of_requests_in_mint_format | Total number of requests from Splunk MINT. | unsigned integer
data:num_of_requests_to_disabled_token | Number of requests to a disabled token. | unsigned integer
data:series | Metrics data type. | http_event_collector_token
data:token_name | Token name. | string
data:total_bytes_indexed | Total amount of data sent to the indexer. | unsigned integer
data:total_bytes_received | Total amount of data received by calling the receive/token endpoint. | unsigned integer
data:transport | Data transport protocol for HTTP Event Collector data. | http
datetime | Date and time associated with the data. Takes the following format: MM-DD-YYYY HH:MM:SS.SSS +/-GMTDELTA | string
log_level | Log severity level. | INFO

Logging with configuration files

The limits.conf and props.conf files control metrics data logging and indexing behavior.

limits.conf

The [http_input] stanza in the $SPLUNK_HOME/etc/system/default/limits.conf file controls HTTP Event Collector metrics data logging.

Note: For information about all HTTP Event Collector-related parameters, including those not related to metrics, see the [http_input] stanza documentation on limits.conf in the Splunk Enterprise Admin Manual.

The limits.conf file takes the following parameters:

Parameter | Default value | Description
max_number_of_tokens | 10000 | An unsigned integer that represents the maximum number of tokens reported by HTTP Event Collector metrics.
metrics_report_interval | 60 | An unsigned integer that represents the number of seconds in an HTTP Event Collector metrics report interval.
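
To check which values apply after an override, you can read the [http_input] stanza programmatically. The following Python sketch is an illustration only. It assumes that the stanza uses simple INI-style syntax that the standard configparser module can read, which does not cover every construct that Splunk configuration files allow. The function name and the example override are hypothetical, and the fallback values come from the preceding table.

```python
import configparser

# Default values taken from the parameter table in this topic.
DEFAULTS = {"max_number_of_tokens": 10000, "metrics_report_interval": 60}


def hec_metrics_settings(conf_text):
    """Return the effective HEC metrics settings from limits.conf-style text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    settings = dict(DEFAULTS)
    if parser.has_section("http_input"):
        for name in DEFAULTS:
            if parser.has_option("http_input", name):
                settings[name] = parser.getint("http_input", name)
    return settings


# Hypothetical override that reports metrics every 30 seconds.
EXAMPLE_OVERRIDE = """
[http_input]
metrics_report_interval = 30
"""
```

Calling hec_metrics_settings(EXAMPLE_OVERRIDE) returns the overridden interval together with the default token limit.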

props.conf

The [http_event_collector_metrics] stanza in the $SPLUNK_HOME/etc/system/default/props.conf file controls reading and indexing the HTTP Event Collector log files.

See the following example:

[source::.../http_event_collector_metrics.log(.\d+)?]
sourcetype = http_event_collector_metrics

...

[http_event_collector_metrics]
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = datetime
TIME_FORMAT = %m-%d-%Y %H:%M:%S.%l %z
INDEXED_EXTRACTIONS = json
KV_MODE = none
JSON_TRIM_BRACES_IN_ARRAY_NAMES = true

The props.conf file takes the following parameters:

Parameter | Default | Description
SHOULD_LINEMERGE | false | Whether the Splunk platform merges multiple lines into a single event. Setting to false treats each line as a separate event.
TIMESTAMP_FIELDS | datetime | Log entry time field name.
TIME_FORMAT | %m-%d-%Y %H:%M:%S.%l %z | Log entry time field format.
INDEXED_EXTRACTIONS | json | Metrics log format. Always in JSON format for metrics logging.
KV_MODE | none | Key-value data indicator. Setting to none means no key-value data. Always none for metrics logging.
JSON_TRIM_BRACES_IN_ARRAY_NAMES | true | Whether to trim brace characters from JSON array names.

Possible error codes

The following status codes have particular meaning for all HTTP Event Collector endpoints:

Status code | HTTP status code ID | HTTP status code | Status message
0 | 200 | OK | Success
1 | 403 | Forbidden | Token disabled
2 | 401 | Unauthorized | Token is required
3 | 401 | Unauthorized | Invalid authorization
4 | 403 | Forbidden | Invalid token
5 | 400 | Bad Request | No data
6 | 400 | Bad Request | Invalid data format
7 | 400 | Bad Request | Incorrect index
8 | 500 | Internal Error | Internal server error
9 | 503 | Service Unavailable | Server is busy
10 | 400 | Bad Request | Data channel is missing
11 | 400 | Bad Request | Invalid data channel
12 | 400 | Bad Request | Event field is required
13 | 400 | Bad Request | Event field cannot be blank
14 | 400 | Bad Request | ACK is disabled
15 | 400 | Bad Request | Error in handling indexed fields
16 | 400 | Bad Request | Query string authorization is not enabled
17 | 200 | OK | HEC is healthy
18 | 503 | Service Unavailable | HEC is unhealthy, queues are full
19 | 503 | Service Unavailable | HEC is unhealthy, ack service unavailable
20 | 503 | Service Unavailable | HEC is unhealthy, queues are full, ack service unavailable
21 | 400 | Bad Request | Invalid token
22 | 400 | Bad Request | Token disabled
23 | 503 | Service Unavailable | Server is shutting down
24 | 200 | OK | HEC queue is approaching its capacity limit
25 | 200 | OK | HEC ACK is approaching its capacity limit
26 | 429 | Too Many Requests | HEC queue is at capacity and cannot process any more requests
27 | 429 | Too Many Requests | HEC ACK channel is at capacity and cannot process any more requests

CAUTION: To ensure data is successfully ingested into the Splunk platform, configure your clients with the ability to act on response codes returned by the HEC endpoint. If the client can't take an action based on the resulting response code, data loss might occur.
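
One way a client can act on these codes is to sort them into outcomes before deciding what to do with the data. The following Python sketch is an illustration, not a prescribed policy. It assumes that the HEC response body is a JSON object with a code field that carries the status code from the preceding table. The grouping into "ok", "retry", and "fix" is an assumption for this example, and the function name is hypothetical.

```python
import json

# Assumed grouping of the status codes listed in this topic. Adjust as needed.
SUCCESS_CODES = {0, 17, 24, 25}                 # HTTP 200 responses
RETRY_CODES = {8, 9, 18, 19, 20, 23, 26, 27}    # HTTP 500, 503, and 429 responses


def classify_hec_response(body_text):
    """Return 'ok', 'retry', or 'fix' for a HEC JSON response body."""
    try:
        code = json.loads(body_text).get("code")
    except (ValueError, TypeError, AttributeError):
        # Unreadable body: treated as transient here, which is an assumption.
        return "retry"
    if code in SUCCESS_CODES:
        return "ok"
    if code in RETRY_CODES:
        return "retry"
    # Remaining codes are HTTP 400, 401, and 403 responses that point to a
    # problem with the token, the request, or the event data.
    return "fix"
```

A client might keep the data and resend it after a delay on "retry", and stop and report the problem on "fix", because resending an unchanged request with an invalid token or malformed event is unlikely to succeed.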

Investigate instance performance with the Monitoring Console

The Monitoring Console provides a pre-built dashboard that you can use to monitor HTTP Event Collector and investigate your instance performance. For more information, see Indexing: Inputs: HTTP Event Collector in the Monitoring Splunk Enterprise manual.

Detect scaling problems

If you experience performance slowdowns or want to speed up your HTTP Event Collector deployment, consider the following factors, which can affect performance.

HTTP and HTTPS

Sending data over HTTP results in a significant performance improvement compared to sending data over HTTPS.

Batching

Batching multiple events into a single request can speed up data transmission. Because the request metadata applies to all events in the request, less data is sent overall. For more information about how event data is packaged, see Format events for HTTP Event Collector.
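
As an illustration of batching, the following Python sketch groups events and serializes each group into one request body. It assumes that the endpoint accepts several JSON event objects placed one after another in a single body, as described in Format events for HTTP Event Collector. The function names and the batch size are hypothetical.

```python
import json


def chunk(events, size):
    """Yield consecutive groups of at most `size` events."""
    for start in range(0, len(events), size):
        yield events[start:start + size]


def build_batch(events):
    """Serialize events into one body of JSON objects, one object per line."""
    return "\n".join(
        json.dumps({"event": event}, separators=(",", ":")) for event in events
    )


def build_batches(events, size=100):
    """Return one request body for each group of events."""
    return [build_batch(group) for group in chunk(events, size)]
```

With a batch size of 100, 1,000 events produce 10 request bodies, so the request headers are sent 10 times instead of 1,000 times.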

HTTP Keep-alive

Setting keep-alive on your connection can improve performance. As long as the client sending the data supports HTTP 1.1 and is set up to support HTTP persistent connections, you can optimize performance with keep-alive.
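
The following Python sketch shows one way to reuse a single connection for several requests with the standard http.client module, which speaks HTTP 1.1. It is a sketch under stated assumptions rather than a reference client. The /services/collector/event path, the "Splunk" authorization scheme, and the host, port, and token in the commented usage lines are assumptions that do not come from this topic. The function name is hypothetical.

```python
import http.client


def send_bodies(conn, token, bodies, path="/services/collector/event"):
    """Send each body over the same connection and return the HTTP statuses."""
    headers = {
        "Authorization": f"Splunk {token}",
        "Content-Type": "application/json",
        "Connection": "keep-alive",
    }
    statuses = []
    for body in bodies:
        conn.request("POST", path, body=body.encode("utf-8"), headers=headers)
        response = conn.getresponse()
        response.read()  # Drain the response so the connection can be reused.
        statuses.append(response.status)
    return statuses


# Hypothetical usage. Replace the host, port, and token with your own values.
# conn = http.client.HTTPSConnection("splunk.example.com", 8088, timeout=10)
# try:
#     print(send_bodies(conn, "<your-token>", ['{"event":"hello"}']))
# finally:
#     conn.close()
```

Opening the connection once outside the loop is what allows reuse. Creating a new connection object for every request would repeat the connection setup each time.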

Persistent queues

Persistent queuing slows performance because it writes data from the input queue to disk. For more information, see Use persistent queues to help prevent data loss.