Configure the Prometheus receiver to collect KServe metrics
You can monitor the performance of KServe by configuring the Splunk Distribution of the OpenTelemetry Collector to send KServe predictor, queue proxy, and controller metrics to Splunk Observability Cloud.
This solution uses the Prometheus receiver to collect metrics from KServe, which exposes the following endpoints to publish Prometheus-compatible metrics:
- Port 8443 (HTTPS with authentication) for secured KServe installations.
- Port 8080 (HTTP) for standard installations.
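The following is a minimal configuration sketch for the Collector, not a complete deployment manifest. The scrape target address is an assumption based on a typical InferenceService named "sklearn-iris" in the "default" namespace; replace it with your own service address, and use port 8443 with a TLS configuration for secured installations.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        # Scrape the KServe predictor's Prometheus-compatible endpoint.
        # The target below is a placeholder; adjust it to your
        # InferenceService host and port.
        - job_name: kserve
          scrape_interval: 10s
          static_configs:
            - targets: ["sklearn-iris-predictor.default:8080"]

exporters:
  # Send the scraped metrics to Splunk Observability Cloud.
  signalfx:
    access_token: "${SPLUNK_ACCESS_TOKEN}"
    realm: "${SPLUNK_REALM}"

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [signalfx]
```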
Configuration settings
To view the configuration options for the Prometheus receiver, see Settings.
Metrics
The following metrics are available for KServe.
These metrics are considered custom metrics in Splunk Observability Cloud.
| Metric name | Metric type | Description |
|---|---|---|
| request_predict_seconds | histogram | Time taken in seconds for actual model inference (core prediction). Can be used to calculate P95 latency, average latency, and request rate. |
| request_preprocess_seconds | histogram | Time taken in seconds to preprocess input data before inference. |
| request_postprocess_seconds | histogram | Time taken in seconds to format prediction results for response. |
| request_explain_seconds | histogram | Time taken in seconds for model explanation/interpretation. |
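Because these metrics are histograms, percentile latency can be derived from their bucket series. The following PromQL sketch assumes the standard Prometheus _bucket suffix is present on the exported series:

```promql
# P95 model inference latency over the last 5 minutes, broken out by model
histogram_quantile(
  0.95,
  sum by (le, model_name) (rate(request_predict_seconds_bucket[5m]))
)
```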
The following metrics come from the Knative queue proxy.

| Metric name | Metric type | Description |
|---|---|---|
| revision_app_request_count | counter | Total count of requests that reached the application container. |
| revision_app_request_latencies | histogram | End-to-end latency in seconds for requests to the application. |
| revision_request_count | counter | Total count of requests received by the revision, including queued requests. |
| revision_request_latencies | histogram | Total request latency in seconds, including queue proxy processing. |
| queue_operations_per_second | gauge | Current rate of queue operations per second. |
| queue_depth | gauge | Current number of requests in the queue. |
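As an illustration of combining a counter with the response_code_class attribute listed below, the following PromQL sketch computes the fraction of application requests that returned a 5xx status over the last 5 minutes:

```promql
# Error rate: 5xx responses as a fraction of all application requests
sum(rate(revision_app_request_count{response_code_class="5xx"}[5m]))
/
sum(rate(revision_app_request_count[5m]))
```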
Attributes
| Attribute name | Description |
|---|---|
| model_name | Model identifier (e.g., "sklearn-iris", "tensorflow-mnist"). Used for per-model filtering. |
| inferenceservice | InferenceService name (e.g., "sklearn-iris") |
| configuration_name | Knative configuration name (e.g., "sklearn-iris-predictor") |
| revision_name | Knative revision name (e.g., "sklearn-iris-predictor-00001") |
| service_name | Knative service name |
| component | Component type (e.g., "predictor", "transformer", "explainer") |
| response_code | HTTP status code (e.g., "200", "400", "500"). Used for error filtering. |
| response_code_class | HTTP status class (e.g., "2xx", "4xx", "5xx") |
| method | HTTP method (e.g., "POST", "GET") |
| k8s.cluster.name | Cluster name |
| k8s.namespace.name | Namespace (e.g., "default", "production") |
| k8s.pod.name | Full pod name with hash |
| k8s.pod.uid | Pod unique identifier |
| k8s.node.name | Kubernetes node name |
| k8s.deployment.name | Deployment name |
| k8s.replicaset.name | ReplicaSet name |
| container_name | Container name (e.g., "kserve-container", "queue-proxy") |
| pod_name | Pod name (alternative to k8s.pod.name) |
| namespace | Namespace (alternative to k8s.namespace.name) |
| namespace_name | Namespace (another variant) |
| host_name | Host name |
| host_kernel_name | Kernel name (e.g., "Linux") |
| host_kernel_release | Kernel version |
| host_physical_cpus | Number of physical CPUs |
| host_logical_cpus | Number of logical CPUs |
| host_cpu_cores | Number of CPU cores |
| host_cpu_model | CPU model name |
| host_processor_name | Processor name |
| host_machine | Machine architecture |
| host_os_name | OS name (e.g., "Ubuntu") |
| host_linux_version | Linux distribution version |
| host_mem_total | Total system memory |
| os.type | OS type (e.g., "linux") |
| sf_service | Splunk service name |
| sf_tags | Splunk tags |
| service.instance.id | Service instance identifier |
| serving.knative.dev/service | Knative service |
| serving.knative.dev/configuration | Knative configuration |
| serving.knative.dev/revision | Knative revision |
| serving.knative.dev/revisionUID | Revision unique ID |
Next steps
After you set up data collection, the data populates built-in dashboards that you can use to monitor and troubleshoot your instances.
For more information on using built-in dashboards in Splunk Observability Cloud, see: