Configure the Prometheus receiver to collect KServe metrics

Configure the Splunk Distribution of the OpenTelemetry Collector to send KServe metrics to Splunk Observability Cloud.

You can monitor the performance of KServe by configuring the Splunk Distribution of the OpenTelemetry Collector to send KServe predictor, queue proxy, and controller metrics to Splunk Observability Cloud.

This solution uses the Prometheus receiver to collect metrics from KServe, which publishes Prometheus-compatible metrics on the following endpoints:

  • Port 8443 with HTTPS and authentication for secured KServe installations.

  • Port 8080 with HTTP for standard installations.

The Knative queue proxy sidecar additionally exposes its metrics on port 9091.

To use this data integration, KServe must be deployed in your Kubernetes cluster.
  1. Install the Collector for Kubernetes using Helm.
  2. To manually activate the Prometheus receiver for KServe predictor and queue proxy metrics, make the following changes to your Collector values.yaml configuration file.
    1. Add prometheus/kserve-predictors to the receivers section. For example:
      YAML
      agent:
        config:
          receivers:
            # KServe receiver for predictor and queue-proxy metrics
            prometheus/kserve-predictors:
              config:
                scrape_configs:
                # Job 1: Predictor metrics (port 8080)
                - job_name: kserve-predictor-models
                  scrape_interval: 30s
                  kubernetes_sd_configs:
                  - role: pod
                    namespaces:
                      names:
                      - default           # Add namespaces where models are deployed
                      # - production      # Add other namespaces as needed
                  relabel_configs:
                  # Only scrape pods with KServe InferenceService label
                  - action: keep
                    regex: (.+)
                    source_labels:
                    - __meta_kubernetes_pod_label_serving_kserve_io_inferenceservice
                  # Set scrape target to pod IP + port 8080
                  - replacement: $$1:8080   # use $$ so the Collector does not expand $1
                    source_labels:
                    - __meta_kubernetes_pod_ip
                    target_label: __address__
                  # Add inferenceservice label for filtering
                  - source_labels:
                    - __meta_kubernetes_pod_label_serving_kserve_io_inferenceservice
                    target_label: inferenceservice
                  metric_relabel_configs:
                  # Only keep inference latency metrics
                  - source_labels: [__name__]
                    regex: (request_.*_seconds.*)
                    action: keep
                
                # Job 2: Queue proxy metrics (port 9091)
                - job_name: kserve-queue-proxy
                  scrape_interval: 30s
                  kubernetes_sd_configs:
                  - role: pod
                    namespaces:
                      names:
                      - default           # Same namespaces as above
                      # - production
                  relabel_configs:
                  # Same label-based discovery
                  - action: keep
                    regex: (.+)
                    source_labels:
                    - __meta_kubernetes_pod_label_serving_kserve_io_inferenceservice
                  # Scrape port 9091 instead of 8080
                  - replacement: $$1:9091   # use $$ so the Collector does not expand $1
                    source_labels:
                    - __meta_kubernetes_pod_ip
                    target_label: __address__
                  # Add inferenceservice label
                  - source_labels:
                    - __meta_kubernetes_pod_label_serving_kserve_io_inferenceservice
                    target_label: inferenceservice
                  metric_relabel_configs:
                  # Only keep revision and queue metrics
                  - source_labels: [__name__]
                    regex: (revision_.*|queue_.*)
                    action: keep
    2. Add prometheus/kserve-predictors to the metrics pipeline of the service section. For example:
      YAML
      service:
        pipelines:
          metrics:
            receivers:
            - prometheus/kserve-predictors
  3. (Optional) To manually activate the Prometheus receiver for KServe controller (control plane health) metrics, make the following changes to your Collector values.yaml configuration file.
    1. Add prometheus/kserve-controller to the receivers section. For example:
      YAML
      agent:
        config:
          receivers:
            # Add controller receiver
            prometheus/kserve-controller:
              config:
                scrape_configs:
                - job_name: kserve-controller
                  scheme: https
                  scrape_interval: 30s
                  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                  tls_config:
                    insecure_skip_verify: true
                  kubernetes_sd_configs:
                  - role: pod
                    namespaces:
                      names:
                      - kserve    # Controller namespace
                  relabel_configs:
                  - source_labels: [__meta_kubernetes_pod_label_control_plane]
                    regex: kserve-controller-manager
                    action: keep
                  - source_labels: [__meta_kubernetes_pod_ip]
                    target_label: __address__
                    replacement: $$1:8443   # use $$ so the Collector does not expand $1
                  metric_relabel_configs:
                  - source_labels: [__name__]
                    regex: (controller_reconcile_.*|workqueue_.*|controller_runtime_.*)
                    action: keep
    2. Add prometheus/kserve-controller to the metrics pipeline of the service section. For example:
      YAML
      service:
        pipelines:
          metrics:
            receivers:
            - prometheus/kserve-predictors
            - prometheus/kserve-controller
  4. To receive histogram metrics as native histograms in Splunk Observability Cloud, make the following changes to your Collector values.yaml configuration file.
    Note:
    The send_otlp_histograms: true setting is required to use histogram functions in Splunk Observability Cloud, and it causes each histogram to be received as a single metric. If this setting is not activated, each histogram is received as multiple metrics, one per histogram component. For example, the request_predict_seconds histogram would be received as the following metrics:

    • request_predict_seconds_bucket

    • request_predict_seconds_count

    • request_predict_seconds_sum

    For more information on histogram metrics, see Histogram metrics in Splunk Observability Cloud.

    1. Add send_otlp_histograms: true to the signalfx or otlphttp/splunk exporter section. For example:
      YAML
      agent:
        config:
          exporters:
            signalfx:
              send_otlp_histograms: true    
            # Or for OTLP exporter:
            otlphttp/splunk:
              send_otlp_histograms: true
    2. Add the signalfx or otlphttp/splunk exporter to the metrics pipeline of the service section. For example:
      YAML
      service:
        pipelines:
          metrics:
            receivers: [receiver_creator, prometheus/kserve-predictors, prometheus/kserve-controller]
            exporters: [signalfx, otlphttp/splunk]
  5. Restart the Splunk Distribution of the OpenTelemetry Collector.
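The histogram behavior described in the note in step 4 can be illustrated in plain Python. The sample values below are hypothetical; the quantile estimate mirrors the linear interpolation that Prometheus-style histogram queries apply to cumulative buckets:

```python
# Hypothetical scrape of request_predict_seconds delivered as a
# conventional histogram, that is, as three separate metric families.
buckets = [  # (upper bound in seconds, cumulative count): ..._bucket series
    (0.05, 40), (0.1, 70), (0.25, 90), (0.5, 98), (float("inf"), 100),
]
total_count = 100  # request_predict_seconds_count
total_sum = 9.0    # request_predict_seconds_sum

# Average latency comes from the _sum and _count series.
avg_latency = total_sum / total_count  # 0.09 seconds

def quantile(q, buckets):
    """Estimate a quantile from cumulative buckets by locating the
    bucket that contains the q-th sample and interpolating linearly."""
    rank = q * buckets[-1][1]
    lower_bound, lower_count = 0.0, 0
    for upper_bound, cum_count in buckets:
        if cum_count >= rank:
            if upper_bound == float("inf"):
                return lower_bound  # no upper edge to interpolate toward
            width = cum_count - lower_count
            frac = (rank - lower_count) / width if width else 0.0
            return lower_bound + (upper_bound - lower_bound) * frac
        lower_bound, lower_count = upper_bound, cum_count

p95 = quantile(0.95, buckets)
print(avg_latency, round(p95, 5))  # 0.09 0.40625
```

With send_otlp_histograms: true activated, the same information arrives as one histogram metric and the histogram functions in Splunk Observability Cloud perform equivalent calculations for you.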

Configuration settings

To view the configuration options for the Prometheus receiver, see Settings.

Metrics

The following metrics are available for KServe.

These metrics are considered custom metrics in Splunk Observability Cloud.

KServe predictor metrics:
request_predict_seconds (histogram)

Time taken in seconds for actual model inference (core prediction). Can be used to calculate P95 latency, average latency, and request rate.

request_preprocess_seconds (histogram)

Time taken in seconds to preprocess input data before inference. Can be used to:

  • Identify preprocessing bottlenecks.

  • Compare preprocessing vs. prediction time.

  • Optimize your data transformation pipeline.

request_postprocess_seconds (histogram)

Time taken in seconds to format prediction results for response. Can be used to:

  • Identify postprocessing bottlenecks.

  • Compare postprocessing overhead.

  • Optimize response formatting.

request_explain_seconds (histogram)

Time taken in seconds for model explanation/interpretation. Can be used to:

  • Monitor explainability overhead.

  • Track explanation request rate.

  • Optimize explainer performance.
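As a quick illustration of the "compare preprocessing vs. prediction time" use case, the sketch below derives each stage's share of the pipeline from the _sum and _count series of the three stage histograms. The values are hypothetical samples over the same time window:

```python
# Hypothetical _sum/_count pairs for one model over the same time window.
stages = {
    "preprocess":  (1.2, 100),  # request_preprocess_seconds_sum, _count
    "predict":     (9.0, 100),  # request_predict_seconds_sum, _count
    "postprocess": (0.8, 100),  # request_postprocess_seconds_sum, _count
}

# Average per-stage latency and each stage's share of total pipeline time.
averages = {name: s / c for name, (s, c) in stages.items()}
total = sum(averages.values())
shares = {name: round(avg / total, 3) for name, avg in averages.items()}

print(shares)  # {'preprocess': 0.109, 'predict': 0.818, 'postprocess': 0.073}
```

A large preprocess or postprocess share relative to predict points at the data transformation pipeline or response formatting rather than the model itself.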

KServe queue proxy metrics:
revision_app_request_count (counter)

Total count of requests that reached the application container. Can be used to:

  • Calculate request rate.

  • Calculate traffic distribution by revision.

  • Identify dropped requests (by comparing with revision_request_count).

revision_app_request_latencies (histogram)

End-to-end latency in seconds for requests to the application. Can be used to:

  • Calculate total request latency (including preprocess, predict, and postprocess time).

  • Monitor overhead by comparing with request_predict_seconds.

  • Monitor P95/P99 latency.

revision_request_count (counter)

Total count of requests received by the revision (includes queued requests). Can be used to:

  • Monitor total traffic received (before queuing).

  • Detect queue drops (by comparing with revision_app_request_count).

  • Monitor ingress traffic.

revision_request_latencies (histogram)

Total request latency in seconds, including queue proxy processing. Can be used to:

  • Monitor complete latency, including queuing time.

  • Identify queuing delays (by comparing with revision_app_request_latencies).

  • Monitor autoscaling impact on latency.

queue_operations_per_second (gauge)

Current rate of queue operations per second. Can be used to:

  • Monitor queue throughput.

  • Identify queue saturation.

  • Monitor the autoscaling trigger metric.

queue_depth (gauge)

Current number of requests in queue. Can be used to:

  • Identify queue backlog.

  • Monitor autoscaling effectiveness.

  • Alert on queue saturation.
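The counter comparisons listed above reduce to simple arithmetic. The sketch below uses hypothetical counter samples taken at the same instant to estimate requests dropped in the queue:

```python
# Hypothetical queue proxy counters for one revision, same instant.
revision_request_count = 1200      # received by the revision (includes queued)
revision_app_request_count = 1180  # actually reached the application container

# Requests lost between the queue and the application, and the drop rate.
dropped = revision_request_count - revision_app_request_count
drop_rate = dropped / revision_request_count

print(dropped, round(drop_rate, 4))  # 20 0.0167
```

In practice you would compute this over counter rates rather than raw totals, so that counter resets on pod restarts do not skew the result.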

Attributes

The following resource attributes are available for KServe metrics.
model_name: Model identifier (e.g., "sklearn-iris", "tensorflow-mnist"). Used for per-model filtering.
inferenceservice: InferenceService name (e.g., "sklearn-iris").
configuration_name: Knative configuration name (e.g., "sklearn-iris-predictor").
revision_name: Knative revision name (e.g., "sklearn-iris-predictor-00001").
service_name: Knative service name.
component: Component type (e.g., "predictor", "transformer", "explainer").
response_code: HTTP status code (e.g., "200", "400", "500"). Used for error filtering.
response_code_class: HTTP status class (e.g., "2xx", "4xx", "5xx").
method: HTTP method (e.g., "POST", "GET").
k8s.cluster.name: Cluster name.
k8s.namespace.name: Namespace (e.g., "default", "production").
k8s.pod.name: Full pod name with hash.
k8s.pod.uid: Pod unique identifier.
k8s.node.name: Kubernetes node name.
k8s.deployment.name: Deployment name.
k8s.replicaset.name: ReplicaSet name.
container_name: Container name (e.g., "kserve-container", "queue-proxy").
pod_name: Pod name (alternative to k8s.pod.name).
namespace: Namespace (alternative to k8s.namespace.name).
namespace_name: Namespace (another variant).
host_name: Host name.
host_kernel_name: Kernel name (e.g., "Linux").
host_kernel_release: Kernel version.
host_physical_cpus: Number of physical CPUs.
host_logical_cpus: Number of logical CPUs.
host_cpu_cores: Number of CPU cores.
host_cpu_model: CPU model name.
host_processor_name: Processor name.
host_machine: Machine architecture.
host_os_name: OS name (e.g., "Ubuntu").
host_linux_version: Linux distribution version.
host_mem_total: Total system memory.
os.type: OS type (e.g., "linux").
sf_service: Splunk service name.
sf_tags: Splunk tags.
service.instance.id: Service instance identifier.
serving.knative.dev/service: Knative service name.
serving.knative.dev/configuration: Knative configuration name.
serving.knative.dev/revision: Knative revision name.
serving.knative.dev/revisionUID: Revision unique ID.