Version: 0.9.0

Controller Metrics

kro exposes Prometheus metrics for monitoring controller health and performance. These metrics are available at the metrics endpoint (default :8078/metrics), configurable via Helm values.
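To inspect the endpoint without a full Prometheus setup, the text exposition format can be parsed directly. The following is a minimal sketch, not part of kro: the function names `parse_metrics` and `fetch_metrics` are hypothetical, the URL assumes the default port is reachable on localhost (for example through a port-forward), and the parser ignores timestamps and does not unescape label values.

```python
import re
from urllib.request import urlopen

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: key="value".
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and HELP/TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


def fetch_metrics(url="http://localhost:8078/metrics"):
    """Fetch and parse the endpoint. The URL is an assumption, adjust as needed."""
    with urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling `fetch_metrics()` and filtering the result by a name prefix such as `dynamic_controller_` gives a quick view of what the controller currently exports.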

Metrics Stability

All kro-specific metrics (Dynamic Controller, Schema Resolver, Runtime, CEL Expression, ResourceGraphDefinition Controller, and REST Client) are ALPHA and subject to change or removal without notice. Metric names, labels, and types may change between releases with no backwards-compatibility guarantees.

Only controller-runtime and workqueue metrics are considered STABLE.

Enabling Metrics

metrics:
  service:
    create: true
    port: 8080
  serviceMonitor:
    enabled: true # If using Prometheus Operator
    interval: 1m

Dynamic Controller Metrics

| Metric | Type | Description | Stability |
| --- | --- | --- | --- |
| dynamic_controller_reconcile_total | Counter | Total number of reconciliations per GVR | ALPHA |
| dynamic_controller_reconcile_duration_seconds | Histogram | Duration of reconciliations per GVR in seconds | ALPHA |
| dynamic_controller_requeue_total | Counter | Total number of requeues per GVR and requeue type | ALPHA |
| dynamic_controller_handler_errors_total | Counter | Total number of handler errors per GVR | ALPHA |
| dynamic_controller_queue_length | Gauge | Current length of the workqueue | ALPHA |
| dynamic_controller_gvr_count | Gauge | Number of instance GVRs currently managed by the controller | ALPHA |
| dynamic_controller_handler_count_total | Gauge | Number of active handlers by type (parent or child) | ALPHA |
| dynamic_controller_handler_attach_total | Counter | Total number of handler attachments by type | ALPHA |
| dynamic_controller_handler_detach_total | Counter | Total number of handler detachments by type | ALPHA |
| dynamic_controller_informer_events_total | Counter | Total number of events processed by informers per GVR and event type | ALPHA |
| dynamic_controller_informer_sync_duration_seconds | Histogram | Duration of informer cache sync per GVR in seconds | ALPHA |
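Counters such as dynamic_controller_reconcile_total and dynamic_controller_handler_errors_total only become meaningful when compared across two scrapes. The sketch below shows one way to derive an error ratio from two snapshots. It assumes the per-GVR series have already been summed into plain numbers, treats a decreasing counter as a restart, and interprets each handler error as one failed reconciliation, which is an assumption rather than something the metric definitions state. The function names are hypothetical.

```python
RECONCILES = "dynamic_controller_reconcile_total"
ERRORS = "dynamic_controller_handler_errors_total"


def counter_increase(previous, current):
    """Increase of a counter between two scrapes, tolerating a reset to zero."""
    if current < previous:  # counter went backwards: the process restarted
        return current
    return current - previous


def handler_error_ratio(prev, curr):
    """Handler errors per reconciliation between two snapshots.

    prev and curr are dicts mapping the two metric names to their summed
    values at the earlier and later scrape.
    """
    reconciles = counter_increase(prev[RECONCILES], curr[RECONCILES])
    errors = counter_increase(prev[ERRORS], curr[ERRORS])
    if reconciles == 0:
        return 0.0  # nothing was reconciled in the window
    return errors / reconciles
```

For example, going from 100 reconciliations and 5 errors to 150 reconciliations and 10 errors yields 5 errors over 50 reconciliations in the window.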

Schema Resolver Metrics

| Metric | Type | Description | Stability |
| --- | --- | --- | --- |
| schema_resolver_cache_hits_total | Counter | Total number of schema resolver cache hits | ALPHA |
| schema_resolver_cache_misses_total | Counter | Total number of schema resolver cache misses | ALPHA |
| schema_resolver_cache_size | Gauge | Current number of entries in the schema resolver cache | ALPHA |
| schema_resolver_cache_evictions_total | Counter | Total number of entries evicted from the schema resolver cache | ALPHA |
| schema_resolver_api_call_duration_seconds | Histogram | Duration of API calls to fetch schemas in seconds | ALPHA |
| schema_resolver_singleflight_deduplicated_total | Counter | Total number of requests deduplicated by singleflight | ALPHA |
| schema_resolver_errors_total | Counter | Total number of schema resolution errors | ALPHA |
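The hit and miss counters are most useful combined into a hit ratio. A minimal sketch, assuming the values of schema_resolver_cache_hits_total and schema_resolver_cache_misses_total have already been read from a scrape (the function name is hypothetical):

```python
def cache_hit_ratio(hits, misses):
    """Fraction of schema lookups served from the cache, or None if no lookups."""
    total = hits + misses
    if total == 0:
        return None  # no lookups yet, so the ratio is undefined
    return hits / total
```

A persistently low ratio together with a growing schema_resolver_cache_evictions_total may suggest the cache is too small for the workload, although that interpretation depends on the deployment.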

Runtime Metrics

| Metric | Type | Description | Stability |
| --- | --- | --- | --- |
| runtime_node_eval_total | Counter | Total number of node evaluations during reconciliation | ALPHA |
| runtime_node_eval_duration_seconds | Histogram | Duration of node evaluations during reconciliation in seconds | ALPHA |
| runtime_node_eval_errors_total | Counter | Total number of node evaluation errors | ALPHA |
| runtime_creation_total | Counter | Total number of runtimes created | ALPHA |
| runtime_creation_duration_seconds | Histogram | Duration of runtime creation setup phase in seconds | ALPHA |
| runtime_node_ignored_check_total | Counter | Total number of node includeWhen checks | ALPHA |
| runtime_node_ignored_total | Counter | Total number of nodes skipped due to includeWhen conditions | ALPHA |
| runtime_node_ready_check_total | Counter | Total number of node readyWhen checks | ALPHA |
| runtime_node_not_ready_total | Counter | Total number of times nodes were not ready and blocked reconciliation | ALPHA |
| runtime_collection_size | Histogram | Number of items generated by forEach collection expansion | ALPHA |

CEL Expression Metrics

| Metric | Type | Description | Stability |
| --- | --- | --- | --- |
| cel_expr_eval_total | Counter | Total number of CEL expression evaluations | ALPHA |
| cel_expr_eval_duration_seconds | Histogram | Duration of CEL expression evaluations in seconds | ALPHA |
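Histogram metrics such as cel_expr_eval_duration_seconds are exposed as cumulative `_bucket` series with an `le` (upper bound) label, so percentiles have to be estimated from bucket counts. The sketch below illustrates the usual linear-interpolation approach, similar in spirit to PromQL's `histogram_quantile`, though not a faithful reimplementation of it. The function name and the example bucket boundaries are hypothetical.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs taken from the
    *_bucket series; it must include the +Inf bucket (math.inf).
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must end with the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            in_bucket = count - lower_count
            if upper_bound == math.inf or in_bucket == 0:
                # Beyond the last finite bucket (or an empty bucket):
                # the best available estimate is the lower bound.
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

The accuracy of the estimate is bounded by the bucket widths, so a quantile that falls into a wide bucket should be read as approximate.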

ResourceGraphDefinition Controller Metrics

All RGD controller metrics include a name label, set to the metadata.name of the ResourceGraphDefinition.

| Metric | Type | Description | Stability |
| --- | --- | --- | --- |
| rgd_graph_build_total | Counter | Total number of RGD graph validations during reconciliation | ALPHA |
| rgd_graph_build_duration_seconds | Histogram | Duration of RGD graph validations in seconds | ALPHA |
| rgd_graph_build_errors_total | Counter | Total number of RGD graph validation errors | ALPHA |
| rgd_state_transitions_total | Counter | Total number of RGD state transitions (additional labels: from, to) | ALPHA |
| rgd_deletions_total | Counter | Total number of RGD deletions | ALPHA |
| rgd_deletion_duration_seconds | Histogram | Duration of RGD deletions in seconds | ALPHA |

REST Client Metrics

These metrics are registered via the centralized pkg/metrics.Register(...) path. They fill gaps left by controller-runtime v0.16+, which stopped registering the client-go latency, size, and retry metrics.

| Metric | Type | Labels | Description | Stability |
| --- | --- | --- | --- | --- |
| rest_client_request_duration_seconds | Histogram | verb | Request latency in seconds | ALPHA |
| rest_client_rate_limiter_duration_seconds | Histogram | verb | Client-side rate limiter latency in seconds | ALPHA |
| rest_client_request_size_bytes | Histogram | verb | Request payload size in bytes | ALPHA |
| rest_client_response_size_bytes | Histogram | verb | Response payload size in bytes | ALPHA |
| rest_client_request_retries_total | Counter | code, method | Total number of request retries | ALPHA |

Controller Runtime Metrics

The RGD reconciler uses controller-runtime and exposes its standard metrics:

| Metric | Type | Description | Stability |
| --- | --- | --- | --- |
| controller_runtime_reconcile_total | Counter | Total number of reconciliations per controller | STABLE |
| controller_runtime_reconcile_time_seconds | Histogram | Length of time per reconciliation per controller | STABLE |
| controller_runtime_reconcile_errors_total | Counter | Total number of reconciliation errors per controller | STABLE |
| controller_runtime_terminal_reconcile_errors_total | Counter | Total number of terminal reconciliation errors per controller | STABLE |
| controller_runtime_reconcile_panics_total | Counter | Total number of reconciliation panics per controller | STABLE |
| controller_runtime_max_concurrent_reconciles | Gauge | Maximum number of concurrent reconciles per controller | STABLE |
| controller_runtime_active_workers | Gauge | Number of currently used workers per controller | STABLE |

Workqueue Metrics

| Metric | Type | Description | Stability |
| --- | --- | --- | --- |
| workqueue_adds_total | Counter | Total number of adds handled by workqueue | STABLE |
| workqueue_depth | Gauge | Current depth of workqueue | STABLE |
| workqueue_queue_duration_seconds | Histogram | How long in seconds an item stays in workqueue before being requested | STABLE |
| workqueue_work_duration_seconds | Histogram | How long in seconds processing an item from workqueue takes | STABLE |
| workqueue_retries_total | Counter | Total number of retries handled by workqueue | STABLE |
| workqueue_longest_running_processor_seconds | Gauge | How many seconds has the longest running processor for workqueue been running | STABLE |
| workqueue_unfinished_work_seconds | Gauge | How many seconds of work has been done that is in progress and hasn't been observed by work_duration | STABLE |
