Version: main

Controller Metrics

kro exposes Prometheus metrics for monitoring controller health and performance. They are served on the metrics endpoint (:8078/metrics by default), whose address is configurable through Helm values.
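The following is a minimal Python sketch for reading the endpoint's plain-text output. It assumes you have already fetched that output, for example through kubectl port-forward and urllib.request. The parser is deliberately simplified:

- It skips HELP/TYPE comments and ignores optional timestamps.
- It does not handle escaped quotes or braces inside label values.

For anything beyond quick inspection, prefer a full parser such as prometheus_client.parser.text_string_to_metric_families.

```python
import re

# One sample line: metric name, optional {label="value",...} block, value,
# and an optional trailing timestamp (ignored here).
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)(?:\s+\S+)?$'
)
# Simplified: label values may not contain escaped quotes.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Parse Prometheus text format into (name, labels, value) tuples."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # line outside this simplified grammar
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        # float() understands "+Inf", "-Inf" and "NaN" as used by Prometheus.
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


if __name__ == "__main__":
    scrape = """# HELP workqueue_depth Current depth of workqueue
# TYPE workqueue_depth gauge
workqueue_depth{name="example"} 3
schema_resolver_cache_size 42
"""
    for sample in parse_exposition(scrape):
        print(sample)
```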

Metrics Stability

kro-specific metrics are in ALPHA and subject to change or removal without notice. Metric names, labels, and types may change between releases with no backwards-compatibility guarantees.

Only controller-runtime and workqueue metrics are considered STABLE.
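Only the controller-runtime and workqueue families carry a stability guarantee. Before upgrading kro, it can therefore help to audit dashboards and alert rules for ALPHA dependencies.

The following is a minimal Python sketch, assuming metric names follow the prefixes used on this page. It only covers the families listed here. Any other series the endpoint may serve (for example Go runtime or process metrics) would be misreported as ALPHA.

```python
# Prefixes of the metric families documented as STABLE on this page;
# every other family listed here is ALPHA.
STABLE_PREFIXES = ("controller_runtime_", "workqueue_")


def stability(metric_name):
    """Return "STABLE" or "ALPHA" for a metric family listed on this page."""
    return "STABLE" if metric_name.startswith(STABLE_PREFIXES) else "ALPHA"


def alpha_dependencies(metric_names):
    """Sorted, de-duplicated ALPHA metrics referenced by a dashboard or rule set."""
    return sorted({name for name in metric_names if stability(name) == "ALPHA"})


if __name__ == "__main__":
    used_by_dashboard = [
        "workqueue_depth",
        "controller_runtime_reconcile_errors_total",
        "instance_reconcile_errors_total",
        "cel_expr_eval_duration_seconds",
    ]
    print(alpha_dependencies(used_by_dashboard))
```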

Enabling Metrics

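Enable the metrics Service, and optionally a Prometheus Operator ServiceMonitor, with Helm values such as:

```yaml
metrics:
  service:
    create: true
    port: 8080
  serviceMonitor:
    enabled: true # If using Prometheus Operator
    interval: 1m
```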

Dynamic Controller Metrics

| Metric | Type | Description | Stability |
| --- | --- | --- | --- |
| dynamic_controller_reconcile_total | Counter | Total number of reconciliations per GVR | ALPHA |
| dynamic_controller_reconcile_duration_seconds | Histogram | Duration of reconciliations per GVR in seconds | ALPHA |
| dynamic_controller_requeue_total | Counter | Total number of requeues per GVR and requeue type | ALPHA |
| dynamic_controller_handler_errors_total | Counter | Total number of handler errors per GVR | ALPHA |
| dynamic_controller_queue_length | Gauge | Current length of the workqueue | ALPHA |
| dynamic_controller_gvr_count | Gauge | Number of instance GVRs currently managed by the controller | ALPHA |
| dynamic_controller_handler_count_total | Gauge | Number of active handlers by type (parent or child) | ALPHA |
| dynamic_controller_handler_attach_total | Counter | Total number of handler attachments by type | ALPHA |
| dynamic_controller_handler_detach_total | Counter | Total number of handler detachments by type | ALPHA |
| dynamic_controller_informer_events_total | Counter | Total number of events processed by informers per GVR and event type | ALPHA |
| dynamic_controller_informer_sync_duration_seconds | Histogram | Duration of informer cache sync per GVR in seconds | ALPHA |
| dynamic_controller_watch_count | Gauge | Number of active informers managed by the WatchManager | ALPHA |
| dynamic_controller_instance_watch_count | Gauge | Number of active instance watchers by parent GVR | ALPHA |
| dynamic_controller_watch_request_count | Gauge | Number of active watch requests by GVR and type (scalar/collection) | ALPHA |
| dynamic_controller_route_total | Counter | Total events routed through the coordinator by GVR | ALPHA |
| dynamic_controller_route_match_total | Counter | Total events that matched at least one instance by GVR | ALPHA |
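Counters such as dynamic_controller_reconcile_total only increase while the controller runs, so their rate is usually what is worth watching.

The following Python sketch derives a per-GVR rate from two scrapes taken interval_seconds apart. The GVR keys are hypothetical placeholders for whatever label values the endpoint reports. A drop between scrapes is treated as a counter reset (for example after a controller restart), which is broadly the same assumption PromQL's rate() and increase() make.

```python
def counter_increase(previous, current):
    """Increase of a monotonic counter between two scrapes.

    A decrease means the counter was reset (e.g. the process restarted), so the
    current value is taken as the increase since the reset.
    """
    return current - previous if current >= previous else current


def per_second_rates(previous, current, interval_seconds):
    """Map each GVR key to its counter's average per-second rate over the interval.

    GVRs missing from the earlier scrape are treated as starting from zero.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {
        gvr: counter_increase(previous.get(gvr, 0.0), value) / interval_seconds
        for gvr, value in current.items()
    }


if __name__ == "__main__":
    # Hypothetical dynamic_controller_reconcile_total values, 60 s apart.
    before = {"example.com/v1/webapps": 1200.0, "example.com/v1/databases": 400.0}
    after = {"example.com/v1/webapps": 1290.0, "example.com/v1/databases": 30.0}
    print(per_second_rates(before, after, 60.0))
```

Dividing the increase of dynamic_controller_route_match_total by that of dynamic_controller_route_total over the same window gives the share of routed events that matched at least one instance.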

Schema Resolver Metrics

| Metric | Type | Description | Stability |
| --- | --- | --- | --- |
| schema_resolver_cache_hits_total | Counter | Total number of schema resolver cache hits | ALPHA |
| schema_resolver_cache_misses_total | Counter | Total number of schema resolver cache misses | ALPHA |
| schema_resolver_cache_size | Gauge | Current number of entries in the schema resolver cache | ALPHA |
| schema_resolver_cache_evictions_total | Counter | Total number of entries evicted from the schema resolver cache | ALPHA |
| schema_resolver_api_call_duration_seconds | Histogram | Duration of API calls to fetch schemas in seconds | ALPHA |
| schema_resolver_singleflight_deduplicated_total | Counter | Total number of requests deduplicated by singleflight | ALPHA |
| schema_resolver_errors_total | Counter | Total number of schema resolution errors | ALPHA |
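The hit and miss counters combine into the usual cache-effectiveness signal. The following is a minimal Python sketch, assuming the inputs are the increases of schema_resolver_cache_hits_total and schema_resolver_cache_misses_total over the same window. In PromQL, the equivalent is the rate() of hits divided by the sum of the rate() of hits and misses.

```python
def cache_hit_ratio(hit_increase, miss_increase):
    """Share of schema lookups answered from cache in a window.

    Returns None when there were no lookups, so an idle window is not
    reported as 0% or 100%.
    """
    lookups = hit_increase + miss_increase
    if lookups == 0:
        return None
    return hit_increase / lookups


if __name__ == "__main__":
    # Illustrative window: 450 hits and 50 misses.
    print(cache_hit_ratio(450.0, 50.0))  # 0.9
```

A falling ratio together with a rising schema_resolver_cache_evictions_total can suggest that entries are being evicted before they are reused.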

Runtime Metrics

| Metric | Type | Description | Stability |
| --- | --- | --- | --- |
| runtime_node_eval_total | Counter | Total number of node evaluations during reconciliation | ALPHA |
| runtime_node_eval_duration_seconds | Histogram | Duration of node evaluations during reconciliation in seconds | ALPHA |
| runtime_node_eval_errors_total | Counter | Total number of node evaluation errors | ALPHA |
| runtime_creation_total | Counter | Total number of runtimes created | ALPHA |
| runtime_creation_duration_seconds | Histogram | Duration of runtime creation setup phase in seconds | ALPHA |
| runtime_node_ignored_check_total | Counter | Total number of node includeWhen checks | ALPHA |
| runtime_node_ignored_total | Counter | Total number of nodes skipped due to includeWhen conditions | ALPHA |
| runtime_node_ready_check_total | Counter | Total number of node readyWhen checks | ALPHA |
| runtime_node_not_ready_total | Counter | Total number of times nodes were not ready and blocked reconciliation | ALPHA |
| runtime_collection_size | Histogram | Number of items generated by forEach collection expansion | ALPHA |
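Histograms such as runtime_collection_size are exposed as cumulative _sum and _count series. The mean observation over a window is therefore the ratio of their increases.

The following Python sketch computes the average number of items per forEach expansion between two scrapes, assuming no counter reset inside the window:

```python
def histogram_window_mean(before, after):
    """Mean observed value between two scrapes of one histogram.

    before/after are (sum, count) pairs read from <name>_sum and <name>_count.
    Returns None if nothing was observed in the window. Assumes the process
    did not restart between the scrapes.
    """
    observations = after[1] - before[1]
    if observations <= 0:
        return None
    return (after[0] - before[0]) / observations


if __name__ == "__main__":
    # Illustrative runtime_collection_size readings: 6 expansions added 60 items.
    print(histogram_window_mean((100.0, 10.0), (160.0, 16.0)))  # 10.0
```

The same calculation applied to runtime_node_eval_duration_seconds gives the mean node evaluation time.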

CEL Expression Metrics

| Metric | Type | Description | Stability |
| --- | --- | --- | --- |
| cel_expr_eval_total | Counter | Total number of CEL expression evaluations | ALPHA |
| cel_expr_eval_duration_seconds | Histogram | Duration of CEL expression evaluations in seconds | ALPHA |
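A mean hides tail latency, so for cel_expr_eval_duration_seconds a quantile is often more informative. The following sketch estimates a quantile from cumulative buckets, following the approach of PromQL's histogram_quantile():

- It locates the bucket containing the target rank and interpolates linearly within it.
- It assumes the lowest bucket starts at 0.
- If the rank falls in the +Inf bucket, it returns the highest finite bound.

The bucket boundaries and counts in the example are made up. To see current behavior, feed it bucket increases over a window rather than lifetime totals.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    buckets must be sorted by upper bound and end with (math.inf, total).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                return lower_bound  # rank lies in +Inf bucket: highest finite bound
            if count == lower_count:
                return upper_bound  # empty bucket, nothing to interpolate
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


if __name__ == "__main__":
    # Hypothetical cumulative counts for cel_expr_eval_duration_seconds_bucket.
    cel_buckets = [(0.001, 50.0), (0.005, 90.0), (0.01, 100.0), (math.inf, 100.0)]
    print(histogram_quantile(0.5, cel_buckets))
    print(histogram_quantile(0.7, cel_buckets))
```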

ResourceGraphDefinition Controller Metrics

All RGD controller metrics carry a name label whose value is the metadata.name of the ResourceGraphDefinition.

| Metric | Type | Description | Stability |
| --- | --- | --- | --- |
| rgd_graph_build_total | Counter | Total number of RGD graph validations during reconciliation | ALPHA |
| rgd_graph_build_duration_seconds | Histogram | Duration of RGD graph validations in seconds | ALPHA |
| rgd_graph_build_errors_total | Counter | Total number of RGD graph validation errors | ALPHA |
| rgd_state_transitions_total | Counter | Total number of RGD state transitions (additional labels: from, to) | ALPHA |
| rgd_deletions_total | Counter | Total number of RGD deletions | ALPHA |
| rgd_deletion_duration_seconds | Histogram | Duration of RGD deletions in seconds | ALPHA |
| rgd_graph_revision_issue_total | Counter | Total number of GraphRevision objects issued by the RGD controller | ALPHA |
| rgd_graph_revision_wait_total | Counter | Total number of times the RGD controller waited for GraphRevision progress | ALPHA |
| rgd_graph_revision_resolution_total | Counter | Total number of GraphRevision resolution outcomes | ALPHA |
| rgd_graph_revision_registry_miss_total | Counter | Total number of times the RGD controller observed GraphRevisions ahead of the in-memory registry | ALPHA |
| rgd_graph_revision_gc_deleted_total | Counter | Total number of GraphRevision objects deleted by RGD garbage collection | ALPHA |
| rgd_graph_revision_gc_errors_total | Counter | Total number of GraphRevision garbage collection errors | ALPHA |
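Because rgd_state_transitions_total carries from and to labels in addition to name, summing across RGDs shows which transitions dominate cluster-wide. The PromQL equivalent is sum by (from, to) over the counter's increase.

The following Python sketch assumes each sample is a (labels, value) pair. The state values Active and Inactive and the RGD names in the example are placeholders, not an authoritative list of the states kro reports.

```python
from collections import defaultdict


def transitions_by_edge(samples):
    """Sum rgd_state_transitions_total across RGDs, keyed by (from, to).

    samples: iterable of (labels, value) pairs whose labels include
    "name", "from" and "to".
    """
    totals = defaultdict(float)
    for labels, value in samples:
        totals[(labels["from"], labels["to"])] += value
    return dict(totals)


if __name__ == "__main__":
    # Placeholder samples; real state values come from the metrics endpoint.
    samples = [
        ({"name": "webapp", "from": "Inactive", "to": "Active"}, 2.0),
        ({"name": "database", "from": "Inactive", "to": "Active"}, 1.0),
        ({"name": "database", "from": "Active", "to": "Inactive"}, 4.0),
    ]
    for (src, dst), count in sorted(transitions_by_edge(samples).items()):
        print(f"{src} -> {dst}: {count:g}")
```

Grouping by name instead highlights individual RGDs that keep changing state.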

Instance Controller Metrics

| Metric | Type | Description | Stability |
| --- | --- | --- | --- |
| instance_state_transitions_total | Counter | Total number of instance state transitions per GVR | ALPHA |
| instance_reconcile_duration_seconds | Histogram | Duration of instance reconciliation in seconds per GVR | ALPHA |
| instance_reconcile_total | Counter | Total number of instance reconciliations per GVR | ALPHA |
| instance_reconcile_errors_total | Counter | Total number of instance reconciliation errors per GVR | ALPHA |
| instance_graph_resolution_success_total | Counter | Total number of successful graph resolutions during instance reconciliation | ALPHA |
| instance_graph_resolution_failures_total | Counter | Total number of graph resolution failures during instance reconciliation | ALPHA |
| instance_graph_resolution_pending_total | Counter | Total number of graph resolutions deferred due to pending revision | ALPHA |
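A per-GVR error ratio (errors divided by reconciliations over the same window) is usually easier to alert on than raw error counts, because it normalizes for traffic.

The following Python sketch assumes the inputs are increases of instance_reconcile_total and instance_reconcile_errors_total, keyed by GVR. The GVR keys and the 5% default threshold are illustrative choices, not kro recommendations.

```python
def error_ratios(reconciles, errors):
    """Per-GVR reconcile error ratio from counter increases over a window.

    reconciles/errors map a GVR key to the increase of
    instance_reconcile_total / instance_reconcile_errors_total.
    GVRs with no reconciliations in the window are omitted.
    """
    return {
        gvr: errors.get(gvr, 0.0) / count
        for gvr, count in reconciles.items()
        if count > 0
    }


def noisy_gvrs(ratios, threshold=0.05):
    """GVRs whose error ratio exceeds threshold, worst first."""
    flagged = [gvr for gvr, ratio in ratios.items() if ratio > threshold]
    return sorted(flagged, key=lambda gvr: ratios[gvr], reverse=True)


if __name__ == "__main__":
    # Hypothetical increases over one window.
    ratios = error_ratios(
        {"example.com/v1/webapps": 200.0, "example.com/v1/databases": 50.0},
        {"example.com/v1/webapps": 2.0, "example.com/v1/databases": 10.0},
    )
    print(ratios, noisy_gvrs(ratios))
```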

GraphRevision Controller Metrics

| Metric | Type | Description | Stability |
| --- | --- | --- | --- |
| graph_revision_compile_total | Counter | Total number of GraphRevision compile attempts by result | ALPHA |
| graph_revision_compile_duration_seconds | Histogram | Duration of GraphRevision compile attempts in seconds | ALPHA |
| graph_revision_status_update_errors_total | Counter | Total number of GraphRevision status update failures | ALPHA |
| graph_revision_activation_deferred_total | Counter | Total number of times GraphRevision activation was deferred until status persistence succeeds | ALPHA |
| graph_revision_finalizer_evictions_total | Counter | Total number of registry evictions triggered by GraphRevision finalizer cleanup | ALPHA |

Revision Registry Metrics

| Metric | Type | Description | Stability |
| --- | --- | --- | --- |
| graph_revision_registry_entries | Gauge | Current number of GraphRevision entries in the in-memory registry by state | ALPHA |
| graph_revision_registry_transitions_total | Counter | Total number of GraphRevision registry state transitions | ALPHA |
| graph_revision_registry_evictions_total | Counter | Total number of GraphRevision registry evictions | ALPHA |

REST Client Metrics

These metrics are registered through kro's centralized pkg/metrics.Register(...) path. They fill gaps left by controller-runtime v0.16+, which stopped registering client-go's latency and size histograms and its retry counter.

| Metric | Type | Labels | Description | Stability |
| --- | --- | --- | --- | --- |
| rest_client_request_duration_seconds | Histogram | verb | Request latency in seconds | ALPHA |
| rest_client_rate_limiter_duration_seconds | Histogram | verb | Client-side rate limiter latency in seconds | ALPHA |
| rest_client_request_size_bytes | Histogram | verb | Request payload size in bytes | ALPHA |
| rest_client_response_size_bytes | Histogram | verb | Response payload size in bytes | ALPHA |
| rest_client_request_retries_total | Counter | code, method | Total number of request retries | ALPHA |

Controller Runtime Metrics

The RGD reconciler uses controller-runtime and exposes its standard metrics:

| Metric | Type | Description | Stability |
| --- | --- | --- | --- |
| controller_runtime_reconcile_total | Counter | Total number of reconciliations per controller | STABLE |
| controller_runtime_reconcile_time_seconds | Histogram | Length of time per reconciliation per controller | STABLE |
| controller_runtime_reconcile_errors_total | Counter | Total number of reconciliation errors per controller | STABLE |
| controller_runtime_terminal_reconcile_errors_total | Counter | Total number of terminal reconciliation errors per controller | STABLE |
| controller_runtime_reconcile_panics_total | Counter | Total number of reconciliation panics per controller | STABLE |
| controller_runtime_max_concurrent_reconciles | Gauge | Maximum number of concurrent reconciles per controller | STABLE |
| controller_runtime_active_workers | Gauge | Number of currently used workers per controller | STABLE |
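Dividing controller_runtime_active_workers by controller_runtime_max_concurrent_reconciles gives per-controller worker utilization. The following Python sketch assumes both gauges are keyed by their controller label value; the controller name in the example is hypothetical.

Utilization that stays near 1.0 while workqueue_depth grows suggests the controller is limited by its worker count.

```python
def worker_utilization(active_workers, max_concurrent):
    """Fraction of reconcile workers in use per controller.

    Both arguments map a controller name to the current gauge value;
    controllers with a zero limit, or absent from max_concurrent, are skipped.
    """
    return {
        name: active_workers.get(name, 0.0) / limit
        for name, limit in max_concurrent.items()
        if limit > 0
    }


if __name__ == "__main__":
    # Hypothetical gauge readings for a single controller.
    print(worker_utilization({"example-controller": 3.0}, {"example-controller": 4.0}))
```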

Workqueue Metrics

| Metric | Type | Description | Stability |
| --- | --- | --- | --- |
| workqueue_adds_total | Counter | Total number of adds handled by workqueue | STABLE |
| workqueue_depth | Gauge | Current depth of workqueue | STABLE |
| workqueue_queue_duration_seconds | Histogram | How long in seconds an item stays in workqueue before being requested | STABLE |
| workqueue_work_duration_seconds | Histogram | How long in seconds processing an item from workqueue takes | STABLE |
| workqueue_retries_total | Counter | Total number of retries handled by workqueue | STABLE |
| workqueue_longest_running_processor_seconds | Gauge | How many seconds has the longest running processor for workqueue been running | STABLE |
| workqueue_unfinished_work_seconds | Gauge | How many seconds of work has been done that is in progress and hasn't been observed by work_duration | STABLE |
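workqueue_longest_running_processor_seconds is a direct signal that a reconcile may have stopped making progress. The following Python sketch flags queues whose longest-running processor exceeds a threshold, assuming the gauge values are keyed by queue name. The 300-second default is an arbitrary example; tune it to how long reconciles normally take.

```python
def suspected_stuck_queues(longest_running_seconds, threshold_seconds=300.0):
    """Queue names whose longest-running processor exceeds the threshold.

    longest_running_seconds maps a queue name to the current value of
    workqueue_longest_running_processor_seconds. Results are ordered
    from the longest-running queue to the shortest.
    """
    flagged = [
        name
        for name, seconds in longest_running_seconds.items()
        if seconds > threshold_seconds
    ]
    return sorted(flagged, key=lambda name: longest_running_seconds[name], reverse=True)


if __name__ == "__main__":
    # Hypothetical gauge readings for three queues.
    readings = {"queue-a": 12.0, "queue-b": 930.0, "queue-c": 310.0}
    print(suspected_stuck_queues(readings))
```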

Brought to you with ♥ by SIG Cloud Provider