Controller Metrics
kro exposes Prometheus metrics for monitoring controller health and performance. These metrics are available at the metrics endpoint (default :8078/metrics), configurable via Helm values.
kro-specific metrics are in ALPHA and subject to change or removal without notice. Metric names, labels, and types may change between releases with no backwards-compatibility guarantees.
Only controller-runtime and workqueue metrics are considered STABLE.
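To check which metric families an endpoint actually serves, the Prometheus text exposition format can be parsed directly. The following is a minimal sketch and not part of kro: the function names, the sample payload, and the URL http://localhost:8078/metrics (which assumes the default port has been made reachable locally, for example through a port-forward) are all assumptions for illustration.

```python
import urllib.request


def parse_metric_types(exposition: str) -> dict[str, str]:
    """Map each metric family name to its type using '# TYPE <name> <type>' lines."""
    types = {}
    for line in exposition.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types


def fetch_metric_types(url: str = "http://localhost:8078/metrics") -> dict[str, str]:
    """Scrape the endpoint once and return its metric families (the URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metric_types(resp.read().decode("utf-8"))


# Illustrative payload only; real output contains many more families and labels.
SAMPLE = """\
# HELP workqueue_depth Current depth of workqueue
# TYPE workqueue_depth gauge
workqueue_depth{name="demo"} 0
# HELP cel_expr_eval_total Total number of CEL expression evaluations
# TYPE cel_expr_eval_total counter
cel_expr_eval_total 42
"""

if __name__ == "__main__":
    for name, kind in sorted(parse_metric_types(SAMPLE).items()):
        print(f"{name}: {kind}")
```

Because ALPHA metric names may change between releases, a listing like this is one way to compare what two kro versions expose before updating dashboards.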
Enabling Metrics
metrics:
  service:
    create: true
    port: 8080
  serviceMonitor:
    enabled: true # If using Prometheus Operator
    interval: 1m
Dynamic Controller Metrics
| Metric | Type | Description | Stability |
|---|---|---|---|
| dynamic_controller_reconcile_total | Counter | Total number of reconciliations per GVR | ALPHA |
| dynamic_controller_reconcile_duration_seconds | Histogram | Duration of reconciliations per GVR in seconds | ALPHA |
| dynamic_controller_requeue_total | Counter | Total number of requeues per GVR and requeue type | ALPHA |
| dynamic_controller_handler_errors_total | Counter | Total number of handler errors per GVR | ALPHA |
| dynamic_controller_queue_length | Gauge | Current length of the workqueue | ALPHA |
| dynamic_controller_gvr_count | Gauge | Number of instance GVRs currently managed by the controller | ALPHA |
| dynamic_controller_handler_count_total | Gauge | Number of active handlers by type (parent or child) | ALPHA |
| dynamic_controller_handler_attach_total | Counter | Total number of handler attachments by type | ALPHA |
| dynamic_controller_handler_detach_total | Counter | Total number of handler detachments by type | ALPHA |
| dynamic_controller_informer_events_total | Counter | Total number of events processed by informers per GVR and event type | ALPHA |
| dynamic_controller_informer_sync_duration_seconds | Histogram | Duration of informer cache sync per GVR in seconds | ALPHA |
| dynamic_controller_watch_count | Gauge | Number of active informers managed by the WatchManager | ALPHA |
| dynamic_controller_instance_watch_count | Gauge | Number of active instance watchers by parent GVR | ALPHA |
| dynamic_controller_watch_request_count | Gauge | Number of active watch requests by GVR and type (scalar/collection) | ALPHA |
| dynamic_controller_route_total | Counter | Total events routed through the coordinator by GVR | ALPHA |
| dynamic_controller_route_match_total | Counter | Total events that matched at least one instance by GVR | ALPHA |
Schema Resolver Metrics
| Metric | Type | Description | Stability |
|---|---|---|---|
| schema_resolver_cache_hits_total | Counter | Total number of schema resolver cache hits | ALPHA |
| schema_resolver_cache_misses_total | Counter | Total number of schema resolver cache misses | ALPHA |
| schema_resolver_cache_size | Gauge | Current number of entries in the schema resolver cache | ALPHA |
| schema_resolver_cache_evictions_total | Counter | Total number of entries evicted from the schema resolver cache | ALPHA |
| schema_resolver_api_call_duration_seconds | Histogram | Duration of API calls to fetch schemas in seconds | ALPHA |
| schema_resolver_singleflight_deduplicated_total | Counter | Total number of requests deduplicated by singleflight | ALPHA |
| schema_resolver_errors_total | Counter | Total number of schema resolution errors | ALPHA |
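The hit and miss counters are most useful as a ratio over a time window rather than as raw totals. As a rough sketch, assuming two scrapes of schema_resolver_cache_hits_total and schema_resolver_cache_misses_total taken some interval apart, the hit ratio for that window could be computed as follows. The function and parameter names are hypothetical.

```python
from typing import Optional


def cache_hit_ratio(
    hits_before: float,
    misses_before: float,
    hits_after: float,
    misses_after: float,
) -> Optional[float]:
    """Return hits / (hits + misses) for the window between two scrapes.

    Returns None when the window is unusable: a counter went backwards
    (for example after a controller restart) or no lookups happened.
    """
    hits = hits_after - hits_before
    misses = misses_after - misses_before
    if hits < 0 or misses < 0:
        return None  # counter reset between the two scrapes
    lookups = hits + misses
    if lookups == 0:
        return None  # no cache activity in this window
    return hits / lookups


if __name__ == "__main__":
    # Made-up sample values: 90 new hits and 10 new misses in the window.
    print(cache_hit_ratio(100, 20, 190, 30))
```

A persistently low ratio alongside a growing schema_resolver_cache_evictions_total may suggest the cache is too small for the number of schemas in use, though that interpretation depends on the workload.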
Runtime Metrics
| Metric | Type | Description | Stability |
|---|---|---|---|
| runtime_node_eval_total | Counter | Total number of node evaluations during reconciliation | ALPHA |
| runtime_node_eval_duration_seconds | Histogram | Duration of node evaluations during reconciliation in seconds | ALPHA |
| runtime_node_eval_errors_total | Counter | Total number of node evaluation errors | ALPHA |
| runtime_creation_total | Counter | Total number of runtimes created | ALPHA |
| runtime_creation_duration_seconds | Histogram | Duration of runtime creation setup phase in seconds | ALPHA |
| runtime_node_ignored_check_total | Counter | Total number of node includeWhen checks | ALPHA |
| runtime_node_ignored_total | Counter | Total number of nodes skipped due to includeWhen conditions | ALPHA |
| runtime_node_ready_check_total | Counter | Total number of node readyWhen checks | ALPHA |
| runtime_node_not_ready_total | Counter | Total number of times nodes were not ready and blocked reconciliation | ALPHA |
| runtime_collection_size | Histogram | Number of items generated by forEach collection expansion | ALPHA |
CEL Expression Metrics
| Metric | Type | Description | Stability |
|---|---|---|---|
| cel_expr_eval_total | Counter | Total number of CEL expression evaluations | ALPHA |
| cel_expr_eval_duration_seconds | Histogram | Duration of CEL expression evaluations in seconds | ALPHA |
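Prometheus histograms such as cel_expr_eval_duration_seconds are exposed as cumulative bucket counts keyed by an upper bound (the le label), so latency percentiles have to be estimated from those buckets. The sketch below illustrates one common estimation approach, linear interpolation inside the bucket that contains the target rank, which is similar in spirit to PromQL's histogram_quantile. The function name and the bucket boundaries in the example are assumptions, not kro's actual bucket layout.

```python
import math
from typing import Optional, Sequence, Tuple


def estimate_quantile(q: float, buckets: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets holds (upper_bound, cumulative_count) pairs sorted by upper_bound,
    ending with the (+Inf, total_count) bucket. Returns None if the histogram
    has no observations.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in the range (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # lowest bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Interpolate linearly between the bucket's lower and upper bounds.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Hypothetical buckets: 50 evaluations <= 1ms, 90 <= 10ms, 100 <= 100ms.
    sample = [(0.001, 50), (0.01, 90), (0.1, 100), (math.inf, 100)]
    print(estimate_quantile(0.95, sample))
```

The estimate is only as precise as the bucket boundaries allow, so a reported percentile should be read as "somewhere inside this bucket" rather than as an exact measurement.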
ResourceGraphDefinition Controller Metrics
All RGD controller metrics include a label called name, whose value is the metadata.name of the ResourceGraphDefinition.
| Metric | Type | Description | Stability |
|---|---|---|---|
| rgd_graph_build_total | Counter | Total number of RGD graph validations during reconciliation | ALPHA |
| rgd_graph_build_duration_seconds | Histogram | Duration of RGD graph validations in seconds | ALPHA |
| rgd_graph_build_errors_total | Counter | Total number of RGD graph validation errors | ALPHA |
| rgd_state_transitions_total | Counter | Total number of RGD state transitions (additional labels: from, to) | ALPHA |
| rgd_deletions_total | Counter | Total number of RGD deletions | ALPHA |
| rgd_deletion_duration_seconds | Histogram | Duration of RGD deletions in seconds | ALPHA |
| rgd_graph_revision_issue_total | Counter | Total number of GraphRevision objects issued by the RGD controller | ALPHA |
| rgd_graph_revision_wait_total | Counter | Total number of times the RGD controller waited for GraphRevision progress | ALPHA |
| rgd_graph_revision_resolution_total | Counter | Total number of GraphRevision resolution outcomes | ALPHA |
| rgd_graph_revision_registry_miss_total | Counter | Total number of times the RGD controller observed GraphRevisions ahead of the in-memory registry | ALPHA |
| rgd_graph_revision_gc_deleted_total | Counter | Total number of GraphRevision objects deleted by RGD garbage collection | ALPHA |
| rgd_graph_revision_gc_errors_total | Counter | Total number of GraphRevision garbage collection errors | ALPHA |
Instance Controller Metrics
| Metric | Type | Description | Stability |
|---|---|---|---|
| instance_state_transitions_total | Counter | Total number of instance state transitions per GVR | ALPHA |
| instance_reconcile_duration_seconds | Histogram | Duration of instance reconciliation in seconds per GVR | ALPHA |
| instance_reconcile_total | Counter | Total number of instance reconciliations per GVR | ALPHA |
| instance_reconcile_errors_total | Counter | Total number of instance reconciliation errors per GVR | ALPHA |
| instance_graph_resolution_success_total | Counter | Total number of successful graph resolutions during instance reconciliation | ALPHA |
| instance_graph_resolution_failures_total | Counter | Total number of graph resolution failures during instance reconciliation | ALPHA |
| instance_graph_resolution_pending_total | Counter | Total number of graph resolutions deferred due to pending revision | ALPHA |
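Counters such as instance_reconcile_total and instance_reconcile_errors_total restart from zero whenever the controller process restarts, so an error ratio computed over several scrapes needs to account for resets. The following sketch shows one way to do that, assuming each list holds successive samples of a single series in time order. The helper names and sample values are hypothetical.

```python
from typing import Optional, Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Total increase across ordered samples, treating any drop as a reset to zero."""
    increase = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            # The counter restarted; everything counted since the reset is new.
            increase += cur
    return increase


def error_ratio(errors: Sequence[float], total: Sequence[float]) -> Optional[float]:
    """Fraction of reconciliations that failed over the sampled window."""
    reconciles = counter_increase(total)
    if reconciles == 0:
        return None  # nothing was reconciled in this window
    return counter_increase(errors) / reconciles


if __name__ == "__main__":
    # Made-up samples with a restart between the second and third scrape.
    total_samples = [10, 15, 3, 8]
    error_samples = [2, 4, 1, 2]
    print(error_ratio(error_samples, total_samples))
```

Increments that happened between the last scrape before a restart and the restart itself are lost, so the result is a lower bound on activity in windows that span a reset.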
GraphRevision Controller Metrics
| Metric | Type | Description | Stability |
|---|---|---|---|
| graph_revision_compile_total | Counter | Total number of GraphRevision compile attempts by result | ALPHA |
| graph_revision_compile_duration_seconds | Histogram | Duration of GraphRevision compile attempts in seconds | ALPHA |
| graph_revision_status_update_errors_total | Counter | Total number of GraphRevision status update failures | ALPHA |
| graph_revision_activation_deferred_total | Counter | Total number of times GraphRevision activation was deferred until status persistence succeeds | ALPHA |
| graph_revision_finalizer_evictions_total | Counter | Total number of registry evictions triggered by GraphRevision finalizer cleanup | ALPHA |
Revision Registry Metrics
| Metric | Type | Description | Stability |
|---|---|---|---|
| graph_revision_registry_entries | Gauge | Current number of GraphRevision entries in the in-memory registry by state | ALPHA |
| graph_revision_registry_transitions_total | Counter | Total number of GraphRevision registry state transitions | ALPHA |
| graph_revision_registry_evictions_total | Counter | Total number of GraphRevision registry evictions | ALPHA |
REST Client Metrics
These metrics are registered via the centralized pkg/metrics.Register(...) path. They fill gaps left by controller-runtime v0.16+, which stopped registering the client-go latency, size, and retry metrics.
| Metric | Type | Labels | Description | Stability |
|---|---|---|---|---|
| rest_client_request_duration_seconds | Histogram | verb | Request latency in seconds | ALPHA |
| rest_client_rate_limiter_duration_seconds | Histogram | verb | Client-side rate limiter latency in seconds | ALPHA |
| rest_client_request_size_bytes | Histogram | verb | Request payload size in bytes | ALPHA |
| rest_client_response_size_bytes | Histogram | verb | Response payload size in bytes | ALPHA |
| rest_client_request_retries_total | Counter | code, method | Total number of request retries | ALPHA |
Controller Runtime Metrics
The RGD reconciler uses controller-runtime and exposes its standard metrics:
| Metric | Type | Description | Stability |
|---|---|---|---|
| controller_runtime_reconcile_total | Counter | Total number of reconciliations per controller | STABLE |
| controller_runtime_reconcile_time_seconds | Histogram | Length of time per reconciliation per controller | STABLE |
| controller_runtime_reconcile_errors_total | Counter | Total number of reconciliation errors per controller | STABLE |
| controller_runtime_terminal_reconcile_errors_total | Counter | Total number of terminal reconciliation errors per controller | STABLE |
| controller_runtime_reconcile_panics_total | Counter | Total number of reconciliation panics per controller | STABLE |
| controller_runtime_max_concurrent_reconciles | Gauge | Maximum number of concurrent reconciles per controller | STABLE |
| controller_runtime_active_workers | Gauge | Number of currently used workers per controller | STABLE |
Workqueue Metrics
| Metric | Type | Description | Stability |
|---|---|---|---|
| workqueue_adds_total | Counter | Total number of adds handled by workqueue | STABLE |
| workqueue_depth | Gauge | Current depth of workqueue | STABLE |
| workqueue_queue_duration_seconds | Histogram | How long in seconds an item stays in workqueue before being requested | STABLE |
| workqueue_work_duration_seconds | Histogram | How long in seconds processing an item from workqueue takes | STABLE |
| workqueue_retries_total | Counter | Total number of retries handled by workqueue | STABLE |
| workqueue_longest_running_processor_seconds | Gauge | How many seconds has the longest running processor for workqueue been running | STABLE |
| workqueue_unfinished_work_seconds | Gauge | How many seconds of work has been done that is in progress and hasn't been observed by work_duration | STABLE |