Controller Metrics
kro exposes Prometheus metrics for monitoring controller health and performance. These metrics are available at the metrics endpoint (default :8078/metrics), configurable via Helm values.
kro-specific metrics (Dynamic Controller, Schema Resolver, Runtime, CEL Expression, ResourceGraphDefinition Controller, and REST Client) are ALPHA and may change or be removed without notice. Metric names, labels, and types may change between releases with no backwards-compatibility guarantees.
Only controller-runtime and workqueue metrics are considered STABLE.
Enabling Metrics
metrics:
  service:
    create: true
    port: 8080
  serviceMonitor:
    enabled: true # If using Prometheus Operator
    interval: 1m
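Once the endpoint is reachable, its output can be inspected without a full Prometheus setup. The following is a minimal sketch, not part of kro: `parse_metrics` and `fetch_metrics` are hypothetical helper names, and the URL assumes the metrics port has been forwarded to `localhost:8078`. The parser handles only the common `name{labels} value` line shape and does not unescape label values.

```python
import re
from urllib.request import urlopen

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: key="value" (escaped quotes tolerated).
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match is None:  # ignore lines this simple parser cannot read
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url="http://localhost:8078/metrics"):
    """Fetch and parse the endpoint. The URL is an assumption (local port-forward)."""
    with urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling `fetch_metrics()` returns a list that can be filtered by metric name, for example to read the current value of `dynamic_controller_queue_length`.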
Dynamic Controller Metrics
| Metric | Type | Description | Stability |
|---|---|---|---|
| dynamic_controller_reconcile_total | Counter | Total number of reconciliations per GVR | ALPHA |
| dynamic_controller_reconcile_duration_seconds | Histogram | Duration of reconciliations per GVR in seconds | ALPHA |
| dynamic_controller_requeue_total | Counter | Total number of requeues per GVR and requeue type | ALPHA |
| dynamic_controller_handler_errors_total | Counter | Total number of handler errors per GVR | ALPHA |
| dynamic_controller_queue_length | Gauge | Current length of the workqueue | ALPHA |
| dynamic_controller_gvr_count | Gauge | Number of instance GVRs currently managed by the controller | ALPHA |
| dynamic_controller_handler_count_total | Gauge | Number of active handlers by type (parent or child) | ALPHA |
| dynamic_controller_handler_attach_total | Counter | Total number of handler attachments by type | ALPHA |
| dynamic_controller_handler_detach_total | Counter | Total number of handler detachments by type | ALPHA |
| dynamic_controller_informer_events_total | Counter | Total number of events processed by informers per GVR and event type | ALPHA |
| dynamic_controller_informer_sync_duration_seconds | Histogram | Duration of informer cache sync per GVR in seconds | ALPHA |
Schema Resolver Metrics
| Metric | Type | Description | Stability |
|---|---|---|---|
| schema_resolver_cache_hits_total | Counter | Total number of schema resolver cache hits | ALPHA |
| schema_resolver_cache_misses_total | Counter | Total number of schema resolver cache misses | ALPHA |
| schema_resolver_cache_size | Gauge | Current number of entries in the schema resolver cache | ALPHA |
| schema_resolver_cache_evictions_total | Counter | Total number of entries evicted from the schema resolver cache | ALPHA |
| schema_resolver_api_call_duration_seconds | Histogram | Duration of API calls to fetch schemas in seconds | ALPHA |
| schema_resolver_singleflight_deduplicated_total | Counter | Total number of requests deduplicated by singleflight | ALPHA |
| schema_resolver_errors_total | Counter | Total number of schema resolution errors | ALPHA |
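The hit and miss counters are most useful when combined into a hit ratio. A minimal sketch, assuming the two values were read from `schema_resolver_cache_hits_total` and `schema_resolver_cache_misses_total`; `cache_hit_ratio` is a hypothetical helper name, not a kro API.

```python
def cache_hit_ratio(hits, misses):
    """Return hits / (hits + misses), or None when there were no lookups."""
    total = hits + misses
    if total == 0:
        return None  # no lookups yet, so the ratio is undefined
    return hits / total
```

Raw counter values give the ratio since process start. For a recent window, pass the differences between two scrapes instead of the raw values.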
Runtime Metrics
| Metric | Type | Description | Stability |
|---|---|---|---|
| runtime_node_eval_total | Counter | Total number of node evaluations during reconciliation | ALPHA |
| runtime_node_eval_duration_seconds | Histogram | Duration of node evaluations during reconciliation in seconds | ALPHA |
| runtime_node_eval_errors_total | Counter | Total number of node evaluation errors | ALPHA |
| runtime_creation_total | Counter | Total number of runtimes created | ALPHA |
| runtime_creation_duration_seconds | Histogram | Duration of runtime creation setup phase in seconds | ALPHA |
| runtime_node_ignored_check_total | Counter | Total number of node includeWhen checks | ALPHA |
| runtime_node_ignored_total | Counter | Total number of nodes skipped due to includeWhen conditions | ALPHA |
| runtime_node_ready_check_total | Counter | Total number of node readyWhen checks | ALPHA |
| runtime_node_not_ready_total | Counter | Total number of times nodes were not ready and blocked reconciliation | ALPHA |
| runtime_collection_size | Histogram | Number of items generated by forEach collection expansion | ALPHA |
CEL Expression Metrics
| Metric | Type | Description | Stability |
|---|---|---|---|
| cel_expr_eval_total | Counter | Total number of CEL expression evaluations | ALPHA |
| cel_expr_eval_duration_seconds | Histogram | Duration of CEL expression evaluations in seconds | ALPHA |
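Prometheus histograms expose `_sum` and `_count` series alongside their buckets, so an average duration over a window can be derived from two scrapes. A minimal sketch, assuming the inputs come from `cel_expr_eval_duration_seconds_sum` and `cel_expr_eval_duration_seconds_count`; `mean_duration` is a hypothetical helper name.

```python
def mean_duration(sum_before, count_before, sum_after, count_after):
    """Average observed duration (seconds) between two scrapes of a histogram.

    Returns None when no new observations were recorded, which also covers
    a counter reset between the scrapes.
    """
    observations = count_after - count_before
    if observations <= 0:
        return None
    return (sum_after - sum_before) / observations
```

The same calculation applies to any histogram listed on this page, since each one carries its own `_sum` and `_count` series.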
ResourceGraphDefinition Controller Metrics
All RGD controller metrics include a name label, set to the metadata.name of the ResourceGraphDefinition.
| Metric | Type | Description | Stability |
|---|---|---|---|
| rgd_graph_build_total | Counter | Total number of RGD graph validations during reconciliation | ALPHA |
| rgd_graph_build_duration_seconds | Histogram | Duration of RGD graph validations in seconds | ALPHA |
| rgd_graph_build_errors_total | Counter | Total number of RGD graph validation errors | ALPHA |
| rgd_state_transitions_total | Counter | Total number of RGD state transitions (additional labels: from, to) | ALPHA |
| rgd_deletions_total | Counter | Total number of RGD deletions | ALPHA |
| rgd_deletion_duration_seconds | Histogram | Duration of RGD deletions in seconds | ALPHA |
REST Client Metrics
These metrics are registered via the centralized pkg/metrics.Register(...) path. They fill gaps left by controller-runtime v0.16+, which stopped registering client-go latency, size, and retry histograms.
| Metric | Type | Labels | Description | Stability |
|---|---|---|---|---|
| rest_client_request_duration_seconds | Histogram | verb | Request latency in seconds | ALPHA |
| rest_client_rate_limiter_duration_seconds | Histogram | verb | Client-side rate limiter latency in seconds | ALPHA |
| rest_client_request_size_bytes | Histogram | verb | Request payload size in bytes | ALPHA |
| rest_client_response_size_bytes | Histogram | verb | Response payload size in bytes | ALPHA |
| rest_client_request_retries_total | Counter | code, method | Total number of request retries | ALPHA |
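Latency histograms such as `rest_client_request_duration_seconds` are usually read as quantiles. The sketch below estimates a quantile from cumulative bucket counts by linear interpolation inside the matching bucket, similar in spirit to PromQL's `histogram_quantile`. It is an approximation, not a reimplementation of PromQL; `estimate_quantile` is a hypothetical helper, and the bucket bounds in the usage note are made-up sample data, not kro's actual bucket layout.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    `buckets` mirrors the `le`-labelled `_bucket` series of one histogram,
    including the final +Inf bucket. Returns None when there is no data.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if not buckets:
        return None
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total  # number of observations at or below the quantile
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                return lower_bound  # fall back to the highest finite bound
            if count == lower_count:
                return upper_bound  # empty bucket, nothing to interpolate
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

For example, with sample buckets `[(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]`, `estimate_quantile(0.95, ...)` interpolates halfway through the 0.5 to 1.0 bucket. Accuracy depends entirely on how finely the buckets are spaced around the quantile of interest.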
Controller Runtime Metrics
The RGD reconciler uses controller-runtime and exposes its standard metrics:
| Metric | Type | Description | Stability |
|---|---|---|---|
| controller_runtime_reconcile_total | Counter | Total number of reconciliations per controller | STABLE |
| controller_runtime_reconcile_time_seconds | Histogram | Length of time per reconciliation per controller | STABLE |
| controller_runtime_reconcile_errors_total | Counter | Total number of reconciliation errors per controller | STABLE |
| controller_runtime_terminal_reconcile_errors_total | Counter | Total number of terminal reconciliation errors per controller | STABLE |
| controller_runtime_reconcile_panics_total | Counter | Total number of reconciliation panics per controller | STABLE |
| controller_runtime_max_concurrent_reconciles | Gauge | Maximum number of concurrent reconciles per controller | STABLE |
| controller_runtime_active_workers | Gauge | Number of currently used workers per controller | STABLE |
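Because counters only ever increase until the process restarts, comparing two scrapes gives a windowed view. The sketch below derives a reconcile error ratio from `controller_runtime_reconcile_errors_total` and `controller_runtime_reconcile_total`, treating a drop in value as a counter reset. `counter_increase` and `error_ratio` are hypothetical helper names, and the reset handling is a simplification of what Prometheus does.

```python
def counter_increase(previous, current):
    """Increase between two scrapes; a drop means the counter was reset."""
    if current < previous:
        return current  # process restarted, so the counter began again at zero
    return current - previous


def error_ratio(errors_prev, errors_curr, total_prev, total_curr):
    """Share of reconciliations that failed between two scrapes, or None."""
    total = counter_increase(total_prev, total_curr)
    if total == 0:
        return None  # nothing was reconciled in this window
    return counter_increase(errors_prev, errors_curr) / total
```

For instance, going from 2 to 5 errors while total reconciliations went from 100 to 160 yields 3 errors over 60 reconciliations in that window.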
Workqueue Metrics
| Metric | Type | Description | Stability |
|---|---|---|---|
| workqueue_adds_total | Counter | Total number of adds handled by workqueue | STABLE |
| workqueue_depth | Gauge | Current depth of workqueue | STABLE |
| workqueue_queue_duration_seconds | Histogram | How long in seconds an item stays in workqueue before being requested | STABLE |
| workqueue_work_duration_seconds | Histogram | How long in seconds processing an item from workqueue takes | STABLE |
| workqueue_retries_total | Counter | Total number of retries handled by workqueue | STABLE |
| workqueue_longest_running_processor_seconds | Gauge | How many seconds has the longest running processor for workqueue been running | STABLE |
| workqueue_unfinished_work_seconds | Gauge | How many seconds of work has been done that is in progress and hasn't been observed by work_duration | STABLE |