Architecture
Kubently follows a modular, black-box architecture where each component exposes only its public interface while hiding implementation details. This design enables independent development, testing, and replacement of components without affecting the overall system. The system is LLM-agnostic, supporting multiple providers through the cnoe_agent_utils LLMFactory interface (currently including Google, Anthropic, and OpenAI).
System Overview
High-Level Architecture
Design Principles
- Black Box Modules: Each module exposes only its interface, hiding implementation
- Primitive-First: Everything flows through three core primitives: Command, Session, and Result (see the sketch after this list)
- Single Responsibility: Each module has one clear job that one person can maintain
- Interface Stability: APIs remain stable even if implementations change completely
- Replaceable Components: Any module can be rewritten using only its public API
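As a concrete illustration of the primitive-first principle, the three primitives can be pictured as plain data records. The field names below are illustrative assumptions, not the actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Command:
    command_id: str
    cluster_id: str
    args: list[str]            # kubectl arguments, e.g. ["get", "pods", "-n", "default"]
    session_id: Optional[str] = None

@dataclass
class Session:
    session_id: str
    cluster_id: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    ttl_seconds: int = 300     # sessions expire automatically after inactivity

@dataclass
class Result:
    command_id: str
    exit_code: int
    output: str
    execution_time_ms: int
```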
Core Components
Kubently API (Horizontally Scalable)
The API service orchestrates debugging sessions and command execution across multiple pods.
Key Features:
- Horizontal Scaling: Multiple pods with Redis pub/sub distribution
- SSE Endpoint: Real-time executor streaming via Server-Sent Events
- A2A Support: Full A2A protocol implementation with tool call interception
- LLM Integration: Multiple LLM providers supported through LLMFactory
- Todo Management: Built-in todo tool for systematic troubleshooting
- Stateless Design: All state lives in Redis, so any API pod can serve any request
Endpoints:
- `GET /executor/stream` - SSE connection for executors
- `POST /debug/execute` - Execute kubectl commands
- `POST /debug/session` - Create a debugging session
- `POST /executor/results` - Receive command results
- `GET /health` - Health check
- `/a2a/*` - A2A protocol endpoints (mounted on the main port)
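A minimal client interaction might look like the following sketch. The deployment URL and the request/response field names are assumptions inferred from the endpoint list above, not the exact wire format:

```python
import requests

API_URL = "https://kubently.example.com"     # hypothetical deployment URL
HEADERS = {"X-API-Key": "your-api-key"}      # API authentication header

# Create a debugging session for a target cluster.
session = requests.post(
    f"{API_URL}/debug/session",
    headers=HEADERS,
    json={"cluster_id": "prod-cluster"},
).json()

# Execute a kubectl command within that session.
result = requests.post(
    f"{API_URL}/debug/execute",
    headers=HEADERS,
    json={
        "session_id": session["session_id"],
        "cluster_id": "prod-cluster",
        "args": ["get", "pods", "-n", "default"],
    },
).json()
print(result)
```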
Performance:
- ~50ms command delivery via SSE
- Supports 1000+ commands/sec
- No fixed limit on API pod replicas
- Real-time streaming with tool call visibility
Kubently Executor (Per-Cluster)
SSE-connected component deployed in each target cluster for instant command execution.
Key Features:
- SSE Client: Instant command reception (no polling)
- Dynamic Whitelist: Configurable security modes
- Auto-reconnection: Resilient connection handling
- Token Authentication: Secure cluster identification
Security Modes (Configurable via Helm):
- `readOnly`: Safe read operations only (default)
- `extendedReadOnly`: Adds auth/certificate operations
- `fullAccess`: All operations (requires explicit acknowledgment)
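Conceptually, each mode maps to a set of permitted kubectl verbs that incoming commands are checked against. The verb sets in this sketch are illustrative; the shipped defaults are defined in the Helm chart:

```python
# Illustrative whitelist check; actual verb sets come from Helm values.
WHITELISTS = {
    "readOnly": {"get", "list", "describe", "logs", "top", "events"},
    "extendedReadOnly": {"get", "list", "describe", "logs", "top", "events",
                         "auth", "certificate"},
    "fullAccess": None,  # all verbs permitted (requires explicit acknowledgment)
}

def is_allowed(mode: str, args: list[str]) -> bool:
    """Return True if the kubectl verb (first argument) is permitted in this mode."""
    allowed = WHITELISTS[mode]
    return allowed is None or (bool(args) and args[0] in allowed)

assert is_allowed("readOnly", ["get", "pods"])
assert not is_allowed("readOnly", ["delete", "pod", "web-0"])
```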
RBAC Configuration:
- Fully customizable RBAC rules via Helm values
- Per-cluster security overrides supported
- Dynamic whitelist with runtime reloading
Performance:
- Instant command delivery via SSE
- Zero polling overhead
- Memory footprint < 128MB
- Automatic connection recovery
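A sketch of the executor's connection loop, tying together the SSE client and auto-reconnection behavior described above. This assumes a plain HTTP SSE client; the library choice and backoff parameters are illustrative, not the actual implementation:

```python
import time
import requests

API_URL = "https://kubently.example.com"   # hypothetical deployment URL
TOKEN = "per-cluster-executor-token"       # unique token per cluster

def stream_commands():
    """Connect to the SSE endpoint and reconnect with capped backoff on failure."""
    backoff = 1
    while True:
        try:
            resp = requests.get(
                f"{API_URL}/executor/stream",
                headers={"Authorization": f"Bearer {TOKEN}"},
                stream=True,
                timeout=(5, None),  # connect timeout; no read timeout on the stream
            )
            resp.raise_for_status()
            backoff = 1  # reset backoff once connected
            for line in resp.iter_lines(decode_unicode=True):
                if line and line.startswith("data:"):
                    handle_command(line[len("data:"):].strip())
        except requests.RequestException:
            time.sleep(backoff)
            backoff = min(backoff * 2, 30)  # capped exponential backoff

def handle_command(payload: str) -> None:
    print("received:", payload)  # placeholder: parse and execute via kubectl
```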
Redis (Pub/Sub + State)
Redis handles message distribution and state management for the entire system.
Usage:
- Pub/Sub Channels: Command distribution to executors
- Session State: Active debugging sessions with metadata
- Command Queues: Per-cluster command queues
- Results Cache: Command execution results and history
- Executor Status: Health and connectivity information
- TTL Management: Automatic cleanup of expired data
Channel Format:
```
executor-commands:{cluster_id}   # Commands for a specific executor
executor-results:{command_id}    # Results for a specific command
```
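A minimal sketch of how these channels might be used with the redis-py client; the payload and host name are illustrative:

```python
import redis

r = redis.Redis(host="redis", port=6379)

# API side: publish a command to the executor for a given cluster.
r.publish(
    "executor-commands:prod-cluster",
    '{"command_id": "cmd_123", "args": ["get", "pods"]}',
)

# The API pod awaiting the result subscribes to the per-command channel.
sub = r.pubsub()
sub.subscribe("executor-results:cmd_123")
for message in sub.listen():
    if message["type"] == "message":
        print("result:", message["data"])
        break
```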
Performance Characteristics:
- In-memory storage for sub-millisecond access
- Pub/Sub for real-time notifications
- TTL-based automatic cleanup
- Optional persistence for durability
Data Flow
Command Execution Flow (SSE-based)
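In outline, a single command travels through the system as follows:

1. The client calls `POST /debug/execute` with the target cluster and kubectl arguments.
2. The API publishes the command to `executor-commands:{cluster_id}` in Redis.
3. The cluster's executor, connected via `GET /executor/stream`, receives the command instantly over SSE.
4. The executor validates the command against its whitelist and executes it with kubectl.
5. The executor posts the output to `POST /executor/results`.
6. The API publishes the result to `executor-results:{command_id}`, and the pod handling the original request returns it to the client.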
Session Lifecycle
1. Session Creation
   - Client requests a new session for a cluster
   - API validates cluster availability
   - Session metadata is stored in Redis
   - Session ID is returned to the client
2. Active Session
   - Commands are queued for execution
   - The executor receives commands over SSE and executes them
   - Results are cached with a TTL
   - Session activity is tracked
3. Session Cleanup
   - Automatic expiration after inactivity
   - Manual session closure
   - Resource cleanup in Redis
   - The executor is notified of closure
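The lifecycle above maps naturally onto Redis keys with TTLs. A sketch, with illustrative key names and timeouts:

```python
import json
import uuid
import redis

r = redis.Redis(host="redis", port=6379)

# Session creation: store metadata under a TTL so idle sessions expire automatically.
session_id = f"sess_{uuid.uuid4().hex[:8]}"
r.setex(
    f"session:{session_id}",
    300,  # illustrative inactivity timeout in seconds
    json.dumps({"cluster_id": "prod-cluster", "status": "active"}),
)

# Activity tracking: refresh the TTL on every command executed in the session.
r.expire(f"session:{session_id}", 300)

# Manual closure: delete the key immediately.
r.delete(f"session:{session_id}")
```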
Security Architecture
Authentication Layers
1. API Authentication
   - Bearer token authentication (`X-API-Key` header)
   - OAuth 2.0/OIDC support
   - API key validation
   - Rate limiting per key
2. Executor Authentication
   - Unique token per cluster (`Authorization: Bearer` header)
   - TLS support with cert-manager integration
   - Automatic token generation if not provided
   - Token rotation support
3. Kubernetes RBAC
   - Minimal required permissions
   - Read-only access by default, expandable via the executor security modes
   - Namespace-scoped where possible
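A simplified sketch of how the two credential types might be told apart on an incoming request. The in-memory stores are stand-ins for the real key management (Redis, secrets, rotation):

```python
API_KEYS = {"your-api-key"}                 # illustrative in-memory stores;
EXECUTOR_TOKENS = {"prod-cluster-token"}    # real credentials live in secrets/Redis

def authenticate(headers: dict) -> str:
    """Return the caller type, or raise if neither credential is valid."""
    if headers.get("X-API-Key") in API_KEYS:
        return "client"                     # human/AI clients use X-API-Key
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer ") and auth.removeprefix("Bearer ") in EXECUTOR_TOKENS:
        return "executor"                   # per-cluster executors use bearer tokens
    raise PermissionError("missing or invalid credentials")
```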
Command Security
```yaml
# Example RBAC for the Kubently executor
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubently-agent
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes", "services", "endpoints", "events"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets", "daemonsets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses", "networkpolicies"]
    verbs: ["get", "list", "watch"]
```
Network Security
- API Service: Exposed via LoadBalancer or Ingress
- Executors: Outbound connections only
- Redis: Internal cluster communication only
- Optional: Network policies for additional isolation
Performance Characteristics
Latency Targets
| Operation | Target | Typical |
|---|---|---|
| Session Creation | < 100ms | ~50ms |
| Command Queuing | < 50ms | ~20ms |
| Command Execution | < 500ms | ~200-300ms |
| Result Retrieval | < 50ms | ~10-20ms |
Throughput Targets
| Metric | Target | Tested |
|---|---|---|
| Concurrent Sessions | 100+ | 150+ |
| Commands/Second | 100+ | 200+ |
| API Requests/Second | 1000+ | 1500+ |
Resource Usage
| Component | Memory | CPU |
|---|---|---|
| API Service | 200-500MB | 0.5-1.0 cores |
| Executor | 50-100MB | 0.1-0.3 cores |
| Redis | 100-500MB | 0.2-0.5 cores |
Scalability
Horizontal Scaling
API Service:
- Stateless design enables easy horizontal scaling
- Load balancer distributes requests
- Session affinity not required
Executors:
- One executor per cluster (not horizontally scaled)
- Executor restarts handled gracefully
- No shared state between executors
Redis:
- Redis Cluster for horizontal scaling
- Redis Sentinel for high availability
- Read replicas for read-heavy workloads
Vertical Scaling
Memory Scaling:
- Redis memory scales with active sessions
- API memory scales with concurrent requests
- Executor memory remains constant
CPU Scaling:
- API CPU scales with request rate
- Executor CPU scales with command complexity
- Redis CPU scales with data operations
High Availability
API Service HA
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubently-api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    spec:
      containers:
        - name: api
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 30
```
Redis HA
```yaml
# Redis Sentinel configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-sentinel
data:
  sentinel.conf: |
    port 26379
    dir /data
    sentinel monitor mymaster redis-master 6379 2
    sentinel auth-pass mymaster your-redis-password
    sentinel down-after-milliseconds mymaster 5000
    sentinel parallel-syncs mymaster 1
    sentinel failover-timeout mymaster 10000
```
Executor HA
- Executors automatically reconnect on failure
- Command queues preserved during executor restarts
- Health monitoring with automatic recovery
Monitoring and Observability
Metrics
API Metrics:
- Request rate and latency
- Session creation/closure rates
- Command execution times
- Error rates by endpoint
Executor Metrics:
- Command execution success/failure rates
- Queue depth and processing time
- Connection health to API
- Resource utilization
Redis Metrics:
- Memory usage and hit rates
- Connection counts
- Command execution times
- Pub/Sub message rates
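As one possible implementation, these counters and timings could be exposed with a Prometheus-style client. The metric names and labels here are assumptions, not an existing contract:

```python
import time
from prometheus_client import Counter, Histogram

REQUESTS = Counter("kubently_api_requests_total",
                   "API requests by endpoint and status",
                   ["endpoint", "status"])
EXEC_TIME = Histogram("kubently_command_execution_seconds",
                      "Command execution time per cluster",
                      ["cluster_id"])

# Record one request and time one (simulated) command execution.
REQUESTS.labels(endpoint="/debug/execute", status="200").inc()
with EXEC_TIME.labels(cluster_id="prod-cluster").time():
    time.sleep(0.05)  # stand-in for actual kubectl execution
```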
Logging
Structured Logging:
```json
{
  "timestamp": "2024-01-20T10:30:45Z",
  "level": "INFO",
  "service": "kubently-api",
  "session_id": "sess_abc123",
  "cluster_id": "prod-cluster",
  "command": "get pods",
  "execution_time_ms": 234,
  "correlation_id": "trace-xyz789"
}
```
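One way to emit log lines in this shape is with Python's standard logging module; the formatter and field handling below are illustrative:

```python
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, matching the shape shown above."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "service": "kubently-api",
            **getattr(record, "context", {}),   # session_id, cluster_id, etc.
            "message": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("kubently")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("command executed", extra={"context": {
    "session_id": "sess_abc123",
    "cluster_id": "prod-cluster",
    "command": "get pods",
    "execution_time_ms": 234,
    "correlation_id": "trace-xyz789",
}})
```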
Distributed Tracing
- OpenTelemetry integration
- Correlation IDs across services
- Request flow visualization
- Performance bottleneck identification
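With the OpenTelemetry Python API, a traced command execution might look like the following sketch; the span and attribute names are assumptions:

```python
from opentelemetry import trace

tracer = trace.get_tracer("kubently-api")

def execute_with_tracing(cluster_id: str, args: list[str]) -> None:
    # Wrap command execution in a span so it appears in the request flow.
    with tracer.start_as_current_span("debug.execute") as span:
        span.set_attribute("kubently.cluster_id", cluster_id)
        span.set_attribute("kubently.command", " ".join(args))
        # ... publish to Redis and await the result; the correlation ID
        # travels in the trace context so logs and spans can be joined later.
```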
Future Architecture Enhancements
Advanced Caching
- Command result caching
- Cluster state caching
- CDN integration for static resources
Plugin Architecture
- Custom command handlers
- Third-party integrations
- Extensible security policies