Advanced Self-Service Platform & Infrastructure Engineering. MLOps, AI/LLM systems, Kubernetes, cloud-native architectures, and scalable infrastructure automation.
Production-grade Kubernetes platform for cloud/edge robotics engineering. Features KubeEdge for edge computing, ROS2 with multi-domain DDS networking, and ArgoCD-driven GitOps deployment. Uses the eBPF-based Cilium CNI to support the UDP multicast discovery that ROS2 FastDDS relies on. Includes CLI tooling for dynamic edge node management, automated ROS2 deployment, observability with real-time resource monitoring, and GitOps-first provisioning for reproducible infrastructure.
Physics-based framework applying classical mechanics principles (Hamiltonian and Lagrangian formulations) to software project modeling. Enables predictive analysis of project trajectories through computational simulation, and quantifies team momentum and project friction as measurable, optimizable values. Provides data-driven insights for project management and organizational scaling decisions. Bridges theoretical physics and practical project engineering with rigorous mathematical foundations.
Kopf-based Kubernetes operator for automated drift detection and state management of IPv4 DNS A records, modeled as a finite state machine. Enables reproducible DNS infrastructure with complete audit trails and GitOps workflows. Features Grafana Cloud monitoring with SLO tracking, CI/CD pipelines automated with GitHub Actions, and bare-metal MicroK8s cluster automation. Implements RBAC security models and production-grade reliability patterns for DNS operations at scale.
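The finite state machine idea behind the drift detection can be illustrated with a minimal sketch. This is not the operator's actual code: the state names, event names, and the helpers `detect_drift`, `next_state`, and `run_events` are all hypothetical, chosen only to show how a record's lifecycle and audit trail might be modeled. It uses only the Python standard library and does not touch Kopf or any DNS API.

```python
from enum import Enum


class RecordState(Enum):
    # Hypothetical lifecycle states for one managed A record.
    IN_SYNC = "in_sync"
    DRIFTED = "drifted"
    RECONCILING = "reconciling"
    FAILED = "failed"


# Allowed transitions: (current state, event) -> next state.
# Anything not listed here is treated as an illegal transition.
TRANSITIONS = {
    (RecordState.IN_SYNC, "drift_detected"): RecordState.DRIFTED,
    (RecordState.DRIFTED, "reconcile_started"): RecordState.RECONCILING,
    (RecordState.RECONCILING, "reconcile_succeeded"): RecordState.IN_SYNC,
    (RecordState.RECONCILING, "reconcile_failed"): RecordState.FAILED,
    (RecordState.FAILED, "reconcile_started"): RecordState.RECONCILING,
}


def detect_drift(desired_ips, observed_ips):
    """Return True when the observed A records differ from the desired set."""
    return set(desired_ips) != set(observed_ips)


def next_state(state, event):
    """Look up the next state, rejecting transitions the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.name} on {event!r}") from None


def run_events(start, events):
    """Apply events in order and return an audit trail of (event, new state)."""
    trail = []
    state = start
    for event in events:
        state = next_state(state, event)
        trail.append((event, state))
    return trail


if __name__ == "__main__":
    # Addresses are from the RFC 5737 documentation range.
    if detect_drift(["192.0.2.10"], ["192.0.2.99"]):
        events = ["drift_detected", "reconcile_started", "reconcile_succeeded"]
        for event, state in run_events(RecordState.IN_SYNC, events):
            print(f"{event} -> {state.value}")
```

Keeping the transitions in a single table means every state change passes through one function, which is one simple way to get the audit trail the description mentions: the list returned by `run_events` records each event alongside the state it produced.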
Fully agentic, A2A (Agent-to-Agent)-compliant web crawler built with Python, Flask, and modular async crawling logic. Exposes CLI and HTTP+JSON/JSON-RPC interfaces for flexible integration. Features an Analyzer Agent (LLM-driven failure analysis with Claude/GPT integration) and a GitHub Code Analysis Agent that integrates with the GitHub API. Demonstrates agentic orchestration patterns with multi-agent collaboration, state persistence, and error handling across distributed crawling operations.
MLOps-driven agentic platform orchestrating distributed AI agents for end-to-end web state capture, LLM-powered change detection, and RAG-enabled Playwright test generation. Achieves reproducible automated testing for complex web applications through agent coordination. Built on infrastructure integrating PostgreSQL, MinIO object storage, the Qdrant vector database, Redis for memory management, and Apache Kafka for scalable MLOps workflows. Implements persistent artifact storage, semantic search, and real-time agent orchestration over Kafka. Features GPU-accelerated LLM deployments with KV cache optimization for higher throughput, and multi-agent state management across distributed infrastructure.
GPU acceleration with CUDA and MIG (Multi-Instance GPU) configurations. LLM/SLM deployments (Hugging Face, TensorFlow, PyTorch, LM Studio) with KV cache optimization for inference performance. RAG (Retrieval-Augmented Generation) systems with vector databases (Qdrant, pgvector) and semantic search. AI agents and agentic orchestration with A2A (Agent-to-Agent) patterns. Production experience managing 500+ GPU nodes (8x H100s per node) with inference optimization, quantization, and batching strategies. SageMaker, Conversational AI, and pre-trained model containerization.
AWS (EKS, EC2, Lambda, ECS Fargate, Aurora, RDS, S3, Route 53, ECR, Auto Scaling, CloudWatch, Organizations automation), Azure (AKS, Database Services, Network Watcher, Log Analytics, Autoscaling), GCP (GKE, Cloud Run, Cloud Build, Autoscaling). Kubernetes cluster design, scaling, and operations with Helm, Kustomize, ArgoCD, Argo Workflows, KubeEdge, multi-tenancy patterns, namespace isolation, and cost optimization. Kubernetes operators (Kopf-based), custom resource definitions, and RBAC security models.
Everything-as-Code culture with Terraform, Ansible, GitHub Actions, and CI/CD pipelines following Twelve-Factor App principles. GitOps workflows enabling reproducible, auditable cluster provisioning with full traceability. Flux CD and ArgoCD for declarative infrastructure. AWS Organizations automation and multi-account governance. Velero for backup/restore and disaster recovery. Reduced MTTR by 98.96% through automation. VPC/subnet architecture, load balancing strategies, VPN/WireGuard configurations, and network segmentation across cloud providers.
PostgreSQL (with pgvector for semantic search), MySQL, and persistence layer optimization. Redis for KV optimization and memory efficiency. Apache Kafka for real-time agent communication and event streaming. RabbitMQ for message queuing. MinIO object store for artifact storage. S3, Kinesis, Firehose, and BigQuery for data pipelines. Qdrant and Pinecone as vector databases. Redshift for analytics and data warehousing. Parquet for columnar data storage. High-volume workload management exceeding 10 petabytes of traffic.
Datadog, Prometheus, Grafana, and the ELK Stack for observability. CloudWatch for AWS monitoring. Distributed tracing, SLO/SLA monitoring, log aggregation, and alerting hierarchies. Lacework and GuardDuty for cloud security. OWASP ZAP for dynamic testing. DevSecOps practices and automated security scanning. Keycloak SSO with MFA (AWS IAM, Azure AD, Okta via OIDC/SAML). IAM/RBAC and Just-In-Time Access. ISO 27001, SOC 2 Type II, and NIST SP 800-53 readiness. CloudTrail auditing, WAF deployment, and WireGuard VPN.
Python (Django, Flask) for backend systems. Bash and Ansible for automation. Git for version control. Docker and container orchestration (DockerHub, ECR registries). Selenium and Appium for test automation. JMeter for performance testing. ROS2 and FastDDS for robotics. eBPF and Cilium CNI for advanced networking. TCP/IP, HTTP, and MQTT protocols. Wireshark for network analysis. Advanced understanding of the OSI model, BGP routing, and L2/L3 networking for inter-cluster communication.
Aziz Kurbanov is a principal platform architect specializing in production-quality platform engineering, MLOps systems, AI infrastructure, and enterprise cloud-native solutions.
Architect of agentic AI systems, distributed robotics infrastructure, and self-service platform ecosystems. Led teams across security, infrastructure automation, observability, and quality engineering. Expertise spans advanced infrastructure patterns, GPU-accelerated LLM deployments, Kubernetes orchestration, GitOps automation, and building high-velocity engineering organizations.