Senior Site Reliability Engineer

Adalat AI

Software Engineering
Posted on Jan 23, 2026

Role Overview

We're hiring a Senior Site Reliability Engineer to own and scale the infrastructure behind our courtroom transcription platform. This is not a routine ops role: you'll work on high-availability Kubernetes clusters, manage complex deployments with ArgoCD, and ensure reliability for a system processing sensitive, real-time data. You'll collaborate with a small team of elite builders and be the go-to expert for keeping our platform robust, secure, and fast.

Key Responsibilities

  • Deploy, manage, and optimize Kubernetes clusters in production environments.
  • Operate and maintain ArgoCD for GitOps-based deployments.
  • Troubleshoot and resolve performance, reliability, and scaling issues across our clusters.
  • Build and maintain observability (metrics, logging, alerting) to catch and resolve issues proactively.
  • Collaborate with backend and product teams to ensure smooth, reliable releases.
  • Define and enforce infrastructure best practices, focusing on security, scalability, and resilience.

Qualifications

  • 10+ years of experience in production infrastructure, reliability, or DevOps roles.
  • Proven experience deploying and managing Kubernetes clusters at scale.
  • Experience maintaining CI/CD with GitHub Actions.
  • Hands-on expertise with ArgoCD (setup, tuning, troubleshooting).
  • Solid foundation in Linux systems, networking, and container internals.
  • Experience with monitoring/alerting stacks (Prometheus, Grafana, Loki, etc.).
  • Comfortable diving into complex problems and quickly stabilizing systems.

Bonus:

  • Experience with GCP.
  • Contributions to open-source infrastructure or reliability tooling.