We are seeking a highly skilled and motivated Site Reliability Engineer to join our team at Kalibrr. As a Site Reliability Engineer, you will play a critical role in ensuring the availability, scalability, and reliability of our services. You will be responsible for the design, deployment, operation, and refinement of our systems, with a focus on automation and continuous improvement.
Responsibilities:
Engage in the entire lifecycle of our services, including design, deployment, operation, and refinement.
Practice incident response and participate in blameless postmortems to drive continuous improvement.
Take part in an on-call rotation to promptly address any system issues or emergencies.
Scale our systems and operations by implementing automation.
Monitor the availability, latency, and overall health of our services to ensure optimal performance.
Optimize our services and systems to improve efficiency and reliability.
Required Skills and Qualifications:
Strong experience with GCP, AWS, Helm, Kubernetes, Jenkins CI, and version control (Git).
Proficiency in administering and configuring Elasticsearch and Postgres.
Familiarity with Kafka, Linux (or other *nix systems), Docker, Bash, and Python.
Experience using open-source libraries to meet project requirements.
Knowledge of disaster response and recovery procedures.
Expertise in monitoring tools such as Prometheus and Grafana.
Ability to manage data backup processes effectively.
Your measures of success as a Site Reliability Engineer include, but are not limited to, the following: