Senior Infrastructure Site Reliability Engineer

Crisis Text Line

Software Engineering, Other Engineering
Remote
Posted on May 8, 2024

Overview:

Crisis Text Line provides free, 24/7, high-quality text-based mental health support and crisis intervention by empowering a community of trained volunteers to support people in their moments of need.

Our mission is at the intersection of empathy and innovation — we promote mental well-being for people wherever they are.

Our vision is an empathetic world where nobody feels alone.

Our core values are at the heart of all we do: connect with empathy, center equity, get it done together, and reflect and evolve.

Why you should join our team:

Our work is transforming the way people in pain access support at their fingertips

Our work is innovative in the crisis response space

Our dynamic, fun, and diverse culture

Our meaningful cause, led by empathy and innovation

Our strong values at the center of all we do

Our commitment to diversity, equity and inclusion

Our commitment to engagement and belonging

Our commitment to our employees and their holistic wellbeing

Our value of work/life balance

Our growth mindset and prioritization of professional development

Our leaders who truly care

What you'll be doing:

At Crisis Text Line, the engineering, product, and design teams are commonly referred to as Build. The vision of the team is to:

  • Deliver the most trusted, innovative, and easy-to-use Crisis Care Platform in the industry and drive unprecedented levels of growth for people in need worldwide.
  • Ensure that every user feels a sense of community on our platform, allowing us to build trust and grow our impact.
  • Allow our volunteers to spend their time supporting people in need, in an environment with few constraints and minimal time spent searching for supporting information or resources.
  • Provide a services/API-first architecture that is based on federated sources of data and infuses predictive insights (ML and otherwise) into every aspect of our Platform and Experience.

Role:
As a Site Reliability Engineer (SRE) at Crisis Text Line, you will be responsible for designing, implementing, and maintaining our cloud infrastructure to ensure optimal performance, availability, and security. You will work closely with our engineering and operations teams to streamline our deployment processes, enhance our monitoring and alerting systems, and drive continuous improvements to our platform reliability. This role offers an exciting opportunity to leverage your expertise in AWS Fargate, CloudWatch alerting, and monitoring to support our mission-critical applications and services.

Responsibilities:

  • Lead the design and maintenance of highly available, scalable, and secure infrastructure on AWS Fargate.
  • Design and maintain CloudWatch alerting and monitoring configurations to proactively identify and resolve potential issues.
  • Mentor and guide junior team members, sharing best practices and promoting a culture of excellence.
  • Collaborate with cross-functional teams to define and implement best practices for infrastructure as code (IaC), continuous integration/continuous deployment (CI/CD), and site reliability engineering (SRE) methodologies.
  • Lead incident response and resolution, including troubleshooting complex system issues and implementing preventive measures to minimize downtime.
  • Automate repetitive tasks and processes to improve operational efficiency and reduce manual intervention.
  • Conduct performance tuning and optimization of infrastructure components to ensure optimal resource utilization and cost efficiency.
  • Stay up-to-date with emerging technologies and industry trends to drive innovation and continuous improvement.

Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or related field (Master's degree preferred) or equivalent experience.
  • Experience in site reliability engineering (SRE) or related roles, with a focus on cloud infrastructure management.
  • Hands-on experience with AWS services, particularly AWS Fargate, CloudWatch, and related tools.
  • Proficiency with infrastructure as code (IaC) tools such as Terraform or CloudFormation.
  • Strong scripting and automation skills using languages such as Python, Bash, or PowerShell.
  • Experience with container orchestration platforms such as Kubernetes or Amazon ECS.
  • Solid understanding of networking concepts, security best practices, and DevOps principles.
  • Strong problem-solving skills and the ability to work effectively in a fast-paced, collaborative environment.
  • AWS certifications (e.g., AWS Certified Solutions Architect, AWS Certified DevOps Engineer) are a plus.

Reliable High-Speed Internet Required: Must have a stable high-speed internet connection to support seamless remote collaboration, virtual meetings, online job tasks, etc.

The full salary range for this position, across all United States geographies, is $107,000-$162,000 per year. The upper portion of the salary range is typically reserved for existing employees who demonstrate strong performance over time. Starting salary will vary by location, qualifications, and prior experience; during the interview process, candidates will learn the starting salary range applicable for their location. We pay competitively in the tech-forward nonprofit space and offer a robust benefits package.

Only candidates in the following states will be eligible for employment: CA, CO, CT, FL, GA, HI, IL, IN, IA, MD, MA, MI, MN, MO, NJ, NM, NY, NC, OH, PA, TN, TX, UT, VA, WA.

Benefits:

Crisis Text Line employee benefits are thoughtfully designed using an equity lens, acknowledging that we are all unique human beings with individual life circumstances that require flexibility and support.

Benefits include:

  • 20 paid holidays including:
    • Federal holidays like Juneteenth and Labor Day
    • Election Day
    • Holiday break from December 24 through January 1
    • 2 renewal days
    • 2 floating holidays
  • Flexible paid time off, including:
    • 15 vacation days
    • 3 personal days
    • 7 sick days
  • Medical, dental, and vision benefits for the staff member and family at no cost to the employee
  • 403(b) retirement plan (the nonprofit equivalent of a 401(k)): Crisis Text Line makes a 3% contribution, regardless of your personal contribution, to support building financial wellness
  • 12 weeks paid parental leave (after 6 months of employment)
  • Student loan repayment (after 2 years of continuous full-time service)
  • Family support through a virtual childcare platform
  • Stipends/Allowances
    • Mental health (Monthly)
    • Internet Service (Monthly)
    • Professional Development (Annual)
    • Wellness (Annual)
    • Home office setup (One time/First year)

(These benefits apply only to US-based employees; international benefits may differ.)

Crisis Text Line is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. We provide reasonable accommodation to individuals who have a disability and meet the skill, experience, education, and other job-related requirements of the role to allow the individual to perform the essential functions of the job.