Site Reliability Engineer - Web and Cloud Technologies (Long Term, Remote Possible)

Job Description:

Key Responsibilities and Duties

  • Follow and implement SRE best practices and troubleshoot application and infrastructure issues.
  • Improve the availability, performance, and stability of applications and platforms.
  • Build end-to-end monitoring by working closely with application and infrastructure partners.
  • Establish SLIs, SLOs, error budgets, and other SRE metrics to improve reliability.
  • Maintain an effective knowledge base and runbooks to speed up resolution of production issues.
  • Apply an automation-first mindset to identify gaps and manual processes and automate toil.
  • Communicate clearly with stakeholders, both in writing and verbally.
  • Continually update personal technical and business knowledge and skills, and mentor others to raise the knowledge and skills of the team.
  • Bring a customer-first mindset to resolving issues and building new products.

Required Skills:

  • 3+ years of hands-on experience in an application and technical support role in a live production environment, following Development, DevOps, and SRE best practices

Preferred Skills:

  • Bachelor's degree in computer science or equivalent combination of education and experience.
  • Financial Services experience
  • 4+ years of hands-on experience configuring and monitoring with tools such as Splunk, Dynatrace, ELK, ServiceNow, JavaScript frameworks, etc.
  • Strong automation / coding skills (preferably Python, Java, JavaScript, React)
  • Experience using machine learning features to improve and innovate operational processes
