Site Reliability Engineer, NOC
Location: Seattle, Washington, United States

100,000 - 200,000

Job Description:

If you want to join a team of engineers that monitors and troubleshoots client infrastructure 24x7, serves as the eyes-on-glass first point of contact for all issues, and works closely with our Incident Response and Site Reliability teams, and you thrive in an intense, fast-paced, highly visible environment, then we should talk.

Responsibilities:

  • Providing Tier 1 support for application and infrastructure issues across the enterprise
  • Monitoring, triaging, and coordinating incident response when service failures, infrastructure issues, or deployment issues occur
  • Hands-on analysis and troubleshooting of production systems
  • Identifying, defining, and building improvements to support tools, processes, and the service itself
  • Improving the customer experience by delivering new service monitoring, alarms, and scripts

You own this if you have:

  • Familiarity with site and infrastructure monitoring systems (such as AWS CloudWatch and Datadog)
  • UNIX/Linux SysOps skills, including expertise in administration, monitoring, troubleshooting, performance tuning, preventive maintenance, and capacity planning
  • Networking knowledge (TCP/IP, routing, network topologies and hardware, SDN, etc.)
  • A broad understanding of large-scale system architecture, automation, integration, and processes
  • The ability to debug and optimize code and to automate routine tasks
  • The ability to work night and weekend shifts
  • 4+ years of work experience with production Linux systems administration
  • 2+ years of experience with configuration management, source control, and containerization tools
  • 2+ years of work experience managing cloud-based infrastructure and automation
  • 2+ years of experience with at least one scripting language (e.g., Bash, Python, Ruby, Go)
  • A motivated, critical-thinking approach, with proven skills in troubleshooting and solving problems in a production support environment
  • The ability to manage competing priorities successfully during critical incidents
  • A strong desire to learn and understand new technologies
  • Excellent verbal and written communication skills
  • Experience with ITIL and Service Management best practices is a plus