Site Reliability Engineer (SRE). Loc: Irving, TX Dur: one year
Job Description:
As an SRE, you will be an integral member of a dynamic team continuously improving our Enterprise CI/CD platform, automating all the things, in support of our rapidly expanding portfolio. As a Cloud Engineer and architect, you will be responsible for developing and implementing a standard model for Cloud migration and operations. This role suits someone with a strong development background who also specializes in AWS, automation and tooling. You need to be equally comfortable writing application code, managing infrastructure-as-code, and architecting Cloud applications for high availability.
As part of the platform engineering team, we hold ourselves accountable for keeping our systems up and running so that our business partners have the best experience. You will be responsible for scaling and optimizing the reliability, availability, and performance of our infrastructure and platform services, and for partnering with developers to build highly available and performant services. Our SREs' intellectual curiosity and problem-solving skills are key to the team's success.
Lead initiatives to continuously refine our on-premises and AWS deployment practices for improved reliability, repeatability and security.
You'll create plans, collaborate with other DevOps team members, and coordinate with development and business teams. These high-visibility initiatives will help to increase service levels, lower costs, and deliver features more quickly.
- Own end-to-end availability and performance of mission-critical services; build monitoring and automation to prevent problem recurrence, and automate the response to all non-exceptional service conditions.
- Write code and scripts to fully automate the migration of applications from on-premises environments to AWS.
- Design effective monitoring / alerting and log aggregation approaches to proactively notify business stakeholders of issues and communicate metrics, working closely with these stakeholders, using tools including AWS CloudWatch, New Relic, etc.
- Configure build pipelines to support automated testing and deployment using CloudFormation and Ansible.
- Develop and implement a standard model of Cloud migration and operations for the portfolio.
- Identify opportunities to improve the current platform, enhance the current toolchain, and identify new tools for proof-of-concept (POC) evaluation.
- Write code to continually reduce human intervention in operational tasks and to automate processes.
- Redefine governance models around the automation tools so that they can be used throughout the enterprise.
- Build automation to deliver metrics reports.
- Communicate continuously with management and peers about the status and progress of ongoing projects.
- Handle deployment and operations of Cloud-enabled applications and services in AWS and Private Cloud infrastructure.
- Utilize DevOps methodologies and work with application developers and operations to guide the development and implementation of Cloud applications, systems and processes.
- Deploy and orchestrate Docker and Kubernetes containers on a Cloud platform.
- Employ DevOps and agile principles; utilize Jira, Jenkins and Ansible to enable CI/CD of various cloud applications.
- Build AWS JSON templates.
- Monitor and support day-to-day operations of Cloud and legacy applications using tools such as New Relic, Splunk, Datadog, CloudWatch, IPM, etc.
- Develop automation using scripting languages.
- Handle critical operational tasks as well as on-demand requests.
Required:
- Bachelor's degree or four or more years of work experience.
- Four or more years of relevant work experience.
- Experience with Amazon Web Services (AWS) technologies: CloudFormation, EC2, S3, EMR, Auto Scaling, CloudWatch.
- Knowledge of one of the container technologies (Docker/Kubernetes).
- Experience with CI/CD using Jira, Jenkins, and Ansible.
- Strong UNIX, Linux and database skills.
- Experience with a scripting language such as Bash, Perl, Ruby or Python. Java programming experience.
- Experience providing production operations support and 24/7 support.