
Site Reliability Engineer in Charlotte, NC at Vaco

Date Posted: 4/10/2019


Job Description

**One of our clients is building out a team of Site Reliability Engineers for all non-production environments. Above all, we're looking for candidates with top-to-bottom knowledge of how web applications operate. The role involves tracing, monitoring, and identifying issues from front-end web apps all the way down to web servers. Candidates should have a passion for automation and for bringing a higher level of reliability and stability to web apps.**

The ideal candidate should have hands-on experience learning, triaging (both proactively and reactively), and documenting application stacks, and using monitoring tools (Splunk, AppDynamics, UI-session replay, Sentry, and/or others). They should also have expert-level proficiency in at least one area, such as content delivery, application development (Java, JavaScript), networking, or infrastructure, and should understand how web traffic moves through all layers of infrastructure, including F5 load balancers and firewalls.

The SRE will partner with application development and API teams to understand the application stacks, triage environment issues, design monitoring methods, and provide reporting to executive leadership. The SRE will also lead a small tactical team that will serve as the single point of contact for our Agile development and product teams on all non-production environment issues.

Job Responsibilities
* Partner with the Agile development teams to learn and assume responsibility for documentation, logging, and monitoring for various systems
* Partner with DevOps on CI/CD improvements using Bitbucket, Jenkins, & OpenShift
* Implement monitoring on various online applications using solutions such as Splunk, UI-session replay, and AppDynamics, and determine the right toolset to meet monitoring goals on net-new application stacks
* Strong knowledge of custom alerts, and the ability to integrate data housed in disparate data sources to create workflow-driven alerting
* Understanding of the administration of application servers such as Node.js, NGINX, JBoss, Apache, and Spring Boot
* Continuously tune and validate the quality of current tools for network and system monitoring, UI-session replay, and log file parsing, and implement an effective toolkit
* Assist with vulnerability scanning and root cause analysis (RCA) proposals for defects in Scrum team backlogs
* Participate in routine Agile and Scrum ceremonies