Site Reliability Engineer
Description
We have built an industry-leading Sales Enablement platform on managed cloud services. With a best-in-class cloud-based SaaS product, we are constantly looking to improve automation and analysis of our infrastructure health and reliability. As we continue to grow and scale our product, our ability to drive operational readiness becomes more critical.
Our Site Reliability Engineering team is an integral part of a dynamic group focused on continuously improving our cloud deployment platform, "automating all the things", in support of our award-winning Enterprise SaaS solution. The team is critical to understanding where improvements can be made across the software platform and infrastructure.
What's Expected:
Build, scale, and secure SaaS application infrastructure on multiple cloud providers
Establish policies and best practices for operational readiness and partner with development to ensure adoption
Ensure maintenance of production resources, including load balancing and API gateways
Work closely with developers during the deployment and testing phases to provide insight into operational, security, and performance considerations
Advocate and implement industry best practices for operational monitoring and analysis
Develop and maintain operational administration, system and data backup, disaster recovery, and security/performance monitoring policies and tools
Automate processes for log analysis and conduct root cause analysis to ensure resolution of operational errors
Establish adherence to production KPIs
Act as first responder to triage and analyze abnormalities in system operation
What you'll need for success:
Bachelor's degree in Computer Science or related field
5+ years' experience in a DevOps engineering, Site Reliability, or Systems Engineering role
2+ years' experience in managing cloud-based or hybrid infrastructures
Experience with monitoring tools like New Relic, Datadog, and Operations Management Suite tools
SQL database development and optimization skills preferred
Experience with Agile/Scrum, continuous integration/delivery, automated deployment, and system monitoring
Knowledge of basic networking technology and concepts: TCP/UDP, SSL, HTTP, NAT
Understanding of and passion for developing highly secure and highly available systems
Experience with cloud platforms
Experience working with C# and .NET a plus