Job description
Our company's machine-learning solution empowers retailers with the relevant themes and categories they should feature in today's campaigns while continuously learning to inform the campaigns of tomorrow. By creating an optimized email diet that caters to each customer's evolving tastes and moods, Coherent Path's software helps retailers quickly engage with and cross-sell to customers and promote strategic product categories while reducing email fatigue. Coherent Path has offices in Boston and Toronto.
We are looking for a talented Senior Site Reliability Engineer to take ownership of all things infrastructure and deliver a highly scalable, performant, and available platform that our portfolio of applications can rely on. There's lots to do and lots to learn, so we hope you are also a fast learner who can grow with Coherent Path as we build the future of email marketing!
Responsibilities
- Own SLOs/SLIs across all services and applications to provide metrics to the development teams and facilitate continuous improvement
- Work together with the engineering team to improve the CI/CD pipeline, with a focus on successful deployment of services and applications
- Drive improvements in the infrastructure and ensure that all of it can be consistently reproduced with Terraform
- Maintain Incident Playbooks and ensure that a consistent process is followed to guarantee a rapid response
- Enforce regular infrastructure security audits and drive automation where appropriate
- Continuously improve user experience as it relates to deployment and delivery
- Optimize Production and lower environments and infrastructure through monitoring and automation
- Drive platform management and capacity planning discussions
- Assist with setup and deployment of new services as needed
- Relentlessly eliminate false positive alerts
- Perform application load and scalability testing
- Participate in an on-call rotation to provide rapid response to critical issues in production
Requirements
- 3-5 years in an SRE or related role
- Intellectual curiosity and a strong desire to learn
- Problem solving skills, including the ability to disaggregate complex problems and incrementally implement solutions
- Great communication skills for leading post-incident reviews and writing client-facing communications
- A passion to efficiently support always-available applications
- Able to multitask, prioritize, and manage time efficiently
- Ability to write and review application code in Python, TypeScript, and JavaScript
- Experience with Django web framework
- Experience with configuration management and infrastructure deployment using Terraform
- Experience with monitoring and visualization tools like Prometheus and Grafana
- Experience deploying, logging, monitoring, and securing services on GCP and AWS
- Experience with containerization and deployment automation tools: Docker, Kubernetes
- Experience writing, maintaining, and optimizing CI/CD pipelines
- Experience with databases
- Experience with Linux
- Experience using Git
Nice to Have
- Experience setting up an Application Performance Monitoring (APM) tool (New Relic, Datadog, Splunk, Dynatrace, etc.)
- Ability to write and review application code in Elixir