Director, Site Reliability Engineering
Element Fleet Management
Date: 2 weeks ago
City: Mississauga, ON
Contract type: Full time
Get started on an exciting career at Element!
Element employees make a difference in the lives of others every day. We are re-defining the fleet management industry to be people first, then business – delivering on our promise of a superior client experience. This takes hard work and innovation, and we need more like-minded people on our team.
What We Need
We are looking for a Director, Site Reliability Engineering to join Element Fleet Management. As the largest pure-play fleet manager in the world, we provide unmatched products, services, and solutions to our clients.
At Element, employees play a critical role in delivering value to customers and ensuring an exceptional client experience. We are committed to the success of our clients, employees, and investors by fostering a culture where every employee can make a difference!
Element Fleet Management and its wholly owned subsidiaries are an equal opportunity employer committed to diversity, equity, inclusion, and belonging. We are pleased to consider all qualified applicants for employment without regard to race, color, religion, gender identity, age, sex, sexual orientation, disability, national origin, Aboriginal/Native American status, protected veterans’ status or any other legally-protected factors. Disability-related accommodations during the application and interview process are available upon request. Should you require an accommodation with our hiring process, please send an email to [email protected] or call (800) 665-9744.
Are You
- An individual with strong customer focus, adaptability, and a proactive approach to problem-solving?
- Someone with experience using data analytics to drive decision-making for system improvements and incident prevention?
A Day in the Life
- Team Leadership and Development: Hire, mentor, and develop a high-performing SRE team. Foster a culture of collaboration, continuous learning, and innovation. Provide ongoing training and development opportunities for team growth.
- Incident Management and Response: Lead the team in incident response, coordinating with cross-functional stakeholders to ensure timely resolution. Conduct thorough post-mortems, identifying and implementing preventive measures.
- Problem Management: Analyze and address underlying issues in applications and systems to prevent recurring incidents. Establish and maintain processes for identifying, tracking, and resolving long-term problems, promoting continual improvement.
- Change Management and Release Engineering: Implement and oversee change management practices, ensuring safe and reliable releases. Work closely with development and QA teams to standardize and optimize deployment pipelines for maximum reliability and scalability.
- Service Level Objectives (SLOs) and SLAs: Establish, monitor, and enforce SLOs, SLIs, and SLAs that align with business requirements. Regularly review and update SLOs to reflect changing system needs and customer expectations.
- Monitoring, Alerting, and Reporting: Build and maintain robust monitoring, logging, and alerting solutions for system health and application performance. Develop regular reports on reliability metrics and trends to identify areas for improvement.
- Automation and Tooling: Drive the adoption of automation and self-healing systems to reduce manual intervention, improve efficiency, and minimize human error. Oversee the development of tools and frameworks to support automation in deployment, monitoring, and incident response.
- Capacity Planning and Disaster Recovery: Conduct capacity planning and manage resources to ensure systems can handle current and future demands. Establish and maintain disaster recovery and business continuity plans for critical systems.
- Audit and Compliance: Collaborate with internal and external audit teams to ensure that our production systems meet SOC1, SOX, and other regulatory requirements. Oversee the creation of reports and documentation to support compliance and audit processes.
- Vendor Management: Manage relationships with external vendors to ensure they meet performance and service level agreements. Work with vendors on troubleshooting, support, and continuous improvement initiatives.
- Bachelor's degree in computer science, engineering, or a related field; advanced degree preferred.
- 10+ years of experience in IT operations, SRE, or related field, with a strong record of managing high-availability systems in production environments.
- In-depth knowledge of cloud infrastructure (AWS, Azure, or GCP), containerization (Docker, Kubernetes), and infrastructure as code (Terraform, Ansible).
- Solid understanding of SRE principles and practices, including error budgets, service level objectives (SLOs), and service level indicators (SLIs).
- Strong background in automation, CI/CD, and DevOps practices, with experience using tools such as Jenkins, GitLab CI/CD, or similar.
- Experience with observability tools such as Prometheus, Grafana, ELK Stack, Splunk, or DataDog, and the ability to design, implement, and interpret monitoring and alerting systems.
- Proven ability to lead and manage incident response and post-incident analysis, with a focus on improving response times and reducing incident frequency.
- Proficiency in scripting and programming languages such as Python, Go, or Bash, with an ability to build automation scripts and tooling.
- Familiarity with SOC1, SOX, and other regulatory compliance frameworks, and experience in maintaining audit and compliance documentation.
- Strong project management skills with a focus on prioritization, resource planning, and risk assessment.
- Google Cloud Professional DevOps Engineer, AWS Certified DevOps Engineer, or Certified Kubernetes Administrator (CKA)
- ITIL Certification, ITSM Certification, or PMP certification
- Familiarity with advanced SRE tools and practices such as chaos engineering, load testing, and synthetic monitoring
- Experience managing third-party relationships to ensure vendors meet performance and service level expectations
- Hands-on experience in coordinating with audit teams for compliance documentation and requirements
What’s In It For You
- A culture of innovation, empowerment, decision-making, and accountability
- Comprehensive health and welfare benefits that serve the needs of you and your family and foster a culture of wellness
- Additional benefits and amenities, including paid time-off programs (vacation, sick leave, and holidays)