Job description
At AccelByte, our mission is to empower game creators by providing them with the backend platform and tools required to make scalable, reliable AAA-quality games. The company was founded in 2016 by industry veterans who have engineered online systems for some of the largest game and distribution platforms in the world, including Fortnite, Epic Store, Xbox Live, PlayStation Network, and EA Origin. We are backed by top investors including Softbank, Sony Interactive Entertainment, Galaxy Interactive, NetEase, and Krafton. Our latest Series B funding has solidified our place as a top player in the gaming industry.
We believe that the best companies empower employees to make decisions, obsess about the best user experience, and are not afraid to make and learn from their mistakes. Our culture is based on humility, openness to feedback, drive, and collaboration, which we feel results in the best-performing teams. As a company that values diversity, inclusion, and employee growth, we give our employees opportunities to work with and learn from teams all over the world. We offer competitive salaries, a full range of health benefits, social activities, career growth opportunities, and an amazing team. Come join us!
Position Summary
As an Incident Manager, you are responsible for the overall response to critical incidents and for managing escalations of all infrastructure support incident types. You establish and improve the incident-handling and communication frameworks and protocols, and you educate infrastructure and other development teams on incident-handling best practices.
Essential Functions/Responsibilities:
The Incident Manager is accountable for the following functions and responsibilities:
- Establish, improve and document the incident handling, escalation, and postmortem processes.
- Maintain an understanding of all major problems, issues, trends, and changes in the supported environments.
- Function as a 24/7 escalation point for critical service interruptions; after-hours support is required when outages occur outside regular business hours
- Escalate risks and issues to the responsible process owner and all key stakeholders to ensure awareness of service issues, trends, and mitigation plans
- Drive both technical and non-technical staff toward the timely, efficient resolution of incidents to minimize adverse impact on clients and the business
- Work to ensure timely service restoration and problem resolution for complex and/or high-impact incidents, minimizing the adverse impact of incidents and problems
- Drive the root cause analyses and postmortem process for all priority 1 and 2 incidents
- Facilitate regular critical incident and problem review meetings with stakeholders and take responsibility for their outcomes
- Create and/or manage all problem tickets throughout their lifecycle and generate client-facing reports
- Review and manage external vendors in accordance with applicable Service Level Agreements
- Continuously seek improvement opportunities to ensure service availability and that clients’ expectations are managed and met
- Develop and maintain strong and effective working relationships with various internal and external stakeholders and suppliers
- Support service management reporting, including weekly incident reports and monthly SLA/SLO/SLI reports
Qualifications/Experience Required:
- Bachelor’s degree in Computer Science, or related field of study
- 3+ years of relevant professional experience in Major Incident and Problem Management in a global IT environment
- 2+ years of cloud environment support experience
- Proven experience with Root Cause Analysis (RCA) methods such as Generic Error Modeling System (GEMS), Kepner-Tregoe, 5 Whys, and Ishikawa Diagrams
- Practical knowledge of Change Management processes
- Sound IT infrastructure experience and a quality- and results-oriented approach; knowledge of IT application development is helpful
- Practical project management experience
- Strong planning, coordination, and organizational skills
- Self-starter with self-confidence, assertiveness, and commitment
- Ability to work across functional areas to drive continuous improvement
- Ability to work in challenging and ambiguous environments
- Ability to remain flexible and adapt to changing priorities with promptness, efficiency, and ease
- Strong service orientation mindset
- Strong technical documentation skills
- Strong collaboration and customer service orientation, excellent oral and written communication skills
Qualifications/Experience Preferred:
- ITIL v3 (2011) Foundation certification or a similar certification
- Previous experience working in the game industry and/or with cloud infrastructure on AWS and other cloud service vendors
- Knowledge of or experience with SRE, IaC, Infrastructure Security
- Experience with Confluence, Jira, and Bitbucket
AccelByte Inc is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, religion, gender, national origin, sexual orientation, marital status, age, or disability. Our culture is innovative and inclusive, and we value our people above all.
Please visit our careers page for a complete listing of our open positions: https://accelbyte.io/careers