This is a 24/7 team responsible for production systems health monitoring, deployment of code changes, escalation handling and standardized communication of all change management within the technical operations organization. The individual in this role multi-tasks and prioritizes system events according to severity and escalation procedures, and quickly and accurately communicates production emergencies to both internal and external groups. This individual navigates both Unix and Windows environments and is skilled in actively troubleshooting and/or resolving production issues. The role also mentors junior staff members and takes the lead on outages and dashboard assignments.
The ideal candidate will have the following responsibilities:
- Take lead on production outages
- Delegate dashboard assignments
- Mentor more junior staff members
- Oversee daily monitoring: 24x7x365 health monitoring of Unix and Windows environments hosting various web, mobile and telephony platforms, using server, network and application monitoring systems
- Manage real-time escalations and serve as the point person for ensuring escalation procedures are followed and driven to resolution
- Handle stressful situations, such as initiating emergency conference bridge calls and sending quick and accurate outage notifications
- Perform quality control on communications for code releases, scheduled maintenance and service interruptions
- Monitor the infrastructure change management policies and procedures
- Communicate between departments, vendors and partners, serving as a central repository for information regarding production site, customer support, help desk and core systems issues across the entire organization
- Deploy and release engineering code across multiple environments, with all builds/releases communicated and applied to staging and production environments according to standard operating procedures
- Provide application support for Unix and Windows applications, including performing various system administration tasks and performing standard operating procedures as needed to maintain system health
- Work a combination of day, evening and/or third shifts as needed
- Perform other related duties as required and assigned
- Demonstrate behaviors which are aligned with the organization's desired culture and values
Qualifications:
- A bachelor's degree in Computer Science or a related technical field, or equivalent practical system administration and programming experience.
- 3+ years of previous operations center or equivalent experience
- Must be comfortable working in command-line as well as GUI environments
- 3+ years of direct Unix experience (running scripts, grepping logs, troubleshooting errors)
- 3+ years of direct Windows experience (running scripts, processing event log messages, troubleshooting errors)
- 3+ years of direct VMware Horizon 7 experience (managing the environment)
- 3+ years of direct Commvault experience (managing the backup environment)
- Hands-on experience with Amazon Web Services (AWS), Jenkins, and Chef
- Hands-on experience with Ivanti Patch Management
- Knowledge of Docker containerization and Kubernetes/EKS cluster management for container orchestration
- Experience programming in at least one language (PowerShell, Python, Go, Ruby, or PHP) and a desire to learn more.
- Knowledge of fundamental networking protocols, such as TCP/IP, HTTP, SSL, and DNS, or of Linux system internals.
- An understanding of large scale system design, monitoring, and operational practices.
- Must be able to accurately report information in a timely manner
- Excellent written and oral communication skills
- Experience with ServiceNow, New Relic, SumoLogic, Nagios or Opsview is required
- Team leadership skills are required