Monitoring & Incident Management:
Improve the studio's reliability through monitoring, rapid response, communication and coordination.
Develop and manage the deployment architecture for the application, develop the monitoring architecture and implement monitoring agents, dashboards, escalations and alerts.
Routinely identifies operational problems by observing and studying system architecture, functionality and performance results. Troubleshoots procedures with the overall studio architect, investigates surfaced issues, and handles incidents.
Identifies operational priorities by assessing operational objectives; determining project objectives such as efficiency, cost savings, energy conservation, operator convenience, safety, and environmental quality; and estimating relevance, time, and costs.
Development & Data Analysis:
Develop operational solutions by defining, studying, estimating, and screening alternative solutions; calculating economics; and determining the impact on the total system.
Create new tools to facilitate automated monitoring of the studio's operational environment.
Anticipates operational problems by studying operating targets, modes of operation, and unit limitations, and by monitoring unit performance.
Improves operational quality results by studying, evaluating, and recommending process re-architecting; implementing changes; and contributing information and opinion to unit design and modification teams.
Provides operational management information by collecting, analyzing, and summarizing operating and engineering data and trends.
Updates job knowledge by participating in educational opportunities; reading professional publications; maintaining personal networks; participating in professional organizations.
Accomplishes engineering and organization mission by completing related results as needed.
Mastery of Linux systems and network administration
Strong systems engineering and troubleshooting skills
Scripting (Bash & PHP)
Strong TCP/IP understanding and ability to produce detailed documentation
Write new technical documentation and maintain existing documentation
Ability to administer networking firewalls, routers, and switches
S3 maintenance, Apache maintenance, load balancer management
Administer and maintain MySQL and other open-source databases
Write and perform basic queries to evaluate database stability, integrity and performance
Large/Big Data Management
Administer and maintain Aurora infrastructure
System-level monitoring (Nagios, Munin, Check_MK)
Writing checks & scripts
Log/application-level monitoring (Splunk, Elasticsearch, Apache)
Ability to diagnose the infrastructure as a whole