Morgan Stanley New York, NY 10007
Morgan Stanley is a leading global financial services firm providing a wide range of investment banking, securities, investment management and wealth management services. The Firm's employees serve clients worldwide including corporations, governments and individuals from more than 1,200 offices in 43 countries.
As a market leader, the talent and passion of our people are critical to our success. Together, we share a common set of values rooted in integrity, excellence and a strong team ethic. Morgan Stanley can provide a superior foundation for building a professional career - a place for people to learn, to achieve and to grow. A philosophy that balances personal lifestyles, perspectives and needs is an important part of our culture.
Technology works as a strategic partner with Morgan Stanley business units and the world's leading technology companies to redefine how we do business in ever more global, complex, and dynamic financial markets. Morgan Stanley's sizeable investment in technology results in quantitative trading systems, cutting-edge modeling and simulation software, comprehensive risk and security systems, and robust client-relationship capabilities, plus the worldwide infrastructure that forms the backbone of these systems and tools. Our insights, our applications and infrastructure give a competitive edge to clients' businesses and to our own.
The Unix Operations team is responsible for implementing and managing the Linux infrastructure for Morgan Stanley. The group is involved in the evaluation, certification, integration, and maintenance of various products, including hardware, operating systems (such as Red Hat Linux and Solaris), system services (DNS, DHCP, NTP, syslog, etc.), file systems (NFS, AFS, Hadoop and various cluster file systems), High Availability, Virtualization technologies and a variety of in-house developed tools.
We interact with many customers from numerous business units to generate improvements that keep the plant running smoothly without making it hard to manage, and that help us 'run the bank' more efficiently and effectively. We also liaise with engineering groups to set direction for the many disciplines that are part of our day-to-day service portfolio (storage, core infrastructure services and special projects) and create solutions for the low-level components that make our infrastructure tick.
The role is critical to our day-to-day incident management function, with primary responsibilities for:
Diagnosis and resolution of immediate production-impacting issues in the electronic trading, compute and storage plants
Working with other infrastructure teams including networking, database administration and hosted solution teams for outage resolution, as well as customers aligned with the business users of our plant to determine scope, impact, and appropriate resolution path
Carrying out proactive health & hygiene tasks to maintain operational stability and compliance with risk & control programs, so that the production environment is not put at risk
Collaborating with engineering teams to test and certify new hardware & software products
Collaborating with application development/support teams on proof-of-concept setups for in-house-developed and/or ISV-supplied products
Occasional weekend project work to onboard new UNIX assets for growth or large programs such as new datacenter build-outs
Ability to read complex code and also write scripts using Shell, Perl and Python.
Must have strong knowledge of and experience with Linux, preferably Red Hat, and/or other Linux distributions.
Knowledge of and experience with various services, e.g. DNS, DHCP, NTP, Kerberos, SSHD, PXE, SFTP, HTTPD etc.
Knowledge of various enterprise server hardware models (blades, rackmount, standalone), networking, routers and switches.
Must be able to read, understand and write intermediate to complex scripts using KSH, Bash, Perl, Python etc.
Excellent verbal and written communication skills, including the ability to explain technical problems to a non-technical audience.
Availability for on-call duty (1 week out of every 4-6 weeks), rotated weekly within the team, serving as the point person for any production issues.
Ability to work in a globally distributed team.
Experience with troubleshooting incidents involving compute resources, network problems, remote storage (SAN, NAS) problems etc.
Experience with analyzing and diagnosing kernel crash/core dumps and network packet captures, and identifying the root cause of problems from them.
Sound knowledge of networking, TCP/IP, Layer 2/3 network design, bonding, routing, firewalls (host- and appliance-based), switches and routers etc.
Experience working in a DevOps environment.
Knowledge of and experience with various server hardware models and vendors, e.g. IBM, Dell, HP etc.
Ability to identify performance bottlenecks and tune the system parameters to provide more throughput.
Good understanding and knowledge of Load Balancing, High Availability and BCP.
Good understanding of the workings of configuration management tools such as Red Hat Satellite servers, Puppet, Chef, SaltStack, etc.
Good knowledge and understanding of Clustering, Virtualization, NAS, NFS and SAN.