About Us: GE is the world’s Digital Industrial Company, transforming industry with software-defined machines and solutions that are connected, responsive and predictive. Through our people, leadership development, services, technology and scale, GE delivers better outcomes for global customers by speaking the language of industry.
At GE Digital, we are creating technology and solutions to enable social, mobile, analytical and cloud capabilities for the Industrial Internet. The Industrial Internet is an open, global network that connects people, data and machines. It’s about making infrastructure more intelligent and advancing the industries critical to the world we live in. At GE, we believe it’s about the future of industry—energy, healthcare, transportation, manufacturing. It’s about making the world work better. GE is transforming itself to become the world's premier digital industrial company, executing critical outcomes for our customers. Explore how you can drive greater asset reliability, lower operating costs, reduce risk and accelerate operational performance with our Predix platform and software solutions. GE offers a great work environment, professional development, challenging careers, and competitive compensation. GE is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, national or ethnic origin, sex, sexual orientation, gender identity or expression, age, disability, protected veteran status or other characteristics protected by law.
Role Summary: The Site Reliability Engineer will be responsible for the performance and availability of the Compute and Network infrastructure consumed by all business segments. The Site Reliability teams are composed of highly talented individuals obsessively focused on availability through operational excellence. The ideal individual is relentlessly technical, passionate about automating everything and totally committed to delivering amazing customer experiences.
Essential Responsibilities: As a Site Reliability Engineer for GE Digital’s Global Operations, you must have a good understanding of standard IT infrastructure equipment and systems, including their reliability and failure causes, and the ability to quickly understand the key operational characteristics of equipment and systems
Be available 24x7 to quickly respond to and resolve critical service outages that severely impact consumers
Assist with establishing performance baselines and capacity thresholds, correlating events, and defining monitoring/alerting criteria
Assist in developing automated solutions to address potential problems before they result in a service interruption
Provide impact assessment and mitigation plan for changes going into the production environment
Investigate root cause of severe and systemic outages, identify corrective actions and apply across the enterprise
Identify thresholds for all critical links in the data path to quickly isolate where imbalances may result in potential outages
Analyze failure points in services to model risk level and resolution steps if failure occurs
Assist in driving architecture enhancements into the system to mitigate potential failure points
Programmatically monitor for and remediate configuration drift of critical devices
Help develop response plans to potential failure points and evaluate effectiveness during planned tests
Perform comprehensive operational health checks of all services to identify areas of concern and track activities to drive improvements at all levels of the architecture
Provide technical coaching and direction to more junior teammates
Qualifications: Bachelor's degree in Computer Science, Information Management, a similar STEM degree, or equivalent practical experience.
Minimum 2 years IT experience in enterprise-wide deployments.
Firm understanding of scripting or developing software and services for the cloud (Ruby, Python, Go, Java, Node.js, .NET, etc.)
Desired Characteristics: Technical Expertise:
Very good knowledge of common operating systems (Unix/Linux, Windows)
Strong oral and written communication skills
Very good knowledge of network protocols (TCP/IP, SNMP, FTP, syslog, TFTP, etc.)
Some experience managing version control systems such as Git
Knowledge of deploying and managing infrastructure on public clouds such as AWS or Azure
Some experience using an automated configuration management system (Terraform, Chef, Puppet, Ansible, Salt, etc.)
Strong organizational and project management skills
Strong analytical and problem resolution skills
Very good knowledge of Network Management (SNMP, MIB)
Some experience with configuring, customizing, and extending monitoring tools (Datadog, Sensu, Grafana, Splunk, etc.)
Very good knowledge of TCP/IP networking, and inter-networking technologies (routing/switching, proxy, firewall, load balancing etc.)
Knowledge of analytics software packages like MATLAB, SAS, JMPro, etc. Programming experience with open source scripting and data analysis packages such as Python is a plus.
Leadership:
Proactively engages with cross-functional teams to resolve issues and design solutions using critical thinking, analytical skills and best practices, actively incorporating input from various sources
Strong analytical and problem-solving skills: effectively evaluates information/data to make decisions; anticipates obstacles and develops plans to resolve them
Continuous improvement oriented – actively generates process improvements; champions and drives change initiatives
Ability to deliver results in a rapidly changing, dynamic environment
Personal Attributes:
Emotional intelligence, ability to influence up and out, and ability to work independently
Must be a team player with a strong desire to win
Passionate about continuously learning and able to quickly adapt and pivot to win in a dynamic environment
Highly organized and efficient; able to balance competing priorities and execute accordingly