Principal Site Reliability Engineer, Cloud Platform Operations
Location: Bengaluru, Karnataka, India
Job Number: 31642
Position Title: Principal Software Engineer (Ops Team)
Everything Informatica does begins and ends with data. Simply stated, we make great data – data that is connected, clean and safe – ready to use so that all enterprises can be data-ready and put their unique information potential to work. A data-ready enterprise is decision-ready, customer-ready, application-ready, cloud-ready and regulation-ready. And by design, our Intelligent Data Platform delivers great data to enable our customers to be ready for anything.
Informatica Cloud is the clear market leader among Integration Platform-as-a-Service (iPaaS) providers. We provide Data Integration, Data Quality, Information Lifecycle Management, Test Data Management, Master Data Management and other Enterprise Information Management solutions as a service on the cloud. You will be working on the Cloud Product Operations team, and will be responsible for the management, monitoring and operation of our services in production. We use cutting-edge Cloud hosting, monitoring and deployment technologies to deliver a world-class, non-disruptive Cloud user experience to our customers.
Our Ideal Candidate
You are a bright, organized, and dedicated cloud operations engineer with deep knowledge and hands-on experience in Linux administration, Microsoft technologies, server monitoring, application management and software engineering. You will become the primary contact for supporting our 24 x 7 SaaS operations environment, deploying data management products into the Cloud (such as AWS and Azure), and automating the delivery, maintenance and monitoring of the services. You will work closely with R&D, QA, CloudTrust and IT teams to implement product delivery processes, automation, monitoring and alerting tools. This position will report to the Director of Cloud Operations and is based in the company’s Bangalore location. You will be a key contributor in a lean, high-impact team that is distributed across different geographic locations. You are comfortable supporting and operating high-availability Cloud services and providing on-call support for Sev1 incidents in production and critical development/QA cloud environments. You will ensure patching of the Informatica products on the cloud, Linux machines and databases, as well as performance monitoring and documentation of tasks. You are self-motivated with strong problem-solving and troubleshooting skills.
As a Principal Site Reliability Engineer, you will be responsible for the overall observability of the system and will possess strong monitoring skills, excellent communication skills, and a passion for automation and innovation. You must be able to work in and adapt to a fluid, fast-paced environment.
In this individual contributor role, you will need:
- A background in DevOps, platform engineering, site reliability engineering, systems administration, or software development
- Ability to configure and maintain monitoring tools such as Prometheus and Grafana
- Working knowledge of logging systems like Elasticsearch, Logstash, and Kibana
- Previous experience building custom dashboards from multiple data sources
- Excellent monitoring, debugging, and troubleshooting skills
- Self-motivated with a strong sense of ownership, urgency, and drive
- Empathetic listening skills and a desire to continuously improve and grow
- Planning, installing and deploying highly available solutions on public cloud
- Support the agile software development process among cross-functional teams to ensure smooth product delivery
- Serve as a primary responder for P0/P1 incidents reported in the production and staging landscapes
- Work with development teams across multiple organizations to drive automation, establish software standards, service modularity, testing standards, and deployment/management of microservices.
- Focus on automating build, release/deployment, manual processes and workflows
- Perform incident/alert troubleshooting, problem analysis and provide high quality solutions to technical issues
- Support and improve our tools, infrastructure, and processes that support rapid and reliable delivery of high-quality software to our production service.
- Advocate for improving our build and release toolchain.
- Support deployment activities of development and production releases including troubleshooting of release blockers such as infrastructure, configuration and code.
- Assist development in troubleshooting system and software issues in all environments
- Write effective documentation
- Manage RCA, Incident Process, and Risk Analysis of the cloud services
- Provide on-call support for issues in the production environment
- Provide proactive support on critical issues, including liaising with business users and system users
- Take ownership of and resolve production environment issues within the time frames expected by the SLA
- Keep up to date on the latest and greatest tools and solutions that will best serve the business
Technology You’ll Use (not limited to the following):
- Docker and cloud platforms (AWS, Azure), cloud log services (ELK), cloud application monitoring tools (Elastic APM, AppDynamics)
- Jenkins and Spinnaker CI/CD tools
- Git version control systems
- Kubernetes, AWS EKS, Helm, Istio, Kiali, Service Mesh
- Telemetry tools which include Prometheus, ELK
10+ years of relevant experience in platform engineering, site reliability engineering, systems administration, or software development
- Experience across entire SDLC, CI/CD tools, with configuration & release management, deployments, and troubleshooting in cloud environments
- Significant experience with tools used for automated deployment, scaling, and operations of application containers such as Kubernetes/Mesos
- Experience with public clouds such as AWS, Azure, GCP
- Hands-on experience with CI/CD tools such as Jenkins, Bamboo, Spinnaker
- Strong scripting experience with Bash, PowerShell, Python
- Strong understanding of source code version control systems, GitHub and code branching/merging strategies
- Expertise with build, package, and release tools such as npm, Maven, JVM, Ant and Gradle
- Experience with tools such as Prometheus, Grafana, Sumo Logic, ELK, Jaeger, Opsgenie is an added advantage
- Experience with deploying Docker images in the Cloud delivered through AWS EKS is preferred
- Network automation such as switch configurations, routing, and load balancers
- Proficiency with Cloud technologies such as AWS Cloud (EC2, EC2 Container Service, Kubernetes), Azure, GCP
- Excellent cloud experience including but not limited to multi-tenancy, secure applications, high availability, microservices, and telemetry
- Excellent communication skills (written, verbal, interpersonal)
- Ability to work independently with little direct supervision
Education / Experience
- 10+ years’ experience in software engineering with SRE/observability responsibilities
- Bachelor’s/Master’s Degree in Computer Science or equivalent experience
Seniority Level: Mid-Senior Level
Community / Marketing Title: Principal Site Reliability Engineer, Cloud Platform Operations
LinkedIn Hashtag: LI-CT1
Unleash Your Potential
A career with Informatica gives you all the opportunities and benefits that can only come from working for the trusted industry leader. By joining our team, you'll be able to solve real-life problems, make a difference, have a global impact, and join a supportive group of globally diverse teammates. We encourage you to be yourself, grow with us and unleash your potential.
EEO Employer Verbiage:
Informatica, the Enterprise Cloud Data Management leader, empowers businesses to realize the transformative power of data. We have pioneered a new category of software, the Informatica Intelligent Data Management Cloud (IDMC), powered by AI and a cloud-first, cloud-native, end-to-end data management platform that connects, manages and unifies data across any multi-cloud, hybrid system, empowering enterprises to modernize and advance their data strategies. Customers in more than 100 countries and 85 of the Fortune 100 rely on Informatica to drive data-led digital transformation. For more information, visit us at www.informatica.com, LinkedIn, Twitter, and Facebook.
Conquering the Impossible with data, come join #LifeAtINFA!
All qualified applicants will receive consideration for employment without regard to race, sex, color, religion, sexual orientation, gender identity, national origin, protected veteran status, or on the basis of disability.
Travel Requirement: Limited