Senior Software Engineer, SRE, Cloud Incident Response
Company: WeAreTechWomen
Location: Mountain View
Posted on: May 20, 2025
Job Description:
Minimum qualifications:
- Bachelor's degree in Computer Science, a related field, or
equivalent practical experience.
- 5 years of experience with software development in one or more
programming languages.
- 5 years of experience with data structures or algorithms.
- 3 years of experience in designing, analyzing, and
troubleshooting distributed systems, and 2 years of experience
leading projects and providing technical leadership.
- Experience in SRE or incident management/response
environments.
Preferred qualifications:
- Experience working in computing, distributed systems, storage,
or networking.
- Experience in telemetry systems, incident and risk
management.
- Experience in designing, analyzing, and troubleshooting
large-scale distributed systems.
- Ability to debug, optimize code, and automate routine
tasks.
- Excellent problem-solving skills, with strong verbal and
written communication abilities.
About the job:
Site Reliability
Engineering (SRE) combines software and systems engineering to
build and run large-scale, massively distributed, fault-tolerant
systems. SRE ensures that Google Cloud's services (both our
internally critical and externally visible systems) have
reliability, uptime appropriate to customer needs, and a fast rate
of improvement. Additionally, SREs monitor system capacity and
performance continuously.
Our software development focuses on
optimizing existing systems, building infrastructure, and
eliminating work through automation. On the SRE team, you'll manage
the unique challenges of scale at Google Cloud, leveraging your
expertise in coding, algorithms, complexity analysis, and
large-scale system design. We foster a culture of curiosity,
problem-solving, and openness, encouraging collaboration, big
thinking, and risk-taking in a blame-free environment. We support
self-directed work on meaningful projects and provide mentorship to
facilitate growth. Our team maintains the architecture behind
Google's online presence, from data centers to next-generation
platforms, ensuring optimal network performance for
users.
Responsibilities:
- Ensure Google Cloud Platform (GCP) stability and reliability
through critical incident support, focusing on high-quality
customer outcomes and cross-team collaboration.
- Create training materials and processes for incident
management, partnering with Cloud Support leadership.
- Develop systems and tools to improve incident response
visibility, issue detection, and communication with customers and
stakeholders.
- Identify and escalate risks in Cloud services, implementing
tactical measures to reduce major incident probabilities.
- Support the scalability and reliability of systems throughout
their lifecycle by participating in pre-launch activities,
automation, and continuous improvement initiatives.