Job opportunities in our portfolio companies

Seaya

Site Reliability Engineer (Paris or Remote France)

Alma

Software Engineering
Paris, France
Posted on Saturday, July 29, 2023

Job Description

About the job

Within our Engineering team, a Site Reliability Engineer will be part of the Platform tribe and be responsible for:

  • Ensuring that the infrastructure is aligned with the internal and external customers' needs and with the requirements of our SLAs/SLOs
  • Working with the Engineering teams to design and implement scalable and resilient solutions
  • Promoting automation and SRE best practices to optimise operational efficiency
  • Developing and maintaining backup and disaster recovery strategies to protect data and ensure business continuity
  • Designing, implementing and maintaining monitoring tools to track key system metrics and health indicators
  • Providing technical support and expertise to engineering teams for the resolution of application and infrastructure incidents
  • Carrying out in-depth analyses of incidents in order to identify the underlying causes and put in place corrective measures
  • Maintaining the platform in operational condition by implementing updates, security patches and continuous improvements
  • Participating in the optimisation of the platform's operating costs

For additional context, our technical stack is:

  • Cloud providers: GCP, Cloudflare, AWS
  • Backend: Python + FastAPI and Flask
  • Frontend: React / TypeScript
  • Database technologies: PostgreSQL, Redis, BigQuery
  • Log and error management: Datadog, Sentry
  • CI/CD: GitHub Actions, Docker
  • Monitoring: Datadog
  • Infrastructure as Code: Terraform

About you

We are looking for a candidate with the following experience and qualities:

  • At least 5 years of experience managing cloud infrastructure
  • Deep knowledge of Google Cloud Platform or other cloud providers
  • Good knowledge of networking
  • Strong interest in security topics
  • Experience in setting up and maintaining monitoring tools, and in analysing metrics and malfunctions
  • Hands-on experience with Infrastructure as Code
  • Ability to solve problems methodically and work effectively under pressure during critical incidents
  • Strong communication skills to collaborate with different teams and communicate problems and solutions effectively
  • Good command of English

Recruitment Process

  • Recruiter Interview
  • Hiring Manager Interview
  • Technical Test
  • Final Interview