🛠️ Site Reliability Engineer (SRE) – Join Tinybird’s Remote Engineering Team

Location: Remote (Global)
Salary Range: €62,000 – €109,000 + Stock Options
Industry: Real-time Data | Developer Tools | Infrastructure
Experience Level: Mid–Senior

🚀 About Tinybird

Tinybird helps developers and data teams unlock the full potential of real-time data. With our platform, teams can:

Ingest large volumes of data effortlessly
Shape and query data using 100% pure SQL
Build low-latency, high-concurrency APIs in minutes—not hours

Trusted by innovative engineering teams, Tinybird lets developers create fast APIs, faster than ever. If you’re passionate about scalable systems, cloud infrastructure, and making software and hardware work together seamlessly, we want to hear from you.

👩‍💻 The Role: Site Reliability Engineer

We’re looking for an experienced SRE to help scale our platform, optimize system performance, and ensure reliability, elasticity, and observability as we grow. This role is not just about managing infrastructure; it’s about building systems that scale intelligently and transparently, directly impacting product development and customer success.

🔧 What You’ll Be Doing

Design and build highly available, scalable cloud infrastructure to support our platform
Participate in the on-call rotation, gaining a deep understanding of customer issues
Optimize infrastructure usage and cost efficiency across services and regions
Collaborate with backend engineers to influence system architecture and data workflows
Improve observability with monitoring, logging, and alerting tools (Grafana, Loki, Mimir)
Design resilient systems for disaster recovery, failure detection, and recovery
Automate customer scaling processes (currently semi-manual) for dynamic resource allocation
Work hands-on with technologies like ClickHouse, OpenResty, Redis, and Varnish
Use Terraform and Ansible for provisioning and configuration management

🧱 Our Stack Includes:

Infrastructure: Linux, Terraform, Ansible
Load Balancing & Caching: OpenResty, Varnish
Data Systems: ClickHouse, Redis, ZooKeeper
Monitoring & Alerting: Grafana, Loki, Mimir
Languages: Python (main), C++ (for hot paths)
Cloud Provisioning: Custom infrastructure automation (VMs, Kubernetes clusters, etc.)

💡 Some Ongoing Challenges You’ll Help Solve

Elastic scaling for large customer workloads with dynamic resource provisioning
High availability and global redundancy for mission-critical APIs
Better observability, from per-process metrics to a bird’s-eye view of system health
Self-service infrastructure so customers can scale up seamlessly—no manual intervention required
Tighter integration with ClickHouse, understanding internals to maximize performance

✅ What You Bring to the Table

Proven experience designing, deploying, and scaling cloud-native, distributed architectures
Strong systems-level thinking and the ability to manage edge cases and failure modes
Solid programming skills in Python or C++
Ownership mindset: you don’t shy away from problems—you dive in and fix them
A passion for action, iteration, and delivering scalable, resilient solutions fast
Ability to collaborate asynchronously, document clearly, and communicate effectively
Experience with infrastructure-as-code tools (e.g., Terraform, Ansible)
Bonus: Knowledge of ClickHouse internals or database scalability best practices

🎁 What We Offer

💰 Competitive salary: €62,000 to €109,000
📈 Stock options package
🏖️ 22 days of vacation, plus your birthday and public holidays
🏥 Comprehensive health coverage
🏡 Fully remote work flexibility
🪑 €2,400 home office setup allowance
🌎 Optional visits to our offices in Madrid or NYC

🤝 How We Work

Remote-first: Work from anywhere, anytime
Collaborative & transparent: You’re always in the loop
Impact-driven: Your work will shape our platform, product, and culture
Fast-paced & fun: We’re still small, so you’ll wear multiple hats and grow fast

📌 FAQs – Site Reliability Engineer at Tinybird

❓ Is this a fully remote role?

Yes. We’re a remote-first company with global team members. You can work from anywhere.

❓ What is the on-call expectation?

You’ll be part of a rotating on-call schedule, supporting production infrastructure and customer-facing systems.

❓ Do I need prior experience with ClickHouse?

It’s a plus, but not mandatory. We’ll support you in learning and mastering it.

❓ What time zone do you prefer for this role?

We’re flexible, but overlap with European time zones is a bonus.

❓ Is this role more DevOps or more backend?

A bit of both. You’ll work deeply with infrastructure but also with backend teams to influence architecture.

🧠 How to Crack the SRE Interview at Tinybird

🧩 1. Understand our stack and philosophy.

Read up on ClickHouse, OpenResty, Redis, and Grafana. Show that you understand high-availability architecture and the challenges of real-time data.

🧪 2. Show you can debug and iterate.

You might get asked to walk through a production incident. Explain how you’d identify, debug, and resolve the issue—including communication steps.

🛠 3. Talk about automation and systems design.

Be prepared to sketch out how you’d scale an architecture for increasing load or handle self-service resource provisioning.

🧬 4. Think like an engineer and a product owner.

We value people who consider business impact, not just system uptime.

✈️ Ready to Take Flight?

If you’re excited about building infrastructure that supports real-time data innovation, we’d love to meet you.

👉 Apply now and help us scale Tinybird to the next level.

Support Job Site Reliability Engineer-job id-9016