Open to new roles  ·  London  ·  Responds within 24h

Cloud infra built to last under load and attack.

I'm Sathish, a cloud platform and security engineer who designs and operates production infrastructure on AWS, GCP, and Azure. Automation, reliability, and security practices that hold up when it counts.

5+ Years Multi-Cloud
MSc Cyber Security
AWS Solutions Architect
OSCP (In Progress)
Sathish Boini, Cloud Platform & Security Engineer
5+ Years Cloud Experience

About

Building platforms that don't break under pressure.

Most cloud platforms are built to ship features fast. The ones I build are built to last under load, withstand attack, and adapt as threats evolve. That mindset runs through every part of my work.

From production infrastructure for multi-client engagements, to training the next generation of cloud and security engineers, to research on adversarial attacks against deep learning systems — the motivation is the same. I work with teams that take engineering seriously and want their foundations to hold up when it counts.

Specialties

Where I create the most value.

Cloud & DevOps Engineering

Design and operate production cloud platforms across AWS, GCP, and Azure. Infrastructure as code, CI/CD pipelines, observability, and platform engineering that teams actually rely on.

Cloud Security Engineering

Harden cloud estates end-to-end. IAM done properly, secrets management that holds up to audit, network controls, and GDPR-aware data handling baked in from day one.

AI & ML Infrastructure

Take AI systems from notebook to production. RAG pipelines, agentic systems, inference infrastructure, and evaluation frameworks, built with the engineering rigour that decides whether a system really works.

Projects

Problems solved, results delivered.

Each entry walks through the real challenge, the approach, measurable outcomes, and the exact stack used.

01  ·  Infrastructure Redesign  ·  −40% cloud spend

Cloud Cost Optimisation & High Availability

The Problem

A client was paying $30k+/month for AWS with no autoscaling, a single-region setup, and 4-hour release windows caused by fully manual deployments. Any outage meant total downtime: no failover, no recovery plan.

How I Solved It

Rewrote the entire stack as Terraform modules, introduced multi-region failover via Route 53 health checks, replaced on-demand fleets with scheduled spot instances, and wired end-to-end CI/CD through GitHub Actions. Zero-downtime deploys from day one.

40% Cost Saved
60% Less Overhead
99.9% Uptime SLA
<5 min Deploy Time

02  ·  Security Engineering  ·  SOC 2 Type II achieved

Zero Trust Security Framework & Compliance

The Problem

A multi-cloud environment with overly permissive IAM policies, secrets hardcoded in environment variables, and no audit trail. Enterprise customers were demanding SOC 2 compliance before signing, and the existing setup would have failed every control.

How I Solved It

Designed a zero trust architecture from scratch: least-privilege IAM rewrites across all services, HashiCorp Vault for secrets, GuardDuty + Security Hub for real-time detection, and a complete audit trail meeting SOC 2 Type II requirements.
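
To make the least-privilege idea concrete, here is a minimal Python sketch, not the tooling used on the engagement. It flags Allow statements in an IAM policy document that grant a wildcard action or a wildcard resource. The function names and the example policy are hypothetical, and a real review would also need to consider conditions, NotAction, and resource-based policies.

```python
from typing import Any


def _as_list(value: Any) -> list:
    """IAM accepts a single value or a list for these fields; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_wildcard_statements(policy: dict) -> list[dict]:
    """Return Allow statements that grant a wildcard action or a wildcard resource."""
    findings = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue  # Deny statements only restrict access, so they are skipped
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        broad_action = any(a == "*" or a.endswith(":*") for a in actions)
        broad_resource = any(r == "*" for r in resources)
        if broad_action or broad_resource:
            findings.append(stmt)
    return findings


# Hypothetical example policy: one over-broad statement, one scoped statement.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "TooBroad", "Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {
            "Sid": "Scoped",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
    ],
}

flagged = [s["Sid"] for s in find_wildcard_statements(example_policy)]
print(flagged)
```

A check like this can run in CI so that over-broad policies are caught at review time rather than at audit time.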

85% Fewer Incidents
SOC 2 Type II Achieved
0 Critical Findings
100% Audit Coverage

03  ·  Platform Engineering  ·  50+ engineers unblocked

Kubernetes Internal Developer Platform

The Problem

50+ engineers were blocked waiting for ops to provision environments. Deployments took days, there was no service visibility, and every team had invented its own snowflake process. Productivity was collapsing under manual ops toil.

How I Solved It

Built a self-service internal developer platform on Kubernetes. GitOps via Argo CD means any engineer ships by merging a PR. A standardised Helm chart library, Prometheus + Grafana dashboards for every service, and automated provisioning cut the ops queue to zero.

70% Faster Deploys
50+ Engineers Enabled
0 Ops Tickets/Week
100% Observability

04  ·  AI Infrastructure  ·  Sub-100ms latency

Production LLM & RAG Inference Platform

The Problem

An LLM-powered product was being served from a Jupyter notebook with 3–8 second response times, no batching, no evaluation, and no monitoring in production. Costs were unpredictable and regressions were invisible.

How I Solved It

Rebuilt the inference layer on FastAPI with async batching, a Pinecone-backed RAG pipeline, GPU autoscaling on EKS, and a full evaluation framework that catches quality regressions before they hit users. Latency dropped from seconds to milliseconds.
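
The async batching idea can be sketched in a few lines of plain asyncio. This is an illustration under stated assumptions, not the production code: `MicroBatcher`, `fake_model`, and the size and wait parameters are hypothetical names and values, and the model call is a stand-in that upper-cases its input.

```python
import asyncio


class MicroBatcher:
    """Collects concurrent requests and sends them to the model as one batch."""

    def __init__(self, batch_fn, max_batch_size=8, max_wait_s=0.01):
        self.batch_fn = batch_fn  # async callable: list of inputs -> list of outputs
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue = asyncio.Queue()

    async def submit(self, item):
        """Enqueue one request and wait for its individual result."""
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((item, future))
        return await future

    async def run(self):
        """Worker loop: take a first item, then fill the batch until size or deadline."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            items = [item for item, _ in batch]
            try:
                results = await self.batch_fn(items)
                for (_, future), result in zip(batch, results):
                    future.set_result(result)
            except Exception as exc:
                # Propagate a failed batch to every caller that is still waiting.
                for _, future in batch:
                    if not future.done():
                        future.set_exception(exc)


batch_sizes = []  # records how many requests each model call handled


async def fake_model(prompts):
    """Hypothetical stand-in for a batched model call."""
    batch_sizes.append(len(prompts))
    await asyncio.sleep(0.01)
    return [p.upper() for p in prompts]


async def demo():
    # The batcher is created inside the running event loop.
    batcher = MicroBatcher(fake_model, max_batch_size=4, max_wait_s=0.1)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(f"q{i}") for i in range(4)))
    worker.cancel()
    return results


results = asyncio.run(demo())
print(results, batch_sizes)
```

In a FastAPI service, the request handler would await `submit()` while a single background task runs `run()`. The trade-off is a small, bounded wait (`max_wait_s`) in exchange for far fewer model calls under concurrent load.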

<100ms P95 Latency
65% Cost Reduction
100% Eval Coverage
10x Throughput

Credentials

Certifications & education.

AWS Solutions Architect

Designing scalable, reliable, and cost-optimised cloud architectures on AWS.

Certified

MSc Cyber Security

NCSC-certified pathway. Advanced research in threat modelling, cryptography, adversarial ML, and secure system design.

Awarded

OSCP

Offensive Security Certified Professional — practical penetration testing, exploit development, and real-world adversarial techniques.

In Progress

Contact

Let's build something that holds up.

Open to full-time roles, contract work, and consulting. Whether you need someone to own your cloud platform, harden your security posture, or ship your AI system to production — I'd like to hear about it.

Available now  ·  responds within 24h

Send a message