Staff SRE · Platform Engineering · DevSecOps

Hi, I'm Mohammad Ali.

I design, scale, and operate cloud-native systems — building reliable, observable platforms that power enterprise products at scale.

AWS · GCP · Azure · Kubernetes · Terraform · Datadog · CI/CD · DevSecOps · OpenTelemetry
01.

About Me

I'm a Lead SRE and Platform Engineer based in Sterling Heights, MI, with 8+ years of experience designing, scaling, and operating enterprise-scale cloud-native infrastructure across AWS, GCP, Azure, and Kubernetes.

At Ford Motor Company, I own service reliability architecture for large-scale customer-facing platforms — defining SLOs, SLIs, and error budgets, and leading enterprise-wide DevSecOps transformation across 30+ engineering teams.

I built Ford's centralized Internal Developer Platform using Backstage, standardizing golden paths for CI/CD, secrets management, and service onboarding across the organization. I also designed the enterprise observability stack — full-stack metrics, logs, traces, RUM, and synthetic monitoring using Datadog and Dynatrace.

I'm passionate about developer experience, incident response philosophy, and building platforms that reduce cognitive load — turning operational complexity into reliable, automated systems that teams can trust and own.

I hold a B.S. in Computer Science from Wayne State University.

8+
Years of Experience
35%
MTTR Reduction
40%
Fewer Repeat Incidents
30+
Teams Standardized
02.

Skills & Tools

☁️

Cloud Platforms

AWS, Google Cloud Platform, Microsoft Azure

🐳

Containers & Orchestration

Kubernetes, OpenShift, Docker, ECS, EKS

🏗️

Infrastructure as Code

Terraform, Ansible

🔄

CI/CD

GitHub Actions, Jenkins, Tekton, Cloud Build

📊

Observability

Datadog, Dynatrace, OpenTelemetry — metrics, logs, traces, RUM, synthetics

🔐

Security & DevSecOps

IAM, Policy Enforcement, Fossa, SonarQube, Checkmarx

💻

Languages

Go, Python, Bash, JavaScript, TypeScript, C#

🛡️

SRE Practices

SLOs, SLIs, Error Budgets, On-call Optimization, Capacity Planning

03.

Experience

Lead Software Engineer — Site Reliability Engineering
Ford Motor Company
Sept 2024 — Present
  • Own service reliability architecture for large-scale, customer-facing cloud platforms, defining SLOs, SLIs, and error budgets.
  • Reduced MTTR by 35% by redesigning alerting strategies, runbooks, and observability-driven triage workflows.
  • Designed enterprise observability systems using Datadog and Dynatrace — metrics, logs, traces, RUM, and synthetic monitoring.
  • Implemented Terraform-based automation for monitoring/alerting infrastructure across AWS and GCP.
  • Led Sev-1 incident mitigation, reducing repeat incidents by 40% through permanent corrective actions.
  • Performed capacity planning and scalability analysis for distributed Kubernetes workloads.
SLOs/SLIs · Datadog · Dynatrace · Terraform · AWS · GCP · Kubernetes
Lead DevSecOps Engineer — Cloud Engineering
Ford Motor Company
Sept 2021 — Sept 2024
  • Architected enterprise CI/CD pipelines using Terraform, Jenkins, GitHub Actions, Tekton, and cloud-native services.
  • Designed and deployed a centralized internal developer platform using Backstage.
  • Embedded DevSecOps controls into CI/CD — IAM enforcement, policy validation, artifact scanning, audit logging.
  • Led large-scale containerization and migration to AWS ECS and EKS, increasing cloud-native adoption by 40%.
  • Standardized build and release workflows across 30+ teams, improving delivery reliability.
GitHub Actions · Tekton · Backstage · EKS · OpenTelemetry · DevSecOps
Lead Kubernetes Software Engineer
Ford Motor Company
Sept 2018 — Sept 2021
  • Led enterprise migration from VM-based workloads to OpenShift/Kubernetes, accelerating cloud-native adoption by 50%.
  • Owned production Kubernetes platform reliability, ensuring 99.9% uptime for critical workloads.
  • Designed Tekton-based Kubernetes-native CI/CD pipelines to replace legacy Jenkins workflows.
  • Implemented RBAC, network policies, namespace isolation, autoscaling, and quota management.
  • Developed automation for cluster upgrades and incident recovery, reducing manual effort by 35%.
Kubernetes · OpenShift · Tekton · RBAC · Autoscaling
Software Engineer — Cloud Engineering
General Motors
Jun 2016 — Sept 2018
  • Developed automated testing frameworks integrated into CI/CD pipelines.
  • Conducted load, stress, and performance testing for cloud-hosted applications.
  • Contributed to early cloud adoption via deployment automation and production validation.
  • Performed root cause analysis and production monitoring for cloud-based systems.
CI/CD Testing Cloud Monitoring
04.

Key Impact

35% MTTR Reduction

Redesigned alerting strategies, runbooks, and observability-driven triage workflows, cutting mean time to resolution (MTTR) by 35%.

🛡️

40% Fewer Repeat Incidents

Led Sev-1 incident mitigation with cross-functional coordination and permanent corrective actions, reducing repeat incidents by 40%.

🚀

50% Cloud-Native Acceleration

Led the enterprise VM-to-Kubernetes migration at Ford, accelerating cloud-native adoption across the organization by 50%.

🏢

30+ Teams Standardized

Standardized CI/CD build and release workflows across 30+ engineering teams, reducing deployment inconsistencies.

🔬

Enterprise Observability

Designed full-stack observability platforms using Datadog and Dynatrace, covering metrics, logs, traces, RUM, and synthetics.

🔒

DevSecOps by Design

Embedded security controls — IAM, artifact scanning, policy validation — directly into CI/CD pipelines across the enterprise.

🏛️

Internal Developer Platform

Designed and deployed a Backstage-powered IDP at Ford, giving 30+ teams golden path templates for CI/CD, service onboarding, and observability — reducing new service time-to-production from weeks to days.

☁️

40% Cloud-Native Adoption Increase

Led large-scale containerization and migration to AWS ECS and EKS, with standardized pipelines and DevSecOps controls embedded at every stage of the delivery lifecycle.

🔧

35% Ops Automation Gain

Built automation for cluster upgrades, incident recovery, and capacity scaling — reducing manual operational effort by 35% and enabling the SRE team to focus on higher-leverage reliability work.

05.

Philosophy

🎯

Reliability is a Product Feature

SLOs aren't just metrics — they're a contract with your users. I design reliability systems that translate technical signals into business outcomes, giving teams the data to make confident risk decisions instead of reactive ones.

🔁

Toil is the Enemy of Scale

Every manual runbook step, every click-to-deploy, every alert that requires human interpretation is debt. I build platforms that automate the predictable so engineers can focus on the novel.

🏗️

Platforms Over Point Solutions

The best infrastructure work is invisible to developers. I design internal platforms with golden paths — opinionated defaults that make the right thing the easy thing, reducing cognitive load at scale.

06.

Case Studies

Enterprise Observability Platform at Ford

Datadog · Dynatrace · OpenTelemetry · AWS · GCP
Problem

Ford's customer-facing platforms lacked unified observability. Teams were operating in silos with inconsistent alerting, no SLO definitions, and MTTR averaging over 45 minutes for Sev-1 incidents.

Approach

Designed and implemented a full-stack observability platform using Datadog and Dynatrace, standardizing metrics, structured logging, distributed tracing, RUM, and synthetic monitoring. Reworked alerting from noisy to signal-based by alerting on error budget burn rate. Authored org-wide SLO/SLI frameworks and runbook standards.
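
To make the burn-rate idea concrete, here is a minimal Python sketch of a multi-window burn-rate check. It is illustrative only and not the alerting configuration used at Ford: the `Window` type, the function names, and the 14.4 paging threshold (a commonly cited value that corresponds to spending 2% of a 30-day error budget in one hour) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Request counts observed over one lookback window (hypothetical type)."""
    total: int
    errors: int


def burn_rate(window: Window, slo_target: float) -> float:
    """Return how fast the error budget is being consumed.

    1.0 means the budget would last exactly the full SLO period;
    values above 1.0 mean it would run out early.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if window.total == 0:
        return 0.0  # no traffic, so no budget is being spent
    error_rate = window.errors / window.total
    budget = 1.0 - slo_target  # allowed fraction of failed requests
    return error_rate / budget


def should_page(long_w: Window, short_w: Window, slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page only when both windows burn faster than the threshold.

    The long window shows the burn is significant; the short window
    shows it is still happening, which filters out stale alerts.
    """
    return (burn_rate(long_w, slo_target) >= threshold
            and burn_rate(short_w, slo_target) >= threshold)
```

With a 99.9% SLO, a window in which 2% of requests fail burns budget at roughly 20 times the sustainable rate, so it pages only if the short window confirms the problem is ongoing. Requiring both windows is the design choice that turns a raw error count into a signal tied to user impact.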

Outcome

Reduced MTTR by 35%. Reduced repeat Sev-1 incidents by 40%. Gave leadership real-time SLO dashboards for business-critical user journeys.

Internal Developer Platform (IDP) with Backstage

Backstage · Kubernetes · GitHub Actions · Terraform · DevSecOps
Problem

At Ford, 30+ engineering teams operated with inconsistent CI/CD pipelines, duplicated infrastructure boilerplate, and no standardized service onboarding. New services took weeks to reach a production-ready state.

Approach

Architected and deployed a centralized Internal Developer Platform using Backstage as the developer portal. Defined golden path templates for service creation, CI/CD pipeline setup, secrets management, and observability integration. Embedded DevSecOps controls — IAM enforcement, artifact scanning with Fossa/SonarQube/Checkmarx, and policy validation — directly into the platform.

Outcome

Standardized build and release workflows across 30+ teams. Reduced new service time-to-production from weeks to days. Increased cloud-native adoption by 40%.

Enterprise VM-to-Kubernetes Migration

Kubernetes · OpenShift · Tekton · Terraform · RBAC
Problem

Ford's workloads were running on aging VM-based infrastructure with poor resource utilization, slow deployment cycles, and limited scalability for bursty traffic patterns.

Approach

Led the enterprise migration from VM-based workloads to OpenShift/Kubernetes. Designed Tekton-based Kubernetes-native CI/CD pipelines to replace legacy Jenkins workflows. Implemented RBAC, network policies, namespace isolation, autoscaling, and quota management. Built automation for cluster upgrades and incident recovery.
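
As a rough illustration of the namespace isolation and quota controls mentioned above, the Python sketch below builds two standard Kubernetes objects for a tenant namespace: a `ResourceQuota` and a default-deny ingress `NetworkPolicy`. The namespace name, object names, and quota values are hypothetical and are not the settings used at Ford.

```python
import json


def namespace_guardrails(namespace: str) -> list[dict]:
    """Build baseline guardrail manifests for one tenant namespace."""
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "team-quota", "namespace": namespace},
        # Illustrative caps on what the namespace may request in total.
        "spec": {"hard": {"requests.cpu": "4",
                          "requests.memory": "8Gi",
                          "pods": "20"}},
    }
    deny_ingress = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        # An empty podSelector matches every pod in the namespace; listing
        # "Ingress" with no ingress rules blocks all inbound traffic.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }
    return [quota, deny_ingress]


if __name__ == "__main__":
    # Wrap both objects in a List so they can be emitted as one JSON document.
    manifest = {"apiVersion": "v1", "kind": "List",
                "items": namespace_guardrails("team-a")}
    print(json.dumps(manifest, indent=2))
```

Starting from deny-all and a fixed quota means each team has to opt in to the traffic and capacity it needs, which keeps one tenant's workload from affecting another's.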

Outcome

Accelerated cloud-native adoption by 50%. Achieved 99.9% uptime for critical production workloads. Reduced manual operational effort by 35% through automation.
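
For context on what a 99.9% target permits, the arithmetic can be sketched in a few lines of Python. The function name and the 30-day measurement period are assumptions for illustration.

```python
def allowed_downtime_minutes(availability_target: float,
                             period_days: int = 30) -> float:
    """Return the downtime, in minutes, that an availability target allows."""
    if not 0.0 < availability_target <= 1.0:
        raise ValueError("availability_target must be in (0, 1]")
    period_minutes = period_days * 24 * 60
    # The error budget is whatever fraction of the period the target leaves over.
    return period_minutes * (1.0 - availability_target)


if __name__ == "__main__":
    print(round(allowed_downtime_minutes(0.999), 1))
```

Under these assumptions a 99.9% target leaves about 43.2 minutes of downtime per 30 days, which is why automated upgrades and recovery matter: a single slow manual recovery can use up most of a month's budget.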

07.

Get In Touch

Let's work together.

I'm open to senior SRE, platform engineering, and DevSecOps opportunities. Whether you want to discuss a role, a project, or just connect — my inbox is open.