Dustin Gardner

I fix compute platforms when cost and scale break at the same time.

Sole infrastructure engineer responsible for a certification-critical simulation platform; failure would have blocked program-level analysis.

Reduced a projected $6.5 M AWS spend trajectory to $1.5 M while scaling throughput 10× within the first year. Built and owned the cost-governed compute platform end to end as the sole infrastructure engineer on the program.

Reduced projected AWS spend by 77%

$6.5 M → $1.5 M over first year

1.5 B+ simulated flight seconds per week

Certification-scale throughput

30,000+ Monte Carlo runs per campaign

Across autoscaled EC2 fleets

Sole owner of the full platform stack

Every layer, end to end

Full architecture + cost breakdown →

This failure mode is common.

If your compute costs are rising faster than your output, you are already here.

Compute systems scale before cost control does. At first it looks like progress: more runs, more data, more throughput. Then it flips: cost grows faster than output, pipelines stall behind bottlenecks, and reliability issues appear only at scale.

When cost and scale break at the same time, the problem is not just financial. Teams slow down. Confidence in results drops. Critical decisions get delayed. In programs where simulation output drives certification or product direction, that becomes a schedule risk.

At that point the system is no longer accelerating the team. It is the bottleneck.

Cost-Governed Compute Platform

If you're dealing with this failure mode, this is the system I bring in.

Architecture Pattern

This pattern is not unique to aerospace. It shows up anywhere large-scale compute is pushed without tight cost and execution control:

  • Monte Carlo and simulation platforms
  • ML training infrastructure and GPU fleets
  • Batch compute systems with complex pipelines
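At the heart of this pattern is workload-level cost attribution: rolling raw instance-hours up to the campaign or workload that incurred them, then comparing against a budget. A minimal sketch of that control loop (record shape, campaign names, and rates are all hypothetical; in practice the inputs come from billing exports and a pricing table):

```python
def attribute_spend(usage_records, hourly_rates):
    """Roll raw instance-hours up into per-campaign spend.

    `usage_records` and `hourly_rates` use a hypothetical shape here;
    real data would come from billing exports and a pricing table.
    """
    spend = {}
    for rec in usage_records:
        cost = rec["hours"] * hourly_rates[rec["instance_type"]]
        spend[rec["campaign"]] = spend.get(rec["campaign"], 0) + cost
    return spend

def over_budget(spend, budgets):
    """Return the campaigns whose attributed spend exceeds their budget."""
    return [c for c, total in spend.items()
            if total > budgets.get(c, float("inf"))]

# Hypothetical usage: two campaigns, flat $1/hour rates for illustration.
records = [
    {"campaign": "mc-042", "instance_type": "c5.4xlarge", "hours": 120},
    {"campaign": "mc-042", "instance_type": "c5.4xlarge", "hours": 30},
    {"campaign": "mc-043", "instance_type": "r5.2xlarge", "hours": 10},
]
rates = {"c5.4xlarge": 1, "r5.2xlarge": 1}
spend = attribute_spend(records, rates)   # {"mc-042": 150, "mc-043": 10}
flagged = over_budget(spend, {"mc-042": 100})
```

The design choice that matters is attributing spend at the workload level rather than the account level: only then can a team see which campaign is burning the budget and gate it before the bill arrives.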

See how this played out at scale →

Experience

Senior HPC Platform Engineer

2022 – Present

Blue Origin, Denver, CO

  • Replaced a failing simulation platform under active program pressure, reducing projected AWS spend from $6.5 M to $1.5 M while scaling throughput 10×. Sole infrastructure engineer. Owned every layer.
  • Decomposed tightly coupled simulation execution into a distributed orchestration platform bridging MATLAB/Simulink autocoded flight software into large-scale AWS compute, running tens of thousands of parallel runs per campaign.
  • Diagnosed and eliminated quadratic merge costs by architecting map-reduce data pipelines with tree-structured merge strategies and signal-level decomposition, sustaining linear performance at certification scale.
  • Owned campaign-scale reliability across 180,000+ stage executions: spot interruption recovery, scale-dependent race condition diagnosis, silent container failure detection.
  • Built a real-time cost governance system (Datadog, Prometheus) providing per-campaign and per-workload spend visibility; authored proposal to establish a dedicated HPC & Simulation Platform team across multiple spacecraft programs.
  • Sole technical bridge between GNC domain engineers and infrastructure, translating flight dynamics requirements into distributed compute architecture decisions.
  • Mentor 2 GNC engineers on CS fundamentals, HPC patterns, and cloud infrastructure, with 3–5 additional informal mentees.
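The quadratic-merge fix above rests on a standard trick: folding N result shards into an accumulator one at a time re-copies the growing accumulator on every step, so total work is quadratic, while a pairwise tree reduction touches each element only O(log N) times. A minimal sketch with sorted lists standing in for simulation result shards (shard contents are illustrative only):

```python
from functools import reduce

def merge(a, b):
    """Merge two sorted shards (stand-in for any pairwise combine step)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def sequential_merge(shards):
    # Naive left fold: each step re-copies the whole accumulator,
    # so total work grows quadratically with element count.
    return reduce(merge, shards)

def tree_merge(shards):
    # Pairwise tree reduction: merge neighbors level by level, so each
    # element is copied only O(log N) times -- near-linear total work.
    level = list(shards)
    while len(level) > 1:
        level = [merge(level[i], level[i + 1]) if i + 1 < len(level)
                 else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]
```

Both produce identical output; the tree shape only changes how the work is scheduled, which is also what makes each level trivially parallelizable across a fleet.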

DevSecOps Tech Lead / Staff Software Engineer

2017 – 2022

Lockheed Martin Space, Denver, CO

  • Managed CI/CD and simulation infrastructure for 100+ engineers across 80+ servers, sustaining 40,000 jobs/week at 99.5% uptime. Promoted through three levels from Testbed & Simulation Software Engineer to Staff Software Engineer.
  • Migrated 100+ legacy simulation and testbed workflows to cloud-native CI/CD in AWS with autoscaling compute.
  • Owned hardware-in-the-loop testbeds through multiple successful qualification events as simulation product owner.
  • Led Digital Twin initiative from concept to production as scrum master; delivered customer demonstrations to the Air Force.
  • Mentored junior developers as part of day-to-day work and broader team culture.

Software Engineer Intern

Summers 2013–2016

Lockheed Martin Space, Huntsville & Denver

Graduate Teaching Assistant

2012 – 2016

Tennessee Tech, Cookeville, TN

Skills

Distributed Compute Architecture
Large-scale Monte Carlo orchestration, map-reduce pipeline design, CPU affinity optimization, campaign-scale reliability engineering
Cloud Cost Engineering
AWS (EC2, S3, EKS), spot fleet strategy, workload-level cost attribution, autoscaling, Terraform, right-sizing
Platform & Observability
Datadog, Prometheus, Grafana, GitLab CI, Docker, Kubernetes, OpenShift, infrastructure-as-code, Linux (RHEL, Ubuntu)
Programming
Python, C/C++, Bash, Git


Education

M.S. Computer Science

May 2017 | 3.9 GPA

Tennessee Tech University

Advanced coursework in High-Performance Computing; Thesis: Autonomic Protection Systems

B.S. Computer Science, cum laude

May 2015 | 3.5 GPA

Tennessee Tech University

Minor: Statistical Mathematics

Other

Eagle Scout | Active in skiing, climbing, and mountaineering

Spending more but shipping less?

Typical engagements: architecture audit, cost reduction, platform redesign.