How I Saved $2500 a Month Killing Heroku

3/2023 Aman Azad


Over at Legaci, we've been a Heroku shop for the past few years because, as a Platform as a Service (PaaS), Heroku offers hands down the best ability for us as a dev team to iterate fast while spending 0.01% of our time on infrastructure.

This became an issue, however, when we were onboarding our biggest client to date, K-pop artist BI, onto our platform. Our servers fell over three times, nearly costing us the relationship. My cofounder and I are engineers by trade. I give us a pass if we fumble other parts of growing a startup, like maintaining our content marketing, accounting, or sales. But fumbling the technical side hurts our pride.

So on top of being on the most expensive Heroku plan at a little over $2500/mo and having our servers go down three times, I decided it was time for us to graduate to AWS. I want to use this space to outline exactly what we were paying for on Heroku and delve into the infrastructure I eventually set up on AWS. This is by no means an optimal setup (I hope the internet lets me know if there are glaring issues), but as a team we need to optimize for time spent finding product-market fit (PMF). From start to finish, the process cost us roughly 2 weeks of development time. Not a bad trade-off.

Cost breakdown on Heroku.

What went into that Heroku cost exactly? Here's a breakdown:

  • ~$2000/mo 4x Performance-L dynos set to auto-scale up to 10 (dynos are Heroku's term for compute resources; at $500/mo per dyno, the maximum spend was 10 × $500 = $5000/mo)
  • ~$100/mo Redis cache
  • ~$200/mo Postgres
  • ~$5/mo Logging with Mezmo
  • ~$10/mo Staging environments
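
The arithmetic behind this list, as a small TypeScript sketch: it totals the baseline bill with 4 dynos and the worst case if autoscaling sat at its 10-dyno ceiling all month. The prices are the approximate figures above; the type, constant, and function names are illustrative only.

```typescript
// Approximate monthly Heroku line items from the cost breakdown (USD).
// Names are illustrative, not taken from any real codebase.
interface LineItem {
  name: string;
  monthlyCost: number;
}

const DYNO_PRICE = 500; // Performance-L dyno, ~$500/mo each
const BASELINE_DYNOS = 4;
const MAX_DYNOS = 10; // autoscaling ceiling

const addOns: LineItem[] = [
  { name: "Redis cache", monthlyCost: 100 },
  { name: "Postgres", monthlyCost: 200 },
  { name: "Mezmo logging", monthlyCost: 5 },
  { name: "Staging environments", monthlyCost: 10 },
];

// Sum of the flat add-on costs.
function addOnTotal(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + item.monthlyCost, 0);
}

// Monthly bill if `dynos` dynos run for the entire month.
function monthlyBill(dynos: number): number {
  return dynos * DYNO_PRICE + addOnTotal(addOns);
}

const baseline = monthlyBill(BASELINE_DYNOS); // 2000 + 315 = 2315
const worstCase = monthlyBill(MAX_DYNOS); // 5000 + 315 = 5315
console.log(`baseline: $${baseline}/mo, worst case: $${worstCase}/mo`);
```

The ~$2315 baseline sits a bit under the ~$2500 monthly bill; the gap is presumably time spent autoscaled above four dynos, which is billed on top of the baseline.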

I'm fully aware we were using far more compute resources than necessary, but again, a throughline here is that our team needs to prioritize development speed. We probably could've stayed on Heroku much longer had we sat down and optimized our code, but frankly, as a pre-PMF company, that's not a good use of our time.

AWS infrastructure.

We chose a very standard AWS infrastructure based on ECS. The full list of services used is below:

  • RDS for our Postgres database
  • Redis Labs instance for our caching layer
  • AWS Secrets Manager for environment variables
  • ECR to host our Docker images
  • ECS clusters and services to host our main backend app and workers
  • ELB for load balancing
  • CodePipeline for automated deploys on pushes to GitHub
  • CloudWatch for monitoring
  • AWS's Distributed Load Testing CloudFormation template for load testing
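
On the app side, a minimal sketch of consuming those environment variables, assuming the Secrets Manager values reach the container as plain environment variables (ECS task definitions can map secrets to environment variables through their secrets field). The variable names DATABASE_URL and REDIS_URL and the loadConfig helper are hypothetical.

```typescript
// Hypothetical variable names; substitute whatever the app actually needs.
const REQUIRED_VARS = ["DATABASE_URL", "REDIS_URL"] as const;

type RequiredVar = (typeof REQUIRED_VARS)[number];
type AppConfig = Record<RequiredVar, string>;

// Validate the environment once at startup so a missing secret fails the
// ECS task immediately instead of surfacing later as a runtime error.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const config = {} as AppConfig;
  for (const name of REQUIRED_VARS) {
    config[name] = env[name] as string;
  }
  return config;
}

// In the app entry point (hypothetical): const config = loadConfig(process.env);
```

Failing fast keeps a misconfigured task from passing its first deploy silently; ECS will show the task stopping with the error in its logs.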

I know some folks swear by Terraform or CloudFormation templates, but we wanted to get the migration done ASAP, and learning those config languages looked like more trouble than just using the AWS console. All of the infrastructure above was created and set up through the AWS console, and it took very little time to scaffold.

Serverless.

We had so much success using Next.js for our frontend app and deploying on Vercel that we wanted to keep using serverless on our backend if possible. We considered Elastic Beanstalk: coming from Heroku, we knew we liked that kind of deployment ease, but we also wanted a bit more control over our infrastructure.

GitHub pushes and monitoring.

One thing Heroku had built out amazingly well was quality of life around deploys triggered by pushes to GitHub branches, and around monitoring. We wanted to fully replicate the ease of deploying by pushing to a branch, so CodePipeline was a must for this implementation.

Load testing.

AWS has built an amazing tool for this: its Distributed Load Testing CloudFormation template. Using the prebuilt template, we were able to get the tool up and running, record the network requests made on page load with JMeter, and get operational with load testing before release in a matter of hours. More on this below.

Changes needed in our app.

Dockerizing the backend.

The biggest blocker to migrating our backend to AWS was that it wasn't serverless. We had a monolithic JavaScript app running our GraphQL server, plus one additional worker process. I wanted an easy deploy process on AWS, so the first order of business was to dockerize our backend services. All of our code lives in a single NodeJS-TypeScript monorepo, so we have two separate Dockerfiles, one building our web image and one building our worker image, each running a different NodeJS start command for one of our two core processes:

/docker/web/Dockerfile
/docker/worker/Dockerfile

In our docker-compose.yml:

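```yaml
# /docker-compose.yml
# dockerfile must sit under build:, resolved relative to build.context
services:
  web:
    build:
      context: . # assumed: monorepo root
      dockerfile: ./docker/web/Dockerfile
    ...

  worker:
    build:
      context: . # assumed: monorepo root
      dockerfile: ./docker/worker/Dockerfile
    ...
```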

Database migrations with knex.

We use Objection.js/Knex as the ORM in our backend app. Heroku made it extremely easy to run Knex migrations with its heroku run command. Since we were going serverless, we needed a way to run the migration commands (yarn knex migrate:latest and yarn knex migrate:rollback) remotely. As I didn't see an immediate way to run a Node command against an ECS service, we added a protected API endpoint to trigger these migrations manually.
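
The endpoint's code isn't shown here. As a hedged sketch, this is one way a token-protected, framework-agnostic migration handler might look. It assumes the migrator passed in is Knex's knex.migrate object, whose latest() and rollback() methods resolve to a [batchNo, log] pair; the handleMigration and safeEqual names, the token, and the result shape are hypothetical.

```typescript
// Minimal shape of the migrator we call; knex.migrate fits it, since its
// latest() and rollback() resolve to [batchNo, list of migration files].
interface Migrator {
  latest(): Promise<[number, string[]]>;
  rollback(): Promise<[number, string[]]>;
}

interface HandlerResult {
  status: number;
  body: { error?: string; batch?: number; migrations?: string[] };
}

// Constant-time comparison for equal-length strings, so the token check
// does not leak how many leading characters matched.
function safeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Framework-agnostic handler: wire it into whatever HTTP server the app uses.
async function handleMigration(
  action: string,
  providedToken: string | undefined,
  expectedToken: string,
  migrator: Migrator,
): Promise<HandlerResult> {
  if (!expectedToken || !providedToken || !safeEqual(providedToken, expectedToken)) {
    return { status: 401, body: { error: "unauthorized" } };
  }
  if (action !== "latest" && action !== "rollback") {
    return { status: 400, body: { error: `unknown action: ${action}` } };
  }
  try {
    const [batch, migrations] =
      action === "latest" ? await migrator.latest() : await migrator.rollback();
    return { status: 200, body: { batch, migrations } };
  } catch (err) {
    return { status: 500, body: { error: String(err) } };
  }
}
```

Wiring it up might mean a route such as POST /internal/migrate/:action (hypothetical) that reads the token from a request header and calls handleMigration(action, token, process.env.MIGRATION_TOKEN ?? "", knex.migrate). Keeping the HTTP framework out of the handler makes it easy to test with a fake migrator.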

Migration plan and go-live.

With our infrastructure created, backend dockerized, and migrations handled, we were by and large ready to move over to AWS. We moved everything over on our staging environment first and ran our load tests, then migrated all data over to our production instances, ran load tests again, and finally pointed our frontend to the new AWS infrastructure.

Data migration on staging.

For both staging and production, we had to dump and restore our Postgres data and our Redis cache. On staging this wasn't an issue, as usage was internal only: we dumped the Postgres database, restored the data on RDS, and repeated the process for Redis.
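
The exact commands aren't listed here. A minimal sketch of one common approach, assuming a straight pg_dump/pg_restore round trip between two connection URLs, just assembles the argument lists so they can be reviewed before running (for example with Node's child_process.spawnSync(argv[0], argv.slice(1), { stdio: "inherit" })). The URLs, file name, and buildDumpPlan helper are placeholders.

```typescript
interface DumpPlan {
  dump: string[];
  restore: string[];
}

// Build argv arrays for a pg_dump -> pg_restore round trip.
// sourceUrl / targetUrl are placeholder connection strings.
function buildDumpPlan(sourceUrl: string, targetUrl: string, file = "app.dump"): DumpPlan {
  return {
    // Custom-format archive; skip ownership and GRANTs, since the source
    // host's generated role names won't exist on RDS.
    dump: [
      "pg_dump",
      "--format=custom",
      "--no-owner",
      "--no-acl",
      `--dbname=${sourceUrl}`,
      `--file=${file}`,
    ],
    // Restore the archive straight into the target database.
    restore: ["pg_restore", "--no-owner", "--no-acl", `--dbname=${targetUrl}`, file],
  };
}
```

Building the commands as arrays rather than shell strings avoids quoting problems with passwords in the URLs; the same plan works for staging and production by swapping the URLs.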

Load testing.

Using the AWS tool mentioned earlier, we ran load tests against our new staging infrastructure with auto-scaling configured as we'd use it in production, and we successfully handled 3x the largest traffic spike we had ever seen on our site! Great news. At this point we felt confident migrating our backend fully to AWS.

Database migration on production.

Production was trickier, as users were live on the site mutating data. To solve this, we temporarily shut down our services behind a maintenance-mode landing page, waited until Google Analytics reported 0 active users, ran the database dump, restored it to RDS, and repeated this for our Redis cache.
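
A maintenance-mode landing page keeps users out of the app; as an illustrative sketch (not necessarily how it was done here), the same freeze can also be enforced on the API so no writes land in the old database mid-dump. The MAINTENANCE_MODE flag, the /health path, and the maintenanceGate helper are hypothetical.

```typescript
interface GateResult {
  allowed: boolean;
  status?: number;
  headers?: Record<string, string>;
}

// Decide whether a request may proceed. `maintenanceMode` would come from a
// hypothetical MAINTENANCE_MODE flag; /health stays open so health checks pass.
function maintenanceGate(path: string, maintenanceMode: boolean, retryAfterSeconds = 600): GateResult {
  if (!maintenanceMode || path === "/health") {
    return { allowed: true };
  }
  return {
    allowed: false,
    status: 503, // Service Unavailable: tells clients the outage is temporary
    headers: { "Retry-After": String(retryAfterSeconds) },
  };
}
```

A 503 with Retry-After signals a temporary outage, which is friendlier to clients and crawlers than timeouts or generic errors while the dump runs.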

At this point, all the production infrastructure was in place:

  • Domains were set up correctly with SSL
  • Load balancing and ECS were set up with enough compute resources and auto-scaling settings
  • Logging and monitoring in place
  • All data migrated over to AWS's RDS

As soon as the data dump finished restoring on AWS, we rebuilt our frontend on Vercel with the backend API URL pointing to our new AWS infrastructure, and took the site out of maintenance mode!

Pricing difference and performance.

Once we migrated our infrastructure over to AWS, our estimated monthly cost came to roughly $400. This was amazing to see, as our load testing showed we could handle 3x the volume of traffic at roughly a sixth of the price. But we were even luckier.

There's no beating $FREE. When we incorporated our business using Stripe Atlas, we had been given a $10,000 AWS credit that we never cashed in. Now that the bulk of our infrastructure was on AWS, it made sense to finally activate those credits. We've had $0 backend server costs ever since.
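
The credit math, as a quick TypeScript sketch using the figures above. The helper name is hypothetical, and it assumes a steady monthly spend and ignores any expiration date on the credits.

```typescript
// Whole months a fixed credit balance covers at a steady monthly spend.
function creditRunwayMonths(credits: number, monthlySpend: number): number {
  if (monthlySpend <= 0) return Infinity; // nothing to pay for
  return Math.floor(credits / monthlySpend);
}

const awsCredits = 10_000; // Stripe Atlas AWS credit
const awsMonthlyEstimate = 400; // estimated AWS bill after the migration
const herokuMonthly = 2_500; // approximate Heroku bill before the migration

console.log(creditRunwayMonths(awsCredits, awsMonthlyEstimate)); // 25
console.log(herokuMonthly - awsMonthlyEstimate); // 2100 saved per month before credits
```

While the credits last, the savings work out to the full ~$2500 Heroku bill; once they run out, savings drop to roughly $2100 a month at the ~$400 estimate.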

We should've made the move much sooner, but as with everything in building a startup, the switching cost was one we were never quite willing to bear until a client relationship was at stake. I could've gone much deeper into the nitty-gritty of how everything was glued together, but I hope this gives an overarching picture of how we migrated for other folks in similar positions.