How Goblins Cut Inference Time by 50%
Eli Mernit
About Goblins
Goblins uses AI to bring students one-on-one math support. Founded by a former math teacher and AI engineer, Goblins lets students draw math on any device using a digital whiteboard, and gives them instant feedback that builds the conceptual foundation they’re missing. Today, Goblins serves thousands of students across North America.
Goblins launched in early 2024 and needed OCR for handwritten text and diagrams, a task that historically involved a patchwork of segmented processing and third-party OCR providers.

Challenges Before Beam
Goblins has a real-time app, which means inference performance and reliability are critical to their product experience.
Before turning to Beam, Goblins ran their OCR models on another serverless GPU provider, and suffered from long boot times, reliability issues, and high costs.
Last spring, Goblins began searching for an inference provider with leading performance and reliability.

Achieving Faster Inference with Beam
Goblins was able to easily migrate their workloads to Beam.
The team replaced their prior setup with Beam for inference tasks, leveraging 4090 GPUs for their OCR model.
In addition, the team used Beam’s scale-to-zero functionality to maintain minimal instances during off-peak hours (e.g., nights) and spin up GPUs during high-traffic periods.
“Our product is used in schools and during the evenings for at-home practice, so we need our GPUs to spin down at night when students aren’t using it. Beam lets us scale dynamically without paying for always-on GPUs.” - Alp Karavil, CTO @ Goblins
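To make the scale-to-zero idea concrete, here is a minimal Python sketch of the kind of decision an autoscaler makes. This is not Beam's API or Goblins' configuration: the `ScalePolicy` class, the `desired_replicas` function, and every threshold below are hypothetical, chosen only to illustrate the behavior described above.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalePolicy:
    # All values are hypothetical and exist only for illustration.
    max_replicas: int = 4            # upper bound on GPU containers
    requests_per_replica: int = 10   # in-flight requests one replica can serve
    idle_timeout_s: float = 300.0    # how long to stay warm with no traffic


def desired_replicas(in_flight: int, idle_s: float, current: int,
                     policy: ScalePolicy) -> int:
    """Return how many GPU replicas should be running right now."""
    if in_flight > 0:
        # Traffic present: size the pool to the load, capped at the maximum.
        needed = math.ceil(in_flight / policy.requests_per_replica)
        return min(policy.max_replicas, needed)
    if idle_s < policy.idle_timeout_s:
        # Recently idle: keep at most one warm replica to avoid a cold start.
        return min(current, 1)
    # Idle past the timeout: scale to zero so no GPU is billed.
    return 0


policy = ScalePolicy()
print(desired_replicas(25, 0.0, 1, policy))    # 3 (school-day traffic)
print(desired_replicas(0, 60.0, 3, policy))    # 1 (brief lull, stay warm)
print(desired_replicas(0, 600.0, 1, policy))   # 0 (overnight, scale to zero)
```

The trade-off sits in the idle timeout: a longer timeout means fewer cold starts after a quiet period, while a shorter one means less paid idle GPU time.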
Beam’s developer workflow also helps Goblins debug their apps faster than they could on their previous provider.

Scaling OCR Inference
Since moving to Beam, Goblins is achieving 50% faster inference on their OCR models, which significantly enhances the user experience for students.
“Beam was a lot faster than we thought. We assumed it would be better [than our previous provider], but the performance blew us away.” - Alp Karavil, CTO @ Goblins
In addition, running inference on Beam cut monthly costs from over $1,000 to under $600 for certain workloads. These savings allowed the team to scale up their GPU utilization strategically.
Make sure to check out Goblins and give their app a try today!
