May 20, 2026 · 3 min · Startup & Tech

I Ran AI on a Galaxy S20

No cloud. No internet. On a chip released back in 2020, a language model was actually running inference, right in my hand.

At first I was just curious.

Could AI really run on a device you carry in your hand? With no server, no subscription, and my data never uploaded anywhere.

I installed Termux on the Galaxy S20. I put llama.cpp on it. I loaded a model.

It ran.

Twenty tokens per second.

You could call that slow. But this is the Galaxy S20, which Samsung announced in February 2020. Even then, it was touting a mobile revolution that fused 5G, AI, and IoT.
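To put 20 tokens per second in perspective, here is a rough back-of-the-envelope sketch. The 0.75 words-per-token ratio is a common rule of thumb for English text, and the 200-token reply length is a made-up example. Neither is a measured figure, and the function names are purely illustrative.

```python
# Rough feel for a decode speed. All constants here are illustrative assumptions.
WORDS_PER_TOKEN = 0.75  # assumed rule of thumb for English; varies by tokenizer


def seconds_for_reply(reply_tokens: int, tok_per_s: float) -> float:
    """Time to stream a reply of `reply_tokens` at a steady decode speed."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return reply_tokens / tok_per_s


def words_per_second(tok_per_s: float) -> float:
    """Approximate reading-equivalent output rate for a given decode speed."""
    return tok_per_s * WORDS_PER_TOKEN


if __name__ == "__main__":
    speed = 20.0        # tok/s, the Galaxy S20 figure from this post
    reply_tokens = 200  # hypothetical reply length
    wait = seconds_for_reply(reply_tokens, speed)
    print(f"{wait:.1f} s for a {reply_tokens}-token reply")
    print(f"about {words_per_second(speed):.0f} words per second")
```

Under these assumptions the text streams out faster than most people read, which is why 20 tok/s can feel usable even though it is far from server speeds.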

And now, on that old device, a language model was actually running inference.

Not a single call to the cloud. Right in my hand.

In that moment, I became certain.

Until now, the AI market has revolved around the cloud. Massive data centers, enormous power draw, HBM, DRAM, foundries, tens of billions of dollars in infrastructure investment.

At the center of this current sits Samsung DS, the Device Solutions division that makes Samsung's memory and chips.

But cloud AI has structural limits.

First, it can't properly see the most personal data. My gallery, my notes, my routines, the context of how I use my apps. These are too private to upload to a server.

Second, the multi-step reasoning that truly personalized AI demands is too expensive to run in the cloud on every single query.

In the end, the last mile of personal AI is completed not on the server but on the device.

On-device AI solves both problems at once. Privacy and cost. Personalization and persistence. Speed and control.

That was when I started seeing Samsung DX, the Device eXperience division behind Galaxy, in a new light.

Galaxy sits in the pockets of countless people around the world. Samsung is extending Galaxy AI beyond the S series into a broader mobile ecosystem, and it has signaled a plan to grow its base of AI-equipped smartphones past 800 million units by 2026.

DX's new CTO is also worth watching. President Yoon Jang-hyun, who previously led Samsung Venture Investment, has moved up to CTO of the DX Division and head of Samsung Research. He has deep ties to future technologies: software platforms, IoT, Tizen, AI, and robotics.

The infrastructure is there. The devices are there. A clear direction is there too.

What had been missing until now was the runtime.

I spent seven months building exactly that.

MAEUM Runtime.

The figures below come from internal benchmarks run on an Apple M5.

Hallucination on unanswerable questions: 23% → 0%.

Personal memory recall accuracy: 100% at 733 ms.

Memory recall: 8.5x faster than before.

On-device inference speed: 70 tok/s.

Memory bandwidth utilization: up to 96% of the ceiling.

And on the Galaxy S20, 20 tok/s.
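For context on what a "memory bandwidth ceiling" means: when a dense model generates one token at a time, it has to read roughly all of its weights from memory for each token. A common approximation therefore puts the upper bound on decode speed at memory bandwidth divided by model size. The sketch below shows that arithmetic only. The bandwidth and model size are hypothetical placeholders, not the specs of the M5 or the Galaxy S20, and the function names are illustrative rather than part of MAEUM Runtime.

```python
def decode_ceiling_tok_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Approximate upper bound on decode speed for a dense model at batch size 1.

    Assumes every generated token requires reading all weights once,
    so the ceiling is bandwidth / model size.
    """
    if bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    return bandwidth_gb_s / model_size_gb


def bandwidth_utilization(measured_tok_per_s: float, ceiling_tok_per_s: float) -> float:
    """Fraction of the bandwidth-bound ceiling that a measured speed reaches."""
    if ceiling_tok_per_s <= 0:
        raise ValueError("ceiling must be positive")
    return measured_tok_per_s / ceiling_tok_per_s


if __name__ == "__main__":
    bandwidth_gb_s = 100.0  # hypothetical memory bandwidth, not a real device spec
    model_size_gb = 4.0     # hypothetical quantized model size
    measured = 24.0         # hypothetical measured decode speed in tok/s

    ceiling = decode_ceiling_tok_per_s(bandwidth_gb_s, model_size_gb)
    used = bandwidth_utilization(measured, ceiling)
    print(f"ceiling: {ceiling:.1f} tok/s, utilization: {used:.0%}")
```

Read this way, a runtime that is already close to the ceiling gains little from further software tuning on the same hardware. The remaining levers are a smaller or more heavily quantized model, or a device with more memory bandwidth.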

If a 2020 chip can do this much, how far could it go on the latest Galaxy flagship?

The race over models will continue. But the next battlefield isn't just model size.

This is now the era of runtime engineering.

More than which model you use, what comes to matter is where you run it. How cheaply you run it. How safely you handle personal data. How long, how fast, and how consistently you can run inference.

If DS holds the infrastructure of cloud AI, DX stands at the front line of on-device AI.

I don't think the market has fully priced in this possibility yet.

But the moment AI comes down from the server to the device, the most important battlefield shifts.

That shift has already begun.

Watching AI run on a Galaxy S20, I saw it for myself.

Originally published on Brunch · May 20, 2026

Lee · Lee's Blueprint

Founder, MAEUM.io
