What it takes to build a live video calling app
A two-person video call looks finished in an afternoon. The real work is everything after — connecting strangers on bad networks, holding latency, matching people in real time, metering minutes, and keeping it safe. Field notes from shipping live video.
A two-person video call is the most deceptive thing we build. In a demo it takes an afternoon — drop in an SDK, point two phones at each other, wave. It looks finished. Then you ship it to real people on real networks, and you find out the afternoon bought you maybe a tenth of the actual product. The rest is everything that happens when the network, the wallet, and two strangers all refuse to behave.
We build live video and social-discovery apps for a living — the kind where someone opens the app, gets matched with a stranger, and is face to face within seconds. Here’s what that work actually involves, written for the founder weighing it up rather than the engineer who’ll inherit it.
Connecting two strangers is the hard part
Placing a call between two phones sounds like the whole job. It’s the easy half. The hard half is that the two phones have to find each other across the open internet, and most phones are hiding — behind home routers, office firewalls, and mobile carriers that quietly rewrite every connection. Two devices that both think they’re behind a wall can’t simply dial one another.
The fix is a relay server that sits in the middle and passes the video through when a direct connection can’t be made. Skip it, and a real slice of calls — roughly one in five, and more on locked-down office or cellular networks — never connect at all. The user doesn’t see a networking failure; they see a black screen and decide your app is broken. So a chunk of the early build goes to the plumbing nobody demos: signaling, relays, and a graceful answer for the call that won’t connect.
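For the engineer who inherits this, the relay fallback usually shows up as connection configuration. The sketch below assumes a WebRTC-style stack, which this article does not otherwise specify. The hostnames, the `TurnCredentials` shape, and the `buildIceConfig` helper are all hypothetical. The local types mirror the shape of the browser's `RTCConfiguration` so the example compiles outside a browser.

```typescript
// Local types mirroring the shape of the browser's RTCConfiguration,
// declared here so the sketch compiles outside a browser.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface IceConfig {
  iceServers: IceServer[];
  iceTransportPolicy: "all" | "relay";
}

// Hypothetical: short-lived relay credentials issued by your own backend.
interface TurnCredentials {
  username: string;
  credential: string;
}

// Builds a configuration that tries a direct path and keeps a relay available.
// Hostnames are placeholders, not real servers.
function buildIceConfig(creds: TurnCredentials, forceRelay = false): IceConfig {
  return {
    iceServers: [
      // STUN lets a device discover its public address for a direct path.
      { urls: "stun:stun.example.com:3478" },
      // TURN is the relay that carries media when no direct path works.
      {
        urls: [
          "turn:turn.example.com:3478?transport=udp",
          "turns:turn.example.com:443?transport=tcp",
        ],
        username: creds.username,
        credential: creds.credential,
      },
    ],
    // "all" allows direct and relayed paths; "relay" forces the relay only.
    iceTransportPolicy: forceRelay ? "relay" : "all",
  };
}
```

In a browser the result would be passed to `new RTCPeerConnection(...)`. Forcing `"relay"` in a test build is one way to confirm the relay path works before a stranger on a train finds out that it does not.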
“Works on my wifi” and “works for a stranger on a train” are two completely different products.
Latency is a feeling, not a number
You can measure delay in milliseconds, but users feel it as awkwardness. Past about half a second of lag, a conversation stops working — people talk over each other, both stop, both restart, and it feels stilted in a way they can’t name and won’t tolerate. Real-time video lives under a tight budget, well under half a second from one person’s camera to the other’s screen, and every layer you add spends against it.
Holding that budget is ongoing work, not a setting you flip on. It means choosing infrastructure built for two-way conversation rather than one-way broadcast, killing the echo when both people use speakerphone, and degrading gracefully when the network turns bad — dropping video quality to protect the audio, because a frozen picture is survivable and choppy sound is not.
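The "protect the audio first" rule can be sketched as a small bandwidth allocation. This is a minimal illustration, not how any particular provider does it. The bitrate constants and the `planMedia` function are assumptions chosen for the example.

```typescript
interface MediaPlan {
  audioKbps: number;
  videoKbps: number; // 0 means video is paused
}

// Illustrative numbers only, not recommendations from any provider.
const AUDIO_KBPS = 32;
const MIN_VIDEO_KBPS = 100;
const MAX_VIDEO_KBPS = 1500;

// Splits an estimated bandwidth between audio and video.
function planMedia(estimatedKbps: number): MediaPlan {
  // Audio is funded first, even if that leaves nothing for video.
  const audioKbps = Math.min(AUDIO_KBPS, Math.max(0, estimatedKbps));
  const remaining = estimatedKbps - audioKbps;

  // Below the floor, pause video rather than send an unwatchable picture.
  if (remaining < MIN_VIDEO_KBPS) {
    return { audioKbps, videoKbps: 0 };
  }
  return { audioKbps, videoKbps: Math.min(remaining, MAX_VIDEO_KBPS) };
}
```

With these constants, `planMedia(500)` gives audio its full 32 kbps and video the remaining 468, while `planMedia(100)` keeps the audio and pauses the video entirely.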
Matching people is its own real-time system
If your app pairs strangers — dating, friendship, social discovery, advice, anything one-to-one — the call is only half the product. The other half is the matchmaking layer underneath it: who’s online right now, who’s free, who wasn’t already matched a second ago, and how to put two of them together in the time it takes to tap a button. That’s a live, constantly shifting picture of who’s available that has to stay correct under load.
And it carries a chicken-and-egg problem most consumer founders underestimate: the empty room. The product is worthless until enough people are online at the same moment for a match to feel instant, and waiting in a dead queue is the fastest way to lose the users you paid to acquire. Designing for the cold start — how the app feels with ten people online, not ten thousand — is a product decision, not just an engineering one.
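Pairing and the empty room can both be sketched in one small structure. The `Matcher` class below is hypothetical and deliberately naive. It runs in a single process, pairs strictly first come, first served, and ignores preferences. A production system would need a shared store with atomic operations so two servers cannot hand out the same person twice.

```typescript
interface Waiter {
  userId: string;
  joinedAtMs: number;
}

// In-memory, single-process matcher. An illustration, not a production design.
class Matcher {
  private waiting: Waiter[] = []; // oldest waiter first

  // Returns the partner's id if a match was made, or null if the user now waits.
  join(userId: string, nowMs: number): string | null {
    // Ignore a duplicate join from someone already in the queue.
    if (this.waiting.some((w) => w.userId === userId)) return null;

    // Take the partner out of the queue before returning them,
    // so the same person cannot be matched twice.
    const partner = this.waiting.shift();
    if (partner) return partner.userId;

    this.waiting.push({ userId, joinedAtMs: nowMs });
    return null;
  }

  // Called when a waiting user closes the app or cancels.
  leave(userId: string): void {
    this.waiting = this.waiting.filter((w) => w.userId !== userId);
  }

  // Empty-room check: who has been waiting longer than the limit?
  waitingLongerThan(limitMs: number, nowMs: number): string[] {
    return this.waiting
      .filter((w) => nowMs - w.joinedAtMs > limitMs)
      .map((w) => w.userId);
  }
}
```

The `waitingLongerThan` call is where the product decision lives. What the app does for the people it returns is the cold-start experience.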
Every second on a call costs money
Streaming live video between two people isn’t free, and the bill scales with usage in a way most app features don’t. Each minute of a call carries a real infrastructure cost, which means two things founders rarely plan for. First, your unit economics live or die on that cost-per-minute and whatever you charge or earn against it — the math has to work before you scale, not after. Second, if you bill users by the minute, that meter has to be exactly right, and it has to survive dropped connections, the call that silently fails to end, and the reconnect that mustn’t be charged twice.
We treat call metering the way we treat payments: as something that has to be correct on the bad days, not just the happy path. A billing dispute over a handful of minutes erodes trust faster than almost anything else in a paid app.
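One way to make the meter survive the bad days is to compute billable time from recorded connection segments instead of a running counter. The sketch below is a minimal illustration under assumptions of our own. The `Segment` shape, the heartbeat field, and `billableSeconds` are hypothetical names. Overlapping segments from a reconnect are merged so they are counted once, and a segment that never reported an end is closed at its last heartbeat.

```typescript
interface Segment {
  startMs: number;
  // null when the client never reported an end (the call that silently fails to end).
  endMs: number | null;
  // Last time the server heard from this connection.
  lastHeartbeatMs: number;
}

// Total billable whole seconds for one call, counting overlapping time once.
function billableSeconds(segments: Segment[]): number {
  const closed = segments
    // Close unterminated segments at the last heartbeat, not at "now".
    .map((s) => ({
      start: s.startMs,
      end: s.endMs !== null ? s.endMs : s.lastHeartbeatMs,
    }))
    // Drop empty or inverted segments.
    .filter((s) => s.end > s.start)
    .sort((a, b) => a.start - b.start);

  let totalMs = 0;
  let curStart = 0;
  let curEnd = 0;
  let open = false;

  for (const s of closed) {
    if (open && s.start <= curEnd) {
      // Overlaps or touches the current run: extend it, do not double count.
      curEnd = Math.max(curEnd, s.end);
    } else {
      if (open) totalMs += curEnd - curStart;
      curStart = s.start;
      curEnd = s.end;
      open = true;
    }
  }
  if (open) totalMs += curEnd - curStart;

  // Round down so the user is never charged for a partial second.
  return Math.floor(totalMs / 1000);
}
```

Because the total is derived from the stored segments, it can be recomputed at any time and gives the same answer, which is what makes a billing dispute checkable.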
Safety isn’t a setting — it’s part of the build
The moment you let strangers see each other live, you’ve taken on a duty of care, and it can’t be bolted on at the end. Live video between people who’ve never met needs the safety layer built in from the first weeks: easy in-the-moment tools to block, report, and skip; moderation that can act while a session is live; and a way to confirm the person on the other end is a real person and not a bot or a recording. App stores increasingly require it, users expect it without thinking about it, and the alternative is a product that becomes unusable — or unlistable — the moment it gets popular.
What we’d tell a founder starting one
Stripped down to the advice we give on the first call:
- Buy the media layer, don’t build it. Running your own global video infrastructure is a company in itself. Use a proven provider and spend your engineering on the experience, the matching, and the safety — the parts that make the app yours.
- Instrument every call from day one. Connection success rate, time-to-match, call length, why calls drop. Launch blind to why calls fail and you can’t fix the thing quietly killing retention.
- Design the empty room. Decide how the app feels with a handful of people online. The cold start is a launch problem, and assuming you’ll have a crowd on day one is how good apps end up feeling dead.
- Budget for trust and safety now. Moderation, reporting, and verification aren’t a phase-two line item. They’re the cost of putting strangers on camera together, and far cheaper to build in than to retrofit.
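The instrumentation point above can be made concrete with a small summary over per-call records. This is a sketch under our own assumptions. The `CallRecord` fields and the drop-reason labels are hypothetical, and a real pipeline would read them from an analytics store.

```typescript
interface CallRecord {
  connected: boolean;
  timeToMatchMs: number;
  durationSec: number;
  dropReason?: string; // hypothetical labels, e.g. "ice_failed"
}

interface CallSummary {
  connectionSuccessRate: number; // 0..1
  medianTimeToMatchMs: number;
  averageCallSec: number; // over connected calls only
  dropReasons: Record<string, number>;
}

function summarize(calls: CallRecord[]): CallSummary {
  const dropReasons: Record<string, number> = {};
  let connectedCount = 0;
  let connectedSeconds = 0;

  for (const call of calls) {
    if (call.connected) {
      connectedCount += 1;
      connectedSeconds += call.durationSec;
    }
    if (call.dropReason !== undefined) {
      dropReasons[call.dropReason] = (dropReasons[call.dropReason] || 0) + 1;
    }
  }

  // Median is less distorted than the mean by a few very long waits.
  const waits = calls.map((c) => c.timeToMatchMs).sort((a, b) => a - b);
  const mid = Math.floor(waits.length / 2);
  let median = 0;
  if (waits.length > 0) {
    median = waits.length % 2 === 1 ? waits[mid] : (waits[mid - 1] + waits[mid]) / 2;
  }

  return {
    connectionSuccessRate: calls.length === 0 ? 0 : connectedCount / calls.length,
    medianTimeToMatchMs: median,
    averageCallSec: connectedCount === 0 ? 0 : connectedSeconds / connectedCount,
    dropReasons,
  };
}
```

Splitting the same summary by network type or region is usually the first useful cut, since failed connections tend to cluster on particular networks.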
None of this is meant to talk you out of it. Live video is one of the most engaging things you can put in front of people — there’s a reason the apps that get it right grow fast. It’s just that the demo is an afternoon and the product is everything after: the call that connects on a bad train, the match that lands before anyone gives up, the minute that’s billed correctly, and the stranger who feels safe enough to come back tomorrow.
That’s the work we do every day. Tell us what you’re shipping.