How I Built Netflix & What Now: An AI TV Companion
Point your phone at the TV and ask anything. How I built an AI companion that identifies shows from camera captures and answers contextual questions — using Gemini vision, TMDB, and voice input.
You know the moment — you are flipping channels and land on something mid-scene. The show looks interesting, but you have no idea what it is. You could squint at the corner of the screen for a network logo, open Google, and try to describe what you are seeing. Or you could just point your phone at the TV.
That is exactly what Netflix & What Now does. Capture your TV screen, and AI identifies the show in under 2 seconds. Then ask any follow-up question — who is that actor, how many seasons are there, is it worth watching?
The flow is straightforward:

1. Capture a frame of the TV screen with the phone camera.
2. Send the frame to Gemini, which identifies the show or movie.
3. Ask follow-up questions, answered in the context of the identified title.
The key insight was using Gemini's vision capability for identification rather than traditional image recognition. Gemini can read on-screen text, recognize actors, identify show graphics, and even interpret scene context — far more robust than matching against a database of screenshots.
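The identification call can be sketched in a few lines. This assumes the Gemini REST `generateContent` endpoint with a user-supplied key; the model name, prompt wording, and function names here are illustrative, not the app's actual values.

```typescript
// Hypothetical model choice; any vision-capable Gemini model works.
const GEMINI_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent";

// Pure helper: build the request payload for one base64-encoded JPEG frame.
function buildIdentifyRequest(base64Jpeg: string) {
  return {
    contents: [
      {
        parts: [
          {
            text:
              "Identify the TV show or movie on this screen. " +
              "Reply with only the title.",
          },
          { inline_data: { mime_type: "image/jpeg", data: base64Jpeg } },
        ],
      },
    ],
  };
}

// Send a captured frame and return Gemini's best-guess title.
async function identifyShow(
  base64Jpeg: string,
  apiKey: string
): Promise<string> {
  const res = await fetch(`${GEMINI_URL}?key=${apiKey}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildIdentifyRequest(base64Jpeg)),
  });
  if (!res.ok) throw new Error(`Gemini request failed: ${res.status}`);
  const data = await res.json();
  // The first candidate's first text part holds the answer.
  return data.candidates[0].content.parts[0].text.trim();
}
```

Because the model reads logos, on-screen text, and faces in one pass, a single prompt replaces what would otherwise be several specialized recognition pipelines.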
This is a phone-first experience. You are on the couch, phone in hand. A PWA made perfect sense:

- No app store: open a URL and you are in.
- Installable to the home screen for repeat use.
- Camera and microphone access through standard browser APIs.
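The capture step itself is a small amount of browser code. A minimal sketch, assuming the standard `getUserMedia` and canvas APIs; the function names are mine, not the app's:

```typescript
// Pure helper: strip the data-URL prefix so only the base64 payload
// remains, which is what the vision API expects.
function dataUrlToBase64(dataUrl: string): string {
  const comma = dataUrl.indexOf(",");
  return comma >= 0 ? dataUrl.slice(comma + 1) : dataUrl;
}

// Grab one JPEG frame from the rear camera as base64.
async function captureFrame(video: HTMLVideoElement): Promise<string> {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "environment" }, // rear camera on phones
  });
  video.srcObject = stream;
  await video.play();

  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);

  stream.getTracks().forEach((t) => t.stop()); // release the camera
  return dataUrlToBase64(canvas.toDataURL("image/jpeg", 0.8));
}
```

JPEG at moderate quality keeps the upload small, which matters for staying under that two-second identification budget.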
I wanted hands-free interaction (you are holding snacks, remember), so the app supports two voice backends. Users choose based on their needs; no account is required for either, and the browser-native option works out of the box.
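The browser-native backend is presumably the Web Speech API, the only speech recognition built into browsers today. A sketch of capturing one spoken question under that assumption:

```typescript
// Pure helper: trim and collapse whitespace in a raw transcript.
function normalizeTranscript(raw: string): string {
  return raw.trim().replace(/\s+/g, " ");
}

// Listen for one utterance and resolve with the recognized text.
// SpeechRecognition is prefixed in some browsers, hence the fallback.
function listenOnce(): Promise<string> {
  const Recognition =
    (window as any).SpeechRecognition ??
    (window as any).webkitSpeechRecognition;
  if (!Recognition) throw new Error("Speech recognition unsupported");

  return new Promise((resolve, reject) => {
    const rec = new Recognition();
    rec.lang = "en-US";
    rec.interimResults = false; // only final results, one shot
    rec.onresult = (e: any) =>
      resolve(normalizeTranscript(e.results[0][0].transcript));
    rec.onerror = (e: any) => reject(e.error);
    rec.start();
  });
}
```

No key, no account, no audio upload to manage yourself: the browser handles the recognition.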
Instead of running my own API keys and dealing with billing, rate limiting, and abuse prevention, Netflix & What Now uses a bring-your-own-keys approach. Users enter their free Gemini and TMDB API keys, which are stored locally in the browser. Benefits:

- No server-side billing, rate limiting, or abuse prevention to build.
- Keys never leave the user's device.
- Both keys have free tiers, so the app costs nothing to operate.
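The key handling is deliberately boring. A sketch of the bring-your-own-keys storage, assuming plain `localStorage` under hypothetical key names; the storage backend is injected so the same logic can be exercised outside the browser:

```typescript
// Minimal subset of the Storage interface the app needs.
interface KeyStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical storage key names.
const KEYS = ["gemini_api_key", "tmdb_api_key"] as const;
type KeyName = (typeof KEYS)[number];

function saveKey(store: KeyStore, name: KeyName, value: string): void {
  store.setItem(name, value.trim());
}

// Returns null until the user has entered both keys,
// which is the signal to show the setup screen.
function loadKeys(store: KeyStore): Record<KeyName, string> | null {
  const out = {} as Record<KeyName, string>;
  for (const name of KEYS) {
    const v = store.getItem(name);
    if (!v) return null;
    out[name] = v;
  }
  return out;
}

// In the browser: saveKey(localStorage, "gemini_api_key", input.value);
```

Since the keys live only in the user's browser and calls go straight to Gemini and TMDB, there is no backend to secure at all.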
The project is open source. Try it at netflix-and-what-now.vercel.app or check out the source on GitHub.