How I Built TalentGrep: Finding Engineers by What They've Actually Built
TalentGrep analyzes GitHub profiles to surface real skill profiles — moving past resume keywords to what engineers have actually shipped.
I've been on both sides of technical hiring — applying to jobs where "3+ years of Python" is the bar, and reviewing stacks of resumes that all read identically. The fundamental disconnect: a resume says "Proficient in Python and machine learning." A GitHub profile shows you built a production ML pipeline handling 10K requests/second, wrote thorough tests, and maintained it for two years.
Those are completely different signals. TalentGrep was built to close that gap.
TalentGrep takes a GitHub handle and produces a structured skill profile. Getting there, though, means working within GitHub's API constraints.
GitHub gives you 5,000 API requests per hour. Sounds generous until you do the math.
A thorough profile analysis needs the user profile (1 request), their repo list (1-3 requests with pagination), then per-repo calls for languages, contributors, commit history, and README content (4 requests each). For someone with 30 repositories, that's roughly 125 requests — meaning you can analyze about 40 profiles per hour.
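To make that budget math concrete, here's a tiny estimator. The 5,000/hour limit, the per-repo call count, and the paginated repo list come from the numbers above; the function names and the two-page default are mine:

```python
RATE_LIMIT_PER_HOUR = 5000  # GitHub's authenticated REST API limit

def requests_for_profile(repo_count: int, per_repo_calls: int = 4,
                         list_pages: int = 2) -> int:
    """Estimate API requests for one profile: 1 for the user record,
    a few for the paginated repo list, then per-repo calls for
    languages, contributors, commits, and README."""
    return 1 + list_pages + repo_count * per_repo_calls

def profiles_per_hour(repo_count: int) -> int:
    """How many profiles of this size fit in one rate-limit window."""
    return RATE_LIMIT_PER_HOUR // requests_for_profile(repo_count)
```

For a 30-repo profile this lands at roughly 123 requests, hence the "about 40 profiles per hour" ceiling.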
Three strategies made this workable:
Aggressive caching. Repository metadata changes slowly. I cache analysis results and only re-analyze repos with new commits since the last scan.

Signal prioritization. Not every repo is equally informative. I rank by star count, commit frequency, README quality, and whether it's a fork. Top repos get deep analysis; the rest get lightweight treatment.

Async processing. Profile analysis runs in the background. Users submit GitHub handles and get notified when results are ready. No one waits around.

GitHub's built-in language detection tells you a repo is "73% Python, 27% JavaScript." That's nearly useless for hiring decisions. TalentGrep goes much deeper.
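Of the three strategies, caching is the simplest to sketch. A minimal version, assuming an in-memory store and keying on the repo's last-push timestamp so any new commit invalidates the entry (the class, parameter names, and store are mine, not TalentGrep's actual implementation):

```python
from typing import Callable

class RepoAnalysisCache:
    """Skip re-analysis of repos with no new commits since the last scan.
    The cache key pairs the repo name with its last-push timestamp, so a
    new commit changes the key and forces a fresh analysis."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], dict] = {}

    def analyze(self, repo_name: str, pushed_at: str,
                analyzer: Callable[[str], dict]) -> dict:
        key = (repo_name, pushed_at)
        if key not in self._store:
            # Only pay the expensive analysis cost on a cache miss.
            self._store[key] = analyzer(repo_name)
        return self._store[key]
```

A production version would persist the store and evict stale keys, but the invalidation idea is the same.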
I analyze import statements, package manifests, and config files to identify specific tools: "This person uses FastAPI for APIs, PyTorch for ML, and Airflow for orchestration." That's a concrete signal a hiring manager can act on.
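The manifest-scanning part of this can be illustrated with a requirements.txt parser. The package-to-tool mapping below is a hypothetical fragment of the real taxonomy:

```python
import re

# Hypothetical mapping from package names to hiring-relevant tool labels.
TOOL_SIGNALS = {
    "fastapi": "FastAPI (APIs)",
    "torch": "PyTorch (ML)",
    "apache-airflow": "Airflow (orchestration)",
}

def tools_from_requirements(text: str) -> list[str]:
    """Parse a requirements.txt-style manifest and map known
    packages to tool labels, ignoring version specifiers."""
    found = []
    for line in text.splitlines():
        # Split off ==, >=, extras brackets, etc. to get the bare name.
        pkg = re.split(r"[=<>!\[ ]", line.strip(), maxsplit=1)[0].lower()
        if pkg in TOOL_SIGNALS:
            found.append(TOOL_SIGNALS[pkg])
    return found
```

The same idea extends to package.json, go.mod, and import statements in source files.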
Project structure, file organization, and code patterns reveal architectural knowledge: clean separation of concerns, dependency injection, integration tests alongside unit tests. These patterns are hard to fake and highly predictive of senior-level capability.
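A few of these structural signals can be read straight off a repo's file listing. A rough sketch, with heuristics of my own choosing (the real checks are presumably deeper, since they also look at code patterns):

```python
def structure_signals(paths: list[str]) -> dict[str, bool]:
    """Cheap heuristics over a repo's file listing: tests present,
    CI configured, documentation maintained."""
    lowered = [p.lower() for p in paths]
    return {
        "has_tests": any("test" in p for p in lowered),
        "has_ci": any(p.startswith(".github/workflows/") for p in lowered),
        "has_docs": any(p.startswith("docs/") or p.endswith("readme.md")
                        for p in lowered),
    }
```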
Repository topics, README content, and code semantics combine to classify domains: "This person works primarily in fintech — payment processing, fraud detection — with ML infrastructure expertise." Knowing the domain matters as much as knowing the stack.
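A bare-bones version of that classification is keyword overlap between a domain vocabulary and the repo's topics and README. The domains and keyword sets here are illustrative placeholders, not TalentGrep's taxonomy:

```python
# Hypothetical domain vocabularies.
DOMAIN_KEYWORDS = {
    "fintech": {"payment", "fraud", "ledger", "billing"},
    "ml-infra": {"pipeline", "training", "inference", "feature-store"},
}

def classify_domains(topics: list[str], readme: str) -> list[str]:
    """Rank domains by keyword overlap across repo topics and README
    text; drop domains with zero hits."""
    text = {t.lower() for t in topics} | set(readme.lower().split())
    scores = {d: len(kw & text) for d, kw in DOMAIN_KEYWORDS.items()}
    return [d for d, s in sorted(scores.items(), key=lambda x: -x[1]) if s > 0]
```

The post's "code semantics" signal would replace this bag-of-words matching with something embedding-based, but the ranking shape is the same.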
Not all commits are equal. TalentGrep scores on code complexity, test coverage signals, documentation quality, collaboration patterns (PR reviews given, issue discussions), and consistency of contribution over time.
Here's the product insight that shaped everything: hiring managers don't search for skill lists. They search for roles.
Nobody types "Python AND Kubernetes AND PostgreSQL." They want "backend engineer who can build data-intensive services." TalentGrep translates natural role descriptions into skill vectors and matches them against analyzed profiles. This required building a taxonomy that bridges observed code signals, skill abstractions, and how hiring managers actually think about roles.
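One way to picture the matching step: expand the role description into a skill vector via the taxonomy, then rank profiles by similarity. The role template and skill weights below are made up; only the cosine-similarity ranking is the point:

```python
import math

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity over sparse skill vectors keyed by skill name."""
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical expansion of a role phrase into a skill vector.
ROLE_TEMPLATES = {
    "backend engineer, data-intensive services": {
        "python": 0.8, "postgresql": 0.7, "kubernetes": 0.5, "airflow": 0.4,
    },
}

def rank_profiles(role: str,
                  profiles: dict[str, dict[str, float]]) -> list[str]:
    """Rank analyzed profiles by similarity to the role's skill vector."""
    target = ROLE_TEMPLATES[role]
    return sorted(profiles, key=lambda p: cosine(target, profiles[p]),
                  reverse=True)
```

In practice the role expansion would come from the taxonomy rather than a hard-coded table, and the vectors from the profile analysis described above.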
What's next: team composition analysis (identifying skill gaps across an engineering org), interview question generation based on a candidate's actual projects, and contribution trending to track how skills evolve over time.
TalentGrep is live at talentgrep.com.