Product Building · February 15, 2026 · 8 min read

How I Built TalentGrep: Finding Engineers by What They've Actually Built

TalentGrep analyzes GitHub profiles to surface real skill profiles — moving past resume keywords to what engineers have actually shipped.

Tags: TalentGrep, GitHub API, AI/ML, Next.js, Product, Hiring

Resumes Are Claims. Code Is Evidence.

I've been on both sides of technical hiring — applying to jobs where "3+ years of Python" is the bar, and reviewing stacks of resumes that all read identically. The fundamental disconnect: a resume says "Proficient in Python and machine learning." A GitHub profile shows you built a production ML pipeline handling 10K requests/second, wrote thorough tests, and maintained it for two years.

Those are completely different signals. TalentGrep was built to close that gap.

How It Works

TalentGrep takes a GitHub handle and produces a structured skill profile:

  1. Repository analysis — scans public repos for languages, frameworks, complexity patterns, and architecture decisions
  2. Contribution quality scoring — evaluates commit patterns, PR reviews, issue participation, and documentation
  3. Skill extraction — maps observed activity to a taxonomy that goes beyond languages into domains like "distributed systems" or "data pipeline architecture"
  4. Search and matching — lets hiring managers search by skill combinations, experience level, and domain expertise
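The pipeline's output can be pictured as a small data model. This is an illustrative sketch, not TalentGrep's actual schema; all the class and field names here are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class RepoSignal:
    """Signals extracted from one public repository."""
    name: str
    languages: dict[str, float]   # language -> share of code
    frameworks: list[str]         # e.g. ["fastapi", "pytorch"]
    stars: int
    is_fork: bool

@dataclass
class SkillProfile:
    """Structured profile produced for one GitHub handle."""
    handle: str
    repos: list[RepoSignal] = field(default_factory=list)
    skills: dict[str, float] = field(default_factory=dict)  # taxonomy label -> confidence
    domains: list[str] = field(default_factory=list)        # e.g. ["fintech"]

profile = SkillProfile(handle="octocat")
profile.repos.append(RepoSignal("Hello-World", {"Python": 1.0}, ["fastapi"], 80, False))
```

Steps 1-3 populate `repos`, `skills`, and `domains`; step 4 queries over many such profiles.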

Wrestling with GitHub's API

GitHub gives you 5,000 API requests per hour. Sounds generous until you do the math.

A thorough profile analysis needs the user profile (1 request), their repo list (1-3 requests with pagination), then per-repo calls for languages, contributors, commit history, and README content (4 requests each). For someone with 30 repositories, that's roughly 125 requests — meaning you can analyze about 40 profiles per hour.
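The budget arithmetic above is worth writing out. This sketch assumes the midpoint of the 1-3 paginated repo-list requests:

```python
# Back-of-the-envelope check of the GitHub API request budget.
RATE_LIMIT = 5000   # requests/hour for an authenticated token
PER_REPO = 4        # languages, contributors, commit history, README
repos = 30

profile_cost = 1 + 2 + repos * PER_REPO  # profile + repo list + per-repo calls
print(profile_cost)               # 123 -- "roughly 125" requests
print(RATE_LIMIT // profile_cost) # 40 profiles per hour
```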

Three strategies made this workable:

  • Aggressive caching. Repository metadata changes slowly. I cache analysis results and only re-analyze repos with new commits since the last scan.
  • Signal prioritization. Not every repo is equally informative. I rank by star count, commit frequency, README quality, and whether it's a fork. Top repos get deep analysis; the rest get lightweight treatment.
  • Async processing. Profile analysis runs in the background. Users submit GitHub handles and get notified when results are ready. No one waits around.
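The caching strategy reduces to a simple check: skip re-analysis when nothing has been pushed since the last scan. A minimal sketch, where `deep_analyze` and the `pushed_at` timestamp stand in for the real analysis pipeline and GitHub API response:

```python
# repo -> (pushed_at timestamp at last scan, cached analysis result)
cache: dict[str, tuple[str, dict]] = {}

def analyze_repo(repo: str, pushed_at: str, deep_analyze) -> dict:
    """Return cached results unless the repo has new commits."""
    cached = cache.get(repo)
    if cached and cached[0] == pushed_at:  # no pushes since last scan
        return cached[1]
    result = deep_analyze(repo)            # expensive: several API calls
    cache[repo] = (pushed_at, result)
    return result
```

In production this would be backed by PostgreSQL rather than an in-memory dict, but the invariant is the same: one deep analysis per (repo, push state).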

Skill Extraction Beyond Language Percentages

GitHub's built-in language detection tells you a repo is "73% Python, 27% JavaScript." That's nearly useless for hiring decisions. TalentGrep goes much deeper.

Framework Detection

I analyze import statements, package manifests, and config files to identify specific tools: "This person uses FastAPI for APIs, PyTorch for ML, and Airflow for orchestration." That's a concrete signal a hiring manager can act on.
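Manifest-based detection can be sketched as a lookup from dependency names to the frameworks they imply. The mapping below is a hypothetical fragment for illustration:

```python
# Dependency name -> framework signal (illustrative subset).
FRAMEWORK_MAP = {
    "fastapi": "FastAPI (APIs)",
    "torch": "PyTorch (ML)",
    "apache-airflow": "Airflow (orchestration)",
}

def detect_frameworks(dependencies: list[str]) -> list[str]:
    """Map pinned or bare dependency names to known frameworks."""
    found = []
    for dep in dependencies:
        name = dep.split("==")[0].strip().lower()
        if name in FRAMEWORK_MAP:
            found.append(FRAMEWORK_MAP[name])
    return found

print(detect_frameworks(["fastapi==0.110", "torch", "requests"]))
# ['FastAPI (APIs)', 'PyTorch (ML)']
```

The real pipeline also parses import statements and config files, which catches frameworks a manifest misses.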

Architecture Patterns

Project structure, file organization, and code patterns reveal architectural knowledge: clean separation of concerns, dependency injection, integration tests alongside unit tests. These patterns are hard to fake and highly predictive of senior-level capability.
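Some of these structural signals fall out of the repo's file listing alone. A toy version, checking a few of the patterns mentioned above (the real analysis inspects code, not just paths):

```python
def structure_signals(paths: list[str]) -> dict[str, bool]:
    """Cheap architecture signals derived from a repo's file tree."""
    return {
        "has_tests": any(p.startswith(("tests/", "test/")) or "_test." in p for p in paths),
        "has_ci": any(p.startswith(".github/workflows/") for p in paths),
        "layered": any(p.startswith("src/") for p in paths),
    }

print(structure_signals([
    "src/api/routes.py",
    "tests/test_routes.py",
    ".github/workflows/ci.yml",
]))
# {'has_tests': True, 'has_ci': True, 'layered': True}
```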

Domain Classification

Repository topics, README content, and code semantics combine to classify domains: "This person works primarily in fintech — payment processing, fraud detection — with ML infrastructure expertise." Knowing the domain matters as much as knowing the stack.

Contribution Quality

Not all commits are equal. TalentGrep scores on code complexity, test coverage signals, documentation quality, collaboration patterns (PR reviews given, issue discussions), and consistency of contribution over time.
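A scoring model over these signals can be as simple as a weighted sum. The weights below are invented for illustration; they are not TalentGrep's actual model.

```python
# Each signal is assumed normalized to [0, 1] upstream.
WEIGHTS = {
    "complexity": 0.25,
    "tests": 0.25,
    "docs": 0.20,
    "collab": 0.20,       # PR reviews given, issue discussions
    "consistency": 0.10,  # contribution regularity over time
}

def quality_score(signals: dict[str, float]) -> float:
    """Weighted combination of contribution-quality signals."""
    return sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS)

score = quality_score(
    {"complexity": 0.8, "tests": 0.9, "docs": 0.5, "collab": 0.6, "consistency": 1.0}
)
print(round(score, 3))  # 0.745
```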

The Search Problem

Here's the product insight that shaped everything: hiring managers don't search for skill lists. They search for roles.

Nobody types "Python AND Kubernetes AND PostgreSQL." They want "backend engineer who can build data-intensive services." TalentGrep translates natural role descriptions into skill vectors and matches them against analyzed profiles. This required building a taxonomy that bridges observed code signals, skill abstractions, and how hiring managers actually think about roles.
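Once a role description is translated into a skill vector, matching it against profiles is a similarity search. A minimal sketch with cosine similarity over sparse vectors; the vectors here are hand-written stand-ins for whatever taxonomy weighting the real system derives:

```python
import math

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse skill vectors."""
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Backend engineer who can build data-intensive services" -> skill vector.
role = {"python": 1.0, "data-pipelines": 0.9, "postgres": 0.7}
candidates = {
    "alice": {"python": 0.9, "data-pipelines": 0.8, "postgres": 0.6},
    "bob": {"javascript": 1.0, "react": 0.9},
}
ranked = sorted(candidates, key=lambda h: cosine(role, candidates[h]), reverse=True)
print(ranked[0])  # alice
```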

Stack

  • Frontend: Next.js, TypeScript, Tailwind CSS, deployed on Vercel
  • Backend: Python analysis pipeline, PostgreSQL for storage
  • GitHub integration: GitHub App with OAuth for auth and API access
  • AI/ML: Custom classifiers for domain detection and skill extraction
  • Infrastructure: Vercel for the web app, background workers for analysis

What I Learned

Cold start is real. The product is useless without profiles to search. I bootstrapped by analyzing top contributors across popular open-source projects — about 10K profiles across different domains — before anyone could search.

Public data still needs privacy care. GitHub profiles are public, but aggregating someone's work history into a searchable profile feels different. TalentGrep only processes public repos, clearly discloses what's collected, and lets developers opt out. Trust is non-negotiable in a hiring product.

Build for both sides. The initial version was recruiter-only. But the most compelling feature turned out to be the developer view: "Here's how your GitHub profile looks to employers, and what you could improve." This became the primary growth driver.

Accuracy over coverage, always. Accurate profiles for 50K developers beat noisy profiles for 5M. One bad recommendation erodes trust faster than ten good ones build it.

What's Next

Team composition analysis (identify skill gaps across an engineering org), interview question generation based on a candidate's actual projects, and contribution trending to track how skills evolve over time.

TalentGrep is live at talentgrep.com.

Venkata Subramanian Srinivasan
Senior Data Scientist at Asurion | Georgia Tech Alumni