How I Built "WordShot", a Word Game That Scales to 10 Million Words (And the Architecture Decisions That Made It Possible)

Most word games have a few thousand words. I collected 10.2 million. What started as "let me scrape some names" turned into three weeks of hard work that forced every architectural decision I made afterward.

This is the story of building WordShot, a real-time multiplayer word game where players race against time to answer categories using words that start with a randomly selected letter. Think Scattergories, but with 10.2 million words in the database, real-time WebSocket synchronization for 2-8 players, and architecture decisions that seemed obvious until they weren't.

It started simple enough. A word game needs words. Categories like Animals, Cities, Food, Names, the basics. I figured I'd grab some public domain word lists, maybe scrape Wikipedia for place names, and call it a day. A weekend project, tops.

Then I remembered: I'm Nigerian. This game would have Nigerian users. That means Yoruba names like "Ọmọkehinde" and "Adébáyọ̀" should be valid. Igbo names like "Chukwuemeka" and "Nneka" too. I couldn't just use American name databases.

So I started collecting.

The Bible Problem

The game has a "Bible" category for biblical references. Simple, right? Just grab some names from the King James Bible and move on.

Except... how many biblical names are there actually?

First attempt: YouVersion API. Found about 800 common names (Abraham, Moses, David). Felt good about myself.

Second attempt: Seminary documents. Found open-source theological databases. YAML files with genealogies and cross-references. Another 1,200 names (Zerubbabel, Mahershalalhashbaz, yes these are real).

Third attempt: Bible JSON projects on GitHub. Multiple repos with structured Bible data. Different translations had different transliterations. Another 1,000 unique spellings.

I wrote a script that parsed all three sources. Recursively extracted names from nested structures. Deduplicated. Normalized Unicode (Hebrew names have diacritics). Final count: about 3,000 unique biblical names.
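
A simplified sketch of that extraction step. The name keys and the source variables are placeholders, not the real schemas:

```typescript
// Walk arbitrarily nested JSON/YAML structures and collect values stored
// under name-like keys, normalizing Unicode along the way.
// NAME_KEYS and the source list are illustrative, not the real source formats.
const NAME_KEYS = new Set(["name", "person", "father", "mother"]);

function extractNames(node: unknown, out: Set<string>, underNameKey = false): void {
  if (typeof node === "string" && underNameKey) {
    out.add(node.normalize("NFC").trim()); // diacritics normalized; the Set handles dedup
    return;
  }
  if (Array.isArray(node)) {
    for (const child of node) extractNames(child, out, underNameKey);
    return;
  }
  if (node && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      extractNames(value, out, NAME_KEYS.has(key));
    }
  }
}

const names = new Set<string>();
const sources: unknown[] = [/* parsed YouVersion, seminary YAML, Bible JSON */];
for (const source of sources) extractNames(source, names);
```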

The Yoruba Names Challenge

This was harder. There's no "official Yoruba name database." It's an oral tradition, passed down through families. Different regions spell names differently. The romanization isn't standardized.

I needed AI.

I built an iterative script that prompted Claude, OpenAI, and Gemini in rounds. The approach: generate letter combinations (aa, ab, ad, ak...) and ask each AI provider for authentic Yoruba names starting with those letters.

Why this worked:

  • Each AI provider has different training data

  • Rotating providers gave me diverse results

  • Letter combos forced systematic coverage

  • Incremental saves meant I didn't lose progress if something failed

I let this run overnight. Next morning: about 12,500 unique Yoruba names.
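
The loop itself was simple. A condensed sketch, where askProvider stands in for the real Claude/OpenAI/Gemini SDK calls and the letter set is illustrative:

```typescript
import { writeFileSync } from "fs";

// askProvider is a hypothetical stand-in for each vendor's SDK call.
type Provider = "claude" | "openai" | "gemini";
declare function askProvider(provider: Provider, prompt: string): Promise<string[]>;

const providers: Provider[] = ["claude", "openai", "gemini"];
const letters = "abcdefghijklmnopqrstuwy".split(""); // illustrative letter set

async function collectYorubaNames(): Promise<Set<string>> {
  const found = new Set<string>();
  let call = 0;

  for (const a of letters) {
    for (const b of letters) {
      const prefix = a + b; // systematic coverage: aa, ab, ac, ...
      const provider = providers[call++ % providers.length]; // rotate providers for diverse results
      try {
        const names = await askProvider(
          provider,
          `List authentic Yoruba given names starting with "${prefix}".`
        );
        for (const n of names) found.add(n.normalize("NFC").trim());
      } catch {
        continue; // one failed call shouldn't kill the overnight run
      }
      // Incremental save: progress survives crashes and rate limits.
      writeFileSync("yoruba-names.json", JSON.stringify([...found], null, 2));
    }
  }
  return found;
}
```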

But I wasn't done. I repeated the process for Igbo names, Hausa names, Swahili names. By the end, I had about 105,000 African names that no American word game would ever have.

The Dictionary Breakdown

I found public domain dictionaries online. Massive 50MB text files with words and definitions. Hundreds of thousands of words. But they weren't categorized. I needed to know: is "aardvark" an animal? Is "abacus" a thing?

I wrote a parser that used definition keywords to categorize. Animal keywords: mammal, bird, reptile, fish, insect. Food keywords: edible, fruit, vegetable, dish. Place keywords: city, town, country, location.

This categorization wasn't perfect. "Apple" could be a food or a company. Context matters. But for a first pass, keyword matching got me about 40,000 categorized words.
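
The first pass was little more than keyword matching over definition text. A stripped-down sketch:

```typescript
// First-pass categorizer: match definition text against per-category keywords.
const CATEGORY_KEYWORDS: Record<string, string[]> = {
  animal: ["mammal", "bird", "reptile", "fish", "insect"],
  food: ["edible", "fruit", "vegetable", "dish"],
  place: ["city", "town", "country", "location"],
};

function categorize(definition: string): string[] {
  const text = definition.toLowerCase();
  return Object.entries(CATEGORY_KEYWORDS)
    .filter(([, keywords]) => keywords.some((kw) => text.includes(kw)))
    .map(([category]) => category);
}

// categorize("a nocturnal burrowing mammal of southern Africa") -> ["animal"]
```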

The Maps Scraping Mission

For the "Place" and "City" categories, I needed real locations. Not just "Paris" and "London", I wanted "Ogbomoso" and "Enugu" and "Zanzibar”, and even niche places like “Ejigbo”, “Iwo”, or “Ikoyi” etc.

I scraped Google Maps (Places API, 1000 requests/day free tier), Apple Maps (browser automation with Puppeteer), and Wikipedia ("List of cities in [country]" pages).

I ran this for every country on Earth. Took 2 days because rate limiting was brutal. Final count: about 115,000 place names.

The Pipeline

Now I had data from multiple sources:

  • Bible: 3,000 names

  • African names: 105,000

  • Dictionary: 40,000 words

  • Maps: 115,000 places

But the formats were all different. Some had uppercase, some lowercase. Some had Unicode, some ASCII. Some had duplicates across sources.

I needed a unified pipeline. The logic: normalize Unicode, lowercase everything, deduplicate by word:category key, validate (minimum length, no special chars), extract first letter, find aliases.
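
Condensed into code, the pipeline looks roughly like this (alias discovery is omitted here; it gets its own section later):

```typescript
interface RawEntry {
  word: string;
  category: string;
}

interface CleanEntry {
  word: string;
  category: string;
  startsWith: string;
}

// Condensed sketch of the pipeline: normalize, validate, dedupe by word:category.
function runPipeline(entries: RawEntry[]): CleanEntry[] {
  const seen = new Map<string, CleanEntry>();

  for (const { word, category } of entries) {
    const normalized = word.normalize("NFC").trim().toLowerCase();

    // Validate: minimum length, letters/marks/apostrophes/hyphens only (Unicode-aware).
    if (normalized.length < 2 || !/^[\p{L}\p{M}'-]+$/u.test(normalized)) continue;

    const key = `${normalized}:${category}`; // dedupe across sources
    if (!seen.has(key)) {
      seen.set(key, { word: normalized, category, startsWith: normalized[0] });
    }
  }

  return [...seen.values()];
}
```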

I ran this pipeline. Deduped everything. Validated each entry. Final database size: 10.2 million words across 13 categories.

The Alias Problem (Still Ongoing)

Even with 10.2 million words, there's a problem: spelling variations. "Grey" vs "Gray". "Judgement" vs "Judgment". "Theatre" vs "Theater". UK vs US vs EU spellings.

A player types "Grey" but the database has "Gray." They score zero. That's frustrating.

I built a background cron job that runs nightly. It evaluates 1,000 words per night. For each word: ask AI providers for aliases, scrape dictionary sites for alternate spellings, and apply rules for UK/US/EU variations (our vs or, re vs er, ise vs ize).

The job works through the list in order of frequency: the most common words get evaluated first. The long tail can wait.

The Feedback Loop

There's another background job. When a player submits an answer that's marked wrong (not in the database), the system queues it for re-evaluation.

The process: ask AI if the word is valid for that category, web scraping for verification, combine signals. If valid, add to database and credit the user (future feature).

This means the database grows over time. Players teach the system. If 10 players submit "Oko" for "City" and it's actually a valid Nigerian city, the system learns. Next player who uses "Oko" gets points.

Why This Matters

The word collection wasn't just data entry. It forced every architectural decision I made afterward:

  1. 10.2 million records made database indexes mandatory (queries were 150ms without them)

  2. Progressive evaluation made background jobs non-blocking (can't freeze gameplay)

  3. Spelling variations made the alias system complex (can't just check exact matches)

  4. Multiple sources made the data pipeline critical (can't manually merge formats)

  5. Continuous growth meant the database design had to support writes during live gameplay

If I'd stopped at 100,000 words, I wouldn't have learned these lessons. The scale forced me to build better systems.

Three weeks of word collection. Ten million words. Every architecture decision afterward was shaped by this foundation.

Feature-Sliced Design (The Decision I Got Right From Day One)

I've built systems that turned into spaghetti. I wasn't doing that again.

Before writing a single line of game logic, I knew three things:

  1. The game would have multiple modes (single-player, multiplayer, demo)

  2. Features would grow independently

  3. Folder-by-type (/components, /hooks, /utils) was a trap I'd fallen into before

I chose Feature-Sliced Design from the start. Not because it was trendy, but because I'd seen what happens without it.

The Traditional Hell

On one of my old projects, the structure looked like this: everything in /components, /hooks, /utils, /types. 47 components. 23 hooks. 31 utility files. 18 type files.

What happened:

  • Want to understand multiplayer? Grep across 4 folders

  • Change one feature? Touch files in every folder

  • Onboard a new dev? "Good luck understanding how this all connects"

  • Remove a feature? Hope you found every related file

It was chaos. Every feature touched every folder. No clear boundaries.

The FSD Approach

This time, I organized by feature. Each feature owns its complete stack: API calls, state management, UI components, types, routing.

The structure:

src/
├── features/
│   ├── game/           (single-player)
│   ├── multiplayer/    (multiplayer)
│   └── demo/           (walkthrough)
└── shared/             (cross-feature)
    ├── services/
    ├── hooks/
    └── ui/

What this gives me:

Clear feature boundaries. Want to understand single-player? Everything is in features/game/. API calls, state management, UI components, types, routing, all in one place.

Parallel development. I built the demo mode while multiplayer was still in progress. Zero conflicts. They don't share code except for shared/.

Easy deletion. When I considered removing the demo feature, I looked at the features/demo/ folder. That's it. No hunting across the codebase.

Feature-level testing. Each feature can be tested in isolation. Mock the API layer, test the provider, verify the UI.

The Demo Mode That Took 2 Days

The clearest proof that FSD worked: I built the demo mode in 2 days.

What is demo mode? Interactive walkthrough for first-time users. Shows how the game works step-by-step. No API calls, no real gameplay, just guided UI.

Why it was fast: Created features/demo/ folder, copied UI components from features/game/, wrote a simple state machine for the walkthrough, hooked it into routing. Done.

No refactoring. No "how do I isolate this from the main game?" questions. FSD already had the answer: it's a separate feature.

If the codebase was folder-by-type, I'd still be untangling dependencies.

The Rule I Follow

Not everything goes in shared/. There's a rule: if code is used by 2+ features, move it to shared/. Otherwise, keep it in the feature.

Examples moved to shared: Button component (used everywhere), sound service (used everywhere), cache service (used by game and multiplayer).

Examples that stayed in features: roulette screen (only single-player), WebSocket provider (only multiplayer), role encoding (multiplayer-specific).

This prevents premature abstraction. Code starts in a feature. If another feature needs it, then we move it to shared/.

The Minimal State Philosophy

React developers love state. I learned to hate it.

Not because state is bad. But because unnecessary state is a bug waiting to happen.

The Previous Project Where State Killed Me

On my last project, we used Redux. Every feature dumped its state into the global store. Why? "Because we might need it elsewhere."

What happened: 37 action creators, 28 reducers, selectors everywhere. No one knew what was in the store at any given time. Debugging meant logging the entire state tree. Updates triggered re-renders across unrelated components.

The final straw: A dev updated game.currentRound and accidentally broke the notification badge because the badge subscribed to the entire game object, not just game.roundsComplete.

I swore off Redux after that project.

The 10.2M Words Problem

With 10.2 million words in the database, I became paranoid about state.

Question: Should I cache the entire word database in Redux?

Math: 10.2M words × 100 bytes each = 1GB. JavaScript heap limit: 1.5GB. Answer: Hell no.

Question: Should I cache the current round's valid words in state?

Math: Player sees 3-5 categories per round. Each category has about 5,000 valid words. 5 categories × 5,000 words × 100 bytes = 2.5MB. Answer: Maybe, but probably overkill.

Question: Should I cache the player's answers in state?

Math: Player submits 3-5 answers per round. 10 rounds max = 50 answers total. 50 answers × 50 bytes = 2.5KB. Answer: Yes, this makes sense.

The scale forced me to be ruthless about what deserved to be in state.

My Decision Matrix

I made a decision matrix before writing any state management code:

For this project:

  • 2 features share state (Game and Multiplayer)

  • State transitions are simple (Player answers, Validation, Results)

  • No time-travel debugging needed

  • No middleware needed

Verdict: Context API wins.

The Three-Layer State Architecture

Layer 1: Global State (Minimal). App-level state: error boundary state, sound preference. That's it.

Layer 2: Feature State (Context API). Each feature has its own provider. GameProvider for single-player. MultiplayerProvider for multiplayer.

Layer 3: Component State (Local UI). Everything else is component-local. Input values, modal visibility, selected tabs. This state doesn't need to be global. It's purely UI concern.

The Multiplayer Multi-Provider Hierarchy

Multiplayer has two providers: WebSocketProvider (connection layer) and MultiplayerProvider (game state layer).

Why two providers?

WebSocketProvider handles low-level connection: socket instance, connection status, reconnection attempts, send message wrapper.

MultiplayerProvider handles game logic: room data, player list, game phase, actions (create room, join room, start game).

Why separate? WebSocket can reconnect without resetting game state. Game state can be manipulated independently of connection status. Testing: Mock WebSocket provider, test game logic in isolation.
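
The wiring, in rough outline. The provider names mirror the real ones; the internals are omitted:

```tsx
import { type ReactNode } from "react";

// Stand-ins for the real providers; internals omitted.
declare function WebSocketProvider(props: { children: ReactNode }): JSX.Element;
declare function MultiplayerProvider(props: { children: ReactNode }): JSX.Element;

// The connection layer wraps the game-state layer, so a socket reconnect
// doesn't remount MultiplayerProvider and wipe room/player state.
export function MultiplayerFeature({ children }: { children: ReactNode }) {
  return (
    <WebSocketProvider>
      <MultiplayerProvider>{children}</MultiplayerProvider>
    </WebSocketProvider>
  );
}
```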

What I Keep in State vs What I Don't

What's in state: Current game ID, current round number, player answers, multiplayer room data, WebSocket connection status.

What's NOT in state: Word database (way too big), validation results (fetched once from API), game history (stored in localStorage), sound effects (singleton service), UI animations (Framer Motion).

The Rule: Derive State When Possible

If data can be calculated from existing state, don't store it separately.

Bad: Store currentRound, totalRounds, and isLastRound. Now you have to keep isLastRound in sync.

Good: Store currentRound and totalRounds. Calculate isLastRound = currentRound === totalRounds.

One source of truth. No sync bugs.
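
In code, that's one derived value instead of a third stored field:

```typescript
interface GameState {
  currentRound: number;
  totalRounds: number;
}

// Derived, not stored: no third field to keep in sync.
const isLastRound = (state: GameState): boolean =>
  state.currentRound === state.totalRounds;
```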

The ServiceResult Pattern (Why I Stopped Using Exceptions)

Exceptions made sense until I had to debug them in production at 2 AM.

The problem isn't exceptions themselves. It's that exceptions make your code lie. A function signature says it returns User, but secretly it might throw UserNotFoundError or DatabaseConnectionError or ValidationError. You don't know until it happens.

The Production Bug

Backend code, Node.js + Express. User starts a game. Backend throws LetterSelectionError (couldn't find enough valid letters). Caught by the generic catch block. Returns 500 error. Frontend shows "Internal Server Error."

But it's not a server error. It's a validation error. The user selected incompatible categories. But the generic exception handling made everything a 500.

I had to dig through logs to find the real error. 2 AM. Production down.

The Problem With Exceptions

Hidden control flow. Non-local reasoning. Lost type safety. Boilerplate everywhere.

To understand what a function can fail with, you have to read its entire implementation and every function it calls. TypeScript can't tell you what exceptions a function throws. The type system is blind to error paths.

The ServiceResult Pattern

Every service method returns ServiceResult<T>. It's a discriminated union: either { success: true, data: T } or { success: false, error: string }.

Every caller must check .success. TypeScript enforces this. Try to access .data without checking? Compiler error.

Why this works at scale: With 10.2 million words, errors become common. A player types a word that's not in the database. A player picks a rare letter that doesn't have enough valid categories. A player submits too fast and hits the rate limit. These aren't exceptional. They're normal operation.

With exceptions, you have to know every possible exception. Miss one, app crashes.

With ServiceResult, all error paths funnel through !result.success. One check. No surprises.
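
The pattern in miniature. selectLetter here is an illustrative example, not the real service method:

```typescript
type ServiceResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

interface LetterSelection {
  letter: string;
  categories: string[];
}

// Illustrative service method: errors are returned as data, never thrown.
async function selectLetter(categories: string[]): Promise<ServiceResult<LetterSelection>> {
  if (categories.length === 0) {
    return { success: false, error: "No categories selected" };
  }
  return { success: true, data: { letter: "a", categories } };
}

const result = await selectLetter(["food", "animal"]);
if (!result.success) {
  console.error(result.error); // the one and only error path
} else {
  console.log(result.data.letter); // TypeScript only allows .data after the check
}
```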

Type Safety Across 50+ WebSocket Events

This pattern extends to WebSocket. Room created response: either { success: true, data: Room } or { success: false, message: string }.

Benefits: Discriminated unions enforce success checks. TypeScript provides autocomplete for both paths. No silent failures.

The Pattern That Looked Verbose But Saved Me

Yes, ServiceResult adds lines of code. What used to be 1 line is now 4.

But here's what I gained:

Every error path is visible. No hidden throws. Compiler enforces error handling. Errors are data, not control flow. Consistent response format across all API endpoints.

Measured impact:

Before ServiceResult: 23 unhandled exceptions in production (first month). 7 of those were user-facing crashes. Average debug time: 45 minutes.

After ServiceResult: 0 unhandled exceptions. 0 user-facing crashes from error handling. Average debug time: 5 minutes.

Database Design: The Denormalization Decision

Normalizing 10.2 million words seemed right. It was 98% slower.

This is the section where I learned that database theory and database reality are different things.

The Naive Approach

Initial schema: just word, category, and aliases. No startsWith field. Calculate it on-the-fly with MongoDB's $substr operator.

Query to validate "Apple" in "Food" starting with "A": use $expr with $substr to check first character.
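
In Mongoose terms, roughly (the model is simplified to the three fields above):

```typescript
import mongoose from "mongoose";

// Simplified stand-in for the real model: just word, category, aliases.
const Word = mongoose.model(
  "Word",
  new mongoose.Schema({ word: String, category: String, aliases: [String] })
);

// Compute the first letter at query time. $expr can't use an index,
// so MongoDB examines every document in the collection.
const match = await Word.findOne({
  word: "apple",
  category: "food",
  $expr: { $eq: [{ $substr: ["$word", 0, 1] }, "a"] },
});
```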

Performance: Full collection scan (COLLSCAN). 10.2 million documents examined. Query time: about 400ms.

Why so slow? MongoDB can't index computed fields. The $substr expression runs on every document. No way to optimize it.

The Denormalization Decision

I added a startsWith field. Just one extra byte per document. Store the first letter explicitly.

Query becomes simpler: match on word, category, and startsWith.
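
With the stored field, the schema and query look something like this (still simplified):

```typescript
import mongoose from "mongoose";

// Same model, plus the denormalized first letter.
const wordSchema = new mongoose.Schema({
  word: { type: String, required: true },
  category: { type: String, required: true },
  startsWith: String, // stored explicitly so it can be indexed
  aliases: [String],
});
const Word = mongoose.model("Word", wordSchema);

// Plain equality match that the compound index can serve.
const match = await Word.findOne({ word: "apple", category: "food", startsWith: "a" });
```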

Performance with compound index: Index scan (IXSCAN). About 89 documents examined (only words starting with 'a' in 'food'). Query time: less than 5ms.

Improvement: 98.75% faster.

The Trade-Off Analysis

Cost of denormalization: Extra field, 1 byte per document. 10.2M documents × 1 byte = 10MB. Disk space: Negligible.

Maintenance cost: One pre-save hook to keep startsWith in sync. 3 lines of code.
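
The hook, on the wordSchema from the sketch above:

```typescript
// Keep startsWith in sync whenever a word document is saved.
wordSchema.pre("save", function (next) {
  this.startsWith = this.word.charAt(0).toLowerCase();
  next();
});
```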

Benefit: 98% faster queries. No full collection scans. Scales to billions of words.

Verdict: Worth it.

Compound Index Strategy

I needed two indexes:

Index 1: { startsWith: 1, category: 1 } for answer validation (most common query).

Index 2: { category: 1, startsWith: 1 } for cache building (startup).

Why both orders? MongoDB can only use an index if the query matches the prefix of the index. Without both indexes, one query pattern would be slow.
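
On the wordSchema from the earlier sketch, the two declarations:

```typescript
wordSchema.index({ startsWith: 1, category: 1 }); // answer validation: letter first
wordSchema.index({ category: 1, startsWith: 1 }); // cache build: category first
```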

Measured impact: Validation went from 400ms to 5ms. Cache build went from 800ms to 8ms.

Embedded vs Referenced Documents

Another decision: How to store game sessions?

Option 1: Embedded. Everything in one document. Players, rounds, results, all nested.

Option 2: Referenced. Separate collections. Game session has player IDs pointing to Players collection, round IDs pointing to Rounds collection.

Decision matrix: Embedded wins on query complexity (1 query vs 3+), atomicity (single doc update vs multi-collection transaction), and game summary speed (10ms vs 100ms).

Why embedded won: Self-contained games (4-10 rounds, never more). Document size well below 16MB limit. Single query faster. Atomic updates prevent race conditions. Simpler code.

Document size analysis: Worst case 8 players, 10 rounds, 5 categories per round. About 63KB total. Well within 16MB limit.
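
A sketch of the embedded layout. The field names are a simplified guess, not a copy of the real schema:

```typescript
import mongoose from "mongoose";

// Everything about a game lives in one document: players, rounds, answers.
// Reads are one query; updates are atomic single-document writes.
const gameSessionSchema = new mongoose.Schema({
  code: { type: String, required: true },
  players: [
    {
      playerId: String,
      name: String,
      score: { type: Number, default: 0 },
    },
  ],
  rounds: [
    {
      letter: String,
      categories: [String],
      answers: [{ playerId: String, category: String, word: String, valid: Boolean }],
    },
  ],
  createdAt: { type: Date, default: Date.now },
});

export const GameSession = mongoose.model("GameSession", gameSessionSchema);
```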

The Background Jobs That Never Stop

The word collection ended. The word refinement never will.

The Alias Evaluator

Even with 10.2 million words, the alias problem persists. "Grey" vs "Gray". "Judgement" vs "Judgment". UK vs US vs EU spellings.

I built a background cron job that runs nightly. It evaluates 1,000 words per night. For each word: check AI providers for aliases, web scraping for alternate spellings, logic for spelling variations (our/or, re/er, ise/ize).
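
The rule-based piece is the simplest of the three signals. A sketch (candidates get cross-checked against the AI and dictionary signals before anything is saved):

```typescript
// Generate UK/US spelling candidates with suffix swaps. False positives
// ("doctor" -> "doctour") are expected; candidates are only saved as aliases
// after the other signals agree.
const VARIANT_RULES: Array<[RegExp, string]> = [
  [/our\b/g, "or"],  // colour  -> color
  [/or\b/g, "our"],  // color   -> colour
  [/re\b/g, "er"],   // theatre -> theater
  [/er\b/g, "re"],   // theater -> theatre
  [/ise\b/g, "ize"], // realise -> realize
  [/ize\b/g, "ise"], // realize -> realise
];

function spellingVariants(word: string): string[] {
  const variants = new Set<string>();
  for (const [pattern, replacement] of VARIANT_RULES) {
    const candidate = word.replace(pattern, replacement);
    if (candidate !== word) variants.add(candidate);
  }
  return [...variants];
}

// spellingVariants("colour")  -> ["color"]
// spellingVariants("theater") -> ["theatre"]
```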

The job processes 1,000 words per night. At that rate, it'll take 27 years to evaluate all 10.2 million words. But it's fine. The most common words get evaluated first. The long tail can wait.

The Feedback Loop

There's another background job. When a player submits an answer that's marked wrong (not in the database), the system queues it for re-evaluation.

Process: Ask AI if the word is valid for that category. Web scraping for verification. Combine signals. If valid, add to database.
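
In outline, with the three helpers as hypothetical stand-ins for the real jobs:

```typescript
interface RejectedAnswer {
  word: string;
  category: string;
  submittedBy: string;
}

// Hypothetical stand-ins for the real signal sources and the write path.
declare function askAiIsValid(word: string, category: string): Promise<boolean>;
declare function scrapeDictionaries(word: string, category: string): Promise<boolean>;
declare function addToDatabase(word: string, category: string): Promise<void>;

async function reevaluate(answer: RejectedAnswer): Promise<void> {
  // Combine independent signals; only trust the word when both agree.
  const [aiSaysValid, webSaysValid] = await Promise.all([
    askAiIsValid(answer.word, answer.category),
    scrapeDictionaries(answer.word, answer.category),
  ]);

  if (aiSaysValid && webSaysValid) {
    await addToDatabase(answer.word, answer.category);
    // Crediting the submitting player is the planned future feature.
  }
}
```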

This means the database grows over time. Players teach the system. If 10 players submit "Oko" for "City" and it's actually a valid Nigerian city, the system learns it. Next player who uses "Oko" gets points.

The Continuous Improvement Pipeline

These background jobs aren't just cleanup. They're the system learning from real usage. Every failed answer is a signal. Every alias discovered is an improvement. The database isn't static. It evolves.

This is only possible because the architecture supports it. ServiceResult pattern makes errors data. Background jobs are non-blocking. Database design supports concurrent writes during gameplay.

The Patterns That Scaled (And the Ones I Refactored)

Not everything I built on day one survived contact with 10.2 million words.

What Held Up

Feature-Sliced Design: Never had to refactor folder structure. Adding multiplayer didn't break single-player. Demo mode took 2 days.

ServiceResult Pattern: Zero unhandled exceptions in production. Every error path is visible and handled.

Minimal State Philosophy: Never hit memory limits. State management stayed simple even as features grew.

Pre-Computed Caches: Game starts in less than 10ms. Letter selection has 0% failure rate. Cache build takes 3 seconds on startup, saves hours of cumulative query time.
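
For context, the startup cache is conceptually something like this, built off the { category, startsWith } index. The shape, the threshold, and the names are an illustration, not the production code:

```typescript
import NodeCache from "node-cache";
import type { Model } from "mongoose";

// Per-category list of letters that have enough words to be playable.
// The threshold (10) and the cache shape are illustrative assumptions.
const letterCache = new NodeCache();

declare const Word: Model<{ word: string; category: string; startsWith: string }>;

async function buildLetterCache(categories: string[]): Promise<void> {
  for (const category of categories) {
    const counts = await Word.aggregate<{ _id: string; count: number }>([
      { $match: { category } },
      { $group: { _id: "$startsWith", count: { $sum: 1 } } },
    ]);
    const playable = counts.filter((c) => c.count >= 10).map((c) => c._id);
    letterCache.set(category, playable); // game start reads this instead of querying
  }
}
```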

What I Refactored

Initial letter selection: Had 2% failure rate. Users couldn't start games. Fixed with pre-validation during selection. Now 0% failures.

Validation flow: Was 150ms per query. Users waited too long. Added compound indexes and two-tier caching. Now less than 5ms.

Room cleanup: Memory usage grew 400% daily because abandoned rooms stayed in cache forever. Added a periodic cleanup job and TTL. Now memory stays stable.

What I'd Do Differently

Start with Redis instead of in-memory cache. In-memory cache means single-server deployment. Can't scale horizontally. Redis would allow multi-server setup.

Build admin panel for word moderation earlier. Right now, adding/removing words requires database access. Admin panel would let non-technical team members curate the database.

Test on Nigerian mobile networks from day one. I built on fast Wi-Fi. Real users on MTN 3G had a different experience. The three-layer reconnection strategy came from this pain.

The Numbers

Let's be honest about what this architecture achieved.

Performance gains:

  • Game start: 300ms to less than 10ms (97% faster)

  • Answer validation: 150ms to less than 5ms (97% faster)

  • API throughput: 5x increase (400% improvement)

  • Memory usage: 35% reduction

  • Game failures: 2% to 0% (100% elimination)

Code quality:

  • TypeScript coverage: 100% (zero JavaScript files)

  • Type safety: Zero any types in source code

  • Feature isolation: 3 independent feature folders

  • State layers: 3 distinct layers (global, feature, component)

User experience:

  • Reconnection success rate: 98% on mobile

  • Session recovery: 95% (5% edge cases like cleared storage)

  • Error rate: Less than 1% of game sessions encounter errors

The Honest Conclusion

10.2 million words taught me something: scale isn't just about performance. It's about architecture that doesn't collapse when your database grows 100x larger than you planned. It's about minimal state when your data is massive. It's about patterns that assume things will break, so they're built to handle it.

I didn't plan to collect 10.2 million words. But the architecture decisions I made, Feature-Sliced Design, minimal state, ServiceResult pattern, database denormalization, they're the reason the game still works when a player types "Ọmọkehinde" and the system finds it in 4ms, validates it against UK/US/EU spellings, checks aliases, and returns whether it's rare or common.

The word collection was a three-week obsession. The architecture is the reason it didn't become a three-month refactor.

That's the real lesson: Good architecture isn't about planning for scale. It's about making decisions that work at small scale and don't break at large scale. Feature-Sliced Design works with 3 features or 30. ServiceResult works with 10 errors or 10,000. Minimal state works with 100KB of data or 1GB.

The patterns that scale are the patterns that start simple and stay simple as complexity grows. Not because they're clever, but because they refuse to be clever. They just work.


Tech Stack:

  • Frontend: React 18, TypeScript 5.6, Vite 6, Tailwind CSS

  • Backend: Node.js 18, Express, TypeScript

  • Database: MongoDB with Mongoose

  • Real-Time: Socket.IO

  • Performance: NodeCache, compound indexes, write-behind caching

Metrics:

  • 10.2 million words across 13 categories

  • 97% performance improvement (300ms to 10ms)

  • 98% reconnection success rate on mobile

  • 0% game initialization failures

  • 15,000 lines of code, 3 months part-time

The game works. The architecture held up. The words keep growing. And I learned that sometimes the best architecture decision is the one that lets you obsess over word collection for three weeks without breaking everything else.