# Code Smells from First Principles

Why code smells exist, grounded in cognitive science and information theory.

Let me rebuild this from the ground up, thinking about what code fundamentally is and why certain patterns make it difficult to work with.

## First Principle: Code Is Communication Across Time

### The Core Truth

Code has two audiences:

1. The computer (which executes it)
2. Humans (who read, modify, and reason about it)

The computer doesn't care about variable names, comments, or structure. It executes machine instructions (or bytecode, on a virtual machine). Every code smell exists because it makes the human's job harder.

### Why This Matters for Review

Code review is fundamentally a transfer of mental model. The author has a complete understanding of:

- What problem they're solving
- Why they chose this approach
- What alternatives they considered
- What edge cases they're handling
- What assumptions they're making

The reviewer starts with zero context and must reconstruct this entire mental model by reading the code.

This is why "file too long" is a smell: It's not arbitrary. It's about working memory limits.

## Cognitive Load and Working Memory

### The Human Constraint

George Miller's famous 1956 paper "The Magical Number Seven, Plus or Minus Two" suggested that humans can hold roughly 7±2 items in short-term memory at once.

Later research (Cowan, 2001) suggests the practical limit is closer to 4 chunks.

This is the fundamental constraint that generates most code smells.

### Long Files/Functions: A Working Memory Problem

When you review a 2000-line file, you can't hold the entire thing in your head. You must:

1. Load context (what does this file do?)
2. Navigate (where is the relevant code?)
3. Understand (what does this section do?)
4. Remember (what did I see 500 lines ago?)
5. Connect (how does this relate to that other thing?)

Each step consumes working memory. By the time you're on step 5, steps 1-2 have been pushed out.

From first principles: A function should fit in one "chunk" of understanding.

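One way to make the size constraint concrete is to measure it. The sketch below is an illustration, not an established tool: it uses Python's standard `ast` module to list functions longer than a threshold. The name `long_functions` and the 30-line default are assumptions picked for the example, not recommended limits.

```python
import ast


def long_functions(source: str, max_lines: int = 30) -> list[tuple[str, int]]:
    """Return (name, length_in_lines) for each function longer than max_lines."""
    tree = ast.parse(source)
    offenders = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is populated by ast.parse on Python 3.8+.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                offenders.append((node.name, length))
    return offenders


# A two-line function and a deliberately long one (hypothetical sample input).
sample = (
    "def tiny():\n    return 1\n\n"
    "def big():\n" + "    x = 0\n" * 40 + "    return x\n"
)
print(long_functions(sample))
```

Line count is only a proxy for "fits in one chunk"; a short function can still be hard to hold in mind, so a check like this is a prompt for review rather than a verdict.
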
## Cyclomatic Complexity: Exponential Path Explosion

### The Fundamental Problem

Cyclomatic complexity isn't just about "number of if statements." It measures the number of linearly independent paths through code, which is the number of decision points plus one. That metric grows linearly, but the number of distinct execution paths a reviewer may have to consider can grow much faster.

Mathematical reality, for independent conditions in sequence:

- 1 if statement = 2 paths
- 2 independent if statements = 4 paths
- 3 independent if statements = 8 paths
- n independent conditions = up to 2^n paths

This is exponential growth.

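To see how quickly the combinations pile up, the sketch below enumerates every true/false outcome of n independent conditions and prints it next to the "decisions plus one" value of the cyclomatic metric. Both function names are hypothetical, and the model assumes the conditions do not constrain each other.

```python
from itertools import product


def count_paths(num_conditions: int) -> int:
    """Count the true/false combinations of independent conditions."""
    return sum(1 for _ in product([False, True], repeat=num_conditions))


def cyclomatic_complexity(num_conditions: int) -> int:
    """Decision points + 1, for a single-entry, single-exit function."""
    return num_conditions + 1


# Columns: conditions, cyclomatic complexity, combinations to consider.
for n in (1, 2, 3, 10):
    print(n, cyclomatic_complexity(n), count_paths(n))
```

The metric grows by one per condition while the combinations double, which is why a function with a moderate complexity score can still be impractical to trace by hand.
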
### Why This Destroys Reviewability

When you review code, you're mentally executing it with different inputs. You're asking:

- "What happens if this is null?"
- "What if this array is empty?"
- "What if the user is not logged in?"

With high cyclomatic complexity, you cannot mentally trace all paths.

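A rough complexity score can be computed before a review starts. The following is a simplified sketch, assuming that counting a handful of branching node types in Python's `ast` is a good enough approximation; it ignores comprehensions, `match` statements, and other constructs a full tool would handle. The names `approx_complexity` and `DECISION_NODES` are made up for this example.

```python
import ast

# Node types treated as decision points in this simplified count.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def approx_complexity(source: str) -> int:
    """Approximate cyclomatic complexity as decision points + 1."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` has three operands and adds two branches.
            decisions += len(node.values) - 1
    return decisions + 1


sample = '''
def handle(user, items):
    if user is None:
        return "anonymous"
    if not items:
        return "empty"
    for item in items:
        if item.price > 100 and item.in_stock:
            return "premium"
    return "standard"
'''
print(approx_complexity(sample))
```

Each question a reviewer asks ("what if this is null?", "what if this is empty?") corresponds to one of the counted decision points, so the score is a rough count of the questions the code forces.
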
## Naming: Compression and Decompression

### The Information Theory Perspective

A variable name is compressed information about what the variable represents.

When you read code, you're constantly:

1. Decompressing names into concepts
2. Holding those concepts in working memory
3. Using those concepts to understand logic

Bad names have high decompression cost.

### The Context Window Problem

Single-letter variables work in tiny scopes. 'i' is immediately understood as a loop index. But what is 'i' 50 lines later?

From first principles: Variable name quality should scale with scope size. Larger scope = more descriptive name required, because the name must survive longer in memory or be re-decompressed when encountered again.

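As a small illustration of decompression cost, here are two functions that compute the same thing. Both are invented for this example; the only difference is how much the reader has to reconstruct from the names.

```python
def calc(d, r):
    # Cryptic version: the reader must infer what d, r, t and i stand for.
    # Note that i is not even an index here; it holds a price.
    t = 0
    for i in d:
        t += i * (1 + r)
    return t


def total_with_tax(prices, tax_rate):
    # Descriptive version: the names carry the meaning across the scope.
    total = 0
    for price in prices:
        total += price * (1 + tax_rate)
    return total


print(calc([10, 20], 0.5) == total_with_tax([10, 20], 0.5))
```

In a three-line loop the short names are survivable; the cost shows up when the function grows and `d` or `t` is used far from where it was introduced.
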
## Deep Nesting: The Indentation Tax

### The Cognitive Cost of Indentation

Each level of indentation represents a context you must remember.

At each level, you're asking: "Under what conditions does this code execute?"

By 4 levels deep, you're juggling 4+ conditions in working memory while also trying to understand what the code does.

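Nesting depth is easy to measure mechanically. The sketch below walks a Python syntax tree and reports the deepest stack of control-flow blocks. It is a rough approximation under stated assumptions: the set of node types in `NESTING_NODES` is a choice made for this example, and an `elif` chain is counted as deeper nesting because the parser represents it that way.

```python
import ast

# Block types that add a level of context in this simplified measure.
NESTING_NODES = (ast.If, ast.For, ast.While, ast.With, ast.Try)


def max_nesting(node: ast.AST, depth: int = 0) -> int:
    """Return the deepest level of nested control-flow blocks under node."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, max_nesting(child, child_depth))
    return deepest


sample = '''
def ship(order):
    if order:
        if order.paid:
            for item in order.items:
                if item.in_stock:
                    item.ship()
'''
print(max_nesting(ast.parse(sample)))
```

To understand `item.ship()` in the sample, the reader has to keep four enclosing conditions in mind, which is the number this function reports.
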
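### The Early Return Pattern: Reducing Cognitive Load

From first principles: Each early return settles one condition and removes it from what you must keep in mind. With n guard clauses there are n+1 paths (one for each guard plus the success path), and you read them one at a time. Compare that with n independent conditions that all fall through, where up to 2^n combinations stay live, or with n nested conditions, where the innermost code requires you to hold all n at once.
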
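A minimal before-and-after sketch of the pattern follows. The discount rule, the `user` dictionary shape, and both function names are hypothetical; the point is only that the two versions behave identically while asking different things of the reader.

```python
def discount_nested(user, cart_total):
    # Nested form: the innermost line runs only if all three conditions hold,
    # and the reader has to keep each of them in mind to know that.
    if user is not None:
        if user.get("active"):
            if cart_total > 0:
                return cart_total * 0.1
            else:
                return 0.0
        else:
            return 0.0
    else:
        return 0.0


def discount_guarded(user, cart_total):
    # Guard clauses: each check is settled, then it can be forgotten.
    if user is None:
        return 0.0
    if not user.get("active"):
        return 0.0
    if cart_total <= 0:
        return 0.0
    return cart_total * 0.1


print(discount_nested({"active": True}, 50), discount_guarded({"active": True}, 50))
```

Both versions have four paths (three rejections plus the success path). What changes is the nesting depth at the point where the real work happens: three levels in the first version, none in the second.
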
## Separation of Concerns: Modularity as Comprehension

### Why Mixed Concerns Are Impossible to Review

When code does multiple things, you cannot understand or verify any single thing in isolation.

From first principles: Separation of concerns converts multiplication of complexity into addition of complexity.

- Mixed: Must understand validation AND crypto AND database AND email simultaneously (multiplicative)
- Separated: Can understand validation, then crypto, then database, then email (additive)

Informally, the understanding cost of N concerns:

- Mixed: O(concern₁ × concern₂ × ... × concernₙ)
- Separated: O(concern₁ + concern₂ + ... + concernₙ)

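Here is one way the separated version of a user-registration flow might look. This is a sketch under simplifying assumptions: the "database" is a plain dictionary, the "email" is a string appended to a list, the email check is deliberately loose, and every function name is invented for the example. Only the standard-library calls (`hashlib.pbkdf2_hmac`, `os.urandom`, `re.fullmatch`) are real.

```python
import hashlib
import os
import re


def validate_email(email: str) -> bool:
    # Deliberately loose check, for illustration only.
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email) is not None


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 from the standard library; the iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def save_user(db: dict, email: str, password_hash: bytes, salt: bytes) -> None:
    # Stand-in for a real database write.
    db[email] = {"hash": password_hash, "salt": salt}


def queue_welcome_email(outbox: list, email: str) -> None:
    # Stand-in for handing a message to a mail service.
    outbox.append(f"Welcome, {email}!")


def register(db: dict, outbox: list, email: str, password: str) -> bool:
    # The coordinator only sequences the steps; each step is reviewable alone.
    if not validate_email(email):
        return False
    salt = os.urandom(16)
    save_user(db, email, hash_password(password, salt), salt)
    queue_welcome_email(outbox, email)
    return True
```

A reviewer can now check the validation rule without thinking about hashing, and check the hashing without thinking about storage, which is the additive cost described above.
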
## The Fundamental Theorem of Code Review

Code is reviewable if and only if:

1. It fits in working memory (size constraints)
2. Its behavior is traceable (complexity constraints)
3. Its intent is clear (naming/documentation constraints)
4. Its dependencies are explicit (coupling constraints)
5. Its correctness is verifiable (testability constraints)

Every code smell is a violation of one or more of these principles.

## The Economics of Code Quality

Code is written once but read many times. A widely cited estimate (Robert C. Martin, Clean Code) puts the ratio of time spent reading code to time spent writing it at more than 10 to 1.

Poor code has a recurring cost, paid on every read:

Total Cost = Writing Cost + (Reading Cost × Number of Reads × Number of Readers)

Under this model, code that takes 2x longer to write is cheaper overall as soon as the reading time it saves, summed over every read and every reader, exceeds the extra writing time.

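The formula can be tried out with a few numbers. Every figure in the sketch below is invented purely for illustration (hours of effort, number of reads, number of readers); none of it comes from measurement.

```python
def total_cost(writing, reading, reads, readers):
    """Total Cost = Writing Cost + (Reading Cost * Reads * Readers)."""
    return writing + reading * reads * readers


# Hypothetical scenario: the careful version takes twice as long to write
# but halves the time each read takes.
quick = total_cost(writing=2, reading=1.0, reads=10, readers=3)    # 2 + 30
careful = total_cost(writing=4, reading=0.5, reads=10, readers=3)  # 4 + 15
print(quick, careful)
```

With these made-up inputs the extra two hours of writing are recovered after only a handful of reads. With a single reader and a single read, the quick version would win, so the conclusion depends entirely on how often the code is revisited.
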
## Toward Better Code: Design Principles from First Principles

1. Minimize Cognitive Load: Code should require minimal working memory to understand.
2. Maximize Locality: Related concepts should be close together.
3. Make Dependencies Explicit: What code depends on should be obvious.
4. Optimize for Reading: Code is read far more often than it is written. Choose clarity over cleverness.
5. Enable Verification: It should be possible to check that code is correct, by testing or by reasoning about it.
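
As a small sketch of principles 3 and 5 together, compare a function that quietly reads the system clock with one that takes the current time as a parameter. Both functions are hypothetical examples; only the `datetime` calls are real standard-library API.

```python
from datetime import datetime, timezone


def is_expired_hidden(expires_at: datetime) -> bool:
    # Hidden dependency: the result depends on the system clock,
    # which the signature does not mention.
    return datetime.now(timezone.utc) > expires_at


def is_expired(expires_at: datetime, now: datetime) -> bool:
    # Explicit dependency: everything the result depends on is a parameter,
    # so a reviewer or a test can check it with fixed inputs.
    return now > expires_at


deadline = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(is_expired(deadline, now=datetime(2024, 1, 2, tzinfo=timezone.utc)))
```

The explicit version can be verified by reading its signature and two lines of body. The hidden version gives a different answer depending on when it runs, which a reviewer cannot see from the call site.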