# Code Smells from First Principles

Let me rebuild this from the ground up, thinking about what code fundamentally is and why certain patterns make it difficult to work with.

## First Principle: Code Is Communication Across Time

### The Core Truth

Code has two audiences:

1. The computer (which executes it)
2. Humans (who read, modify, and reason about it)

The computer doesn't care about variable names, comments, or structure. It executes whatever machine code or bytecode the source compiles to. Every code smell exists because it makes the human's job harder.

### Why This Matters for Review

Code review is fundamentally a transfer of a mental model. The author already understands:

- What problem they're solving
- Why they chose this approach
- What alternatives they considered
- What edge cases they're handling
- What assumptions they're making

The reviewer starts with little or none of this context and must reconstruct the mental model by reading the code.

This is why "file too long" is a smell. It's not arbitrary: it's about working memory limits.

## Cognitive Load and Working Memory

### The Human Constraint

George Miller's famous 1956 paper "The Magical Number Seven, Plus or Minus Two" proposed that humans can hold roughly 7±2 items in working memory at once.

More recent research (Cowan, 2001) suggests the limit is closer to 4 chunks.

This is the fundamental constraint that generates most code smells.

### Long Files/Functions: A Working Memory Problem

When you review a 2000-line file, you can't hold the entire thing in your head. You must:

1. Load context (what does this file do?)
2. Navigate (where is the relevant code?)
3. Understand (what does this section do?)
4. Remember (what did I see 500 lines ago?)
5. Connect (how does this relate to that other thing?)

Each step consumes working memory. By the time you reach step 5, the context you loaded in steps 1-2 has likely been pushed out.

From first principles: A function should fit in one "chunk" of understanding.
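
One rough way to act on this is to flag functions that exceed a line budget. The sketch below uses Python's standard `ast` module; the name `find_long_functions` and the 40-line threshold are assumptions made for illustration, not values taken from the research cited above.

```python
import ast

# Hypothetical threshold; tune it for your own codebase.
MAX_FUNCTION_LINES = 40


def find_long_functions(source: str, max_lines: int = MAX_FUNCTION_LINES):
    """Return (name, line_count) for each function longer than max_lines."""
    tree = ast.parse(source)
    long_functions = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is populated by ast.parse on Python 3.8+.
            line_count = node.end_lineno - node.lineno + 1
            if line_count > max_lines:
                long_functions.append((node.name, line_count))
    return long_functions


# A two-line function and a 52-line function, built as a string for the demo.
sample = (
    "def short():\n    return 1\n\n"
    "def long_one():\n" + "    x = 1\n" * 50 + "    return x\n"
)
print(find_long_functions(sample))  # [('long_one', 52)]
```

Line count is only a proxy for "one chunk", so treat a hit as a prompt to look closer rather than as a verdict.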
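
## Cyclomatic Complexity: Exponential Path Explosion

### The Fundamental Problem

Cyclomatic complexity isn't just about "number of if statements." Strictly, the metric counts linearly independent paths (decision points plus one), so the number itself grows linearly. What it warns you about is the number of possible end-to-end execution paths through the code, which can grow much faster.

Mathematical reality, for independent if statements in sequence:

- 1 if statement = 2 paths
- 2 if statements = 4 paths
- 3 if statements = 8 paths
- n independent conditions = up to 2^n paths

This is exponential growth. (Purely nested if statements without else branches add only one path per level, but they carry the nesting cost covered under Deep Nesting.)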
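
To make the path count concrete, here is a small sketch that enumerates every path through a function with three independent conditions in sequence. The function `final_price` and its discount amounts are invented for illustration.

```python
from itertools import product


def final_price(price, is_member, has_coupon, is_holiday):
    """Three independent conditions in sequence.

    Returns the price and a tuple naming the branches that were taken.
    """
    taken = []
    if is_member:
        price -= 10
        taken.append("member")
    if has_coupon:
        price -= 5
        taken.append("coupon")
    if is_holiday:
        price -= 2
        taken.append("holiday")
    return price, tuple(taken)


# Try every combination of the three flags and record which path was taken.
paths = {final_price(100, *flags)[1] for flags in product((False, True), repeat=3)}

decision_points = 3
print(len(paths))           # 8 distinct execution paths (2**3)
print(decision_points + 1)  # 4, the cyclomatic complexity of final_price
```

The metric reports 4 while the reviewer faces 8 combinations, which is why a modest-looking complexity score can still hide a lot of behavior.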

### Why This Destroys Reviewability

When you review code, you're mentally executing it with different inputs. You're asking:

- "What happens if this is null?"
- "What if this array is empty?"
- "What if the user is not logged in?"

With high cyclomatic complexity, there are more paths than you can mentally trace, so some of them go unreviewed.

## Naming: Compression and Decompression

### The Information Theory Perspective

A variable name is compressed information about what the variable represents.

When you read code, you're constantly:

1. Decompressing names into concepts
2. Holding those concepts in working memory
3. Using those concepts to understand logic

Bad names have a high decompression cost: the reader has to recover the meaning from the surrounding code instead of from the name.

### The Context Window Problem

Single-letter variables work in tiny scopes. 'i' is immediately understood as a loop index. But what is 'i' 50 lines later?

From first principles: Variable name quality should scale with scope size. Larger scope = more descriptive name required, because the name must survive longer in memory or be re-decompressed when encountered again.
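
As an illustration, the two functions below do the same thing; only the names differ. Both functions and the sample data are made up for this sketch.

```python
from datetime import date, timedelta


# Hard to decompress: what are d, t, r, and x?
def f(d, t):
    r = []
    for x in d:
        if x[1] >= t:
            r.append(x[0])
    return r


# Same logic, but the names carry the meaning. The loop variables live for a
# single line, so short but descriptive names are enough there.
def ids_of_orders_placed_since(orders, cutoff_date):
    return [order_id for order_id, placed_on in orders if placed_on >= cutoff_date]


today = date(2024, 1, 31)
orders = [
    (101, today - timedelta(days=40)),
    (102, today - timedelta(days=3)),
    (103, today),
]
cutoff = today - timedelta(days=7)

print(f(orders, cutoff))                           # [102, 103]
print(ids_of_orders_placed_since(orders, cutoff))  # [102, 103]
```

The function name and parameters are visible to every caller, so they get the most descriptive names; the names inside the comprehension can stay short because their scope is one line.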

## Deep Nesting: The Indentation Tax

### The Cognitive Cost of Indentation

Each level of indentation represents a context you must remember.

At each level, you're asking: "Under what conditions does this code execute?"

By 4 levels deep, you're juggling 4+ conditions in working memory while also trying to understand what the code does.

### The Early Return Pattern: Reducing Cognitive Load

From first principles: Each early return eliminates a dimension of conditional space. Once a guard has returned, its condition no longer needs to be held in mind for the code that follows. With n guard clauses you read n+1 paths (one for each guard plus the success path), one at a time, instead of reasoning about up to 2^n combinations of conditions.
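
A minimal sketch of the same check written both ways. The function names and the shape of `user` and `cart` are assumptions chosen for the example.

```python
def can_checkout_nested(user, cart):
    # Each level adds a condition the reader must keep in mind.
    if user is not None:
        if user["logged_in"]:
            if cart:
                return "ok"
            else:
                return "empty cart"
        else:
            return "not logged in"
    else:
        return "no user"


def can_checkout_guarded(user, cart):
    # Each guard settles one condition and then drops out of the picture.
    if user is None:
        return "no user"
    if not user["logged_in"]:
        return "not logged in"
    if not cart:
        return "empty cart"
    return "ok"


# One case per path: three guards plus the success path.
cases = [
    (None, []),
    ({"logged_in": False}, ["book"]),
    ({"logged_in": True}, []),
    ({"logged_in": True}, ["book"]),
]
for user, cart in cases:
    assert can_checkout_nested(user, cart) == can_checkout_guarded(user, cart)

print([can_checkout_guarded(user, cart) for user, cart in cases])
# ['no user', 'not logged in', 'empty cart', 'ok']
```

Both versions behave identically. In the guarded version each failure message sits next to the condition that triggers it, rather than in an else branch several lines away.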

## Separation of Concerns: Modularity as Comprehension

### Why Mixed Concerns Are Impossible to Review

When code does multiple things at once, you cannot understand or verify any single thing in isolation.

From first principles: Separation of concerns converts multiplication of complexity into addition of complexity.

- Mixed: Must understand validation AND crypto AND database AND email simultaneously (multiplicative)
- Separated: Can understand validation, then crypto, then database, then email (additive)

Mathematical: Understanding cost of N concerns (an informal model, not a measured result):

- Mixed: O(concern₁ × concern₂ × ... × concernₙ)
- Separated: O(concern₁ + concern₂ + ... + concernₙ)
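
Here is a sketch of what the separated form might look like for a user registration flow. All function names are hypothetical, the email check is deliberately crude, a plain dict stands in for the database, and the email-sending concern is left out to keep the example short.

```python
import hashlib


def validate_email(email: str) -> bool:
    """Validation only: a crude illustrative check, not a real validator."""
    local, _, domain = email.partition("@")
    return bool(local) and "." in domain


def hash_password(password: str, salt: bytes) -> bytes:
    """Crypto only: derive a 32-byte key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def save_user(db: dict, email: str, password_hash: bytes) -> None:
    """Storage only: `db` is a dict standing in for a real database."""
    db[email] = password_hash


def register_user(db: dict, email: str, password: str, salt: bytes) -> bool:
    """Composition: each step above can be read and tested on its own."""
    if not validate_email(email):
        return False
    save_user(db, email, hash_password(password, salt))
    return True


db = {}
salt = b"\x00" * 16  # fixed salt for the demo only; use a random salt in practice
print(register_user(db, "ada@example.com", "s3cret", salt))  # True
print(register_user(db, "not-an-email", "s3cret", salt))     # False
```

A reviewer can check `validate_email` without thinking about hashing, and `hash_password` without thinking about storage, which is the additive cost described above.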

## The Fundamental Theorem of Code Review

Code is reviewable if and only if:

1. It fits in working memory (size constraints)
2. Its behavior is traceable (complexity constraints)
3. Its intent is clear (naming/documentation constraints)
4. Its dependencies are explicit (coupling constraints)
5. Its correctness is verifiable (testability constraints)

Every code smell is a violation of one or more of these principles.
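
## The Economics of Code Quality

Code is written once but read many times. A widely cited rule of thumb (Robert C. Martin, Clean Code) puts the ratio of time spent reading code to time spent writing it at more than 10 to 1.

Poor code has a compounding cost:

Total Cost = Writing Cost + (Reading Cost × Number of Reads × Number of Readers)

Under this model, code that takes 2x longer to write can still cost less overall once it lowers the reading cost across enough reads. The exact saving depends on the numbers you plug in.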
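
The formula is easy to try with concrete numbers. Every figure in the sketch below is hypothetical and chosen only to show how the terms interact; "Number of Reads" is taken to mean reads per reader.

```python
def total_cost(writing_cost, reading_cost, reads_per_reader, readers):
    """Total Cost = Writing Cost + (Reading Cost x Number of Reads x Number of Readers)."""
    return writing_cost + reading_cost * reads_per_reader * readers


# Hypothetical figures, in minutes.
quick = total_cost(writing_cost=60, reading_cost=30, reads_per_reader=10, readers=3)
careful = total_cost(writing_cost=120, reading_cost=12, reads_per_reader=10, readers=3)

print(quick)    # 60 + 30 * 10 * 3 = 960
print(careful)  # 120 + 12 * 10 * 3 = 480
```

With these made-up inputs, doubling the writing time halves the total cost, because the reading term dominates as soon as the code is read more than a handful of times.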

## Toward Better Code: Design Principles from First Principles

1. Minimize Cognitive Load: Code should require minimal working memory to understand.
2. Maximize Locality: Related concepts should be close together.
3. Make Dependencies Explicit: What code depends on should be obvious.
4. Optimize for Reading: Code is read far more often than it is written. Choose clarity over cleverness.
5. Enable Verification: It should be possible to check that the code is correct, for example through tests.