The Parameter Ledger
Why a headline parameter count is closer to marketing than to a technical specification
The integer everyone quotes
Same count, different jobs
Two architectures can match on paper and diverge everywhere that matters for training and deployment
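One way to see this divergence is a minimal sketch with two hypothetical bias-free MLPs (the names `deep` and `shallow` are inventions for illustration): both map 64 inputs to 64 outputs and land on the identical headline count, yet they differ in depth, number of nonlinearities, and sequential latency.

```python
import torch
import torch.nn as nn

# Two hypothetical MLPs mapping 64 -> 64, bias-free so the arithmetic is clean.
deep = nn.Sequential(                    # three layers of width 64
    nn.Linear(64, 64, bias=False), nn.ReLU(),
    nn.Linear(64, 64, bias=False), nn.ReLU(),
    nn.Linear(64, 64, bias=False),
)
shallow = nn.Sequential(                 # two layers through a width-96 hidden
    nn.Linear(64, 96, bias=False), nn.ReLU(),
    nn.Linear(96, 64, bias=False),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(deep), count(shallow))  # 12288 12288: identical on the ledger
```

Same integer, but `deep` stacks three matrix multiplies and two ReLUs where `shallow` has two and one, so optimization behavior and inference latency differ from the first step.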
Sharing, priors, and gradients
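Weight sharing is the clearest case where the ledger entry and the model diverge. A minimal sketch, using the common embedding/output-projection tying trick (the `untied`/`tied` names are inventions for illustration): one shared tensor does two jobs, receives gradients from both, and is counted once.

```python
import torch
import torch.nn as nn

vocab, dim = 1000, 64

# Hypothetical tiny LM head: an embedding plus an output projection.
untied = nn.ModuleDict({
    "emb": nn.Embedding(vocab, dim),
    "out": nn.Linear(dim, vocab, bias=False),
})

tied = nn.ModuleDict({
    "emb": nn.Embedding(vocab, dim),
    "out": nn.Linear(dim, vocab, bias=False),
})
# Tie: both modules now point at one (vocab, dim) tensor. Gradients from
# the embedding lookup and the output projection accumulate into it.
tied["out"].weight = tied["emb"].weight

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(untied))  # 128000: two independent vocab x dim matrices
print(count(tied))    # 64000: one matrix, two roles
```

PyTorch's `parameters()` iterator deduplicates shared tensors, so the tied model reports half the count while computing the same shapes; the prior it encodes (inputs and outputs live in one representation space) is invisible in the integer.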
Other lines on the ledger
Useful quantities that parameter count obscures
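Compute is one such obscured quantity. A back-of-envelope sketch, assuming "same" padding so the convolution keeps its H x W grid, for the two layers used in the illustration further down: the convolution has roughly 440x fewer parameters yet does roughly 9x more multiply-accumulates, because it reuses its small kernel at every spatial position.

```python
# Multiply-accumulate (MAC) counts for a Conv2d(4, 32, k=3, padding=1)
# versus a Linear over the flattened 4*64*64 input.
Cin, H, W, Cout, k = 4, 64, 64, 32, 3

conv_params = Cout * Cin * k * k + Cout   # 1184
conv_macs = H * W * Cout * Cin * k * k    # kernel applied at every pixel

lin_params = (Cin * H * W) * Cout + Cout  # 524320
lin_macs = (Cin * H * W) * Cout           # one pass over the flat input

print(conv_params, conv_macs)  # 1184 4718592
print(lin_params, lin_macs)    # 524320 524288
```

Parameter count ranks these layers one way; FLOPs, activation memory, and memory-bandwidth pressure can rank them the other way, and deployment cost follows the latter.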
What to read beside the headline
Illustration: same tensor, different reuse
import torch
import torch.nn as nn

# A small batch of 4-channel 64x64 feature maps.
B, Cin, H, W = 2, 4, 64, 64
x = torch.randn(B, Cin, H, W)

# Convolution: one 3x3 kernel per output channel, reused at every spatial
# position. Parameters: 32 * 4 * 3 * 3 weights + 32 biases = 1184.
conv = nn.Conv2d(Cin, 32, kernel_size=3, padding=1)
y_conv = conv(x)
y_pooled = y_conv.mean(dim=(2, 3))  # [B, 32] global summary per channel

# Fully connected layer over the flattened input: a distinct weight for every
# (input pixel, output unit) pair. Parameters: 16384 * 32 + 32 = 524320.
flat_in = Cin * H * W
linear = nn.Linear(flat_in, 32)
y_linear = linear(x.flatten(1))

n_conv = sum(p.numel() for p in conv.parameters())
n_lin = sum(p.numel() for p in linear.parameters())
print("Conv2d parameters:", n_conv)  # 1184
print("Linear parameters:", n_lin)   # 524320
print("Shapes (both [B, 32]):", y_pooled.shape, y_linear.shape)
A short checklist
Before you compare two models on parameter count alone
Questions that take minutes and save weeks
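Several of those questions can be answered mechanically. A minimal sketch, assuming a PyTorch model; `ledger` is a hypothetical helper, not a library function. It splits the headline into trainable versus frozen parameters, counts buffers (such as BatchNorm running statistics, which ship with the model but never appear in the parameter count), and totals the bytes the parameters actually occupy at their stored dtype.

```python
import torch
import torch.nn as nn

def ledger(model: nn.Module) -> dict:
    """Hypothetical helper: a few numbers worth reading beside the headline."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    buffers = sum(b.numel() for b in model.buffers())  # e.g. BatchNorm stats
    param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    return {"trainable": trainable, "frozen": frozen,
            "buffers": buffers, "param_bytes": param_bytes}

m = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.Linear(8, 2))
m[0].weight.requires_grad_(False)  # freeze the conv weight, as in fine-tuning
print(ledger(m))
```

Two models with the same headline can differ on every one of these lines: a fine-tune with most weights frozen trains a fraction of its count, and an int8 checkpoint occupies a quarter of the bytes of a float32 one.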
The quantum footnote
Where the word parameter stops being comparable
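A minimal sketch of the incomparability, assuming the standard ZYZ fact that three rotation angles parameterize any single-qubit unitary up to a global phase (`rz`, `ry`, and `circuit` are illustrative names): here "three parameters" means three angles steering a 2x2 unitary acting on complex amplitudes, not three stored float32 weights, so placing the number on the same ledger as a network's count compares unlike things.

```python
import numpy as np

def rz(t):
    # Rotation about Z: a relative phase between the two amplitudes.
    return np.diag([np.exp(-1j * t / 2), np.exp(1j * t / 2)])

def ry(t):
    # Rotation about Y: a real 2x2 rotation of the amplitudes.
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def circuit(a, b, c):
    # ZYZ decomposition: any single-qubit unitary, up to global phase.
    return rz(a) @ ry(b) @ rz(c)

U = circuit(0.3, 1.2, -0.7)
state = U @ np.array([1.0, 0.0], dtype=complex)  # apply to |0>
print(np.allclose(U.conj().T @ U, np.eye(2)))    # True: U is unitary
print(np.abs(state) ** 2)                        # probabilities, sum to 1
```

The constraint set is different in kind: network weights are unconstrained reals, while these angles move on a curved manifold of unitaries, which is exactly where naive count-to-count comparison stops meaning anything.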