The Parameter Ledger
Why a headline parameter count is closer to marketing than to a technical specification
The integer everyone quotes
Same count, different jobs
Two architectures can match on paper and diverge everywhere that matters for training and deployment
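One way to see this divergence is a minimal sketch with two hypothetical bias-free MLPs (the names `deep` and `shallow` are inventions for illustration): both map 64 inputs to 64 outputs and land on the identical headline count, yet they differ in depth, number of nonlinearities, and sequential latency.

```python
import torch
import torch.nn as nn

# Two hypothetical MLPs mapping 64 -> 64, bias-free so the arithmetic is clean.
deep = nn.Sequential(                    # three layers of width 64
    nn.Linear(64, 64, bias=False), nn.ReLU(),
    nn.Linear(64, 64, bias=False), nn.ReLU(),
    nn.Linear(64, 64, bias=False),
)
shallow = nn.Sequential(                 # two layers through a width-96 hidden
    nn.Linear(64, 96, bias=False), nn.ReLU(),
    nn.Linear(96, 64, bias=False),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(deep), count(shallow))  # 12288 12288: identical on the ledger
```

Same integer, but `deep` stacks three matrix multiplies and two ReLUs where `shallow` has two and one, so optimization behavior and inference latency differ from the first step.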
Sharing, priors, and gradients
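Weight sharing is the clearest case where the ledger entry and the model diverge. A minimal sketch, using the common embedding/output-projection tying trick (the `untied`/`tied` names are inventions for illustration): one shared tensor does two jobs, receives gradients from both, and is counted once.

```python
import torch
import torch.nn as nn

vocab, dim = 1000, 64

# Hypothetical tiny LM head: an embedding plus an output projection.
untied = nn.ModuleDict({
    "emb": nn.Embedding(vocab, dim),
    "out": nn.Linear(dim, vocab, bias=False),
})

tied = nn.ModuleDict({
    "emb": nn.Embedding(vocab, dim),
    "out": nn.Linear(dim, vocab, bias=False),
})
# Tie: both modules now point at one (vocab, dim) tensor. Gradients from
# the embedding lookup and the output projection accumulate into it.
tied["out"].weight = tied["emb"].weight

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(untied))  # 128000: two independent vocab x dim matrices
print(count(tied))    # 64000: one matrix, two roles
```

PyTorch's `parameters()` iterator deduplicates shared tensors, so the tied model reports half the count while computing the same shapes; the prior it encodes (inputs and outputs live in one representation space) is invisible in the integer.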
Other lines on the ledger
Useful quantities that parameter count obscures
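Compute is one such obscured quantity. A back-of-envelope sketch, assuming "same" padding so the convolution keeps its H x W grid, for the two layers used in the illustration further down: the convolution has roughly 440x fewer parameters yet does roughly 9x more multiply-accumulates, because it reuses its small kernel at every spatial position.

```python
# Multiply-accumulate (MAC) counts for a Conv2d(4, 32, k=3, padding=1)
# versus a Linear over the flattened 4*64*64 input.
Cin, H, W, Cout, k = 4, 64, 64, 32, 3

conv_params = Cout * Cin * k * k + Cout   # 1184
conv_macs = H * W * Cout * Cin * k * k    # kernel applied at every pixel

lin_params = (Cin * H * W) * Cout + Cout  # 524320
lin_macs = (Cin * H * W) * Cout           # one pass over the flat input

print(conv_params, conv_macs)  # 1184 4718592
print(lin_params, lin_macs)    # 524320 524288
```

Parameter count ranks these layers one way; FLOPs, activation memory, and memory-bandwidth pressure can rank them the other way, and deployment cost follows the latter.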
What to read beside the headline
Illustration: same tensor, different reuse
import torch
import torch.nn as nn

# A small batch of 4-channel 64x64 feature maps.
B, Cin, H, W = 2, 4, 64, 64
x = torch.randn(B, Cin, H, W)

# Convolution: one 3x3 kernel per output channel, reused at every spatial
# position. Parameters: 32 * 4 * 3 * 3 weights + 32 biases = 1184.
conv = nn.Conv2d(Cin, 32, kernel_size=3, padding=1)
y_conv = conv(x)
y_pooled = y_conv.mean(dim=(2, 3))  # [B, 32] global summary per channel

# Fully connected layer over the flattened input: a distinct weight for every
# (input pixel, output unit) pair. Parameters: 16384 * 32 + 32 = 524320.
flat_in = Cin * H * W
linear = nn.Linear(flat_in, 32)
y_linear = linear(x.flatten(1))

n_conv = sum(p.numel() for p in conv.parameters())
n_lin = sum(p.numel() for p in linear.parameters())
print("Conv2d parameters:", n_conv)  # 1184
print("Linear parameters:", n_lin)   # 524320
print("Shapes (both [B, 32]):", y_pooled.shape, y_linear.shape)
A short checklist
Before you compare two models on parameter count alone
Questions that take minutes and save weeks
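Several of those questions can be answered mechanically. A minimal sketch, assuming a PyTorch model; `ledger` is a hypothetical helper, not a library function. It splits the headline into trainable versus frozen parameters, counts buffers (such as BatchNorm running statistics, which ship with the model but never appear in the parameter count), and totals the bytes the parameters actually occupy at their stored dtype.

```python
import torch
import torch.nn as nn

def ledger(model: nn.Module) -> dict:
    """Hypothetical helper: a few numbers worth reading beside the headline."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    buffers = sum(b.numel() for b in model.buffers())  # e.g. BatchNorm stats
    param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    return {"trainable": trainable, "frozen": frozen,
            "buffers": buffers, "param_bytes": param_bytes}

m = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.Linear(8, 2))
m[0].weight.requires_grad_(False)  # freeze the conv weight, as in fine-tuning
print(ledger(m))
```

Two models with the same headline can differ on every one of these lines: a fine-tune with most weights frozen trains a fraction of its count, and an int8 checkpoint occupies a quarter of the bytes of a float32 one.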
The quantum footnote
Where the word parameter stops being comparable
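A minimal sketch of the incomparability, assuming the standard ZYZ fact that three rotation angles parameterize any single-qubit unitary up to a global phase (`rz`, `ry`, and `circuit` are illustrative names): here "three parameters" means three angles steering a 2x2 unitary acting on complex amplitudes, not three stored float32 weights, so placing the number on the same ledger as a network's count compares unlike things.

```python
import numpy as np

def rz(t):
    # Rotation about Z: a relative phase between the two amplitudes.
    return np.diag([np.exp(-1j * t / 2), np.exp(1j * t / 2)])

def ry(t):
    # Rotation about Y: a real 2x2 rotation of the amplitudes.
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def circuit(a, b, c):
    # ZYZ decomposition: any single-qubit unitary, up to global phase.
    return rz(a) @ ry(b) @ rz(c)

U = circuit(0.3, 1.2, -0.7)
state = U @ np.array([1.0, 0.0], dtype=complex)  # apply to |0>
print(np.allclose(U.conj().T @ U, np.eye(2)))    # True: U is unitary
print(np.abs(state) ** 2)                        # probabilities, sum to 1
```

The constraint set is different in kind: network weights are unconstrained reals, while these angles move on a curved manifold of unitaries, which is exactly where naive count-to-count comparison stops meaning anything.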