Daniel Wlazło

Discriminatory uncertainty

A model can discriminate not only through its decisions, but through how confident it is in them.

A credit model is judged by what it decides. Approve or reject. Score 600 or 720. Send to manual review or auto‑decision. The fairness literature has spent twenty years on this surface, building tools that ask whether two people, equally creditworthy, equally observed, but differing on a protected attribute, receive different decisions.

That work is necessary. It is also not the whole picture.

Modern credit models do not return decisions. They return scores, and around those scores, a rendering of how sure they are. Calibration tells you that a score of 0.05 means roughly five‑in‑a‑hundred. Conformal prediction tells you, with a coverage guarantee, that the true outcome lies inside a stated interval. Both — the score and the interval around it — feed a downstream process that can include a human, a manual review queue, an auto‑decision rule, an exposure limit, a price.
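
To make the second of those concrete, here is a minimal sketch of split conformal prediction with normalized residuals, the variant where interval width is allowed to vary from case to case. Everything in it, the toy data, the model choices, the names, is my illustration, not code from the thesis or from any production system.

```python
# A toy sketch of split conformal prediction with normalized residuals.
# Everything here (data, models, names) is illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Toy data: X stands in for application features, y for a continuous
# risk outcome. The noise scale depends on X[:, 1], so some cases are
# genuinely harder to pin down than others.
n = 3000
X = rng.normal(size=(n, 5))
y = X[:, 0] + (1.0 + np.abs(X[:, 1])) * rng.normal(size=n)

X_tr, y_tr = X[:2000], y[:2000]      # proper training set
X_cal, y_cal = X[2000:], y[2000:]    # held-out calibration set

# 1. A point model for the score itself.
mean_model = GradientBoostingRegressor().fit(X_tr, y_tr)

# 2. A second model for the typical size of the residual; this is what
#    lets interval width differ across cases (and hence across groups).
resid_model = GradientBoostingRegressor().fit(
    X_tr, np.abs(y_tr - mean_model.predict(X_tr))
)

# 3. Calibrate: normalized scores on held-out data, then the conformal
#    quantile that delivers ~90% marginal coverage.
alpha = 0.1
sigma_cal = np.maximum(resid_model.predict(X_cal), 1e-6)
scores = np.abs(y_cal - mean_model.predict(X_cal)) / sigma_cal
level = np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores)
q = np.quantile(scores, level, method="higher")

def predict_interval(X_new):
    """Interval with ~90% marginal coverage. Width scales with the
    estimated residual size, so 'I don't know' shows up as width."""
    center = mean_model.predict(X_new)
    half = q * np.maximum(resid_model.predict(X_new), 1e-6)
    return center - half, center + half
```

One design note worth holding on to: the coverage guarantee is marginal. It holds on average over everyone pooled together, and by itself says nothing about any particular group.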

What if the score is fair, and the interval is not?


Consider a model that, on average, gives demographic groups A and B the same expected default rate for the same inputs. By any standard fairness test on point predictions, it passes. Now look at the prediction intervals it produces. For group A, the average interval width is 0.04, a tight, useful estimate. For group B, it is 0.18, a wide, almost uninformative band. The model does not say group B is riskier. It says, of group B, “I don’t know.”
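
Measuring that asymmetry requires nothing exotic. Continuing the sketch above, and assuming a group label is available to the auditor (here a synthetic proxy; the 0.04 and 0.18 are the illustration in the text, not this toy’s output):

```python
# Audit sketch: stratify interval widths by group. `group` is a
# hypothetical protected-attribute label used only for measurement,
# never as a model input. In this toy it proxies the hard cases.
group = (np.abs(X_cal[:, 1]) > 1.0).astype(int)  # 0 = "A", 1 = "B"

lo, hi = predict_interval(X_cal)  # in practice, use a fresh test set
width = hi - lo

for g, name in [(0, "A"), (1, "B")]:
    print(f"group {name}: mean interval width = {width[group == g].mean():.3f}")

# A first-pass disparity statistic: the ratio of mean widths.
print(f"width ratio B/A: {width[group == 1].mean() / width[group == 0].mean():.2f}")
```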

That asymmetry has consequences, even when nobody has decided yet what to do about it.

Applications with wider intervals tend to be routed to manual review more often, because uncertainty itself gets read as risk. They tend to attract more conservative thresholds, smaller credit limits, more questions, longer waits. They produce more “no, but” decisions and fewer crisp approvals. The decision rule may be the same for both groups, but the uncertainty the rule is reading from is not, and so the lived process is not.
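
Here is what the first of those mechanisms looks like when written down, continuing from the sketches above; the routing rule and its threshold are my arbitrary illustration:

```python
# A downstream rule that reads uncertainty, not the score: route the
# widest intervals to manual review. The cutoff is arbitrary.
review_threshold = np.quantile(width, 0.8)  # say, the widest 20% overall
to_review = width > review_threshold

for g, name in [(0, "A"), (1, "B")]:
    rate = to_review[group == g].mean()
    print(f"group {name}: routed to manual review {rate:.1%} of the time")
```

The rule never sees the group and applies one criterion to every case; whatever disparity appears arrives entirely through the widths.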

I call this discriminatory uncertainty: the systematic differentiation of certainty across groups, even when the point predictions are well‑calibrated and the decisions built on them are balanced.


It is worth being precise about what it is and what it is not.

It is not the same as the model being more accurate for one group than another. A model can have identical group‑conditional error rates and still produce systematically wider intervals for one group, because accuracy averages over the whole group, while interval width reflects how much information the model has about each individual case.

It is not bias in the everyday sense of “the model leans against this group”. The score itself may be unbiased on average. The asymmetry sits one rung lower — in the model’s confidence about the score it is producing.

It is not visible in a confusion matrix, an ROC curve, or a calibration plot stratified by group, because each of these reads off point predictions. The phenomenon lives in the second moment, not the first.

What it is, is a form of differential treatment that is real, that is measurable, and that the current fairness vocabulary does not name.

That is what my thesis is about.