Identity from voice, never from text

This is a founder's note about a thing we got wrong, so I'll tell it straight. When Bonfiyah was first figuring out who was speaking, we did something that seemed clever and turned out to be a mistake: we let the system use the words people said to help decide who they were. If someone said “as the CEO, I think…,” well, that's probably the CEO. If a voice introduced itself by name, why not use the name? It reads as obviously helpful. It is also, I now believe, the single most dangerous shortcut you can take in a tool like this, and we tore it out.

Around the fire, you knew who was speaking by their voice: the sound of them, not a claim they made about themselves. Getting that wrong around a real fire would have meant attributing one person's words to another, and everything built on that mistake would have inherited it. That is exactly what went wrong for us in software, and exactly what the fix corrects.

The shortcut that seemed smart.

The appeal of reading identity from text is easy to feel. A transcript is right there. People announce who they are, refer to their roles, address each other by name. It looks like free signal — why not use it to figure out who's who? So we did, in part, and for a while it seemed to help.

The problem is that text is a treacherous identity signal in a way that isn't obvious until it fails. People misspeak. They quote each other — “so then Sarah says to me, ‘I'll handle it’” — and now whose commitment is that? They use names that belong to people who aren't in the room. They introduce themselves in ways the system misreads. Every one of those is a chance to attach a name to the wrong voice. And unlike a transcription typo, a misidentification doesn't stay contained.

The cascade: why one wrong identity poisons everything.

Here's the part that made this a hard-way lesson rather than a tidy bug. In Bonfiyah, identity is the foundation that almost everything else stands on. People Memory builds a person's profile from everything attributed to them. Promise Tracker records who committed to what. The AI layer reasons across conversations about specific people. So when identity is wrong, it isn't one wrong label — it's a poisoned root, and everything downstream drinks from it.

We saw it concretely. A speaker got mis-tagged as the account owner — pulled from a textual cue, not from voice — and the error didn't sit still. It cascaded. The wrong person's profile absorbed words that weren't theirs. Commitments got attributed to the wrong mouth. The system's whole picture of who said what quietly bent around a single bad identity decision, and because everything trusted that decision, nothing flagged it. That's the nightmare case for a memory tool: not a visible crash, but a confident, wrong memory that contaminates the records built on top of it. One mis-tag, and the corruption spreads outward through every feature that trusts identity — which is most of them.

The rule we adopted: voice is the only signal.

So we made a rule and made it absolute. Identity comes from voice, and only from voice. Bonfiyah recognizes a speaker by voice biometrics — the actual sound of them — never by reading the words to guess who they are. We do not extract names from self-introductions to label speakers. We do not infer “this is probably the boss” from what someone said. The transcript tells us what was said; it is never allowed to tell us who said it. That job belongs to the voice, full stop. Voice ID is the feature that does it.

This is a real constraint, and it costs us things. Voice-only identity is harder. It means we sometimes ask you to confirm or label a speaker rather than guessing from an easy textual cue we're choosing not to trust. We took that cost on purpose, because the alternative — a fast guess that's occasionally, invisibly, catastrophically wrong — is worse than a slower answer that's trustworthy. In a tool whose entire value is remembering correctly, a confident wrong memory is the one failure you cannot ship.
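To make the rule concrete, here is a minimal sketch. It is not Bonfiyah's actual code. It assumes each audio segment has already been turned into a voice embedding by some speaker-embedding model, and that known people have enrolled voiceprints. The names `identify_speaker` and `SpeakerDecision` and the 0.75 threshold are hypothetical, chosen only for illustration. What the sketch shows is the shape of the rule: the transcript is not an input at all, and a weak match produces a request for confirmation instead of a guess.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SpeakerDecision:
    person: Optional[str]     # enrolled person the voice matched, or None
    score: float              # best cosine similarity found
    needs_confirmation: bool  # True when the user should confirm or label


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(
    segment_embedding: Sequence[float],
    voiceprints: dict[str, Sequence[float]],
    threshold: float = 0.75,  # hypothetical cutoff; would need tuning on real audio
) -> SpeakerDecision:
    """Decide who is speaking from the voice embedding alone.

    Note what is NOT a parameter: the transcript. Names, roles, and
    self-introductions in the text cannot influence this decision.
    """
    best_person, best_score = None, -1.0
    for person, voiceprint in voiceprints.items():
        score = cosine(segment_embedding, voiceprint)
        if score > best_score:
            best_person, best_score = person, score
    if best_person is None or best_score < threshold:
        # Not confident enough: ask the user rather than guess.
        return SpeakerDecision(None, best_score, needs_confirmation=True)
    return SpeakerDecision(best_person, best_score, needs_confirmation=False)


# Usage with toy 3-dimensional "voiceprints" (real embeddings are much larger).
voiceprints = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
print(identify_speaker([0.9, 0.1, 0.0], voiceprints))  # matches alice
print(identify_speaker([0.6, 0.6, 0.5], voiceprints))  # ambiguous, needs confirmation
```

The design choice worth noticing is the signature: because the function cannot see the words, no quoted name or "as the CEO" can ever leak into who gets credited. The cost is the `needs_confirmation` branch, which is the slower, trustworthy answer described above.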

Why this is load-bearing for everything else.

It's worth spelling out why this matters beyond principle. People Memory is only valuable if the person it's remembering is actually the right person; a profile built from misattributed words isn't a feature, it's a liability. Promise Tracker only earns trust if the commitment really was made by the person it's pinned to. The cross-recording reasoning that makes Bonfiyah different depends on the same person being recognized correctly in every recording. Every one of those features inherits the quality of the identity decision underneath it. Trust the foundation to a treacherous signal and you've quietly compromised the whole building. So we don't. Voice, every time.

What this isn't.

This isn't a claim that voice identity is infallible — no identity system is, and I won't pretend ours never needs a correction from you. It's a claim about which kind of error we're willing to risk. We'd rather occasionally ask you to confirm a voice than ever confidently attribute words to the wrong person from a textual guess. And it isn't a story I'm telling to look humble; it's here because building in the open means showing the decisions we got wrong before we got them right, and this is one of the load-bearing ones. The lesson — don't build a memory on a signal you can't trust — shaped more of the product than almost anything else.

We learned it the hard way. That's usually the only way a lesson like this actually sticks.

Try the version that learned.

The fix doesn't show up as a feature. It shows up as a product that gets who said what right across your conversations, which is what makes everything built on top of it worth trusting. Record a few real conversations with the same people, let the AI layer run, and watch the same person get recognized by voice from one recording to the next. That continuity, kept honest, is what the hard-way lesson bought. Bonfiyah is free to start.

You always knew the circle by the sound of each voice in it. We do too now — and only by that. It's the one thing a memory can't afford to get wrong.

— Richard

What we got wrong, then fixed: identity from voice, never from text.