What is “Ask the Archive”?
I wanted a system that can answer questions about my work, but only from sources.
Private material can point the system toward the right page or timestamp, but it is designed so that unpublished, unreviewed text never reaches the part that writes the answer.
My podcast episodes are transcribed by speech-to-text models (local Whisper for solo episodes, OpenAI’s speaker-separated transcription for interviews) and then pass through human review. That review keeps a few things apart: me from my guests, claim from record, evidence from synthesis.
The goal is a system that admits uncertainty instead of laundering it into confidence. When the archive does not have enough to answer well, I would rather it say so plainly and send you to the record.
So it always points back to the accountable surface: pages, timestamps, citations, reviewed summaries. The sources are the record. The answer is only the concierge.
In other words, it is an attempt at answerability as interface design.
I want a site that works with whoever is asking, human or AI, to find the record and show its work, while refusing to overclaim.
How it’s built
The pipeline is deliberately plain. Most of the care goes into the boundaries, not the model.
Speech-to-text turns episode audio into timestamped text: solo episodes through Whisper running locally, interviews through OpenAI’s diarized transcription, guided by a keyterm list of names and titles so the proper nouns come out right. I review and label every transcript before any of it can become public, separating my words from my guests’ and marking anything I am unsure about as unverified.
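As a rough illustration of the labeling step, here is a minimal Python sketch. The names (Segment, Speaker, Status, publishable) are hypothetical and not taken from the actual engine. It also assumes that a segment marked unverified can still be published with its label attached, while anything unreviewed is held back; the real rule may be stricter.

```python
from dataclasses import dataclass
from enum import Enum


class Speaker(Enum):
    HOST = "host"
    GUEST = "guest"


class Status(Enum):
    UNREVIEWED = "unreviewed"  # raw speech-to-text output, not yet checked
    UNVERIFIED = "unverified"  # checked by a human, but flagged as uncertain
    REVIEWED = "reviewed"      # checked by a human and accepted


@dataclass(frozen=True)
class Segment:
    episode: str
    start_seconds: float
    speaker: Speaker
    text: str
    status: Status = Status.UNREVIEWED  # nothing is public by default


def publishable(segments):
    """Drop unreviewed segments; reviewed and unverified ones keep their labels."""
    return [s for s in segments if s.status is not Status.UNREVIEWED]
```

The default status is the useful part: a segment has to be explicitly promoted by review before it can pass the gate, so forgetting to label something fails closed.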
The reviewed transcripts and the site’s pages are split into passages and embedded with an OpenAI embedding model, then searched with a hybrid of exact matching and meaning, so a name search and a concept search both land. A model then writes a short answer from only the public, reviewed evidence it is handed, cites its sources, and is graded against a fixed set of test questions that check for the failures I care about most: inventing facts, putting a guest’s words in my mouth, or exposing anything private.
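To make the hybrid idea concrete, here is a minimal Python sketch. It is not the engine's code. It assumes a simple weighted blend of two scores, one for exact term overlap and one for cosine similarity between embedding vectors, and the function names, the weight parameter, and the tiny two-dimensional vectors are all made up for illustration. Real embeddings would come from the embedding model rather than being typed in by hand.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_score(query, text):
    """Fraction of query terms that appear verbatim in the passage."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def hybrid_search(query, query_vec, passages, weight=0.5, top_k=3):
    """Rank (text, vector) passages by a blend of exact and semantic scores."""
    scored = [
        (weight * exact_score(query, text)
         + (1 - weight) * cosine(query_vec, vec), text)
        for text, vec in passages
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


# Toy data: invented passages with hand-written 2-d "embeddings".
passages = [
    ("interview with rivera about archives", [0.1, 0.9]),
    ("notes on honesty and uncertainty in answers", [0.9, 0.1]),
]
name_hits = hybrid_search("rivera", [0.5, 0.5], passages)
concept_hits = hybrid_search("admitting doubt", [1.0, 0.0], passages)
```

In this toy setup the name query wins on the exact-match term even though its vector is neutral, and the concept query wins on similarity even though none of its words appear in the passage. That is the behavior the hybrid is meant to give.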
If the answer layer goes down, search still works. The record does not depend on the model.
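One way to picture that independence is the Python sketch below, written under stated assumptions. Passage, allowed_evidence, respond, and the write_answer callable are hypothetical names, not the engine's API. The sketch also simplifies the boundary: it only filters which text reaches the writer and which sources are cited, and does not model private material steering retrieval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    url: str
    text: str
    public: bool
    reviewed: bool


def allowed_evidence(passages):
    """Only public, reviewed passages may reach the answer writer."""
    return [p for p in passages if p.public and p.reviewed]


def respond(question, hits, write_answer):
    """Return an answer with sources, or just the sources if the writer fails.

    hits: passages returned by search.
    write_answer: stand-in for the model layer; called as
    write_answer(question, evidence) and allowed to raise.
    """
    evidence = allowed_evidence(hits)
    sources = [p.url for p in evidence]
    if not evidence:
        return {"answer": None, "sources": [],
                "note": "The archive does not have enough to answer this."}
    try:
        answer = write_answer(question, evidence)
    except Exception:
        # The answer layer is down; the search results still stand on their own.
        return {"answer": None, "sources": sources,
                "note": "Answer unavailable; showing search results only."}
    return {"answer": answer, "sources": sources, "note": ""}
```

The sources list is built before the model is called, so a failure in the answer layer cannot take the record down with it.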
For developers: the engine is open source, in the ask-the-archive folder of github.com/lukefwalton/lukefwalton.com. A system that argues for answerability should let you check how it answers.