When an accessibility scan flags your documents, don't fix them first

The Material of Teaching · Cross-functional working session

In this article

The decision: when a scan flags a pile of documents, sort them by what each is for before repairing anything — most shouldn’t stay documents at all.
The method: route each document to where its job belongs — a web page, a form, an inline answer, a repaired PDF, or the bin.
Who does what: the researcher runs the triage; content, engineering, design, and compliance own the work it routes to them.
The rule: nothing is deleted until its content is confirmed to exist elsewhere.

A way of thinking, not a fixed recipe. Ownership, your CMS, and your team’s capacity all bend the routing: a third-party manual you can’t edit, for example, doesn’t route to “rebuild as a page.” Treat the triage sheet that comes with this article as a model to adapt to your environment.

The decision this article is about: when an accessibility scan lights up a pile of PDFs, the first move is not to repair the files. It is to sort them by what each document is for, and send each type where it belongs. Some get rebuilt as web pages. Some get deleted. Only a minority actually get repaired as documents. Deciding which is which — before anyone touches a file — is the researcher’s job, and it sets where every other team spends its effort.

The situation

A mature website, especially one selling many vendors’ products, collects documents over years: care instructions, assembly guides, FAQ sheets, returns forms, terms, conformity declarations, reports. Each was added for a good reason. Most were made in Word or a design tool, saved as PDF, and linked from a page. Nobody owns the pile.

Then a scanner is run against the site, and the pile turns red — no tags, no titles, no language set, missing alt text, no reading order. Several teams are now involved whether they planned to be or not. Content owns most of the documents. Engineering owns the site they sit on. Design owns whatever replaces them. Compliance owns the risk the scan just surfaced. And someone has to decide what actually happens — that someone is the researcher, and what they decide first matters more than how fast anyone fixes anything.

The trap

The reflex is to treat the scan report as a task list: the files are broken, so repair them one by one in the order listed. This feels like progress. It’s the wrong first move.

The scanner is a thermometer. It tells you there’s a fever, not what the illness is, and certainly not the cure. Treating the report as a to-do list assumes every flagged document should stay a document and become an accessible one. For most of the pile that’s false — and acting on it spends the most effort on the files that deserve it least.

A second wrong assumption hides underneath: that length should decide a document’s fate — short ones become pages, long ones get a fancier viewer. Length is easy to read off a file list, which is why it tempts. It’s also the wrong key. A fifty-page report read once, top to bottom, is a different object from a two-page sheet someone checks with one question in mind. Length doesn’t tell them apart. Function does.

For a junior researcher, the lesson worth keeping: a scan tells you what failed, not what to do about it. The judgement lives in the gap between those two.

Sort by function, then route

Replace “how do I fix this file?” with “what is this document for, and where does that job belong?” Sort the pile by function rather than file type or page count, and the right destination for each group becomes close to obvious. Six functions cover most piles:

Tells someone how to use, clean, assemble, or maintain a product → a structured web page, with a print version generated from it. People search for this and read it one task at a time, often on a phone. It needs to reflow, be findable, and be translatable. A page does all three; a download does none.
Answers a question the site already asks → published inline, on the page where the question appears. The document is an answer pretending to be a file.
Collects structured input (a form, a checklist) → a web form that produces a printable summary. The interaction is the point.
States fixed, dated, citable terms (legal, regulatory, governance) → stays a PDF, repaired in place. Here the document is the right object: signed, dated, formally referenced. Converting it loses the fixity that gives it authority.
Read linearly for an overall impression (a report) → stays a tagged PDF, with a presentation layer over it only if wanted. The length is real, but the accessibility obligation is met by an accessible version of the file, not by a decorative viewer.
Markets, with content that already lives elsewhere → removed, once you’ve confirmed the substance is genuinely on a page.
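
The six routes above can be read as a lookup from function to destination. What follows is a minimal sketch in Python, not part of the triage sheet: the names `Function`, `ROUTES`, and `route`, and the `editable` flag, are assumptions made for illustration. The flag stands in for cases such as a third-party manual you can’t edit, which falls back to a manual decision instead of a rebuild.

```python
from enum import Enum


class Function(Enum):
    """What a document is for (hypothetical labels for the six functions)."""
    INSTRUCTS = "instructs"      # how to use, clean, assemble, or maintain
    ANSWERS = "answers"          # answers a question the site already asks
    COLLECTS = "collects"        # form or checklist
    FIXED_TERMS = "fixed_terms"  # legal, regulatory, governance
    LINEAR_READ = "linear_read"  # report read top to bottom
    MARKETS = "markets"          # marketing content that lives elsewhere


# Default destination per function, as described in the list above.
ROUTES = {
    Function.INSTRUCTS: "structured web page with generated print version",
    Function.ANSWERS: "inline answer on the page that asks the question",
    Function.COLLECTS: "web form with printable summary",
    Function.FIXED_TERMS: "PDF repaired in place",
    Function.LINEAR_READ: "tagged PDF",
    Function.MARKETS: "remove after parity check",
}

# Routes that require rewriting the content somewhere else.
REBUILDS = {Function.INSTRUCTS, Function.ANSWERS, Function.COLLECTS}


def route(function: Function, editable: bool = True) -> str:
    """Return the default destination for a document's function.

    Assumption: a document the team cannot edit is never routed to a
    rebuild; it is flagged for a manual decision instead.
    """
    if not editable and function in REBUILDS:
        return "needs a manual decision"
    return ROUTES[function]


print(route(Function.FIXED_TERMS))
print(route(Function.INSTRUCTS, editable=False))
```

Page count and file type appear nowhere in the lookup, which is the point: function is the only key.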

Read that against the red report and watch it shrink. The legal and regulatory documents get repaired in place, cheaply, because they were authored as text and tag well. Another share gets deleted, not repaired, because the content already exists on a page. The genuine conversion work — the part that costs real effort — turns out to be a minority of the pile, not the majority the scanner implied.

That is the orchestration move: the researcher isn’t fixing documents, but deciding which function each serves and routing it to the team that owns the destination. The decision is upstream of the repair, and it’s what stops four teams spending a quarter on the wrong work.

Who does what

Researcher — runs the triage: sorts by function, sets each destination, sets the order of work, and owns the parity check before anything is deleted. Does almost none of the hands-on work; the contribution is the routing.
Content — rewrites instructional and FAQ documents as pages, confirms each page carries everything the old document did, and flags the marketing duplicates that can go.
Engineering — builds the page structures, forms, and print generation; repairs the PDFs that stay; removes deleted files and their links cleanly.
Design — handles structure and reading order of new pages and forms, so they’re accessible by construction, not by later repair.
Compliance / legal — confirms which documents are genuinely fixed-and-citable, and signs off the parity checks on anything retired, since retiring a document is a risk decision as much as a content one.

One thing this division makes visible: clearing the pile is not the cure. The files failed because the way they were authored never treated structure as content: headings styled to look like headings instead of marked as headings, meaning in images that was never written down, reading order left to chance. Fix everything and change nothing about authoring, and next year’s scan turns red again. What prevents a repeat is getting content, design, and engineering to treat structure as content from the start.

The rule that prevents the expensive failure

One failure mode hides inside the work that looks most like progress. When instructional content moves from a document to a web page, the rewrite is an act of judgement — and judgement drops things. A safety warning, a caveat about one model, a note that mattered to a small group: exactly the details a confident rewrite leaves behind, because they look marginal until the person they protect is the one reading the page.

So the page looks modern, passes the scanner, and is quietly less complete and less safe than the document it replaced. The score went up while the information available to a user went down — the worst outcome in the exercise, because every surface signal says it went well.

The rule is one sentence: nothing is removed until its content is confirmed to exist elsewhere. The old document retires on the day someone confirms, line by line, that the new page carries everything it did — not the day the page launches.
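
A line-by-line check can be given a rough first pass by a script before a person reads both versions. The sketch below is an aid under stated assumptions, not a substitute for that review: `missing_lines` is a hypothetical helper that takes the old document and the new page as plain text and matches on exact wording only, so it will over-report wherever the page paraphrases.

```python
import re


def _normalise(text: str) -> str:
    """Lower-case and collapse whitespace so layout differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def missing_lines(old_document: str, new_page: str) -> list[str]:
    """List lines of the old document whose wording is absent from the new page.

    Exact-wording match only: a paraphrased line is reported as missing,
    which is the safe direction for a parity check.
    """
    page = _normalise(new_page)
    missing = []
    for line in old_document.splitlines():
        wanted = _normalise(line)
        if wanted and wanted not in page:
            missing.append(line.strip())
    return missing


old = "Wipe with a damp cloth.\nDo not use bleach on model B.\n"
new = "Wipe with a   damp cloth."
print(missing_lines(old, new))  # ['Do not use bleach on model B.']
```

Every line the helper returns is a question for the reviewer: was this dropped on purpose, reworded, or lost?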

For a stakeholder, that’s the question to ask in every status meeting: has parity been confirmed, or just assumed?

The instrument

This article comes with a document triage sheet — a spreadsheet that takes a scan export and walks the team through the sort: function, destination, owning team, order of work, and a parity-check column that must be ticked before a deletion row can close. Headers are written for the whole team, so a content owner or developer can work in it directly.
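
The closing rule on deletion rows can be sketched in a few lines. This is an illustration of the rule, not the spreadsheet’s actual logic: `TriageRow`, `close_row`, the `REMOVE` label, and the `file` field are assumed names, while the remaining fields mirror the columns described above.

```python
from dataclasses import dataclass

REMOVE = "remove"  # hypothetical label for the deletion destination


@dataclass
class TriageRow:
    file: str            # assumed: file name taken from the scan export
    function: str
    destination: str
    owning_team: str
    order: int           # order of work
    parity_checked: bool = False
    closed: bool = False


def close_row(row: TriageRow) -> TriageRow:
    """Close a row, refusing deletion rows whose parity check is not ticked."""
    if row.destination == REMOVE and not row.parity_checked:
        raise ValueError(f"{row.file}: parity not confirmed, cannot close")
    row.closed = True
    return row


row = TriageRow("care-guide.pdf", "markets", REMOVE, "content", 1)
try:
    close_row(row)
except ValueError as error:
    print(error)  # care-guide.pdf: parity not confirmed, cannot close

row.parity_checked = True
print(close_row(row).closed)  # True
```

Rows with any other destination close without the tick; the gate applies only where something is about to be deleted.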

