AI Research Division · Retrieval Engineering

Retrieval Surface Engineering: The Asset Layer AI Systems Actually Ingest

PDFs, transcripts, changelogs, and FAQs are weighted unequally by retrieval-augmented LLMs. This report quantifies the mention-rate lift from publishing each asset class, measured against a controlled prompt bank.

By Shayne Beavan

Founder, Deep AI Solutions · Inventor of record, 6 USPTO filings

May 19, 2026 · 10 min read

Not all assets are equal

We tested seven asset classes (Organization JSON-LD, FAQ JSON-LD, blog markdown with structured headings, HTML transcripts, changelogs, FAQ pages without markup, and unstructured PDFs) by publishing matched versions across a controlled set of Houston businesses and running the same 100-prompt scan before and after publication. Lift is the change in mention rate between the two scans, reported in percentage points (pp).
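To make the metric concrete, here is a minimal Python sketch of how a mention-rate lift in percentage points could be computed from two scans of the same prompt bank. The function names `mention_rate` and `lift_pp`, the case-insensitive substring match, and the brand "Acme Plumbing" are illustrative assumptions, not the scoring method used in this study.

```python
from typing import Sequence


def mention_rate(responses: Sequence[str], brand: str) -> float:
    """Fraction of responses that mention `brand`.

    Assumption: a "mention" is a case-insensitive substring match.
    A production scorer would likely need entity matching instead.
    """
    if not responses:
        raise ValueError("need at least one response")
    needle = brand.casefold()
    hits = sum(1 for text in responses if needle in text.casefold())
    return hits / len(responses)


def lift_pp(before: Sequence[str], after: Sequence[str], brand: str) -> float:
    """Mention-rate lift in percentage points between two scans."""
    return 100.0 * (mention_rate(after, brand) - mention_rate(before, brand))


# Toy scan with 4 prompts instead of 100 (hypothetical responses).
before_scan = [
    "Try Acme Plumbing on Main St.",
    "Several local plumbers are available.",
    "Check online reviews first.",
    "No specific recommendation.",
]
after_scan = [
    "Try Acme Plumbing on Main St.",
    "Acme Plumbing offers 24/7 service.",
    "Check online reviews first.",
    "No specific recommendation.",
]

print(lift_pp(before_scan, after_scan, "Acme Plumbing"))  # 25.0
```

In this toy case the mention rate moves from 1 of 4 responses to 2 of 4, which is a lift of 25 percentage points. Reporting the difference in points rather than as a ratio keeps results comparable across businesses with very different baseline mention rates.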

Lift by asset class

Asset class	Mean mention-rate lift (percentage points)
Organization JSON-LD	+18.4pp
FAQ JSON-LD	+14.1pp
Blog markdown w/ structured headings	+8.2pp
Transcript HTML	+6.0pp
Changelog	+4.7pp
FAQ page, no markup	+2.3pp
Unstructured PDF	+0.8pp

Read

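The lift is not about content quality alone. It is about retrievability: whether the retrieval system can chunk the asset, embed it, and cite it with confidence. The FAQ rows show this most directly: FAQ content published as JSON-LD gained 14.1pp, while an FAQ page without markup gained 2.3pp. JSON-LD wins because it removes ambiguity, not because it is more "valuable."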
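For reference, the two top-performing asset classes can be sketched with a few lines of Python that emit schema.org JSON-LD. The business name, URL, and FAQ text are placeholders, and the helper names `organization_jsonld` and `faq_jsonld` are hypothetical. The property names (`@context`, `@type`, `mainEntity`, `acceptedAnswer`) follow the public schema.org vocabulary. This is a minimal illustration, not the markup published in the test.

```python
import json
from typing import List, Tuple


def organization_jsonld(name: str, url: str, locality: str, region: str) -> str:
    """Serialize a minimal schema.org Organization as JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "address": {
            "@type": "PostalAddress",
            "addressLocality": locality,
            "addressRegion": region,
        },
    }
    return json.dumps(doc, indent=2)


def faq_jsonld(pairs: List[Tuple[str, str]]) -> str:
    """Serialize question/answer pairs as a schema.org FAQPage."""
    doc = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(doc, indent=2)


# Placeholder values for illustration only.
print(organization_jsonld("Example Co", "https://example.com", "Houston", "TX"))
print(faq_jsonld([("Do you offer weekend service?", "Yes, on Saturdays.")]))
```

Each string would typically be embedded in the page inside a `<script type="application/ld+json">` element. Every field is explicitly typed and labeled, so a retrieval system does not have to infer which text is the name, the location, or the answer.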