Quick Answer: Formative Digital scraped four AI engines across nine Ontario cities and five verticals, capturing 1,732 real citations from 176 queries into a single database, matrix.db. This is the full method: the prompts, engine coverage, how we pulled a citation from each raw response, and how we collapsed those 1,732 hits to 583 distinct cited hosts (326 once rolled up to registrable domains).
Matt Griffin, Formative Digital: "We published the method before the conclusions on purpose. A citation study you cannot reproduce is just an opinion with a chart attached. Truth, not tricks: here is exactly what we asked, which engines we asked, how we counted, and where the count breaks down."
That principle shapes everything below. Most published work on AI citations is either a general explainer of how a retrieval pipeline works or a tracking how-to with no underlying corpus you can inspect. This piece is the opposite: a primary-source disclosure of one real scrape, covering the questions, the four named engines, the nine named cities, the five categories, and the extraction and de-duplication steps that turned a pile of raw responses into a tidy table. It shows how the data was made, so you can judge any claim drawn from it or run the same procedure on your own category.
What this methodology covers
- Why we published the method, not just the numbers
- What we queried across nine cities and five verticals
- Which four engines we scraped, and how each returns sources
- How a citation gets extracted from a raw response
- From 1,732 raw rows to 583 distinct cited hosts
- Separating a retrieved URL from a cited source
- The limits of this method, and what it could miss
- Frequently asked questions
Why We Published The Method Before The Numbers
Across the four engines and forty-four city-vertical cells, the 1,732 citations resolved to 583 distinct cited hosts, and only 95 of those (16.3%) were cited by two or more engines. That leaves 83.7% of cited hosts unique to a single engine, a finding that is only believable if you can see how it was measured. The number carries weight precisely because the procedure underneath it is open. If we told you ChatGPT, Anthropic Claude, Google Gemini and Perplexity are reading largely different webs, your fair first questions would be how we know, and whether we miscounted. The honest response is to hand over the recipe.
There is academic precedent for measuring citation behaviour with a fixed, controlled query set. Aggarwal et al., in the foundational GEO paper presented at ACM KDD 2024 (arXiv:2311.09735), built GEO-bench, a large benchmark of diverse user queries across many domains, and showed that generative-engine optimization methods can raise a source's visibility in AI answers by up to 40%. The lesson we took from that work is structural rather than tactical. A claim about how engines cite is only as trustworthy as the query set behind it. So we fixed our query set, named it, and counted the results the same way for every engine.
The timing is Canadian and specific. Per Statistics Canada's Canadian Survey on Business Conditions, 12.2% of Canadian businesses reported using artificial intelligence to produce goods or deliver services in the twelve months before the second quarter of 2025, double the 6.1% a year earlier. When adoption doubles in a year, which Ontario businesses an AI engine names stops being abstract, so we wanted a local answer on local data that anyone could check.
The Query Grid: Nine Cities, Five Verticals, One Question Shape
The study asked one synthetic question per cell: who are the best businesses in a given vertical in a given Ontario city. The grid is nine cities by five verticals, which is forty-five cells on paper; one cell returned no usable response, so forty-four cells carry data. That 45-versus-44 gap is the kind of detail a transparent study states plainly rather than smoothing over, because it changes the denominator for any per-cell average.
The nine cities were chosen to mix market sizes rather than to chase population rankings. Toronto, Mississauga and Hamilton sit at the large end. Brantford and several mid-sized Ontario centres fill the middle and lower end. The intent was to see whether engine behaviour shifts between a dense market with many candidate businesses and a thinner one where fewer local pages exist to cite. The five verticals span home services, health, and legal, so the corpus crosses ordinary commercial categories as well as YMYL (Your Money or Your Life) categories in health and law, where engines tend to behave more cautiously.
Every prompt used the same neutral phrasing pattern, varying only the city and vertical tokens. Holding the wording constant matters: if one cell asked for the best dentist and another asked for a recommended dentist, any difference in the answers could be the wording rather than the engine. One question shape, forty-four cells, four engines per cell. That produced 176 successful engine responses, the rows that became 1,732 citations once each response was unpacked. Every figure from our scrape quoted on this page traces back to that single grid, so the procedure and the counts share one inspectable foundation.
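The grid logic can be sketched in a few lines of Python. This is a minimal illustration, not the study's pipeline: the prompt template is a hypothetical stand-in (the exact wording is not reproduced on this page), the city and vertical lists in the usage example are illustrative, and the function names are our own.

```python
from itertools import product

# Hypothetical template: only the {city} and {vertical} tokens vary per cell.
# The study's exact prompt wording is not reproduced here.
TEMPLATE = "Who are the best {vertical} in {city}, Ontario?"

ENGINES = ("ChatGPT", "Claude", "Gemini", "Perplexity")


def build_grid(cities, verticals, template=TEMPLATE):
    """Return one (city, vertical, prompt) tuple per city-vertical cell."""
    return [
        (city, vertical, template.format(city=city, vertical=vertical))
        for city, vertical in product(cities, verticals)
    ]


def build_jobs(cells, engines=ENGINES):
    """Fan every cell out to every engine: one job per (engine, cell)."""
    return [
        (engine, city, vertical, prompt)
        for city, vertical, prompt in cells
        for engine in engines
    ]


if __name__ == "__main__":
    # Illustrative inputs only; "plumbers" stands in for a home-services vertical.
    cells = build_grid(["Toronto", "Brantford"], ["dentists", "plumbers"])
    jobs = build_jobs(cells)
    print(len(cells), len(jobs))  # 4 16
    print(cells[0][2])            # Who are the best dentists in Toronto, Ontario?
```

At the study's scale the same arithmetic gives 9 × 5 = 45 cells; dropping the one empty cell leaves 44, and 44 cells × 4 engines gives the 176 responses.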
Which Four Engines We Scraped, And How Each Hands Back Sources
The four engines were ChatGPT, Anthropic Claude, Google Gemini and Perplexity, queried programmatically through the DataForSEO LLM Responses API rather than by hand. Per the DataForSEO documentation, that API returns, for each query, the user question, the model's answer, the list of sources the AI quoted, and the list of websites the model retrieved while looking things up. Four engines, one consistent response shape, which is what makes a fair cross-engine count possible at all.
The catch is that the four engines expose their sources very differently, and the method has to respect that. ChatGPT, when its search step runs, tends to return citations pointing at google.com, the Maps and Knowledge Graph surface, rather than at the businesses' own sites. Claude leans on curated directory pages, with threebestrated.ca dominating its output. Perplexity spreads its citations across review and directory hosts such as homestars.com, opencare.com and bbb.org. Gemini is the outlier: it wraps almost everything through vertexaisearch.cloud.google.com, its Vertex grounding redirect, so the raw host on a Gemini citation is frequently the wrapper rather than the publisher underneath.
Those four behaviours are not noise to be cleaned away; they are part of the result, and the method records them as found. We did not normalise Gemini's wrapper into guessed publisher domains, because that would invent data the API never returned. We counted what each engine actually handed back, then annotated the quirks, so anyone scraping the same four engines should expect the same four fingerprints in their own raw counts.
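One way to annotate rather than rewrite is to attach a label next to the stored host. The sketch below assumes each row is a dict with a `host` key. The `KNOWN_QUIRKS` mapping and its label strings are hypothetical, built only from the Gemini wrapper and ChatGPT Maps behaviours described above.

```python
# Hypothetical quirk labels drawn from the engine behaviours described above.
# Rows are annotated, never rewritten: the host stays exactly as returned.
KNOWN_QUIRKS = {
    "vertexaisearch.cloud.google.com": "gemini_grounding_wrapper",
    "google.com": "maps_or_knowledge_graph",
}


def annotate(row: dict) -> dict:
    """Return a copy of the row with a 'quirk' label (None if no quirk applies)."""
    # Normalise only for the lookup; the stored host value is left untouched.
    key = row["host"].lower().removeprefix("www.")
    return {**row, "quirk": KNOWN_QUIRKS.get(key)}


if __name__ == "__main__":
    for row in (
        {"engine": "Gemini", "host": "vertexaisearch.cloud.google.com"},
        {"engine": "ChatGPT", "host": "www.google.com"},
        {"engine": "Perplexity", "host": "homestars.com"},
    ):
        print(annotate(row))
```

Keeping the label in a separate field means an auditor can filter wrapper rows in or out later without losing what the engine actually returned.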
From Raw Response To One Row In matrix.db
Extraction means pulling a clean list of cited sources out of one engine's answer and writing each as a row in matrix.db. For every one of the 176 responses, the pipeline read the engine name, the city, the vertical, the answer text, and the structured source list the API attached. Each source in that list became one citation row, tagged with which engine produced it and which city-vertical cell it belonged to. No source was added from the prose by hand; the rows came from the structured field the API returned, so the count is mechanical and repeatable.
What one citation row records
Each row in matrix.db captures four facts: the engine (ChatGPT, Claude, Gemini, or Perplexity), the city-vertical cell it came from, the cited host as the engine returned it, and the page title or business name attached to that citation. A Toronto dentists response from Perplexity, for example, produced separate rows for opencare.com, bitedental.ca, hellodent.com, yorkvillesmiles.com and 123dentist.com. The same question to ChatGPT produced five rows that all pointed at google.com, because ChatGPT surfaced Maps listings rather than the practices' own domains. Storing the host exactly as returned is deliberate: it preserves the wrapper hosts and the Maps redirects so they can be audited later rather than silently rewritten.
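The extraction step maps naturally onto a single SQLite table. Treat the sketch below as an assumption throughout: the column names, the split of each cell into `city` and `vertical` columns, and the shape of `sources` (a list of dicts with a `host` and an optional `title`) are illustrative. They are not matrix.db's actual schema or the API's field names.

```python
import sqlite3

# Hypothetical schema: one row per cited source, carrying the four facts above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS citations (
    id       INTEGER PRIMARY KEY,
    engine   TEXT NOT NULL,
    city     TEXT NOT NULL,
    vertical TEXT NOT NULL,
    host     TEXT NOT NULL,  -- stored exactly as the engine returned it
    title    TEXT            -- page title or business name, if any
)
"""


def store_response(conn, engine, city, vertical, sources):
    """Insert one row per entry in a response's structured source list.

    Nothing is parsed out of the answer prose; only the structured list is used.
    Returns the number of rows written.
    """
    rows = [(engine, city, vertical, s["host"], s.get("title")) for s in sources]
    conn.executemany(
        "INSERT INTO citations (engine, city, vertical, host, title) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # swap in "matrix.db" to write a file
    conn.execute(SCHEMA)
    hosts = ["opencare.com", "bitedental.ca", "hellodent.com",
             "yorkvillesmiles.com", "123dentist.com"]
    store_response(conn, "Perplexity", "Toronto", "dentists",
                   [{"host": h} for h in hosts])
    print(conn.execute("SELECT COUNT(*) FROM citations").fetchone()[0])  # 5
```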
Where a fact sits on the page shapes whether an engine lifts it, which is worth flagging even though our scrape sits one layer above the passage. Kevin Indig's early-2026 Growth Memo analysis of verified ChatGPT citations found about 44% of citations come from the first 30% of a page's text. Our pipeline records the cited host and title, not the exact sentence the engine quoted, so passage-level attribution is a separate problem we did not try to solve here.
From 1,732 Raw Rows To 583 Distinct Cited Hosts
De-duplication runs in two passes. The first groups the 1,732 rows by cited host, which leaves 583 distinct hosts; the same host cited eleven times in one city and nine in another counts once. homestars.com, for example, appeared 41 times across nine cities and two engines, yet it is a single host. The second pass rolls each host up to its registrable domain, folding sibling hosts on one property together, which tightens 583 to 326. The raw 1,732 answers how often an engine cited anything; the 583 and 326 answer how many genuinely different sources are in play, at two grains.
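The two passes can be sketched as below. A faithful registrable-domain roll-up needs the Public Suffix List. This sketch substitutes a tiny hand-picked set of multi-label suffixes, so treat `registrable_domain` as a simplification rather than a correct general implementation.

```python
# Simplification: a real roll-up should consult the full Public Suffix List.
# This sketch only recognises a few multi-label suffixes.
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au", "on.ca"}


def registrable_domain(host: str) -> str:
    """Approximate the registrable domain (e.g. maps.google.com -> google.com)."""
    labels = host.lower().split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def dedupe(hosts):
    """Pass 1: distinct hosts. Pass 2: roll those hosts up to registrable domains."""
    distinct_hosts = {h.lower() for h in hosts}
    distinct_domains = {registrable_domain(h) for h in distinct_hosts}
    return distinct_hosts, distinct_domains


if __name__ == "__main__":
    raw = ["homestars.com"] * 3 + [
        "maps.google.com", "google.com", "vertexaisearch.cloud.google.com",
        "opencare.com",
    ]
    hosts, domains = dedupe(raw)
    print(len(raw), len(hosts), len(domains))  # 7 5 3
```

In this sketch the Gemini wrapper host also folds into google.com at the domain grain, which is one reason to keep the host-level view alongside the domain-level one.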
The grain matters, so the study states it. Counting at the host level keeps things like maps.google.com and google.com separable; rolling up to registrable domain merges them, which is usually what you want when the question is source concentration rather than page structure. We report the single-engine-unique share, 83.7%, against the 583 host-level set, because that is the grain at which a host either was or was not shared between engines.
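At the host grain, computing the single-engine-unique share comes down to collecting, for each host, the set of engines that cited it. This is a minimal sketch, assuming rows are dicts with `engine` and `host` keys. The sample rows are illustrative.

```python
from collections import defaultdict


def engine_overlap(rows):
    """Return (shared, total, shared_share) at the host grain.

    A host is 'shared' if two or more distinct engines cited it; repeat
    citations from the same engine do not make a host shared.
    """
    engines_by_host = defaultdict(set)
    for row in rows:
        engines_by_host[row["host"].lower()].add(row["engine"])
    total = len(engines_by_host)
    shared = sum(1 for engines in engines_by_host.values() if len(engines) >= 2)
    return shared, total, (shared / total if total else 0.0)


if __name__ == "__main__":
    rows = [
        {"engine": "Perplexity", "host": "homestars.com"},
        {"engine": "Claude", "host": "homestars.com"},
        {"engine": "Perplexity", "host": "opencare.com"},
        {"engine": "Perplexity", "host": "opencare.com"},
        {"engine": "ChatGPT", "host": "google.com"},
    ]
    print(engine_overlap(rows))  # (1, 3, 0.3333333333333333)
```

With the study's counts, 95 shared hosts out of 583 gives 95 / 583 ≈ 0.163, so 1 - 0.163 ≈ 0.837 of hosts were cited by one engine only.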
The corpus, from raw responses to distinct sources
- 9 cities x 5 verticals = 45 cells, one cell empty, leaving 44 populated cells.
- 4 engines per cell produced 176 successful queries after dropping non-responses.
- Unpacking each response's structured source list yielded 1,732 citation rows.
- Grouping by cited host left 583 distinct hosts; rolling those up to registrable domains tightened it to 326.
- Of the 583 hosts, only 95 (16.3%) were cited by two or more engines, leaving 83.7% unique to one engine.
- The most-cited host, vertexaisearch.cloud.google.com, accounts for 384 of the 1,732 raw rows on its own, all from Gemini, which is why raw frequency and distinct counts tell different stories.
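A simple tally shows the gap between raw frequency and distinct counts. This is a minimal sketch using `collections.Counter`, assuming a flat list of cited hosts as input. The sample list is illustrative.

```python
from collections import Counter


def frequency_profile(hosts):
    """Contrast raw citation volume with the number of distinct hosts."""
    counts = Counter(h.lower() for h in hosts)
    if not counts:
        raise ValueError("expected at least one cited host")
    top_host, top_count = counts.most_common(1)[0]
    return {
        "raw_rows": sum(counts.values()),
        "distinct_hosts": len(counts),
        "top_host": top_host,
        "top_count": top_count,
    }


if __name__ == "__main__":
    sample = ["vertexaisearch.cloud.google.com"] * 3 + ["homestars.com", "bbb.org"]
    print(frequency_profile(sample))
```

Applied to the study's totals, the top host's 384 rows out of 1,732 gives 384 / 1,732 ≈ 0.222, so roughly a fifth of all raw citation rows trace back to a single Gemini wrapper host.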
For scale, Digital Applied's April 2026 study of 1,000 Google AI Overviews recorded 4,243 unique cited URLs and found the top 1% of cited domains, around twelve sites, capture 47% of all citations. Our corpus is smaller and narrower by design, but the de-duplication and sample-design choices are the same class of decision every honest citation study has to disclose.
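A "top 1% of domains" concentration figure can be computed from a domain tally. This sketch is one reasonable reading of that metric, not Digital Applied's published method. It rounds the top slice up to at least one domain and breaks ties in whatever order `Counter.most_common` returns.

```python
import math
from collections import Counter


def top_fraction_share(domains, fraction=0.01):
    """Share of all citations captured by the top `fraction` of cited domains.

    Returns (k, share), where k is the number of domains in the top slice.
    """
    counts = Counter(domains)
    if not counts:
        return 0, 0.0
    k = max(1, math.ceil(len(counts) * fraction))
    top_total = sum(count for _, count in counts.most_common(k))
    return k, top_total / sum(counts.values())


if __name__ == "__main__":
    # Illustrative tally: one heavy domain plus 99 domains cited once each.
    tally = ["big.example"] * 101 + [f"site{i}.example" for i in range(99)]
    print(top_fraction_share(tally))  # (1, 0.505)
```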
A Retrieved URL Is Not A Cited Source
A retrieved URL is a page the model fetched while researching; a cited source is a link the model actually surfaced to the user as support, and the two are not the same number. The DataForSEO LLM Responses API returns both lists for every query, which is exactly why the distinction has to be made explicit. A model might retrieve thirty pages to compose an answer and cite only five of them. Counting all thirty would inflate the corpus and overstate how many sources the engine endorsed. Our 1,732 figure counts cited sources, the ones the engine put in front of the user, not every page it touched.
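Counting the two lists separately keeps the cited-versus-retrieved split honest. In the sketch below, `cited_urls` and `retrieved_urls` are hypothetical key names standing in for the API's two lists. Check the DataForSEO documentation for the real field names before adapting it.

```python
from urllib.parse import urlsplit


def hosts_from(urls):
    """Extract the lower-cased hostname from each URL."""
    return [urlsplit(u).hostname for u in urls]


def compare_lists(response: dict) -> dict:
    """Count cited and retrieved sources separately for one response.

    Only the cited list should feed the citation count; the retrieved list is
    kept for comparison so the two are never conflated.
    """
    cited = hosts_from(response.get("cited_urls", []))
    retrieved = hosts_from(response.get("retrieved_urls", []))
    return {
        "cited": len(cited),
        "retrieved": len(retrieved),
        "retrieved_hosts_not_cited": len(set(retrieved) - set(cited)),
    }


if __name__ == "__main__":
    response = {
        "cited_urls": ["https://opencare.com/a", "https://homestars.com/b"],
        "retrieved_urls": ["https://opencare.com/a", "https://homestars.com/b",
                           "https://bbb.org/c", "https://example.com/d"],
    }
    print(compare_lists(response))
    # {'cited': 2, 'retrieved': 4, 'retrieved_hosts_not_cited': 2}
```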
This single decision is one of the larger forks in any citation study, and it explains why two studies of the same engines can report very different totals. Count retrieved URLs and the numbers balloon. Count cited sources and you measure what the user actually sees and could click. We chose cited sources because the question that matters to an Ontario business owner is not what the model skimmed; it is which sources led the engine to name a competitor. Decide this before you count, and state which list you used, because a reader cannot interpret your total without it.
If you want the deeper mechanics of how engines fetch, weigh and surface sources before any of this counting begins, our explainer on retrieval-augmented generation for businesses walks the pipeline that produces the citations we are measuring here.
Where This Method Stops: Four Honest Limits
The honest limits of this method are four. First, the prompts are synthetic. We asked neutral best-in-city questions, which is not how a real person phrases a search; a real query might be longer, messier, or loaded with constraints like price or hours, and those would pull different sources. Our corpus measures how the engines answer a clean canonical question, not the full range of human phrasing.
Second, this is a single snapshot. Each cell was queried once, on one day. AI engines are non-deterministic, so the same prompt on a different day can return a different ordering and sometimes different sources entirely. A single run cannot separate a stable preference from a one-time draw, which is why the responsible reading of any one cell is directional rather than precise. How far an engine's answer drifts between runs is its own measurement problem, which we take up in why AI gives a different answer every time.
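The text does not commit to a drift metric. One simple option, offered here as an assumption rather than the study's measure, is the Jaccard similarity between the cited-host sets from two runs of the same prompt. A score of 1.0 means identical sources, and 0.0 means no overlap. The run labels and hosts in the usage example are illustrative.

```python
def jaccard(run_a, run_b):
    """Jaccard similarity of two runs' cited-host sets (1.0 = identical)."""
    a, b = set(run_a), set(run_b)
    if not a and not b:
        return 1.0  # two empty runs are treated as identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    monday = ["opencare.com", "homestars.com", "bbb.org"]
    thursday = ["homestars.com", "bbb.org", "threebestrated.ca"]
    print(jaccard(monday, thursday))  # 0.5
```

Because it compares sets, this measure picks up changes in which sources appear but not reshuffles of the same sources; a rank-aware comparison would be needed for ordering drift.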
Third, raw counts carry engine plumbing. The vertexaisearch.cloud.google.com wrapper inflates Gemini's apparent footprint, and ChatGPT's lean on google.com folds many distinct Maps listings behind one host. Both are real behaviours, but they mean the raw 1,732 is a count of cited rows, not a clean census of independent publishers, which is why the 583-host and 326-domain views sit beside it.
Fourth, the scope is deliberately narrow: five verticals, nine Ontario cities, four engines, one snapshot. The method does not describe every category or market and should not be read that way. What it does claim is reproducibility: the grid, the prompts, the engine list, the extraction step, and the two de-duplication passes are all stated, so another team can run the identical procedure on its own cities and categories and get a comparable table. That replicability, not the size of the corpus, is the point. It is one application of the Measure vector in the 12 Vectors: counting AI visibility the same way every time, so the numbers still mean something next quarter.
If you would rather not stand up the scrape yourself, this is the kind of baseline measurement that begins every Formative Digital engagement, run live against each engine for your specific category and city before any optimization work starts.
Frequently Asked Questions
How many queries and citations did the study cover?
The corpus is 176 successful queries returning 1,732 citations, stored in matrix.db. Each query asked one of four AI engines who the best businesses are in one of five verticals in one of nine Ontario cities. Grouping those 1,732 raw rows by cited host leaves 583 distinct hosts, and rolling each host up to its registrable domain tightens that to 326.
Why nine cities and five verticals instead of a round ten by ten?
Nine cities times five verticals is forty-five city-vertical cells, and one cell returned no usable response, leaving forty-four populated cells. The grid was chosen to mix large markets like Toronto and Mississauga with smaller ones like Brantford, and to span home-service, health, and legal categories rather than to hit a tidy round number.
What is the difference between a cited source and a retrieved URL?
A cited source is a link the engine surfaced to the user as support for a named business. A retrieved URL is a page the model fetched while researching, which may never appear in the answer. The DataForSEO LLM Responses API returns both lists, and conflating them inflates counts, which is why our 1,732 figure counts cited sources rather than every fetched page.
Why does vertexaisearch.cloud.google.com appear so often?
Gemini routes its grounded citations through vertexaisearch.cloud.google.com, the Vertex AI grounding wrapper, so the raw host on a Gemini citation is often that redirect rather than the publisher. It appeared 384 times across all nine cities. We kept it visible in the raw counts and flagged it as a wrapper rather than treating it as a genuine independent source.
Can an Ontario business reproduce this method for its own category?
Yes. Pick your category and the cities you serve, write one neutral best-in-city prompt per cell, run it through each engine you care about, then record the cited sources per response and de-duplicate by registrable domain. The honest caveat is that a single snapshot is one run on one day, so repeating the prompts on different days will show how much each engine's answer varies.
Sources
- Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2024). GEO: Generative Engine Optimization. ACM KDD 2024. arXiv:2311.09735.
- DataForSEO. (2026). LLM Responses API Overview: cited sources versus retrieved URLs across ChatGPT, Claude, Gemini and Perplexity. DataForSEO API v3.
- Indig, K. (2026, February 16). The science of how AI pays attention. Growth Memo.
- Digital Applied. (2026). We Analyzed 1,000 AI Overviews: Citation Pattern Study. Digital Applied.
- Statistics Canada. (2025). Analysis on artificial intelligence use by businesses in Canada, second quarter of 2025. Canadian Survey on Business Conditions.
Get Your Free AI Visibility Audit
Formative Digital, Brantford, Ontario
We run the same procedure described here against your category and city, capturing which sources ChatGPT, Claude, Gemini and Perplexity actually cite for your market, and hand you the table whether or not you engage further.