Patterns in the Clouds

I’m a big “sky guy.”

You know, when you look up at the stars and try to identify constellations. Or possibly, on a warm day, sitting out in McCarren park looking at clouds moving and shifting, looking for dogs, cats, or other shapes within the fluffy masses.

The fun in it is that there is no right answer. It’s just your imagination projecting patterns onto randomness. Tells you more about your brain than it does the cloud.

On a similar note, I once had an English literature professor who would constantly repeat the phrase:

“You’ll always find what you’re looking for.”

The class was introductory, meant to teach students how to analyze any text.

As such, we learned various critical frameworks – psychoanalytical, marxist, feminist, deconstructionist, etc. – with which you could view a text.

The macro lesson was that no matter what frame you used, you could find evidence in the text that matched that frame. They could all be “right.”

So clouds, stars, Virginia Woolf…where does AI search fit into this?

Our Current Firehose

The volume of data now available about AI search is staggering.

Prompt databases, citation tracking tools, platform-specific analytics, aggregate studies, vendor reports. Every week another dataset drops showing which sources AI platforms cite, how often, and in what contexts.

The problem is that all this data seems to be making things more confusing, not less.

Reddit accounts for 46.7% of Perplexity’s top citation sources, so community management is the play.
LinkedIn jumped from #11 to #5 on ChatGPT in three months, so LinkedIn content is the play. It’s the top cited domain for professional queries, even.
Ninety-five percent of AI citations originate from off page sources, so digital PR is the play.
Press releases work for AI search. Wait, no they don’t. Or do they? Dammit.
AirOps shows pages with clean heading hierarchies get 2.8× citation likelihood, so on-page ~~SEO~~ AEO is the play.
Original research compounds over time (and you can syndicate it for those sweet off page brand mentions), so content marketing is the play.
AI “loves freshness,” so update your pages (automatically, using agents) often.

Every one of these claims is backed by real data from credible sources.

Every one of them, taken in isolation, could justify a major strategic pivot. And every vendor selling a service in one of these categories can point to a dataset that proves their service is essential, or their methodology is the correct one (though there’s mounting evidence that Mount AI tactics like high scale, low quality content publication is indeed bad).

Whatever you do or sell, there’s a view of the data that says what you do is what really matters for AI search visibility.

Cherry-Picking at Industrial Scale

A discerning reader might ask of these datasets: what prompts are being tracked? What citations are being analyzed? Are they all the same dataset? Does this dataset apply in a meaningful way to my company and industry?

Before we get into those strategy questions, however, it’s worth diving into correlations, cherry picking, and industrial sized data sets.

Tyler Vigen’s Spurious Correlations project is a well-known, funny, and corrective look at large data sets.

He found, for example, that the number of Nicolas Cage movies released per year correlates with swimming pool drowning deaths at r=0.67. Mozzarella consumption tracks with civil engineering doctorates.

These are absurd and obvious. But we can easily fall into the same pattern, unknowingly, when we analyze patterns against something like AI search performance.

Nassim Taleb’s argument is that big data amplifies this problem. More variables means more chances for coincidence. The spurious relationships grow faster than real information. You end up swimming in data that’s technically accurate and strategically meaningless.

In AI search, the variable space is enormous.

Millions of prompts (or, theoretically, an infinite number of possible prompt permutations, ever expanding like the universe). Millions of citation sources. Dozens of platforms. Hundreds of industries. Varying time windows. No standard methodology for sampling prompts, defining what counts as a “citation” versus a “mention,” or weighting one platform against another.

Even the terminology is fragmented — GEO, AEO, GSO, LLMO, AIO.

The surface area for false patterns is vast.

Cherry-picked views. Studies that highlight one platform’s citation mix without controlling for query type, industry, or time window. For what it’s worth, I think most of the research being put out is being done in good faith, but the findings are often contorted to back into confirmation bias by readers and sharers of the research.
Heterogeneous comparisons. Treating all prompts as equivalent when “best CRM for small teams” has a fundamentally different citation structure than “how to fix a leaky faucet.” Aggregate data smears these distinctions into a single number that describes no one’s actual situation.
Methodological fog. No consensus on how to sample, measure, or report. Two studies can examine the same platforms over the same period and reach different conclusions because they defined their terms differently. Both are “right.” Neither is useful without understanding the methodology underneath.
Simple confounding variables. Is Nike’s share of voice high because they optimized for cosine similarity, or because they are Nike? Is Zapier dominating AI search because they deployed “advanced AEO techniques” like listicles? Or have they been publishing listicles for a decade (and ignoring my emails to get my brand to appear on them)?

Look, we produce research on AI search as well.

We’ll keep putting out research. It helps us answer our own client and business questions. It’s hard to get it right. Good faith efforts to identify signals in this space are valuable, even when coming from self-serving sources (your author included). I genuinely appreciate that we’re collectively looking for answers, and I’m not cynical on the research itself.

What I’m saying is “caveat emptor” to the readers and decision makers.

The argument here is simply that it’s difficult to ordain universal truths about AI search from any given dataset, so it’s best to apply critical thinking, first principles thinking, and your own data in pursuit of organic growth.

The Texas Sharpshooter at the Vendor Conference

The Texas Sharpshooter fallacy: fire at the barn, then paint the target around the tightest cluster of bullet holes.

In AI search, this looks like a vendor running thousands of prompts across dozens of categories, finding the slice where their thing dominates, and presenting that slice as the universal truth.

This isn’t necessarily dishonest. It’s what happens when the dataset is large enough to contain almost any pattern you need to back into your point of view.

CMOs and marketing leaders are the ones caught in the crossfire.

They’re under real pressure to show results and expertise in AI search. Their boards are asking questions. Their competitors are making claims. And they’re receiving strongly compelling, mutually contradictory advice from multiple directions, each backed by data that looks rigorous.

The pressure to have an answer (any answer) is the vulnerability that noisy data exploits.

The Surface Area Problem

Take the specific case of Reddit. I wrote about this last year when there was a Reddit frenzy followed by a steep drop in citation influence. Reddit, Wikipedia, Youtube – these sites dominate citation sources in many data sets. But these are massive websites with topical footprints spanning millions of subjects. Their citation prominence reflects surface area, not necessarily individualized strategic opportunity.

Research on citation patterns across prompt databases is academically interesting. Over a time series, shifts in citation allocation could signal real platform changes worth watching. But as a direct input to organic growth strategy for a specific brand, this data is usually low signal.

Massively pivoting resources to Reddit because aggregate citation data shows Reddit is influential across a massive data is probably an overcorrection, at least in isolation.

Furthermore, Reddit brand mentions are often downstream of product marketing, product and customer experience, and word of mouth. People don’t talk about your product on Reddit because you deployed a community management program. They talk about it because you built something worth talking about (good or bad). Temu isn’t going to fix their reputation by astroturfing a few subreddits. Eli Schwartz and Gaetano DiNardi both wrote great posts on Reddit & AEO talking about this.

Authentic community participation is valuable in and of itself, especially for brands of a certain size. But the tail doesn’t wag the dog.

The same logic applies to LinkedIn’s citation prominence, PR-driven content statistics, review site data. Each channel has real value. None of them, in isolation, is “the answer” to AI search, despite what the data seems to show when viewed through the right lens.

Additionally, I’ve found through small client analyses that the top 500-1000 citation source URLs shift quite frequently month over month (though domains remain more stable).

Robust AI search performance requires brand ubiquity and portfolio thinking (as long as you humbly assume you can’t predict where the puck is going with 100% certainty).

Your Data, Not The Data

The corrective is unsexy but essential: run your own research.

Start with first-party customer research. Who are your buyers? How do they discover products in your category? What questions do they ask? What sources do they trust? This is the foundation. Without it, strategies and tactics tend to converge around median level best practices, washing away any competitive alpha.

Then get specific. Use tools like Peec, Profound, AirOps, Scrunch, or your own vibe coded AI visibility tool to check the actual sources and influential citations in your space. Not “what does ChatGPT cite in general” but “what does ChatGPT cite when someone asks about our category?” The answers are often much subtler than the Wikipedias and Reddits of the world. They’re often niche publications, specific creators, industry analysts, comparison pages you’ve never heard of.

From there, hatch a plan to increase your coverage, ubiquity, and favorability within a finite set of sources that matter for your buyers. This is a focused strategy, not a broad one. It doesn’t require pivoting your entire marketing organization toward Reddit or LinkedIn or PR. It requires understanding the specific citation ecosystem around your category and systematically improving your presence within it.

Creating content that answers user questions (especially nascent ones discovered through customer research), the questions people are starting to ask but that don’t have good answers yet, is still a high-leverage move. In B2B, doing this via video and YouTube is probably underrated and a high leverage move right now.

Validating your presence and aligning your messaging across the broader web is the real play. As my friend Gaetano DiNardi put it, “The biggest GEO upside doesn’t come from technical optimization — but rather, the coordination of brand positioning, messaging, and reputation management across on-site and off-site channels.”

Your own data, applied to your own situation, beats the aggregate every time. For example, who cares what the average landing page conversion rate is? I care that mine improves month over month.

On LinkedIn, Specifically

Okay, so this is a general problem, but just to touch on the current AI citation source du jour: LinkedIn.

Harpreet Singh Chatha posted this on X. It’s very well said:

“Looked at 300k AI responses and over 80% of LinkedIn citations for ChatGPT go to /company/ pages.

Any LinkedIn AEO / GEO correlation study you see, including this one, is flawed.

AI answers can deliver an infinite number of answers, and 300k AI responses is not a big enough dataset.

In this case, the dataset has many terms like “what does brand name do”, so the data is skewed towards company pages.

So, if you see a bigger study of 325k AI responses, then come to the conclusion that you should post more on LinkedIn to show up in AI, that study is flawed too.

To really know whether LinkedIn is useful or not for your AEO / GEO, you need to track the presence of LinkedIn’s domain for searches that are important to your business.

Then you need to break down the domain and see what LinkedIn folders are showing up.

I did this for a B2B company, and a whopping 0% of LinkedIn AI answers include /posts/.

Obviously being active on LinkedIn, having multiple active employees, and what not can fall under good marketing, good marketing in general can influence LLMs over time, but specifically saying “post more on LinkedIn for AEO” is a stretch.”

There you have it. Post on LinkedIn if it will help your business, but it’s probably not your AI search lifeline just because aggregate citation data shows increased prominence temporarily. It might help on the margin, but it’s just one arrow in your quiver.

The Right Lens

My English professor’s point was not that literary criticism was useless.

It was that the text will yield evidence for any sufficiently motivated reading.

The discipline is in choosing the right lens and being honest about what you’re looking for before you start looking.

AI search data works the same way. The firehose will confirm whatever you want to believe. The discipline is in sitting down with your own data, thinking critically, and constructing a strategy that wins specifically, not broadly.

Sometimes the most honest reading is the one that says “I don’t know yet.”

In a landscape this noisy, with this much pressure to have a confident answer, that might be the most strategic position of all.. Honesty and experimentation.

And then the hard work of finding out what’s actually true for your business, your buyers, your category – with your own data, through your own lens.

Want more insights like this? Subscribe to Field Notes

Patterns in the Clouds

Our Current Firehose

Cherry-Picking at Industrial Scale

The Texas Sharpshooter at the Vendor Conference

The Surface Area Problem

Your Data, Not The Data

On LinkedIn, Specifically

The Right Lens

Alex Birkett

About

Agency Services

Content Guides

Company

Patterns in the Clouds

Our Current Firehose

Cherry-Picking at Industrial Scale

The Texas Sharpshooter at the Vendor Conference

The Surface Area Problem

Your Data, Not The Data

On LinkedIn, Specifically

The Right Lens

Alex Birkett

You May Also Like

What 30+ B2B Accounts Saw in June

The 6 Best Content Creation Agencies for B2B Growth

The 6 Best Enterprise SEO Agencies in 2026