The AI Visibility Playbook for eCommerce

This is Edition #1 of The Product Data Signal — a newsletter covering AI visibility, product data strategy, and the evolving infrastructure of digital commerce.

AI is reshaping how products get discovered — but most of the advice out there is written for content sites, not catalogs. This edition cuts through the noise for the people who actually manage product data, run marketplaces, and sell things online.


The llms.txt Debate: Should Your Store Talk Directly to AI?

There’s a growing consensus in the SEO world that if AI can’t read your site, it won’t cite your site. Fair enough. But the advice on how to make your site AI-readable is getting contradictory — and if you’re running an eCommerce operation, the stakes are different from a blog or a SaaS documentation site.

The case for llms.txt and markdown-friendly architecture

A framework making the rounds — called CITED — puts crawlability first. The argument: deploy an llms.txt file (a plain-text summary of your site, written specifically for language models), configure your robots.txt for AI crawlers like GPTBot and ClaudeBot, add schema markup, and structure your content so AI can parse it cleanly. Vercel is the poster child here — they published a full llms-full.txt at their documentation root, and reportedly receive around 10% of their referral traffic from LLMs.
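
For orientation, here’s what a minimal llms.txt might look like for a hypothetical merchant. The format follows the llmstxt.org proposal: an H1 title, a blockquote summary, then link lists grouped under H2 sections.

```
# Acme Industrial Supply
> B2B distributor of industrial fasteners and hardware. Ships across the EU.

## Catalog
- [Product categories](https://www.example.com/categories): top-level catalog structure
- [Best sellers](https://www.example.com/best-sellers): most-ordered products by category

## Buying
- [Bulk ordering](https://www.example.com/bulk): volume pricing and quote requests
- [Shipping and returns](https://www.example.com/shipping): delivery times and return policy
```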

For content-heavy sites, this makes intuitive sense. You’re essentially pre-digesting your content for AI. But for a product catalog with 50,000 SKUs? An llms.txt file isn’t going to list your entire inventory. This is where the advice diverges from the eCommerce reality.

The pushback from Google and Bing

Google’s John Mueller has been blunt: creating separate markdown pages for LLMs is unnecessary. LLMs have parsed standard HTML since day one. Bing’s Fabrice Canel adds a practical concern — maintaining duplicate content creates crawling overhead and maintenance burden. Both hint at a more serious issue: serving different content to bots than to humans could be treated as cloaking, a longtime SEO violation.

This isn’t a theoretical risk. If you build a parallel markdown version of your site for AI consumption and it drifts out of sync with your actual product pages, you’re creating the exact kind of inconsistency that search engines have penalized for years.

So what should an eCommerce business actually do?

The answer is less dramatic than either camp suggests. The universal wins — the things nobody disagrees about — are:

  • Structured data (schema markup): Organization, Product, BreadcrumbList, and FAQ schemas are cited disproportionately in AI responses. Recent analysis shows Organization schema appears on 34% of AI-cited pages, Article on 26%, and BreadcrumbList on 20%. For eCommerce, Product schema is the obvious priority (a minimal example follows this list).
  • Clean site architecture: Fast loading, logical category structures, descriptive URLs. Data from multiple studies suggests URLs with 17–40 character slugs receive the highest AI citation rates.
  • robots.txt awareness: At minimum, know which AI crawlers are hitting your site and decide deliberately whether to allow them. OAI-SearchBot, GPTBot, ClaudeBot, PerplexityBot — each has different behaviors (a starting-point snippet follows below).
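
First, the Product schema example promised above: a minimal JSON-LD sketch for a single product page. The values are hypothetical; the vocabulary is standard schema.org, and most eCommerce platforms can emit this automatically once the underlying attributes exist.

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Hex Bolt M8x40, A2 Stainless",
  "sku": "HB-M8-40-A2",
  "brand": { "@type": "Brand", "name": "Acme" },
  "description": "DIN 933 hex bolt, M8 thread, 40 mm length, A2 stainless steel.",
  "offers": {
    "@type": "Offer",
    "price": "0.42",
    "priceCurrency": "EUR",
    "availability": "https://schema.org/InStock"
  }
}
</script>
```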
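
As for robots.txt, the mechanics are ordinary user-agent rules; the real decision is the business policy behind them. The snippet below shows one possible stance (admit the search-oriented crawlers that can cite and refer you, block the crawlers used primarily for model training) using real published user-agent tokens. Your policy may reasonably differ.

```
# Allow AI search crawlers that can cite you and send referral traffic
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Block crawlers used primarily for model training
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
```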

The llms.txt file? It’s a low-effort experiment that won’t hurt you, but it’s not a substitute for proper structured data. Think of it as a cover letter, not a resume. For a catalog business, your product data is the content — and that needs to be structured at the data level, not summarized in a text file.

What this means for product data teams

If you manage product information, the takeaway is this: the same attributes that make your listings perform well on marketplaces and search engines — complete specifications, consistent naming, accurate categorization — are exactly what makes your catalog legible to AI. There’s no separate “AI SEO” discipline for product data. It’s the same discipline you’ve always needed, with higher stakes because AI surfaces are becoming a real discovery channel.

The companies that already maintain clean, structured product data will have a head start. The ones still working from messy supplier spreadsheets are now falling behind on two fronts instead of one.


Measuring AI Visibility: You Can’t Improve What You Can’t See

Here’s an uncomfortable truth: most eCommerce companies have no idea whether AI systems recommend their products. They can tell you their Google ranking for a hundred keywords. They can show you conversion rates by traffic source. But ask “when someone asks ChatGPT for the best supplier of industrial fasteners, do we show up?” — silence.

Why GA4 isn’t enough

Google Analytics 4 was built for a world where traffic comes through links. AI disrupts that model in several ways:

AI Overviews and Google’s AI Mode don’t show up as distinct traffic sources in GA4. They appear as organic or direct traffic — indistinguishable from a traditional search click. If Google’s own AI features can’t be isolated in Google’s own analytics tool, that tells you something about how prepared the ecosystem is.

Beyond Google, most AI systems send partial or no referral data. A visitor who found you through a Perplexity answer or a ChatGPT recommendation often appears as “direct” traffic — the analytics equivalent of a shrug. Some practitioners have found success with regex-based filters to identify AI referrer patterns (ChatGPT, Perplexity, Gemini, Copilot), but this captures only users who clicked through. It misses the much larger consideration phase — the people who asked an AI about your category, got an answer that didn’t mention you, and never visited at all.
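
If you want to try the regex approach, a starting-point pattern for a GA4 exploration filter or custom channel group might look like the one below. Referrer domains change over time (chat.openai.com became chatgpt.com, for example), so treat this as a snapshot to maintain, not a complete registry.

```
chatgpt\.com|chat\.openai\.com|perplexity\.ai|gemini\.google\.com|copilot\.microsoft\.com|claude\.ai
```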

Then there’s the agent problem. As AI agents start browsing the web autonomously — comparing products, checking specifications, reading reviews — they create traffic patterns that look nothing like human visitors: unusual session durations, no mouse movement, traffic concentrated in desktop Chrome user agents. This data pollutes your analytics, and GA4 offers no clean way to segment it out.

What to measure instead

The emerging consensus points to a different set of metrics:

  • Citation frequency: How often does your brand appear in AI responses for relevant queries? This requires querying the AI systems directly or using monitoring tools — it’s not something any web analytics platform captures.
  • AI share of voice: When someone asks an LLM about your product category, which brands get mentioned? How does your mention rate compare to competitors? This is the AI equivalent of market share (a toy scoring sketch follows this list).
  • AI referral traffic (with caveats): Yes, still track it in GA4, but accept it’s a lower bound. The real number is higher than what you see.
  • Content citation patterns: Which of your pages get cited most frequently? This tells you what formats and content types AI systems prefer — and you can double down on those.
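
To make share of voice concrete, here’s a toy calculation in TypeScript: given the text of AI answers collected for a set of category queries, it computes the fraction of answers that mention each tracked brand. The brands and answers are hypothetical, and real-world matching needs more care (aliases, word boundaries, misspellings).

```ts
type ShareOfVoice = Record<string, number>;

// Fraction of answers mentioning each brand, via naive substring matching.
function shareOfVoice(answers: string[], brands: string[]): ShareOfVoice {
  const counts: ShareOfVoice = Object.fromEntries(
    brands.map((b) => [b, 0] as [string, number])
  );
  for (const answer of answers) {
    const text = answer.toLowerCase();
    for (const brand of brands) {
      if (text.includes(brand.toLowerCase())) counts[brand] += 1;
    }
  }
  const total = answers.length || 1; // avoid dividing by zero
  for (const brand of brands) counts[brand] /= total;
  return counts;
}

// Example: two logged answers, three brands being tracked.
const logged = [
  "For industrial fasteners, Acme and BoltCo are solid choices.",
  "BoltCo has strong availability across the EU.",
];
console.log(shareOfVoice(logged, ["Acme", "BoltCo", "FastenerHub"]));
// { Acme: 0.5, BoltCo: 1, FastenerHub: 0 }
```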

Tools in this space are still maturing. Options like LLM Pulse and Ahrefs’ Brand Radar are imperfect, but they’re better than measuring nothing. For B2B eCommerce specifically, tools like BrandKarma focus on tracking how LLMs recommend suppliers and products in procurement contexts — which is a different question than generic brand visibility.

The practical starting point

If you’re starting from zero, here’s a minimum viable measurement setup:

  1. Add regex-based referrer filters in GA4 to identify traffic from known AI systems (the pattern sketched in the previous section is a starting point).
  2. Run a monthly manual audit: ask ChatGPT, Perplexity, and Gemini your top 10 product queries and note who they recommend. Screenshot and log the results. It’s crude, but it gives you a baseline.
  3. Check your server logs for AI crawler activity — volume, frequency, which pages they’re hitting. This tells you which AI systems are actually reading your site (see the sketch after this list).
  4. Once you have a baseline, evaluate whether a dedicated monitoring tool is worth the investment.
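
The sketch mentioned in step 3, in TypeScript for Node: a rough pass over an access log, counting requests per known AI crawler. The crawler names are real published user-agent tokens; the log path and format (user agent appearing somewhere in each line) are assumptions to adapt to your own setup.

```ts
import { readFileSync } from "node:fs";

// Published user-agent tokens for the major AI crawlers.
const AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"];

// Count requests per crawler in the access log (path is an assumption).
const counts = new Map<string, number>();
for (const line of readFileSync("/var/log/nginx/access.log", "utf8").split("\n")) {
  const crawler = AI_CRAWLERS.find((name) => line.includes(name));
  if (crawler) counts.set(crawler, (counts.get(crawler) ?? 0) + 1);
}

for (const [crawler, hits] of counts) console.log(`${crawler}: ${hits} requests`);
```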

The companies that start measuring now — even imperfectly — will have months of trend data by the time their competitors wake up to the question.


The Protocol Question: How Will AI Agents Actually Buy From You?

We’ve talked about being visible to AI and measuring that visibility. But there’s a bigger shift coming that most eCommerce teams haven’t even started thinking about: AI agents that don’t just recommend products, but actually purchase them.

Today, when ChatGPT suggests a supplier, a human still clicks the link, navigates the site, and completes the transaction. But the trajectory is clear — AI agents are learning to browse, compare, and transact on behalf of users. And the plumbing for this is being built right now, in real time, through a set of competing protocols.

The alphabet soup: MCP, ACP, UCP

If you’ve come across these acronyms and your eyes glazed over, you’re not alone. Here’s what matters:

MCP (Model Context Protocol) was introduced by Anthropic. It standardizes how AI models connect to external tools and data sources — think of it as a universal adapter that lets an AI agent talk to your inventory system, your API, or your database. MCP is the foundational layer: it’s about connecting AI to systems, not specifically about commerce.
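
As a sketch of what that looks like in practice: exposing a single inventory-lookup tool through the official @modelcontextprotocol/sdk TypeScript package. The SDK’s API surface is still evolving, so check the current docs; lookupStock is a hypothetical stand-in for your own backend.

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Hypothetical inventory lookup; replace with a real call to your backend.
async function lookupStock(sku: string): Promise<{ sku: string; inStock: boolean }> {
  return { sku, inStock: true };
}

const server = new McpServer({ name: "catalog", version: "1.0.0" });

// Register a tool an AI agent can call to check availability for a SKU.
server.tool("check_inventory", { sku: z.string() }, async ({ sku }) => ({
  content: [{ type: "text", text: JSON.stringify(await lookupStock(sku)) }],
}));

// Serve over stdio so a local MCP client can connect to this process.
await server.connect(new StdioServerTransport());
```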

ACP (Agentic Commerce Protocol) comes from OpenAI. It’s specifically designed for shopping within AI assistants. The idea: a user asks their AI assistant for a product, the agent discovers it through merchant feeds, displays pricing and availability, and completes checkout — all without the user ever leaving the chat interface. Merchants provide product data through structured feeds, and payment happens via delegated tokens.
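
What “structured feeds” means in practice: each entry is machine-readable product data the agent can render and act on, along the lines of the sketch below. The field names here are illustrative, not the ACP spec; consult OpenAI’s published documentation for the actual schema.

```json
{
  "id": "HB-M8-40-A2",
  "title": "Hex Bolt M8x40, A2 Stainless",
  "price": { "amount": "0.42", "currency": "EUR" },
  "availability": "in_stock",
  "link": "https://www.example.com/products/hb-m8-40-a2"
}
```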

UCP (Universal Commerce Protocol) is Google’s answer. It covers the full shopping journey but currently operates primarily within Google’s own surfaces — Search AI Mode, Gemini, Google Shopping. Checkout happens via Google Pay. Where ACP is platform-agnostic in theory, UCP is tightly coupled to Google’s ecosystem in practice.

What this means for online stores

The practical implication: in the near future, your store won’t just need to be findable by AI — it will need to be transactable by AI. An agent that can read your product data but can’t complete a purchase will move on to a competitor that supports the protocol it speaks.

This is not hypothetical. Checkout.com — one of the largest payment processors — is already publishing guidance on supporting both ACP and UCP, advising merchants not to pick one but to prepare for both.

The product data connection

Here’s where this connects to the daily work of product data teams: the quality of your structured product data becomes the foundation for agent commerce. If your product titles are inconsistent, your specs are incomplete, your pricing is stale, or your inventory data lags — an AI agent will either skip you or provide inaccurate information that leads to failed transactions.

The bar is rising. It used to be that messy product data cost you some marketplace rankings. Then it cost you AI visibility. Soon it will cost you transactions that happen entirely through AI intermediaries.

For teams managing large catalogs — especially those feeding data to multiple channels — this increases the urgency of having reliable, structured product information. AI-first PIM systems like FacetFlux exist precisely for this scenario: ingesting messy supplier data, normalizing it with AI, and distributing clean structured data across channels. On marketplace platforms specifically, tools like Optivise automate the attribute completion and listing optimization that becomes even more critical when AI agents are reading your listings, not just humans.

And then there’s WebMCP

Just when you thought the acronym soup couldn’t get thicker — Google recently introduced WebMCP, currently in early preview in Chrome 146. Unlike the protocols above, which require backend integration, WebMCP works client-side: it’s a JavaScript interface that lets websites expose structured data and actions directly to AI agents browsing the page.

Think of it this way: MCP, ACP, and UCP require you to build API-level integrations. WebMCP lets an AI agent interact with your existing website — filling forms, searching products, placing orders — through a standardized interface that lives in the browser.

This is very new and very early. But for eCommerce, the implications are significant: it could lower the barrier to agent commerce dramatically. Instead of building protocol-specific integrations, you’d register your site’s capabilities through a JavaScript API, and AI agents could discover and use them.
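
Here’s a hypothetical sketch of the idea. WebMCP is in early preview and its API surface is still in flux, so every name below is illustrative rather than a stable reference; the shape is what matters: the page registers a capability plus a handler, and a visiting agent can discover and call it instead of scraping the DOM.

```ts
// Cast because this experimental API isn't in TypeScript's DOM types;
// the property name is an assumption based on the early proposal.
const modelContext = (navigator as any).modelContext;

modelContext?.registerTool({
  name: "search_products",
  description: "Search the store catalog by keyword",
  inputSchema: { type: "object", properties: { query: { type: "string" } } },
  async execute({ query }: { query: string }) {
    // Reuse the same search endpoint the human-facing UI already calls.
    const res = await fetch(`/api/search?q=${encodeURIComponent(query)}`);
    return { content: [{ type: "text", text: JSON.stringify(await res.json()) }] };
  },
});
```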

We’ll dig deeper into WebMCP and what it means for marketplace operators in a future edition. For now, the takeaway is: the infrastructure for AI-driven commerce is being built by the biggest players in tech, and the pace is accelerating.

The contrarian question: what if product data gets commoditized?

One more thought to sit with. If every store eventually has AI-optimized product data — clean specs, complete attributes, structured feeds — then product data itself stops being a differentiator. Everyone has the same information, formatted the same way, readable by the same agents.

So what makes someone buy from you?

This is where brand clarity and trust signals come in. When product data is table stakes, differentiation comes from brand authority, original content, customer reviews, proprietary expertise, and the kind of trust signals that can’t be auto-generated. The store that has both clean data and a reason to be trusted wins. Clean data without brand is a commodity. Brand without clean data is invisible.


This is Edition #1 of The Product Data Signal. We cover AI visibility, product data strategy, and the evolving infrastructure of digital commerce — for the people who actually build and manage these systems.

Questions, pushback, or a topic you want us to dig into? Get in touch.

Next edition: A deep dive into WebMCP and what marketplace operators should be doing about it now.
