A scenario like this is becoming familiar in ecommerce: a company decides to modernize product discovery and launches an AI-powered search experience.
The implementation starts with a promising demo in which the AI accurately interprets search queries, makes product discovery more intuitive and provides results based on intent rather than exact keywords. Internally, the expectation is straightforward: if AI can understand language better, relevance should improve.
Then the system reaches production, and the results become less predictable: users enter specific queries but receive only loosely related items in return. Conversion stagnates, while zero-result searches begin to increase.
What makes the situation frustrating is that the AI search layer is likely doing exactly what it was designed to do: it’s interpreting the data it was given. Yet the overall search experience feels less reliable than expected.
That is often the point where organizations start asking the wrong question.
The First Conclusion: The AI Search Layer Must Be Broken
From a business perspective, this conclusion is entirely logical. Search was imperfect before the upgrade, but at least it behaved in familiar ways. The legacy system relied on strict keyword matching and manual rules. If a product didn’t match, a merchandiser could usually trace the missing keyword and fix the issue.
The new AI layer makes relevance feel unstable, opaque, and difficult to explain. As a result, attention shifts toward the AI itself. I have seen teams spend weeks analyzing and tweaking prompts, ranking models, embeddings, retrieval logic and semantic matching strategies, only to end up with the same search issues.
The stumbling block here is often the assumption that AI can overcome poor data through sheer volume. Organizations believe that because they have a massive product catalog and years of historical data, the AI should be able to figure things out. They assume the intelligence of the model can compensate for messy inputs.
But AI does not magically bring order to data chaos, and in my experience, that chaos is common in ecommerce environments.
Expectations vs. Reality in Catalog Data
If we look at a typical enterprise catalog, we can usually see the following:
- Products arrive from first-party sources and third-party suppliers.
- Attributes are complete for one brand but sparse for another.
- Variants exist as separate records in one category and grouped records in another.
- Similar concepts are described using different terminology depending on the source system.
Traditional keyword search could operate on a catalog like this and still fail safely because it was constrained: it matched only what was explicitly indexed. AI search can feel worse because it succeeds more often at matching “something”, even when that “something” is not the right thing.
When you remove strict keyword matching and give AI more room to interpret intent, it needs reliable constraints to stay accurate. Without clean attributes to anchor its logic, the model starts making connections based on weak similarities. It associates products that share a vague adjective while overlooking the functional requirements behind the user’s query.
The challenge becomes even greater because enterprise commerce environments are typically built from many independent systems, supplier relationships, regional requirements, and years of historical decisions. Product information may originate from multiple PIMs, supplier feeds, ERP platforms, enrichment processes, spreadsheets, and manual workflows. Most of those systems were never designed with AI interpretation in mind.
As a result, the search layer often receives data containing gaps, local exceptions, duplicated concepts and category-specific workarounds accumulated over years of business growth.
The Fix Is to Prepare What Search Receives
All of this is why I view AI search as a downstream consumer of catalog quality rather than a solution for it. In my experience, a more reliable approach is not to ask AI search to compensate for inconsistent inputs, but to prepare those inputs before they reach the search layer. That usually means:
- Validation: defining which fields are critical for relevance within each category and enforcing completeness and allowed values.
- Normalization: ensuring that values describing the same concept, such as product specifications, follow consistent formats across the catalog so filters and grouping behave predictably.
- Harmonization: creating a unified representation for similar products, especially when combining data from multiple first-party and third-party sources.
- Guardrails: extracting business rules such as availability, compliance requirements and regional restrictions from separate systems and making them available as structured inputs for search.
- Behavioral signals: cleaning and governing data such as clicks, purchases and search behavior so it can provide reliable guidance for AI-driven discovery.
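To make the validation and normalization steps more concrete, here is a minimal Python sketch. It assumes a single hypothetical category (`laptops`) whose relevance-critical fields and allowed brand values are hard-coded in `CATEGORY_RULES`; the field names, brands, and helper functions (`normalize_screen_size`, `validate`, `prepare`) are illustrative only and do not reflect the API of any particular tool.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical rules: which attributes matter for relevance in each category
# and which values are allowed. Real rules would be owned by category teams.
CATEGORY_RULES = {
    "laptops": {
        "required": ["brand", "screen_size_in", "ram_gb"],
        "allowed": {"brand": {"acme", "globex", "initech"}},
    },
}

# Screen sizes may arrive as '15.6"', '15.6 in' or '39.6 cm'; normalize to inches.
_SIZE_RE = re.compile(r'^\s*(\d+(?:\.\d+)?)\s*(inches|inch|in|"|cm)\s*$', re.IGNORECASE)
_TO_INCHES = {"inches": 1.0, "inch": 1.0, "in": 1.0, '"': 1.0, "cm": 1 / 2.54}


def normalize_screen_size(raw: str) -> Optional[float]:
    """Return the size in inches (one decimal place), or None if unparseable."""
    match = _SIZE_RE.match(raw)
    if not match:
        return None
    value, unit = float(match.group(1)), match.group(2).lower()
    return round(value * _TO_INCHES[unit], 1)


def validate(product: Dict) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    category = product.get("category")
    rules = CATEGORY_RULES.get(category)
    if rules is None:
        return [f"no rules for category {category!r}"]
    problems = []
    # Completeness: every relevance-critical field must hold a value.
    for attr in rules["required"]:
        if product.get(attr) in (None, ""):
            problems.append(f"missing {attr}")
    # Allowed values: present values must come from the controlled list.
    for attr, allowed in rules["allowed"].items():
        value = product.get(attr)
        if value not in (None, "") and str(value).lower() not in allowed:
            problems.append(f"{attr}={value!r} not in allowed values")
    return problems


def prepare(product: Dict) -> Tuple[Dict, List[str]]:
    """Normalize known fields, then validate the normalized record."""
    cleaned = dict(product)  # never mutate the source record
    if isinstance(cleaned.get("brand"), str):
        cleaned["brand"] = cleaned["brand"].strip().lower()
    if isinstance(cleaned.get("screen_size_in"), str):
        cleaned["screen_size_in"] = normalize_screen_size(cleaned["screen_size_in"])
    return cleaned, validate(cleaned)
```

For example, `prepare({"category": "laptops", "brand": " ACME ", "screen_size_in": '15.6"', "ram_gb": ""})` returns a record with `brand` set to `"acme"` and `screen_size_in` set to `15.6`, plus the single problem `"missing ram_gb"`. Because a value that cannot be normalized becomes `None`, it is then reported as missing, so unexpected formats surface as validation problems instead of silently reaching the search index.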
The goal is not to make the AI smarter. The goal is to make the inputs less ambiguous. This allows AI to work with structured data and explicit business constraints instead of interpreting conflicting signals from raw source systems.
The good news is that this preparation doesn’t have to be a complex manual process: tools such as AI Search Readiness Kit, Datos and others are designed specifically for this purpose. They can normalize catalog inputs and prepare behavioral signals before they reach the search engine, without a complete data overhaul.
This type of preparation layer is valuable not only for AI search. It creates a stable foundation for virtually any AI use case, from chatbots to autonomous agents.
The Same Scenario, Different Outcome
Now imagine the same organization returning to the original AI search rollout. This time, the search team starts somewhere different.
The demo still looks impressive, but the team treats it as a signal rather than a guarantee. Before making AI search the default experience, they diagnose catalog gaps, prepare search-ready inputs and test against real production constraints.
Once AI search enters this prepared environment, it still introduces broader interpretation and expands discovery opportunities. But now it becomes much easier to evaluate. The team can track metrics such as zero-result rate, facet click-through rate, attribute completeness, duplicate rate, and overall readiness scores while tuning the system based on measurable outcomes.
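Several of these metrics are simple ratios that can be recomputed before and after each tuning change. The Python sketch below assumes one possible definition for three of them: zero-result rate over a log of per-query result counts, attribute completeness over a set of required fields, and duplicate rate over a key built from normalized field values (in the sample, a hypothetical `brand` plus `gtin` key). The function names, definitions, and sample data are illustrative assumptions, not a standard.

```python
from collections import Counter
from typing import Dict, Sequence


def zero_result_rate(result_counts: Sequence[int]) -> float:
    """Share of logged searches that returned no products at all."""
    if not result_counts:
        return 0.0
    return sum(1 for n in result_counts if n == 0) / len(result_counts)


def attribute_completeness(products: Sequence[Dict], required: Sequence[str]) -> float:
    """Share of (product, required attribute) slots that hold a non-empty value."""
    slots = len(products) * len(required)
    if slots == 0:
        return 1.0
    filled = sum(
        1 for p in products for attr in required if p.get(attr) not in (None, "")
    )
    return filled / slots


def duplicate_rate(products: Sequence[Dict], key_fields: Sequence[str]) -> float:
    """Share of records whose normalized key repeats an earlier record's key."""
    if not products:
        return 0.0
    keys = Counter(
        tuple(str(p.get(f, "")).strip().lower() for f in key_fields) for p in products
    )
    # Every occurrence of a key beyond the first counts as one duplicate.
    duplicates = sum(count - 1 for count in keys.values())
    return duplicates / len(products)


if __name__ == "__main__":
    # Hypothetical sample: two records describe the same product with different casing.
    catalog = [
        {"brand": "acme", "gtin": "123", "ram_gb": 16},
        {"brand": "Acme ", "gtin": "123", "ram_gb": ""},
        {"brand": "globex", "gtin": "456"},
        {"brand": "initech", "gtin": "789", "ram_gb": 8},
    ]
    print(zero_result_rate([0, 3, 0, 12]))                       # 0.5
    print(attribute_completeness(catalog, ["brand", "ram_gb"]))  # 0.75
    print(duplicate_rate(catalog, ["brand", "gtin"]))            # 0.25
```

Normalizing the key (trimming and lowercasing) before counting is a deliberate choice: without it, `"acme"` and `"Acme "` would look like different products and the duplicate rate would understate the problem.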
That distinction matters because when relevance gets worse after an AI search deployment, the more useful question is not whether the AI search layer failed, but whether the catalog was ever ready to be interpreted by AI in the first place.
Pavel Tsarikov specializes in aligning enterprise ecommerce with advanced technology strategy, bridging the gap between complex operations and digital transformation. He helps large-scale organizations integrate AI and automation into everyday workflows, eliminating operational bottlenecks while ensuring every solution meets strict standards for governance, reliability and enterprise scale.





