Two academic studies reveal the fatal flaw in AI-generated customer insights—and why the qualitative research industry isn’t as threatened as the hype suggests.
If you’ve been following the market research discourse over the past eighteen months, you’ve heard the proclamations: synthetic personas will replace focus groups. AI-generated respondents will make customer interviews obsolete. Why spend $15,000 on a qualitative study when you can simulate 500 synthetic users for a fraction of the cost?
The pitch is compelling. The technology is genuinely impressive. And based on two recent academic studies, I’m convinced the hype is almost entirely wrong.
Not because synthetic research doesn’t work. It works remarkably well—at precisely the thing qualitative research was never designed to do.
The Numbers That Should Make You Nervous
Let’s start with the impressive figures, because they are impressive.
A 2025 study by Dr. Bernard Jansen and colleagues at Qatar Computing Research Institute examined AI-generated persona interviews built from real survey data. The results were striking: synthetic personas achieved 90.4% factual accuracy and 94.4% perceptual accuracy when their responses were compared against the underlying human survey data.
Ninety percent accuracy. In most contexts, that’s an A-minus. In software testing, that’s ship-ready. In medical diagnostics, that’s better than many human practitioners.
In qualitative research, that’s a trap.
Here’s why: qualitative research doesn’t exist to confirm the 90%. It exists to discover the 10%. The entire value proposition of sitting across from a human being—whether in a focus group, a customer interview, or a contextual inquiry—is surfacing what you didn’t know to ask. The tangent that reveals a hidden use case. The hesitation before an answer that exposes an unspoken concern. The “actually, now that you mention it…” moment that reframes your entire product strategy.
A method optimized for pattern-matching against existing data is structurally incapable of surprise. It can only reflect back what was already captured in its training inputs. When Jansen’s synthetic personas achieved 90% accuracy, they were proving they could successfully regurgitate the patterns humans had already expressed. They weren’t discovering anything new.
For validation at scale, this is valuable. For discovery, it’s worse than useless—it’s actively misleading, because it creates false confidence that you’ve done the research. As I’ve written previously, synthetic panels should be treated as accelerators for human research, not substitutes.
The Variability Problem Nobody’s Discussing
A second study, posted to arXiv in 2025, examined what happens when LLMs are guided by rich interview data to model survey responses. The researchers found that these models could “capture overall patterns of human survey responses”—but with a critical caveat that should concern anyone using synthetic data for strategic decisions.
The synthetic responses showed “lower variability” than real human responses.
This finding doesn’t get the attention it deserves. Lower variability sounds like a statistical footnote. In practice, it’s the quiet killer of synthetic research’s value proposition for anything beyond commodity validation.
Real markets aren’t normal distributions. They’re lumpy, weird, and full of edge-case segments. The early adopter who uses your product in ways you never intended. The churning customer who can articulate exactly why your onboarding fails. The unexpected persona who becomes your most profitable segment precisely because no one else is serving them.
Synthetic research flattens the bell curve. It gives you the median response—the composite, averaged, smoothed-out version of your market. It systematically underweights the outliers.
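To make that flattening concrete, here is a purely illustrative toy simulation in Python. The numbers are invented, not drawn from either study, and it assumes each synthetic respondent behaves like a smoothed composite of several real respondents, which is one plausible way to model the “lower variability” finding.

```python
# Toy illustration of the "lower variability" problem. All numbers invented.
# Assumption: a synthetic respondent behaves like a smoothed composite
# of several real respondents.
import random
import statistics

random.seed(7)

# Hypothetical 1-10 satisfaction scores from 200 real respondents,
# including a small, vocal unhappy segment (the outliers that matter).
real = ([random.gauss(7.5, 1.2) for _ in range(180)]
        + [random.gauss(2.5, 0.8) for _ in range(20)])

# Hypothetical synthetic respondents: each one averages five real responses.
synthetic = [statistics.mean(random.sample(real, 5)) for _ in range(200)]

for name, scores in [("real", real), ("synthetic", synthetic)]:
    print(f"{name:9s} mean={statistics.mean(scores):.2f}  "
          f"stdev={statistics.stdev(scores):.2f}  min={min(scores):.2f}")
```

In a typical run the two means land within a few tenths of each other, while the synthetic column’s spread collapses and the unhappy segment vanishes from the tail. Same average, no outliers: that is the flattening described above.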
In quantitative research, outliers are often noise to be filtered. In qualitative research, the outlier frequently is the signal. The customer who hates your product for reasons no one else mentions might be telling you about a segment you haven’t discovered yet. The power user with an unusual workflow might be showing you your next feature. The frustrated buyer who almost churned might be articulating the objection your sales team hasn’t learned to handle.
When the authors of the arXiv study note that synthetic data “captures overall patterns,” they’re describing a method that’s excellent at telling you what average looks like. For founders trying to find product-market fit, average is the last thing you need. You need the specific, the passionate, the weird—the customer who will go out of their way to tell others about you, or the one who will tell you exactly why they won’t.
The Nuance Gap
The arXiv study’s most damning finding is also its most understated: synthetic responses “struggle with nuance” and the method “does not fully replicate human response complexity.”
I’ve spent thirty years in enterprise software, including the 1990s era when CASE tools promised to automate programming by pattern-matching existing code. The parallel to synthetic research is almost exact. Pattern-matching has always been AI’s strength. Contextual judgment has always been its weakness. This isn’t a bug waiting to be fixed with better models—it’s a structural feature of how these systems work.
Consider what “nuance” actually means in a customer research context.
A synthetic persona can tell you users want “ease of use.” A live interview reveals they mean “I need to look competent in front of my CFO in the first 30 seconds of a demo.” Both responses would code to the same category in a thematic analysis. Only one tells you what to actually build.
A synthetic persona can report that customers value “integration capabilities.” A real conversation uncovers that they specifically need to pull data into a Monday morning executive dashboard, that they’ve been manually copying numbers from three systems, and that their CEO publicly called out a data discrepancy last quarter. The synthetic response gives you a feature request. The live research gives you the emotional job, the workflow context, and the political dynamics that explain why this matters more than their stated priorities suggest.
The jobs-to-be-done framework, developed by the late Harvard professor Clayton Christensen, has become standard vocabulary in product management. What’s often forgotten is that Christensen developed it through ethnographic observation—watching people struggle with real tasks in real contexts, noticing the gaps between what they said they wanted and what they actually hired products to do. That gap is where product insight lives. Synthetic research, by definition, can only work with what people have already articulated. It cannot observe the gap.
When Synthetic Actually Works
I’m not arguing that synthetic research is worthless. It has legitimate applications, and dismissing the technology entirely would be as foolish as the hype claiming it will replace human research.
Synthetic data works well for validation at scale. If you’ve already identified a hypothesis through live research, synthetic personas can help you test whether the pattern holds across a larger simulated population. It’s a form of sensitivity analysis—stress-testing your conclusions before you commit resources (a toy sketch of this follows these examples).
Synthetic data works for augmenting sample sizes in quantitative studies, particularly for demographic segments that are expensive or difficult to recruit. The key is that you’re extending known patterns, not discovering new ones.
Synthetic data works for training and simulation. If you’re preparing a sales team for objection handling, synthetic personas can generate realistic practice scenarios faster than recruiting real prospects for role-play.
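Of those three uses, validation at scale is the easiest to make concrete. Below is a deliberately oversimplified sketch in Python, with invented counts and intentionally basic statistics: the theme came out of live interviews, and the synthetic panel is asked only whether its prevalence lands in the same range, not to discover the theme in the first place.

```python
# Sketch of a synthetic panel as a sensitivity check on a live-research
# hypothesis. All counts are hypothetical; the statistics are kept simple.
from math import sqrt

def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for an observed proportion."""
    p = hits / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical live discovery: 9 of 12 interviewees raised the onboarding pain point.
live_lo, live_hi = wilson_interval(hits=9, n=12)

# Hypothetical synthetic pass: 312 of 500 simulated personas surfaced the same theme.
synthetic_rate = 312 / 500

print(f"live 95% interval: {live_lo:.2f}-{live_hi:.2f}  synthetic rate: {synthetic_rate:.2f}")
print("pattern holds at scale" if live_lo <= synthetic_rate <= live_hi
      else "divergence: go back to live research")
```

If the synthetic rate falls outside the interval the live data supports, that is not a verdict; it is a prompt to go back and talk to more humans.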
What synthetic data cannot do—what the academic evidence increasingly confirms it cannot do—is replace discovery research. It cannot surface what you don’t know. It cannot identify the question you should be asking. It cannot find the customer segment you haven’t imagined yet.
Greenbook, the market research industry’s trade publication, frames synthetic data as a tool “to augment real-world data, not a shortcut to bypass it.” This isn’t defensive positioning from an industry protecting its turf. It’s an accurate assessment of what the technology can and cannot do. The researchers studying these methods know this. The practitioners implementing them responsibly know this.
The Nielsen Norman Group, after testing synthetic user tools against their own studies with real participants, concluded that “synthetic-user responses for many research activities are too shallow to be useful” and that responses felt “one-dimensional” compared to real human insights.
The danger is the gap between what informed practitioners know and what the LinkedIn discourse proclaims. And that gap is where early-stage founders—the people who most need genuine customer discovery and can least afford to get it wrong—are most likely to fall.
The 1990s Parallel
I keep returning to CASE tools because the pattern is so exact.
In the early 1990s, Computer-Aided Software Engineering promised to revolutionize development. The pitch: capture requirements in structured diagrams, and the tools would generate working code automatically. Pattern-match from existing systems, and you could skip the messy work of actual programming. By 1995, the CASE market had grown to over $12 billion.
CASE tools were roughly 90% accurate at generating boilerplate. They could produce standard data access layers, routine CRUD operations, the commodity code that every enterprise system required. What they couldn’t do was handle the 10% that required human judgment—the business logic exceptions, the integration edge cases, the architectural decisions that determined whether a system would scale or collapse.
The market corrected. CASE tools didn’t disappear, but they retreated to their appropriate niche: accelerating routine work, not replacing expertise. The programmers who understood what CASE tools could and couldn’t do used them to amplify their productivity. The organizations that tried to replace developers with CASE-generated code built systems that couldn’t handle real-world complexity.
Synthetic research is CASE tools for customer insight. It’s excellent at templating—generating plausible responses that match expected patterns. It’s dangerous as a replacement for thinking.
The founders and product leaders who understand this distinction will use synthetic data to accelerate validation and scale their research capacity. The ones who don’t will build products optimized for synthetic customers—averaged, median, stripped of the variability that represents real market opportunity—and wonder why their “validated” roadmap finds no traction with actual buyers.
The Question Synthetic Can’t Answer
Here’s the meta-point that underlies everything I’ve argued: synthetic research can only answer questions you already know to ask.
The models generate responses by pattern-matching against existing data—surveys, interviews, behavioral records that capture what has already been expressed or observed. They’re interpolation engines, not discovery engines. They can tell you what falls within the boundaries of known customer behavior. They cannot tell you that the boundaries are drawn wrong.
The highest-value qualitative insight isn’t an answer. It’s discovering the question itself.
When Christensen ran his now-famous milkshake study, he wasn’t answering a question the milkshake buyers had articulated. He was noticing that the question everyone else was asking—how do we make a better milkshake?—was the wrong question. The real question was: what job are morning commuters hiring the milkshake to do? That reframe emerged from observation, not simulation.
When Intuit’s “Follow Me Home” program sent developers to watch customers use QuickBooks in their actual offices, they weren’t validating hypotheses. They were looking for surprises—the workarounds customers had invented, the features they ignored, the problems they’d stopped trying to solve. No synthetic persona built from QuickBooks survey data could generate those insights, because the insights existed in the gap between what customers said and what they did.
For late-stage validation, for scaling known patterns, for augmenting quantitative studies—synthetic research has a role. For pre-seed founders trying to find product-market fit, for product teams hunting for the next growth lever, for anyone engaged in genuine discovery—there’s no substitute for sitting across from a human being and hearing something that surprises you.
The Sophisticated Play
The market research industry won’t bifurcate into “synthetic” and “traditional” camps. The firms and practitioners who thrive will be the ones who understand which tool serves which purpose.
Use synthetic to scale validation. Use it to stress-test hypotheses across larger simulated populations. Use it to generate training scenarios and augment hard-to-recruit segments.
Use live research to discover. Use it when you’re not sure what questions to ask. Use it when you need the outlier, the edge case, the unexpected segment. Use it when the 10% matters more than the 90%.
This is precisely the approach we take in our AI-Accelerated PMF Validation work—using the Sean Ellis methodology with synthetic data for directional signals, then validating critical insights through targeted human interviews. The synthetic layer accelerates; the human layer discovers.
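For readers who have not run the Sean Ellis test, here is a minimal sketch of the tally, not our actual tooling. The 40% “very disappointed” benchmark is Ellis’s published rule of thumb; the response counts are invented to show why the synthetic pass is treated as a directional signal and the human pass as the deciding one.

```python
# Minimal Sean Ellis-style tally. Response counts are invented for illustration.
from collections import Counter

def pmf_score(responses: list[str]) -> float:
    """Share of respondents answering 'very disappointed' to the Sean Ellis question."""
    return Counter(responses)["very disappointed"] / len(responses)

synthetic_panel = (["very disappointed"] * 230
                   + ["somewhat disappointed"] * 180
                   + ["not disappointed"] * 90)
human_sample = (["very disappointed"] * 11
                + ["somewhat disappointed"] * 12
                + ["not disappointed"] * 7)

for label, data in [("synthetic panel (directional)", synthetic_panel),
                    ("human interviews (decisive)", human_sample)]:
    score = pmf_score(data)
    verdict = "clears" if score >= 0.40 else "misses"
    print(f"{label:30s} {score:.0%} -> {verdict} the 40% benchmark")
```

In this invented example the synthetic panel clears the benchmark and the smaller human sample does not, which is precisely the situation where the human layer has to win the argument.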
The founders who understand this distinction won’t fall into the 90% trap. They’ll use synthetic research for what it’s good at—and invest in human conversation for what it can’t replace.
The ones who don’t will build beautiful products for customers who don’t exist.
John Mecke is Managing Director of DevelopmentCorporate LLC, advising early-stage SaaS companies on competitive intelligence, market research, and acquisition strategy. He has been watching AI promise to replace human judgment since CASE tools were going to eliminate programmers in 1992.
References
Academic Studies:
- Jansen, B.J., et al. (2025). “Interviewing AI-Generated Personas: Talking To Your Data.” bernardjjansen.com
- “Leveraging Interview-Informed LLMs to Model Survey Responses: Comparative Insights from AI-Generated and Human Data” (2025). arXiv
Industry Analysis:
- Nielsen Norman Group. “Synthetic Users: If, When, and How to Use AI-Generated ‘Research.’” nngroup.com
- Greenbook. “Putting Synthetic Data to Work: Balancing Effort, Trust and Impact.” greenbook.org
Frameworks & Methodology:
- Christensen Institute. “Jobs to Be Done Theory.” christenseninstitute.org
- Ellis, Sean. “Using Product/Market Fit to Drive Sustainable Growth.” Medium
- Intuit. “Why Every Company Should Be Doing a Follow Me Home.” Intuit Blog
Related Reading from DevelopmentCorporate:
- The Synthetic Research Threat: How AI Could Undermine Your SaaS Customer Insights
- Synthetic Responses in Market Research: Promise vs. Reality in 2025
- AI “Fake People” for Research: A Real Talk for SaaS Founders & VCs
- The Ghost of CASE Tools: Why AI “Vibe Coding” Is Repeating Software History’s Most Expensive Mistake
- The Pre-Revenue Founder’s Guide to AI-Accelerated PMF