A practical map for UX & product teams that want better questions, better methods, and better decisions
Most of us learn the field through a tidy dichotomy: generative research helps us discover opportunities and needs; evaluative research helps us test solutions and validate usability. The shorthand is memorable and, as far as on-ramps go, not bad. But as soon as you start making real product decisions, across messy organizations, multiple time horizons, and conflicting constraints, you realize two buckets can’t possibly carry all the nuance.
Some work isn’t really about discovery or validation so much as diagnosis (“why is step three failing?”). Some work is causal (“did this change cause the lift or was it seasonality?”). Some is longitudinal, because the truth of a workflow only appears over time. And some is not a “study” at all, but a continuous practice of listening to support calls, mining sales objections, and reviewing session replays that steadily changes what you build and how.
This essay offers a map beyond the binary. I’ll keep the generative/evaluative umbrellas in sight, but the goal is to give you a way to categorize research that better matches the decisions you face and the evidence they truly require. If you’re junior, this should untangle the jargon without dumbing it down. If you’re mid-senior, you’ll see places to tighten your practice: where a slight shift in framing unlocks a better method, faster.
Why the Simple Split Is Helpful, and Why It Falls Short
Generative vs. evaluative is a useful way to resist the trap of testing ideas you haven’t actually framed. If you’ve ever run a beautiful usability test on a feature no one needed, you know the pain. Generative work slows you down to speed you up: you spend time with people in their contexts, you learn their language, you witness failure and workarounds, and you surface problems worth solving. Evaluative work, when done right, hardens your confidence: you test the thing in front of you against the job you said it would accomplish, and you fix what breaks.
Yet as a taxonomy, the pair blurs categories that matter in the day-to-day: what time horizon are we working on (strategy, roadmap, release, post-launch operations)? Where will the evidence come from (what people say vs. what they do)? What shape is the data (qualitative stories vs. quantitative distributions; single moment vs. over time)? In what context are we capturing it (field vs. lab, moderated vs. unmoderated, remote vs. in-person)? And what risk are we addressing (discovery unknowns, diagnostics, benchmarking, or causal inference)?
Once you start to see these axes, your method picks become obvious instead of doctrinal. You stop arguing over what’s “properly” generative or evaluative and start asking, “For this decision, which evidence reduces the most risk the fastest?”
The Axes that Actually Shape Research Decisions
If you only adopt one idea from this piece, make it this: the best categorization scheme is the one that helps you choose. Generative/evaluative is a helpful headline; the following axes are your day-to-day levers.
- Decision horizon. Are you informing a strategic bet, a quarterly roadmap decision, a sprint-level design change, or a post-launch remediation? Strategic work tolerates ambiguity but demands breadth; sprint work demands specificity and speed; post-launch needs a tight coupling to product analytics and operational metrics.
- Evidence source: attitudinal vs. behavioral. Interviews, concept tests, and surveys surface beliefs, desires, anxieties: the things people say. Observation and telemetry show the things they do. The cliché is correct: you need both, but not for the same questions.
- Data shape: qualitative vs. quantitative, cross-sectional vs. longitudinal. Thick descriptions and clips explain why; distributions and counts tell you how often and how big. Some truths are only visible across weeks or months (adoption curves, habit formation), which pushes you to longitudinal diaries, experience sampling, or cohort analytics.
- Context of capture. A participant’s desk, their phone on a bus, a usability lab, and a remote session with screen share all shape the behavior you’ll see. So do moderated setups (probing in the moment) and unmoderated ones (speed and breadth when wording is unambiguous).
- Risk focus. Are you discovering unknowns, diagnosing a failure, benchmarking current performance, or establishing causality? Each risk suggests a different family of methods.
These five are not academic hair-splitting; they are the knobs you’ll actually turn. When teams get stuck, it’s usually because they’re arguing after the method is chosen. Flip the conversation: start with the axes, then pick the method.
Popular Categorizations Translated into Choices
You’ll encounter other pairs and trios in the wild. They aren’t wrong; they’re just different lenses on the same decisions.
- Formative vs. summative: improve vs. prove. Formative work exists to change the design; summative work exists to report how the design actually performs. If you’re not willing to change anything, you’re doing summative. If you don’t have predefined criteria, you’re doing formative. Many teams blur the two and end up with neither: a “score” no one trusts and a design no one changed.
- Exploratory vs. descriptive vs. causal: a stats lens in plain language. Exploratory finds signals and patterns; descriptive tells you how big and how often; causal asks “did X cause Y?” and requires careful controls. If your team is arguing about whether a new onboarding actually “worked,” you’ve wandered into causal territory whether you name it or not.
- Problem space vs. solution space: the Double Diamond’s friendly cousin. The shift that matters is your posture: in problem space you’re broad and curious; in solution space you’re selective and specific.
- Continuous discovery vs. project-based: cadence, not method. Continuous discovery means you never shut the faucet: conversations weekly, a standing panel, a rolling diary, a habit of mining operational data. Project-based research spikes for big questions, but the system that keeps you honest is the continuous layer.
- Primary vs. secondary: new data vs. existing signals. Some of the best “research” you’ll ever do is secondary: trawling support tickets, reading sales objections, living inside product analytics, listening to five hours of call recordings. It’s not glamorous, but it is clarifying.
None of these need to replace generative/evaluative. They’re pragmatic zoom-levels. Use the one that leads you to a better question, faster.
Research Method Families and Where they Sit on the Map
Rather than a phone-book list, it’s more useful to place methods by what they’re excellent at. Interviews are not “for discovery”; they are for language and reasoning: the words people use, the criteria they apply, the stories they tell themselves. Contextual inquiry is not “for qualitative data”; it is for seeing work as it really happens, including the environment, the third-party tools, the policy weirdness. Usability testing is not “for validation”; it is for finding where intent and interface disagree and uncovering the cause.
Here’s a compact view to anchor the mental model:
| Method family | What it’s best at | Typical moment |
| --- | --- | --- |
| Context & field (contextual inquiry, shop-along, shadowing) | Real workflows, constraints, tacit knowledge you will not hear in interviews | Strategy / early roadmap; also diagnosis for complex failures |
| Interviews & IDIs (semi-structured, JTBD, expert interviews) | Language, decision criteria, anxieties, motivations | Early discovery; reframing; post-test “why” |
| Prototype & UI (moderated usability, first-click, tree tests) | Alignment of intent and interface; findability; comprehension | Iteration; pre-release checks |
| Post-launch & experiments (analytics, session replays, A/B) | What happened at scale; whether the change moved the metric | Ongoing operations; impact sizing |
This is deliberately simple. You can add diary studies (longitudinal understanding), surveys (descriptive sizing, segmentation), expert reviews (fast heuristic passes), and accessibility evaluations (standards and observation) on top, but the map above will already improve 80% of method decisions.
Decision-first Chooser: Start from the question you must answer
The easiest way to escape taxonomy debates is to put the decision in the center. What’s the actual question that must be answered for the team to move? In practice, a handful of patterns cover most work.
“What problems truly exist, and how do people currently solve them?”
Start with field observation if you can; if you can’t, simulate context during sessions and bring artifacts into the room: screenshots, emails, tools, physical objects. Follow with interviews that anchor descriptions in concrete episodes (“Tell me about the last time…”) rather than hypotheticals. Mine support tickets and search logs in parallel. You’ll get triangulation: what they do, what they say, and what the system records when it breaks.
“Which problem should we prioritize?”
You’re moving from insight to sizing. Translate qualitative themes into answerable questions and run a survey that samples the right population. Keep language faithful to what people actually said; don’t sanitize it into corporate mush. Combine the descriptive results with value mapping (impact on revenue/cost/risk) and feasibility. You’re not proving causality; you’re deciding where to place scarce effort.
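To make the sizing step concrete, here is a minimal sketch, not from the original article, of how many survey responses you’d need for a given margin of error. The function name and the numbers are illustrative, and it assumes simple random sampling of a proportion-style question.

```python
import math

def sample_size_for_proportion(margin_of_error=0.05, z=1.96, expected_p=0.5, population=None):
    """Rough number of responses needed to estimate a proportion within +/- margin_of_error.

    Assumes simple random sampling; expected_p=0.5 is the conservative default.
    """
    n = (z ** 2) * expected_p * (1 - expected_p) / (margin_of_error ** 2)
    if population:
        # Finite-population correction for small, known audiences
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

# Hypothetical: a B2B product with roughly 2,000 active admins
print(sample_size_for_proportion(margin_of_error=0.05, population=2000))  # ~323 completed responses
```

Treat the output as a floor rather than a target: non-response and any segment-level cuts will push the real number up.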
“Will this concept resonate?”
Before you argue about pixels, test the idea as a promise. “Imagine you could… How would you compare that to how you do it today?” Bring competitive alternatives into the conversation: it’s rare that your product competes with nothing. If people struggle to see value, the UI can’t save you. If the value is clear but fragile, you have a design problem worth solving.
“Why is this flow failing?”
Don’t guess. Watch people attempt the exact sequence that’s underperforming, on the devices and connections that match reality. Record the attempts and annotate the struggle. Follow with a tight change that directly addresses the cause you observed. Then measure the change in the wild. Diagnosis without follow-through is theater; follow-through without diagnosis is waste.
“Are we ready to ship?”
This is a confidence question, not a curiosity one. Run a short, standardized pass on the final flow. Keep tasks and success criteria crisp. Capture the error types that would drive support or legal risk. You’re not chasing perfection; you’re bounding risk so you can ship with eyes open and a watchlist.
“Did the change help?”
If you have traffic, instrument cleanly and look at the before/after in production; if you have the infrastructure, run an experiment. If you don’t, trend thoughtfully and be honest about other forces. A change that looks flat might have protected you from a seasonal drop; a lift might be a coincidence. The grown-up move is to treat uncertainty like a design constraint and decide what additional evidence would change your mind.
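For the experiment branch, here is a minimal sketch (hypothetical numbers, assuming a clean two-arm split on a single conversion event) of asking whether an observed lift is distinguishable from noise:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for a difference in conversion rate between arms A and B."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis that both arms convert equally
    p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, p_value

# Hypothetical numbers: 4.1% vs 4.6% conversion on ~10,000 sessions per arm
lift, p_value = two_proportion_ztest(410, 10_000, 460, 10_000)
print(f"absolute lift: {lift:.2%}, p-value: {p_value:.3f}")
```

A before/after comparison in production doesn’t get this treatment for free: without randomization, seasonality and other forces stay in the picture, which is exactly the uncertainty the paragraph above asks you to treat as a design constraint.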
Mixed-method logic that actually saves time
“Mixed methods” isn’t a slogan; it’s a discipline of sequencing. When you see a metric fall, you move Quant => Qual: the number points to the wound, observation explains the injury, a fix follows, and a later number confirms the healing. When you hear a compelling story in interviews, you move Qual => Quant: translate the theme into concrete items, size it with a survey, and use the result to prioritize.
When an idea is born in discovery, you go Discovery => Experiment: test early for comprehension and value, take the best variant forward, and size the impact once it’s live. And when the team is spinning on risk, use Riskiest Assumption Testing: list the assumptions, pick the one that would sink the effort if wrong, and choose the fastest method to break it.
The time savings come from resisting the urge to “do everything.” You don’t need a gold-plated study; you need the next piece of evidence that unlocks a decision. The art is picking the right piece.
Picking a Research Method in Real-life Situations
A membership product was stuck at “add payment.” Analytics showed a gap but not the cause. In moderated sessions two things became obvious in minutes: the card-type logos looked like tappable options (people kept tapping Visa instead of entering digits), and the “billing address same as shipping” checkbox was below the fold on mobile.
The fix was absurdly small: remove the tappable affordance from logos, surface the checkbox, and clean up error language. Support tickets dropped and conversion ticked up. Was this generative or evaluative? Neither label helped. It was diagnostic. The method, moderated tasks with realistic devices, flowed naturally from that word.
A B2B dashboard had low active use beyond day one. Interviews kept circling the same complaint: “I don’t know what to do with the numbers.” That’s attitudinal, but not frivolous. We ran a diary for two weeks with five people and asked for a weekday screenshot plus one line: “What decision did you make from this today?” The answer was: none.
They were waiting for a weekly review. The change wasn’t UI chrome; it was a feature that generated a Monday “what changed and why it matters” brief, paired with a UI nudge that turned the brief into a small, guided action. Longitudinal work revealed the cadence mismatch; the solution sat at the intersection of product and communication.
Question – Evidence – Research Method Map
Not a taxonomy for taxonomy’s sake: just a quick cross-reference you can use in planning.
| Question you’re answering | Evidence you need | Methods that fit |
| --- | --- | --- |
| What problems exist & how do people solve them now? | Behavioral reality + language | Contextual inquiry, shadowing, semi-structured interviews, support/analytics review |
| Which problem should we prioritize? | Descriptive size + value | Survey sizing, opportunity scoring, stakeholder value mapping |
| Why is this flow failing? | Causal-ish mechanism (without lab illusions) | Moderated task-based testing on real devices + targeted session replays |
| Are we ready to ship? | Bounded risk on critical steps | Standardized usability pass, accessibility review, “watchlist” for ops |
| Did the change help? | Field impact | Instrumentation, analytics trend, experiment where feasible |
Limit yourself to this table for a quarter and see how many arguments evaporate.
Article Recap – in Tabular Format
Axes recap: when each matters most
| Axis | Why it changes the plan | A cue you’re in this territory |
| --- | --- | --- |
| Decision horizon | Strategy tolerates ambiguity; sprint work needs specificity | Stakeholders ask “what should we build next?” vs. “can we ship this week?” |
| Evidence source | Attitudinal explains intent; behavioral reveals reality | You hear “users say…” vs. you see “users do…” |
| Data shape | Qual explains; quant sizes; longitudinal reveals change | “We need to understand why” vs. “we need to know how big” vs. “we need to see it over time” |
Method fit: quick sanity check
| If you need… | Favor… | Be careful of… |
| --- | --- | --- |
| Language & decision criteria | Semi-structured interviews, JTBD | Hypotheticals, leading prompts |
| Real workflow constraints | Contextual inquiry, shadowing | Lab setups that hide the environment |
| UI comprehension & flow | Moderated task-based tests | Unmoderated prompts that can be misread |
| Field impact of a change | Clean analytics, experiments | Reading success into noisy trends |
Keep these small, and they’ll actually be read.
Bringing It All Together
You don’t need a brand-new taxonomy. What you need is a way of thinking that honors the strengths of the generative/evaluative split while admitting what real work demands: choices along several axes, each reshaping the study you run and the confidence you can claim. When someone asks, “What kind of research is this?” answer with the decision you’re enabling, the risk you’re reducing, and the evidence you’ll use. When someone says, “Can we skip research and just experiment?” ask what variants they plan to ship and whether those candidates were created with diagnosis or wishful thinking.
When analytics show a cliff, go watch three people fall off it, and you’ll often know in an hour what to change. When interviews yield stories that won’t leave your head, go size them honestly so the roadmap has something firmer than enthusiasm behind it.
The most effective teams I’ve seen aren’t the ones that can recite ten categorizations; they’re the ones that can change posture quickly. They know when to open up and when to close down. They are comfortable saying “we don’t know yet” and just as comfortable saying “we know enough to ship.” They steal speed from process where speed is harmless and spend time where attention is priceless. They resist the dopamine of dashboards and the theater of “research for research’s sake.”
They cultivate one simple habit that makes the rest easy: observe, decide, and observe again.
If you remember nothing else, remember this: labels are for shelves; evidence is for decisions. Use the simplest label that gets you to the right method. Use the smallest method that gets you to a real decision. And keep the loop running, not because a framework told you so, but because every time you do, the product gets a little less mythical and a little more true.
Frequently asked questions
What is the difference between generative and exploratory research?
Both sit on the formative (generative) side of the formative vs. summative split. Generative research focuses on creating new ideas and solutions by uncovering needs and opportunities, while exploratory research (the stats lens used above) aims to find signals and patterns and understand the current situation.
When should I use generative research in UX?
Generative research is ideal at the early product stage to uncover user needs, motivations, frustrations, and contexts. It typically draws on methods like user interviews, stakeholder interviews, diary studies, and open card sorting. This phase helps define the right problem before jumping into solutions, complemented by design artifacts like personas and journey mapping.
What methods fall under evaluative research?
Evaluative research methods such as usability testing (moderated or remote/unmoderated), A/B testing, tree testing, and closed card sorting are used to test and validate designs or prototypes. They measure effectiveness, usability, and user satisfaction with existing interfaces or features.
How can user research platforms support both types of research?
A comprehensive user research platform supports both generative and evaluative methods: from recruiting participants for generative studies like concept validation and in-person interviews to running remote usability tests and user surveys for evaluative feedback.
Why use both generative and evaluative research?
Because they serve complementary purposes. Generative research helps answer “What problem should we solve?”, while evaluative research tests “How well is our solution working?” Combined, they form a robust cycle of discovery and validation, ensuring your solution meets real user needs.
What is an example of evaluative research?
A classic example of evaluative research is a public health study assessing a new vaccination campaign’s effectiveness in reducing childhood disease by measuring changes in disease rates before and after the campaign and surveying community perceptions. A product example is a company evaluating a new website through usability testing and user surveys to identify pain points and gather feedback for improvements.
Can generative research be qualitative only?
Yes—generative research is primarily qualitative, focusing on deep insights into user mindset and behavior. Evaluative research often uses quantitative methods (e.g., structured surveys, analytics), though it can also incorporate qualitative approaches like follow-up interviews.
What are the differences between generative and evaluative research?
Generative research is used early in the design process to explore and uncover user needs, motivations, and problems, while evaluative research is done after you have a concept or product to test and validate how well it works. In short:
Generative = find the right problem
Evaluative = test the right solution