
Embracing the Un-Science of Qualitative Research Part One – Small Sample Sizes are Super

If you’re into qualitative research at all, it won’t have taken long before someone asked you about the statistical significance of your research and how you could back your findings with such a small sample size, or before you found others trying to make qualitative research look more scientific by extracting hard data from it.

There are three main ways you can try to make qualitative research look more scientific:

  1. Use a relatively large sample size
  2. Ensure that your test environment doesn’t change
  3. Ensure that your test approach doesn’t change (don’t change the script, and stick to it)

Now, there are some times when one or more of these tactics is appropriate, but conversely, in many instances it has been my experience that by breaking these rules, you are able to get much greater insight into the research question(s) you have set yourself.

There are many different kinds of qualitative research study, so in the interests of clarity, let’s pick one just like the one I’ve been working on this week – a lab-based combination of interview & a wee bit of usability which is intended to ensure that my client’s proposition is sound, that it is being well communicated, that the users understand what the service is and how it works, and to weed out any critical usability issues.

In the interest of not making you read an enormous post, I’ve divided this into three parts. So, let’s start with part one – a large sample size. To the best of my knowledge there is no scientific way to determine the correct number of participants in a qualitative research study. I’m no statistician (if you are, please feel free to weigh in here), but it is my understanding that the likelihood of reaching a statistically significant result using the methodology I’ve described above is pretty much nil. Not that it’s impossible, but you’d have to do a heck of a lot of interviewing.

And here’s one golden rule of qualitative research that always holds true – if the research is going to take too long or be too expensive, it will not happen. You can count on that one.

As a result, sample size for qualitative research is often driven by the time and budget available – and that’s not necessarily a bad thing. In fact, this is one subject upon which Jakob Nielsen and I actually quite agree. Jakob says that most of the time elaborate usability testing is a waste of time and that you should test with no more than five users. He has a natty little graph that illustrates why this is so:

Problem Finding Curve

As you can see – by the time you’re up to five or six users, you’ve gotten to the bottom of most of the usability issues, and from then on you spend more time repeatedly seeing what you’ve already seen before and uncovering very few new findings. In my experience – this is as true for other aspects of research as it is for usability.
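For the curious, the shape of that curve can be reproduced with a few lines of Python. This is only a minimal sketch of the model Nielsen and Landauer describe, where the share of problems found with n users is 1 - (1 - L)^n. The function name `proportion_found` is made up for illustration, and the default of 0.31 for L (the chance that a single user hits a given problem) is the average Nielsen reports across the projects he studied, so treat it as an assumption rather than a figure for your own product.

```python
def proportion_found(n_users: int, p_single: float = 0.31) -> float:
    """Expected share of usability problems seen at least once.

    p_single is the assumed chance that one user exposes a given problem
    (0.31 is Nielsen's reported average, not a universal constant).
    """
    if n_users < 0:
        raise ValueError("n_users must be zero or more")
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    # A problem is missed only if every one of the n users misses it.
    return 1.0 - (1.0 - p_single) ** n_users


# Print the curve for one to ten users.
for n in range(1, 11):
    print(f"{n:2d} users: {proportion_found(n):.0%} of problems found")
```

Under that assumption, five users turn up roughly 84% of the problems, and each extra user after that adds only a few percentage points, which is the flattening you see in the graph.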

I would add a caveat which is that if you have user groups that are quite divergent in their attitudes, experience, or requirements/goals etc. you will want to ensure that you apply this rule to each of those groups. So, for example, if you have an audience of ‘buyers’ and an audience of ‘sellers’ you’ll want to get no more than five from each key audience. One final caveat – when I say no more than five, I also say no fewer than three (and, what do you know, so does Jakob). You need at least three to distinguish what are actually patterns from those things that are just personal quirks – because that’s what you’re looking for here – the patterns.

Is it scientific to use such a small user group? If you want to make it look that way, you can look to Jakob for some algorithms and graphs. In my experience – it doesn’t matter whether it is scientific or not. The richness of the information and insight you receive even from this small sample size makes the return on investment enormous – and the small sample size makes it an activity that almost any project can incorporate into their timeline and budget. At the end of the day – those things are far more important than scientific validity.

Is it worth doing qualitative testing with only a small sample size? Absolutely yes. In fact, in many ways, this is the best way to do this research. Qualitative research is not about numbers, it is about the richness of the information and insight you can get access to by spending time with the people who form your audience (or potential audience), and looking for patterns in their reactions and responses.

In many cases, increasing the size of your sample so that it seems more ‘valid’ is a waste of time and money as the later interviews become more and more a repetition of findings you’ve already identified and confirmed. This time and money could be much better spent improving your product and conducting another round of research.

If it’s numbers you’re after – go do a survey. I say embrace and defend the small sample size of qualitative research.

What say you?

(Coming soon: Part Two – Ever-Evolving Prototypes are Ace)

  1. For the purpose of identifying common usability problems, such small sample sizes are perfectly valid from a statistical perspective. One approach to convincing a skeptical client of that without going into the math is to point out that if a problem exhibits itself in a large portion of your users, you’re very likely to see it in a small sample; if the problem exhibits itself in a very small portion, you’re very unlikely to see it in a small sample. Small samples, in other words, are a wide-mesh net, likely to catch only the big problems – which are often the ones the client primarily cares about.

    Now you can point out to the client that if one out of a mere five test users has a certain problem, you can’t say with high certainty whether that means 10%, 20%, or even 50% of the total population of users will have that same problem. But as Reichelt implies, in many situations it’s more important to spend your time and money on detailed qualitative interviewing and observing of that user to find out _why_ she or he is having a problem than on getting a more precise fix on the size of the problem.

  2. Actually, I have issues with applying statistics to something as rich and variable as user behaviour and meeting user needs and expectations. Aside from that (yes, I’m a convinced relativist most days of the week), I’d say that you’re more likely to pick up small but potentially disruptive-in-the-long-term problems from detailed work with small samples than large surveys. I’d contend that large qualitative surveys frequently only tell you what you’ve already decided is likely to be the case if you look at how hard it is to avoid questions that telegraph or construct the ‘appropriate’ answers.
    Now ‘positivist’ data does have its place – and that may well be in the area of working with user stats (another advantage of “release small and release often”).

  3. I’d say the numbers are bang on, Leisa. We recently did some fairly early user feedback on service propositions, articulated as user scenarios, in two distinct locations and we reported back on exactly six participants from each session. To be fair, we were specifically testing ‘desirability’ through usefulness but I figure the same guidelines apply.

    What often makes for even better client buy-in (or, to put it another way, de-risking the whole process) is if you can meet with the same ‘users’ you saw at the beginning of the project – if you’re lucky enough to have carried out empathic research.

    Key of course is in selecting your participants – not to slant the results by only choosing ‘friendlies’, but a balanced group of differing skills and abilities, a ‘cross-section’ of society if you will.

    Not that I’m an expert on usability – I didn’t make it anywhere near David Armano’s Top Ten Names in User Experience, unlike some people around here!

  4. As a former lecturer in statistical methodology for psychology, I find it interesting how many people make mistakes in the choice and use of tools like statistical analysis.

    When testing children for developmental problems, like autism or ADHD, giving a child an intelligence test like the WISC will give good indications of where (statistically) the problems lie. What it won’t give you is an understanding of the child and his problems – only the tester’s observations can do this. And when it comes to telling the parents, handing them numbers won’t relate the issues – this is where the qualitative measures are critical.

    Only qualitative measures are going to give the numbers any real meaning. Only qualitative and quantitative measures together are going to give you the whole picture.

    M

  5. Hi Leisa
    I totally agree with the idea of small sample groups when conducting qualitative research. As you said, the rest will become repetitive and redundant, and more time should be spent on further research that complements the previous one.
    The reason I am reading your blog is that it’s a requirement for my Master’s program in Educational Technology at MSU, but I’m enjoying it nevertheless:-)
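
The first comment's point, that one user in five having a problem could mean 10%, 20% or even 50% of the wider audience, can be checked with a confidence interval for a proportion. The sketch below uses the Wilson score interval, which is one common choice for small samples and not something the commenter named. The function name `wilson_interval` and the 95% level (z = 1.96) are assumptions made for illustration.

```python
import math


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion observed as hits out of n.

    z = 1.96 corresponds to a 95% confidence level (an assumption here).
    """
    if n <= 0 or not 0 <= hits <= n:
        raise ValueError("need n > 0 and 0 <= hits <= n")
    p_hat = hits / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# One of five test users hit the problem.
low, high = wilson_interval(1, 5)
print(f"1 of 5 users: plausible range {low:.0%} to {high:.0%}")
```

With one hit in five sessions the interval runs from roughly 4% to 62%, so the sample says the problem exists but says very little about how widespread it is.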
