By Clifford Anderson
I was watching a vendor facilitate a test of his software in my lab when I distinctly overheard him say, “What is he doing?” and “Hey, he can’t do that.”
That was probably my first misgiving, as I’ve always been taught that you should avoid testing your own work. Watching the facilitator, I was worried that he had stepped over the line of too much interaction and was biasing the test.
I’ve been doing usability testing for almost 20 years. Although I haven’t been exposed to that many other facilitators, the ones I had seen facilitated very much like I did, which also seemed to be very much ‘by the book.’ Here was someone with a very different approach.
To do a quick sanity check, I asked another usability engineer on my team to come and observe. He seemed to confirm my fears: there was too much interaction by the facilitator in the test. I raised my concerns to a usability e-mail discussion list. What they had to say got me thinking.
The main theme of the responses I received was “it depends.” A number of respondents pointed out that the degree of interaction would depend greatly on the kind of test. For example, a quantitative test would be much less interactive than a qualitative one. Jennifer Kremer, of Hewitt Associates, believes that the level of interaction can also depend on the type of test: “If the test requires the user to be timed, I limit interaction as much as possible. If the test is just to see how the user would complete the tasks (and no timing is involved), then I may be more interactive.”
Jennifer points out another difference, depending on when the testing is being done, or on the particular parameters of that test:
Also, if it is a more informal design review or paper prototyping testing session, the rigor you would have for a formal usability test may not be appropriate (it just depends on the expectations of you, your users, and the people sponsoring/owning the work).
Being flexible is the key here. Whitney Quesenbery, of WQUsability, points out that perhaps we are talking about adapting the moderation style to the participants, context, and goals of the test (as, indeed, one might choose a specific testing method to meet specific goals).
At the same time, though, there are a number of calls for interaction that are pretty standard. Chauncey Wilson, of BMC Software, pointed out two very practical considerations:
If a prototype is buggy and there is limited functionality, you may have to intervene to help people around the rough edges.
Intervene when someone has gotten so far off track that you are wasting everyone’s time and you are not learning anything. Try to provide cues in a progressive disclosure style to avoid blatant assistance (give a small tip and then escalate).
Figuring out when the second issue actually happens is more problematic. I rely on quotes in my reports, and I find I get some of my best ones when users are in such situations. At the same time, making the user feel helpless or upset can be very counter-productive. This is a judgment call.
One form of interaction we all have experience with is getting a user to talk. Non-directive, open-ended questions such as “What are you thinking?”, “Is that what you expected?”, or “What just happened?” seem to work best here. More directive interjections like “Are you confused?” or “Were you trying to copy the file?” run the risk of interpreting behavior and, thus, possibly influencing it. They should be avoided.
A particularly good method to keep users talking involves “active listening.” This limits interaction more to echoing what the user just said and lots of “uh-huhs.” The former is particularly useful if what the user said was incomplete (“This isn’t what …”) or vague (“Wow!”).
Personally, I also do all I can at the beginning of the test to address the unusual situation that users find themselves in (and which, I believe, contributes to their being silent). I directly address the oddness of the environment, the unfamiliarity of thinking out loud, and of my sitting there but not participating. Telling users that the system, and not them, is the “subject” of the test helps too. Treating the user as more of a partner in the study is even more beneficial.
A final necessary interaction is to elicit specific issues. Though we design our tests to lead the user to these issues as part of a real task, human behavior is unpredictable, and users don’t always end up where we think they will. Note, though, that this kind of interaction is much less straightforward and much more prone to bias.
Quality, not Quantity
Ash Donaldson points out that it’s not the amount of interaction, but the kind of interaction, that counts: “It’s not so much what level, but what types of interactions, will bias the results.”
As Ash points out, this can get quite subtle:
Without discipline and the relevant background and training, opinions are often transferred during interactions with the participant in the form of leading questions (while probing) on the obvious side, or changes in pitch, intonation and pace of the voice, posture, head movements, hand movements, facial expressions, eye movements, etc., on the not-so-obvious side.
Chauncey Wilson points out that this is especially a problem when it comes to reinforcement:
Watch how you reinforce the participant’s responses. For example, it might be necessary to provide some positive feedback, but don’t overdo it because it might backfire the first time the person has trouble and you aren’t providing feedback (I’ve seen this happen and the participant will sometimes look at the facilitator and say something like, “I guess that I’m not doing too good now” when they are suddenly deprived of that positive feedback.)
One reader referenced an excellent article by Howard Tamler, How (Much) to Intervene. Tamler discusses a particularly subtle problem – “why” questions:
They are not only imprecise but also imply criticism. For example, if I say to my shivering and crying child “Why didn’t you wear your jacket?” she knows she’s being scolded rather than being asked for information.
Likewise, “Why did you select that file?” suggests that the user needs to justify her action because it’s incorrect, and does not specify what the questioner is fishing for. A better way to phrase the same question is “When you selected that file, what were you expecting to do with it?” or “How did you decide to select that file rather than some other file?”
These more neutral and precise questions imply a sincere request for particular information, rather than a request for justification. The difference is subtle but can often be potent in terms of how the user responds.
Other problems include engaging in a dialog with the user (with all the problems the consequent social dynamics introduce). Donna Maurer suggests a way to deal with the user’s natural tendency to do so: “If asked a direct question, try tossing it back with a ‘what do you think?’ kind of question.” Because some users may be naturally curious about how the system you are testing works (especially if they failed to complete a task), I also always tell them to “hold that thought,” record the question, then come back to it at the end of the test.
Another issue involves letting the user play designer. Tamler cites Jared Spool as observing that, “users don’t know enough about the particular application, or design in general, to come up with feasible suggestions, and their answers to such questions are generally naive and unproductive.” At the same time, though, Tamler cites several instances where a user has cut through the Gordian knot and proposed a credible solution, something that I have also observed. In general, at our lab we allow users to “put on their designer hats,” but without encouraging them to do so, and we shut them off only if the hat seems to fit too well and they spend too much time in design mode.
A final issue involves task completion. A user’s sense of whether they are done can be a very important finding. If, however, the facilitator jumps in and asks directly, “Are you done?”, we will never get that valuable information.
Given what a challenge all this can be, how can the facilitator avoid bias in these situations? Simply “winging it” is probably not the solution. Chauncey Wilson suggests including a section in any proposal, test plan, or report that details guidelines for intervention. For example, you could include a note about how you would handle the situations described above.
I take this one step further. I am famous for my extremely detailed test scripts. These typically include the scenario I will give the user and detailed steps for the correct and any logical alternate paths (whether correct or incorrect). The scripts also include possible interactions and interventions for each step. Though no test will ever go as neatly as my test scripts, thinking ahead in this manner about what you are going to say can help ensure that each user will hear the same open-ended, non-directive questions.
Even if you’re not open to this degree of fanatical preparation, you can make a conscious effort to avoid bias, even if only during the test. As Ash Donaldson points out,
By choosing words carefully, maintaining a consistent voice and minimizing physical presence, I believe that much more, less biased data can be drawn from participants.
Improving Your Skills
None of this is easy. Tamler cites his background as a psychotherapist as particularly useful. Though going back for your PhD in clinical psychology may not be practical for all of us, there are things we can all do better.
Simply recognizing that facilitation is a real, difficult, hard-to-acquire skill may help. This is particularly the case for beginners, especially when one realizes that the skills involved are not natural in any sense of the word. As Donna Maurer puts it, “One of the hardest things to do is to learn when to keep quiet.” It is natural to want to keep the conversation going, to participate, to be helpful. Much of the mentoring I have done has focused on this basic skill.
For more experienced facilitators, perhaps the best way to improve is simply to get feedback. As Chauncey Wilson points out, if you are a sole practitioner, take some old tapes and have a colleague watch with you and discuss how you intervened and whether it was too little, too much, biased, etc. If you work with a group, consider asking the observers during pilot testing or regular sessions if there was anyplace where the intervention didn’t seem appropriate.
As with a number of usability skills, and as Donna very aptly puts it, “The concepts are easy, the practice hard.”
Special thanks to Jennifer Kremer, Whitney Quesenbery, Chauncey Wilson, Ash Donaldson, and Donna Maurer for their contributions to this article.