Why Professional Usability Testing Matters
How Do Development Teams Produce World-Class Products? For consumer-facing hardware or software to have high levels of usability and UX quality, it must undergo rigorous professional usability testing before design freeze. Such professional testing must make use of a wide range of mockups, simulations, and functional prototypes. It turns out that such testing is a fundamental requirement of all consumer-facing hardware and software systems that will eventually deliver a high degree of consumer engagement. One can often predict the potential usability and UX performance of a hardware device or software interface simply by examining the development process and the degree to which professional human factors science and unbiased professional usability testing have been applied during development.
Investors Don’t Link Poor Usability With High-Risk Usability and feature/function engagement are criteria that are rarely investigated by institutional investors when assessing the relative risk of making an investment decision. Defective usability has played a major role in the failure of numerous recent high-profile products, including the now infamous SNAP Spectacles. A few simple questions by investors during SNAP’s recent IPO roadshow would have likely revealed a failure to employ human factors research and professional usability testing during development of the SNAP Spectacles. More generally, when management is asked whether or not their products have undergone professional user testing and feature/function engagement testing, they will often respond in the affirmative. Many executives wrongly assume that running a survey or obtaining feedback from a few employees or a small group of consumers constitutes professional usability testing. It does not.
Design Does Not Replace Professional Usability Testing Professional usability testing is not the same as UX Design, Industrial Design or Design Thinking, which are concept generation frameworks. Professional usability testing is a concept verification methodology. In order to have confidence that what has been developed will be usable and engaging, one must apply a meaningful measure of marketing science. Professional usability testing is, first and foremost, a science-based process that is designed to produce unbiased and robust data on product and software user experience performance. In professional usability testing programs, there is always a need for statistical confidence that can aid management in making objective decisions about the usability and feature/function engagement of hardware or software before design freeze.
Professional Usability Testing Does What No Other Methodology Does It is important to note from the beginning that professional usability testing investigates not only the usability of the product but also the emotional connection that core features and functions have with the target user population. If a product concept fails to deliver features that resonate with users, the best usability performance possible can do little to ensure the success of that product.
The Big Questions Professional usability testing answers two key questions for development teams. Question 1: Is the core feature set engaging and likely to drive engagement? and Question 2: Can the user interact with such features with ease, low error rates and minimal negative transfer from other products they work with every day? No other professional development discipline reduces market risk and improves innovation like professional usability testing. It answers these key questions by providing management with objective, unbiased data on hardware or software usability and feature/function engagement.
What Really Is Professional Usability Testing? The answer to this question may seem obvious, but in fact hardware and software development groups routinely fail to make use of the usability and UX optimization research methods available to them. This is due to simple arrogance or lack of awareness. Very few UX Design, Industrial Design or Software Development programs teach formal usability testing methodologies. Even when development groups are aware of professional testing, they rarely understand which methods to utilize during the various phases of hardware and software development.
Standard Professional Usability Testing Methods Hardware and software development teams have available a wide palette of formal research methodologies that can be employed to ensure the usability and UX performance of both hardware and software. Traditional usability testing methods include:
- Lab-based usability testing
- Large-sample online usability testing
- Formative user testing
- Summative user testing
- Environmental usability testing
- Heuristics analysis / best practice reviews
- Children’s (COPPA)-age user testing
- User guide testing
- Reference guide testing
- Behavioral data mining
- Ethnographic and field research
- Media and cross-platform testing
- Cognitive modeling and user testing
- Persona development and needs testing
These methods have been traditionally utilized to optimize the usability and UX performance of hardware devices and software interfaces prior to design freeze.
New Advanced Methods Recently, however, an entire palette of more advanced testing methodologies has been developed that allows hardware and software development teams to dramatically increase the probability that a new product will be successful in the marketplace. These new methods include:
- Environmental navigation eye-tracking and workload analysis
- Advanced emotional response user testing
- Advanced consumer preference testing
- 3D spatial tracking and UX optimization
- Electromyography / physiological effort testing
- Specialized eye-tracking for whole system UX optimization
- Newtonian force measurement
- Multi-factorial visual design testing and UX optimization
- Physical ergonomic optimization
- Cognitive learning decay modeling
- Mobile device utilization and cognitive resource allocation
- Wearable eye-tracking for UX optimization
- Ethnographic total user experience optimization (TUXO) and data mining
- Virtual world avatar behavior tracking and UX optimization
These new advanced methodologies vary in cost, time to execute and scientific validity. They have all been utilized in the optimization of products and software that demand high-quality user experience design solutions. To be successful, these methodologies must be applied by individuals with advanced degrees in human factors engineering science.
Professional Usability Testing Is Based On Testing Science Regardless of whether one is employing a standard or advanced usability testing methodology, all professional usability testing programs adhere to well-established scientific testing best practices. Such practices include:
- Design of studies with sample sizes large enough to ensure reliable statistical data
- Screening and recruiting of respondents based on professional and ethical standards
- Testing with respondents who objectively represent the actual potential user population
- Testing with respondents who represent the RANGE of possible users based on professionally developed screening criteria
- Execution of all research in a company-blind research setting designed to eliminate respondent bias
- Design of study tasks, questions, and data collection forms that produce unbiased responses
- Training of all moderators, and testing of moderators to verify comprehension and study execution
- Controlled and secure storage of all data
- Design, calibration, and recalibration of all data capture systems before, during and after each respondent
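To make the sample-size practice concrete, here is a minimal sketch of one common way to size a study: the normal-approximation formula for estimating a proportion, such as a task completion rate, within a chosen margin of error. The function name, the default 50% expected rate and the 95% confidence level are illustrative assumptions, not values taken from this article, and a real study plan may call for a different method.

```python
import math
from statistics import NormalDist


def required_sample_size(margin_of_error, expected_rate=0.5, confidence=0.95):
    """Respondents needed to estimate a proportion within +/- margin_of_error.

    Uses the normal approximation n = z^2 * p * (1 - p) / e^2.
    expected_rate=0.5 is the most conservative (largest-sample) assumption.
    """
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    if not 0 < expected_rate < 1:
        raise ValueError("expected_rate must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # Two-sided critical value for the chosen confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n = (z ** 2) * expected_rate * (1 - expected_rate) / margin_of_error ** 2
    return math.ceil(n)


# Hypothetical planning targets: +/-10 and +/-5 percentage points.
n_wide = required_sample_size(0.10)
n_tight = required_sample_size(0.05)
print(n_wide, n_tight)
```

Halving the margin of error roughly quadruples the required sample, which is one reason the sample-size decision belongs in the study design rather than being settled after recruiting begins.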
Analyzing The Data After all respondents have been tested, testing best practices continue, including:
- Checking and cross-checking all data for structure, errors and proper respondent assignment
- Scrubbing and double verification of all data prior to analysis
- Use of professionally certified statistical analysis software
- Checking of data before summarizing for the development team
- Presenting critical findings to the development team in a format that clearly includes all measures of statistical significance
- Additional analysis of data based on questions from the development team
- Archiving the entire project for future reference by the research team and client
- Execution of a formal quality audit, application of the GDP checklist, and a debrief by the research team, including updates to standard operating procedures (SOPs)
In short, professional usability testing is a science-based method that delivers reliable insights based on proven methods that are designed to reduce risk.
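As an illustration of reporting a finding together with its statistical uncertainty, the sketch below computes a Wilson score confidence interval for an observed task completion rate. The choice of the Wilson interval, the function name, and the example figures (8 of 10 respondents completing a task) are assumptions made for illustration and are not results from any study described here.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion (e.g. task completion)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Hypothetical result: 8 of 10 respondents completed the task.
low, high = wilson_interval(8, 10)
print(f"completion rate 80%, 95% interval {low:.0%} to {high:.0%}")
```

With only ten respondents the interval is wide, so a small-sample completion rate reported on its own can mislead a development team about how certain the finding is.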
Management Assumptions That Are Wrong Based on the science-based professional usability testing processes described above, management routinely makes two immediate assumptions. Assumption 1: Professional usability testing is too expensive, and Assumption 2: Professional usability testing takes too much time to execute. Both assumptions are wrong.
The Cost of Failure Versus The Cost of Research The cost-benefit of professional usability testing is a simple calculation: What is the cost of failure vs. the cost of research designed to reduce the risk of failure? For example, SNAP apparently has over 40 million dollars’ worth of unsold SNAP Spectacles sitting in warehouses. The probable failure of the SNAP hardware was largely knowable through the application of professional usability testing for a tiny fraction of the total write-off, not to mention the decrease in market valuation and damage to the SNAP brand overall.
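The comparison can be written out as simple arithmetic. In the sketch below, the write-off figure is the roughly 40 million dollars cited above, while the research budget is a purely hypothetical placeholder chosen for illustration; it is not a quoted price for any testing program. The function name is also an assumption.

```python
def research_cost_share(cost_of_failure, cost_of_research):
    """Research cost as a fraction of the cost of failure.

    Under a simple expected-value view, this is also the break-even point:
    research pays for itself if it lowers the probability of failure by
    more than this fraction.
    """
    if cost_of_failure <= 0:
        raise ValueError("cost_of_failure must be positive")
    if cost_of_research < 0:
        raise ValueError("cost_of_research must not be negative")
    return cost_of_research / cost_of_failure


WRITE_OFF_USD = 40_000_000            # approximate figure cited in the article
HYPOTHETICAL_RESEARCH_USD = 200_000   # placeholder assumption, not a real quote

share = research_cost_share(WRITE_OFF_USD, HYPOTHETICAL_RESEARCH_USD)
print(f"Research cost is {share:.1%} of the write-off")
```

This deliberately ignores the valuation and brand damage mentioned above, so under these assumptions it understates the cost of failure.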
Reductions in Overall Development Time The time required to execute professional usability testing studies is well aligned with product development timelines and schedules, assuming that such testing is planned for in the overall product launch timeline. Professional usability testing can frequently REDUCE time-to-market for a successful product. When properly executed, the study findings eliminate features/functions that do not resonate with the user. Such testing dramatically improves usability, thus reducing instructional development time, call-center development and training. Testing streamlines brand attribute conveyance for critical marketing messaging. If instructional support material is required, data from a professional usability study provides the detailed content for user instruction development based on tasks tested. However, not every program requires a fully structured study. Even the simplest form of professional usability analysis can be highly beneficial. Take for example the following use of professional usability heuristics applied to the SNAP Spectacles.
When Professional Heuristics May Be Good Enough Of all the methods listed above that SNAP could have employed to improve the UX performance of the SNAP Spectacles, the one with the shortest lead time and lowest cost is professional human factors engineering heuristics analysis. This methodology involves the execution of an audit of the proposed hardware platform early in the development process or at any point during later development. Heuristics UX analysis must be executed before design freeze. In order for heuristics analysis to be effective, it must be conducted by a highly experienced professional human factors engineer. The best result is obtained from a certified HFE professional (CHFP), or an individual with similar qualifications. Heuristics analysis was selected for the following discussion due to its low cost and fast execution time. Heuristics do not fully replace the other forms of respondent-based observational research listed above. However, when properly applied, a heuristics analysis does produce robust insights on potential usability and UX optimization problems early in development.
The Process Is Direct A professional heuristics analysis involves rating the product on a standardized set of twenty heuristic rules. The first step is to determine whether or not the hardware interaction design and physical product design violate a given rule, and if so, to determine the severity of the rule violation. Rule violation severity is rated on a scale between 1 and 5, with 1 being “no violation and no usability and UX impact” and 5 being “extreme impact likely to significantly degrade usability and UX performance”. This article presents four of the twenty rule ratings for the SNAP Spectacles. The original analysis on which this article is based involved ratings across all twenty heuristic rules. The SNAP Spectacles violated all twenty heuristic rules. The severity ratings for violations were high in almost all dimensions. The important point is that by executing a low-cost and rapid-response heuristic analysis, SNAP could have understood and in fact predicted the failure of the current hardware in the marketplace. This type of analysis could have been executed at any point prior to design freeze. In our professional usability optimization practice, heuristics analysis has been conducted as early as the paper prototype phase, with the resulting analysis identifying serious usability and UX performance problems at the earliest stages of development. Below is a matrix of four heuristic tests and rule ratings for the production SNAP Spectacles.
Heuristic Test #1 – Device Information Clarity Rule
Violation: Yes – Severity: 5
Problem / Analysis: All feedback provided by the Spectacles is presented as light patterns on either the outward-facing light display or the inner hinge of the glasses. The display states require the user to learn an entirely new set of information formats with associated mapping to device functions. Prior instruction is required to know what each of these light patterns represents, yet no such information is provided either with the product or in an easy-to-locate format. Thus, all critical feedback requires secondary processing of functional information not provided in device instructional materials.
Why This Matters: Users of any form of hardware and software want the information flowing from their new device to be clear and understandable. When users are forced to work with information formats that are totally unique and are presented without ANY explanatory information, they are left to random walk the interface in the hope of determining what the device is trying to communicate. This is the primary usability failure of the SNAP Spectacles and is a perfect example of a clever hardware design overriding the far more important usability and information clarity attributes of the device during routine operation and common error states. Strike 1 for the SNAP Spectacles.
Heuristic Test #2
Violation: Yes – Severity: 4.5
Problem / Analysis: The outward-facing light display is not in view when the user is wearing the Spectacles. Thus, the user is unable to check battery level while wearing the Spectacles. The hardware design does not make clear which elements of the device control critical functions and how one interacts with device elements to make productive use of the SNAP Spectacles. This is known in the field of human factors science as poor control/display compatibility.
Why This Matters: When users first engage with a new product, they bring to such experiences a vast knowledge base of prior experience with other devices. The surprising factor that most UX designers fail to grasp is that the best UX designs, first and foremost, take extensive advantage of their users’ prior knowledge. When a hardware or software design like the SNAP Spectacles enters the marketplace and requires an entirely new learning profile in terms of which components require interaction and which contain information display, the amount of cognitive effort heaped upon the user is often beyond the implied benefit of using the product. Simply put: new hardware needs to be cognitively familiar or a significant number of users will simply give up. This problem started the slow degradation of the SNAP hardware into usability purgatory. There is more. Strike 2 for the SNAP Spectacles.
Heuristic Test #3 – Non-Modal Rule
Violation: Yes – Severity: 4.5
Problem / Analysis: All critical device interface functional states are presented through the same basic visual status ring light display and small flashing LED on the front and inside of the glasses. Because different information is presented in the same display format, the user must expend excessive cognitive effort when attempting to understand key device states. The use of a multi-modal display violates fundamental HFE science.
Why This Matters: Multi-modal displays are well understood to create high levels of cognitive complexity. The reason for this is found in learning theory, where it has been demonstrated that learning something new is far less complex than unlearning and then relearning something. This is exactly what happens when different device states are displayed through the same display interface. Every time the user looks at the SNAP circular display, they are forced to forget what the display indicated before and to query long-term memory for new meaning flowing from the same display. Violation of the Non-Modal Rule pushed the SNAP Spectacles further into the domain of truly poor usability. Strike 3 for the SNAP Spectacles.
Heuristic Test #4
Violation: Yes – Severity: 5
Problem / Analysis: When the initial pairing of the Spectacles to the mobile device is unsuccessful, the user is not provided information about why the error occurred and how they can correct the error. One is left to random walk the device and online interface looking for possible solutions to the initial pairing problem. The user must eventually search the web for assistance, to be found in videos presented on YouTube by other frustrated users. This is especially problematic during the First User Experience (FUE), as the instructions provided with the Spectacles suggest users’ first steps should be to turn their phone’s Bluetooth on, install and open the latest version of Snapchat, swipe down in the app to view their Snapcode and press the button atop the left hinge of the Spectacles to pair them with their mobile device. In reality, the Spectacles must first be charged before they can be paired with a mobile device and used. Users do not receive this information from either the device or the mobile app. Users cannot understand and manage critical error states during pairing and battery level detection, or work out how to interface the Spectacles with the SNAP on-screen app interface. In these types of devices, error state management builds user confidence and over time contributes in a major way to brand value. There is more than one reason that Apple has a Genius Bar: think error state management.
Why This Matters: Device error state management is the key to helping users build true confidence in using a product. All products fail at some point during normal use cycles. However, most UX designers fail to think of error state management as part of the normal use case. As a result, software and hardware design teams fail to develop a display framework and instructional support that allow the user to easily recover from error states. Error state management can have a greater impact on brand image than almost any other product attribute. The SNAP Spectacles leave the user in error state management purgatory. So arrogant is the UX design of the SNAP Spectacles that the device and instructional support, for the most part, fail to even acknowledge that the device may fail to sync, lose charge or otherwise stop playing nice with the user. There is virtually no way that the SNAP Spectacles could have generated wide user acceptance or high levels of engagement. The cognitive workload far exceeds the functional benefits offered to the user. Strike 4 for the SNAP Spectacles.
One can see from the heuristic analysis above that the SNAP hardware had the most basic usability and UX optimization problems and that the severity ratings were on the extreme end of the scale. These types of findings were consistent across all twenty rule assessments. Clearly, this hardware was going to cause consumers a high degree of usability and UX performance pain. This was totally knowable months before design freeze. Even if SNAP employed a hardware accelerator approach in the development of the Spectacles, it would have been trivial to conduct professional human factors heuristic reviews during acceleration. Leaving usability and UX optimization to the UX design and hardware engineering teams is a clear path to problems in the marketplace for innovative new products like the SNAP Spectacles.
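The four ratings reported above can be tabulated in a few lines. The sketch below is only an illustration of how such ratings might be recorded and summarized: the severity values are the ones given in this article, but the class name, the function name and the 4.0 cut-off for "high severity" are assumptions and are not part of the twenty-rule instrument itself.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class HeuristicRating:
    test_number: int
    violated: bool
    severity: float  # 1 = no violation/no impact, 5 = extreme impact


# Severity values for the four tests discussed in this article.
RATINGS = [
    HeuristicRating(1, True, 5.0),
    HeuristicRating(2, True, 4.5),
    HeuristicRating(3, True, 4.5),
    HeuristicRating(4, True, 5.0),
]


def summarize(ratings, high_threshold=4.0):
    """Count violations and average their severity.

    high_threshold is an illustrative cut-off, not a published standard.
    """
    for r in ratings:
        if not 1 <= r.severity <= 5:
            raise ValueError(f"severity out of range for test {r.test_number}")
    violated = [r for r in ratings if r.violated]
    return {
        "violations": len(violated),
        "mean_severity": mean(r.severity for r in violated) if violated else None,
        "high_severity_tests": [
            r.test_number for r in violated if r.severity >= high_threshold
        ],
    }


summary = summarize(RATINGS)
print(summary)
```

A mean of 4.75 on a 5-point scale across these four tests is consistent with the article's statement that severity ratings sat at the extreme end of the scale.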
Really, How Important Is The Usability Of The SNAP Spectacles? As the relatively simple professional heuristic analysis above shows, the usability of the SNAP Spectacles is a major problem, but ultimately the larger problem is simply a failure to allocate functions that would have had an emotional resonance with the existing user base of the SNAP social media platform. The Spectacles simply fail on the most important structural level in terms of feature function allocation and mapping to the underlying engagement model of SNAP. Even if outstanding usability performance were present in the SNAP hardware, the Spectacles would have likely been a failure in the marketplace. The fundamental question that should have been asked by Wall Street, and now by investors in SNAP, is how any new hardware or, for that matter, software innovation engages and expands the SNAP user experience that currently drives its user base to higher levels of engagement. In the field of formal usability science, this is known as function allocation. What can the SNAP Spectacles do better, actually much better, than the user’s smartphone? The answer? Nothing. Producing a novel image capture channel without attendant engagement innovations is and always has been a non-starter.
SNAP Is Not Alone But let’s be clear: SNAP is not the only major high-tech entity to attempt the development of spectacle-based data capture and information display hardware. By far the largest and most visible failure was Google Glass. It is interesting to note that Google Glass offered an unprecedented technology solution to data capture and data display. It did so in a potentially transformative manner. However, Google Glass was a massive failure of industrial design. It was and remains a primary example of how the visual appearance design of a product is far more complex than most industrial designers realize. In the case of Google Glass, the industrial design team failed entirely to realize that any form factor that is directly positioned on the user’s face produces an extraordinary amount of cognitive impact in terms of impression and projected meaning on the part of those observing and wearing the product.
The Human Face Is A New Frontier For Hardware Design The perception of faces is so important in our evolutionary and day-to-day existence that face perception has its own primary neurological center in the human brain. Anything associated with the human face is loaded with special significance and embedded meaning. Objects positioned on the face undergo instantaneous assessment when viewed by others in one’s social sphere. The industrial designers of Google Glass applied their own biased visual style to the design of Google Glass to create a wildly differentiated and high-tech visual impression, an impression that instantly communicated negative functional attributes of privacy over-reach and high-tech elitism. Google Glass was dead in the water from day one based on the naïve application of an industrial design visual style theme that was both inappropriate and psychologically off-putting. This was an unfortunate mistake on the part of the Google Glass industrial design team, because the underlying technical platform was and is exceedingly innovative and potentially useful. Today, in certain occupational applications where visual style is moderated by professional need, Google Glass is apparently finding a rich set of new applications. In the same way that SNAP Spectacles failed to drive customer engagement with its core platform, Google Glass failed to drive customer acceptance for its core feature set based on how it appeared visually, not how it functioned technically.
Heat map of a pharmaceutical dosage table generated from MUS eye-tracking data showing confusion over a specific dosage combination required to properly deliver the associated drug.
The Future Of Visual Gaze-Tracking Hardware Even though SNAP and Google have produced major flameouts in the design and production of these types of interfaces, it is clear that spectacle-based data capture and information display as a structural concept remains an area of massive potential. Hardware that successfully integrates data capture of the environment with actual fixations of what the user is viewing produces exceedingly powerful insights into how we navigate our everyday lives and make decisions in a rapidly changing technology-based world. We know from our work in the use of advanced eye-tracking technology to conduct consumer research that such a paradigm is very powerful. Take for example a recent study undertaken by our UX Research Lab on medical device design and instructions for use.
Eye-Tracking As A Research And Data Capture Tool As noted in the list earlier in this article, one of the advanced usability and UX optimization testing methodologies available to hardware development groups is head-mounted eye-tracking. This methodology utilizes advanced data capture glasses to track the user’s entire visual search behavior and can record and provide highly reliable research data on what the user is viewing, how long they view certain information, which information they return to during a given task, which information they fail to view or read entirely, and a wide range of more sophisticated information including the relative measure of cognitive workload required to deal with a hardware device and/or its instruction set. The following example shows a recent study conducted by our UX Labs Group examining the relative usability of a consumer-facing blood pressure device sold in pharmacies. This study utilized eye-tracking to determine where in the First User Experience (FUE) consumers and users of this type of device encountered critical errors and confusion. The image below shows the component parts of the study setup. This research included an examination of the total user experience (TUX), including unboxing and use.
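The basic measures named above (what was viewed, for how long, what was returned to, and what was never viewed) can be derived from a sequence of fixations once each fixation has been assigned to an area of interest (AOI). The sketch below is a simplified illustration only. The record layout, the function name, the AOI labels and the sample durations are all hypothetical and do not come from the blood pressure device study or from any particular eye-tracking system.

```python
from collections import defaultdict
from typing import NamedTuple


class Fixation(NamedTuple):
    aoi: str          # hypothetical label of the area of interest fixated
    duration_ms: int  # fixation duration in milliseconds


def aoi_metrics(fixations, all_aois):
    """Total dwell time, visits and revisits per AOI, plus a 'viewed' flag."""
    dwell = defaultdict(int)
    visits = defaultdict(int)
    previous = None
    for fix in fixations:
        dwell[fix.aoi] += fix.duration_ms
        # A new visit starts whenever gaze enters an AOI from somewhere else.
        if fix.aoi != previous:
            visits[fix.aoi] += 1
        previous = fix.aoi
    return {
        aoi: {
            "dwell_ms": dwell[aoi],
            "visits": visits[aoi],
            "revisits": max(visits[aoi] - 1, 0),
            "viewed": visits[aoi] > 0,
        }
        for aoi in all_aois
    }


# Made-up sample data for one respondent.
sample = [
    Fixation("cuff_diagram", 400),
    Fixation("cuff_diagram", 350),
    Fixation("start_button", 200),
    Fixation("cuff_diagram", 500),
]
metrics = aoi_metrics(sample, ["battery_step", "cuff_diagram", "start_button"])
never_viewed = [aoi for aoi, m in metrics.items() if not m["viewed"]]
print(metrics["cuff_diagram"], never_viewed)
```

Listing every AOI up front, rather than only those that received fixations, is what makes skipped instructions visible in the output.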
Visualizing The Cognitive Problem The image below shows one of the study’s heat maps, which demonstrates that during this respondent’s first-time use of the hardware they had high levels of confusion and errors associated with a specific set of images and text during initial setup. As can be seen from the image below, the user focused high levels of visual attention on a specific small set of instructions and ignored most of the rest of the procedures. The confusion in this example shows how important it is to test even simple instructions in unison with the actual hardware. In this study, the user failed the first-time use of the device because they skipped entirely the step required to insert a set of AA batteries. They continued to attempt to take their own blood pressure with the device for several minutes before finally giving up. From a human information processing point of view, the level of cognitive complexity in this device is comparable to that found in the SNAP Spectacles. This is based on the number of steps required to achieve initial success, the relative complexity of the steps, the amount of prior learning that the user came to the task with and the interconnection of the device to an external app recording and tracking system. The confusion shown in the heat map below is due to the insertion of a very low-frequency use case exception into the task flow. This type of deviation is well known in human factors engineering science to dramatically increase the cognitive workload of users and increase critical errors. This is similar to the SNAP Spectacles problem that occurs if one happens NOT to update to the latest Snapchat smartphone application.
Where Is The Big Opportunity? It is clear to those involved in professional HFE research that the types of usability and UX performance problems seen in the SNAP hardware were totally avoidable through the application of standardized professional usability testing methods. What is less obvious is how more advanced research methods, like the behavioral response to product features and 3D spatial tracking, could have been used by SNAP to provide users with a totally new set of functions that would have driven deeper platform engagement, increased user acquisition to the online platform and generated profits from exceptional hardware design. In terms of Google Glass, the use of multi-factorial visual design testing and UX optimization research would likely have both saved an amazing product and given Google the insight to understand how industrial design solutions are tested and optimized for the long term.
Chris Morley, M.S. Human Factors Engineer / Aileen S. Gabriel, Human Factors Engineer
About MAURO Usability Science Founded in 1975, we are among the most experienced international consulting firms focused on helping world-class clients and leading startups solve business-critical problems related to the usability and interactive quality of their products and services. In short, we help make complex products simple and simple products empowering. We are proud to have solutions that are running at the heart of the world economy.