Editor's note: This is Part 2 of a two-part series. Read Part 1 here.
Stop Asking the Wrong Questions of SPC
In Part 1, in association with “Stop Testing Everything,” we discussed the temptation to use statistical process control (SPC)/control charts too widely. In one sense, using SPC to monitor non-CTQ (Critical to Quality) design output is a case of asking the wrong question. However, our point here runs deeper than that—it goes to one of the fundamental points behind the creation and function of control charts.
Donald J. Wheeler has written about the perceived feat of SPC methodology: identifying, from a single data set, whether the process that produced those data is in control (in the control chart sense). Consider this: if you use a data set to identify a distribution while presuming that only one process gave rise to that distribution, how can you use those same data to identify whether some of the data points were produced by a different process (i.e., a “special cause”)? The answer is, you can’t. This is not at all what Shewhart’s control chart methodologies really do.
Wheeler effectively and at length describes what control charts really do in his book Normality and the Process Behavior Chart. To summarize at a high level, regardless of the actual chart used (Xbar-R, I-MR, P, U, etc.) the methodology effectively uses one sample population to determine the distribution and associated control limits, and then applies those limits to test a different sample population. This is the origin of “rational subgrouping.” Applied incorrectly, rational subgrouping can easily result in a conclusion that a process is out of control . . . when from another perspective, it actually is in control. The conclusion reached depends on the question you are asking of SPC, so you must understand what question you are actually asking!
This point is best understood by way of example. The following is the most accessible example I have seen, and one I have run across several times:
Consider that you have the same manufacturing process running on two machines (A and B), with each producing the same widget Z. One of the CTQ dimensions on widget Z (measured in mm) is measured for control charting. This process has been running for several years, and none of the measured CTQ values has ever been outside of specification. There have been no complaints either externally or internally, but remember that tolerance limits are very different and separate from control limits.
Customers see the entire output of both machines, meaning they see both machines as constituting a single process, so we want to sample from both machines. We proceed to use an Xbar-R chart, taking five samples from machine A to form our first subgroup, then five samples from machine B to form our second subgroup, and so on. The results look like this:
The process appears to be “out of control.” However, this result does not make sense in light of our good long-term experience with this process.
This confusing result occurred because we asked the wrong question of SPC. Remember the point we made above: control charts use the properties of one sample population to determine the status of “control” of another sample population. In the example at hand, each subgroup of five samples was drawn from a single machine, and the variability (range, in this case) of those within-machine samples was used to determine the control limits of the Xbar-R chart.
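For reference, the textbook Xbar-R limit calculation can be written as follows. The notation is ours and is not taken from the charts in this article: k is the number of subgroups, x_ij is measurement j in subgroup i, and A_2, D_3, and D_4 are the standard tabulated chart constants (for subgroups of five, A_2 = 0.577, D_3 = 0, and D_4 = 2.114).

```latex
\begin{aligned}
\bar{\bar{x}} &= \frac{1}{k}\sum_{i=1}^{k}\bar{x}_i, &
\bar{R} &= \frac{1}{k}\sum_{i=1}^{k} R_i, \qquad R_i = \max_j x_{ij} - \min_j x_{ij} \\
UCL_{\bar{x}} &= \bar{\bar{x}} + A_2\,\bar{R}, &
LCL_{\bar{x}} &= \bar{\bar{x}} - A_2\,\bar{R} \\
UCL_{R} &= D_4\,\bar{R}, &
LCL_{R} &= D_3\,\bar{R}
\end{aligned}
```

Note that the width of the limits on the subgroup means depends only on the average within-subgroup range. When every subgroup comes from a single machine, that range contains no machine-to-machine variation at all, so any difference between the machines shows up as subgroup means outside the limits.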
As it turns out in this case, the outputs of the two machines are different from each other, as the following histogram with normal fits illustrates. They have equivalent standard deviations, but different means:
So, by creating subgroups strictly within the output of each single machine and then using that to assess the overall output of both machines, we were actually asking, “Does the output of the two separate machines differ significantly?” This is not the same question as asking if the overall output of the process is in control. The customer does not care if the output of the two machines differs; they see the pooled output of both machines. Because of this, we need to define our subgroups in such a way that each subgroup includes output from both machines A and B. The control charting methodology and sampling (subgrouping) approach to use then depends on the question you want to answer.
If, for example, you wished to ask the question, “Is there undue variation (instability) between different time periods of a shift?” then you might take five samples at the beginning of a shift, alternating between machine A and B. Use those first five samples to create your first subgroup. In the middle of the shift do the same, perhaps starting with machine A this time, and then do the same at the end of the shift. This way each subgroup is drawn from both machines, and the variation between subgroups reflects the different times of the shift. The resulting Xbar-R control chart is shown below:
This control chart shows the process to be “in control,” as we would expect given our experience. The conclusion here is that there does not appear to be undue variation of the overall output of the process between different times of a shift.
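To make the contrast between the two subgrouping schemes concrete, here is a minimal Python sketch. It does not use the data behind the charts in this article. The measurements are made-up, deterministic values: two hypothetical machines with the same spread and means of 10.00 mm and 10.04 mm. The function and variable names are ours, and the constants are the standard Xbar-R values for subgroups of five.

```python
from statistics import mean

# Standard Xbar-R chart constants for subgroups of size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_signals(subgroups):
    """Return indices of subgroups whose mean or range is outside the limits."""
    means = [mean(s) for s in subgroups]
    ranges = [max(s) - min(s) for s in subgroups]
    grand_mean, r_bar = mean(means), mean(ranges)
    ucl_x, lcl_x = grand_mean + A2 * r_bar, grand_mean - A2 * r_bar
    ucl_r, lcl_r = D4 * r_bar, D3 * r_bar
    return [
        i for i, (m, r) in enumerate(zip(means, ranges))
        if not (lcl_x <= m <= ucl_x) or not (lcl_r <= r <= ucl_r)
    ]

def chunk(values, size=5):
    """Cut a list into consecutive subgroups of the given size."""
    return [values[i:i + size] for i in range(0, len(values), size)]

# Hypothetical data (an assumption for illustration): same spread, shifted means.
noise = [-0.010, 0.005, 0.000, 0.010, -0.005]
machine_a = [10.00 + noise[i % 5] for i in range(20)]
machine_b = [10.04 + noise[i % 5] for i in range(20)]

# Scheme 1: every subgroup comes from one machine (A, then B, then A, ...).
within = []
for sub_a, sub_b in zip(chunk(machine_a), chunk(machine_b)):
    within += [sub_a, sub_b]

# Scheme 2: parts alternate A, B, A, B, ... so every subgroup mixes both machines.
interleaved = [x for pair in zip(machine_a, machine_b) for x in pair]
mixed = chunk(interleaved)

within_signals = xbar_r_signals(within)
mixed_signals = xbar_r_signals(mixed)
print("within-machine subgroups flagged:", within_signals)
print("mixed subgroups flagged:", mixed_signals)
```

With these made-up numbers, the within-machine scheme flags all eight subgroups, while the mixed scheme flags none. The 40 measurements are identical in both cases. Only the subgrouping, and therefore the question being asked, has changed.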
You could take a similar approach to ask the question, “Is there undue variation between shifts?” To ask this question, you might use the same approach as above of taking five samples, alternating between machine A and B—but this time spreading the overall collection time for the five samples across an entire shift. With the data at hand in this example, the resulting Xbar-R chart would look identical to the one shown above, but the conclusion is different, namely: there does not appear to be undue variation of the overall output of the process between different shifts.
The final example we give here is the question, “Is there undue variation of this process over time (days, weeks, or longer)?” In this case, we do not have a clear logical subgroup, but simply have an ongoing collection of individual pieces of data collected over time. The hint we should all recognize here is the word “individual” . . . which leads to the I-MR chart as the appropriate charting technique to use here. As before, we alternate between samples from machine A and B. The resulting I-MR chart is as follows. It has many more data points than such a chart should, but we wanted to make sure to use exactly the same data as the previous charts used:
This chart is also “in control,” leading to the conclusion that there does not appear to be undue variation of the overall output of the process over time.
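The I-MR calculation can be sketched in the same way. Again, this is an illustration under stated assumptions and not the article's data set: the alternating measurements come from two hypothetical machines with means of 10.00 mm and 10.04 mm, the names are ours, and 2.66 and 3.267 are the standard I-MR constants for a moving range of two consecutive points.

```python
from statistics import mean

# Standard I-MR constants for moving ranges of two consecutive points.
E2, D4 = 2.66, 3.267

def imr_signals(values):
    """Return (indices of individuals outside limits, indices of moving ranges above UCL)."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center, mr_bar = mean(values), mean(moving_ranges)
    ucl_x, lcl_x = center + E2 * mr_bar, center - E2 * mr_bar
    ucl_mr = D4 * mr_bar
    x_out = [i for i, x in enumerate(values) if not (lcl_x <= x <= ucl_x)]
    # Moving range k compares points k and k + 1, so report it at index k + 1.
    mr_out = [k + 1 for k, mr in enumerate(moving_ranges) if mr > ucl_mr]
    return x_out, mr_out

# Hypothetical data (an assumption for illustration): same spread, shifted means.
noise = [-0.010, 0.005, 0.000, 0.010, -0.005]
machine_a = [10.00 + noise[i % 5] for i in range(20)]
machine_b = [10.04 + noise[i % 5] for i in range(20)]

# Individual measurements in time order, alternating machine A and machine B.
individuals = [x for pair in zip(machine_a, machine_b) for x in pair]

x_out, mr_out = imr_signals(individuals)
print("individuals flagged:", x_out)
print("moving ranges flagged:", mr_out)
```

Because consecutive points come from different machines, the machine-to-machine difference is built into every moving range and therefore into the limits, so neither chart flags anything for these made-up values.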
We want to emphasize that in all the examples above the same data were used, yet only one result showed an “out of control” response. Hopefully this helps drive home the point that the SPC charting and sampling approach that is used defines the question being asked. This in turn defines the conclusions that should be reached. In the case of the first “out of control” example, the conclusion is nothing more than “the outputs of the two machines are significantly different.” In this case, the output of either machine is acceptable to the customer and stakeholder, so who cares? Because no one cares, keep things the way they are and move on.
This brings us to the last point of this section, which is likely the most crucial take-away. Anyone who was privileged to have attended a seminar with W. Edwards Deming likely felt his passion for the following point: do not take action where you do not need to. If you do not heed this advice, in reality you are just creating undue confusion about what is happening in a manufacturing process, likely increasing scrap, and increasing costs both due to scrap and increased resource usage. It has been our ongoing observation that Deming’s warning is true.
To take it a step further, we would claim that the most rewarding interactions, yielding the highest quality and financial rewards to organizations, have occurred when we have successfully made data-driven arguments that no further action or changes should be made on the manufacturing process in question.
This can be very hard to accomplish because we naturally feel better and feel that we are accomplishing something when we are doing something. That temptation might be further exacerbated by someone pushing for “doing something different,” whether internally or externally. Make sure you are implementing and interpreting SPC—and process capability analyses—correctly and as you intend. Take a deep breath, and . . . believe the data. If the data paint a picture that is acceptable to the business on an ongoing basis, then be willing to take no action. Surprisingly often this is the best course forward.
Stop Holding on to Products Until the Bitter End
This section is short and to the point. Stop continuing to manufacture and market products until you are forced to quit because continuing has become impossible.
As a product moves beyond the development phase of its lifecycle, a countdown timer starts on the availability of components, subassemblies, unique production supplies, and even software and communication standards (like WiFi and Bluetooth). Especially in today’s fast-turnover, consumer-driven marketplace, the time to obsolescence for many of those items can be measured in months or a few years.
When one of these items becomes obsolete, you, as the manufacturer, have a problem. Typical “fixes” to this issue are to do a large “last buy” of an item or to scramble to specify a new item to replace the “lost” one.
Quickly specifying a new item is expensive, invites errors and risks to product quality, and arguably should not even need to be done if the item had not originally been specified too tightly—remember the points of “Stop Arbitrarily Defining Requirements and Specifications.” The risks to cost and quality are amplified if the item is in fact CTQ and the system design requires tight tolerances on it, which in itself is not a good idea.
“Last buys” are also problematic. Maintaining the resulting cost of inventory alone should be a red flag to the organization. Also, we quite often do not have data on the effects of age on those items as they sit in the warehouse. If the item’s manufacturer does not cover it with a warranty beyond a relatively short period of time, then its Certificates of Compliance will become invalid, and you have an added risk/cost of needing to requalify that inventory. Other risks include loss, contamination, or damage to those stored items.
All of these considerations just get worse as the timeframe stretches out—leading to increased costs, increased scrap, decreased quality, and increased drag on organizational resources at all levels. Sometimes these considerations will put the overall organization at risk due to hits on the profit margin and reputation.
As easy as it is to state, but perhaps harder to consistently implement, the solution to this issue is to build your own product’s obsolescence and replacement into your overall product lifecycle planning, and to do it up front. In a real sense, this is another way of breaking down the artificial distinction between development and manufacturing and other functions like marketing. So the development team, instead of developing a product and throwing it over the wall to manufacturing, should simultaneously be thinking of the impact of their design decisions on manufacturing, as well as how marketing will drive obsolescence and replacement of that product a defined number of years down the road.
Portions of information contained in this publication/book are printed with permission of Minitab Inc. All such material remains the exclusive property and copyright of Minitab Inc. All rights reserved.