In the realm of quality management, CPK (Process Capability Index) and PPK (Process Performance Index) are common interview questions and indispensable statistical indicators for quality professionals. They seem simple, yet often lead to confusion and debate.
CPK (Process Capability Index) reflects the capability of a process under statistically controlled conditions and is typically used to measure short-term process capability. PPK (Process Performance Index) reflects the actual performance of a process and is typically used to measure long-term process capability. Their formulas are similar; the difference lies in how σ (the standard deviation) is estimated. CPK uses the within-subgroup standard deviation, whose calculation method depends on the data type, while PPK uses the overall standard deviation, which captures total variation. As a result, CPK may overestimate process capability, while PPK is closer to the true capability.
CPK calculation: based on control charts (x̄-R or x̄-s), σ is estimated as the average range R̄ divided by d2, or as the average sample standard deviation S̄ divided by c4. PPK calculation: all data points on the control chart enter the calculation, and σ is computed directly as the sample standard deviation (e.g., with Excel's STDEV() function). Cpk therefore reflects within-subgroup variation (short-term fluctuation), while Ppk includes both within-subgroup variation and long-term between-subgroup variation, making it an overall quality indicator for the entire production process. In practice, some advocate using Ppk for control during new-product trial production and switching to Cpk once mass production stabilizes: quality fluctuation is large during trial production, so Cpk may not control effectively, and only Ppk gives a picture of overall quality.
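As a sketch of the difference described above, the within-subgroup estimate (R̄/d2) and the overall estimate can be compared side by side. The d2 constant shown is the standard table value for subgroups of five; the seed, subgroup scheme, and drift model are illustrative assumptions, not values from the article:

```python
import numpy as np

# d2 constant for subgroup size n = 5 (standard SPC table value)
D2_N5 = 2.326

def cpk_ppk(subgroups, usl, lsl):
    """Estimate Cpk (within-subgroup sigma via R-bar/d2) and Ppk (overall sigma)."""
    data = np.concatenate(subgroups)
    mean = data.mean()

    # Cpk: sigma estimated from the average subgroup range divided by d2
    r_bar = np.mean([sg.max() - sg.min() for sg in subgroups])
    sigma_within = r_bar / D2_N5

    # Ppk: sigma estimated from the overall (long-term) standard deviation
    sigma_overall = data.std(ddof=1)

    cpk = min(usl - mean, mean - lsl) / (3 * sigma_within)
    ppk = min(usl - mean, mean - lsl) / (3 * sigma_overall)
    return cpk, ppk

rng = np.random.default_rng(0)
# 20 subgroups of 5, with extra between-subgroup drift so overall sigma grows
subgroups = [rng.normal(200 + drift, 1.0, size=5)
             for drift in rng.normal(0, 0.8, size=20)]
cpk, ppk = cpk_ppk(subgroups, usl=205, lsl=195)
print(f"Cpk = {cpk:.2f}, Ppk = {ppk:.2f}")
```

With between-subgroup drift present, the overall standard deviation exceeds the within-subgroup estimate, so Ppk comes out lower than Cpk, matching the point that Cpk can overestimate capability.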
However, some people question the value of CPK and PPK. Some believe that Ppk has limited practicality because calculating overall quality means the product has already been produced, and it's impossible to prevent defective products in real-time. Moreover, the data might not come from actual measurements but rather be "fabricated." CPK and PPK seem to have become a "numbers game." Furthermore, there's also debate about whether CPK and PPK represent short-term or long-term capability. Some point out that short-term/long-term capability has nothing to do with CPK/PPK but is solely related to sampling. Short sampling time means short-term capability, and vice versa.
CPK and PPK, as important process capability indicators, play a significant role in quality management. However, we should also recognize their limitations and not blindly pursue indicators while neglecting the control and improvement of the actual process. Sampling plays a crucial role in quality management. The sampling method and sample size will both affect the assessment of process capability. Therefore, when using CPK and PPK, we need to pay attention to the rationality and representativeness of sampling.
CPK, PPK, and sampling are all very important tools in quality management. We need to deeply understand their connotations and limitations and apply them flexibly to truly realize their value and achieve effective quality control.
In Statistical Process Control (SPC), specification limits and control limits are two core concepts. Although they both play important roles in quality management and process monitoring, they differ in their definitions, sources, and applications. This article will explain these two concepts in detail and mainly discuss their potential to be one-sided or two-sided, as well as their impact on SPC metrics.
Specification limits are the acceptable range of product or process characteristics specified by customers, design specifications, or industry standards. They define the quality requirements that a product or service must meet. Specification limits typically include the Upper Specification Limit (USL) and Lower Specification Limit (LSL), used to assess whether the product meets the expected quality standards. If product characteristics exceed the specification limits, they are considered nonconforming.
Specification limits can be one-sided or two-sided: two-sided limits specify both a USL and an LSL (e.g., 200 g ± 5 g), while one-sided limits specify only one of them (e.g., a minimum strength requirement defines only an LSL).
Control limits are derived from the statistical analysis of process data and are used to monitor process stability. Control limits typically consist of the Upper Control Limit (UCL) and Lower Control Limit (LCL), reflecting the natural variation range of most data points in a process under normal conditions (normal distribution).
Control limits are usually two-sided because their primary function is to detect process stability and abnormalities. They are typically calculated from the 6σ principle under a normal distribution: data points within the ±3σ range are considered normal process variation, and points beyond it are treated as abnormal signals (under a normal distribution, the probability of a point falling outside the ±3σ control limits is only about 0.27%).
Regardless of whether the specification limits are one-sided or two-sided, control limits are used to determine process stability and should include both upper and lower control limits to assess stability.
For example, even if the specification limit only requires the characteristic value to be greater than a minimum value (e.g., greater than 100), control limits will still be calculated based on the data (e.g., UCL = 200, LCL = 50):
The role of UCL and LCL: The Upper Control Limit (UCL) and Lower Control Limit (LCL) are used to detect abnormalities in the process. If process data points exceed these control limits, it indicates potential process anomalies that require further investigation. Even if a data point greater than 200 is acceptable based on the specification limit (as long as it is greater than 100), it may indicate process instability according to the control limit (UCL = 200), prompting further investigation into the cause of this anomaly.
Abnormality and Stability: Even if the process characteristics meet the specification limits, significant process variation (e.g., data points exceeding UCL or LCL) may indicate process instability. Control limits help identify this potential instability.
If the specification limit is one-sided (e.g., only an LSL), the control limits are still two-sided. A point outside either control limit still triggers an alarm, but points below the LCL deserve particular attention, while points above the UCL can be analyzed or deprioritized as circumstances warrant.
When the specification limit is one-sided, it affects certain SPC metrics, especially those related to capability indices such as Ca, Ppk, and Cpk.
Ca (Capability of Accuracy) measures the deviation of the process mean from the specification center. With a one-sided specification limit, Ca cannot be calculated because there is no specification center to serve as a reference.
Ppk and Cpk measure process performance and capability. With a one-sided specification limit, the Ppk and Cpk calculations consider only the direction of the existing limit. For example, with only a USL, Cpk reduces to CPU = (USL − μ) / 3σ; with only an LSL, it reduces to CPL = (μ − LSL) / 3σ.
In these cases, one-sided specification limits only assess process capability in one direction, potentially leading to an incomplete evaluation of the process. Particularly with one-sided specification limits, it is essential to use two-sided control limits to monitor process stability comprehensively.
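A minimal sketch of a one-sided capability calculation. The function name, the tensile-strength scenario, and the simulated data are illustrative assumptions; with the overall standard deviation plugged in, the result is a Ppk-style index, and substituting a within-subgroup estimate would give the Cpk-style one:

```python
import numpy as np

def capability_one_sided(data, usl=None, lsl=None):
    """Capability index when only one specification limit exists.

    With only a USL, report CPU = (USL - mean) / (3 * sigma);
    with only an LSL, report CPL = (mean - LSL) / (3 * sigma).
    Ca is undefined here: there is no specification center to deviate from.
    """
    mean = np.mean(data)
    sigma = np.std(data, ddof=1)  # overall sigma -> a Ppk-style index
    if usl is not None and lsl is None:
        return (usl - mean) / (3 * sigma)   # CPU
    if lsl is not None and usl is None:
        return (mean - lsl) / (3 * sigma)   # CPL
    raise ValueError("provide exactly one of usl or lsl")

rng = np.random.default_rng(1)
strength = rng.normal(120, 5, size=200)   # e.g. tensile strength, LSL only
cpl = capability_one_sided(strength, lsl=100)
print(f"CPL = {cpl:.2f}")
```

Because only one direction is assessed, a process drifting safely away from the single limit can score well even while its spread grows, which is exactly why two-sided control limits remain necessary for stability monitoring.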
You may have noticed that when using Minitab to create Xbar-R and Xbar-S control charts, each is composed of a pair of charts: an Xbar chart paired with an R (range) chart, and an Xbar chart paired with an S (standard deviation) chart, respectively.
This article provides the most detailed explanation available on the internet.
To clarify, the Xbar charts within the Xbar-R and Xbar-S chart pairs are not the same, because their control limits are calculated differently. However, the choice between the Xbar-R chart and the Xbar-S chart is conditional, and the two should never be applied to the same set of inspection data simultaneously. We therefore never need to worry about the difference between the two Xbar charts, because we will never be using Xbar-R and Xbar-S at the same time.
When the subgroup size is small (conventionally 8 or fewer), we need to use the Xbar-R control chart. Its control limits are calculated as follows: for the Xbar chart, CL = X̿, UCL = X̿ + A2·R̄, and LCL = X̿ − A2·R̄; for the R chart, CL = R̄, UCL = D4·R̄, and LCL = D3·R̄.
When the subgroup size is larger (conventionally 9 or more), we need to use the Xbar-S control chart. Its control limits are calculated as follows: for the Xbar chart, CL = X̿, UCL = X̿ + A3·S̄, and LCL = X̿ − A3·S̄; for the S chart, CL = S̄, UCL = B4·S̄, and LCL = B3·S̄.
The SPC constants such as A2, D4, A3, and B4 used in these formulas depend on the subgroup size n. For n = 2, 3, 4, 5 the standard table values are: A2 = 1.880, 1.023, 0.729, 0.577; D3 = 0 (for n ≤ 6); D4 = 3.267, 2.574, 2.282, 2.114; A3 = 2.659, 1.954, 1.628, 1.427; B3 = 0 (for n ≤ 5); B4 = 3.267, 2.568, 2.266, 2.089.
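A minimal sketch of the Xbar-R limit calculation, assuming subgroups of size five (so A2 = 0.577, D3 = 0, D4 = 2.114 from the standard tables) and using made-up measurement data:

```python
import numpy as np

# Standard SPC constants for subgroup size n = 5
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """Control limits for the Xbar chart and R chart (subgroup size 5)."""
    xbar_bar = np.mean([np.mean(sg) for sg in subgroups])        # grand mean
    r_bar = np.mean([max(sg) - min(sg) for sg in subgroups])     # average range
    xbar_limits = (xbar_bar - A2 * r_bar, xbar_bar, xbar_bar + A2 * r_bar)
    r_limits = (D3 * r_bar, r_bar, D4 * r_bar)
    return xbar_limits, r_limits

# Three illustrative subgroups of five weights each
subgroups = [[199.8, 200.4, 200.1, 199.5, 200.2],
             [200.6, 199.9, 200.3, 200.0, 199.7],
             [199.6, 200.1, 199.9, 200.5, 200.2]]
(xlcl, xcl, xucl), (rlcl, rcl, rucl) = xbar_r_limits(subgroups)
print(f"Xbar chart: LCL={xlcl:.3f} CL={xcl:.3f} UCL={xucl:.3f}")
print(f"R chart:    LCL={rlcl:.3f} CL={rcl:.3f} UCL={rucl:.3f}")
```

The Xbar-S version is the same shape, swapping R̄ for S̄ and the (A2, D3, D4) constants for (A3, B3, B4).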
Let's say we have a batch of 3000 pieces, with each piece required to weigh 200g ± 5g. Based on previous experience, the pass rate is about 98%. Using this pass rate, we estimate that we need to sample 60 pieces. For this example, let's assume we sample 100 pieces.
We have sampled 100 pieces and measured their weights, resulting in 100 data points. How can we determine if the weights of this batch are consistent with our standard of 200g?
The data is as follows:
201.67, 202.33, 196.55, 197.94, 199.76, 195.77, 198.74, 199.81, 197.87, 198.49, 198.32, 199.14, 197.74, 200.36, 199.34, 197.67, 200.29, 200.98, 200.75, 202.73, 200.11, 201.47, 200.47, 201.23, 201.76, 204.01, 203, 200.3, 201.34, 197.02, 198.01, 196.63, 200.96, 201.84, 199.06, 201.19, 196.05, 198.24, 198.34, 201.16, 199, 199.12, 202.25, 200.77, 198.83, 201, 200.1, 199.7, 199.93, 199.86, 202.2, 198.8, 201.31, 200.96, 199.83, 202.44, 198.76, 197.26, 197.17, 201.26, 200.59, 197.6, 201.03, 203.05, 199.63, 197.48, 200.34, 200.42, 197.59, 198.16, 197.9, 198.05, 199.36, 202.68, 198.53, 201.11, 197.29, 200.38, 200.02, 201.64, 199.89, 199.5, 195.33, 203.19, 199.45, 199.66, 202.58, 201.08, 198.01, 199.08, 200.82, 197.92, 199.55, 198.81, 201.74, 201.54, 199.58, 198.09, 197.81, 201.56
If we use SPC (Statistical Process Control) analysis, we encounter the following issues:
SPC analysis is typically based on time order, and in this case, the sequence of samples affects the control chart analysis and judgment. Therefore, if we cannot determine the order of the sampled items, using SPC control charts might lead to inaccurate conclusions.
Instead of SPC, we should use a one-sample T-test. This test can determine if the sample mean is significantly different from the target value. The null hypothesis of the test is that the sample mean is equal to the target value (200g), and the alternative hypothesis is that the sample mean is not equal to the target value.
The target value of 200.0 is within the confidence interval (199.4078, 200.1328), indicating no significant difference between the sample mean and the target value of 200.0.
The sample mean does not significantly differ from the target value of 200.0, so we cannot reject the null hypothesis (the sample mean is consistent with the target value).
Suppose we want to test a sample mean x̄, sample standard deviation s, and sample size n against a target mean μ of 200 g. The t-value can be calculated using the following formula:
t = (x̄ − μ) / (s / √n)
Confidence interval formula: x̄ ± t(α/2, n−1) · s / √n, where t(α/2, n−1) is the critical value from the t-distribution with n − 1 degrees of freedom.
Using both the t-statistic/p-value and the confidence interval, we can determine if the sample weight is consistent with the target value of 200g.
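The test above can be sketched with SciPy's t-distribution. For brevity this runs on only the first ten of the 100 weights, so the numbers will differ from the full-sample confidence interval quoted earlier:

```python
import math
from scipy import stats

def one_sample_t(data, target, alpha=0.05):
    """One-sample t-test of the sample mean against a target value."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    t = (mean - target) / (s / math.sqrt(n))          # t = (x̄ - μ) / (s/√n)
    p = 2 * stats.t.sf(abs(t), df=n - 1)              # two-sided p-value
    margin = stats.t.ppf(1 - alpha / 2, df=n - 1) * s / math.sqrt(n)
    return t, p, (mean - margin, mean + margin)

# First 10 of the article's 100 weight measurements
weights = [201.67, 202.33, 196.55, 197.94, 199.76,
           195.77, 198.74, 199.81, 197.87, 198.49]
t, p, ci = one_sample_t(weights, target=200.0)
print(f"t = {t:.3f}, p = {p:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

If the p-value exceeds α (or equivalently, if 200 lies inside the confidence interval), we cannot reject the null hypothesis that the mean equals the target. `scipy.stats.ttest_1samp` gives the same t and p directly.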
SPC is not suitable for determining if a sample is consistent with the specification center; it is a tool for anomaly analysis. For determining if a sample is consistent with the specification center, the recommended method is the one-sample T-test.
When discussing quality anomaly analysis, almost everyone familiar with the field will mention SPC (Statistical Process Control) control charts. Indeed, SPC control charts are currently the most widely used tool for quality anomaly analysis (here, "widely used" refers to companies that already utilize quality analysis, though most manufacturing enterprises have not yet reached the stage of conducting quality anomaly analysis).
So, what other charts can we use for quality analysis related to SPC, besides the SPC control charts?
The chart below illustrates an idealized 6σ diagram; actual data rarely looks this clean. In practice, control charts typically appear as shown below:
However, we often want to know the number or proportion of points falling in the upper and lower A, B, and C zones. The horizontal-axis divisions of a normal-distribution histogram are usually generated automatically and do not necessarily correspond to whole-sigma boundaries. The following chart divides the upper and lower A, B, and C zones into one-sigma intervals.
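Counting points per one-sigma band can be sketched as follows; the zone labels (C nearest the center line, A outermost, per the usual control-chart convention) and the simulated data are illustrative assumptions:

```python
import numpy as np

def sigma_zone_counts(data):
    """Count points per one-sigma band (C/B/A zones above and below the mean)."""
    data = np.asarray(data, dtype=float)
    mu, sigma = data.mean(), data.std(ddof=1)
    z = (data - mu) / sigma
    # Band index 0 means z < -3, band 7 means z >= +3
    idx = np.digitize(z, [-3, -2, -1, 0, 1, 2, 3])
    labels = ["below -3σ", "A-", "B-", "C-", "C+", "B+", "A+", "above +3σ"]
    counts = np.bincount(idx, minlength=8)
    return dict(zip(labels, counts.tolist()))

rng = np.random.default_rng(2)
print(sigma_zone_counts(rng.normal(200, 2, size=500)))
```

For roughly normal data, the C zones together should hold about 68% of the points and each A zone only about 2%; large departures from those proportions are themselves a hint of non-normality or process drift.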
Often the number of samples per batch is not fixed (sampling is not done with a constant subgroup size). In that case we plot the chart using each batch's average and overlay each batch's individual data points, so the distribution of the points can be observed at a glance.
The pre-control chart is a newer, simpler, more user-friendly, and more economical quality-control technique. Compared with traditional Shewhart control charts, it requires far less statistical setup: predefined limits allow anomalies to be identified and acted on more quickly. It is well suited to real-time monitoring on production lines and to small-batch production environments.
The capability analysis chart is a comprehensive tool for process capability analysis. It includes a data distribution chart, overall and within-group distribution fitting curves, and indicators such as PPK, CPK, and Ca. These charts and indicators help us evaluate process performance and capability comprehensively, identify potential issues, and provide a basis for process improvement. Through these analyses, we can better understand process stability and consistency, thus more effectively controlling quality.
The capability comparison chart visually compares specification limits, within-group actual deviation (range), and overall actual deviation (range). It allows us to see:
This chart helps identify issues within the process, such as if within-group deviation is much smaller than the overall deviation, it may indicate significant differences between batches. Conversely, if within-group and overall deviations are close, it indicates a more consistent and stable process.
This chart shows the distribution of measurement values for each subgroup. Each subgroup's measurement data is represented by a box plot. The box plot displays the median, interquartile range, and potential outliers for each subgroup. Red dots indicate outliers within each subgroup, significantly deviating from other data points, suggesting further attention and analysis may be needed.
This chart helps us intuitively understand the central tendency and dispersion of each subgroup's data, identify potential outliers, and thus better conduct quality control and data analysis.
The normality test chart is a tool used to assess whether data follows a normal distribution. By plotting data points on a specific chart, such as a Q-Q plot (Quantile-Quantile Plot) or a P-P plot (Probability-Probability Plot), it compares the actual data distribution to the theoretical normal distribution. If the data points roughly follow a straight line, the data conforms to a normal distribution; significant deviations indicate that the data may not follow a normal distribution. This chart is intuitive and easy to interpret, commonly used in statistical analysis.
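A sketch of both approaches using SciPy (the simulated data are illustrative): the Shapiro-Wilk test gives a formal p-value, and `scipy.stats.probplot` returns the Q-Q correlation, whose closeness to 1 indicates that the points lie near the straight line described above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
weights = rng.normal(200, 2, size=100)

# Shapiro-Wilk test: H0 = the data come from a normal distribution
stat, p = stats.shapiro(weights)
print(f"W = {stat:.4f}, p = {p:.4f}")  # p > alpha: no evidence against normality

# Q-Q comparison: sample quantiles vs. theoretical normal quantiles;
# r close to 1 means the points track the reference line closely
(osm, osr), (slope, intercept, r) = stats.probplot(weights, dist="norm")
print(f"Q-Q correlation r = {r:.4f}")
```

Passing `plot=plt` (with matplotlib) to `probplot` draws the Q-Q chart itself rather than just returning the fit.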
Machine learning for anomaly detection typically frames the problem as separating data into two groups, normal and anomalous, with the less frequent group deemed anomalous. This can be done with supervised classifiers such as logistic regression, support vector machines (SVM), and random forests, or with clustering methods such as K-means.
However, machine learning for anomaly detection has a critical weakness: the judgment process is often a black box, making it difficult to understand. Unlike SPC's standard eight anomaly detection rules, we cannot clearly know why the machine learning algorithm determines a point as anomalous, making it challenging to find the underlying cause.
Among the analysis charts and tools mentioned above, there are always some that suit your needs.