Statistica Quality Control

Statistica Quality Control combines functionality from several different areas and includes all of the features of the Statistica Base package. Statistica Quality Control includes tools for quality control charts, process analysis, design of experiments, and power analysis and interval estimation. Summaries of these areas can be found below, and in-depth descriptions of all the included modules can be found on the modules tab.

Statistica Quality Control Charts features a wide selection of quality control analysis techniques with presentation-quality charts of unmatched versatility and comprehensiveness. It is well suited both to automated shop-floor quality control systems of all types and levels of complexity (see also Statistica Enterprise Server) and to sophisticated analytic and quality improvement research. A selection of automation options and user-interface shortcuts simplifies routine work, and practically all of the numerous graph layout options and specifications can be permanently modified (saved as system default settings or as reusable templates). Finally, Statistica Quality Control Charts includes powerful, easy-to-use facilities for custom designing entirely new analytic procedures and adding them permanently to the application; these options are particularly useful when quality control analyses need to be integrated into existing data collection/monitoring systems.

Key features include:

  • Standard quality control charts
  • Multivariate charts
  • Interactive, analytic brushing and labeling of points
  • Assigning causes and actions
  • Flexible, customizable alarm notification system
  • Supervisor and operator mode; password protection
  • Organization of data
  • Short run charts
  • Chart options and statistics
  • Non-normal control limits and process capability and performance indices
  • Other plots and Spreadsheets
  • Real-time QC systems; external data sources

Statistica Process Analysis is a comprehensive implementation of Process Capability analysis, Gage Repeatability and Reproducibility analysis, Weibull analysis, sampling plans, and variance components for random effects:

  • Process Capability Analysis
  • Capability Ratios for True Position
  • Designs for Gage Repeatability/Reproducibility (R&R) Analyses
  • Attribute Data
  • Weibull Analysis
  • Sampling Plans

Read more about Statistica solutions for industrial statistics and Six Sigma.

"Statistica is the best software on the market for DOE by far. I tried them all, and Statistica is truly outstanding."

Fabien Marino, Integrative Proteomics, Inc.

Statistica Design of Experiments offers one of the most comprehensive selections of procedures for designing and analyzing the experimental designs used in industrial (quality) research.

Technical notes:

  • General features
  • Residual analyses and transformations
  • Optimization of single or multiple response variables

Types of designs:

  • Standard two-level 2**(k-p) fractional factorial designs with blocks
  • Minimum aberration and maximum unconfounding 2**(k-p) fractional factorial designs with blocks
  • Screening (Plackett-Burman) designs
  • Mixed-level factorial designs
  • Three-level 3**(k-p) fractional factorial designs with blocks and Box-Behnken designs
  • Central composite (response surface) designs
  • Latin squares
  • Taguchi robust design experiments
  • Designs for mixtures and triangular graphs
  • Designs for constrained surfaces and mixtures
  • D- and A-optimal designs
  • D-optimal split plot design
  • D-optimal split plot analysis

Alternative procedures:

  • Designs can also be analyzed via alternative modules such as General Linear Models, General Regression Models, or Generalized Linear/Nonlinear Models.

Using Statistica Power Analysis and Interval Estimation in planning and analyzing your research, you can always be confident that you are using your resources most efficiently. Nothing is more disappointing than realizing that your research findings lack precision because your sample size was too small. On the other hand, using a sample size that is too large could be a significant waste of time and resources.

Statistica Power Analysis and Interval Estimation will help you find the ideal sample size and enrich your research with a variety of tools for estimating confidence intervals.

Read more about Statistica Power Analysis and Interval Estimation:

  • Advantages
  • Power Calculation
  • Sample Size Calculation
  • Interval Estimation
  • Probability Distributions
  • List of Tests
  • Example Application

Statistica Quality Control Charts

Standard Quality Control Charts

The program offers flexible implementations of Pareto charts, X-bar charts, R charts, S charts, S-squared (variance) charts, C charts, Np charts (binomial counts), P charts (binomial proportions), U charts, CuSum (cumulative sum) charts, moving range charts, runs charts (for individual observations), regression control charts, MA charts (moving average), and EWMA charts (exponentially-weighted moving average). These charts may be based on user-specified values or on parameters (e.g., means, ranges, proportions, etc.) computed from the data. Most of the variable control charts can be constructed from single observations (e.g., moving range chart) as well as from samples of multiple observations. Control limits can be specified in terms of multiples of sigma (e.g., 3 * sigma), in terms of normal or non-normal (Johnson-curves) probabilities (e.g., p=.01, .99), or as constant values. For unequal sample sizes, control charts can be computed with variable control limits or based on standardized values. For most charts, multiple sets of specifications can be used in the same chart (e.g., control limits for all new samples can be computed based on a subset of previous samples, etc.). Runs tests, such as the Western Electric Run Rules, are easily integrated into the QC chart. As with all Statistica graphs, QC charts in Statistica Quality Control Charts are highly customizable; you can add titles, comments, draw lines or mark regions dynamically anchored to specific scale values, or label the samples with dates, ID codes, etc.
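As a minimal sketch of the arithmetic behind one of these charts (illustrative only, not Statistica's implementation), the 3-sigma limits for a paired X-bar and R chart can be computed from subgrouped data with the tabulated Shewhart constants; the data and the subgroup size n = 5 below are made-up examples:

```python
# Illustrative X-bar and R chart limits from subgrouped data.
# A2, D3, D4 are the standard Shewhart constants for subgroup size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(samples):
    """Return (LCL, center, UCL) for the X-bar chart and the R chart."""
    xbars = [sum(s) / len(s) for s in samples]    # subgroup means
    ranges = [max(s) - min(s) for s in samples]   # subgroup ranges
    xbarbar = sum(xbars) / len(xbars)             # grand mean (center line)
    rbar = sum(ranges) / len(ranges)              # average range
    return {
        "xbar": (xbarbar - A2 * rbar, xbarbar, xbarbar + A2 * rbar),
        "r":    (D3 * rbar, rbar, D4 * rbar),
    }

data = [[9.9, 10.1, 10.0, 10.2, 9.8],
        [10.0, 10.3, 9.9, 10.1, 10.0],
        [9.7, 10.0, 10.2, 9.9, 10.1]]
limits = xbar_r_limits(data)
```

A production chart would add the runs tests and non-normal limit options described above; this sketch covers only the base 3-sigma computation.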

Multivariate charts

In addition to the univariate (standard Shewhart) control charts, Statistica extends the control charting options with multivariate charts. These multivariate charts are useful for tracking large numbers of parameters (variables) in a single chart. The capability exists to "intelligently" monitor literally hundreds of processes simultaneously. Available charts include:

  • Hotelling T2 chart for individual observations and sample means
  • Multivariate Exponentially Weighted Moving Average charts (MEWMA) for observations and sample means
  • Multivariate Cumulative Sum Charts (MCUSUM) for observations
  • Multiple Stream X-Bar and R charts, MR charts, and S charts for observations and sample means
  • Generalized Variance Charts

Many of the tools available for the standard charts are also provided for their multivariate counterparts.
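For illustration, the Hotelling T-squared statistic underlying the first of these charts is a quadratic form in the inverse covariance matrix; for two variables the inverse can be written in closed form, as in this sketch (not Statistica's code) for individual observations:

```python
# Illustrative Hotelling T^2 values for individual two-variable observations.
def mean(xs):
    return sum(xs) / len(xs)

def hotelling_t2(obs):
    """Return the T^2 value for each (x, y) observation in obs."""
    xs = [o[0] for o in obs]
    ys = [o[1] for o in obs]
    mx, my = mean(xs), mean(ys)
    n = len(obs)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in obs) / (n - 1)
    det = sxx * syy - sxy ** 2   # determinant of the 2x2 covariance matrix
    t2 = []
    for x, y in obs:
        dx, dy = x - mx, y - my
        # d' S^{-1} d with the 2x2 inverse written out explicitly
        t2.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return t2
```

A charting application would compare each T^2 value against a beta- or F-distribution control limit; that step is omitted here.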

Interactive, analytic brushing and labeling of points

General "intelligent" and comprehensive analytic brushing facilities are available for interactive removal or labeling of outliers (or what-if analyses) in individual charts or sets of charts. The user can select individual samples or groups of samples based on currently specified chart criteria (control limits, runs rules), and exclude them from the computations for the chart (but still show them in the chart), or drop them from the chart altogether. Multiple charts can be set up to use the same sample inclusion/exclusion criteria; in this manner several charts can be simultaneously brushed (e.g., a point excluded from the X-bar and R chart will simultaneously be excluded from all histograms). The user can also request to plot all individual observations for selected or for all samples.

Assigning causes and actions

The user can assign causes, actions, and/or comments to outliers or any other points in most charts. Labels for causes and actions can be assigned via interactive brushing, or the program can detect and select out-of-control samples.

Flexible, customizable alarm notification system

A comprehensive selection of options is provided for specifying user-defined criteria that define an out-of-control condition or "noteworthy event" (e.g., runs test violation, individual observation outside specification limits, etc.). The alarm notification system can be customized to trigger various types of "responses" to a particular event. For example, you can set up a system to respond to an out-of-control sample: Statistica Quality Control Charts will automatically prompt the operator to enter a cause, then launch a Statistica Visual Basic program to compute various other statistics or invoke an external program, and then run another external program to (for example) call a particular pager number or send an e-mail to the supervising engineers. The alarm notification setup can be saved in a configuration file (that can be applied to future charts), or used as the default for all future charts.

Supervisor and operator mode; password protection

The chart-editing features for shop-floor control charts (including the assignment of causes, actions, brushing, alarm notification, etc.), chart specifications, as well as the input data file can be password-protected, to create a customized operator mode with only limited access to the charts or data. The charts can be saved (e.g., by the supervising engineer), and loaded by the operator in this limited-access operator mode.

Organization of data

For most charts, the data can be organized to accommodate practically all formats in which data are gathered for quality control applications. Samples can be identified by sample identifiers or code numbers, or you can specify a fixed number of measurements per sample (and part, see below).

Short run control charts

Most standard variable control charts (X-bar, R, S, S-squared, MA, EWMA) and attribute control charts (C, U, P, Np) can be used for short production runs (short run charts for multiple parts or machines). For short run variable control charts, you can specify nominal target values only (nominal chart or target chart), or target values and variability values for standardized short run charts. Options are provided for sorting sample points in the respective charts and for plotting them by sample number, by part, or in the order in which the respective samples were taken. Detailed statistics are computed by parts and samples. The respective sample and part identifiers for each measurement can be specified in the data file, and/or you can choose to assign a fixed number of consecutive cases to consecutive samples and/or parts. Note that all chart options and statistics (e.g., process capability and performance indices, runs rules, etc.) commonly reported for standard charts are also available for short run charts.

Chart options and statistics

A wide variety of additional quality control statistics are included. The user can compute the process capability and performance indices (e.g., normal distribution Cpk, Ppk, etc., non-normal distribution Cpk, Ppk, etc.), include histograms of the respective quality characteristics, or automatically perform any or all of seven different runs tests (runs rules). The standard variable control charts can be produced as compound graphic displays; for example, the X-bar and the R (or S, or S-squared) chart will be displayed together with optional corresponding histograms for the respective means, ranges, proportions, etc. also shown in the same chart. Outliers (samples outside the control limits) or sections of data identified via runs tests are automatically highlighted (marked) in the plots. The user can also add to the plot warning lines, moving average or exponentially-weighted moving average lines, or lines indicating specification ranges.
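As a hedged illustration of how the capability (Cp, Cpk) and performance (Pp, Ppk) indices mentioned above differ: the former conventionally use the within-subgroup sigma estimated as R-bar/d2, while the latter use the overall sigma. The function below is a sketch, with d2 = 2.326 (the bias constant for subgroups of five) as a default:

```python
import statistics

# Illustrative normal-theory capability (Cp, Cpk) and performance (Pp, Ppk)
# indices from subgrouped data; d2 defaults to 2.326 (subgroup size 5).
def capability(samples, lsl, usl, d2=2.326):
    flat = [x for s in samples for x in s]
    mu = statistics.fmean(flat)
    rbar = statistics.fmean(max(s) - min(s) for s in samples)
    sigma_within = rbar / d2                # short-term (within-subgroup)
    sigma_overall = statistics.stdev(flat)  # long-term (overall)

    def pair(sig):
        cp = (usl - lsl) / (6 * sig)
        cpk = min(usl - mu, mu - lsl) / (3 * sig)
        return cp, cpk

    cp, cpk = pair(sigma_within)
    pp, ppk = pair(sigma_overall)
    return {"Cp": cp, "Cpk": cpk, "Pp": pp, "Ppk": ppk}
```

A Cpk well below Cp signals an off-center process; Ppk well below Cpk signals drift between subgroups.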

Non-normal control limits and process capability and performance indices

For variable control charts, in addition to the customary normal distribution based charts and statistics, the program will also compute charts for measurements that are not normally distributed (e.g., are highly skewed). These options are particularly important for situations where the sample sizes are small and where deviations from normality may lead to greatly inflated or deflated error rates if the customary normal distribution based statistics were used. The program will compute control limits based on the Johnson curves fit to the first four moments of the observed data; user-specified values for the moments can also be supplied. Process capability indices can be computed based on the fitting of Johnson curves as well as Pearson curves. Note that capability indices based on specific distributions can also be computed in Statistica Process Analysis.
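The percentile method referred to above replaces mu +/- 3*sigma with the 0.135th and 99.865th percentiles of a fitted distribution. Statistica fits Johnson or Pearson curves for this purpose; the simplified sketch below just interpolates empirical sample quantiles to show the shape of the computation:

```python
# Illustrative percentile-method Cpk for non-normal data, using empirical
# quantiles in place of a Johnson/Pearson curve fit.
def quantile(sorted_xs, p):
    """Simple linear-interpolation empirical quantile."""
    k = p * (len(sorted_xs) - 1)
    lo = int(k)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (k - lo) * (sorted_xs[hi] - sorted_xs[lo])

def percentile_cpk(data, lsl, usl):
    xs = sorted(data)
    q_lo = quantile(xs, 0.00135)   # non-normal analogue of mu - 3*sigma
    med = quantile(xs, 0.5)
    q_hi = quantile(xs, 0.99865)   # non-normal analogue of mu + 3*sigma
    cpu = (usl - med) / (q_hi - med)
    cpl = (med - lsl) / (med - q_lo)
    return min(cpu, cpl)
```

For the tail percentiles to be meaningful, empirical quantiles require very large samples; that is precisely why fitted curves are used in practice.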

Other plots and Spreadsheets

For most charts (including the R-chart), the user may compute and plot the respective operating characteristic curve (OC curve). In addition to the charts, the respective values (plotted in the charts) can also be reviewed via Spreadsheets, allowing the user to examine the precise values of plotted lines and points. Customized (blank) charts can be printed that can later be "filled in" by hand by the quality control engineer. Note that as with all other graphs in Statistica, the graphs produced by Statistica Quality Control Charts can be extensively customized and saved for further analysis and/or customization.

Real-time QC systems; external data sources

Most graphs and charts in Statistica Quality Control Charts can be automatically linked to the data and updated when the data are updated. To facilitate data transfers, powerful optional Statistica applications are available (Statistica Enterprise/QC and Statistica Enterprise).

Statistica Enterprise is a groupware version of Statistica fully integrated with a powerful central data warehouse that provides an efficient general interface to enterprise-wide repositories of data and a means for collaborative work (extensive groupware functionality).

Statistica Enterprise/QC is an integrated multi-user software package that provides complete statistical process control (SPC) functionality for enterprise installations. It includes a central database, provides all tools necessary to process and manage data from multiple channels, and coordinates the work of multiple operators, QC engineers, and supervisors.

Statistica Enterprise/QC and Statistica Enterprise provide very flexible facilities to integrate the procedures in Statistica Quality Control Charts into your enterprise-wide database, and to design elaborate company-wide quality monitoring systems.

Statistica Process Analysis

Statistica Process Analysis is a comprehensive implementation of Process Capability analysis, Gage Repeatability and Reproducibility analysis, Weibull analysis, sampling plans, and variance components for random effects.

Process Capability Analysis

Statistica Process Analysis includes a comprehensive selection of options for computing process capability indices for grouped and ungrouped data (e.g., Cp, Cr, Cpk, Cpl, Cpu, K, Cpm, Pp, Pr, Ppk, Ppl, Ppu), normal/distribution-free tolerance limits, and corresponding process capability plots (histogram with process ranges, specification limits, normal curve). In addition, instead of these normal distribution indices and statistics, you can choose estimates (e.g., Cpk, Cpl, Cpu based on the percentile method) based on general non-normal distributions (Johnson and Pearson curve fitting by moments), as well as all other common continuous distributions including the Beta, Exponential, Extreme Value (Type I, Gumbel), Gamma, Log-Normal, Rayleigh, and Weibull distributions.

Statistica will compute maximum-likelihood parameter estimates for those distributions, and it provides numerous options for evaluating the fit of the respective distribution to the data, including the frequency distribution with observed and expected frequencies, the Kolmogorov-Smirnov d statistic, histograms, Probability-Probability (P-P) plots, and Quantile-Quantile (Q-Q) plots. Options are also available for automatically fitting all distributions and choosing the distribution that best fits the data.
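As an example of one of these goodness-of-fit checks, the Kolmogorov-Smirnov d statistic is the largest gap between the fitted cumulative distribution and the empirical one. This sketch (function name illustrative, not Statistica's API) computes d against a normal fit:

```python
from statistics import NormalDist, fmean, stdev

# Illustrative one-sample Kolmogorov-Smirnov d statistic against a normal
# distribution fitted to the data by mean and standard deviation.
def ks_d_normal(data):
    xs = sorted(data)
    n = len(xs)
    dist = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # compare the fitted CDF with the empirical CDF just before and at x
        d = max(d, abs(f - i / n), abs(f - (i + 1) / n))
    return d
```

Smaller d indicates a better fit; comparing d across several candidate distributions mirrors the automatic best-fit selection described above.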

Statistica additionally offers process capability indices in compliance with DIN (Deutsche Industrie Norm) 55319 and ISO 21747.

Capability Ratios for True Position

Some manufacturing processes, and the allowable tolerances that define acceptable quality, can best be summarized by the metaphor of "hitting a target." For example, when drilling holes at specific locations, the quality requirement is best defined by circles around the desired locations; every time that a hole is drilled outside the acceptable quality (circle), the respective part is rejected.

Tolerances (specification limits) defined as a circle in the two-dimensional plane are also called positional tolerances. For such processes, the standard capability values (ratios) are not appropriate, because while the process may be within-specs on each individual dimension, the respective point (in the two-dimensional plane) may be unacceptably far away from the desired goal (point).

For example, in the illustration shown, there are two points that, when considering the +/-1 USL/LSL specification limits for each dimension separately, would not be out of specs. However, if the specs are defined as a circle around the origin {0,0}, with a radius of 1, then these two points would be rejected.

Designs for Gage Repeatability/Reproducibility (R&R) Analyses

Repeatability/reproducibility experiments with single or multiple trials can be generated and analyzed. The data for the R&R analysis can be arranged in raw-data format or tabulated in a standard R&R data sheet format (as used in many publications of the American Society for Quality and manuals of the Automotive Industry Action Group). Results include estimates of the components of variance (repeatability or equipment variation, operator or appraiser variation, part variation, operator-by-part variation, operators-by-trials, parts-by-trials, operators-by-parts-by-trials), which can be computed based on the range method or the ANOVA table. If based on the ANOVA table, confidence intervals for the variance components will also be estimated. Additional statistics for the variance components include the percent-of-tolerance, process variation, and total variation. Statistica will also compute descriptive statistics by operator/part, range and sigma charts by operators/parts, box-and-whisker plots, and the summary R&R plot. Comprehensive selections of methods for estimating variance components for random effects are also available in the designated Statistica Variance Components module, and the General Linear Models module available in Statistica Advanced.
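The range method mentioned above can be sketched as follows. The constants K1 = 4.56 (two trials) and K2 = 3.65 (two appraisers) are the classic AIAG values under the 5.15-sigma study convention, and the nested-dictionary layout is an assumption of this example, not Statistica's data format:

```python
import math
from statistics import fmean

# Illustrative range-method Gage R&R with classic AIAG constants
# (K1 = 4.56 for 2 trials, K2 = 3.65 for 2 appraisers; 5.15-sigma studies).
def gage_rr_range(data, k1=4.56, k2=3.65):
    """data[operator][part] = list of trials; returns (EV, AV, R&R)."""
    parts0 = next(iter(data.values()))
    n_parts = len(parts0)
    n_trials = len(next(iter(parts0.values())))
    # repeatability (equipment variation): average within-cell range times K1
    rbar = fmean(max(t) - min(t)
                 for parts in data.values() for t in parts.values())
    ev = rbar * k1
    # reproducibility (appraiser variation): range of operator means times K2,
    # corrected for the repeatability already contained in those means
    op_means = [fmean(x for t in parts.values() for x in t)
                for parts in data.values()]
    xdiff = max(op_means) - min(op_means)
    av = math.sqrt(max((xdiff * k2) ** 2 - ev ** 2 / (n_parts * n_trials),
                       0.0))
    return ev, av, math.sqrt(ev ** 2 + av ** 2)
```

The ANOVA-table approach additionally yields part variation, interaction terms, and confidence intervals, which the range method cannot provide.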

Attribute Data

Attribute gage studies are conducted in order to assess the amount of bias and repeatability in a gage when the response is a binary (e.g., accept or reject) attribute variable. In Statistica, two methods for testing bias are available: the AIAG method and the Regression method.

In some situations, physical measurements of certain quality characteristics are difficult or impossible to obtain, and subjective ratings must be used instead. For such subjective measurements to be considered meaningful, multiple appraisers should agree; the ratings are only as useful as that agreement. Use Statistica's Attribute Agreement Analysis to measure the agreement of ratings given by multiple appraisers.

The MSA method for attribute data is a straightforward method that can be used to assess the accuracy of appraisers and the types of mistakes they are likely to make. Typically, samples of parts are appraised by operators as good or bad. These classifications are then compared with a reference or standard.

Weibull Analysis

The Weibull analysis options provide powerful graphical techniques for exploiting the power and generalizability of the Weibull distribution. You can produce Weibull probability plots and estimate the parameters of the distribution, along with confidence intervals for reliability. Probability plots can be computed for complete, single-censored, and multiple-censored data, and parameters can be estimated from hazard plots of failure orders. Estimation methods include Maximum Likelihood (for complete and censored data), weighting factors based on linear estimation techniques for complete and single-censored data, and Modified Moment Estimators, which are unbiased with respect to both the mean and variance. Confidence intervals are computed for the shape, scale, and location parameters, as well as for the percentiles. Statistica includes graphical goodness of fit tests, and the Hollander-Proschan, Mann-Scheuer-Fertig, and Anderson-Darling tests of goodness of fit. Note that the Generalized Linear Models module of Statistica Advanced Linear/Nonlinear Models provides options for fitting generalized linear models from the exponential family of distributions to normal and non-normal data.
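As an illustration of the probability-plot approach (a simplified alternative to the maximum-likelihood methods described above), the Weibull shape and scale for complete data can be estimated by regressing the linearized CDF, ln(-ln(1-F)) = beta*ln(t) - beta*ln(eta), on log failure time, using Bernard's median-rank approximation:

```python
import math

# Illustrative Weibull fit by least squares on the linearized CDF with
# Bernard's median ranks F_i = (i - 0.3) / (n + 0.4) (complete data only).
def weibull_fit(times):
    ts = sorted(times)
    n = len(ts)
    xs = [math.log(t) for t in ts]
    ys = [math.log(-math.log(1 - (i + 1 - 0.3) / (n + 0.4)))
          for i in range(n)]
    mx = sum(xs) / n
    my = sum(ys) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))   # shape parameter
    eta = math.exp(mx - my / beta)              # scale parameter
    return beta, eta
```

Censored data and confidence intervals require the likelihood-based machinery the text describes; this sketch shows only the graphical estimation idea.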


Sampling Plans

Fixed and sequential sampling plans can be generated for normal and binomial means or Poisson frequencies. Results include the sample sizes, operating characteristic (OC) curves, plots of the sequential plans with or without data, expected (H0/H1) run lengths, etc. Note that Statistica Power Analysis also provides options for computing required sample sizes and power estimates for a large number of research designs (e.g., ANOVA) and data types (e.g., for binary counts, censored failure time data, etc.).
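For a single attribute sampling plan, the OC curve mentioned above is just a binomial tail probability: the lot is accepted when at most c defectives appear in a sample of n. The plan parameters below are illustrative:

```python
from math import comb

# Illustrative OC curve for a single attribute sampling plan:
# P(accept) = P(at most c defectives in a sample of n) under fraction p.
def oc_curve(n, c, fractions_defective):
    def p_accept(p):
        return sum(comb(n, d) * p ** d * (1 - p) ** (n - d)
                   for d in range(c + 1))
    return [(p, p_accept(p)) for p in fractions_defective]

curve = oc_curve(n=50, c=2, fractions_defective=[0.01, 0.05, 0.10])
```

Plotting P(accept) against the fraction defective gives the familiar OC curve, which shows how sharply the plan discriminates between good and bad lots.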

Statistica Design of Experiments

General Features

The options for analyzing all factorial, response surface, and mixture designs are general in nature, can handle unbalanced and incomplete designs, and give the user a choice of models to be fitted to the data. The program will compute the generalized inverse of the X'X matrix (where X stands for the design matrix) to determine the estimable effects, and the effects that are aliases of other effects. The program will then automatically report the table of aliases and compute the parameter estimates for all non-redundant effects. You can manually "toggle" specific effects in and out of the current model quickly and easily, and observe the effect on the overall fit. All analyses can be performed in terms of recoded factor values or the original factor values, and a large number of output options are provided to review the parameter estimates, analysis of variance table, etc. Numerous additional options are provided for exploring the predicted (fitted) means, surfaces, etc.; these options will be further described in the context of the respective designs below.

Residual analyses and transformations

A large number of graphs and other output options are provided for further analyses of residuals from a given model. Specifically, the program will compute predicted (fitted) and residual values and their standard errors, user-defined prediction intervals and confidence intervals for the predicted (fitted) values, standardized predicted and residual values, studentized residuals, deleted residuals, studentized deleted residuals, leverage scores, Mahalanobis and Cook distances, and DFFIT and standardized DFFIT values. All of these residual statistics can be saved for further analysis using other Statistica modules (e.g., in order to analyze serial correlations of errors via the Time Series module). These residual statistics for each observation can be reviewed in the order of the observation (case) numbers, or displayed in the order sorted by their magnitudes; thus, outliers with respect to any of the residual statistics can quickly be identified. As further aids for evaluating the fit of the respective model, and for identifying outliers, you can review histograms of residual (and deleted residual) and predicted values, scatterplots of (deleted) residual versus predicted values, or normal, half-normal, and de-trended normal probability plots of (deleted) residuals. As a check for serial correlation of residuals, you can plot the (deleted) residual values against the case numbers. In all plots of individual observations (e.g., residual values for cases), the points are identified by their respective case numbers or labels, and therefore, it is very easy to identify outliers in a dataset. Finally, maximum-likelihood lambda values can be computed for the Box-Cox transformation of the response variables; a plot of the residual sums of squares versus lambda, along with the confidence limit of lambda, accompanies the results in the Box-Cox transformation plot.
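The Box-Cox step at the end of the paragraph above can be sketched as a grid search: for each lambda, the (positive) response is transformed and rescaled by the geometric mean so that residual sums of squares are comparable across lambdas, and the minimizing lambda is the maximum-likelihood value. This simplified example uses a mean-only model rather than a full design model:

```python
import math

# Illustrative Box-Cox lambda search: the scaled transform
# z = (y^lam - 1) / (lam * gm^(lam - 1)), with gm the geometric mean,
# makes residual sums of squares comparable across lambda values.
def boxcox_lambda(ys, lambdas):
    n = len(ys)
    gm = math.exp(sum(math.log(y) for y in ys) / n)  # geometric mean
    best = None
    for lam in lambdas:
        if abs(lam) < 1e-12:
            zs = [gm * math.log(y) for y in ys]      # limiting log case
        else:
            zs = [(y ** lam - 1) / (lam * gm ** (lam - 1)) for y in ys]
        zbar = sum(zs) / n
        rss = sum((z - zbar) ** 2 for z in zs)
        if best is None or rss < best[1]:
            best = (lam, rss)
    return best[0]
```

Plotting the residual sum of squares against lambda, as described above, also yields an approximate confidence interval for lambda.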

Optimization of single or multiple response variables: The response (desirability) profiler

A unique set of options is provided to allow the user to interactively optimize single or multiple response variables, given the current model. First, for second-order response surface models and mixture surface models, the program will compute the factor settings associated with the minimum, maximum, or saddle point value of the respective surface (i.e., determine the critical value of the current surface, along with the respective eigenvalues and eigenvectors, to indicate the curvature and orientation of the quadratic response surface). Note that for mixture designs, the desirability profiler options are not based on a simple reparameterization of the mixture model to an unconstrained surface model (which can lead to erroneous results, such as optimum factor settings that are not valid mixtures); instead all computations will be performed based on the actual (currently fitted constrained) mixture model. Thus, when searching for the optimum factor settings given the desirability function for one or more response variables, it is assured that only the constrained (mixture) experimental region is inspected, and that the resulting factor settings sum to a valid mixture. Second, a comprehensive set of graphical options is provided for visualizing the predicted values of one or more response variables as a function of each factor in the analysis, while holding all other factors constant at particular values. Specifically, for multiple response variables you can specify a desirability function that reflects the most desirable value for each response variable, and the importance of each variable for the overall desirability. Then you can plot the profiles of the desirability function (computed from the predicted values of each response variable) across a user-defined number of levels for each factor. The profiles for each individual response variable, along with confidence intervals, can be displayed in the same graph.

Moreover, the desirability function can be plotted in 3D surface plots or contour plots (desirability contours), and the user can request matrices of such plots for all factors in the analysis (see the illustration at left). All settings, such as the factor grid or the desirability function, can quickly be modified for interactive analyses (e.g., you can quickly exclude specific response variables from the analysis, and observe the effect on the overall desirability function). The specifications for complex desirability functions for many response variables can be saved to a file, and later quickly retrieved when you want to analyze other experiments using the same response variables. Finally, options are provided for determining the optimum value of the desirability function, either by using a grid search over the experimental region, or by using an efficient general function optimization algorithm (which is particularly useful for optimizing desirability functions for experiments with many factors). Note that desirability profiling options are provided in Statistica General Linear Models (GLM), General Regression Models (GRM), and General Discriminant Analysis Models (GDA) (for categorical responses).
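A Derringer-Suich-style desirability function of the kind described above can be sketched as follows; the larger-is-better mapping, the shape exponent, and the importance-weighted geometric mean are standard, but the function names and defaults here are illustrative assumptions:

```python
import math

# Illustrative desirability profiling: each response is mapped to [0, 1]
# and the overall desirability is an importance-weighted geometric mean.
def desirability_larger_is_better(y, lo, hi, s=1.0):
    """0 below lo, 1 above hi, power curve with exponent s in between."""
    if y <= lo:
        return 0.0
    if y >= hi:
        return 1.0
    return ((y - lo) / (hi - lo)) ** s

def overall_desirability(ds, weights=None):
    weights = weights or [1.0] * len(ds)
    if min(ds) == 0.0:
        return 0.0   # any fully undesirable response vetoes the settings
    total = sum(weights)
    return math.exp(sum(w * math.log(d) for w, d in zip(weights, ds)) / total)
```

The geometric mean is what makes a single unacceptable response drive the overall desirability to zero, regardless of how good the other responses are; a grid or function-optimization search over factor settings then maximizes this quantity.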

Standard two-level 2**(k-p) fractional factorial designs with blocks (Box-Hunter-Hunter minimum aberration designs)

Statistica Design of Experiments provides the complete catalog of all standard (so-called, minimum aberration) designs (as, for example, reproduced in the widely used textbooks by Box and Draper, 1987; Box, Hunter, and Hunter, 1978; Montgomery, 1991). The user can review designs in a Spreadsheet; the runs may be randomized (overall or within blocks), and blank columns may be added to the Spreadsheet. Options are provided for specifying the factor highs and lows, and the design can be reviewed and saved in terms of the coded factor levels or the original metric of factors. The user can request replications, add center points to the design, or add a fold-over of the original design. The fractional design generators and block generators of the design, as well as the matrix of aliases of main effects and interactions can be reviewed.
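To make the construction of such designs concrete, here is a minimal sketch (not Statistica's catalog code) of a 2**(4-1) half fraction built from the generator D = ABC, giving the defining relation I = ABCD and resolution IV:

```python
from itertools import product

# Illustrative 2**(4-1) fractional factorial: the base factors A, B, C run
# through all eight +/-1 combinations and the generator D = ABC selects the
# half fraction (defining relation I = ABCD).
runs = []
for a, b, c in product((-1, 1), repeat=3):
    runs.append((a, b, c, a * b * c))   # D is aliased with the ABC interaction
```

Because D = ABC, the main effect of D cannot be separated from the ABC interaction; the alias table Statistica reports enumerates exactly such confounding patterns.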

Statistica Design of Experiments will automatically perform a complete ANOVA on the design. The user has full control over the effects and interactions to be included in the model, and can review the correlations among the columns of the design matrix (X) as well as the inverse of the X'X matrix (i.e., the covariance and correlation matrices of the parameter estimates). The program will compute the ANOVA parameter estimates and their standard errors and confidence intervals, the coefficients for the recoded (-1, +1) factor values and their standard errors and confidence intervals, and the coefficients (standard errors, confidence intervals) for the untransformed factor values. Based on those estimates, the program can compute predicted values (standard errors, confidence intervals) for user-specified factor levels.

The program will compute the complete ANOVA table, based on the mean-square (ms) residual term, or, when the design is at least partially replicated, based on the estimate of pure error. When a pure error estimate is available, the program will compute a test for overall lack-of-fit; when the design contains center points, the program will perform an overall curvature check. The user can review the table of means and marginal means, and their confidence intervals. Numerous options are available for reviewing the results in graphs: Pareto charts of effects, normal and half-normal probability plots of effects, square and cube plots, means plots and interaction plots (with confidence intervals for marginal means), response surface plots, and response contour plots. In addition, the features described above (Residual analyses and transformations, and Optimization of single or multiple response variables) are available for performing detailed analyses of residuals, to evaluate the fit of the model, and for finding the optimum factor settings, given one or more response variables.

Minimum aberration and maximum unconfounding 2**(k-p) fractional factorial designs with blocks: General design search

In addition to the standard 2**(k-p) designs, Statistica Design of Experiments includes a highly efficient search option for generating minimum aberration (least confounded) fractional factorial designs, with or without blocks, with over 100 factors and over 2,000 runs. These efficient designs allow you to evaluate a greater number of (specific) factor interactions than the standard Box-Hunter-Hunter designs.

Statistica Design of Experiments is the only program that currently offers this functionality. Given a desired resolution, you can either perform a comprehensive search of all (non-isomorphic) sets of generators, or specify particular sets of interactions that you would like to keep unconfounded at the respective resolution. In addition to the common search criterion of "minimum aberration," you can choose the criterion of "maximum unconfounding" which will lead to the design with the largest possible number of unconfounded effects (unconfounded with all other effects, given the current resolution of the design). These designs can be further enhanced in the same manner as the standard 2**(k-p) designs described in the previous paragraph (by adding replications, center points, foldover, etc.). All analysis options described in the previous paragraph are applicable to these designs (or any arbitrary 2**(k-p) design).

For more information, read the white paper entitled Minimum Aberration Designs Are Not Maximally Unconfounded.
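
The generator-based construction underlying all 2**(k-p) designs can be sketched as follows: build the full factorial in the k-p base factors, then derive each additional factor column as a product of base columns (the generator choice shown is a hypothetical example, not necessarily minimum aberration):

```python
from itertools import product

def fractional_factorial(k, generators):
    """Build a 2**(k-p) design: full factorial in the k-p base factors,
    plus p generated factors; each generator is a tuple of base-factor
    indices whose product defines the new column."""
    p = len(generators)
    base = k - p
    runs = []
    for combo in product((-1, 1), repeat=base):
        row = list(combo)
        for gen in generators:
            col = 1
            for i in gen:
                col *= row[i]     # product of the named base columns
            row.append(col)
        runs.append(row)
    return runs

# Example: a 2**(5-2) design with generators D = AB and E = AC
design = fractional_factorial(5, [(0, 1), (0, 2)])
```

Each generated column is perfectly confounded with the interaction that defines it, which is exactly the confounding structure the minimum aberration and maximum unconfounding searches evaluate.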

Screening (Plackett-Burman) designs

Statistica Design of Experiments allows the user to design and analyze screening designs for over 100 factors. The program will generate Plackett-Burman (Hadamard matrix) designs and saturated fractional factorial designs with up to 127 factors. As with 2**(k-p) designs, the user can request replications of the design, manually add points, add center points, and print or save the design. For the analysis of screening designs, the same options are available as those described for the analysis of 2**(k-p) designs (see the previous paragraphs).
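
Plackett-Burman designs are built by cyclically shifting a row of +1/-1 values and appending a row of all -1; the 12-run case can be sketched as follows, using the commonly published 11-element generator (an illustration, not Statistica's code):

```python
def plackett_burman_12():
    """12-run Plackett-Burman design: the 11 cyclic shifts of the
    standard generator row, plus a final row of all -1.  Columns are
    pairwise orthogonal (a Hadamard-matrix construction)."""
    gen = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]
    rows = [gen[-i:] + gen[:-i] for i in range(11)]   # all cyclic shifts
    rows.append([-1] * 11)
    return rows

X = plackett_burman_12()
```

Up to 11 two-level factors can be assigned to the columns of this matrix, each main effect being estimated free of the others.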


Mixed-level factorial designs

The program supports mixed-level designs (as enumerated for the National Bureau of Standards of the U.S. Department of Commerce). The design and analysis options available for those designs are identical to those described for 3**(k-p) designs (see below).

Three-level 3**(k-p) fractional factorial designs with blocks and Box-Behnken designs

Statistica Design of Experiments contains a complete implementation of the standard (blocked) 3**(k-p) designs. Also included are the standard Box-Behnken designs. As with all other designs, the user can display and save those designs in standard or randomized order, request replications or add individual runs, review the design and block generators, etc. The program will perform a complete analysis for 3**(k-p) designs. The user has full control over the effects that are to be included in the analysis. The main effects are broken down into linear and quadratic effects, and the interactions are broken down into linear-linear, linear-quadratic, quadratic-linear, and quadratic-quadratic effects. The user can review the correlation matrix of the design matrix (X) as well as the inverse of X'X. The program will compute the standard ANOVA parameter estimates (standard errors, confidence intervals, statistical significance, etc.), coefficients for the recoded (-1, 0, +1) factors, and coefficients for the unrecoded factors. Based on those values, the program provides options for computing predicted values (and standard errors, confidence intervals) based on user-specified values of the factors. The ANOVA table will include tests for the linear and quadratic components of each effect as well as combined multiple-degree-of-freedom tests for the effects. If the design includes replications, then the estimate of pure error can be used for the ANOVA and significance testing. In that case, an overall lack-of-fit test will be performed.

To aid in the interpretation of results, the program will compute the table of means (and confidence intervals) as well as marginal means (and confidence intervals) for interactions. Graphical options include plots of means and marginal means (with confidence intervals), the Pareto chart of effects, normal and half-normal probability plots of effects, and response surface and contour plots. In addition, the features described above (Residual analyses and transformations, and Optimization of single or multiple response variables) are available for performing detailed analyses of residuals, to evaluate the fit of the model, and for finding the optimum factor settings, given one or more response variables.
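
The breakdown of a three-level factor into linear and quadratic components rests on the orthogonal polynomial contrasts (-1, 0, +1) and (+1, -2, +1); a minimal sketch applied to the three level means:

```python
def linear_quadratic_contrasts(means):
    """Decompose a 3-level factor's effect into orthogonal linear
    (-1, 0, +1) and quadratic (+1, -2, +1) contrast values, computed
    from the response means at the low, middle, and high levels."""
    lo, mid, hi = means
    linear = -1 * lo + 0 * mid + 1 * hi
    quadratic = 1 * lo - 2 * mid + 1 * hi
    return linear, quadratic
```

A nonzero quadratic contrast indicates curvature: the middle-level mean departs from the straight line through the low and high means.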

Central composite (response surface) designs

The user can choose from a catalog of standard designs, including small central composite designs (based on Plackett-Burman designs). In addition to the standard options available for all designs (adding runs, randomization, replications, factor highs and lows, etc.; refer to the description of 2**(k-p) designs), the user has the choice of star points that are face-centered, or computed for rotatability, orthogonality, or both. The analysis options are very similar to those described for 3**(k-p) and 2**(k-p) designs above. The user can compute the ANOVA parameters, coefficients for the recoded factor values, and the coefficients for the untransformed factors. Predicted values for user-specified factor values can be computed. The user has full control over the effects to be included in the model, and can review the correlation matrix for the design matrix (X) as well as the inverse of X'X. If replicates are available, the ANOVA table may include the estimate of pure error, and an overall lack-of-fit test. The standard results graphics options include the Pareto chart of effects, probability plot of effects, and response surface and contour plots (if there are more than two factors, for user-specified values of additional factors). In addition, the features described above (Residual analyses and transformations, and Optimization of single or multiple response variables) are available for performing detailed analyses of residuals, to evaluate the fit of the model, and for finding the optimum factor settings, given one or more response variables.
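
The geometry of a central composite design (cube points, star points, and a center point) can be sketched as follows; here the default axial distance is chosen for rotatability, alpha = (2**k)**0.25, while alpha = 1 gives a face-centered design (an illustration only):

```python
from itertools import product

def ccd_points(k, alpha=None):
    """Central composite design for k factors: 2**k factorial (cube)
    points, 2*k axial (star) points, and one center point."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25      # rotatable; alpha = 1.0 is face-centered
    cube = [list(pt) for pt in product((-1.0, 1.0), repeat=k)]
    star = []
    for i in range(k):
        for s in (-alpha, alpha):
            pt = [0.0] * k
            pt[i] = s                 # axial point along factor i
            star.append(pt)
    return cube + star + [[0.0] * k]  # center point last
```

In practice, several center-point replicates would be appended to provide a pure-error estimate.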

Latin squares

The user can choose between different Latin square designs, with up to nine levels. Whenever possible, the program will make available Greco-Latin squares and Hyper-Greco-Latin squares. When there are several alternative Latin squares available, the program will either choose randomly from among them, or the user can make a selection. Designs can be reviewed in a Spreadsheet in standard or randomized order, and blank columns may be added to create convenient data entry forms. The design can be saved in a standard Statistica data file. After appending the observed data to this file, the experiment can then be easily analyzed. In addition to the full ANOVA table, Statistica Design of Experiments will compute the means for all factors. These means can be plotted in a summary plot.
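
A Latin square arranges n treatment symbols so that each appears exactly once in every row and column; the simplest cyclic construction (one of the alternatives such a catalog would include) is:

```python
def latin_square(n):
    """Cyclic n x n Latin square: symbol (i + j) mod n in row i,
    column j, so each of 0..n-1 occurs once per row and per column."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]
```

Randomly permuting the rows, columns, and symbol labels of this square yields the randomized layouts an experimenter would actually run.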

Taguchi robust design experiments

Statistica Design of Experiments will generate orthogonal arrays for up to 31 factors; designs with up to 65 factors can be analyzed. As in all other types of designs, the runs of the experiment can be randomized, and the user can add blank columns to the Spreadsheet to generate convenient data entry forms. The user can examine the aliases of two-way interactions.

Statistica Design of Experiments will automatically compute the standard signal-to-noise (S/N) ratios for problems of these types: (1) Smaller-the-better, (2) Nominal-the-best, (3) Larger-the-better, (4) Signed target, (5) Fraction defective, and (6) Number defective per interval (accumulation analysis). In addition, untransformed data can be analyzed; thus, the user can produce any type of customized S/N ratios via Statistica Visual Basic and analyze them with this procedure. In addition to comprehensive descriptive statistics, the user can review the computed S/N ratios. The full ANOVA results are displayed in an interactive Spreadsheet in which the user can "toggle" effects into or out of the error term. A similar interactive Spreadsheet allows the user to predict Eta (the S/N ratio) under optimum conditions, that is, settings of factor levels. Again, the user can "toggle" effects into or out of the model, and specify particular levels for factors. Finally, the means can be summarized in a standard main effect plot of Eta by factor level; if an accumulation analysis on categorical data is performed, the results can be summarized in a stacked bar plot as well as line plots of the cumulative probabilities across categories for the levels of selected factors. Note that different types of response desirability functions for single or multiple variables can be optimized via the response (desirability) profiler described earlier, available in conjunction with 2**(k-p), 3**(k-p), central composite designs, etc. (or in GLM, GRM, GDA).
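
The first three standard S/N ratios have simple closed forms; a sketch using the generic formulas from the Taguchi literature (not Statistica's code):

```python
import math

def sn_smaller_the_better(y):
    """S/N = -10*log10(mean(y_i**2)); a larger S/N means smaller responses."""
    return -10 * math.log10(sum(v * v for v in y) / len(y))

def sn_larger_the_better(y):
    """S/N = -10*log10(mean(1/y_i**2)); a larger S/N means larger responses."""
    return -10 * math.log10(sum(1 / (v * v) for v in y) / len(y))

def sn_nominal_the_best(y):
    """S/N = 10*log10(ybar**2 / s**2), using the sample variance;
    a larger S/N means the response is on target with little variation."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)
    return 10 * math.log10(ybar * ybar / s2)
```

Because each ratio is expressed in decibels, factor-level effects on Eta combine approximately additively, which is what makes the main-effect plots interpretable.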

Designs for mixtures and triangular graphs

This procedure includes options for designing the simplex-lattice and simplex-centroid designs for mixture variables. These designs can be enhanced by additional interior points and a centroid. The user can enter lower-bound constraints for each factor, and the program will automatically construct the respective design in the sub-simplex defined by the constraints. Multiple upper and lower constraints can be handled via the general facilities for constructing designs in constrained experimental regions (see below). The user can add individual runs or replications, and display and save the design in standard or randomized order. The program will compute the coefficients for the pseudo-components and the components in their original metric, along with the standard errors, confidence intervals, and tests of statistical significance. (Note that the Statistica General Linear Models (GLM) module includes facilities for analyzing mixture experiments; those options are particularly useful for analyzing designs that combine both mixture and non-mixture variables in complex designs.) The user has full control over the terms that are to be included in the model; standard models include the linear, quadratic, special cubic, and full cubic models. The ANOVA table will include tests for the incremental fit of the different models, and if the design includes replicated runs, a test for lack-of-fit based on the estimate of pure error will be computed. Results options include the table of means, the correlations for the columns of the design matrix (X), the inverse of X'X (the variance/covariance matrix for the parameter estimates), the Pareto chart, probability plots of parameter estimates, etc. The user can compute predicted values, based on user-defined values of the factors. Specialized graphs to summarize the results of mixture experiments include response trace plots for user-defined reference blends, and triangular surface and contour plots.
If there are more than 3 components in the experiment, then the surface and contour plots can be produced for user-defined values of the additional components. Finally, all general features described above (under the headings Design of experiments, Analysis of experiments: General features, Residual analyses and transformations, and Optimization of single or multiple response variables) are available for performing detailed analyses of residuals, to evaluate the fit of the model, and for finding the optimum factor settings, given one or more response variables. Note that the response (desirability) profiler options available for mixture designs are not based on a simple reparameterization of the mixture model to an unconstrained surface model; instead, all computations are performed based on the actual (fitted) mixture model. Thus, when searching for the optimum factor settings given the desirability function for one or more response variables, it is assured that only the constrained (mixture) experimental region is inspected, and that the resulting factor settings sum to a valid mixture.
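
A {q, m} simplex-lattice consists of all mixtures whose component proportions are multiples of 1/m and sum to 1; a generic sketch of the point set:

```python
def simplex_lattice(q, m):
    """{q, m} simplex-lattice design: all q-component mixtures whose
    proportions are i/m (i = 0..m) and sum to exactly 1."""
    def compositions(total, parts):
        # enumerate all ways to write `total` as an ordered sum of `parts`
        if parts == 1:
            yield (total,)
            return
        for first in range(total + 1):
            for rest in compositions(total - first, parts - 1):
                yield (first,) + rest
    return [tuple(i / m for i in c) for c in compositions(m, q)]
```

The design has C(q+m-1, m) points; for example, {3, 2} gives the six vertices and edge midpoints of the triangle.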

Designs for constrained surfaces and mixtures

Statistica Design of Experiments contains procedures for computing vertex and centroid points for constrained surfaces and mixtures defined by linear constraints. The user can enter upper and lower limits for the factors, and specify any additional linear constraints (of the form A1*x1 + … + An*xn + A0 >= 0) on the factor values. The program will then compute the vertex points, and optional centroid points, for the constrained region. The constraints will be processed sequentially, and unnecessary constraints will be identified. There are numerous additional options for reviewing the characteristics of the constrained region. The user can review the vertex and centroid points in 3D and triangular scatterplots (for mixtures). The correlation matrix for the columns of the design matrix X, for various standard types of designs, can be computed as well as the inverse of the X'X matrix (i.e., the variance/covariance matrix of the parameter estimates). This allows the user to evaluate the design characteristics, based on the vertex and centroid points. These points can then be submitted to the optimal design facilities (see below), to construct designs with the minimum number of runs.
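
For the two-factor case, the vertex computation reduces to intersecting pairs of constraint boundaries and keeping the feasible intersections; a sketch under that simplifying assumption (the product handles the general n-factor case):

```python
def vertices_2d(constraints, tol=1e-9):
    """Vertices of the 2-D region {(x, y) : a1*x + a2*y + a0 >= 0 for
    each (a1, a2, a0)}: intersect every pair of boundary lines (by
    Cramer's rule) and keep the points satisfying all constraints."""
    pts = []
    n = len(constraints)
    for i in range(n):
        a1, a2, a0 = constraints[i]
        for j in range(i + 1, n):
            b1, b2, b0 = constraints[j]
            det = a1 * b2 - a2 * b1
            if abs(det) < tol:
                continue                      # parallel boundaries
            x = (a2 * b0 - b2 * a0) / det
            y = (b1 * a0 - a1 * b0) / det
            if all(c1 * x + c2 * y + c0 >= -tol for c1, c2, c0 in constraints):
                pts.append((x, y))
    return pts
```

For the unit square (x >= 0, y >= 0, 1 - x >= 0, 1 - y >= 0) this recovers the four corners.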

D- and A-optimal designs

The program includes several algorithms for constructing optimal designs. The user can choose between the D (determinant) optimality and the A (or trace) optimality criterion, and specify models for surfaces and mixtures. A list of candidate points for the design can be entered by hand or retrieved from a Statistica data file (e.g., a design previously created via the facilities for computing vertex and centroid points for constrained surfaces and mixtures, see above). Points in the candidate list can be marked for forced inclusion in the final design; thus, the user can enhance or "repair" existing experiments. The program includes all common search algorithms developed for constructing D- and A-optimal designs: Dykstra's sequential search procedure, the Wynn-Mitchell simple exchange procedure, the Mitchell DETMAX procedure (exchange with excursions), Fedorov's simultaneous switching procedure, and a modified simultaneous switching procedure. For the final design, the program will compute the determinant of X'X and the D, A, and G efficiencies. The user can review the correlation matrix for the columns of the final design matrix (X), and the inverse of the X'X matrix (the variance/covariance matrix of parameter estimates). The final design points can be visualized in 3D and triangular scatterplots (for mixtures).
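
The flavor of these search procedures can be conveyed by a simple sequential (greedy) selection that repeatedly adds the candidate point maximizing det(X'X); a small ridge term keeps the early, singular steps well defined. This is a bare-bones illustration, not any of the published algorithms in full:

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[p][i]) < 1e-12:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d

def xtx(rows):
    """Information matrix X'X for a list of model rows."""
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]

def greedy_d_optimal(candidates, n_runs, model, ridge=1e-6):
    """Sequential greedy search: at each step, add the candidate whose
    model row maximizes det(X'X + ridge*I) of the growing design."""
    design = []
    for _ in range(n_runs):
        best_row, best_val = None, float("-inf")
        for pt in candidates:
            rows = design + [model(pt)]
            m = xtx(rows)
            for i in range(len(m)):
                m[i][i] += ridge      # keep det defined while X'X is singular
            val = det(m)
            if val > best_val:
                best_row, best_val = model(pt), val
        design.append(best_row)
    return design

# Example: pick 4 runs from a 3x3 candidate grid for a first-order model
candidates = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
design = greedy_d_optimal(candidates, 4, lambda p: [1.0, float(p[0]), float(p[1])])
```

The published exchange procedures improve on this greedy pass by also swapping points out of the design, which is why they reach better D efficiencies.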

D-optimal split plot design

Statistica can generate split plot designs for multiple easy-to-change and hard-to-change factors and covariates. This flexible design generation is based upon minimizing the volume of the joint confidence region of the parameter estimates. Options for generating design syntax for subsequent General Linear Models (GLM) analyses as well as options for saving Variance Estimation and Precision designs to the design spreadsheets make it easy to analyze these designs once the experiment is performed and data is collected.

D-optimal split plot analysis

By default, Statistica will analyze the split plot design using the Variance Estimation and Precision module. The Variance Estimation and Precision module is a powerful analytic tool that allows you to analyze the split plot design in the presence of both the whole plot and sub plot error. If the Variance Estimation and Precision module is not available, then Statistica will analyze this design using the General Linear Models (GLM) module.

Alternative procedures

Statistica has a large number of computational methods for analyzing data collected in experiments and for fitting ANOVA/ANCOVA-like designs to continuous or categorical outcome variables.

Alternative procedures are available in the following Statistica Advanced product bundle:

Statistica Advanced Linear/Non-Linear Models

  • General Linear Models (GLM) and General Regression Models (GRM): Sophisticated model-building procedures (stepwise and best-subset selection of predictor effects).
  • Generalized Linear Models (GLZ): Stepwise and best-subset selection of predictor effects in ANOVA/ANCOVA-like designs for various popular alternatives to linear least squares models such as logit, multinomial logit, and probit models.

Statistica Multivariate Exploratory Techniques

  • General Discriminant Analysis Models (GDA): ANOVA/ANCOVA-like experimental designs for classification, and stepwise and best-subset selection of predictor effects. GDA includes desirability profiler and response optimization methods, which can be used to determine the factor combinations, levels, and/or values that maximize the posterior classification probabilities for one or more categories of the dependent variable.

And also in the Statistica Data Miner package:

Statistica Data Miner

  • General Classification and Regression Trees Models and General CHAID models: ANOVA/ANCOVA-like experimental designs for building highly non-linear hierarchical classification or regression trees.

Thus, Statistica can be applied to quality-improvement research in creative and innovative ways — when the dependent variables of interest are categorical in nature, or when the effect of the predictor variables is non-linear in nature.



Statistica Power Analysis and Interval Estimation


Some of the advantages of Statistica Power Analysis and Interval Estimation are:

  • Precise and fast computational routines, which maintain their accuracy across a broad range of parameters
  • Presentation-quality, automatically-scaled graphs of power vs. sample size, power vs. effect size, and power vs. alpha
  • Protocol statements describing calculations in a form that can be transferred directly to a text document

Power Calculation

Power Calculation allows you to calculate statistical power for a given analysis type (see List of Tests below), and to produce graphs of power as a function of various quantities that affect power in practice, such as effect size, type I error rate, and sample size.

Sample Size Calculation

Sample Size Calculation allows you to calculate, for a given analysis type (see List of Tests below), the sample size required to attain a given level of power, and to generate plots of required sample size as a function of required power, type I error rate, and effect size.
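
Both calculations can be illustrated with the normal-approximation formulas for a two-sided one-sample test of a standardized effect size d (the product's own routines use exact noncentral distributions; this sketch is approximate, and the function names are our own):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_quantile(p):
    """Inverse normal CDF by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def power_one_sample_z(d, n, alpha=0.05):
    """Approximate power of a two-sided one-sample test of effect size d
    at sample size n (an exact routine would use the noncentral t)."""
    z_crit = z_quantile(1 - alpha / 2)
    nc = abs(d) * math.sqrt(n)          # approximate noncentrality
    return (1 - phi(z_crit - nc)) + phi(-z_crit - nc)

def sample_size_one_sample_z(d, power=0.90, alpha=0.05):
    """Smallest n whose approximate power reaches the target."""
    n = 2
    while power_one_sample_z(d, n, alpha) < power:
        n += 1
    return n
```

For d = 0.5, alpha = .05, and a power target of .90, this yields a sample size in the low forties, consistent with the familiar (z_alpha/2 + z_beta)**2 / d**2 approximation.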

Interval Estimation

Interval Estimation allows you to calculate, for a given analysis type (see List of Tests below), specialized confidence intervals not generally available in general-purpose statistical packages. These confidence intervals are distinguished in some cases by the fact that they refer to standardized effects, and in others by the fact that they are exact confidence intervals in situations where only approximate techniques have generally been available.

Statistica Power Analysis and Interval Estimation is unique among programs of its type in that it calculates confidence intervals for a number of important statistical quantities such as standardized effect size (in t-tests and ANOVA), the correlation coefficient, the squared multiple correlation, the sample proportion, and the difference between proportions (either independent or dependent samples).

These capabilities, in turn, may be used to construct confidence intervals on quantities such as power and sample size, allowing the user to utilize the data from one study to construct an exact confidence interval on the sample size required for another study.

Probability Distributions

Probability Distributions allows you to perform a variety of calculations on probability distributions that are of special value in performing power and sample size calculations.

The routines are distinguished by their high level of accuracy, and the wide range of parameter values for which they will perform calculations. The noncentral distributions are also distinguished by the ability to calculate a noncentrality parameter that places a given observation at a given percentage point in the noncentral distribution. The ability to perform this calculation is essential to the technique of "noncentrality interval estimation."


These routines, which include the noncentral t, noncentral F, noncentral chi-square, binomial, exact distribution of the correlation coefficient, and the exact distribution of the squared multiple correlation coefficient, are characterized by their ability to solve for an unknown parameter, and for their ability to handle "non-null" cases.

For example, not only can the distribution routine for the Pearson correlation calculate p as a function of r and N for rho=0, it can also perform the calculation for other values of rho. Moreover, it can solve for the exact value of rho that places an observed r at a particular percentage point, for any given N.
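
An approximate version of that last calculation can be sketched with the Fisher z transform (the product's routine uses the exact distribution of r; this classical large-sample approximation requires n > 3):

```python
import math

def rho_at_percentile(r, n, p):
    """Approximate rho that places an observed correlation r at the
    100*p-th percentile of its sampling distribution, via the Fisher z
    transform: atanh(r) is roughly normal with mean atanh(rho) and
    standard deviation 1/sqrt(n - 3)."""
    # standard normal quantile by bisection
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    z_p = (lo + hi) / 2
    return math.tanh(math.atanh(r) - z_p / math.sqrt(n - 3))
```

Setting p = .025 and p = .975 gives the endpoints of an approximate 95% confidence interval for rho, which is exactly the inversion logic the exact routine performs on the true distribution of r.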

List of Tests

Statistica Power Analysis and Interval Estimation calculates power as a function of sample size, effect size, and Type I error rate for the tests listed below:

  • 1-sample t-test
  • 2-sample independent sample t-test
  • 2-sample dependent sample t-test
  • Planned contrasts
  • 1-way ANOVA (fixed and random effects)
  • 2-way ANOVA
  • Chi-square test on a single variance
  • F-test on 2 variances
  • Z-test (or chi-square test) on a single proportion
  • Z-test on 2 independent proportions
  • McNemar's test on 2 dependent proportions
  • F-test of significance in multiple regression
  • t-test for significance of a single correlation
  • Z-test for comparing 2 independent correlations
  • Log-rank test in survival analysis
  • Test of equal exponential survival, with accrual period
  • Test of equal exponential survival, with accrual period and dropouts
  • Chi-square test of significance in structural equation modeling
  • Tests of "close fit" in structural equation modeling confirmatory factor analysis

Example Application

Suppose you are planning a 1-Way ANOVA to study the effect of a drug.

Prior to planning the study, you find that there has been a similar study previously. This particular study had 4 groups, with N = 50 subjects per group, and obtained an F-statistic of 15.4.

From this information, as a first step you can (a) gauge the population effect size with an exact confidence interval, and (b) use this information to set a lower bound to appropriate sample size in your study.

Simply enter the data into a convenient dialog, and results are immediately available.

In this case, we discover that a 90% exact confidence interval on the root-mean-square standardized effect (RmsSE) ranges from about .398 to .686. With effects this strong, it is not surprising that the 90% post hoc confidence interval for power ranges from .989 to almost 1. We can use this information to construct a confidence interval on the actual N needed to achieve a power goal (in this case, .90). This confidence interval ranges from 12 to 31. So, based on the information in the study, we are 90% confident that a sample size no greater than 31 would have been adequate to produce a power of .90.



Turning to our own study, suppose we examine the relationship between power and effect size for a sample size of 31. The first graph shows quite clearly that as long as the effect size for our drug is in the range of the confidence interval for the previous study, our power will be quite high. Should the actual effect size for our drug be on the order of .25, however, power will be inadequate.



If, on the other hand, we use a sample size comparable to the previous study (i.e., 50 per group) we discover that power will remain quite reasonable, even for effects on the order of .28.

With Statistica Power Analysis and Interval Estimation, this entire analysis runs in just a minute or two.



Statistica Base Modules

Descriptive Statistics, Breakdowns, and Exploratory Data Analysis

Descriptive Statistics and Graphs

The program will compute practically all common, general-purpose descriptive statistics including medians, modes, quartiles, user-specified percentiles, means and standard deviations, quartile ranges, confidence limits for the mean, skewness and kurtosis (with their respective standard errors), harmonic means, geometric means, as well as many specialized descriptive statistics and diagnostics, either for all cases or broken down by one or more categorical (grouping) variables. As with all modules of Statistica, a wide variety of graphs will aid exploratory analyses, e.g., various types of box-and-whisker plots, histograms, bivariate distribution (3D or categorized) histograms, 2D and 3D scatterplots with marked subsets, normal, half-normal, and detrended probability plots, Q-Q plots, P-P plots, etc. A selection of tests is available for fitting the normal distribution to the data (via the Kolmogorov-Smirnov, Lilliefors, and Shapiro-Wilk tests; facilities for fitting a wide variety of other distributions are also available; see also Statistica Process Analysis, and the section on fitting in the Graphics section).
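
Many of these summary statistics have counterparts in Python's standard library, which gives a sense of the quantities involved (an illustration only, not Statistica functionality):

```python
import statistics as st

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
summary = {
    "mean": st.mean(data),
    "median": st.median(data),
    "mode": st.mode(data),
    "stdev": st.stdev(data),                # sample standard deviation
    "geometric_mean": st.geometric_mean(data),
    "harmonic_mean": st.harmonic_mean(data),
    "quartiles": st.quantiles(data, n=4),   # the three quartile cut points
}
```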

By-Group Analyses (Breakdowns)

Practically all descriptive statistics as well as summary graphs can be computed for data that are categorized (broken down) by one or more grouping variables. For example, with just a few mouse clicks the user can break down the data by Gender and Age and review categorized histograms, box-and-whisker plots, normal probability plots, scatterplots, etc. If more than two categorical variables are chosen, cascades of the respective graphs can be automatically produced. Options to categorize by continuous variables are provided, e.g., you can request that a variable be split into a requested number of intervals, or use the on-line recode facility to custom-define the way in which the variable will be recoded (categorization options of practically unlimited complexity can be specified at any point and they can reference relations involving all variables in the dataset). In addition, a specialized hierarchical breakdown procedure is provided that allows the user to categorize the data by up to six categorical variables, and compute a variety of categorized graphs, descriptive statistics, and correlation matrices for subgroups (the user can interactively request to ignore some factors in the complete breakdown table, and examine statistics for any marginal tables). Numerous formatting and labeling options allow the user to produce publication-quality tables and reports with long labels and descriptions of variables. Note that extremely large analysis designs can be specified in the breakdown procedure (e.g., 100,000 groups for a single categorization variable), and results include all relevant ANOVA statistics (including the complete ANOVA table, tests of assumptions such as the Levene and Brown-Forsythe tests for homogeneity of variance, a selection of seven post-hoc tests, etc.).
As in all other modules of Statistica, extended precision calculations (the "quadruple" precision, where applicable) are used to provide an unmatched level of accuracy (see the section on Precision). Because of the interactive nature of the program, exploration of data is very easy. For example, exploratory graphs can be produced directly from all results Spreadsheets by pointing with the mouse to specific cells or ranges of cells. Cascades of even complex (e.g., multiple categorized) graphs can be produced with a single click of the mouse and reviewed in a slide-show manner. In addition to numerous predefined statistical graphs, countless graphical visualizations of raw data, summary statistics, relations between statistics, as well as all breakdowns and categorizations can be custom-defined by the user via straightforward point-and-click facilities designed to reduce the necessary number of mouse clicks. All exploratory graphical techniques (described in the section on Graphics) are integrated with statistics to facilitate graphical data analyses (e.g., via interactive outlier removal, subset selections, smoothing, function fitting, extensive brushing options allowing the user to easily identify and/or extract the selected data, etc.). See also the section on Block Statistics, below.


Correlations

A comprehensive set of options allows for the exploration of correlations and partial correlations between variables. First, practically all common measures of association can be computed, including Pearson r, Spearman rank order R, Kendall tau (b, c), Gamma, tetrachoric r, Phi, Cramer V, contingency coefficient C, Somers' d, uncertainty coefficients, part and partial correlations, autocorrelations, various distance measures, etc. (nonlinear regressions, regressions for censored data, and other specialized measures of correlation are available in Nonlinear Estimation, Survival Analysis, and other modules offered in Statistica Advanced Linear/Non-Linear Models). Correlation matrices can be computed using casewise (listwise) or pairwise deletion of missing data, or mean substitution. As in all other modules of Statistica, extended precision calculations (the "quadruple" precision, where applicable) are used to yield an unmatched level of accuracy (see the section on Precision). Like all other results in Statistica, correlation matrices are displayed in Spreadsheets offering various formatting options (see below) and extensive facilities to visualize numerical results; the user can "point to" a particular correlation in the Spreadsheet and choose to display a variety of "graphical summaries" of the coefficient (e.g., scatterplots with confidence intervals, various 3D bivariate distribution histograms, probability plots, etc.).
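
For reference, the two most common coefficients can be computed as follows (a sketch; the rank-based version omits the tie correction a full implementation would apply):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson r applied to ranks
    (no tie correction in this sketch)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank + 1.0
        return r
    return pearson_r(ranks(x), ranks(y))
```

The rank-based coefficient equals 1 for any strictly increasing relationship, even a nonlinear one, which is why it complements Pearson r in exploratory work.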

Brushing and outlier detection

The extensive brushing facilities in the scatterplots allow the user to select/deselect individual points in the plot and assess their effect on the regression line (or other fitted function lines).

Display formats of numbers

A variety of global display formats for correlations are supported; significant correlation coefficients can be automatically highlighted, each cell of the Spreadsheet can be expanded to display n and p, or detailed results may be requested that include all descriptive statistics (pairwise means and standard deviations, B weights, intercepts, etc.). Like all other numerical results, correlation matrices are displayed in Spreadsheets offering the zoom option and interactively-controlled display formats (e.g., from +.4 to +.4131089276410193); thus, large matrices can be compressed (via either the zoom or format-width control adjustable by dragging) to facilitate the visual search for coefficients which exceed a user-specified magnitude or significance level (e.g., the respective cells can be marked red in the Spreadsheet).

Scatterplot, scatterplot matrices, by-group analyses

As in all output selection dialogs, numerous global graphics options are available to further study patterns of relationships between variables, e.g., 2D and 3D scatterplots (with or without case labels) designed to identify patterns of relations across subsets of cases or series of variables. Correlation matrices can be computed as categorized by grouping variables and visualized via categorized scatterplots. Also, "breakdowns of correlation matrices" can be generated (one matrix per subset of data), displayed in queues of Spreadsheets, and saved as stacked correlation matrices (which can later be used as input into the Structural Equations Modeling and Path Analysis [SEPATH] module offered in Statistica Advanced Linear/Non-Linear Models). An entire correlation matrix can be summarized in a single graph via the Matrix scatterplot option (of practically unlimited density); large scatterplot matrices can then be reviewed interactively by "zooming in" on selected portions of the graph (or scrolling large graphs in the zoom mode) [see the illustration]. Also, categorized scatterplot matrix plots can be generated (one matrix plot for each subset of data). Alternatively, a multiple-subset scatterplot matrix plot can be created where specific subsets of data (e.g., defined by levels of a grouping variable or selection conditions of any complexity) are marked with distinctive point markers. Various other graphical methods can be used to visualize matrices of correlations in search of global patterns (e.g., contour plots, non-smoothed surfaces, icons, etc.). All of these operations require only a few mouse clicks, and various shortcuts are provided to simplify selections of analyses; any number of Spreadsheets and graphs can be displayed simultaneously on the screen, making interactive exploratory analyses and comparisons very easy.

Basic Statistics From Results Spreadsheets (Tables)

Statistica is a single integrated analysis system that presents all numerical results in spreadsheet tables that are suitable (without any further modification) for input into subsequent analyses. Thus, basic statistics (or any other statistical analysis) can be computed for results tables from previous analyses; for example, you could very quickly compute a table of means for 2000 variables, and then use this table as an input data file to further analyze the distribution of those means across the variables. Thus, basic statistics are available at any time during your analyses, and can be applied to any results spreadsheet.

Block Statistics

In addition to the detailed descriptive statistics that can be computed for every spreadsheet, you can also highlight blocks of numbers in any spreadsheet and produce basic descriptive statistics or graphs for the respective subset of numbers only. For example, suppose you computed a results spreadsheet with measures of central tendency for 2,000 variables (e.g., with means, modes, medians, geometric means, and harmonic means); you could highlight a block of, say, 200 variables and the means and medians, and then in a single operation produce a multiple line graph of those two measures across the subset of 200 variables. Statistical analysis by blocks can be performed by row or by column; for example, you could also compute a multiple line graph for a subset of variables across the different measures of central tendency. To summarize, the block statistics facilities allow you to produce statistics and statistical graphs from values in arbitrarily selected (highlighted) blocks of values in the current data spreadsheet or output Spreadsheet.
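The row/column distinction for block statistics can be sketched with NumPy (the table below is a made-up miniature of the "measures of central tendency" spreadsheet described above): a highlighted block is just a sub-array, and "by column" versus "by row" corresponds to the axis of the reduction.

```python
import numpy as np

# Miniature results spreadsheet: rows = variables,
# columns = three measures of central tendency (values are illustrative).
table = np.array([[10.0,  9.5,  9.8],
                  [20.0, 19.0, 19.5],
                  [30.0, 31.0, 30.5],
                  [40.0, 39.5, 40.2]])

# "Highlight a block": first two measures for the first three variables.
block = table[:3, :2]

col_means = block.mean(axis=0)   # statistics by column (per measure)
row_means = block.mean(axis=1)   # statistics by row (per variable)
```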

Interactive Probability Calculator

A flexible, interactive Probability Calculator is accessible from all toolbars. It features a wide selection of distributions (including Beta, Cauchy, Chi-square, Exponential, Extreme value, F, Gamma, Laplace, Lognormal, Logistic, Pareto, Rayleigh, t (Student), Weibull, and Z (Normal)). Interactively (in-place) updated graphs built into the dialog (plots of the density and distribution functions) allow the user to visually explore distributions, taking advantage of the flexible Statistica Smart MicroScrolls, which let the user advance either the last significant digit (press the LEFT mouse button) or the next-to-last significant digit (press the RIGHT mouse button). Facilities are provided for generating customizable, compound graphs of distributions with requested cutoff areas. Thus, this calculator allows you to interactively explore distributions (e.g., the respective probabilities depending on shape parameters).
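The calculations such a probability calculator performs (density, cumulative probability, and inverse/quantile lookups) can be reproduced with SciPy; this is a generic sketch of the same operations, not Statistica's implementation:

```python
from scipy import stats

# Normal (Z) distribution: cumulative probability and its inverse.
z = stats.norm()
p = z.cdf(1.96)          # P(Z <= 1.96), approximately 0.975
q = z.ppf(0.975)         # inverse lookup, approximately 1.96

# Student's t with 10 df: upper-tail area beyond a cutoff.
t_tail = 1 - stats.t(df=10).cdf(2.228)

# Chi-square with 4 df: the 95th percentile (critical value).
chi2_crit = stats.chi2(df=4).ppf(0.95)
```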

t-Tests and Other Tests of Group Differences

T-tests for dependent and independent samples, as well as single samples (testing means against user-specified constants), can be computed; multivariate Hotelling's T² tests are also available (see also ANOVA/MANOVA and GLM (General Linear Models), offered in Statistica Advanced Linear/Non-Linear Models). Flexible options are provided to allow comparisons between variables (e.g., treating the data in each column of the input spreadsheet as a separate sample) and coded groups (e.g., if the data include a categorical variable such as Gender to identify group membership for each case). As with all procedures, extensive diagnostics and graphics options are available from the results menus. For example, for the t-test for independent samples, options are provided to compute t-tests with separate variance estimates, Levene and Brown-Forsythe tests for homogeneity of variance, various box-and-whisker plots, categorized histograms and probability plots, categorized scatterplots, etc. Other (more specialized) tests of group differences are part of many modules (e.g., Nonparametrics (below), Survival Analysis (available in Statistica Advanced), and Reliability/Item Analysis (available in Statistica Multivariate Exploratory Techniques)).
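The main test variants named here map directly onto SciPy routines. The following sketch (with simulated data; sample sizes and effect sizes are invented) shows an independent-samples t-test with separate variance estimates (Welch's form), Levene's homogeneity-of-variance test, and a dependent-samples t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group_a = rng.normal(loc=0.0, scale=1.0, size=40)
group_b = rng.normal(loc=1.0, scale=1.5, size=40)

# Independent-samples t-test with separate variance estimates
# (equal_var=False gives the Welch form).
t_stat, p_val = stats.ttest_ind(group_a, group_b, equal_var=False)

# Levene's test for homogeneity of variance between the two groups.
lev_stat, lev_p = stats.levene(group_a, group_b)

# Dependent-samples (paired) t-test on before/after style data.
before = rng.normal(size=30)
after = before + rng.normal(loc=0.3, scale=0.5, size=30)
t_dep, p_dep = stats.ttest_rel(before, after)
```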

Frequency Tables, Crosstabulation Tables, Stub-and-Banner Tables, Multiple Response Analysis, and Tables

Extensive facilities are provided to tabulate continuous, categorical, and multiple response variables, or multiple dichotomies. A wide variety of options are offered to control the layout and format of the tables. For example, for tables involving multiple response variables or multiple dichotomies, marginal counts and percentages can be based on the total number of respondents or responses, multiple response variables can be processed in pairs, and various options are available for counting (or ignoring) missing data. Frequency tables can also be computed based on user-defined logical selection conditions (of any complexity, referencing any relationships between variables in the dataset) that assign cases to categories in the table. All tables can be extensively customized to produce final (publication-quality) reports. For example, unique "multi-way summary" tables can be produced with breakdown-style, hierarchical arrangements of factors; crosstabulation tables may report row, column, and total percentages in each cell; long value labels can be used to describe the categories in the table; frequencies greater than a user-defined cutoff can be highlighted in the table; etc. The program can display cumulative and relative frequencies, Logit- and Probit-transformed frequencies, normal expected frequencies (and the Kolmogorov-Smirnov, Lilliefors, and Shapiro-Wilk tests), expected and residual frequencies in crosstabulations, etc. Available statistical tests for crosstabulation tables include the Pearson, maximum-likelihood, and Yates-corrected Chi-squares; McNemar's Chi-square; the Fisher exact test (one- and two-tailed); Phi; and the tetrachoric r; additional available statistics include Kendall's tau (a, b), Gamma, Spearman r, Somers' d, uncertainty coefficients, etc.
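A minimal sketch of the crosstabulation statistics described above, using SciPy (the 2x2 table of counts is invented for illustration): it computes the chi-square test for a two-way frequency table and the row percentages that such tables can report in each cell.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Two-way crosstabulation table: rows = one categorical variable
# (e.g., gender), columns = response category; counts are illustrative.
observed = np.array([[30, 10],
                     [20, 40]])

# Chi-square test of independence (SciPy applies the Yates
# continuity correction by default for 2x2 tables).
chi2, p, dof, expected = chi2_contingency(observed)

# Row percentages, as reported in each cell of a crosstabulation.
row_pct = observed / observed.sum(axis=1, keepdims=True) * 100
```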


Graphical options include simple, categorized (multiple), and 3D histograms, cross-section histograms (for any "slices" of the one-, two-, or multi-way tables), and many other graphs, including a unique "interaction plot of frequencies" that summarizes the frequencies for complex crosstabulation tables (similar to plots of means in ANOVA). Cascades of even complex graphs (e.g., multiple categorized or interaction plots) can be interactively reviewed. See also the section on Block Statistics, above, and the sections on Log-linear Analysis (available in Statistica Advanced Linear/Non-Linear Models) and Correspondence Analysis (available in Statistica Advanced).

Multiple Regression Methods

The Multiple Regression module is a comprehensive implementation of linear regression techniques, including simple, multiple, stepwise (forward, backward, or in blocks), hierarchical, nonlinear (including polynomial, exponential, log, etc.), Ridge regression, with or without intercept (regression through the origin), and weighted least squares models. Additional advanced methods are provided in the General Regression Models (GRM) module (e.g., best subset regression, multivariate stepwise regression for multiple dependent variables, models that may include categorical factor effects, statistical summaries for validation and prediction samples, custom hypotheses, etc.). The Multiple Regression module calculates a comprehensive set of statistics and extended diagnostics, including the complete regression table (with standard errors for B, Beta, and the intercept, R-square and adjusted R-square for intercept and non-intercept models, and the ANOVA table for the regression), part and partial correlation matrices, correlations and covariances for regression weights, the sweep matrix (matrix inverse), the Durbin-Watson d statistic, Mahalanobis and Cook's distances, deleted residuals, confidence intervals for predicted values, and many others.
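The core of any such regression table can be sketched with NumPy alone; this illustrative example (simulated data, invented coefficients) fits an ordinary least squares model with an intercept and computes the R-square and adjusted R-square reported in a regression summary, plus the residuals that feed the diagnostic statistics:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix with an intercept column; ordinary least squares fit.
X = np.column_stack([np.ones(n), x1, x2])
b, _, _, _ = np.linalg.lstsq(X, y, rcond=None)  # b = [intercept, b1, b2]

# Residuals, R-square, and adjusted R-square for the intercept model.
resid = y - X @ b
ss_res = resid @ resid
ss_tot = ((y - y.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - X.shape[1])
```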

Predicted and residual values

The extensive residual and outlier analysis features a large selection of plots, including a variety of scatterplots, histograms, normal and half-normal probability plots, detrended plots, partial correlation plots, different casewise residual and outlier plots and diagrams, and others. The scores for individual cases can be visualized via exploratory icon plots and other multidimensional graphs integrated directly with the results Spreadsheets. Residual and predicted scores can be appended to the current data file. A forecasting routine allows the user to perform what-if analyses, and to interactively compute predicted scores based on user-defined values of predictors.

By-group analysis; related procedures

Extremely large regression designs can be analyzed. An option is also included to perform multiple regression analyses broken down by one or more categorical variables (multiple regression analysis by group); additional add-on procedures include a regression engine that supports models with thousands of variables, Two-Stage Least Squares regression, and Box-Cox and Box-Tidwell transformations with graphs. An add-on package, Statistica Advanced Linear/Non-Linear Models, also includes general nonlinear estimation modules (Nonlinear Estimation, Generalized Linear Models (GLZ), Partial Least Squares models (PLS)) that can estimate practically any user-defined nonlinear model, including Logit, Probit, and others. The add-on also includes SEPATH, the general Structural Equation Modeling and Path Analysis module, which allows the user to analyze extremely large correlation, covariance, and moment matrices (for intercept models). An implementation of Generalized Additive Models (GAM) is also available in Statistica Data Miner.
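The "regression by group" idea (one fit per level of a categorical variable) can be sketched with NumPy; the group labels and true slopes below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# By-group regression: fit a separate simple regression
# (slope and intercept) within each level of a categorical variable.
results = {}
true_slopes = {"A": 1.0, "B": 3.0}   # illustrative group effects
for level, slope in true_slopes.items():
    x = rng.normal(size=80)
    y = slope * x + rng.normal(scale=0.3, size=80)
    b1, b0 = np.polyfit(x, y, 1)     # slope, intercept for this group
    results[level] = {"slope": b1, "intercept": b0}
```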

Nonparametric Statistics

The Nonparametric Statistics module features a comprehensive selection of inferential and descriptive statistics, including all common tests and some special application procedures. Available statistical procedures include the Wald-Wolfowitz runs test, Mann-Whitney U test (with exact probabilities [instead of the Z approximations] for small samples), Kolmogorov-Smirnov tests, Wilcoxon matched pairs test, Kruskal-Wallis ANOVA by ranks, Median test, Sign test, Friedman ANOVA by ranks, Cochran Q test, McNemar test, Kendall coefficient of concordance, Kendall tau (b, c), Spearman rank order R, Fisher's exact test, Chi-square tests, V-square statistic, Phi, Gamma, Somers' d, contingency coefficients, and others. (Specialized nonparametric tests and statistics are also part of many add-on modules, e.g., Survival Analysis, Statistica Process Analysis, and others.) All rank-order tests can handle tied ranks and apply corrections for small n or tied ranks. The program can handle extremely large analysis designs. As in all other modules of Statistica, all tests are integrated with graphs (including various scatterplots, specialized box-and-whisker plots, line plots, histograms, and many other 2D and 3D displays).


ANOVA/MANOVA

The ANOVA/MANOVA module includes a subset of the functionality of the General Linear Models module (part of Statistica Advanced Linear/Non-Linear Models) and can perform univariate and multivariate analysis of variance of factorial designs with or without one repeated measures variable. For more complicated linear models with categorical and continuous predictor variables, random effects, and multiple repeated measures factors, you need the General Linear Models module (stepwise and best-subset options are available in the General Regression Models module). In the ANOVA/MANOVA module, you can specify all designs in the most straightforward, functional terms of actual variables and levels (not in technical terms, e.g., by specifying matrices of dummy codes), so even less-experienced ANOVA users can analyze very complex designs with Statistica. Like the General Linear Models module, ANOVA/MANOVA provides three alternative user interfaces for specifying designs: (1) a Design Wizard that takes you step by step through the process of specifying a design, (2) a simple dialog-based user interface that allows you to specify designs by selecting variables, codes, levels, and any design options from well-organized dialogs, and (3) a Syntax Editor for specifying designs and design options using keywords and a common design syntax.

Computational methods. The program will use, by default, the sigma-restricted parameterization for factorial designs and apply the effective hypothesis approach (see Hocking, 1981) when the design is unbalanced or incomplete. Type I, II, III, and IV hypotheses can also be computed, as can Type V and Type VI hypotheses, which perform tests consistent with the typical analyses of fractional factorial designs in industrial and quality-improvement applications (see also the description of the Experimental Design module).
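A minimal worked instance of the simplest design this module handles, using SciPy (the three group means and sample sizes are invented): a one-way ANOVA on a single between-groups factor with three levels.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# One between-groups factor with three levels; group means and
# within-group spread are illustrative values.
level_1 = rng.normal(loc=10.0, scale=2.0, size=25)
level_2 = rng.normal(loc=12.0, scale=2.0, size=25)
level_3 = rng.normal(loc=14.0, scale=2.0, size=25)

# One-way ANOVA: tests the hypothesis of equal group means.
f_stat, p_val = stats.f_oneway(level_1, level_2, level_3)
```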

Results statistics

The ANOVA/MANOVA module is not limited in any of its computational routines for reporting results, so the full suite of detailed analytic tools available in the General Linear Models module is also available here. Results include summary ANOVA tables; univariate and multivariate results for repeated measures factors with more than two levels; the Greenhouse-Geisser and Huynh-Feldt adjustments; plots of interactions; detailed descriptive statistics; detailed residual statistics; planned and post-hoc comparisons; testing of custom hypotheses and custom error terms; and detailed diagnostic statistics and plots (e.g., histograms of within-cell residuals, homogeneity of variance tests, plots of means versus standard deviations, etc.).

Distribution Fitting

The Distribution Fitting options allow the user to compare the distribution of a variable with a wide variety of theoretical distributions. You may fit to the data the Normal, Rectangular, Exponential, Gamma, Lognormal, Chi-square, Weibull, Gompertz, Binomial, Poisson, Geometric, or Bernoulli distribution. The fit can be evaluated via the Chi-square test or the Kolmogorov-Smirnov one-sample test (the fitting parameters can be controlled); the Lilliefors and Shapiro-Wilk tests are also supported (see above). In addition, the fit of a particular hypothesized distribution to the empirical distribution can be evaluated in customized histograms (standard or cumulative) with overlaid selected functions; line and bar graphs of expected and observed frequencies, discrepancies, and other results can be produced from the output Spreadsheets. Other distribution fitting options are available in Statistica Process Analysis, where the user can compute maximum-likelihood parameter estimates for the Beta, Exponential, Extreme Value (Type I, Gumbel), Gamma, Log-Normal, Rayleigh, and Weibull distributions. Also included in that module are options for automatically selecting and fitting the best distribution for the data, as well as options for general distribution fitting by moments (via Johnson and Pearson curves). User-defined 2- and 3-dimensional functions can also be plotted and overlaid on the graphs. The functions may reference a wide variety of distributions, such as the Beta, Binomial, Cauchy, Chi-square, Exponential, Extreme value, F, Gamma, Geometric, Laplace, Logistic, Normal, Log-Normal, Pareto, Poisson, Rayleigh, t (Student), or Weibull distribution, as well as their integrals and inverses. Additional facilities to fit predefined or user-defined functions of practically unlimited complexity to the data are available in Nonlinear Estimation (available in Statistica Advanced Linear/Non-Linear Models).
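The fit-then-evaluate workflow described here can be sketched with SciPy (the simulated Weibull data and its shape/scale values are invented): a maximum-likelihood Weibull fit followed by a Kolmogorov-Smirnov evaluation of the fitted distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Simulated positive-valued data, roughly Weibull with shape 1.5, scale 2.
data = rng.weibull(a=1.5, size=500) * 2.0

# Maximum-likelihood fit of a Weibull distribution (location fixed at 0).
shape, loc, scale = stats.weibull_min.fit(data, floc=0)

# Kolmogorov-Smirnov one-sample test of the fitted distribution.
ks_stat, ks_p = stats.kstest(data, "weibull_min", args=(shape, loc, scale))
```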