Essential Regression and Experimental Design

for Chemists and Engineers

Preface, Table of Contents  Back to My Homepage: English, Deutsch

Essential Regression Picture

David D. Steppan, Joachim Werner, Robert P. Yeater

Copyright 1998

 

To our wives.

Thank you for all your support and tolerance of our programming addiction.

 

 

 

Preface

Back to top

Essential Regression and Experimental Design for Chemists and Engineers was developed as an easy-to-use book with an accompanying software package which allows non-statisticians to analyze experimental designs and quantitative data using polynomial and multiple linear regression in a straightforward and understandable manner. From our experience as chemists and engineers, these two variations of regression analysis are the ones used most often to analyze data. They are the "essential" tools in data analysis. Recognizing the widespread use of Microsoft Office software, we developed Essential Regression software as a MS Excel Add-In (compiled Excel Macro). The user can work in the familiar and powerful data analysis environment of Excel and does not have to learn a new statistical software package. Other benefits from working directly in Excel are that it trivializes some of the most time consuming steps of regression analysis when compared to large conventional packages because the entire input and output of the regression lies within a standard spreadsheet workbook which eliminates the need to learn a new interface. They include:

  1. setting up data input tables
  2. creation, customization and printing of graphs
  3. transfer of the regression analysis to other software packages (word processors, presentation software) for a final report
  4. printing, saving, and recalling old results

 

The book and software are intuitive and guide the reader through the process of setting up a regression model and analyzing it. The software also contains an on-line help file which contains thumbnail descriptions of the significance of the output of the regression analysis and detailed instructions on how to use the software. This help file is no substitute for reading and understanding the book.

 

The book and software describe and implement all the tools needed for a complete linear regression analysis. Up to about 20 independent variables or regressors can be selected in a multiple regression, and second and third order models (including interactions) can easily be set up using the built-in dialogs. In the Polynomial Regression module, up to ninth order polynomials can be constructed. There are limitations with respect to the number of data points. The accompanying software is best suited for small and intermediate data sets of 50 to several 100 data points. This is a size which most often occurs in "everyday problems" encountered by students and scientists. It was not developed to handle large data sets of several thousand and more data points used by, for example, sociologists or pharmacological researchers.

 

The following approach is repeated throughout the book. A theoretical discussion of a statistical technique is presented followed by chapters which explain the features of the software pertaining to the theory discussed before. The sequence in which the theory is introduced follows an order which is most likely employed by the user: introduction to regression and types of models, ANOVA, hypothesis testing, outlier analysis, and graphical evaluation including surface plots. At the end of the book, a tutorial is included with data sets (also included in the Excel spreadsheets which come with the software) which are analyzed to illustrate the utility of the software. All the analyses presented can readily be reproduced by the reader. The book starts with the usual discussion of coefficient of variation and ANOVA analysis. It contains a variety of sections on different statistical parameters and residual analyses useful for model adequacy checking. For example there are sections on stepwise regression ("auto fitting") techniques, the effect of response and factor transforming, and the detection of outlier, influence and leverage points. Although the treatment of linear regression is very complete, the book is not intended as a fundamental theoretical textbook of linear regression aimed at statisticians. It is intended to teach regression to non-statisticians by applying linear regression to real data sets.

 

Experimental design is covered as it relates directly to regression analysis. This restricts the design package to factors and responses that are continuous, quantitative variables. Screening designs including full and fractional (Resolution 3-5) 2 level factorial and Plackett-Burman designs are covered. Response surface modeling (RSM) designs including face centered, circumscribed and inscribed central composite designs and Box-Behnken designs are included. The advantages and disadvantages of the various design types are covered. Advanced ideas such as aliasing, orthogonality, rotatability and sequential experimentation are explained.

 

The software accompanying Essential Regression and Experimental Design for Chemists and Engineers delivers all the tools necessary for a thorough, complete experimental design and linear regression analysis combined with easy handling and impressive output possibilities which rival the features of much more expensive and much less intuitive statistics packages.

 

Even as we go forward toward an electronic society, traditional publishing media (books) show no signs of being dethroned as the way to learn detailed technical concepts. However, books with illustrative examples and software that can be immediately applied do represent a vast improvement over a solely traditional approach. We believe that this "learning by doing" approach, along with a reasonably complete fundamental treatment represents an ideal way to learn new and useful technology. This is especially true for well-known and well defined concepts such as regression. We hope that you find Essential Regression and Experimental Design for Chemists and Engineers a good example of this new hybrid type of book.

Dave Steppan

Joachim Werner

Bob Yeater

Gibsonia, PA

Bethel Park, PA

New Martinsville, WV

April 1998

 

 

Contents

Back to top

1. Regression Models, Variables, Coefficients *

  • 1.1 Theoretical Background *
  • 1.1.1 Introduction to Linear Regression *

    1.1.2 Transformation of Variables *

    1.1.3 Regression Model Equations *

    1.1.4 The Least-Squares Method *

    1.1.5 Confidence Limits for Regression Coefficients and Observations *

    1.1.6 Intercept-free Regression Models *

  • 1.2 Application: Regress Menu, Input Dialogs of Essential Regression *

  • 1.2.1 Overview *

    1.2.2 Regress Menu *

    1.2.3 Multiple and Polynomial Regression Input Dialog Boxes *

  • 2. Tests for Significance of the Regression Model and Parameters *

  • 2.1 Theoretical Background *
  • 2.1.1 Introduction into Hypothesis Testing *

    2.1.2 Test for Significance of the Regression Model *

    2.1.3 Test of Significance on Individual Regression Coefficients *

    2.1.4 Test for Lack of Fit *

  • 2.2 Application: Multiple and Polynomial Regression Main Dialog (I): Model Term Selection, ANOVA, and Coefficients Table *

  • 2.2.1 Overview *

    2.2.2 Input Area of Main Dialog *

    2.2.3 Output Area of Main Dialog (I): ANOVA Table and Regression Coefficients Table *

  • 3. Regression Diagnostics and Model Adequacy Checking *

  • 3.1 Theoretical Background *
  • 3.1.1 Overview *

    3.1.2 Coefficients of Multiple Determination for Intercept Models *

    3.1.3 Coefficients of Multiple Determination for No-Intercept Models *

    3.1.4 Residuals, Standardized Residuals and Outliers *

    3.1.5 R2 for Prediction, Precision Index and Coefficient of Variation *

    3.1.6 Tests for Multicollinearity, Variance Inflation Factors *

    3.1.7 Autocorrelation *

  • 3.2 Application: Multiple and Polynomial Regression Main Dialog (II): Regression Summary, Residual Analysis, Outlier Analysis, and VIFs *

  • 3.2.1 Output Area of Main Dialog (II): Summary of Regression and VIFs *

    3.2.2 Outlier Button *

    3.2.3 Response Transformation in Essential Regression *

    . Graphs button *

  • 4. Model Optimization *

  • 4.1 Theoretical Background *
  • 4.1.1 The Problem of Finding the Best Regression Model *

    4.1.2 Performing All Possible Regressions and Criteria For Finding the Best Model *

    4.1.3 Stepwise Regression: Forward Selection of Variables *

    4.1.4 Stepwise Regression: Backward Elimination of Variables *

    4.1.5 Automatic Model Optimization *

    4.1.6 Transformation of the Response *

  • 4.2 Application: Multiple and Polynomial Regression Main Dialog (III): AutoRegress Area *

  • 4.2.1 Overview *

    4.2.2 Perform All Possible Regressions *

    4.2.3 Stepwise Regression in Essential Regression *

  • 5. Essential Regression Output *

  • 5.1 Graphical Evaluation of Residuals *

    5.2 Predicting Observations *

    5.3 Application: Essential Regression XLS Output Worksheet *

  • 5.3.1 Make XLS Button-Overview *

    5.3.2 ANOVA Table, Regression Coefficients Table, and Correlation Matrix *

    5.3.3 Tabular Output of Observations, Predictions, Residuals, and Outliers *

    5.3.4 Printed Output *

    5.3.5 Prediction of New Observations *

    5.3.6 Finding Input Variables for Given Output (Optimization Problem) *

    5.3.7 Graphs: Scatter Plots, Confidence Limits, 3D- Plots, and Animations *

    5.3.8 Deleting or Duplicating an Output Sheet *

    5.3.9 Starting A New Regression from Output Sheet *

  • 6. Experimental Design *

  • 6.1 Introduction *

    6.2 Screening Designs *

  • 6.2.1 Two Level Full Factorial Designs *

    6.2.2 Two Level Fractional Factorial Designs *

    6.2.3 Using the EED Software for a Two Level Fractional Factorial Design *

    6.2.4 Plackett-Burman Designs *

  • 6.3 Orthogonality and Rotatability *

    6.4 Response Surface Modeling (RSM) Designs *

  • 6.4.1 Inscribed Central Composite Designs *

    6.4.2 Circumscribed Central Composite Designs *

    6.4.3 Face Centered Central Composite Designs *

    6.4.4 Box-Behnken Designs *

  • 6.5 Summary *

  • 7. Quick Guide and Tutorial *

  • 7.1 Important Reminder *

    7.2 Installation *

    7.3 Loading Essential Regression into MS Excel *

    7.4 Performing a Regression Analysis using the ER_Test Data *

    7.5 Unloading Essential Regression *

    7.6 Loading Essential Experimental Design into MS Excel *

    7.7 Creating a simple experimental design and analyzing it with Essential Experimental Design (EED) *

    7.8 Unloading Essential Experimental Design *

  • 8. Literature *

    9. Index *

    Back to top