
How to do linear regression analysis with SigmaPlot

    Introduction to regression analysis

    Are you looking to understand the power of regression analysis and how it can help you better understand relationships between variables? In this tutorial, I will show you how to do simple linear and multiple regression analyses with SigmaPlot. SigmaPlot includes many statistical methods and hundreds of regression equations to choose from, and you can add your own customized regression equation if needed. Hopefully, this tutorial will help you understand how regression analysis works and how you can apply it to your research.

    Linear regression analysis helps you connect the dots and tell a story from your data.

    SigmaPlot regression analysis video demonstration

    Regression analysis is invaluable for gaining insights and making better decisions based on available data while examining complex relationships.

    Regression analysis has four main uses: description, estimation, prediction, and control. It describes the relationship between dependent and independent variables, allows for estimation of the dependent variable based on observed independent variables, predicts outcomes and changes in the dependent variable based on their relationship, and controls the effect of one or more independent variables while examining the relationship between one independent variable and the dependent variable.


    Types of regression analysis

    There are many types of regression analysis, including:

    • Simple Linear Regression
      This involves modelling the relationship between a single independent variable and a dependent variable.
    • Multiple Linear Regression
      This involves modelling the relationship between multiple independent variables and a dependent variable.
    • Logistic Regression
      This is used for classification problems and models the probability of a binary outcome based on one or more independent variables.
    • Non-Linear Regression
      This involves modelling the relationship between an independent variable and a dependent variable as a non-linear function.

    These are some of the most commonly used regression analysis techniques, but many others can be used for specific applications or purposes. No matter the type, all forms of regression analysis investigate how one or more independent variables influence a dependent variable.

    Linear regression models, like simple linear and multiple linear, are the most common. But nonlinear regression analysis is often used for more complicated datasets in which the connection between dependent and independent variables is not linear.

    This tutorial demonstrates how to do simple linear and multiple linear regression in SigmaPlot. We will use example data about housing prices from an article about regression analysis with Excel. Hopefully, you will pick up a few tips and tricks along the way and see how easy and feature-rich regression analysis is in SigmaPlot.

    For the simple linear regression, we will be using the equation:

    • y = b + a*x

    And for the multiple linear regression, we will be using the equation (two independent variables):

    • y = b + a1*x1 + a2*x2

    y is the dependent variable, and x, x1, and x2 are the independent variables.

    By doing a regression analysis, we can estimate the unknown coefficients in the equations above (b and a, or b, a1 and a2) and then calculate the expected value of y (the dependent variable) for any given values of the independent variables.
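
    To make the simple equation y = b + a*x concrete, here is a minimal sketch of the least-squares calculation in plain Python. The data values and the function name fit_line are made up for illustration; they are not the tutorial's housing data, and this is not how SigmaPlot implements its fit internally.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b + a*x. Returns (b, a)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope
    b = y_mean - a * x_mean    # intercept
    return b, a


# Hypothetical illustration data (NOT the Saint Leo University data set)
square_feet = [1000, 1500, 2000, 2500, 3000]
price = [155, 195, 260, 290, 350]  # selling price in thousands

b, a = fit_line(square_feet, price)
predicted = b + a * 1800  # expected price for an 1800 sq ft house
print(f"b = {b:.2f}, a = {a:.4f}, predicted = {predicted:.1f}")
```

    Once b and a are known, predicting y for a new x is a single multiplication and addition, which is all the "expected value" in the text amounts to.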

    What is the purpose of regression analysis?

    Linear regression analysis is like making a jigsaw puzzle – you have the pieces, but need to figure out how to fit them together.

    Regression is a statistical technique utilized to determine the relationships between variables in a dataset, allowing for an evaluation of any connections’ strength and statistical significance. It can also be employed to forecast future outcomes based on past occurrences.

    Why is it called regression?

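    The name goes back to Francis Galton, who in the 19th century observed that the children of unusually tall or short parents tend to be closer to the average height, an effect now known as regression toward the mean. The method used to describe such relationships kept the name, and the line of best fit that describes the relationship between the variables is still called a “regression line”. Regression analysis aims to identify patterns in the data and use them to make predictions about future outcomes.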

    Which conditions must be satisfied for regression models to work properly?

    Regression analysis is simply a calculation carried out on isolated data. The interpretation of a regression’s output as a statistically meaningful quantity that indicates real-world relationships requires researchers to make various classical assumptions, such as:

    • Data sampling
      Ensuring the data sample is representative of the population and that the independent variables are measured with no error.
    • Model specification
      The first step is to specify the correct model. This involves deciding on the dependent variable, independent variables, and their functional forms.
    • Independence of observations
      The observations in the dataset should be independent of each other. This means that one observation’s value should not influence another’s value.
    • No multicollinearity
      The independent variables should not be highly correlated with each other. This is known as multicollinearity and can lead to unstable parameter estimates and difficulties in interpretation.
    • Normality of residuals
      The residuals (differences between the observed and predicted values) should be normally distributed. This assumption is important for hypothesis testing and making predictions.
    • Homoscedasticity
      The variance of the residuals should be constant across all levels of the independent variables. This is known as homoscedasticity. Non-constant variance is referred to as heteroscedasticity and can affect hypothesis testing and prediction accuracy.
    • No omitted variable bias
      All relevant independent variables should be included in the model. Omitting important variables can lead to omitted variable bias and incorrect parameter estimates.
    • Model fit
      The model should fit the data well. This can be assessed using goodness-of-fit statistics such as R-squared, adjusted R-squared, and the residual plot.
    • Causal inference
      Care should be taken when making causal inferences based on the results of a multiple regression analysis. The direction of causality should be established based on prior knowledge, experimental design, or additional analyses.
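
    One of the conditions above, the absence of multicollinearity, is easy to screen for before fitting anything. The sketch below computes the Pearson correlation between two candidate predictors and the variance inflation factor (VIF) that follows from it in the two-predictor case. The data are hypothetical, and the usual rule of thumb that a VIF above about 10 signals trouble is a convention, not a SigmaPlot setting.

```python
import math


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    u_mean = sum(u) / n
    v_mean = sum(v) / n
    cov = sum((a - u_mean) * (b - v_mean) for a, b in zip(u, v))
    su = math.sqrt(sum((a - u_mean) ** 2 for a in u))
    sv = math.sqrt(sum((b - v_mean) ** 2 for b in v))
    return cov / (su * sv)


# Hypothetical predictor columns (not the tutorial's data)
square_feet = [1100, 1400, 1600, 2000, 2300, 2700]
bedrooms = [2, 3, 3, 4, 4, 5]

r = pearson_r(square_feet, bedrooms)
# With exactly two predictors, each predictor's VIF is 1 / (1 - r^2)
vif = 1 / (1 - r ** 2)
print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

    In this made-up example, square footage and bedrooms are almost perfectly correlated, so including both would make the individual coefficients unstable. With more than two predictors, the VIF has to be computed from a regression of each predictor on all the others.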

    What mistakes do people make when working with regression analysis?

    It’s important to remember that just because there is a correlation between two things doesn’t necessarily mean that one is causing the other. This is a common mistake known as confusing correlation with causation. A typical example involving house selling prices and square footage is assuming that a larger house will always command a higher selling price. Other factors, such as the location, age, and condition of the property, may also affect the selling price. So always be wary of making causal claims based solely on correlation. It’s not always as simple as it seems!

    Avoid examining every variable available all at once. Doing so may result in the identification of nonexistent relationships. This concept is similar to flipping a coin. If you keep doing it enough times, you will eventually find patterns that are not real, such as a set of consecutive heads.
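
    The coin-flipping effect is easy to reproduce. The following sketch generates a dependent variable and 200 candidate "independent variables" that are all pure random noise, and then looks for the strongest correlation among them. Everything here (sample sizes, the seed, the variable names) is an arbitrary choice for illustration.

```python
import math
import random


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    u_mean = sum(u) / n
    v_mean = sum(v) / n
    cov = sum((a - u_mean) * (b - v_mean) for a, b in zip(u, v))
    su = math.sqrt(sum((a - u_mean) ** 2 for a in u))
    sv = math.sqrt(sum((b - v_mean) ** 2 for b in v))
    return cov / (su * sv)


random.seed(1)       # fixed seed so the run is repeatable
n_obs = 20           # observations per variable
n_candidates = 200   # unrelated candidate predictors

y = [random.gauss(0, 1) for _ in range(n_obs)]
candidates = [[random.gauss(0, 1) for _ in range(n_obs)]
              for _ in range(n_candidates)]

# Absolute correlation of every noise column with the noise response
correlations = [abs(pearson_r(x, y)) for x in candidates]
strongest = max(correlations)
# |r| above roughly 0.44 would pass a 5% significance test with 20 points
n_strong = sum(r > 0.44 for r in correlations)
print(f"strongest |r| = {strongest:.2f}, apparent hits = {n_strong}")
```

    None of these variables has any real relationship with y, yet about 5% of them are expected to clear a conventional 5% significance threshold purely by chance. This is why screening every available variable at once tends to surface relationships that do not exist.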

    Take caution when gathering data, considering how it is collected and if you can trust the data.

    It is important not to disregard the error term, as this can lead to a false sense of certainty about the analysed relationships. A regression model may explain 90% of the variation in the dependent variable (an R-squared of 0.90), but the results are still inherently uncertain, and the remaining 10% should not be overlooked.

    It’s important to trust your instincts and judgement. Consider whether the results align with your prior understanding of the situation. If anything seems off, question whether it’s due to incorrect data or a significant error. Pairing any regression analysis with observations is crucial to get the full picture. The best scientists examine both the data and real-world observations.

    Regression analysis example data

    In this tutorial, we will use data from a course on simple linear and multiple regression at Saint Leo University (link to original article). The (fictional) data set contains the selling price, the square footage, the number of bedrooms, and the age (in years) of houses sold in a neighbourhood in the past six months.

    Example data

    Our task is to find a model that predicts the selling price (dependent variable) based on the independent variables of square footage, number of bedrooms and age.

    By doing regression analysis on this dataset, we will try to answer the questions:

    Which independent variables will have the biggest effect on the selling price? Will it be the square footage, the number of bedrooms, or the age of the house? Will we get a better fit if we include all three independent variables, and if not, which two independent variables should we pick?

    Regression analysis is a technique for mathematically assessing which variables have an influence. It can answer questions such as: Which factors matter most? Which can be disregarded? How do the factors interact with each other? And, probably most importantly, how certain are we about all of these factors?

    Regression analysis with SigmaPlot

    1. Importing the data to SigmaPlot

    In this case, I only had access to the Saint Leo University PDF document and no access to the data file. To avoid wasting time entering the data manually into a SigmaPlot worksheet, I used our PDF management software, FineReader PDF, which has a screenshot reader tool that can extract data tables directly from any screenshot to Microsoft Excel.

    • FineReader PDF: Tools > Screenshot reader > [Send: Table to Excel]

    There are probably free tools out there doing the same. Try googling “screen capture data tables”. A decent screen-capturing tool is necessary when gathering data from different (old) sources. I can also recommend the screen-capturing tool Snagit for grabbing text from documents and images.

    SigmaPlot plays well with Microsoft Excel, so having the data in my Excel sheet, I can easily copy-paste it into my SigmaPlot worksheet. This, however, pastes the column titles in the first row and not in SigmaPlot’s column header/titles. To move them up into the column titles field:

    1. Select all data by clicking the worksheet corner cell (top-left corner) in your SigmaPlot worksheet.
    2. SigmaPlot [Worksheet]: Titles > Promote row [1] to titles > [Promote]

    Another way of doing this is by importing the Excel file to your SigmaPlot project.

    1. SigmaPlot button (top-left): File import > File import > Navigate to the Excel file and open it

    2. Visualise your data in SigmaPlot

    Visualizing your data is a crucial step in understanding and interpreting your results. With SigmaPlot, you have a powerful tool that can help you effectively visualize your data and better understand your results, whether you want to create simple scatter plots, histograms, or complex 3D surfaces.

    SigmaPlot scatter plots with regression lines

    Let’s visualise Square footage and Age against Price.

    1. SigmaPlot [Create graph]: Scatter > Simple Scatter – Regression
    2. Choose XY pair > Next
    3. Click X in the “Selected columns” field, and then click the top of your Square footage column in your worksheet, or select “2-Square footage” in the “Data for X” drop-down menu.
    4. Click Y in the “Selected columns” field, and then click the top of the Price column in your worksheet, or select “1-Price” in the “Data for Y” drop-down menu.
    5. Click Finish to create a scatter plot with a regression line in SigmaPlot.
    6. Switch back to the data worksheet in SigmaPlot, and do the same again, but choose “4-Age” for “Data for X” this time.
    7. You should now have a graph page with two scatter plots on top of each other in SigmaPlot. Right-click the graph page and choose “Layouts” > “2 up, 3.5″ x 3.5″ landscape”

    Please note that you can double-click any element on the graph page to edit it. For example, double-click the title to change the title text for each of your graphs, double-click an axis to change labels and tick marks, or click-drag the legend boxes to place them underneath the “X Data” text.

    3. Analyse your data and find the best subset for your regression

    Finding the best subset of independent variables for regression analysis is an important step in ensuring the accuracy and robustness of your results. In our case, we have three candidate independent variables: square footage, number of bedrooms, and age. Which of these correlates most strongly with the price, and are they all relevant to our study?

    Best subset regression analysis

    SigmaPlot provides a range of diagnostic tools that allow you to identify influential observations and check the assumptions of your regression model. These tools can help you to refine your analysis and improve the robustness of your results. We will use the “Best Subset Regression” analysis tool in this case.

    1. SigmaPlot [Analysis]: Tests > Regressions > Best Subset…
    2. Choose “Price” as the Dependent variable, and then choose Square footage, Bedrooms and Age as the independent variables for this test.
    3. Click Finish to create the report in SigmaPlot.

    Reading the report, we find that the Best Subset for our regression data is:

    • One variable (simple linear regression): Age
    • Two variables (multiple linear regression): Square footage and Age

    The Best Subset report also shows that we do not get a better regression model by including the number of bedrooms. R-squared is the same with two and with three independent variables, but adjusted R-squared is higher with only the two variables Square footage and Age.
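
    Why adjusted R-squared can drop when a variable is added, even though R-squared stays the same, follows directly from its definition. Here is a minimal sketch of the standard formula; the R-squared value and the number of observations are hypothetical placeholders, not the figures from the SigmaPlot report.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R-squared: penalises R-squared for each added predictor."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


r2 = 0.80    # hypothetical R-squared, identical for both models
n_obs = 20   # hypothetical number of houses in the data set

adj_two = adjusted_r_squared(r2, n_obs, 2)    # e.g. Square footage + Age
adj_three = adjusted_r_squared(r2, n_obs, 3)  # ... + Bedrooms
print(f"2 predictors: {adj_two:.4f}, 3 predictors: {adj_three:.4f}")
```

    With the same R-squared, the model with three predictors always gets the lower adjusted value, because the extra variable uses up a degree of freedom without explaining any additional variation.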

    4. Simple linear regression using SigmaPlot’s Regression Wizard

    Simple linear regression is a technique in which the relationship between a dependent variable and a single independent variable is modelled as a straight line.

    The simple linear model is expressed using the following equation:

    • Y = y0 + aX + ϵ

    Where:

    • Y is the dependent variable
    • X is the independent (explanatory) variable
    • y0 is the Y-axis intercept
    • a is the slope of the regression line
    • ϵ is the residual (error)

    SigmaPlot graph with 95% confidence and prediction bands.

    To perform a simple linear regression using SigmaPlot and the regression wizard, follow these steps:

    1. Start SigmaPlot and import or paste the worksheet with your data.
    2. In the “Analysis” menu, select “Regression Wizard”. This will open the Regression Wizard.

      Please note that the SigmaPlot Regression Wizard includes hundreds of models to choose from. The models are neatly categorized by equation type, and when you select any of the models, you get a nice visual representation of the equation and graph.
    3. Select “Polynomial” as the Equation category and “Linear” as the Equation name. This is the type of regression analysis we want to perform.
    4. Click Next to the second step, where we will select our independent and dependent variables.
    5. Select the two variables you want to use for your regression analysis. The first variable will be the independent variable, Square footage, and the second variable will be the dependent variable, Price.

      Please note that you can click-select a variable in the “Variables” field, select the data column in the Variable columns drop-down menu above it, or click a column in your worksheet to select the data.
    6. After selecting Square footage as the x variable and Price as the y variable, click Next.
    7. In the third step, you will find the number of iterations SigmaPlot used to fit your equation, together with the calculated R-squared, sum of squares, y0 and a values.
    8. Click Next to go to the next step.
    9. In the fourth step, you can specify any additional results and options for your regression analysis, such as whether you want to include a report and residual values in your regression analysis.
    10. Click Next to go to the next step.
    11. In the fifth step, choose how you want to display your result graph. You can add 95% confidence and prediction bands, extend the fit, and add the equation to the graph title.
    12. Click “Finish” to complete the regression analysis and display the results.

    SigmaPlot will create a scatter plot of your data with the regression fit line and, if you selected them, 95% confidence and prediction bands. If you chose to create a report, you will also find a Regression report sheet with all statistical test results for your analysis.
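
    For readers curious about what the two bands on the graph represent, the sketch below computes them by hand for the model Y = y0 + aX using the textbook formulas. The data are made up, the function name is hypothetical, and the t critical value 2.306 (95%, two-sided, 8 degrees of freedom) is hard-coded because it matches the 10 sample points; SigmaPlot's own calculation may differ in detail.

```python
import math


def fit_with_bands(xs, ys, x0, t_crit):
    """Fit Y = y0 + a*X and evaluate both bands at x0.

    Returns (fitted value, confidence half-width, prediction half-width).
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    a = sxy / sxx
    y0 = y_mean - a * x_mean
    # Residual standard error with n - 2 degrees of freedom
    sse = sum((y - (y0 + a * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    spread = 1 / n + (x0 - x_mean) ** 2 / sxx
    conf_half = t_crit * s * math.sqrt(spread)      # band for the mean of Y
    pred_half = t_crit * s * math.sqrt(1 + spread)  # band for a new point
    return y0 + a * x0, conf_half, pred_half


# Hypothetical data: 10 points, so n - 2 = 8 degrees of freedom
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
T_CRIT_DF8 = 2.306  # 95% two-sided t value for 8 degrees of freedom

center = fit_with_bands(xs, ys, 5.5, T_CRIT_DF8)  # at the mean of X
edge = fit_with_bands(xs, ys, 10, T_CRIT_DF8)     # at the edge of the data
print("at x = 5.5:", center)
print("at x = 10: ", edge)
```

    The confidence band describes the uncertainty of the fitted line itself, while the prediction band describes where a single new observation is expected to fall, so the prediction band is always the wider one. Both bands are narrowest at the mean of X and widen toward the edges of the data.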

    This is a basic overview of performing a simple linear regression with the Regression Wizard in SigmaPlot. Please refer to the SigmaPlot User’s Guide or help file for more detailed information and options.

    5. Multiple linear regression with SigmaPlot

    In numerous situations, a single variable may not be adequate to account for variation in Y. A multivariable linear regression can then be implemented to evaluate the effect of multiple variables on the result.

    In a multiple regression model, the dependent variable Y is described as a linear combination of the independent variables X1 to Xn, given by: Y = y0 + a1*X1 + a2*X2 + … + an*Xn, where y0 is the intercept and a1 to an are the regression coefficients.

    Multiple linear regression is essentially the same as simple linear regression, except that several independent variables are used in the model. It is subject to the same conditions as the simple linear model, with one addition: the independent variables should be only weakly correlated with each other. If they are strongly correlated, it becomes difficult to measure accurately how each one relates to the dependent variable.
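
    As a rough illustration of what a multiple linear fit computes, the sketch below solves the least-squares normal equations for two independent variables in plain Python. The helper names are hypothetical, and the data are invented so that they lie exactly on the plane price = 50 + 100*x1 - 1*x2, which makes it easy to check that the coefficients are recovered. Real data contain noise, and SigmaPlot's implementation is not shown here.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_multiple(columns, y):
    """Least-squares fit of y = y0 + a1*x1 + ... Returns [y0, a1, ...]."""
    # Design matrix rows: a leading 1 for the intercept, then the predictors
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented data lying exactly on price = 50 + 100*sqft_k - 1*age
sqft_k = [1.0, 1.5, 2.0, 2.5, 3.0, 1.8]  # square footage in thousands
age = [30, 10, 20, 5, 15, 25]            # age in years
price = [120, 190, 230, 295, 335, 205]   # price in thousands

y0, a1, a2 = fit_multiple([sqft_k, age], price)
print(f"y0 = {y0:.3f}, a1 = {a1:.3f}, a2 = {a2:.3f}")
```

    If two predictor columns were strongly correlated, the matrix passed to solve would be close to singular and the coefficients would become unstable, which is the numerical face of the multicollinearity problem described above. This sketch does not guard against that case.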

    The Best Subset Regression analysis for our data showed that the best pair of independent variables was Square footage and Age, so we will use these two variables for our multiple regression analysis with SigmaPlot below.

    1. SigmaPlot [Analysis]: Tests > Regression > Multiple Linear
    2. Choose “Price” as the dependent variable and Square footage and Age as the two independent variables.
    3. Click Finish to get your statistical regression report in SigmaPlot.

      Please note that since we only have two independent variables, you can also create a graph of your data.
    4. Select the “Multiple Regression Report” worksheet.
    5. SigmaPlot [Analysis]: Create Result Graph > Choose “3D Scatter and Mesh” in the Select Result Graph list, and click OK.

      Please note that you can double-click the graph and choose “Rotation” to change rotation and perspective by dragging the viewing angle handles.

    Key Takeaways

    Regression analysis using SigmaPlot can provide valuable insights into the relationship between two or more variables. A key takeaway is that the results of the regression model can be used to predict values of the dependent variable from the values of the independent variables.

    Additionally, the coefficient values and p-values from the regression analysis can be used to determine the significance of each independent variable in explaining the variability in the dependent variable. It is important to carefully assess the assumptions of linearity, homoscedasticity, and normality and to consider transforming the variables or using non-linear regression methods if these assumptions are violated.