Regression analysis plays a central role in statistics and our understanding of the world. Linear regression models are the simplest type of regression and an understanding of them is an essential basis for more advanced models. In this article we will show how to use Excel to generate data from a simple linear regression model and illustrate how the statistical methods behave both when the fundamental assumptions of the model hold and when the fundamental assumptions are violated. The advantage of the using the program Excel is that when you press the recalculate button, under the Formulas menu, the data that is generated at random will be regenerated, statistical calculations will be recalculated and relevant graphs will be redrawn. Least squares is the statistical technique typically used when assumptions are satisfied. A statistical technique used when the normality assumption is violated is the non-parametric technique introduced by Kendall and Theil. The latter is useful when data are skewed or heteroskedastic, and is as powerful as least squares regression for Normally distributed data. Exercises are provided to illustrate both these procedures. In these exercises we generate samples of a Simple Linear Regression where the error term could follow a Normal distribution or the heavy tailed t-distribution.
Published in | American Journal of Theoretical and Applied Statistics (Volume 10, Issue 4) |
DOI | 10.11648/j.ajtas.20211004.13 |
Page(s) | 194-201 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2021. Published by Science Publishing Group |
Excel, Regression, Least Squares, Gauss-Markov, Non-parametric, Kendall-Theil
[1] | Frost, J (2020) Regression analysis: an intuitive guide for using and interpreting linear methods. Jim Publishing. |
[2] | Johnson R. A. Wichern D. W. (2002), Applied Multivariate Statistical Analysis. Prentice hall (see chapter 7). |
[3] | Gelman, A., Hill, J., & Vehtari, A (2020) Regression and other stories. Cambridge University Press. |
[4] | Cantrell, J (2018) Simple linear regression in Power B1. https://www.bluegranite.com/blog/simple-linear-regression-in-power-bi |
[5] | Kassambara, A (2020) Linear regression essentials in R. http://www.sthda.com/english/articles/40-regression-analysis/165-linear-regression-essentials-in-r/ |
[6] | Stojiljovic, M (2020) Linear regression in Python. https://realpython.com/linear-regression-in-python/#reader-comments. |
[7] | Fox, J., & Weisberg, S (2018) An R companion to applied regression (3rd edition). Sage Publications. |
[8] | Laverty WH, Miket MJ, Kelly IW., (2002) “Simulation of hidden Markov models with EXCEL” Journal of Royal Statistical Society: Series D. Vol 51 Issue 1, 31-40. |
[9] | Laverty WH, Kelly IW. (2018), “Using Excel to Simulate and Visualize Conditional Heteroskedastic Models” American Journal of Theoretical and Applied Statistics 2018; 7 (6): 242-246. |
[10] | Laverty WH, Kelly IW. (2019), “Using Excel to Visualize State Identification in Hidden Markov Models Using the Forward and Backward Algorithms” Applied Mathematical Sciences, Vol. 13, 2019, no. 4, 151 -162. |
[11] | Theil, H. (1950), "A rank-invariant method of linear and polynomial regression analysis. I, II, III", Nederl. Akad. Wetensch., Proc., 53: 386–392, 521–525, 1397–1412. |
[12] | Sen, P. K (1968), "Estimates of the regression coefficient based on Kendall's tau", Journal of the American Statistical Association, 63 (324): 1379–1389, doi: 10.2307/2285891. |
[13] | Gavrilova, Y (2020) Regression analysis overview: the hows and whys. https://serokell.io/blog/regression-analysis-overview |
[14] | Tsay, R S. (2010) Analysis of Financial Time Series. Third Edition. Wiley. Hoboken (NJ). |
[15] | Cook, J.D. (2018) Statistical Odds and Ends Blog. https://statisticaloddsandends.wordpress.com/2018/03/03/t-distribution-as-a-mixture-of-normals/ |
[16] | Fishman, G.S. (1995) Monte Carlo, Concepts, Algorithms and Applications. Springer, Berlin. |
[17] | Farooqi, A. (2019), A Comparative Study of Kendall-Theil Sen, Siegel Vs Quantile Regression with Outliers” (2019). Wayne State Dissertations. 2352. |
APA Style
William Henry Laverty, Ivan William Kelly. (2021). Exploring the Effects of Assumption Violations on Simple Linear Regression and Correlation Using Excel. American Journal of Theoretical and Applied Statistics, 10(4), 194-201. https://doi.org/10.11648/j.ajtas.20211004.13
ACS Style
William Henry Laverty; Ivan William Kelly. Exploring the Effects of Assumption Violations on Simple Linear Regression and Correlation Using Excel. Am. J. Theor. Appl. Stat. 2021, 10(4), 194-201. doi: 10.11648/j.ajtas.20211004.13
AMA Style
William Henry Laverty, Ivan William Kelly. Exploring the Effects of Assumption Violations on Simple Linear Regression and Correlation Using Excel. Am J Theor Appl Stat. 2021;10(4):194-201. doi: 10.11648/j.ajtas.20211004.13
@article{10.11648/j.ajtas.20211004.13, author = {William Henry Laverty and Ivan William Kelly}, title = {Exploring the Effects of Assumption Violations on Simple Linear Regression and Correlation Using Excel}, journal = {American Journal of Theoretical and Applied Statistics}, volume = {10}, number = {4}, pages = {194-201}, doi = {10.11648/j.ajtas.20211004.13}, url = {https://doi.org/10.11648/j.ajtas.20211004.13}, eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20211004.13}, abstract = {Regression analysis plays a central role in statistics and our understanding of the world. Linear regression models are the simplest type of regression and an understanding of them is an essential basis for more advanced models. In this article we will show how to use Excel to generate data from a simple linear regression model and illustrate how the statistical methods behave both when the fundamental assumptions of the model hold and when the fundamental assumptions are violated. The advantage of the using the program Excel is that when you press the recalculate button, under the Formulas menu, the data that is generated at random will be regenerated, statistical calculations will be recalculated and relevant graphs will be redrawn. Least squares is the statistical technique typically used when assumptions are satisfied. A statistical technique used when the normality assumption is violated is the non-parametric technique introduced by Kendall and Theil. The latter is useful when data are skewed or heteroskedastic, and is as powerful as least squares regression for Normally distributed data. Exercises are provided to illustrate both these procedures. In these exercises we generate samples of a Simple Linear Regression where the error term could follow a Normal distribution or the heavy tailed t-distribution.}, year = {2021} }
TY - JOUR T1 - Exploring the Effects of Assumption Violations on Simple Linear Regression and Correlation Using Excel AU - William Henry Laverty AU - Ivan William Kelly Y1 - 2021/08/30 PY - 2021 N1 - https://doi.org/10.11648/j.ajtas.20211004.13 DO - 10.11648/j.ajtas.20211004.13 T2 - American Journal of Theoretical and Applied Statistics JF - American Journal of Theoretical and Applied Statistics JO - American Journal of Theoretical and Applied Statistics SP - 194 EP - 201 PB - Science Publishing Group SN - 2326-9006 UR - https://doi.org/10.11648/j.ajtas.20211004.13 AB - Regression analysis plays a central role in statistics and our understanding of the world. Linear regression models are the simplest type of regression and an understanding of them is an essential basis for more advanced models. In this article we will show how to use Excel to generate data from a simple linear regression model and illustrate how the statistical methods behave both when the fundamental assumptions of the model hold and when the fundamental assumptions are violated. The advantage of the using the program Excel is that when you press the recalculate button, under the Formulas menu, the data that is generated at random will be regenerated, statistical calculations will be recalculated and relevant graphs will be redrawn. Least squares is the statistical technique typically used when assumptions are satisfied. A statistical technique used when the normality assumption is violated is the non-parametric technique introduced by Kendall and Theil. The latter is useful when data are skewed or heteroskedastic, and is as powerful as least squares regression for Normally distributed data. Exercises are provided to illustrate both these procedures. In these exercises we generate samples of a Simple Linear Regression where the error term could follow a Normal distribution or the heavy tailed t-distribution. VL - 10 IS - 4 ER -