Description
The Book
Regression is a powerful data-analysis technique for modeling relationships between variables, central to prediction, decision-making, and pattern recognition. This book offers an accessible introduction to regression modeling, tailored for postgraduate students in fields such as data science, engineering, statistics, mathematics, business, and the sciences. It simplifies complex mathematical concepts and emphasizes real-world applications, complemented by coding examples that reinforce key ideas.
The book covers classical regression methods, including simple and multiple linear regression, polynomial regression, and logistic regression. It also addresses regression diagnostics, such as model evaluation, outlier detection, and assessment of model assumptions. Its distinctive perspective comes from integrating these classical methods with modern machine learning: support vector regression, decision trees, and artificial neural networks (ANNs) are introduced for regression tasks, with practical examples showing how they complement classical approaches. Advanced methods such as Ridge, Lasso, Elastic Net, Principal Component Regression, and Generalized Linear Models (GLMs) are also explored. These techniques are demonstrated using Python libraries such as Statsmodels and Scikit-learn, enabling students to engage in hands-on learning.
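As a flavour of the coding style described, here is a minimal scikit-learn sketch (using made-up data, not an example from the book) comparing an ordinary least squares fit with a Ridge fit:

```python
# Hypothetical illustration: fit y ≈ 3x + 2 with OLS and Ridge.
# Data are simulated here; the book works with real datasets.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=50)  # noisy linear response

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty shrinks the slope slightly toward zero

print(f"OLS slope:   {ols.coef_[0]:.3f}, intercept: {ols.intercept_:.3f}")
print(f"Ridge slope: {ridge.coef_[0]:.3f}, intercept: {ridge.intercept_:.3f}")
```

With only one predictor and a small penalty the two fits are nearly identical; the value of shrinkage methods such as Ridge becomes apparent with many correlated predictors, a topic the book treats in its chapter on advanced methods.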
The Author
Dr KC James is a Professor in the Department of Statistics, Cochin University of Science and Technology. With over 34 years of experience, he has taught a wide range of courses spanning Engineering, Management, Statistics, and Data Science. Alongside his extensive teaching and research career, he has authored five books to date.
Contents
- Learning Objectives
- Introduction
- Regression Applications in Industry and Business
- Regression analysis in engineering domains
- A brief history of regression
- Various types of regression
- The requirement of system and domain knowledge
- Steps in Regression Analysis
- Discussion: Illusions in regression analysis
- Summary
- References
- Learning Objectives
- Scatterplots: Unveiling Variable Relationships
- Scatterplot Matrices: Condensing Insights
- Other useful plots
- Basic terms
- Key Steps in Linear Regression Modelling
- Summary
- References
- Learning Objectives
- Introduction
- Simple Linear Regression model
- Estimating the Population Slope and Intercept
- In matrix form
- Estimating the Variance of the Random Error Term
- Assumptions and Inferences in Regression Analysis
- Confidence Intervals
- Inference about slope
- Inference about intercept
- Standard Regression Output
- Prediction Intervals
- Analysis of Variance
- Coefficient of determination
- Example: ANOVA Table
- Summary
- Exercises
- References
- Learning Objectives
- Multiple regression model
- Matrix Approach to Regression
- Sampling distribution
- Confidence Intervals
- Hypothesis Testing
- Analysis of variance
- Testing reduced model
- Discussion
- Summary
- Exercises
- References
- Learning Objectives
- Introduction
- Choosing the appropriate degree
- Regression and designed experiments
- Model adequacy
- Limitations and Considerations
- Example
- Piecewise Polynomial Fitting (Splines)
- Cubic spline interpolation
- Summary
- Exercise
- References
- Learning Objectives
- Introduction
- Checking key assumptions of regression models
- Assessing Model Fit
- Model Validity
- Regression Diagnostics
- Diagnostics for Predictor Variable
- Properties of Residuals
- Diagnostic plots of residuals
- Fun
- Diagnostics for Leverage and Influence
- Independence
- Homoscedasticity (Constant Variance)
- Multicollinearity
- How to address multicollinearity
- Transformations
- Model Selection
- Summary
- Exercises
- Cheatsheet: Common issues encountered in Ordinary Least Squares (OLS) regression analysis
- References
- Learning Objectives
- Introduction
- One-hot encoding
- Example
- Interaction effects
- Summary
- Exercise
- References
- Learning Objectives
- Introduction
- Logistic regression and Business Analytics
- Logistic Regression Model
- Example: Plotting least squares and sigmoidal curves
- The likelihood function
- Inference
- Odds ratio
- Statistical Tests and Metrics in Logistic Regression
- Polychotomous predictor
- Ordinal Logistic Regression
- Miscellaneous
- Evaluating Classification Models
- Logistic regression diagnostics
- Stepwise logistic regression
- Discussion: Warning signs on financial status and predictive ability
- Discussion: Multi Logistic Regression (MLR) to predict the performance of shares in the Indian stock market
- Summary
- Exercises
- References
- Learning Objectives
- Introduction
- Scaling and Normalisation
- Data snooping, splitting and cross-validation
- Ordinary Least Squares vs machine learning techniques
- Loss functions
- The bias-variance dilemma
- Basis functions
- The curse of dimensionality
- A comparison of classical regression and machine learning regression
- Regression-oriented machine learning methods
- Learning Process
- Performance metrics
- Notes on the gradient descent method
- Summary
- Exercise
- References
- Learning Objectives
- Introduction
- Construction of Decision Trees
- Key Considerations
- Regression Trees
- Regression Trees: Partitioning, Modeling, and Interpretation
- Pruning
- Ensemble Learning Techniques: Bagging, Random Forests, Boosting, and Bayesian Additive Regression Trees
- Summary
- Exercise
- References
- Learning Objectives
- Introduction
- Key terms
- Extension to Regression
- Mathematical formulation of Support Vector Regression
- Solving the Quadratic Programming Problem
- Kernel trick
- Performing Support Vector Regression (SVR) in Python
- Example
- Summary
- Exercise
- References
- Learning Objectives
- Introduction
- Neural networks as a type of nonparametric regression model
- Practical considerations for regression and neural networks
- Neural network modelling steps
- Backpropagation and Training
- Training considerations
- Backpropagation algorithm
- Bias
- Effect of epochs: Simple example
- Using Python for neural networks
- ANN: Advantages and disadvantages for regression tasks
- Discussion: A comparison of some regression methods and ANN with RBF and ReLU activation functions
- Summary
- Exercises
- References
- Learning Objectives
- Introduction
- Ridge regression
- Least Absolute Shrinkage and Selection Operator
- Least Angle Regression (LARS)
- Elastic Net regression
- Principal component regression
- Weighted Least Squares (WLS) regression
- Robust regression
- Locally Weighted Regression (LWR)
- Bayesian regression
- Poisson Regression in Generalized Linear Models (GLMs)
- Generalized Linear Models (GLMs)
- Summary
- Exercises
- References
- Learning Objectives
- Introduction to Python
- Pandas
- Seaborn and Matplotlib
- DataFrame Basics
- DataFrame manipulations
- Storing and retrieving data
- Plotting
- Example 1: Scatter Plot with Matplotlib
- Example 2: Line Plot with Matplotlib
- Example 3: Histogram with Seaborn
- Example 4: Bar Chart with Seaborn
- Control Plots with Matplotlib
- Control Plots with Seaborn
- Scikit-learn
- Introduction to miscellaneous Python concepts
- Matrix operations in NumPy
- Example: USA Housing Data
- Summary
- Exercise: Mini project
- References