Title: Rank-Based Methods for Shrinkage and Selection
Author: A. K. Md. Ehsanes Saleh
Publisher: John Wiley & Sons Limited
Genre: Mathematics
ISBN: 9781119625421
2.3 Swiss fertility estimates and standard errors for least squares (LS) and rank (R).
2.4 Swiss data subset ordering using |t.value|.
2.5 Swiss data models with adjusted R² values.
2.6 Estimates with outliers from diabetes data before standardization.
2.7 Estimates, MSE, and MAE for the diabetes data.
2.8 Enet estimates, training MSE, and test MSE as a function of α for the diabetes data.
3.1 The ADRE values of ridge for different values of Δ².
3.2 Maximum and minimum guaranteed ADRE of the preliminary test R-estimator for different values of α.
3.3 The ADRE values of the Saleh-type R-estimator for λmax* = 2π and different Δ².
3.4 The ADRE values of the positive-rule Saleh-type R-estimator for λmax* = 2π and different Δ².
3.5 The ADRE of all R-estimators for different Δ².
4.1 Table of (hypothetical) corn crop yield from six different fertilizers.
4.2 Table of p-values from pairwise comparisons of fertilizers.
8.1 The VIF values of the diabetes data set.
8.2 Estimates for the diabetes data*. (The numbers in parentheses are the corresponding standard errors.)
11.1 LLR algorithm.
11.2 RLR algorithm.
11.3 Car data set.
11.4 Ridge accuracy vs. λ₂ with n = 337 (six outliers).
11.5 RLR-LASSO estimates vs. λ₁ with number of correct predictions.
11.6 Sample of Titanic training data.
11.7 Specifications for the Titanic data set.
11.8 Number of actual data entries in each column.
11.9 Cross-tabulation of survivors based on sex.
11.10 Cross-tabulation using Embarked for the Titanic data set.
11.11 Sample of Titanic numerical training data.
11.12 Number of correct predictions for Titanic training and test sets.
11.13 Train/test set accuracy for LLR-ridge. Optimal value at (*).
11.14 Train/test set accuracy for RLR-ridge. Optimal value at (*).
11.15 Train/test set accuracy for LLR-LASSO. Optimal value at (*).
11.16 Train/test set accuracy for RLR-LASSO. Optimal value at (*).
12.1 RNN-ridge algorithm.
12.2 Interpretation of the confusion matrix.
12.3 Confusion matrix for Titanic data sets using RLR (see Chapter 11).
12.4 Number of correct predictions (percentages) and AUROC of LNN-ridge.
12.5 Input (xᵢⱼ), output (yᵢ), and predicted values p̃(xᵢ) for the image classification problem.
12.6 Confusion matrices for RNNs and LNNs (test size = 35).
12.7 Accuracy metrics for RNNs vs. LNNs (test size = 35).
12.8 Train/test set accuracy for LNNs. F1 score is associated with the test set.
12.9 Train/test set accuracy for RNNs. F1 score is associated with the test set.
12.10 Confusion matrices for RNNs and LNNs (test size = 700).
12.11 Accuracy metrics for RNNs vs. LNNs (test size = 700).
12.12 MNIST training with 0 outliers.
12.13 MNIST training with 90 outliers.
12.14 MNIST training with 180 outliers.
12.15 MNIST training with 270 outliers.
12.16 Table of responses and probability outputs.
Foreword
It is my pleasure to write this foreword for Professor Saleh’s latest book, “Rank-Based Methods for Shrinkage and Selection with Application to Machine Learning”. I have known Professor Saleh for many decades as a leader in Canadian statistics and have looked forward to meeting him regularly at the Annual Meeting of the Statistical Society of Canada.
We are well into the golden age of probability and statistics with the emergence of data science and machine learning. Many decades ago, we could attract many bright students to this field of endeavor, but today the interest is overwhelming. The connection between theoretical and applied statistics is an important part of machine learning and data science. To engage fully in data science, one needs a solid understanding of both the theoretical and the practical aspects of probability and statistics.
The book is unique in presenting a comprehensive approach to inference in regression models based on ranks. It starts with the basics, which enables rapid understanding of the innovative ideas in the rest of the book. In addition to the more familiar aspects of rank-based methods such as comparisons among groups, linear regression, and time series, the authors show how many machine-learning tools can be made more robust via rank-based methods. Modern approaches to model selection, logistic regression, neural networks, elastic net, and penalized regression are studied through this lens. The work is presented clearly and concisely, and highlights many areas for further investigation.
Professor Saleh’s wealth of experience with ridge regression, Stein’s method, and preliminary test estimation informs the arc of the book, and he and his co-authors have built on his expertise to extend rank-based approaches to modern big-data and high-dimensional settings. Careful attention is paid throughout to both theoretical rigor and engaging applications. The applications are made accessible through detailed discussion of computational methods and their implementation in the R and Python computing environments. Its broad coverage of topics and careful attention to both theory and methods ensure that this book will be an invaluable resource for students and researchers in statistics.
The authors have identified many areas of useful future research that could be pursued by graduate students and practitioners alike. In this regard, this book is an important contribution to the field.