Below is an overview of the key concepts I learned, focusing on predictive modeling techniques and analytics tools with a practical approach using R programming. The topics I explored include:
Data Preprocessing: Techniques for cleaning and transforming raw data to make it suitable for analysis, including handling missing values and normalizing data.
Overfitting and Model Tuning: Understanding the risk of overfitting and employing strategies to optimize model performance through hyperparameter tuning, cross-validation, and regularization methods.
Supervised Methods:
Linear Regression: Analyzing relationships between dependent and independent variables using linear models.
Nonlinear Regression: Addressing more complex relationships between variables with nonlinear models.
Classification: Applying algorithms to classify data into categories, using techniques such as logistic regression, decision trees, and ensemble methods.
Unsupervised Methods:
Clustering: Grouping data into clusters to identify patterns and similarities, utilizing methods such as k-means and hierarchical clustering.
Principal Component Analysis (PCA): Reducing the dimensionality of data to simplify models while preserving essential variability.
Outlier Detection: Identifying and managing outliers to improve model accuracy and robustness.
Advanced Techniques:
Support Vector Machines (SVM): Leveraging SVM for both classification and regression tasks to find optimal decision boundaries and manage high-dimensional data.
Tree-Based Models: Implementing models such as decision trees, Random Forests, and Gradient Boosting to handle complex data structures and improve predictive accuracy.
Through this comprehensive approach, I gained the ability to choose, implement, and interpret predictive models for a variety of applications. I also developed the skills to create detailed and insightful data analysis reports, effectively communicating findings and supporting decision-making processes.
Student Dataset Case Study
R offers a wide range of functions for data preprocessing, calculation, manipulation, and graphical display, and can be easily extended with new functions through downloadable packages from the Comprehensive R Archive Network (CRAN).
As an example, the studentdata dataset from the LearnBayes package is used, containing 657 observations across 11 variables:
Student: student number
Height: height in inches
Gender: gender
Shoes: number of pairs of shoes owned
Number: number chosen between 1 and 10
Dvds: number of movie DVDs owned
ToSleep: time the person went to sleep the previous night (hours past midnight)
WakeUp: time the person woke up the next morning
Haircut: cost of last haircut including tip
Job: number of hours working on a job per week
Drink: usual drink at suppertime among milk, water, and pop
Install the LearnBayes package in R/RStudio and then access studentdata:
#Install the LearnBayes package
#Keep in mind that R is case-sensitive
#install.packages('LearnBayes')
#You just need to install once and then you can use it directly,
#so long as you load the LearnBayes package
library(LearnBayes)
#Access studentdata from the LearnBayes package
data(studentdata)
attach(studentdata)
#show part of data
head(studentdata)
After accessing the studentdata, we can now use R to answer the following questions:
The variable Dvds in the student dataset contains the number of movie DVDs owned by students in the class.
Construct a histogram of this variable using the hist command in R.
#?hist
# Construct a histogram of the Dvds variable
hist(Dvds, main = "DVDs Owned", xlab = "Number of DVDs", col = "red")
Summarize this variable using the summary command in R.
summary(Dvds)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.00 10.00 20.00 30.93 30.00 1000.00 16
Use the table command in R to construct a frequency table of the individual values of Dvds that were observed. If one constructs a barplot of these tabled values using the command barplot(table(Dvds), col='red') one will see that particular response values are very popular. Is there any explanation for these popular values for the number of DVDs owned?
# Barplot of the frequency table of Dvds values
barplot(table(Dvds), col = "red", main = "DVDs Owned", xlab = "Number of DVDs")
Based on the limited information provided, we can assume there are many reasons for the number of DVDs owned. Some of these reasons include, but are not limited to: DVD sales, the release of new or classic DVDs, students who collect DVDs, and DVDs received as gifts. To dive deeper into the analysis, it would be crucial to know the names of the movies; this information could help explain why certain DVD counts appear more often than others.
The variable Height contains the height (in inches) of each student in the class.
Construct parallel boxplots of the heights using the Gender variable. Hint: boxplot(Height~Gender)
boxplot(Height~Gender, main ="Height by Gender", ylab ="Height (inches)")
If one assigns the boxplot output to a variable output=boxplot(Height~Gender) then output is a list that contains statistics used in constructing the boxplots. Print output to see the statistics that are stored.
output=boxplot(Height~Gender, main ="Height by Gender", ylab ="Height (inches)")
On average, male students are 5.75066 inches taller than female students.
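As a quick check of that figure, the difference in mean heights can be computed directly. This is a hedged sketch: it assumes studentdata is still attached and that Gender is coded as "female"/"male".

# Difference in mean heights between male and female students (missing heights ignored)
mean(Height[Gender == "male"], na.rm = TRUE) -
  mean(Height[Gender == "female"], na.rm = TRUE)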
The variables ToSleep and WakeUp contain, respectively, the time to bed and wake-up time for each student the previous evening. (The data are recorded as hours past midnight, so a value of −2 indicates 10 p.m.)
Construct a scatterplot of ToSleep and WakeUp.
plot(ToSleep, WakeUp, main ="Scatterplot: ToSleep and WakeUp", xlab ="Sleep-Time", ylab ="Wake-up Time")
Find a least-squares fit to these data using the lm command and then place the least-squares fit on the scatterplot using the abline command.
plot(ToSleep, WakeUp, main = "Scatterplot: ToSleep and WakeUp",
     xlab = "Sleep Time", ylab = "Wake-up Time")
fit = lm(WakeUp ~ ToSleep)
summary(fit)
Call:
lm(formula = WakeUp ~ ToSleep)
Residuals:
Min 1Q Median 3Q Max
-4.4010 -0.9628 -0.0998 0.8249 4.6125
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.96276 0.06180 128.85 <2e-16 ***
ToSleep 0.42472 0.03595 11.81 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.282 on 651 degrees of freedom
(4 observations deleted due to missingness)
Multiple R-squared: 0.1765, Adjusted R-squared: 0.1753
F-statistic: 139.5 on 1 and 651 DF, p-value: < 2.2e-16
abline(fit, col='blue', lwd=2)
Analysis of Glass Identification Data: Exploratory Data Analysis and Model Development
library(mlbench)
library(kernlab)
library(ggplot2)
library(GGally)
library(corrplot)
library(gridExtra)
library(AppliedPredictiveModeling)
library(caret)
The UC Irvine Machine Learning Repository contains a data set related to glass identification. The data consists of 214 glass samples labeled as one of seven class categories. There are nine predictors, including the refractive index and percentages of eight elements: Na, Mg, Al, Si, K, Ca, Ba, and Fe.
The data can be accessed via
data(Glass)str(Glass)
'data.frame': 214 obs. of 10 variables:
$ RI : num 1.52 1.52 1.52 1.52 1.52 ...
$ Na : num 13.6 13.9 13.5 13.2 13.3 ...
$ Mg : num 4.49 3.6 3.55 3.69 3.62 3.61 3.6 3.61 3.58 3.6 ...
$ Al : num 1.1 1.36 1.54 1.29 1.24 1.62 1.14 1.05 1.37 1.36 ...
$ Si : num 71.8 72.7 73 72.6 73.1 ...
$ K : num 0.06 0.48 0.39 0.57 0.55 0.64 0.58 0.57 0.56 0.57 ...
$ Ca : num 8.75 7.83 7.78 8.22 8.07 8.07 8.17 8.24 8.3 8.4 ...
$ Ba : num 0 0 0 0 0 0 0 0 0 0 ...
$ Fe : num 0 0 0 0 0 0.26 0 0 0 0.11 ...
$ Type: Factor w/ 6 levels "1","2","3","5",..: 1 1 1 1 1 1 1 1 1 1 ...
a) Utilize suitable visualizations (employ any types of data visualization you deem appropriate) to explore the predictor variables, aiming to understand their distributions and relationships among them.
I used both histograms and boxplots; the plots clearly show a lot of outliers. Na and Al appear approximately normally distributed, but the other predictors appear skewed either to the left or to the right.
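The plotting code that produced those figures is not included in this write-up; a minimal base-R sketch that produces comparable histograms and boxplots (an assumption about the approach, not the original ggplot2 code) would be:

# Histograms and boxplots of the nine Glass predictors
par(mfrow = c(3, 3))
for (p in names(Glass)[1:9]) {
  hist(Glass[[p]], main = p, xlab = p, col = "lightblue")
}
par(mfrow = c(3, 3))
for (p in names(Glass)[1:9]) {
  boxplot(Glass[[p]], main = p, horizontal = TRUE)
}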
b) Do there appear to be any outliers in the data? Are any predictors skewed? Show all the work!
From the boxplots and histograms, all but one of the predictors show outliers with extreme values. Most of the predictors are skewed either to the left (negative) or to the right (positive). In short, we need to take the outliers and the distribution characteristics into account going forward.
Are there any relevant transformations of one or more predictors that might improve the classification model? Show all the work!
After applying the transformation, the model achieved an accuracy of 57.5% and a Kappa of 0.371, indicating a moderate level of agreement beyond chance. Looking at the observed classes, classes 1 and 2 are classified reasonably well while the rest need improvement. Refining the focus on the variables we observe may improve the accuracy further.
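The exact transformation applied above is not shown. One plausible pipeline (a hedged sketch using caret, not necessarily the original) is to center, scale, and apply a Yeo-Johnson transform, which handles the zero values in predictors such as Mg, Ba, and Fe while reducing the skewness noted earlier:

# Center/scale plus Yeo-Johnson to reduce skewness before refitting the classifier
pp <- preProcess(Glass[, -10], method = c("center", "scale", "YeoJohnson"))
Glass_trans <- predict(pp, Glass[, -10])
Glass_trans$Type <- Glass$Type   # reattach the class label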
Fit SVM model (You may refer to Chapter 4 material for details) using the following R codes: (This code will be discussed in detail in the following chapters)
set.seed(231)
sigDist <- sigest(Type ~ ., data = Glass, frac = 1)
sigDist
90% 50% 10%
0.03407935 0.11297847 0.62767315
svmTuneGrid <- data.frame(sigma = as.vector(sigDist)[1], C = 2^(-2:10))
svmTuneGrid
set.seed(231)
sigDist <- sigest(Type ~ ., data = Glass, frac = 1)
svmTuneGrid <- data.frame(sigma = as.vector(sigDist)[1], C = 2^(-2:10))
svmModel <- ksvm(Type ~ ., data = Glass, type = "C-svc",
                 kernel = "rbfdot",
                 kpar = list(sigma = as.vector(sigDist)[1]),
                 C = 2^(-2:10))
print(svmModel)
Support Vector Machine object of class "ksvm"
SV type: C-svc (classification)
parameter : cost C = 0.25
parameter : cost C = 0.5
parameter : cost C = 1
parameter : cost C = 2
parameter : cost C = 4
parameter : cost C = 8
parameter : cost C = 16
parameter : cost C = 32
parameter : cost C = 64
parameter : cost C = 128
parameter : cost C = 256
parameter : cost C = 512
parameter : cost C = 1024
Gaussian Radial Basis kernel function.
Hyperparameter : sigma = 0.0340793487610772
Number of Support Vectors : 205
Objective Function Value : -30.8971 -8.4786 -4.7642 -3.9839 -4.8778 -8.4545 -6.0399 -4.2908 -6.4624 -4.3643 -3.8975 -4.5021 -3.8506 -4.9211 -4.0869
Training error : 0.439252
set.seed(1056)
# Fit SVM model using repeated 10-fold cross-validation
svmFit <- train(Type ~ ., data = Glass,
                method = "svmRadial",
                preProcess = c("center", "scale"),
                tuneGrid = svmTuneGrid,
                trControl = trainControl(method = "repeatedcv", repeats = 5))
plot(svmFit, scales = list(x = list(log = 2)))
Predicting Meat Moisture Content Using Infrared Spectroscopy: Model Comparison and Evaluation
Infrared (IR) spectroscopy technology is used to determine the chemical makeup of a substance. The theory of IR spectroscopy holds that unique molecular structures absorb IR frequencies differently. In practice a spectrometer fires a series of IR frequencies into a sample material, and the device measures the absorbance of the sample at each individual frequency. This series of measurements creates a spectrum profile which can then be used to determine the chemical makeup of the sample material.
A Tecator Infratec Food and Feed Analyzer instrument was used to analyze 215 samples of meat across 100 frequencies. A sample of these frequency profiles is displayed in Fig. 6.20. In addition to an IR profile, analytical chemistry determined the percent content of water, fat, and protein for each sample. If we can establish a predictive relationship between IR spectrum and fat content, then food scientists could predict a sample’s fat content with IR instead of using analytical chemistry. This would provide cost savings, since analytical chemistry is a more expensive, time-consuming process.
a) Start R and use these commands to load the data:
library(caret)
data(tecator)
# use ?tecator to see more details
?tecator
The matrix absorp contains the 100 absorbance values for the 215 samples, while the matrix endpoints contains the percent of moisture, fat, and protein in columns 1–3, respectively. To be more specific:
# Assign the percent content to variables
moisture <- endpoints[, 1]
fat <- endpoints[, 2]
protein <- endpoints[, 3]
summary(moisture)
Min. 1st Qu. Median Mean 3rd Qu. Max.
39.30 55.55 65.70 63.20 71.80 76.60
#print(moisture)
summary(fat)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.90 7.30 14.00 18.14 28.00 49.10
#print(fat)
summary(protein)
Min. 1st Qu. Median Mean 3rd Qu. Max.
11.00 15.35 18.70 17.68 20.10 21.80
#print(protein)
# Check for missing values
sum(is.na(absorp))
[1] 0
sum(is.na(moisture))
[1] 0
sum(is.na(fat))
[1] 0
sum(is.na(protein))
[1] 0
b) In this example the predictors are the measurements at the individual frequencies. Because the frequencies lie in a systematic order (850–1,050nm), the predictors have a high degree of correlation. Hence, the data lie in a smaller dimension than the total number of predictors (215). Use PCA to determine the effective dimension of these data. What is the effective dimension?
# PCA on the absorp data
pca_model <- prcomp(absorp, center = TRUE, scale. = TRUE)
# Summary of PCA
summary(pca_model)
# Scree plot to visualize the variance explained by each component
screeplot(pca_model, type = "lines", main = "PCA Model")
The PCA results above (the summary output and the scree plot) show how much of the total variance is explained as principal components are added. The following observations come from the output:
PC1 explains 98.63% of the total variance.
PC2: 0.97% proportion variance, and 99.60% cumulative proportion.
PC3: 0.279% proportion variance, and 99.875% cumulative proportion.
PC4: 0.114% proportion variance, and 99.99% cumulative proportion.
Decision:
As shown in the scree plot and the summary, PC1 explains a very high percentage of the variance (98.63%), so the majority of the variability is captured by the first component. The remaining components explain small additional amounts of variance and are unlikely to provide meaningful information, so the effective dimension of these data is essentially one (or at most the first few components).
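That decision can be made explicit by looking at the cumulative variance directly; a small sketch based on the pca_model object fitted above:

# Cumulative proportion of variance explained by the principal components
var_explained <- pca_model$sdev^2 / sum(pca_model$sdev^2)
cum_var <- cumsum(var_explained)
# Smallest number of components needed to reach, e.g., 95% and 99% of the variance
which(cum_var >= 0.95)[1]
which(cum_var >= 0.99)[1]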
c) Split the data into a training and a test set with the response being the percentage of moisture, pre-process the data, and build at least three models described in this chapter (i.e., ordinary least squares, PCR, PLS, Ridge, and ENET). For those models with tuning parameters, what are the optimal values of the tuning parameter(s)?
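The code that produced the resampling results below is not reproduced in this write-up. Judging from the output headers ("152 samples, 2 predictor"), the models were apparently fit on a small number of principal component scores. A hedged sketch of one way to set this up (object names are illustrative, not the original code):

set.seed(123)
idx <- createDataPartition(moisture, p = 0.7, list = FALSE)   # ~152 training samples
pc_scores <- as.data.frame(pca_model$x[, 1:2])                # first two PCs from the PCA above
train_df <- data.frame(pc_scores[idx, ], moisture = moisture[idx])
ctrl <- trainControl(method = "cv", number = 10)

ols_fit   <- train(moisture ~ ., data = train_df, method = "lm")
pcr_fit   <- train(moisture ~ ., data = train_df, method = "pcr",    trControl = ctrl)
pls_fit   <- train(moisture ~ ., data = train_df, method = "pls",    trControl = ctrl)
ridge_fit <- train(moisture ~ ., data = train_df, method = "ridge",  trControl = ctrl)
enet_fit  <- train(moisture ~ ., data = train_df, method = "glmnet", trControl = ctrl)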
Linear Regression
152 samples
2 predictor
No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 152, 152, 152, 152, 152, 152, ...
Resampling results:
RMSE Rsquared MAE
8.521083 0.3199381 6.646874
Tuning parameter 'intercept' was held constant at a value of TRUE
Principal Component Analysis
152 samples
2 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 136, 136, 137, 138, 136, 138, ...
Resampling results:
RMSE Rsquared MAE
8.929481 0.2947719 7.356878
Tuning parameter 'ncomp' was held constant at a value of 1
Partial Least Squares
152 samples
2 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 136, 136, 137, 138, 136, 138, ...
Resampling results:
RMSE Rsquared MAE
8.918747 0.2967856 7.34553
Tuning parameter 'ncomp' was held constant at a value of 1
Ridge Regression
152 samples
2 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 136, 136, 137, 138, 136, 138, ...
Resampling results across tuning parameters:
lambda RMSE Rsquared MAE
0e+00 8.543739 0.3903416 6.778162
1e-04 8.543732 0.3903418 6.778158
1e-01 8.536797 0.3904599 6.774300
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was lambda = 0.1.
glmnet
152 samples
2 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 136, 136, 137, 138, 136, 138, ...
Resampling results across tuning parameters:
alpha lambda RMSE Rsquared MAE
0.10 0.01016872 8.541750 0.3903215 6.783427
0.10 0.10168717 8.540506 0.3903110 6.787156
0.10 1.01687174 8.533382 0.3900054 6.864132
0.55 0.01016872 8.542255 0.3902511 6.783899
0.55 0.10168717 8.541134 0.3901088 6.794327
0.55 1.01687174 8.569440 0.3871371 6.953858
1.00 0.01016872 8.542709 0.3902184 6.784242
1.00 0.10168717 8.541940 0.3898954 6.801561
1.00 1.01687174 8.628888 0.3822190 7.096575
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were alpha = 0.1 and lambda = 1.016872.
Summary:
OLS:
RMSE: 8.521083
Rsquared: 0.3199381
MAE: 6.646874
Tuning parameter held constant at a value of TRUE. (No tuning parameter)
PCR:
RMSE: 8.929481
Rsquared: 0.2947719
MAE: 7.356878
Tuning parameter held constant at a value of 1.
PLS:
RMSE: 8.918747
Rsquared: 0.2967856
MAE: 7.34553
Tuning parameter held constant at a value of 1.
Ridge Regression:
RMSE: 8.536797
Rsquared: 0.3904599
MAE: 6.774300
Tuning parameter the final value used for the model was lambda = 0.1.
Glmnet:
RMSE: 8.541750
Rsquared: 0.3903215
MAE: 6.783427
Tuning parameters alpha = 0.1 and lambda = 0.1016872
Based on the lowest RMSE, OLS and Ridge Regression are the better models. They also have higher R-squared values, explaining a larger proportion of the variance.
d) Which model has the best predictive ability? Is any model significantly better or worse than the others?
The models are ordered from best to worst in terms of RMSE (predictive performance), based on the results provided above:
1) OLS:
RMSE: 8.521083
Rsquared: 0.3199381
MAE: 6.646874
Tuning parameter held constant at a value of TRUE. (No tuning parameter)
2) Ridge Regression:
RMSE: 8.536797
Rsquared: 0.3904599
MAE: 6.774300
Tuning parameter the final value used for the model was lambda = 0.1.
3) Glmnet:
RMSE: 8.541750
Rsquared: 0.3903215
MAE: 6.783427
Tuning parameters alpha = 0.1 and lambda = 0.1016872
4) PLS:
RMSE: 8.918747
Rsquared: 0.2967856
MAE: 7.34553
Tuning parameter held constant at a value of 1.
5) PCR:
RMSE: 8.929481
Rsquared: 0.2947719
MAE: 7.356878
Tuning parameter held constant at a value of 1.
The models are ordered first by lowest RMSE and then by highest R-squared. Based on the criterion of lowest RMSE, the OLS model is the best: it has the lowest prediction error. The Ridge model is also a reasonable choice, since it has the second lowest RMSE and the highest R-squared, indicating minimal error and a large proportion of variance explained.
e) Explain which model you would use for predicting the percentage of moisture of a sample.
To predict the percentage of moisture in a sample, I would use the model with the lowest RMSE (the lowest predictive error) combined with a high R-squared (a large proportion of variance explained). Of the models above, the one that best fits these criteria is the Ridge model.
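Whichever model is chosen, its predictive ability should ultimately be confirmed on the held-out test samples. A hedged sketch, continuing the illustrative object names from the earlier split (ridge_fit, pc_scores, idx are assumptions, not objects shown in the original):

# Evaluate the chosen model on the held-out samples
test_df <- data.frame(pc_scores[-idx, ])
test_y  <- moisture[-idx]
ridge_pred <- predict(ridge_fit, newdata = test_df)
postResample(pred = ridge_pred, obs = test_y)   # RMSE, Rsquared, MAE on the test set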
Comparative Performance of Machine Learning Models on Friedman’s Benchmark Data: Analyzing kNN, MARS, Neural Networks, and SVM
7.2. Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data:

y = 10 sin(π x1 x2) + 20 (x3 − 0.5)^2 + 10 x4 + 5 x5 + N(0, σ^2)

where the x values are random variables uniformly distributed on [0, 1] (there are also 5 other non-informative variables created in the simulation). The package mlbench contains a function called mlbench.friedman1 that simulates these data:
Note: For this exercise, you need to consider at least three of the following models: kNN, MARS, Neural Network, and Support vector machines with a specified kernel.
library(earth)
library(e1071)
library(nnet)
set.seed(200)
# Generate training data
trainingData <- mlbench.friedman1(200, sd = 1)
# Convert the 'x' data from a matrix to a data frame; this also gives the columns names
trainingData$x <- data.frame(trainingData$x)
# Visualize the data using featurePlot
featurePlot(trainingData$x, trainingData$y)
# Generate test data: a large test set to estimate the true error rate with good precision
# This creates a list with a vector 'y' and a matrix of predictors 'x'
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)
Scatterplot Observations:
X1-X5: show a positive trend.
X6-X10: show no specific trend, no correlation
Tuning several models: kNN, MARS, Neural Network, and SVM.
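The tuning calls themselves are not reproduced here. The kNN fit that produced the output below was presumably something close to the following sketch (caret's train with centering, scaling, and a 10-value tuning grid; knnModel is the name the later predict step expects):

set.seed(200)
knnModel <- train(x = trainingData$x, y = trainingData$y,
                  method = "knn",
                  preProcess = c("center", "scale"),
                  tuneLength = 10)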
k-Nearest Neighbors
200 samples
10 predictor
Pre-processing: centered (10), scaled (10)
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
Resampling results across tuning parameters:
k RMSE Rsquared MAE
5 3.466085 0.5121775 2.816838
7 3.349428 0.5452823 2.727410
9 3.264276 0.5785990 2.660026
11 3.214216 0.6024244 2.603767
13 3.196510 0.6176570 2.591935
15 3.184173 0.6305506 2.577482
17 3.183130 0.6425367 2.567787
19 3.198752 0.6483184 2.592683
21 3.188993 0.6611428 2.588787
23 3.200458 0.6638353 2.604529
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was k = 17.
# Predict and evaluate the kNN model on the test set
knnPred <- predict(knnModel, newdata = testData$x)
# The function 'postResample' can be used to get the test set performance values
knnResults <- postResample(pred = knnPred, obs = testData$y)
cat("\n\n")
Multivariate Adaptive Regression Spline
200 samples
10 predictor
Pre-processing: centered (10), scaled (10)
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
Resampling results across tuning parameters:
nprune RMSE Rsquared MAE
2 4.383438 0.2405683 3.597961
3 3.645469 0.4745962 2.930453
4 2.727602 0.7035031 2.184240
6 2.331605 0.7835496 1.833420
7 1.976830 0.8421599 1.562591
9 1.804342 0.8683110 1.410395
10 1.787676 0.8711960 1.386944
12 1.821005 0.8670619 1.419893
13 1.858688 0.8617344 1.445459
15 1.871033 0.8607099 1.457618
Tuning parameter 'degree' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nprune = 10 and degree = 1.
Support Vector Machines with Radial Basis Function Kernel
200 samples
10 predictor
Pre-processing: centered (10), scaled (10)
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
Resampling results across tuning parameters:
C RMSE Rsquared MAE
0.25 2.564825 0.7797760 2.011238
0.50 2.357718 0.7938560 1.837232
1.00 2.223469 0.8096320 1.723875
2.00 2.136798 0.8217596 1.659346
4.00 2.084793 0.8287955 1.622207
8.00 2.067316 0.8310680 1.611923
16.00 2.065727 0.8311623 1.610359
32.00 2.065727 0.8311623 1.610359
64.00 2.065727 0.8311623 1.610359
128.00 2.065727 0.8311623 1.610359
Tuning parameter 'sigma' was held constant at a value of 0.062404
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were sigma = 0.062404 and C = 16.
Which models appear to give the best performance? Does MARS select the informative predictors (those named X1–X5)?
Performance Summary:
MARS:
Optimal nprune: 10
RMSE: 1.776575
R-squared: 0.872700
MAE: 1.358367
SVM:
Optimal C: 16
Optimal sigma: 0.068874
RMSE: 2.0889248
R-squared: 0.8232974
MAE: 1.5874122
Neural Network:
Optimal size: 1
Optimal decay: 0.1
RMSE: 2.6493162
R-squared: 0.7177209
MAE: 2.0295251
kNN:
Optimal k: 19
RMSE: 3.2286834
R-squared: 0.6871735
MAE: 2.5939727
I have ordered the models from best to worst performance based on low RMSE, high R-squared, and low MAE. In conclusion, the MARS model outperforms the other models: it has the lowest RMSE, the highest R-squared, and the lowest MAE. The SVM model is a strong contender, performing well just behind MARS. The neural network performs reasonably but has a noticeably higher RMSE, lower R-squared, and higher MAE than the top two, while the kNN model underperformed.
# Variable importance for the MARS model
cat("Variable Importance for MARS Model:\n")
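The importance scores listed below were presumably obtained with caret's varImp applied to the tuned MARS fit; a hedged sketch (marsModel is an assumed object name):

# Importance scores for the tuned MARS model
varImp(marsModel)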
Positive Scores: V1, V2, V4, and V5 have positive scores. These are categorized as important predictors.
Negative Scores: V6, V7, V8, V9, and V10 have negative scores. These are categorized as uninformative (counterproductive) predictors.
V1 has the highest score (8.732), making it the most influential predictor in the model.
Did the random forest model significantly use the uninformative predictors (V6 – V10)?
No. The variable importance scores for these predictors are either low or negative. As stated above, these predictors are effectively uninformative, so the model's performance is driven by the important predictors.
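For reference, a hedged sketch of the simulation and random forest importance calculation this question refers to, following the standard setup for this exercise (the simulated data frame is the same object used later with gbm and Cubist):

library(randomForest)
library(mlbench)
set.seed(200)
simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"

# Random forest with permutation-based importance
rf_model <- randomForest(y ~ ., data = simulated, importance = TRUE, ntree = 1000)
importance(rf_model, type = 1)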
(b) Now add an additional predictor that is highly correlated with one of the informative predictors. For example:
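The example code referred to above is missing from this write-up; the kind of predictor the exercise has in mind looks like the following sketch (consistent with the "Duplicate1"/"Duplicate2" names used below; the correlation typically comes out around 0.94):

# Add predictors that are highly correlated with V1
simulated$duplicate1 <- simulated$V1 + rnorm(200) * 0.1
cor(simulated$duplicate1, simulated$V1)
simulated$duplicate2 <- simulated$V1 + rnorm(200) * 0.1
cor(simulated$duplicate2, simulated$V1)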
Fit another random forest model to these data. Did the importance score for V1 change? What happens when you add another predictor that is also highly correlated with V1?
After adding the correlated predictor, the importance score for V1 decreased to 5.60.
V1, V2, and V4 have the highest scores.
Duplicate1 has a score of 4.24.
Duplicate2 has a score of 2.45.
V4 now has the highest score (7.28), making it the most influential predictor in the model.
As we add more predictors that are highly correlated with V1, the importance gets split across the correlated predictors, shifting the importance rankings in the model.
(c) Use the cforest function in the party package to fit a random forest model using conditional inference trees. The party package function varimp can calculate predictor importance. The conditional argument of that function toggles between the traditional importance measure and the modified version. Do these importances show the same pattern as the traditional random forest model?
library(party)
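The cforest call itself is not shown; a hedged sketch of the conditional inference forest and both importance measures, using the party functions named in the question:

set.seed(200)
cf_model <- cforest(y ~ ., data = simulated,
                    controls = cforest_unbiased(ntree = 1000))
varimp(cf_model)                       # traditional (unconditional) importance
varimp(cf_model, conditional = TRUE)   # conditional importance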
In summary, the general pattern of importance is similar: V4 has the highest score, followed by V2, and V6 through V10 remain unimportant. The conditional importance measure spreads the importance more evenly across predictors, while the traditional measure concentrates it on a few variables.
(d) Repeat this process with different tree models, such as boosted trees and Cubist. Does the same pattern occur?
Boosted Trees
library(gbm)
set.seed(200)
gbm_model <- gbm(y ~ ., data = simulated, distribution = "gaussian",
                 n.trees = 1000, interaction.depth = 3)
summary(gbm_model)
In short, the boosted model still treats V6-V10 as unimportant. The most important variable here is V4, which matches the previous models, followed by V2 and V1. This is in line with the other approaches, where the importance is concentrated in V4, V1, and V2.
Cubist Model
library(Cubist)
Length Class Mode
importance 1 data.frame list
model 1 -none- character
calledFrom 1 -none- character
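The Cubist fit behind the summary output above is not shown; one way it could have been produced (a hedged sketch):

# Cubist rule-based model on the simulated data, plus caret's importance method
set.seed(200)
cubist_model <- cubist(x = simulated[, names(simulated) != "y"], y = simulated$y)
summary(cubist_model)
varImp(cubist_model)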
The Cubist model shows some similarities: V7-V10 remain unimportant, although it gives a slightly higher score to V6. In this model V2 has the highest score, followed by V1 and then V4. This also aligns with the other methods, where the importance is concentrated in the top three variables V1, V2, and V4.
Overall, the pattern remains largely unchanged. The order of importance shifts among the top variables, but most of the importance lies with V1, V2, and V4. The uninformative variables are treated much the same, apart from the last model giving V6 a slightly higher score; otherwise the pattern is consistent.
Exploring Predictive Modeling and Data Analysis: An Investigation into Housing Data, Soybean Disease Prediction, Oil Classification, and Statistical Concepts
library(MASS)
This exercise involves the Boston housing data set.
a) To begin, load in the Boston data set. Since the Boston data set is part of the MASS library, you need to install the MASS package into R/Rstudio and then access the package as follows:
# Boston
?Boston   # Read about the data set
How many rows are in this Boston data set? How many columns? What do the rows and columns represent?
data("Boston")
Based on the information provided we have the following:
Rows: 506 observations, one per census tract in the Boston area.
Columns: 14 variables; each column represents a different variable, as follows:
crim: per capita crime rate by town.
zn: proportion of residential land zoned for lots over 25,000 sq. ft.
indus: proportion of non-retail business acres per town.
chas: Charles River dummy variable (1 if tract bounds river; 0 otherwise).
nox: nitrogen oxides concentration (parts per 10 million).
rm: average number of rooms per dwelling.
age: proportion of owner-occupied units built prior to 1940.
dis: weighted mean of distances to five Boston employment centers.
rad: index of accessibility to radial highways.
tax: full-value property tax rate per $10,000.
ptratio: pupil-teacher ratio by town.
black: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town.
lstat: percentage of lower status of the population.
medv: median value of owner-occupied homes in $1000s.
b) Make some pairwise scatterplots of the predictors (columns) in this data set. Describe your findings.
predictors <- colnames(Boston)
par(mfrow = c(3, 3))
for (i in 1:(length(predictors) - 1)) {
  for (j in (i + 1):length(predictors)) {
    predictor_x <- predictors[i]
    predictor_y <- predictors[j]
    plot(Boston[[predictor_x]], Boston[[predictor_y]],
         xlab = predictor_x, ylab = predictor_y,
         main = paste("Scatterplot:", predictor_x, "and", predictor_y))
    abline(lm(Boston[[predictor_y]] ~ Boston[[predictor_x]]), col = "red")
  }
}
I have provided two displays: the first combines all pairwise plots in a single view, and the second organizes them for better visualization. After reviewing the pairwise plots, I observed both positive and negative correlations, as well as some pairs with no correlation and some outliers. Below are a few observations; as I proceed with the homework I will call out others:
rad and zn: Negative correlation. Areas in Boston with more accessible radial highways have less residential zoning.
age and lstat: Positive correlation. Older homes have higher proportions of lower status population.
nox and tax: Positive correlation. Higher nitrogen oxides concentration levels are found in areas with higher property taxes.
chas: No significant correlation. Most variables show no significant relationship with chas (proximity to the Charles River).
indus and tax: Positive correlation. Industrialized areas tend to have higher property taxes.
crim and medv: Negative correlation. Higher crime rates are found in areas with lower median home values.
c) Are any of the predictors associated with per capita crime rate? If so, explain the relationship.
par(mfrow = c(2, 2))
# Same approach as in part (b), but only for pairs involving 'crim'
for (i in 1:(length(predictors) - 1)) {
  for (j in (i + 1):length(predictors)) {
    predictor_x <- predictors[i]
    predictor_y <- predictors[j]
    if (predictor_x == "crim" || predictor_y == "crim") {
      plot(Boston[[predictor_x]], Boston[[predictor_y]],
           xlab = predictor_x, ylab = predictor_y,
           main = paste("Scatterplot:", predictor_x, "and", predictor_y))
      if (predictor_x == "crim") {
        abline(lm(Boston[[predictor_y]] ~ Boston[[predictor_x]]), col = "red")
      }
    }
  }
}
Yes, there are predictors associated with the per capita crime rate. The following observations were made from the plots above; the fitted regression lines also helped.
Negative correlation:
- crim and zn: slight negative correlation.
- crim and dis: negative correlation.
- crim and black: negative correlation.
- crim and medv: negative correlation.
- crim and rm: slight negative correlation.

Positive correlation:
- crim and indus: positive correlation.
- crim and nox: positive correlation.
- crim and age: positive correlation.
- crim and rad: positive correlation.
- crim and tax: positive correlation.
- crim and ptratio: slight positive correlation.
- crim and lstat: positive correlation.

No correlation:
- crim and chas: no clear correlation.
In conclusion, the plots suggest that areas with higher nitrogen oxide pollution, more industry, older homes, better access to radial highways, higher taxes, and a larger lower-status population are likely to have higher crime rates. On the other hand, areas with more residential land zoning, larger homes, greater distances to the employment centers, and higher values of the black variable tend to have lower crime rates.
d) Do any of the census tracts of Boston appear to have particularly high crime rates? Tax rates? Comment on the range of each predictor.
hist(Boston$crim, main ="Histogram: crim", xlab ="Per Capita Crime Rate", col ="blue")
hist(Boston$tax, main ="Histogram: tax", xlab ="Tax Rate", col ="red")
# Census tracts with high crime rates and tax rates (top 5%)
high_crime = quantile(Boston$crim, 0.95)
high_tax = quantile(Boston$tax, 0.95)
high_crime_tracts = Boston[Boston$crim > high_crime, ]
high_tax_tracts = Boston[Boston$tax > high_tax, ]
rangecrim = range(Boston$crim, na.rm = TRUE)
rangetax = range(Boston$tax, na.rm = TRUE)
According to the output, there are census tracts with high crime rates and tax rates, especially in the top 5%. The per capita crime rate ranges from 0.00632 to 88.97620, and the property tax rate ranges from 187 to 711. This tells me there is significant variability in both predictors.
# Range of each predictor
predictor_ranges <- apply(Boston, 2, range)
predictor_ranges
In addition to the census tracts associated with high crime rates and tax rates, we can also note the ranges of the remaining predictors:
Zn-proportion of residential land zoned rate range: 0 to 100
indus-proportion of non-retail business acres per town rate range: 0.46 to 27.74
chas-Charles River rate range: 0 to 1
nox-nitrogen oxides concentration rate range: 0.385 to 0.871 (parts per 10 million)
rm-average number of rooms rate range: 3.561 to 8.780
age-older home rate range: 2.9 to 100
dis-distances to employment centres rate range: 1.1296 to 12.1265
rad-accessibility to radial highways: 1 to 24
ptratio-pupil-teacher ratio by town rate range: 12.6 to 22
black-proportion of black population: 0.32 to 396.90
lstat-lower status of the population rate range: 1.73 to 37.97
medv-median value of owner-occupied rate range: 5 to 50
Just reiterating what was previously stated, these predictors demonstrate that there is significant variability.
e) How many of the census tracts in this data set bound the Charles river?
tracts_chas = sum(Boston$chas == 1)
tracts_chas
[1] 35
Since chas equals 1 when a tract bounds the Charles River, 35 census tracts bound the river.
(f) What is the median pupil-teacher ratio among the towns in this data set?
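The answer to this part was not included in the original write-up, but the question can be answered directly; with the standard Boston data the value should come out to about 19.05:

# Median pupil-teacher ratio among the towns
median(Boston$ptratio)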
The soybean data can also be found at the UC Irvine Machine Learning Repository. Data were collected to predict disease in 683 soybeans. The 35 predictors are mostly categorical and include information on the environmental conditions (e.g., temperature, precipitation) and plant conditions (e.g., leaf spots, mold growth). The outcome labels consist of 19 distinct classes.
library(VIM)
library(mice)
library(mlbench)
data(Soybean)
?Soybean
a) Investigate the frequency distributions for the categorical predictors. Are any of the distributions degenerate in the ways discussed earlier in this chapter?
According to the result, the Soybean dataset has no predictors that are degenerate.
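A hedged sketch of how that degeneracy check can be run with caret's nearZeroVar (one common approach; the original code is not shown):

# Flag predictors with near-zero variance (a proxy for degenerate distributions)
nzv <- nearZeroVar(Soybean, saveMetrics = TRUE)
nzv[nzv$nzv, c("freqRatio", "percentUnique", "nzv")]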
b) Roughly 18 % of the data are missing. Are there particular predictors that are more likely to be missing? Is the pattern of missing data related to the classes?
The predictors with the highest proportions of missing data are:
- hail: 17.7%
- sever: 17.7%
- seed.tmt: 17.7%
- lodging: 17.7%
- germ: 16.4%
There are others in the 15% range, the 14% range, and so on.
Most of the variables have at least some missing data; a few (Class and leaves) have none. Based on the visualization, the pattern of missing data does not appear to be related to the classes.
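The missingness figures above can be reproduced with the VIM package loaded earlier; a sketch (the exact plotting call used originally is not shown):

# Proportion of missing values per predictor, plus a missingness plot
sort(colMeans(is.na(Soybean)), decreasing = TRUE)
aggr(Soybean, prop = TRUE, numbers = TRUE, cex.axis = 0.6)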
c) Develop a strategy for handling missing data, either by eliminating predictors or imputation.
missing_na <- colSums(is.na(Soybean))
# Identify predictors with more than 99 missing values
predictors_removed <- names(missing_na[missing_na > 99])
# Create a new dataset without the predictors that have more than 99 missing values
Soybean_new <- Soybean[, !(names(Soybean) %in% predictors_removed)]
# Summary of the cleaned dataset
summary(Soybean_new)
Explanation: When I ran the summaries in part (a), I noticed several variables with large amounts of missing data. I decided to remove predictors with more than 99 missing values (i.e., 100 or more), which eliminates any variable exceeding that threshold of missingness. The result is a new dataset that can be used for future analysis.
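An alternative to dropping predictors (a hedged sketch using the mice package loaded earlier) would be to impute the remaining missing categorical values instead:

# Multiple imputation of the remaining missing values (mice's default methods for factors)
set.seed(123)
imp <- mice(Soybean_new, m = 5, printFlag = FALSE)
Soybean_imputed <- complete(imp)
sum(is.na(Soybean_imputed))   # should be 0 after imputation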
Brodnjak-Vonina et al. (2005) develop a methodology for food laboratories to determine the type of oil from a sample. In their procedure, they used a gas chromatograph (an instrument that separates chemicals in a sample) to measure seven different fatty acids in an oil. These measurements would then be used to predict the type of oil in food samples. To create their model, they used 96 samples of seven types of oils.
These data can be found in the caret package using data(oil). The oil types are contained in a factor variable called oilType. The types are pumpkin (coded as A), sunflower (B), peanut (C), olive (D), soybean (E), rapeseed (F) and corn (G). In R,
a) Use the sample function in base R to create a completely random sample of 60 oils. How closely do the frequencies of the random sample match the original samples? Repeat this procedure several times to understand the variation in the sampling process.
set.seed(123)
random_sample <- sample(oilType, 60, replace = FALSE)
# Compare frequencies of the random sample to the original data
random <- table(random_sample)
original <- table(oilType)
print("Original Frequencies:")
[1] "Original Frequencies:"
print(original)
oilType
A B C D E F G
37 26 3 7 11 10 2
cat("\n\n")
print("Random Sample Frequencies:")
[1] "Random Sample Frequencies:"
print(random)
random_sample
A B C D E F G
24 17 3 3 6 5 2
The random sample has the same frequencies as the original for C and G, and lower frequencies for A, B, D, E, and F (as expected, since only 60 of the 96 samples were drawn).
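To see the sampling variation the exercise asks about, the draw can simply be repeated; a small sketch:

# Repeat the random draw of 60 oils several times and compare the class frequencies
set.seed(123)
replicate(5, table(sample(oilType, 60)))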
b) Use the caret package function createDataPartition to create a stratified random sample. How does this compare to completely random samples?
# Following the same process as in part (a)
set.seed(123)
# Create a stratified random sample
stratified_sample <- createDataPartition(y = oilType, p = 0.1, list = FALSE)
stratified_data <- oilType[stratified_sample]
stratified_data_df <- as.data.frame(stratified_data)
table_stratified <- table(stratified_data_df)
cat("Original Frequencies:\n")
The counts in the stratified sample are much lower than in the random sample (only about 10% of the data was drawn), but stratified sampling preserves the class proportions, whereas the completely random sample draws observations without regard to class.
c) With such a small samples size, what are the options for determining performance of the model? Should a test set be used?
With such a small sample size, resampling methods such as k-fold (or leave-one-out) cross-validation are a better way to estimate model performance; setting aside a separate test set would leave too little data for training and would give an unstable performance estimate.
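A hedged sketch of resampling setups suited to such a small sample, using caret's trainControl:

# Leave-one-out cross-validation
ctrl_loocv <- trainControl(method = "LOOCV")
# Repeated 10-fold cross-validation
ctrl_rcv <- trainControl(method = "repeatedcv", number = 10, repeats = 5)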
d) One method for understanding the uncertainty of a test set is to use a confidence interval. To obtain a confidence interval for the overall accuracy, the base R function binom.test can be used. It requires the user to input the number of samples and the number correctly classified to calculate the interval. For example, suppose a test set sample of 20 oil samples was set aside and 76 were used for model training. For this test set size and a model that is about 80 % accurate (16 out of 20 correct), the confidence interval would be computed using
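The call that produces the output shown below is simply:

binom.test(16, 20)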
Exact binomial test
data: 16 and 20
number of successes = 16, number of trials = 20, p-value = 0.01182
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.563386 0.942666
sample estimates:
probability of success
0.8
In this case, the width of the 95% confidence interval is 37.9 %, and accuracy 80%.
Try different samples sizes and accuracy rates to understand the trade-off between the uncertainty in the results, the model performance, and the test set size.
Exact binomial test
data: 41 and 50
number of successes = 41, number of trials = 50, p-value = 5.614e-06
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.6856306 0.9142379
sample estimates:
probability of success
0.82
In this case, the width of the 95% confidence interval is 22.9% and accuracy 82%
Exact binomial test
data: 90 and 100
number of successes = 90, number of trials = 100, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.8237774 0.9509953
sample estimates:
probability of success
0.9
In this case, the width of the 95% confidence interval is 12.7%, and accuracy 90%
In conclusion, I noticed that as the sample size increases, the confidence interval becomes narrower, and as the accuracy rate increases, the interval width also tends to shrink. Larger test sets therefore reduce the uncertainty in the estimated performance.
Briefly discuss what is the bias-variance tradeoff in statistics and predictive modeling.
The bias-variance tradeoff refers to the fact that decreasing bias (for example, by using a more flexible model) tends to increase variance, while decreasing variance (using a simpler, more stable model) tends to increase bias. Since the expected prediction error decomposes into squared bias plus variance plus irreducible noise, the objective is to find a balance where the combined contribution of bias and variance is minimal.
Predicting Fat Content in Meat Using IR Spectroscopy and Machine Learning: A Comparative Study of Predictive Models
Infrared (IR) spectroscopy technology is used to determine the chemical makeup of a substance. The theory of IR spectroscopy holds that unique molecular structures absorb IR frequencies differently. In practice a spectrometer fires a series of IR frequencies into a sample material, and the device measures the absorbance of the sample at each individual frequency. This series of measurements creates a spectrum profile which can then be used to determine the chemical makeup of the sample material.
A Tecator Infratec Food and Feed Analyzer instrument was used to analyze 215 samples of meat across 100 frequencies. A sample of these frequency profiles is displayed in Fig. 6.20. In addition to an IR profile, analytical chemistry determined the percent content of water, fat, and protein for each sample. If we can establish a predictive relationship between IR spectrum and fat content, then food scientists could predict a sample’s fat content with IR instead of using analytical chemistry. This would provide cost savings, since analytical chemistry is a more expensive, time-consuming process.
a) Start R and use these commands to load the data:
The matrix absorp contains the 100 absorbance values for the 215 samples, while the matrix endpoints contains the percent of moisture, fat, and protein in columns 1–3, respectively. To be more specific:
b) Split the data into a training and a test set with the response being the percentage of protein, and pre-process the data as appropriate.
set.seed(123)
index <- createDataPartition(protein, p = 0.7, list = FALSE)
train_data <- data.frame(absorp[index, ], protein = protein[index])
test_data <- data.frame(absorp[-index, ], protein = protein[-index])
n_train <- nrow(train_data)
# combined_pca / combined_data come from a PCA step on the pooled absorbance data (not shown above)
train_pca <- cbind(combined_pca[1:n_train, ], protein = train_data$protein)
test_pca <- cbind(combined_pca[(n_train + 1):nrow(combined_data), ], protein = test_data$protein)
c) Build at least three models described in Chapter 6: ordinary least squares, PCR, PLS, Ridge, and ENET. For those models with tuning parameters, what are the optimal values of the tuning parameter(s)?
Ordinary Least Squares (OLS):
set.seed(123)
ols_model <- train(protein ~ ., data = train_pca, method = "lm")
ols_model
Linear Regression
152 samples
20 predictor
No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 152, 152, 152, 152, 152, 152, ...
Resampling results:
RMSE Rsquared MAE
0.7268192 0.9450398 0.5549019
Tuning parameter 'intercept' was held constant at a value of TRUE
Principal Component Analysis
152 samples
100 predictors
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 136, 137, 137, 137, 136, 138, ...
Resampling results across tuning parameters:
ncomp RMSE Rsquared MAE
1 2.966122 0.1570382 2.520911
2 2.854048 0.1843866 2.337577
3 2.316586 0.4545058 1.848556
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 3.
Partial Least Squares
152 samples
100 predictors
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 136, 137, 137, 137, 136, 138, ...
Resampling results across tuning parameters:
ncomp RMSE Rsquared MAE
1 2.959109 0.1580897 2.511023
2 2.256430 0.5094219 1.788162
3 1.743833 0.6963113 1.291007
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 3.
Partial Least Squares (PLS):
ncomp: 3
RMSE: 1.743833
R-squared: 0.6963113
MAE: 1.291007
d) Build nonlinear models in Chapter 7: SVM, neural network, MARS, and KNN models. Since neural networks are especially sensitive to highly correlated predictors, does pre-processing using PCA help the model? For those models with tuning parameters, what are the optimal values of the tuning parameter(s)?
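The neural network fit referenced in the comparison table below is not shown; a hedged sketch of how it could be trained with and without PCA pre-processing so the two can be compared (the tuning grid and object names are assumptions):

set.seed(123)
nnet_grid <- expand.grid(size = c(1, 3, 5), decay = c(0, 0.01, 0.1))
ctrl <- trainControl(method = "cv", number = 10)

# Without PCA: the highly correlated absorbance predictors go in directly
nnet_raw <- train(protein ~ ., data = train_data, method = "nnet",
                  preProcess = c("center", "scale"),
                  tuneGrid = nnet_grid, trControl = ctrl,
                  linout = TRUE, trace = FALSE, maxit = 500)

# With PCA: decorrelate the predictors before the network is fit
nnet_pca <- train(protein ~ ., data = train_data, method = "nnet",
                  preProcess = c("center", "scale", "pca"),
                  tuneGrid = nnet_grid, trControl = ctrl,
                  linout = TRUE, trace = FALSE, maxit = 500)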
Support Vector Machines with Radial Basis Function Kernel
152 samples
100 predictors
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 136, 137, 137, 137, 136, 138, ...
Resampling results across tuning parameters:
C RMSE Rsquared MAE
0.25 2.683055 0.3123868 2.126981
0.50 2.431071 0.4167549 1.921845
1.00 2.135216 0.5501123 1.656360
2.00 1.934330 0.6259364 1.506506
4.00 1.812361 0.6698215 1.397221
8.00 1.744285 0.6980792 1.333155
16.00 1.717343 0.7085363 1.317764
32.00 1.676734 0.7178974 1.272854
64.00 1.805285 0.6829508 1.316395
128.00 1.910920 0.6787767 1.327847
Tuning parameter 'sigma' was held constant at a value of 0.05200074
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were sigma = 0.05200074 and C = 32.
Multivariate Adaptive Regression Spline
152 samples
100 predictors
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 136, 137, 137, 137, 136, 138, ...
Resampling results across tuning parameters:
nprune RMSE Rsquared MAE
2 2.854067 0.1814645 2.3879962
15 1.212796 0.8517258 0.9221938
28 1.357497 0.8313863 0.9689615
Tuning parameter 'degree' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nprune = 15 and degree = 1.
k-Nearest Neighbors
152 samples
100 predictors
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 136, 137, 137, 137, 136, 138, ...
Resampling results across tuning parameters:
k RMSE Rsquared MAE
5 2.300655 0.4780876 1.909033
7 2.453521 0.3946490 2.032608
9 2.511651 0.3773289 2.075073
11 2.550576 0.3616637 2.087163
13 2.619400 0.3441352 2.145169
15 2.654108 0.3080377 2.188281
17 2.716741 0.2774054 2.239304
19 2.716383 0.2760947 2.254749
21 2.773202 0.2384392 2.297991
23 2.789007 0.2262430 2.320930
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was k = 5.
kNN:
k: 5
RMSE: 2.300655
R-squared: 0.4780876
MAE: 1.909033
e) Which model from parts c) and d) has the best predictive ability? Is any model significantly better or worse than the others?
Model                                   RMSE        Rsquared    MAE         Rank
OLS - Linear Regression                 0.7268192   0.9450398   0.5549019   1
Principal Component Regression (PCR)    2.316586    0.4545058   1.848556    6
Partial Least Squares (PLS)             1.743833    0.6963113   1.291007    4
SVM                                     1.676734    0.7178974   1.272854    3
Neural Network                          16.94374    NA          16.66501    7
MARS                                    1.212796    0.8517258   0.9221938   2
kNN                                     2.300655    0.478087    1.909033    5
In conclusion, I have ranked the models according to lowest RMSE, lowest MAE, and highest R-squared. OLS (linear regression) outperforms all the other models, followed by MARS. The neural network performs the worst of all the models, as it has the highest RMSE and its R-squared is not available.
Developing a model to predict permeability (see Sect. 1.4 of the textbook) could save significant resources for a pharmaceutical company, while at the same time more rapidly identifying molecules that have a sufficient permeability to become a drug:
a) Start R and use these commands to load the data:
The matrix fingerprints contains the 1,107 binary molecular predictors for the 165 compounds, while permeability contains permeability response:
b) The fingerprint predictors indicate the presence or absence of substructures of a molecule and are often sparse meaning that relatively few of the molecules contain each substructure. Filter out the predictors that have low frequencies using the nearZeroVar function from the caret package. How many predictors are left for modeling?
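The filtering code was not included in this write-up; a hedged sketch of the step (the later model outputs indicate that 388 predictors remained after filtering):

library(AppliedPredictiveModeling)
data(permeability)               # loads 'fingerprints' and 'permeability'
nzv_idx <- nearZeroVar(fingerprints)
filtered_fingerprints <- fingerprints[, -nzv_idx]
ncol(filtered_fingerprints)      # number of predictors left for modeling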
c) Split the data into a training and a test set, pre-process the data, and tune a PLS model. How many latent variables are optimal and what is the corresponding resampled estimate of R2?
set.seed(123)
index <- createDataPartition(permeability, p = 0.7, list = FALSE)
train_data <- filtered_fingerprints[index, ]
train_permeability <- permeability[index]
test_data <- filtered_fingerprints[-index, ]
test_permeability <- permeability[-index]
# Pre-process the data (center and scale)
preProcValues <- preProcess(train_data, method = c("center", "scale"))
train_data_transformed <- predict(preProcValues, train_data)
test_data_transformed <- predict(preProcValues, test_data)
Partial Least Squares
117 samples
388 predictors
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 105, 105, 106, 105, 105, 105, ...
Resampling results across tuning parameters:
ncomp RMSE Rsquared MAE
1 13.36436 0.3433889 10.474224
2 12.30920 0.4595424 8.621998
3 12.79841 0.4713902 9.518968
4 13.01506 0.4586135 9.759753
5 13.50115 0.4188773 9.868189
6 13.28765 0.4391301 9.680872
7 12.89540 0.4604643 9.314659
8 12.82966 0.4653079 9.399587
9 12.94528 0.4583512 9.434668
10 13.30683 0.4341421 9.892463
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 2.
Support Vector Machines with Radial Basis Function Kernel
117 samples
388 predictors
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 105, 105, 106, 105, 105, 105, ...
Resampling results across tuning parameters:
C RMSE Rsquared MAE
0.25 13.31509 0.4937223 8.559747
0.50 12.03042 0.5011860 7.954428
1.00 11.71028 0.5103354 7.664866
2.00 11.87293 0.4929464 7.786952
4.00 12.17146 0.4658056 8.201456
8.00 12.33716 0.4474256 8.447651
16.00 12.33661 0.4453930 8.476095
32.00 12.29978 0.4482620 8.468869
64.00 12.27952 0.4499855 8.465674
128.00 12.27952 0.4499855 8.465674
Tuning parameter 'sigma' was held constant at a value of 0.003241275
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were sigma = 0.003241275 and C = 1.
Warning: model fit failed for Fold04: lambda=0.0000000 Error in if (zmin < gamhat) { : missing value where TRUE/FALSE needed
Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
: There were missing values in resampled performance measures.
ridge2_model
Ridge Regression
117 samples
388 predictors
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 105, 105, 106, 105, 105, 105, ...
Resampling results across tuning parameters:
lambda RMSE Rsquared MAE
0.0000000000 22.42497 0.27702987 15.70016
0.0001000000 6615.25285 0.08396519 3563.99122
0.0002371374 99671.84144 0.08457611 62604.59123
0.0005623413 170244.42077 0.14306870 104949.49993
0.0013335214 13949.49087 0.14095413 8819.60573
0.0031622777 1338.29409 0.09027590 926.17683
0.0074989421 4869.57307 0.19911169 3391.43119
0.0177827941 17.66500 0.25592818 12.61336
0.0421696503 15.69511 0.31773516 11.39005
0.1000000000 14.70937 0.37493137 10.76181
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was lambda = 0.1.
Warning: model fit failed for Fold04: fraction=0.9 Error in if (zmin < gamhat) { : missing value where TRUE/FALSE needed
Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
: There were missing values in resampled performance measures.
lasso2_model
The lasso
117 samples
388 predictors
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 105, 105, 106, 105, 105, 105, ...
Resampling results across tuning parameters:
fraction RMSE Rsquared MAE
0.1000000 12.76949 0.4681541 9.532476
0.1888889 13.75236 0.4031430 9.879639
0.2777778 14.64878 0.3661843 10.403824
0.3666667 15.57692 0.3421552 11.066913
0.4555556 16.62051 0.3177241 11.791049
0.5444444 17.77760 0.3013368 12.521844
0.6333333 18.87893 0.2964326 13.284076
0.7222222 20.07676 0.2902689 14.162004
0.8111111 21.22992 0.2871456 14.928141
0.9000000 21.91715 0.2806466 15.326569
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was fraction = 0.1.
f) Would you recommend any of your models to replace the permeability laboratory experiment?
According to the results above, SVM has the lowest RMSE and MAE, so it has the smallest prediction error. The Ridge Regression model has the highest R-squared, meaning it explains the most variance of these models. My recommendation would be to use the SVM model for this laboratory experiment, because it has the lowest RMSE and MAE while still explaining a decent amount of the variance.
Return to the permeability problem outlined in Problem 2. Train several nonlinear regression models and evaluate the resampling and test set performance.
a) Which nonlinear regression model that we learned in Chapter 7 gives the optimal resampling and test set performance?
NONLINEAR         RMSE         Rsquared    MAE
SVM               10.5483639   0.4394321   7.1154118
Neural Network    12.731659    0.197259    9.597258
MARS              11.8764033   0.3239788   7.5933958
kNN               10.5121448   0.4547602   6.9931969
The kNN Model gives the optimal resampling and test set performance, because it has the lowest RMSE and the highest R-squared compared to the others.
b) Do any of the nonlinear models outperform the optimal linear model you previously developed in Problem 2? If so, what might this tell you about the underlying relationship between the predictors and the response?
NONLINEAR        RMSE        Rsquared   MAE
SVM              10.5483639  0.4394321  7.1154118
Neural Network   12.731659   0.197259   9.597258
MARS             11.8764033  0.3239788  7.5933958
kNN              10.5121448  0.4547602  6.9931969

Other Models Ran in Q2   RMSE      Rsquared   MAE       Test_set_rsquared
PLS Model                12.30920  0.4595424  8.621998  0.3819407
Ridge Regression         14.70937  0.3749313  10.76181  0.5311375
Lasso Regression         12.76949  0.4681541  9.532476  0.4573928
SVM                      11.71028  0.5103354  7.664866  0.4394321
Yes. The best model so far is the kNN model, which outperforms all of the other models: it has the lowest RMSE and MAE, and its R-squared is competitive with the best of the linear models. This suggests that the underlying relationship between the predictors and the response is nonlinear.
c) Would you recommend any of the models you have developed to replace the permeability laboratory experiment?
Based on the results, I recommend using the kNN model. This model demonstrates the lowest RMSE and MAE, indicating it has the smallest error and provides the most accurate predictions. Additionally, it has one of the higher R-squared values, suggesting it effectively explains the variability in the data.
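The test-set numbers behind this recommendation can be reproduced with caret's postResample() helper. A minimal sketch follows; knn_perm_model, test_x, and test_y are assumed names for the fitted kNN object and the held-out permeability predictors and response (the original object names are not shown above).
# Predict on the held-out samples and compute RMSE, R-squared, and MAE
knn_test_pred <- predict(knn_perm_model, newdata = test_x)
postResample(pred = knn_test_pred, obs = test_y)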
Analyzing and Predicting Oil Types and Customer Churn Using Machine Learning Techniques
Warning: package 'rsample' was built under R version 4.3.3
Attaching package: 'rsample'
The following object is masked from 'package:e1071':
permutations
library(recipes)
Warning: package 'recipes' was built under R version 4.3.3
Attaching package: 'recipes'
The following object is masked from 'package:stringr':
fixed
The following object is masked from 'package:VIM':
prepare
The following object is masked from 'package:stats4':
update
The following object is masked from 'package:stats':
step
library(rpart)
library(ranger)
Warning: package 'ranger' was built under R version 4.3.3
Attaching package: 'ranger'
The following object is masked from 'package:randomForest':
importance
library(nnet)
library(caret)
library(yardstick)
Warning: package 'yardstick' was built under R version 4.3.3
Attaching package: 'yardstick'
The following object is masked from 'package:readr':
spec
The following objects are masked from 'package:caret':
precision, recall, sensitivity, specificity
Warning: package 'xgboost' was built under R version 4.3.3
Attaching package: 'xgboost'
The following object is masked from 'package:dplyr':
slice
In Homework 1, Problem 3, we described a data set which contained 96 oil samples each from one of seven types of oils (pumpkin, sunflower, peanut, olive, soybean, rapeseed, and corn). Gas chromatography was performed on each sample and the percentage of each type of 7 fatty acids was determined. We would like to use these data to build a model that predicts the type of oil based on a sample’s fatty acid percentages. These data can be found in the caret package using data(oil). The oil types are contained in a factor variable called oilType. The types are pumpkin (coded as A), sunflower (B), peanut (C), olive (D), soybean (E), rapeseed (F) and corn (G). In R
Given the classification imbalance in oil Type, describe how you would create a training and testing set.
# Convert the fatty acid compositions into a data frame
oil_data <- as.data.frame(fattyAcids)
oil_data$oilType <- oilType

# Set seed for reproducibility
set.seed(123)

# Split the data using stratified sampling
train_index <- createDataPartition(oil_data$oilType, p = 0.7, list = FALSE)
train_data <- oil_data[train_index, ]
test_data <- oil_data[-train_index, ]

# Pre-process the data: centering and scaling
preProcValues <- preProcess(train_data[, -ncol(train_data)], method = c("center", "scale"))
train_data[, -ncol(train_data)] <- predict(preProcValues, train_data[, -ncol(train_data)])
test_data[, -ncol(test_data)] <- predict(preProcValues, test_data[, -ncol(test_data)])

# Control for cross-validation
ctrl <- trainControl(method = "cv", number = 10, classProbs = TRUE,
                     summaryFunction = multiClassSummary)

# Check for missing values in the dataset
colSums(is.na(train_data))
# Remove rows with missing values
train_data_clean <- na.omit(train_data)

# Impute missing values using the median
preProcess_missing <- preProcess(train_data, method = 'medianImpute')
train_data_clean <- predict(preProcess_missing, train_data)
Which classification statistic would you choose to optimize for this problem and why?
I would choose the F1 score because the oil-type classes are imbalanced: plain accuracy can look good while the rare classes are effectively ignored. The F1 score balances precision and recall, so it rewards models that identify the minority classes correctly.
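Since the ctrl object defined above uses multiClassSummary, which reports a Mean_F1 column, caret can in principle be told to select tuning parameters on that metric instead of accuracy. A minimal sketch is below; note that with very rare classes some resampling folds can yield undefined F1 values (the NaN Mean_F1 entries in the output that follows), in which case F1-based selection may fail and a metric such as Kappa is a practical fallback.
# Ask caret to pick k by mean F1 across classes rather than accuracy
set.seed(123)
knn_f1_tuned <- train(oilType ~ ., data = train_data,
                      method = "knn",
                      metric = "Mean_F1",   # requires Mean_F1 to be defined in every fold
                      trControl = ctrl)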
Split the data into a training and a testing set, pre-process the data, and build models and tune them via resampling described in Chapter 12. Clearly list the models under consideration and the corresponding tuning parameters of the models.
Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
: There were missing values in resampled performance measures.
knn_model
k-Nearest Neighbors
70 samples
7 predictor
7 classes: 'A', 'B', 'C', 'D', 'E', 'F', 'G'
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 62, 61, 62, 65, 63, 62, ...
Resampling results across tuning parameters:
k logLoss AUC prAUC Accuracy Kappa Mean_F1
5 0.5583384 0.9919921 0.01388889 0.9375000 0.9195499 NaN
7 0.2130499 0.9974206 0.01708333 0.9138889 0.8913839 NaN
9 0.2846983 0.9926091 0.01504630 0.8902778 0.8606707 NaN
11 0.3528189 0.9904927 0.01430556 0.8500000 0.7986789 NaN
13 0.4069273 0.9866567 0.04625000 0.8333333 0.7649739 NaN
15 0.4571756 0.9891865 0.06750000 0.8208333 0.7493668 NaN
17 0.5034627 0.9847983 0.07888889 0.7629365 0.6623458 NaN
19 0.5718785 0.9844444 0.06222222 0.7629365 0.6633254 NaN
21 0.6450366 0.9800893 0.10847222 0.7179365 0.5943018 NaN
23 0.7030457 0.9753274 0.15388889 0.7054365 0.5779753 NaN
Mean_Sensitivity Mean_Specificity Mean_Pos_Pred_Value Mean_Neg_Pred_Value
NaN 0.9879592 NaN NaN
NaN 0.9852891 NaN NaN
NaN 0.9805272 NaN NaN
NaN 0.9712585 NaN NaN
NaN 0.9652041 NaN NaN
NaN 0.9632993 NaN NaN
NaN 0.9493878 NaN NaN
NaN 0.9497279 NaN NaN
NaN 0.9405612 NaN NaN
NaN 0.9381803 NaN NaN
Mean_Precision Mean_Recall Mean_Detection_Rate Mean_Balanced_Accuracy
NaN NaN 0.1339286 NaN
NaN NaN 0.1305556 NaN
NaN NaN 0.1271825 NaN
NaN NaN 0.1214286 NaN
NaN NaN 0.1190476 NaN
NaN NaN 0.1172619 NaN
NaN NaN 0.1089909 NaN
NaN NaN 0.1089909 NaN
NaN NaN 0.1025624 NaN
NaN NaN 0.1007766 NaN
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was k = 5.
Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
: There were missing values in resampled performance measures.
log_reg_model
Penalized Multinomial Regression
70 samples
7 predictor
7 classes: 'A', 'B', 'C', 'D', 'E', 'F', 'G'
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 62, 61, 62, 65, 63, 62, ...
Resampling results across tuning parameters:
decay logLoss AUC prAUC Accuracy Kappa Mean_F1
0e+00 0.5777232 0.9892857 0.2177480 0.9313889 0.9115174 NaN
1e-04 0.4544908 0.9930556 0.2457738 0.9513889 0.9387330 NaN
1e-01 0.1882385 0.9972222 0.2473611 0.9513889 0.9387330 NaN
Mean_Sensitivity Mean_Specificity Mean_Pos_Pred_Value Mean_Neg_Pred_Value
NaN 0.9882993 NaN NaN
NaN 0.9920918 NaN NaN
NaN 0.9920918 NaN NaN
Mean_Precision Mean_Recall Mean_Detection_Rate Mean_Balanced_Accuracy
NaN NaN 0.1330556 NaN
NaN NaN 0.1359127 NaN
NaN NaN 0.1359127 NaN
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was decay = 0.1.
Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
: There were missing values in resampled performance measures.
rf_model
Random Forest
70 samples
7 predictor
7 classes: 'A', 'B', 'C', 'D', 'E', 'F', 'G'
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 62, 61, 62, 65, 63, 62, ...
Resampling results across tuning parameters:
mtry logLoss AUC prAUC Accuracy Kappa Mean_F1
1.0 0.2490844 1 0.2522222 0.9625 0.9523009 NaN
2.2 0.1537911 1 0.2188889 0.9750 0.9679487 NaN
3.4 0.1361475 1 0.2147222 0.9750 0.9679487 NaN
4.6 0.1333570 1 0.1791667 0.9550 0.9405111 NaN
5.8 0.1378782 1 0.1583333 0.9550 0.9405111 NaN
7.0 0.1509535 1 0.1455556 0.9550 0.9405111 NaN
Mean_Sensitivity Mean_Specificity Mean_Pos_Pred_Value Mean_Neg_Pred_Value
NaN 0.9934524 NaN NaN
NaN 0.9959184 NaN NaN
NaN 0.9959184 NaN NaN
NaN 0.9933163 NaN NaN
NaN 0.9933163 NaN NaN
NaN 0.9933163 NaN NaN
Mean_Precision Mean_Recall Mean_Detection_Rate Mean_Balanced_Accuracy
NaN NaN 0.1375000 NaN
NaN NaN 0.1392857 NaN
NaN NaN 0.1392857 NaN
NaN NaN 0.1364286 NaN
NaN NaN 0.1364286 NaN
NaN NaN 0.1364286 NaN
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was mtry = 2.2.
Random Forest:
mtry = 2.2
LogLoss: 0.1537911
AUC: 1
prAUC: 0.2188889
Accuracy: 0.9750
Kappa: 0.9679487
Mean_Specificity: 0.9959184
Mean_Detection_Rate: 0.1392857
Summary:
Model                 Tuning Parameter   Accuracy   Kappa       AUC      prAUC
k-NN                  k = 5              0.9375     0.9195499   0.9920   0.0139
Logistic Regression   decay = 0.1        0.9514     0.9387330   0.9972   0.2474
Random Forest         mtry = 2.2         0.9750     0.9679487   1.0000   0.2189
Conclusion: Of all the models, the Random Forest performs best, with the highest accuracy, Kappa, and AUC, followed by the penalized logistic regression model; the k-NN model performs worst. All three models perform well overall, but the Random Forest outperforms the rest.
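This comparison can also be made directly over the shared cross-validation folds with caret's resamples() function. A minimal sketch follows; knn_model, log_reg_model, and rf_model are the object names printed above, and the fits are assumed to have used the same resampling folds.
# Summarize accuracy and Kappa for the three classifiers over the same CV folds
cv_results <- resamples(list(kNN = knn_model,
                             Logistic = log_reg_model,
                             RF = rf_model))
summary(cv_results, metric = c("Accuracy", "Kappa"))
# Side-by-side view of the accuracy distributions
dotplot(cv_results, metric = "Accuracy")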
Of the models presented in this chapter, which performs best on these data? Which oil type does the model most accurately predict? Least accurately predict?
Confusion Matrix and Statistics
Reference
Prediction A B C D E F G
A 10 0 0 0 0 0 0
B 1 7 0 0 0 0 0
C 0 0 0 0 0 0 0
D 0 0 0 2 0 0 0
E 0 0 0 0 3 0 0
F 0 0 0 0 0 3 0
G 0 0 0 0 0 0 0
Overall Statistics
Accuracy : 0.9615
95% CI : (0.8036, 0.999)
No Information Rate : 0.4231
P-Value [Acc > NIR] : 7.058e-09
Kappa : 0.9467
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: A Class: B Class: C Class: D Class: E Class: F
Sensitivity 0.9091 1.0000 NA 1.00000 1.0000 1.0000
Specificity 1.0000 0.9474 1 1.00000 1.0000 1.0000
Pos Pred Value 1.0000 0.8750 NA 1.00000 1.0000 1.0000
Neg Pred Value 0.9375 1.0000 NA 1.00000 1.0000 1.0000
Prevalence 0.4231 0.2692 0 0.07692 0.1154 0.1154
Detection Rate 0.3846 0.2692 0 0.07692 0.1154 0.1154
Detection Prevalence 0.3846 0.3077 0 0.07692 0.1154 0.1154
Balanced Accuracy 0.9545 0.9737 NA 1.00000 1.0000 1.0000
Class: G
Sensitivity NA
Specificity 1
Pos Pred Value NA
Neg Pred Value NA
Prevalence 0
Detection Rate 0
Detection Prevalence 0
Balanced Accuracy NA
kNN:
Accuracy: 0.9615
Kappa: 0.9467
Classes A, B, D, E, and F are predicted well.
Classes C and G have no samples in the test split, so their performance cannot be assessed.
print(log_reg_cm)
Confusion Matrix and Statistics
Reference
Prediction A B C D E F G
A 10 0 0 0 0 0 0
B 1 7 0 0 0 0 0
C 0 0 0 0 0 0 0
D 0 0 0 2 0 0 0
E 0 0 0 0 3 0 0
F 0 0 0 0 0 3 0
G 0 0 0 0 0 0 0
Overall Statistics
Accuracy : 0.9615
95% CI : (0.8036, 0.999)
No Information Rate : 0.4231
P-Value [Acc > NIR] : 7.058e-09
Kappa : 0.9467
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: A Class: B Class: C Class: D Class: E Class: F
Sensitivity 0.9091 1.0000 NA 1.00000 1.0000 1.0000
Specificity 1.0000 0.9474 1 1.00000 1.0000 1.0000
Pos Pred Value 1.0000 0.8750 NA 1.00000 1.0000 1.0000
Neg Pred Value 0.9375 1.0000 NA 1.00000 1.0000 1.0000
Prevalence 0.4231 0.2692 0 0.07692 0.1154 0.1154
Detection Rate 0.3846 0.2692 0 0.07692 0.1154 0.1154
Detection Prevalence 0.3846 0.3077 0 0.07692 0.1154 0.1154
Balanced Accuracy 0.9545 0.9737 NA 1.00000 1.0000 1.0000
Class: G
Sensitivity NA
Specificity 1
Pos Pred Value NA
Neg Pred Value NA
Prevalence 0
Detection Rate 0
Detection Prevalence 0
Balanced Accuracy NA
Logistic Regression:
Accuracy: 0.9615
Kappa: 0.9467
Classes A, B, D, E, and F are predicted well.
Classes C and G have no samples in the test split, so their performance cannot be assessed.
print(rf_cm)
Confusion Matrix and Statistics
Reference
Prediction A B C D E F G
A 11 0 0 0 0 0 0
B 0 7 0 0 0 0 0
C 0 0 0 0 0 0 0
D 0 0 0 2 0 0 0
E 0 0 0 0 3 0 0
F 0 0 0 0 0 3 0
G 0 0 0 0 0 0 0
Overall Statistics
Accuracy : 1
95% CI : (0.8677, 1)
No Information Rate : 0.4231
P-Value [Acc > NIR] : 1.936e-10
Kappa : 1
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: A Class: B Class: C Class: D Class: E Class: F
Sensitivity 1.0000 1.0000 NA 1.00000 1.0000 1.0000
Specificity 1.0000 1.0000 1 1.00000 1.0000 1.0000
Pos Pred Value 1.0000 1.0000 NA 1.00000 1.0000 1.0000
Neg Pred Value 1.0000 1.0000 NA 1.00000 1.0000 1.0000
Prevalence 0.4231 0.2692 0 0.07692 0.1154 0.1154
Detection Rate 0.4231 0.2692 0 0.07692 0.1154 0.1154
Detection Prevalence 0.4231 0.2692 0 0.07692 0.1154 0.1154
Balanced Accuracy 1.0000 1.0000 NA 1.00000 1.0000 1.0000
Class: G
Sensitivity NA
Specificity 1
Pos Pred Value NA
Neg Pred Value NA
Prevalence 0
Detection Rate 0
Detection Prevalence 0
Balanced Accuracy NA
Random Forest:
Accuracy: 1
Kappa: 1
Classes A, B, D, E, and F are predicted perfectly.
Classes C and G have no samples in the test split, so their performance cannot be assessed.
# Compute F1 scores
f1_score <- function(conf_matrix) {
  precision <- conf_matrix$byClass[, "Precision"]
  recall <- conf_matrix$byClass[, "Recall"]
  f1 <- 2 * (precision * recall) / (precision + recall)
  f1[is.na(f1)] <- 0  # Handle cases where precision or recall is zero
  return(f1)
}

# Calculate F1 scores for each model
knn_f1 <- f1_score(knn_cm)
log_reg_f1 <- f1_score(log_reg_cm)
rf_f1 <- f1_score(rf_cm)

# Summary table
summary_table <- data.frame(
  Model = c("k-NN", "Logistic Regression", "Random Forest"),
  Accuracy = c(knn_cm$overall['Accuracy'], log_reg_cm$overall['Accuracy'], rf_cm$overall['Accuracy']),
  Kappa = c(knn_cm$overall['Kappa'], log_reg_cm$overall['Kappa'], rf_cm$overall['Kappa']),
  F1_Score = c(mean(knn_f1), mean(log_reg_f1), mean(rf_f1))
)
print(summary_table)
Summary: Although the k-NN and logistic regression models produce identical confusion matrices, further testing would be needed to separate them. As stated above, the Random Forest proves to be the best model, performing almost perfectly. In terms of oil type, Class A (pumpkin) is predicted most accurately, while the least accurately predicted classes are Class C (peanut) and Class G (corn), which have no samples in the test split.
Use the fatty acid data from Problem 1 above.
Use the same data splitting approach (if any) and pre-processing steps that you did Problem 1. Using the same classification statistic as before, build models described in Chapter 13: Nonlinear Classification Models for these data. Which model has the best predictive ability? How does this optimal model’s performance compare to the best linear model’s performance?
# Set up cross-validation control
ctrl <- trainControl(method = "cv", number = 10)

# Define a grid of hyperparameters
tune_grid <- expand.grid(sigma = c(0.01, 0.05, 0.1, 0.2),
                         C = c(0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128))
Support Vector Machines (SVM):
# Train SVM model
svm_model <- train(oilType ~ ., data = train_data, method = "svmRadial",
                   trControl = ctrl, tuneGrid = tune_grid)

# Generate predictions for SVM model
svm_pred <- predict(svm_model, test_data)

# Create confusion matrix and calculate F1 score
svm_cm <- confusionMatrix(svm_pred, test_data$oilType)
svm_f1 <- f_meas(svm_cm$table, truth = "reference", estimate = "prediction",
                 event_level = "second")
Warning: While computing multiclass `precision()`, some levels had no predicted events
(i.e. `true_positive + false_positive = 0`).
Precision is undefined in this case, and those levels will be removed from the
averaged result.
Note that the following number of true events actually occurred for each
problematic event level:
'C': 0, 'G': 0
Warning: While computing multiclass `recall()`, some levels had no true events (i.e.
`true_positive + false_negative = 0`).
Recall is undefined in this case, and those levels will be removed from the
averaged result.
Note that the following number of predicted events actually occurred for each
problematic event level:
'C': 0, 'G': 0
# Print results
print(svm_cm)
Confusion Matrix and Statistics
Reference
Prediction A B C D E F G
A 10 0 0 0 0 0 0
B 1 7 0 0 0 0 0
C 0 0 0 0 0 0 0
D 0 0 0 2 0 0 0
E 0 0 0 0 3 0 0
F 0 0 0 0 0 3 0
G 0 0 0 0 0 0 0
Overall Statistics
Accuracy : 0.9615
95% CI : (0.8036, 0.999)
No Information Rate : 0.4231
P-Value [Acc > NIR] : 7.058e-09
Kappa : 0.9467
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: A Class: B Class: C Class: D Class: E Class: F
Sensitivity 0.9091 1.0000 NA 1.00000 1.0000 1.0000
Specificity 1.0000 0.9474 1 1.00000 1.0000 1.0000
Pos Pred Value 1.0000 0.8750 NA 1.00000 1.0000 1.0000
Neg Pred Value 0.9375 1.0000 NA 1.00000 1.0000 1.0000
Prevalence 0.4231 0.2692 0 0.07692 0.1154 0.1154
Detection Rate 0.3846 0.2692 0 0.07692 0.1154 0.1154
Detection Prevalence 0.3846 0.3077 0 0.07692 0.1154 0.1154
Balanced Accuracy 0.9545 0.9737 NA 1.00000 1.0000 1.0000
Class: G
Sensitivity NA
Specificity 1
Pos Pred Value NA
Neg Pred Value NA
Prevalence 0
Detection Rate 0
Detection Prevalence 0
Balanced Accuracy NA
Classes D, E, and F are predicted perfectly, followed by A and B.
Classes C and G have no samples in the test split, so their performance cannot be assessed.
F1 score: 0.9771429
GBM Model:
# Train GBM model
gbm_model <- train(oilType ~ ., data = train_data, method = "gbm",
                   trControl = ctrl, verbose = FALSE)

# Generate predictions for GBM model
gbm_pred <- predict(gbm_model, test_data)

# Create confusion matrix and calculate F1 score
gbm_cm <- confusionMatrix(gbm_pred, test_data$oilType)
gbm_f1 <- f_meas(gbm_cm$table, truth = "reference", estimate = "prediction",
                 event_level = "second")
Warning: While computing multiclass `precision()`, some levels had no predicted events
(i.e. `true_positive + false_positive = 0`).
Precision is undefined in this case, and those levels will be removed from the
averaged result.
Note that the following number of true events actually occurred for each
problematic event level:
'C': 0, 'G': 0
Warning: While computing multiclass `recall()`, some levels had no true events (i.e.
`true_positive + false_negative = 0`).
Recall is undefined in this case, and those levels will be removed from the
averaged result.
Note that the following number of predicted events actually occurred for each
problematic event level:
'C': 0, 'G': 0
# Print results
print(gbm_cm)
Confusion Matrix and Statistics
Reference
Prediction A B C D E F G
A 11 0 0 0 0 0 0
B 0 7 0 0 0 0 0
C 0 0 0 0 0 0 0
D 0 0 0 2 0 0 0
E 0 0 0 0 3 0 0
F 0 0 0 0 0 3 0
G 0 0 0 0 0 0 0
Overall Statistics
Accuracy : 1
95% CI : (0.8677, 1)
No Information Rate : 0.4231
P-Value [Acc > NIR] : 1.936e-10
Kappa : 1
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: A Class: B Class: C Class: D Class: E Class: F
Sensitivity 1.0000 1.0000 NA 1.00000 1.0000 1.0000
Specificity 1.0000 1.0000 1 1.00000 1.0000 1.0000
Pos Pred Value 1.0000 1.0000 NA 1.00000 1.0000 1.0000
Neg Pred Value 1.0000 1.0000 NA 1.00000 1.0000 1.0000
Prevalence 0.4231 0.2692 0 0.07692 0.1154 0.1154
Detection Rate 0.4231 0.2692 0 0.07692 0.1154 0.1154
Detection Prevalence 0.4231 0.2692 0 0.07692 0.1154 0.1154
Balanced Accuracy 1.0000 1.0000 NA 1.00000 1.0000 1.0000
Class: G
Sensitivity NA
Specificity 1
Pos Pred Value NA
Neg Pred Value NA
Prevalence 0
Detection Rate 0
Detection Prevalence 0
Balanced Accuracy NA
Classes A, B, D, E, and F are all predicted perfectly.
Classes C and G have no samples in the test split, so their performance cannot be assessed.
F1 score: 1
Neural Network (NN):
# Train Neural Network model
nn_model <- train(oilType ~ ., data = train_data, method = "nnet",
                  trControl = ctrl, tuneLength = 3, linout = TRUE, trace = FALSE)

# Generate predictions for Neural Network model
nn_pred <- predict(nn_model, test_data)

# Create confusion matrix and calculate F1 score
nn_cm <- confusionMatrix(nn_pred, test_data$oilType)
nn_f1 <- f_meas(nn_cm$table, truth = "reference", estimate = "prediction",
                event_level = "second")
Warning: While computing multiclass `precision()`, some levels had no predicted events
(i.e. `true_positive + false_positive = 0`).
Precision is undefined in this case, and those levels will be removed from the
averaged result.
Note that the following number of true events actually occurred for each
problematic event level:
'C': 0, 'G': 0
Warning: While computing multiclass `recall()`, some levels had no true events (i.e.
`true_positive + false_negative = 0`).
Recall is undefined in this case, and those levels will be removed from the
averaged result.
Note that the following number of predicted events actually occurred for each
problematic event level:
'C': 0, 'G': 0
# Print results
print(nn_cm)
Confusion Matrix and Statistics
Reference
Prediction A B C D E F G
A 11 0 0 0 0 0 0
B 0 7 0 0 0 0 0
C 0 0 0 0 0 0 0
D 0 0 0 2 0 0 0
E 0 0 0 0 3 1 0
F 0 0 0 0 0 2 0
G 0 0 0 0 0 0 0
Overall Statistics
Accuracy : 0.9615
95% CI : (0.8036, 0.999)
No Information Rate : 0.4231
P-Value [Acc > NIR] : 7.058e-09
Kappa : 0.9463
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: A Class: B Class: C Class: D Class: E Class: F
Sensitivity 1.0000 1.0000 NA 1.00000 1.0000 0.66667
Specificity 1.0000 1.0000 1 1.00000 0.9565 1.00000
Pos Pred Value 1.0000 1.0000 NA 1.00000 0.7500 1.00000
Neg Pred Value 1.0000 1.0000 NA 1.00000 1.0000 0.95833
Prevalence 0.4231 0.2692 0 0.07692 0.1154 0.11538
Detection Rate 0.4231 0.2692 0 0.07692 0.1154 0.07692
Detection Prevalence 0.4231 0.2692 0 0.07692 0.1538 0.07692
Balanced Accuracy 1.0000 1.0000 NA 1.00000 0.9783 0.83333
Class: G
Sensitivity NA
Specificity 1
Pos Pred Value NA
Neg Pred Value NA
Prevalence 0
Detection Rate 0
Detection Prevalence 0
Balanced Accuracy NA
Conclusion: The GBM performed best here, achieving perfect accuracy, Kappa, and F1 scores. It accurately predicted every class except peanut (C) and corn (G), which had no samples in the test split. Comparing the GBM with the best model from Problem 1 (the Random Forest), both models correctly classify classes A, B, D, E, and F, and both make no predictions for classes C and G. In terms of metrics, however, the GBM offers a higher F1 score, indicating it better distinguishes true positives and avoids false negatives, making it the more reliable choice.
Would you infer that the data have nonlinear separation boundaries based on this comparison?
The fact that the GBM and Random Forest models both perform exceptionally well indicates that methods capable of capturing nonlinear structure are well suited to these data. This suggests that the data do have nonlinear separation boundaries.
Which oil type does the optimal model most accurately predict? Least accurately predict?
The GBM model predicted classes A, B, D, E, and F most accurately, while classes C (peanut) and G (corn) could not be predicted at all because they had no samples in the test split.
The “churn” data set was developed to predict telecom customer churn based on information about their account. The data files state that the data are “artificial based on claims similar to real world.” The data consist of 19 predictors related to the customer account, such as the number of customer service calls, the area code, and the number of minutes. The outcome is whether the customer churned:
Explore the data by visualizing the relationship between the predictors and the outcome. Are there important features of the predictor data themselves, such as between-predictor correlations or degenerate distributions? Can functions of more than one predictor be used to model the data more effectively?
# Density plot of account length
ggplot(mlc_churn, aes(x = account_length, fill = churn)) +
  geom_density(alpha = 0.5) +
  labs(title = "Density: Account Length by Churn Status", x = "Account Length", y = "Density")
Density Plot: Although there is considerable overlap between churned and non-churned customers, there are some differences. The distribution for churned customers appears slightly left-skewed, while the distribution for non-churned customers appears slightly right-skewed.
# Boxplot of total day minutes by churn
ggplot(mlc_churn, aes(x = churn, y = total_day_minutes, fill = churn)) +
  geom_boxplot() +
  labs(title = "Total Day Minutes by Churn Status", x = "Churn", y = "Total Day Minutes")
Boxplot: Churned customers have slightly higher total day minutes than non-churned customers. The non-churned group also shows several outliers. At first glance both groups look roughly normally distributed, but on closer inspection the non-churned group is approximately normal while the churned group is negatively skewed, which is consistent with the density plot.
ggplot(mlc_churn, aes(x = total_day_minutes, y = total_night_minutes, color = churn)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Scatter Plot of Total Day vs. Total Night Minutes",
       x = "Total Day Minutes", y = "Total Night Minutes")
`geom_smooth()` using formula = 'y ~ x'
Scatter Plot: There is a slight positive correlation and a linear relationship between day minutes and night minutes.
# Calculate the correlation matrix
numeric_vars <- mlc_churn %>% select_if(is.numeric)
cor_matrix <- cor(numeric_vars, use = "pairwise.complete.obs")

# Convert correlation matrix to long format
cor_long <- cor_matrix %>%
  as.data.frame() %>%
  rownames_to_column(var = "Var1") %>%
  pivot_longer(-Var1, names_to = "Var2", values_to = "value")

# Plot the heatmap
ggplot(cor_long, aes(Var1, Var2, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", midpoint = 0, limit = c(-1, 1)) +
  labs(title = "Correlation Heatmap of Numeric Variables", x = "", y = "") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Correlation Heat Map:
Positive correlation: day minutes and day charge
Positive correlation: eve minutes and eve charge
Positive correlation: night minutes and night charge
Positive correlation: intl minutes and intl charge
Conclusion: The visualizations suggest that creating features based on relationships between predictors could lead to more effective models. For instance, aggregating the minute totals across the different times of day, or combining account_length with total_day_minutes, could capture additional patterns and ultimately improve the predictive performance of the model; a sketch of such derived features is shown below.
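A minimal sketch of the kind of derived predictors suggested above; the source columns follow the mlc_churn variables referenced in the plots, while the new feature names are illustrative assumptions rather than part of the original analysis.
# Build simple aggregate and ratio features from the usage columns
churn_fe <- mlc_churn %>%
  mutate(
    total_minutes = total_day_minutes + total_eve_minutes +
                    total_night_minutes + total_intl_minutes,
    total_charge  = total_day_charge + total_eve_charge +
                    total_night_charge + total_intl_charge,
    # usage intensity relative to how long the account has existed
    minutes_per_account_month = total_minutes / pmax(account_length, 1)
  )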
Split the data into a training and a testing set, pre-process the data if appropriate.
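The code that produces the train_prepped and test_prepped objects used below is not shown; a minimal sketch of one possible split and preprocessing pipeline, built from the rsample and recipes packages loaded earlier, is given here as an assumption rather than the original steps.
# Stratified split on the churn outcome, then dummy-encode, center, and scale
set.seed(123)
split      <- initial_split(mlc_churn, prop = 0.7, strata = churn)
train_data <- training(split)
test_data  <- testing(split)

churn_rec <- recipe(churn ~ ., data = train_data) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_center(all_numeric_predictors()) %>%
  step_scale(all_numeric_predictors()) %>%
  prep()

train_prepped <- bake(churn_rec, new_data = train_data)
test_prepped  <- bake(churn_rec, new_data = test_data)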
Try building other models discussed in this chapter. Do any have better predictive performance?
# Check column names in the preprocessed data
#colnames(train_prepped)
#colnames(test_prepped)
# Decision Tree Model
tree_model <- rpart(churn ~ ., data = train_prepped, method = "class")

# Predict on test data
tree_preds <- predict(tree_model, newdata = test_prepped, type = "class")

# Evaluate performance
test_churn <- factor(test_data$churn, levels = levels(tree_preds))
tree_metrics <- confusionMatrix(tree_preds, test_churn)
print(tree_metrics)
Confusion Matrix and Statistics
Reference
Prediction yes no
yes 142 22
no 64 1272
Accuracy : 0.9427
95% CI : (0.9297, 0.9539)
No Information Rate : 0.8627
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.7353
Mcnemar's Test P-Value : 9.818e-06
Sensitivity : 0.68932
Specificity : 0.98300
Pos Pred Value : 0.86585
Neg Pred Value : 0.95210
Prevalence : 0.13733
Detection Rate : 0.09467
Detection Prevalence : 0.10933
Balanced Accuracy : 0.83616
'Positive' Class : yes
Decision Tree Model:
Accuracy: 0.9427
Kappa : 0.7353
Mcnemar’s Test P-Value : 9.818e-06
Balanced Accuracy : 0.83616
# Random Forest Model
rf_model <- ranger(churn ~ ., data = train_prepped, classification = TRUE)

# Predict on test data
rf_preds <- predict(rf_model, data = test_prepped)$predictions

# Evaluate performance
rf_metrics <- confusionMatrix(factor(rf_preds, levels = levels(test_data$churn)), test_churn)
print(rf_metrics)
Confusion Matrix and Statistics
Reference
Prediction yes no
yes 153 4
no 53 1290
Accuracy : 0.962
95% CI : (0.951, 0.9711)
No Information Rate : 0.8627
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.8218
Mcnemar's Test P-Value : 2.047e-10
Sensitivity : 0.7427
Specificity : 0.9969
Pos Pred Value : 0.9745
Neg Pred Value : 0.9605
Prevalence : 0.1373
Detection Rate : 0.1020
Detection Prevalence : 0.1047
Balanced Accuracy : 0.8698
'Positive' Class : yes
Random Forest Model:
Accuracy : 0.962
Kappa : 0.8218
Mcnemar’s Test P-Value : 2.047e-10
Balanced Accuracy : 0.8698
# SVM Model
svm_model <- svm(churn ~ ., data = train_prepped, kernel = "linear")

# Predict on test data
svm_preds <- predict(svm_model, newdata = test_prepped)

# Evaluate performance
svm_metrics <- confusionMatrix(factor(svm_preds, levels = levels(test_data$churn)), test_churn)
print(svm_metrics)
Confusion Matrix and Statistics
Reference
Prediction yes no
yes 0 0
no 206 1294
Accuracy : 0.8627
95% CI : (0.8442, 0.8797)
No Information Rate : 0.8627
P-Value [Acc > NIR] : 0.5186
Kappa : 0
Mcnemar's Test P-Value : <2e-16
Sensitivity : 0.0000
Specificity : 1.0000
Pos Pred Value : NaN
Neg Pred Value : 0.8627
Prevalence : 0.1373
Detection Rate : 0.0000
Detection Prevalence : 0.0000
Balanced Accuracy : 0.5000
'Positive' Class : yes
SVM Model:
Accuracy: 0.8627
Kappa : 0
Mcnemar’s Test P-Value : < 2e-16
Balanced Accuracy : 0.5000
Conclusion: The Random Forest performs best, with the highest accuracy, Kappa, and balanced accuracy. The Decision Tree also performs well, though not as well as the Random Forest, and the SVM model performs worst, failing to detect any churned customers.
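One common adjustment when an SVM ignores the minority class is to assign a larger misclassification cost to that class via the class.weights argument of e1071::svm. The sketch below is illustrative only: the weights are assumptions rather than tuned values, and it is not claimed to match the original analysis.
# Re-fit the SVM with a heavier penalty for misclassifying churners ("yes")
svm_weighted <- svm(churn ~ ., data = train_prepped, kernel = "linear",
                    class.weights = c(yes = 5, no = 1))

svm_w_preds   <- predict(svm_weighted, newdata = test_prepped)
svm_w_metrics <- confusionMatrix(factor(svm_w_preds, levels = levels(test_data$churn)),
                                 test_churn)
print(svm_w_metrics)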
Use the fatty acid data from Homework 1 Problem 3 above.
Use the same data splitting approach (if any) and pre-processing steps that you did in Homework 1 Problem 3.
# Re-load the data if needed
data(oil)
oil_data <- as.data.frame(fattyAcids)
oil_data$oilType <- oilType

# Set seed for reproducibility
set.seed(123)

# Split the data using stratified sampling
train_index <- createDataPartition(oil_data$oilType, p = 0.7, list = FALSE)
train_data <- oil_data[train_index, ]
test_data <- oil_data[-train_index, ]

# Pre-process the data: centering and scaling
preProcValues <- preProcess(train_data[, -ncol(train_data)], method = c("center", "scale"))
train_data[, -ncol(train_data)] <- predict(preProcValues, train_data[, -ncol(train_data)])
test_data[, -ncol(test_data)] <- predict(preProcValues, test_data[, -ncol(test_data)])
Fit a few basic trees to the training set.
Decision tree model:
# Fit tree model
set.seed(123)
tree_model <- train(oilType ~ ., data = train_data, method = "rpart",
                    trControl = trainControl(method = "cv", number = 10))

# Predict on the test set
test_predictions <- predict(tree_model, newdata = test_data)

# Confusion matrix
conf_matrix <- confusionMatrix(test_predictions, test_data$oilType)
print(conf_matrix)
Confusion Matrix and Statistics
Reference
Prediction A B C D E F G
A 10 0 0 0 0 0 0
B 0 7 0 0 0 0 0
C 0 0 0 0 0 0 0
D 0 0 0 0 0 0 0
E 1 0 0 2 3 3 0
F 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0
Overall Statistics
Accuracy : 0.7692
95% CI : (0.5635, 0.9103)
No Information Rate : 0.4231
P-Value [Acc > NIR] : 0.000358
Kappa : 0.6816
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: A Class: B Class: C Class: D Class: E Class: F
Sensitivity 0.9091 1.0000 NA 0.00000 1.0000 0.0000
Specificity 1.0000 1.0000 1 1.00000 0.7391 1.0000
Pos Pred Value 1.0000 1.0000 NA NaN 0.3333 NaN
Neg Pred Value 0.9375 1.0000 NA 0.92308 1.0000 0.8846
Prevalence 0.4231 0.2692 0 0.07692 0.1154 0.1154
Detection Rate 0.3846 0.2692 0 0.00000 0.1154 0.0000
Detection Prevalence 0.3846 0.2692 0 0.00000 0.3462 0.0000
Balanced Accuracy 0.9545 1.0000 NA 0.50000 0.8696 0.5000
Class: G
Sensitivity NA
Specificity 1
Pos Pred Value NA
Neg Pred Value NA
Prevalence 0
Detection Rate 0
Detection Prevalence 0
Balanced Accuracy NA
Decision tree model:
Accuracy : 0.7692
Kappa : 0.6816
P-Value: 0.000358
Does bagging improve the performance of the trees? What about boosting?
Bagging:
# Fit a bagged decision tree model
set.seed(123)
bagged_tree_model <- train(oilType ~ ., data = train_data, method = "treebag",
                           trControl = trainControl(method = "cv", number = 10))

# Predict on the test set
bagged_predictions <- predict(bagged_tree_model, newdata = test_data)

# Confusion matrix
bagged_conf_matrix <- confusionMatrix(bagged_predictions, test_data$oilType)
print(bagged_conf_matrix)
Confusion Matrix and Statistics
Reference
Prediction A B C D E F G
A 10 0 0 0 0 0 0
B 0 7 0 0 0 0 0
C 0 0 0 0 0 0 0
D 0 0 0 2 0 0 0
E 1 0 0 0 3 0 0
F 0 0 0 0 0 3 0
G 0 0 0 0 0 0 0
Overall Statistics
Accuracy : 0.9615
95% CI : (0.8036, 0.999)
No Information Rate : 0.4231
P-Value [Acc > NIR] : 7.058e-09
Kappa : 0.9472
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: A Class: B Class: C Class: D Class: E Class: F
Sensitivity 0.9091 1.0000 NA 1.00000 1.0000 1.0000
Specificity 1.0000 1.0000 1 1.00000 0.9565 1.0000
Pos Pred Value 1.0000 1.0000 NA 1.00000 0.7500 1.0000
Neg Pred Value 0.9375 1.0000 NA 1.00000 1.0000 1.0000
Prevalence 0.4231 0.2692 0 0.07692 0.1154 0.1154
Detection Rate 0.3846 0.2692 0 0.07692 0.1154 0.1154
Detection Prevalence 0.3846 0.2692 0 0.07692 0.1538 0.1154
Balanced Accuracy 0.9545 1.0000 NA 1.00000 0.9783 1.0000
Class: G
Sensitivity NA
Specificity 1
Pos Pred Value NA
Neg Pred Value NA
Prevalence 0
Detection Rate 0
Detection Prevalence 0
Balanced Accuracy NA
Bagging (decision tree model):
Accuracy : 0.9615
Kappa : 0.9472
P-Value: 7.058e-09
Boosting:
# Fit a boosted decision tree model
set.seed(123)
boosted_tree_model <- train(oilType ~ ., data = train_data, method = "gbm",
                            trControl = trainControl(method = "cv", number = 10),
                            verbose = FALSE)

# Predict on the test set
boosted_predictions <- predict(boosted_tree_model, newdata = test_data)

# Confusion matrix
boosted_conf_matrix <- confusionMatrix(boosted_predictions, test_data$oilType)
print(boosted_conf_matrix)
Confusion Matrix and Statistics
Reference
Prediction A B C D E F G
A 11 0 0 0 0 0 0
B 0 7 0 0 0 0 0
C 0 0 0 0 0 0 0
D 0 0 0 2 0 0 0
E 0 0 0 0 3 0 0
F 0 0 0 0 0 3 0
G 0 0 0 0 0 0 0
Overall Statistics
Accuracy : 1
95% CI : (0.8677, 1)
No Information Rate : 0.4231
P-Value [Acc > NIR] : 1.936e-10
Kappa : 1
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: A Class: B Class: C Class: D Class: E Class: F
Sensitivity 1.0000 1.0000 NA 1.00000 1.0000 1.0000
Specificity 1.0000 1.0000 1 1.00000 1.0000 1.0000
Pos Pred Value 1.0000 1.0000 NA 1.00000 1.0000 1.0000
Neg Pred Value 1.0000 1.0000 NA 1.00000 1.0000 1.0000
Prevalence 0.4231 0.2692 0 0.07692 0.1154 0.1154
Detection Rate 0.4231 0.2692 0 0.07692 0.1154 0.1154
Detection Prevalence 0.4231 0.2692 0 0.07692 0.1154 0.1154
Balanced Accuracy 1.0000 1.0000 NA 1.00000 1.0000 1.0000
Class: G
Sensitivity NA
Specificity 1
Pos Pred Value NA
Neg Pred Value NA
Prevalence 0
Detection Rate 0
Detection Prevalence 0
Balanced Accuracy NA
Boosting (decision tree model):
Accuracy : 1
Kappa : 1
P-Value: 1.936e-10
In conclusion, both bagging and boosting enhance the basic decision tree: the class-level predictions improve noticeably. Accuracy and Kappa improve from the basic decision tree to bagging and again to boosting, which reaches an accuracy of 100% and a Kappa of 1. Per-class accuracy also increases, with every predicted class becoming perfect under boosting; Classes C and G remain unassessed because they have no samples in the test split.
Side note: with the boosted model, classes A, B, D, E, and F all reach 100% accuracy.
Which model has better performance, and what are the corresponding tuning parameters?
The boosted decision tree performs best, achieving perfect accuracy and Kappa on the test set; the table below summarizes all three models.
Model                   Accuracy   Kappa    No Information Rate   P-Value
Decision Tree           0.7692     0.6816   0.4231                0.000358
Bagged Decision Tree    0.9615     0.9472   0.4231                7.058e-09
Boosted Decision Tree   1.0000     1.0000   0.4231                1.936e-10
The corresponding tuning parameters selected for the boosted model are as follows:
# Print the best tuning parameters
boosted_tree_model$bestTune
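For reference, caret's "gbm" method tunes n.trees, interaction.depth, shrinkage, and n.minobsinnode, and bestTune above reports the selected combination. A minimal sketch of supplying an explicit grid instead of the default one is shown below; the grid values are illustrative assumptions, not the values used above.
# Tune the boosted tree over an explicit grid of gbm parameters
gbm_grid <- expand.grid(n.trees = c(50, 100, 150),
                        interaction.depth = c(1, 2, 3),
                        shrinkage = 0.1,
                        n.minobsinnode = 10)

set.seed(123)
boosted_tuned <- train(oilType ~ ., data = train_data, method = "gbm",
                       trControl = trainControl(method = "cv", number = 10),
                       tuneGrid = gbm_grid, verbose = FALSE)
boosted_tuned$bestTune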