In this tutorial, you will perform a logistic regression with R. This is the first exercise, and we will do it together in class. At the end, you will find an exercise on simple linear regression that you should be able to do alone at home (solutions will be given in the next class).
The aim is to predict whether a client will have a credit card default from a few simple covariates.
The ISLR package contains a dataset, Default, which records 4 variables for 10000 clients:
- default: a factor variable indicating the presence or absence of default
- student: a factor variable with value Yes for a student and No otherwise
- balance: the average credit card balance at the end of the month
- income: the customer's income
Load the Default data set from the ISLR package. Can you describe the data? If you want to predict credit default, what will the outcome be?
library(ISLR)
data(Default)
head(Default)
The data set has 10000 observations and 4 columns. The outcome will be default, a binary factor (it is the response, not a covariate).
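To go a little beyond head(), here is a minimal sketch of how the description above could be checked in R. It only uses base R functions (dim, str, summary, table, prop.table) and assumes the ISLR package is installed; the object name outcome_share is chosen here for illustration and does not come from the original exercise.

```r
library(ISLR)   # provides the Default data set
data(Default)

dim(Default)       # number of rows and columns
str(Default)       # type of each column (factor or numeric)
summary(Default)   # counts for factors, quantiles for numeric columns

# Share of clients in each class of the outcome
# (outcome_share is an illustrative name, not part of the exercise)
outcome_share <- prop.table(table(Default$default))
outcome_share
```

Looking at the class shares is useful before fitting a classifier, because a strongly unbalanced outcome changes how accuracy should be read.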
End of solution 1
Perform typical univariate and bivariate data exploration. Can you already observe a trend?
Below are several plots, some with deliberately unusual colors. They illustrate different parameters you can tune so that you can choose the style you prefer. Don’t forget that you can find many more examples online.
library(ggplot2)
ggplot(Default, aes(x = income)) + geom_histogram(binwidth = 500, fill = "white", color = "darkorange") + theme_bw()
ggplot(Default, aes(x = default)) + geom_bar() + facet_grid(~student) + theme_dark()
ggplot(Default, aes(x = default, y = income)) + geom_violin() + theme_bw() + geom_jitter(alpha = 0.05)
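The plots above cover income and student; the remaining covariate, balance, can be explored the same way. The following is only a sketch under the assumption that ISLR and ggplot2 are installed; the names p_balance and default_by_student are illustrative and not part of the original solution.

```r
library(ISLR)
library(ggplot2)
data(Default)

# Distribution of balance within each default class
p_balance <- ggplot(Default, aes(x = default, y = balance)) +
  geom_boxplot() +
  theme_bw()
print(p_balance)

# Cross-tabulate student status against default, then convert each
# row to proportions (margin = 1) to get the default rate per group
default_by_student <- prop.table(
  table(student = Default$student, default = Default$default),
  margin = 1
)
default_by_student
```

A boxplot is used here rather than a violin plot simply to show another geom; either can reveal whether balance differs between clients who default and those who do not.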