Measure Model Performance in R Using ROCR Package
R’s ROCR package can be used for evaluating and visualizing the performance of classifiers / fitted models. It is helpful for estimating performance measures and plotting these measures over a range of...
Create a Confusion Matrix in R
A confusion matrix is a tabular representation of Actual vs Predicted values. As you can see, the confusion matrix avoids “confusion” by measuring the actual and predicted values in a tabular format....
Credit Risk Modelling – Case Study- Lending Club Data
To build a good model, it is important to use high quality data. For the purpose of this course, we will use the loan data available from LendingClub’s website. LendingClub is a US peer-to-peer lending...
Explore Financial Data in R
Now that we have the data file in our working directory, we can load it in our R session and start exploring it. Use the following command to load the data into R. The “stringsAsFactors = FALSE”...
Explore Loan Data in R – Loan Grade and Interest Rate
There is no set path to how one would go about analyzing a data set. Typically, a data scientist would spend quite some time exploring and observing the data to understand it well. Let’s look at some...
Credit Risk Modelling – Required R Packages
During our analysis, we will make use of various R packages. So, let’s look at what these packages are and let’s install and load them in R. Dplyr: ‘Dplyr’ provides a set of tools for efficiently...
Loan Data – Training and Test Data Sets
For building the model, we will divide our data into two different data sets, namely training and testing datasets. The model will be built using the training set and then we will test it on the...
Data Cleaning in R – Part 1
Discarding Attributes: LendingClub also provides a data dictionary that contains details of all attributes of our dataset. We can use that dictionary to understand more about the data columns we have...
Data Cleaning in R – Part 2
Attributes with Zero Variance: Datasets can sometimes contain attributes (predictors) that have near-zero variance, or may have just one value. Such variables are considered to have less predictor...
Data Cleaning in R – Part 3
Default by States: We take a look at the default rate for each state. We filter out states that have too few loans (fewer than 1000). Order States by Default Rate: We can order states by default...
Advanced Concept of Risk-reward Ratio in Trading
Naïve traders always think that by following the simple concept of the risk-reward ratio, they can make huge money. Things don’t work like this in the real world, however. Most of the time, the new traders in...
5 Factors That Influence the Stock Market – Explained
While the success of a trader relies mostly on their abilities to anticipate market changes and act upon them, the stock market is known for being fairly volatile. For decades now, experts have...
Data Cleaning in R – Part 5
Numeric Features: Let’s look at all numeric features we have left. We will transform annual_inc, revol_bal, avg_cur_bal, bc_open_to_buy by dividing them by funded_amnt (amount of loan). We can now...
Remove Dimensions By Fitting Logistic Regression
We will use the preProcess function from the caret package to center and scale (Normalize) the data. The scale transform calculates the standard deviation for an attribute and divides each value by...
Create a Function and Prepare Test Data in R
When we build the model, we will need the same set of columns in the test data, and we will also need to apply all the same transformations to the test_data. Kept Columns: Create...
Building Credit Risk Model
The loan data typically have a higher proportion of good loans. We can achieve high accuracy just by labeling all loans as Fully Paid. For our test data, we achieve 70.3% accuracy by just following the...
Logistic Regression Model in R
To build our first model, we will tune Logistic Regression to our training dataset. First, we set the seed (to any number; we have chosen 100) so that we can reproduce our results. Then we create a...
Support Vector Machine (SVM) Model in R
A support vector machine (SVM) is a supervised learning technique that analyzes data and isolates patterns applicable to both classification and regression. The classifier is useful for choosing...
Random Forest Model in R
Now we will tune the RandomForest model. As with SVM, we tune parameters on the 5% downsampled data. The procedure is exactly the same as for the SVM model. Below we have reproduced the code for Random Forest...
Extreme Gradient Boosting in R
Extreme Gradient Boosting has a very efficient implementation. Unlike SVM and RandomForest, we can tune parameters using the whole downsampled set. We focus on varying Ridge & Lasso regularization...