Channel: Finance Train
Browsing all 822 articles

Measure Model Performance in R Using ROCR Package

R’s ROCR package can be used for evaluating and visualizing the performance of classifiers / fitted models. It is helpful for estimating performance measures and plotting these measures over a range of...

Create a Confusion Matrix in R

A confusion matrix is a tabular representation of Actual vs Predicted values. As you can see, the confusion matrix avoids “confusion” by measuring the actual and predicted values in a tabular format....
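
As a minimal sketch of the idea, assuming only base R: the labels below are invented for illustration and are not taken from the article's data. `table()` cross-tabulates actual against predicted classes, and accuracy is read off the diagonal.

```r
# Toy labels, invented for illustration only.
actual    <- factor(c("Default", "Paid", "Paid", "Default", "Paid", "Paid"),
                    levels = c("Default", "Paid"))
predicted <- factor(c("Default", "Paid", "Default", "Paid", "Paid", "Paid"),
                    levels = c("Default", "Paid"))

# Cross-tabulate actual classes (rows) against predicted classes (columns).
conf_mat <- table(Actual = actual, Predicted = predicted)
print(conf_mat)

# Accuracy is the share of observations that fall on the diagonal.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```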

Credit Risk Modelling – Case Study – Lending Club Data

To build a good model, it is important to use high-quality data. For the purpose of this course, we will use the loan data available from LendingClub’s website. LendingClub is a US peer-to-peer lending...

Explore Financial Data in R

Now that we have the data file in our working directory, we can load it in our R session and start exploring it. Use the following command to load the data into R. The “stringsAsFactors = FALSE”...

Explore Loan Data in R – Loan Grade and Interest Rate

There is no set path to how one would go about analyzing a data set. Typically, a data scientist would spend quite some time exploring and observing the data to understand it well. Let’s look at some...

Credit Risk Modelling – Required R Packages

During our analysis, we will make use of various R packages. So, let’s look at what these packages are and let’s install and load them in R. Dplyr: ‘Dplyr’ provides a set of tools for efficiently...

Loan Data – Training and Test Data Sets

For building the model, we will divide our data into two different data sets, namely training and testing datasets. The model will be built using the training set and then we will test it on the...
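
One common way to make such a split in base R is sketched below. The `loans` data frame, its columns, and the 75/25 proportion are assumptions for illustration; the excerpt does not state the proportion the article uses.

```r
set.seed(100)  # arbitrary seed so the split is reproducible

# Hypothetical stand-in for the loan data (columns invented for illustration).
loans <- data.frame(id = 1:100,
                    loan_amnt = seq(1000, by = 250, length.out = 100))

# Assumed 75/25 split: draw row indices for training, keep the rest for testing.
train_idx  <- sample(nrow(loans), size = floor(0.75 * nrow(loans)))
train_data <- loans[train_idx, ]
test_data  <- loans[-train_idx, ]

print(c(train = nrow(train_data), test = nrow(test_data)))
```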

Data Cleaning in R – Part 1

Discarding Attributes: LendingClub also provides a data dictionary that contains details of all attributes of our dataset. We can use that dictionary to understand more about the data columns we have...

Data Cleaning in R – Part 2

Attributes with Zero Variance: Datasets can sometimes contain attributes (predictors) that have near-zero variance, or may have just one value. Such variables are considered to have less predictor...

Data Cleaning in R – Part 3

Default by States: We take a look at the default rate for each state. We filter out states that have too few loans (fewer than 1000). Order States by Default Rate: We can order states by default...

Advanced Concept of Risk-reward Ratio in Trading

Naïve traders often think that by following the simple concept of the risk-reward ratio, they can make huge amounts of money. Things don’t work like this in the real world, however. Most of the time, new traders in...

5 Factors That Influence the Stock Market – Explained

While the success of a trader relies mostly on their ability to anticipate market changes and act upon them, the stock market is known for being fairly volatile. For decades now, experts have...

Data Cleaning in R – Part 5

Numeric Features: Let’s look at all numeric features we have left. We will transform annual_inc, revol_bal, avg_cur_bal, bc_open_to_buy by dividing them by funded_amnt (amount of loan). We can now...
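
The division by funded_amnt could look roughly like the following in base R. The column names are those named in the excerpt; the two rows of values are invented, and this is a sketch rather than the article's own code.

```r
# Toy rows with invented values; column names are the ones named in the text.
loans <- data.frame(
  funded_amnt    = c(10000, 20000),
  annual_inc     = c(50000, 80000),
  revol_bal      = c(5000, 10000),
  avg_cur_bal    = c(2000, 30000),
  bc_open_to_buy = c(1000, 4000)
)

ratio_cols <- c("annual_inc", "revol_bal", "avg_cur_bal", "bc_open_to_buy")

# Divide each listed column by the funded loan amount, row by row.
loans[ratio_cols] <- lapply(loans[ratio_cols],
                            function(x) x / loans$funded_amnt)
print(loans)
```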

Remove Dimensions By Fitting Logistic Regression

We will use the preProcess function from the caret package to center and scale (Normalize) the data. The scale transform calculates the standard deviation for an attribute and divides each value by...
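
To show what centering and scaling do to a single attribute, here is a small base R sketch. It deliberately uses plain arithmetic and base R's `scale()` rather than caret's preProcess, and the input values are invented.

```r
x <- c(2, 4, 6, 8)  # invented values for one numeric attribute

# Center: subtract the attribute's mean from each value.
centered <- x - mean(x)

# Scale: divide each centered value by the attribute's standard deviation.
scaled <- centered / sd(x)

# Base R's scale() applies the same two steps by default.
same_as_scale <- isTRUE(all.equal(as.numeric(scale(x)), scaled))
print(scaled)
print(same_as_scale)
```

After both steps the attribute has mean 0 and standard deviation 1.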

Create a Function and Prepare Test Data in R

When we build the model, we will need the same set of columns in the test data and will also need to apply to test_data all the same transformations that we applied to the training data. Kept Columns: Create...

Building Credit Risk Model

The loan data typically have a higher proportion of good loans. We can achieve high accuracy just by labeling all loans as Fully Paid. For our test data, we obtain 70.3% accuracy by just following the...
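
The majority-class baseline described here can be sketched in base R as follows. The labels are invented (7 of 10 are "Fully Paid", so the toy baseline is 70%, not the article's 70.3%), and "Charged Off" is an assumed name for the other class.

```r
# Invented labels; "Charged Off" is an assumed name for the non-paid class.
loan_status <- c(rep("Fully Paid", 7), rep("Charged Off", 3))

# Find the most frequent class and predict it for every loan.
majority_class <- names(which.max(table(loan_status)))
baseline_pred  <- rep(majority_class, length(loan_status))

# Baseline accuracy equals the share of the majority class.
baseline_accuracy <- mean(baseline_pred == loan_status)
print(majority_class)
print(baseline_accuracy)
```

Any fitted model should beat this figure on the same data to be worth using.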

Logistic Regression Model in R

To build our first model, we will tune Logistic Regression to our training dataset. First, we set the seed (to any number; we have chosen 100) so that we can reproduce our results. Then we create a...

Support Vector Machine (SVM) Model in R

A support vector machine (SVM) is a supervised learning technique that analyzes data and isolates patterns applicable to both classification and regression. The classifier is useful for choosing...

Random Forest Model in R

Now, we will tune a Random Forest model. As with SVM, we tune the parameters on 5% of the downsampled data. The procedure is exactly the same as for the SVM model. Below we have reproduced the code for Random Forest...

Extreme Gradient Boosting in R

Extreme Gradient Boosting has a very efficient implementation. Unlike SVM and Random Forest, we can tune the parameters using the whole downsampled set. We focus on varying Ridge & Lasso regularization...
