Cross validation of Data | Avoid over fitting of Machine Learning Algorithms | Quantra by QuantInsti

Decision Tree for Trading: FREE PREVIEW!

Timestamps:
00:11 - 00:31 - Cross validation
00:32 - 01:33 - Prediction model
01:34 - 02:15 - Coding in Python

In this video lecture, you will learn the concept of cross-validation and the role of the train and test datasets in a machine learning model. Cross-validation, also known as rotation estimation, is a model validation technique for assessing how well a model's results generalize to an independent dataset. It is most often used in prediction problems, where the goal is to estimate how accurately the model will perform in practice.

In a prediction problem, the model is first trained on a dataset of known data so that it learns to make correct predictions. It is then tested on a separate dataset to validate what it learned. The dataset used to train the model is called the train dataset, and the dataset used to test the model is called the test dataset.

Assume that a dataset contains 100 numbers and is split into five folds of 20. In the first round of cross-validation, numbers 1 to 80 are used to train the model and numbers 81 to 100 to test it. In the next round, numbers 1 to 60 and 81 to 100 are used for training and numbers 61 to 80 for testing. Three more rounds follow, with test datasets of 41 to 60, 21 to 40 and 1 to 20; in each case the other 80 numbers form the train dataset.

In code, we can use the cross_val_score() function from sklearn.model_selection to perform cross-validation.
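The five rounds in the 100-number example can be sketched in plain Python. This is only an illustration of how the splits line up: the helper name contiguous_folds is hypothetical, and it assumes the number of samples divides evenly by the number of folds.

```python
def contiguous_folds(n_samples, n_folds):
    """Yield (train, test) lists of 1-based item numbers.

    Hypothetical helper for illustration only. The last block is used as the
    test set first, matching the order of the rounds in the example.
    Assumes n_samples is evenly divisible by n_folds.
    """
    fold_size = n_samples // n_folds
    numbers = list(range(1, n_samples + 1))
    for k in range(n_folds - 1, -1, -1):
        start, stop = k * fold_size, (k + 1) * fold_size
        test = numbers[start:stop]                   # one block of 20 numbers
        train = numbers[:start] + numbers[stop:]     # the other 80 numbers
        yield train, test


rounds = list(contiguous_folds(100, 5))
for i, (train, test) in enumerate(rounds, start=1):
    print(f"Round {i}: test {test[0]}-{test[-1]}, train size {len(train)}")
```

Every number appears in a test set exactly once across the five rounds, which is what lets the averaged score reflect the whole dataset.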

The parameters of the cross_val_score function are the estimator algorithm, the predictor variables, the target variable and the number of folds, or rounds, of cross-validation. The function returns an array with one score for each round of cross-validation. The average and standard deviation of these scores give us a more stable estimate of how the model will perform on unseen data than a single train/test split would.
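A minimal sketch of such a call is given below. The data is synthetic and the estimator settings (a DecisionTreeClassifier with max_depth=3) are assumptions made for illustration; they are not taken from the course notebook.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 100 rows, 3 predictor columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
# Binary target loosely driven by the first predictor plus noise.
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

# Estimator, predictors, target, number of folds.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
scores = cross_val_score(model, X, y, cv=5)

print("Scores per fold:", scores)
print("Mean:", scores.mean(), "Std:", scores.std())
```

With a classifier and an integer cv, scikit-learn uses stratified folds and reports accuracy by default, so the exact fold boundaries can differ from the simple contiguous split in the 100-number example.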

In the upcoming units, your concepts will be tested through a couple of multiple-choice questions, after which there will be an IPython notebook to implement the cross-validation technique.

A feature such as the ‘2 period rolling mean’ is created by taking the average of 2 consecutive data points in the ‘close’ column. Now suppose we split the data into two parts, where the first 10 data points belong to the train data and the last 4 belong to the simulation data. In the ‘2 period rolling mean’ column of the simulation data, the value in row 11 is then obtained by taking the average of the close values in rows 10 and 11. Row 10 belongs to the train data, and using train data points to compute features in the simulation data results in what we call data leakage. The best way to avoid this data leakage is to create the features and target datasets separately for the train and the simulation data.
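The row 11 case can be reproduced with pandas. The close values below are made up for illustration (row i has close 100 + i); only the 10/4 split and the 2 period rolling mean come from the text.

```python
import pandas as pd

# Hypothetical close prices for rows 1 to 14 (row i has close 100 + i).
close = pd.Series(
    [float(v) for v in range(101, 115)], index=range(1, 15), name="close"
)

train_close = close.loc[1:10]   # first 10 data points
sim_close = close.loc[11:14]    # last 4 data points

# Leaky: feature computed on the full series, then sliced.
# Row 11 averages rows 10 and 11, so it uses a train data point.
leaky = close.rolling(2).mean().loc[11:14]

# Separate: feature computed on the simulation data only.
# Row 11 has no earlier simulation row, so its value is NaN.
clean = sim_close.rolling(2).mean()

print(pd.DataFrame({"leaky": leaky, "separate": clean}))
```

Computing the feature separately costs the first simulation row, which has no value, but no train data point enters the simulation features.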

You can also run a demo of the machine learning model in the Interactive Brokers TWS environment by using the IBridgePy library. Please read “A Short Guide on Automated Execution”, provided in the next section, to install TWS and IBridgePy.
We have also provided sample code to paper trade or live trade the decision tree model strategy on Interactive Brokers’ TWS. To access this sample code, go to the last unit of the Downloadable Code section, download the Downloadables_DT.zip file, copy the file sample_DT_deploy_strategy.py into the IBridgePy strategies folder and run it.

In this section, you learned about the various challenges that you can face while using a machine learning model in live trading. You understood how to save and retrieve a model, how to update the data, and how to retrain a model based on its performance. Finally, you learned how to perform a trading simulation to test the model’s performance. With this, you are all set to deploy your own models and face these challenges in live trading. In the IPython notebook following this video, you can go through the simulation code in detail. Good luck.


Quantra is an online education portal that specializes in algorithmic and quantitative trading. Quantra offers bite-sized, self-paced and interactive courses for busy professionals seeking implementable knowledge in this domain.

