MGMT 675
AI-Assisted Financial Analysis

Outline
- Decision trees
- Random forests
- Gradient boosting
- Neural networks
Concepts from last class
- Train and test
- R-squared (score)
- Underfitting and overfitting
- Hyperparameters
- Cross validation
- Scaling and pipelines
How Decision Trees Work
- Start with the estimate \(\hat y = \bar y\) for all observations.
- Split into two subsets based on one variable and one threshold: all observations below the threshold go into one group, all above into the other.
- The prediction for each group is the group mean of the target variable. Calculate the MSE over both groups.
- The variable and threshold to split on are chosen to minimize that MSE.
- Split each subset into further subsets and continue. (A minimal sketch of one split search follows.)
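To make the splitting rule concrete, here is a minimal sketch of one greedy split search. The DataFrame and column names are hypothetical, and scikit-learn does all of this internally:

```python
# A hypothetical search for the best single split, minimizing MSE.
import numpy as np
import pandas as pd

def best_split(df: pd.DataFrame, features: list, target: str):
    """Return the (feature, threshold) pair that minimizes total MSE."""
    best_feature, best_threshold, best_mse = None, None, np.inf
    y = df[target].to_numpy()
    for col in features:
        x = df[col].to_numpy()
        for t in np.unique(x):
            left, right = y[x <= t], y[x > t]
            if len(left) == 0 or len(right) == 0:
                continue
            # Each group is predicted by its own mean; pool squared errors.
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            mse = sse / len(y)
            if mse < best_mse:
                best_feature, best_threshold, best_mse = col, t, mse
    return best_feature, best_threshold, best_mse
```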
Example of Decision Tree Splitting
Another Example
- Ask Julius to read mldata.xlsx.
- Ask Julius to fit a decision tree regressor with max_depth=2 to predict “continuous” from x1 through x100.
- Ask Julius to plot the tree.
- Ask Julius to set x to an array of 100 standard normals, show x[:10], and report what the tree predicts for x.
- Ask Julius to refit with max_depth=3 and plot the tree. (A Python sketch of these steps follows.)
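A sketch of what Julius might run for these steps, assuming mldata.xlsx has columns “continuous” and x1 through x100:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, plot_tree

df = pd.read_excel("mldata.xlsx")
features = [f"x{i}" for i in range(1, 101)]
tree = DecisionTreeRegressor(max_depth=2)  # rerun with max_depth=3 for the deeper tree
tree.fit(df[features], df["continuous"])

plot_tree(tree, feature_names=features)
plt.show()

# Predict for one new observation of 100 standard normals.
x = np.random.normal(size=100)
print(x[:10])
print(tree.predict(x.reshape(1, -1)))  # one row, 100 features
```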
Forests
- A forest is multiple trees.
- For any observation, old or new, each tree makes a prediction.
- Average the predictions to get the final prediction.
Generating random forests
- A random forest is created by generating random datasets and fitting a tree to each.
- A random dataset is generated by randomly drawing rows from the original dataset.
- The scikit-learn default is bootstrap sampling: draw with replacement as many rows as in the original dataset, so some rows appear more than once and others not at all. (A sketch follows.)
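A minimal sketch of one bootstrap draw, assuming the mldata.xlsx file from the earlier example (scikit-learn does this internally when bootstrap=True):

```python
import numpy as np
import pandas as pd

df = pd.read_excel("mldata.xlsx")             # original dataset
rng = np.random.default_rng(0)
idx = rng.integers(0, len(df), size=len(df))  # draw n row indices with replacement
bootstrap_sample = df.iloc[idx]               # some rows repeat, others never appear
```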
Example
- Ask Julius to fit random forest regression to predict “continuous” from x1 through x100 with n_estimators=2 and max_depth=2.
- Ask Julius to plot both trees.
- Ask Julius what the random forest predicts for x. (A sketch of these steps follows.)
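A sketch of the two-tree forest. Accessing forest.estimators_ to reach the individual trees is standard scikit-learn; the data setup assumes the mldata.xlsx columns from earlier:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import plot_tree

df = pd.read_excel("mldata.xlsx")
features = [f"x{i}" for i in range(1, 101)]
forest = RandomForestRegressor(n_estimators=2, max_depth=2, random_state=0)
forest.fit(df[features], df["continuous"])

# The individual trees live in forest.estimators_.
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
for ax, tree in zip(axes, forest.estimators_):
    plot_tree(tree, feature_names=features, ax=ax)
plt.show()

# The forest's prediction is the average of the trees' predictions.
x = np.random.normal(size=(1, 100))
print([tree.predict(x)[0] for tree in forest.estimators_])
print(forest.predict(x))  # the mean of the numbers above
```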
A More Realistic Example
- Ask Julius to fit a random forest regression to predict “continuous” from x1 through x100 (let Julius choose n_estimators and max_depth; it will probably use the scikit-learn defaults).
- Ask Julius what the score is on the training and test data. (A sketch of this workflow follows.)
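A sketch of this train/test workflow, assuming the mldata.xlsx columns from earlier:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

df = pd.read_excel("mldata.xlsx")
features = [f"x{i}" for i in range(1, 101)]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["continuous"], test_size=0.2, random_state=0
)
# Default hyperparameters; random_state only fixes the bootstrap draws.
forest = RandomForestRegressor(random_state=0).fit(X_train, y_train)
print("train R^2:", forest.score(X_train, y_train))
print("test R^2: ", forest.score(X_test, y_test))
```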
Important Hyperparameters
- How much splitting to do
- Examples:
- max_depth=3 means at most 3 levels of splits (up to \(2^3 = 8\) leaves)
- min_samples_split=50 means don’t split groups smaller than 50
- min_samples_leaf=50 means don’t create groups smaller than 50
Other important hyperparameters
- How to split
- criterion (squared error, absolute error, …). Absolute error is less influenced by outlier values of the target variable.
- max_features: randomly choose max_features candidate features at each split and split on the best of them. A small max_features generates more variation across the trees.
- Number of trees (n_estimators). (A constructor sketch with these hyperparameters follows.)
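These hyperparameters all go into the constructor. A minimal sketch with illustrative values, not recommendations:

```python
from sklearn.ensemble import RandomForestRegressor

forest = RandomForestRegressor(
    n_estimators=200,            # number of trees
    max_depth=6,                 # at most 6 levels of splits per tree
    min_samples_split=50,        # don't split groups smaller than 50
    min_samples_leaf=50,         # don't create groups smaller than 50
    criterion="absolute_error",  # less influenced by outliers than squared error
    max_features="sqrt",         # candidate features considered at each split
)
```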
Example
- Ask Julius to use GridSearchCV to find the best max_depth in [2, 4, 6, 8, 10].
- Ask Julius what the scores are on the training and test data.
- Ask Julius to plot the test data predictions against x1 in the test data.
- Ask Julius to tell you the feature importances. (A sketch of this workflow follows.)
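A sketch of the grid search, assuming the X_train/X_test split from the earlier sketch:

```python
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"max_depth": [2, 4, 6, 8, 10]},
)
grid.fit(X_train, y_train)
print(grid.best_params_)
print("train R^2:", grid.score(X_train, y_train))
print("test R^2: ", grid.score(X_test, y_test))

# Test-data predictions against x1.
plt.scatter(X_test["x1"], grid.predict(X_test))
plt.xlabel("x1")
plt.ylabel("predicted")
plt.show()

# Ten largest feature importances of the best model.
importances = zip(X_train.columns, grid.best_estimator_.feature_importances_)
for name, imp in sorted(importances, key=lambda pair: -pair[1])[:10]:
    print(name, round(imp, 3))
```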
Another Example
- Ask Julius to get the Boston house price data. (load_boston was removed from scikit-learn in version 1.2, so Julius may need to fetch it from the original source, as sketched after this list.)
- Build a random forest model to predict MEDV using the other variables.
- GridSearchCV for max_depth
- Get score on test data
- Get feature importances
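A sketch of this workflow. The data-loading lines are the workaround that scikit-learn’s deprecation notice suggested; the URL and column names come from the original CMU archive:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Rebuild the Boston data from the raw file (two physical rows per record).
url = "http://lib.stat.cmu.edu/datasets/boston"
raw = pd.read_csv(url, sep=r"\s+", skiprows=22, header=None)
cols = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
        "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]
X = pd.DataFrame(np.hstack([raw.values[::2, :], raw.values[1::2, :2]]), columns=cols)
y = raw.values[1::2, 2]  # MEDV

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
grid = GridSearchCV(RandomForestRegressor(random_state=0),
                    param_grid={"max_depth": [2, 4, 6, 8, 10]})
grid.fit(X_train, y_train)
print("test R^2:", grid.score(X_test, y_test))
print(dict(zip(cols, grid.best_estimator_.feature_importances_.round(3))))
```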
How Gradient boosting works
- Fit a decision tree.
- Look at its errors. Fit a new decision tree to predict the errors.
- The new prediction is the original prediction plus a fraction of the error tree’s prediction (the fraction is the learning rate).
- Look at the errors of the new predictions. Fit a new decision tree to predict these errors.
- Continue … (a minimal sketch of this loop follows).
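A minimal sketch of the boosting loop itself, assuming the X_train/y_train split from earlier (scikit-learn’s GradientBoostingRegressor and xgboost implement this idea with many refinements):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

learning_rate = 0.1
pred = np.full(len(y_train), y_train.mean())       # start from the mean
trees = []
for _ in range(100):                               # number of boosting rounds
    errors = y_train - pred                        # current errors (residuals)
    tree = DecisionTreeRegressor(max_depth=2).fit(X_train, errors)
    pred += learning_rate * tree.predict(X_train)  # move a fraction toward the fit
    trees.append(tree)
```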
Key hyperparameters
- Same as random forest
- Plus learning rate
Extreme Gradient Boosting (xgboost)
- Ask Julius to explain xgboost
- Ask Julius to fit xgboost to predict “continuous” from x1 through x100 in mldata.xlsx.
- Ask Julius to use GridSearchCV to find the best max_depth and learning rate.
Ask Julius
- what the score is on the test data
- what the feature importances are
- to plot the actual and predicted target values against x1 in the test data (a sketch of this workflow follows).
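A sketch of the xgboost workflow, assuming the mldata train/test split from earlier; the grid values are illustrative:

```python
import matplotlib.pyplot as plt
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

grid = GridSearchCV(
    XGBRegressor(),
    param_grid={"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]},
)
grid.fit(X_train, y_train)
print(grid.best_params_)
print("test R^2:", grid.score(X_test, y_test))
print(grid.best_estimator_.feature_importances_[:10])

# Actual and predicted target values against x1 in the test data.
plt.scatter(X_test["x1"], y_test, label="actual")
plt.scatter(X_test["x1"], grid.predict(X_test), label="predicted")
plt.xlabel("x1")
plt.legend()
plt.show()
```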
Example of Multi-layer Perceptron
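A minimal sketch, assuming the example uses scikit-learn’s MLPRegressor with the scaling-and-pipelines approach from last class and the train/test split from earlier (the hidden-layer sizes are illustrative):

```python
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

mlp = make_pipeline(
    StandardScaler(),  # neural networks need scaled inputs
    MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0),
)
mlp.fit(X_train, y_train)
print("test R^2:", mlp.score(X_test, y_test))
```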