Tree Boosting With XGBoost - Why Does XGBoost Win "Every" Machine Learning Competition?

Nielsen, Didrik

Master's thesis

Permanent link

http://hdl.handle.net/11250/2433761

Publication date

2016

Collections

Department of Mathematical Sciences (Institutt for matematiske fag)

Abstract

Tree boosting has empirically proven to be a highly effective approach to predictive modeling, and it has shown remarkable results for a vast array of problems. For many years, MART has been the tree boosting method of choice. More recently, a tree boosting method known as XGBoost has gained popularity by winning numerous machine learning competitions.

In this thesis, we will investigate how XGBoost differs from the more traditional MART. We will show that XGBoost employs a boosting algorithm which we term Newton boosting, and we will compare this algorithm with the gradient boosting algorithm that MART employs. Moreover, we will discuss the regularization techniques that these methods offer and the effect these have on the models.
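The contrast between Newton boosting and gradient boosting can be illustrated with a minimal sketch, assuming depth-one trees (stumps), the logistic loss with labels in {0, 1}, and a fixed learning rate. In the gradient variant, the stump is fit by least squares to the negative gradient and its leaf values are the leaf means of that target; MART additionally refines leaf values with a per-leaf line search, which this sketch omits. In the Newton variant, both the split and the leaf weights come from the second-order approximation of the loss, giving leaf weights -G_j / (H_j + lam), where G_j and H_j are the sums of gradients and Hessians in leaf j. The function names and the L2 penalty `lam` are hypothetical illustrations, not XGBoost's API.

```python
import numpy as np


def logistic_grad_hess(y, f):
    """Gradient and Hessian of log(1 + exp(f)) - y * f w.r.t. the raw score f."""
    p = 1.0 / (1.0 + np.exp(-f))
    return p - y, p * (1.0 - p)


def log_loss(y, f):
    """Mean logistic loss; logaddexp(0, f) computes log(1 + exp(f)) stably."""
    return np.mean(np.logaddexp(0.0, f) - y * f)


def candidate_thresholds(x):
    """Midpoints between consecutive distinct sorted feature values."""
    u = np.unique(x)
    return (u[:-1] + u[1:]) / 2.0


def gradient_stump(x, g, h):
    """Gradient step: least-squares stump fit to the negative gradient.
    h is unused; it is accepted only so both step functions share a signature."""
    r = -g
    best = None
    for t in candidate_thresholds(x):
        left = x <= t
        wl, wr = r[left].mean(), r[~left].mean()
        sse = ((r[left] - wl) ** 2).sum() + ((r[~left] - wr) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, wl, wr)
    return best[1:]


def newton_stump(x, g, h, lam=1.0):
    """Newton step: minimize sum_j [G_j w_j + 0.5 (H_j + lam) w_j^2] over splits.
    The optimum per leaf is w_j = -G_j / (H_j + lam) with value -0.5 G_j^2 / (H_j + lam),
    so the best split maximizes sum_j G_j^2 / (H_j + lam)."""
    best = None
    for t in candidate_thresholds(x):
        left = x <= t
        G = np.array([g[left].sum(), g[~left].sum()])
        H = np.array([h[left].sum(), h[~left].sum()])
        gain = (G ** 2 / (H + lam)).sum()
        if best is None or gain > best[0]:
            w = -G / (H + lam)
            best = (gain, t, w[0], w[1])
    return best[1:]


def boost(x, y, step_fn, n_rounds=50, learning_rate=0.3):
    """Additive model built from stumps, starting from the raw score f = 0."""
    f = np.zeros_like(x, dtype=float)
    for _ in range(n_rounds):
        g, h = logistic_grad_hess(y, f)
        t, wl, wr = step_fn(x, g, h)
        f += learning_rate * np.where(x <= t, wl, wr)
    return f


# Synthetic binary data (an assumption for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=200)
p_true = 1.0 / (1.0 + np.exp(-2.0 * np.sin(x)))
y = (rng.uniform(size=200) < p_true).astype(float)

f_grad = boost(x, y, gradient_stump)
f_newton = boost(x, y, newton_stump)
print("initial:", log_loss(y, np.zeros_like(x)))
print("gradient boosting:", log_loss(y, f_grad))
print("Newton boosting:", log_loss(y, f_newton))
```

Under squared-error loss the Hessian is constant (h = 1), so with lam = 0 the two steps choose the same split and leaf values; they diverge for losses such as the logistic loss, where the Hessian varies across observations.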

In addition, we will attempt to answer the question of why XGBoost seems to win so many competitions. To do this, we will provide some arguments for why tree boosting, and XGBoost in particular, seems to be such a highly effective and versatile approach to predictive modeling.

The core argument is that tree boosting can be seen to adaptively determine the local neighbourhoods of the model. Tree boosting can thus be seen to take the bias-variance tradeoff into consideration during model fitting. XGBoost further introduces some subtle improvements which allow it to deal with the bias-variance tradeoff even more carefully.

Publisher

NTNU