Using Boosted Regression Trees and Remotely Sensed Data to Drive Decision-Making

Document pages: 17 pages

Abstract: Challenges in Big Data analysis arise from the way the data are recorded, maintained, processed and stored. We demonstrate that a hierarchical, multivariate, statistical machine learning algorithm, namely the Boosted Regression Tree (BRT), can address Big Data challenges to drive decision-making. The challenge in this study is a lack of interoperability, since the data, a collection of GIS shapefiles, remotely sensed imagery, and aggregated and interpolated spatio-temporal information, are stored in monolithic hardware components. For the modelling process, it was necessary to create one common input file. Merging the data sources produced a structured but noisy input file with inconsistencies and redundancies. Here, it is shown that BRT can process different data granularities, heterogeneous data and missingness. In particular, BRT deals with missing data by default, by allowing a split on whether or not a value is missing as well as on what the value is. Most importantly, BRT offers a wide range of possibilities for interpreting results, and variable selection is performed automatically by considering how frequently a variable is used to define a split in the tree. A comparison with two similar regression models (Random Forests and the Least Absolute Shrinkage and Selection Operator, LASSO) shows that BRT outperforms both in this instance. BRT can also be a starting point for sophisticated hierarchical modelling in real-world scenarios. For example, a single or ensemble approach of BRT could be tested with existing models in order to improve results for a wide range of data-driven decisions and applications.
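
The two mechanisms named in the abstract, splitting on missingness and ranking variables by how often they define a split, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes depth-one trees (stumps), squared-error loss, a fixed learning rate, and a small synthetic data set in which a missing value is itself informative. All function names (`fit_stump`, `fit_brt`, `brt_predict`) are hypothetical, and an infinite threshold is used here only as a convenient way to encode the "observed versus missing" split.

```python
import random

def fit_stump(X, r):
    """Return the one-split tree that minimises squared error on residuals r.
    None marks a missing value. Every candidate split also chooses the side
    that missing values are sent to."""
    best = None
    for j in range(len(X[0])):
        seen = sorted({row[j] for row in X if row[j] is not None})
        thresholds = [(a + b) / 2 for a, b in zip(seen, seen[1:])]
        thresholds.append(float("inf"))  # pure "observed vs. missing" split
        for t in thresholds:
            for missing_left in (True, False):
                left, right = [], []
                for row, ri in zip(X, r):
                    v = row[j]
                    go_left = missing_left if v is None else v <= t
                    (left if go_left else right).append(ri)
                if not left or not right:
                    continue  # not a real split
                ml, mr = sum(left) / len(left), sum(right) / len(right)
                sse = (sum((v - ml) ** 2 for v in left)
                       + sum((v - mr) ** 2 for v in right))
                if best is None or sse < best[0]:
                    best = (sse, j, t, missing_left, ml, mr)
    return best[1:]

def stump_predict(stump, row):
    j, t, missing_left, ml, mr = stump
    v = row[j]
    go_left = missing_left if v is None else v <= t
    return ml if go_left else mr

def fit_brt(X, y, n_trees=30, learning_rate=0.1):
    """Boost stumps on residuals; also count how often each variable is split on."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps, counts = [], [0] * len(X[0])
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        counts[stump[0]] += 1
        pred = [p + learning_rate * stump_predict(stump, row)
                for p, row in zip(pred, X)]
    return (base, learning_rate, stumps), counts

def brt_predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)

# Synthetic data (an assumption for illustration): only variable 0 carries
# signal, and it is missing in roughly 10% of rows, where the response differs.
random.seed(0)
X, y = [], []
for _ in range(120):
    x0 = None if random.random() < 0.1 else random.random()
    X.append([x0, random.random(), random.random()])
    y.append((5.0 if x0 is None else 3.0 * x0) + random.gauss(0, 0.1))

model, counts = fit_brt(X, y)
print("split counts per variable:", counts)
print("prediction with variable 0 missing:", brt_predict(model, [None, 0.5, 0.5]))
```

Rows with a missing value need no imputation because each split decides where they go, and the split counts give a simple frequency-based ranking of the variables. A real analysis would use deeper trees and a tuned number of trees and learning rate.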
