An Empirical Study of Downstream Analysis Effects of Model Pre-Processing Choices

Document pages: 75 pages

Abstract: This study uses an empirical analysis to quantify the downstream analysis effects of data pre-processing choices. Bootstrap data simulation is used to measure the bias-variance decomposition of an empirical risk function, mean square error (MSE). Results of the risk function decomposition are used to measure the effects of model development choices on model bias, variance, and irreducible error. Measurements of bias and variance are then applied as diagnostic procedures for model pre-processing and development. The best-performing combinations of model, normalization, and data structure were identified to illustrate the downstream analysis effects of these model development choices. In addition, results from the simulations were verified and expanded to include additional data characteristics (imbalanced, sparse) by testing on benchmark datasets available from the UCI Machine Learning Library. Normalization results on benchmark data were consistent with those found using simulations, while also illustrating that more complex and/or non-linear models provide better performance on datasets with additional complexities. Finally, applying the findings from the simulation experiments to previously tested applications led to equivalent or improved results with less model development overhead and processing time.
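
The abstract describes using bootstrap simulation to decompose MSE into bias, variance, and irreducible error. The following is a minimal sketch of that general idea and not the authors' procedure: the sine target, the noise level `sigma`, the polynomial models, and the function name `bias_variance_bootstrap` are all assumptions chosen for illustration. Because the true function is known only in simulation, the bias term can be computed directly here.

```python
import numpy as np

def bias_variance_bootstrap(x_train, y_train, x_test, f_test, degree,
                            n_boot=200, seed=0):
    """Estimate squared bias and variance of a polynomial fit by
    refitting on bootstrap resamples of one training set.
    (Hypothetical helper, not taken from the paper.)"""
    rng = np.random.default_rng(seed)
    n = len(x_train)
    preds = np.empty((n_boot, len(x_test)))
    for b in range(n_boot):
        # Resample the training set with replacement and refit.
        idx = rng.integers(0, n, size=n)
        coefs = np.polyfit(x_train[idx], y_train[idx], degree)
        preds[b] = np.polyval(coefs, x_test)
    mean_pred = preds.mean(axis=0)
    # Squared bias: average prediction versus the known true function.
    bias_sq = np.mean((mean_pred - f_test) ** 2)
    # Variance: spread of predictions across bootstrap fits.
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance, preds

# Simulated data (assumed setup): y = sin(x) + Gaussian noise.
rng = np.random.default_rng(42)
sigma = 0.3  # assumed noise std; sigma**2 is the irreducible error
x_train = rng.uniform(-3, 3, size=100)
y_train = np.sin(x_train) + rng.normal(0, sigma, size=100)
x_test = np.linspace(-3, 3, 50)
f_test = np.sin(x_test)

results = {}
for degree in (1, 3):
    bias_sq, variance, preds = bias_variance_bootstrap(
        x_train, y_train, x_test, f_test, degree)
    results[degree] = (bias_sq, variance, preds)
    print(f"degree={degree}: bias^2={bias_sq:.4f}, "
          f"variance={variance:.4f}, irreducible={sigma**2:.4f}")
```

In this setup the linear model should show the larger squared bias, since a straight line cannot follow the sine curve, while the cubic trades some of that bias for additional variance. Comparing the two terms across candidate models or pre-processing choices is the kind of diagnostic use the abstract refers to.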