Applications in Big Data


A major big data application of generalized linear models (GLMs) is building ratemaking models in the insurance industry. Ratemaking is the determination of insurance premiums based on risk characteristics, which are captured in rating variables. GLMs have become the standard for fitting ratemaking models because they can accommodate all rating variables simultaneously, remove noise or random effects, provide diagnostics of model fit, and allow for interactions between rating variables.

The response variables in ratemaking applications are typically claim counts (the number of claims a policyholder files), modeled with a Poisson distribution; claim amounts, modeled with a gamma distribution; or pure premiums, modeled with a Tweedie distribution. These distributions are all members of the exponential family, and a log link function typically relates the mean response to a linear combination of the rating variables, so these models can be formulated as GLMs. One large insurance company based in the United States uses GLMs to build ratemaking models on data containing 150 million policies and 70 degrees of freedom.
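To make the log-link formulation concrete, the following is a minimal sketch of fitting a Poisson claim-count GLM by iteratively reweighted least squares (IRLS), the classic fitting algorithm for GLMs. It assumes a log link with log(exposure) as an offset, a common ratemaking convention that the text above does not specify, and uses a tiny made-up dataset with one hypothetical binary rating variable (`young_driver`). The function names `fit_poisson_glm` and `solve` are illustrative, and the pure-Python linear algebra is written for readability, not for data on the scale of 150 million policies.

```python
import math


def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_poisson_glm(X, y, exposure, max_iter=50, tol=1e-10):
    """Fit log E[y] = log(exposure) + X @ beta for Poisson counts using IRLS."""
    p = len(X[0])
    offset = [math.log(e) for e in exposure]  # assumed log-exposure offset
    beta = [0.0] * p
    for _ in range(max_iter):
        xb = [sum(xj * bj for xj, bj in zip(row, beta)) for row in X]
        mu = [math.exp(o + v) for o, v in zip(offset, xb)]  # fitted means
        # The log link is canonical for Poisson: IRLS weights are mu and the
        # working response is X @ beta + (y - mu) / mu (offset kept out).
        z = [v + (yi - m) / m for v, yi, m in zip(xb, y, mu)]
        # Weighted least squares step: solve X^T W X beta = X^T W z.
        xtwx = [[sum(m * row[i] * row[j] for row, m in zip(X, mu)) for j in range(p)]
                for i in range(p)]
        xtwz = [sum(m * row[i] * zi for row, m, zi in zip(X, mu, z)) for i in range(p)]
        new_beta = solve(xtwx, xtwz)
        if max(abs(nb - b) for nb, b in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    raise RuntimeError("IRLS did not converge")


# Hypothetical policies: columns are [intercept, young_driver]
X = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1]]
claims = [1, 0, 2, 3, 1]
exposure = [1.0, 0.5, 1.5, 1.0, 1.0]  # policy-years in force (hypothetical)

beta = fit_poisson_glm(X, claims, exposure)
base_rate = math.exp(beta[0])         # expected claims per policy-year, other drivers
young_relativity = math.exp(beta[1])  # multiplicative factor for young drivers
print(f"base frequency: {base_rate:.4f}, young-driver relativity: {young_relativity:.4f}")
```

Because the log link makes the model multiplicative, `exp(beta[1])` reads directly as a rating relativity. In this toy data the young-driver group has 4 claims over 2.0 policy-years versus 3 claims over 3.0 for everyone else, so the fit recovers a base frequency of 1.0 and a relativity of 2.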