
Loss severity distributions

Kernel density estimation in R

Catalina Bolancé & Montserrat Guillén

This page presents an introduction to kernel density estimation using data from the book Quantitative Operational Risk Models. A methodological overview can be found in:

Bolancé, C., Guillén, M. and Nielsen, J.P. (2003) Kernel density estimation of actuarial loss functions, Insurance: Mathematics and Economics, 32, 19-36.
Bolancé, C., Guillén, M. and Pitt, D. (2014) Non-parametric models for univariate claim severity distributions - an approach using R. UB Riskcenter Working Paper Series 2014-01.

DATA DESCRIPTION

Name	Content of operational risk loss data
Internal data set	75 observed loss amounts
External data set	700 observed loss amounts
Public data risk no. 1	1000 observed loss amounts for category no. 1
Public data risk no. 2	400 observed loss amounts for category no. 2

Download Internal data set here.
Download External data set here.
Download Public data risk no. 1 here.
Download Public data risk no. 2 here.

Download all files in EXCEL format here (.zip).

Download all files in CSV format, ready for R scripts, here (.zip).

DESCRIPTIVE STATISTICS

	Internal data	External data	Public data risk no. 1	Public data risk no. 2
N	75	700	1000	400
Mean	0.1756	0.6788	42.0594	20.8933
Std Deviation	0.2777	4.0937	291.9634	95.9138
Min	0.0030	0.0010	0.0020	0.0030
Max	1.7730	52.1300	5122.1360	1027.5270
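The summary measures in the table take only a few lines to compute. The sketch below is written in Python for illustration, not in the page's own R, and it runs on a hypothetical toy sample. It also assumes the standard deviation uses the n - 1 denominator, as R's sd() does. The page does not state which denominator was used.

```python
from statistics import mean, stdev


def describe(losses):
    """Return the summary measures reported in the table for one loss sample.

    Assumption: the standard deviation uses the n - 1 denominator
    (statistics.stdev, like R's sd()); the page does not say which was used.
    """
    return {
        "N": len(losses),
        "Mean": mean(losses),
        "Std Deviation": stdev(losses),
        "Min": min(losses),
        "Max": max(losses),
    }


# Hypothetical toy sample, not one of the data sets described above.
summary = describe([0.003, 0.05, 0.12, 0.4, 1.773])
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```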

Download the R script here.

Download the results here.

KERNEL DENSITY ESTIMATION

- Classical kernel density estimation

For a random sample of n independent and identically distributed observations x₁, x₂,..., x_n of a random variable X with pdf f_X, the kernel density estimator is
\begin{equation} \hat{f}_X\left( x\right) =\frac{1}{n}\sum_{i=1}^{n}K_{b}\left(x-x_{i}\right), \label{Kerdens1} \end{equation}
where
\begin{equation}K_b(\cdot )=\frac{1}{b}K(\cdot /b)\end{equation}
K is the kernel function and b is the bandwidth.
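Here is a minimal sketch of the estimator above, written in Python for illustration (the page's own scripts are in R). It makes two assumptions that the page does not confirm: a Gaussian kernel, and the normal reference bandwidth 1.06 s n^(-1/5). The page does not say which kernel or bandwidth rule its script uses.

```python
import math
from statistics import stdev


def gaussian_kernel(u):
    """Standard normal pdf, used as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, b):
    """Classical kernel estimate f_hat(x) = (1/n) * sum_i K_b(x - x_i),
    where K_b(u) = K(u / b) / b."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / b) / b for xi in sample) / n


def normal_reference_bandwidth(sample):
    """Rule-of-thumb bandwidth b = 1.06 * s * n^(-1/5) (normal reference rule)."""
    return 1.06 * stdev(sample) * len(sample) ** (-0.2)


# Hypothetical loss amounts, for illustration only.
sample = [0.2, 0.5, 0.9, 1.4, 2.8]
b = normal_reference_bandwidth(sample)
for x in (0.5, 1.0, 2.0):
    print(x, kde(x, sample, b))
```

A smaller b gives a bumpier estimate and a larger b a smoother one. The bandwidth controls the amount of smoothing far more than the choice of kernel does.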

Download the R script here.

Download the results here.

- Transformations and kernel density estimation

Classical kernel density estimation does not generally perform well when the true density is asymmetric. For instance, in the density of claim costs, the many small claims concentrate mass near the low values of the domain, while a few very large claims produce positive skewness.

The scarcity of observations in the right tail of the domain makes it difficult to obtain a reliable nonparametric estimate of the density in that area. Many authors have studied heavy-tailed distributions and adapted kernel estimation methods to this context. Several papers have proposed transformed kernel estimation (TKE) methods for a pdf based on parametric families of transformations (see Wand et al. (1991), Clements et al. (2003), Buch-Larsen et al. (2005), Bolancé (2010), Bolancé et al. (2003), (2008) and (2009)). The idea is to transform the data so that the transformed sample is closer to symmetric, apply the classical kernel estimator on the transformed scale, and transform the estimate back to the original scale. Here we implement the shifted power transformation family as in Bolancé et al. (2003).
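The general form can be sketched as follows. The symbol T and this display are introduced here for illustration. Suppose y_i = T(x_i) for an increasing, differentiable transformation T. Applying the classical estimator to the transformed sample and changing variables back to the original scale gives
\begin{equation} \hat{f}_X\left( x\right) =\frac{1}{n}\sum_{i=1}^{n}K_{b}\left(T(x)-T(x_{i})\right)T'(x). \end{equation}
This follows from f_X(x) = f_Y(T(x)) T'(x) for Y = T(X). With T(x) = x, it reduces to the classical estimator.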

Download the R script here.

Download the results here.

- Selecting the transformation parameters and the bandwidth

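We select λ=(λ₁,λ₂) and transform the original sample as
\begin{equation} y_i=\begin{cases} (x_i+\lambda_1)^{\lambda_2}\,\mathrm{sign}(\lambda_2) & \text{if } \lambda_2\neq 0,\\ \ln(x_i+\lambda_1) & \text{if } \lambda_2=0, \end{cases} \qquad i=1,\dots,n, \end{equation}
where λ₁ must satisfy x_i + λ₁ > 0 for all i. The factor sign(λ₂) keeps the transformation increasing when λ₂ < 0.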
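Below is a minimal Python sketch of the shifted power transformation and of the transformed kernel estimate mapped back to the original scale. Several details are assumptions for illustration: the Gaussian kernel, the function names, and passing the bandwidth b (on the transformed scale) as an argument. The page's R script may differ. The back-transformation multiplies by the derivative T'(x), which is |λ₂|(x + λ₁)^(λ₂ - 1), or 1/(x + λ₁) when λ₂ = 0.

```python
import math


def shifted_power(x, lam1, lam2):
    """y = (x + lam1)^lam2 * sign(lam2) if lam2 != 0, else ln(x + lam1)."""
    z = x + lam1
    if z <= 0:
        raise ValueError("x + lam1 must be positive")
    if lam2 == 0:
        return math.log(z)
    return math.copysign(1.0, lam2) * z ** lam2


def shifted_power_derivative(x, lam1, lam2):
    """T'(x) = |lam2| * (x + lam1)^(lam2 - 1), or 1 / (x + lam1) if lam2 == 0."""
    z = x + lam1
    if lam2 == 0:
        return 1.0 / z
    return abs(lam2) * z ** (lam2 - 1.0)


def gaussian_kernel(u):
    """Standard normal pdf, used as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def transformed_kde(x, sample, lam1, lam2, b):
    """Classical KDE of y_i = T(x_i) evaluated at T(x), multiplied by T'(x).

    b is the bandwidth on the transformed scale.
    """
    ty = shifted_power(x, lam1, lam2)
    ys = [shifted_power(xi, lam1, lam2) for xi in sample]
    dens_y = sum(gaussian_kernel((ty - yi) / b) / b for yi in ys) / len(ys)
    return dens_y * shifted_power_derivative(x, lam1, lam2)


# Hypothetical losses; lam2 = 0 gives the log transformation.
sample = [0.05, 0.1, 0.2, 0.3, 0.6, 1.5]
for x in (0.1, 0.5, 1.0):
    print(x, transformed_kde(x, sample, lam1=0.0, lam2=0.0, b=0.5))
```

With λ₁ = 0 and λ₂ = 1 the transformation is the identity, and the function returns the classical kernel estimate.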
As in Bolancé et al. (2003), we restrict the transformation parameters λ=(λ₁,λ₂) to values that give approximately zero skewness for the transformed data y₁,...,y_n. These are also rescaled to have the same variance as the original sample (see Wand et al. (1991)).

We define our sample measure of skewness as: \begin{equation} \widehat{\gamma }_{y}=\frac{n^{-1}\sum\limits_{i=1}^{n}(y_{i}-\overline{y})^{3}}{\left\{ n^{-1}\sum\limits_{i=1}^{n}(y_{i}-\overline{y} )^{2}\right\} ^{\frac{3}{2}}} \end{equation}
where y̅ is the sample mean of the transformed observations.
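The skewness criterion can be sketched in Python as follows. The helper select_lambda2 is a hypothetical simplification: it fixes λ₁ and searches a user-supplied grid of λ₂ values for the smallest absolute skewness. The page does not describe how its script searches the parameter space. Multiplying the data by a positive constant leaves the skewness unchanged. Rescaling the transformed data to the variance of the original sample therefore does not affect this criterion; it only changes the scale on which the bandwidth is chosen.

```python
import math
from statistics import stdev


def sample_skewness(y):
    """[n^-1 sum (y_i - ybar)^3] / [n^-1 sum (y_i - ybar)^2]^(3/2)."""
    n = len(y)
    ybar = sum(y) / n
    m2 = sum((v - ybar) ** 2 for v in y) / n
    m3 = sum((v - ybar) ** 3 for v in y) / n
    return m3 / m2 ** 1.5


def shifted_power(x, lam1, lam2):
    """(x + lam1)^lam2 * sign(lam2) if lam2 != 0, else ln(x + lam1)."""
    z = x + lam1
    if lam2 == 0:
        return math.log(z)
    return math.copysign(1.0, lam2) * z ** lam2


def rescale_to_sample_sd(y, x):
    """Multiply y by a constant so its standard deviation matches that of x."""
    c = stdev(x) / stdev(y)
    return [c * v for v in y]


def select_lambda2(sample, lam1, grid):
    """Hypothetical helper: for fixed lam1, return (lam2, |skewness|) for the
    grid value whose transformed sample has the smallest absolute skewness."""
    best = None
    for lam2 in grid:
        # Rescaling does not change the skewness; it is kept for completeness.
        y = rescale_to_sample_sd([shifted_power(x, lam1, lam2) for x in sample], sample)
        g = abs(sample_skewness(y))
        if best is None or g < best[1]:
            best = (lam2, g)
    return best


# Hypothetical right-skewed losses.
sample = [0.01, 0.03, 0.05, 0.1, 0.2, 0.4, 1.0, 2.5]
grid = [k / 10 for k in range(-10, 11)]
print(select_lambda2(sample, lam1=0.0, grid=grid))
```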

Download the R script here.

Download the results here.

MEASURING THE GOODNESS OF FIT

We are interested in evaluating, over the whole domain, the quality of the density estimates obtained with both parametric and non-parametric methods. Let us begin with the log-likelihood function. This function is not appropriate for evaluating the performance of non-parametric density estimates, because it depends on the estimated density exclusively at the sample points. A kernel estimate shows bumps around isolated sample observations. As the bandwidth shrinks, these bumps grow, and the log-likelihood can be made arbitrarily large even though the fit elsewhere in the domain does not improve.
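The following Python sketch illustrates the problem numerically, assuming a Gaussian kernel and a hypothetical sample. The kernel estimate is built from the same observations at which it is evaluated. Each observation's own kernel therefore contributes at least 1/(n b √(2π)) to the estimated density at that point, so the log-likelihood keeps growing as b shrinks.

```python
import math


def gaussian_kernel(u):
    """Standard normal pdf, used as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, b):
    """Classical kernel estimate (1/n) * sum_i K((x - x_i) / b) / b."""
    return sum(gaussian_kernel((x - xi) / b) / b for xi in sample) / len(sample)


def kde_log_likelihood(sample, b):
    """Sum of log f_hat(x_i) at the sample points, with f_hat built from the same sample."""
    return sum(math.log(kde(xi, sample, b)) for xi in sample)


sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical observations
for b in (1.0, 0.1, 0.01, 0.001):
    print(f"b = {b}: log-likelihood = {kde_log_likelihood(sample, b):.2f}")
```

The estimate with b = 0.001 is essentially five narrow spikes, yet it has the highest log-likelihood of the four bandwidths.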

The parametric estimates of the normal and lognormal densities were obtained by maximum likelihood. We can compare log-likelihood values across the estimation alternatives: for any given pdf estimate, we compute the sum of the logarithms of the estimated density at the sample points. This provides a straightforward comparative measure of goodness of fit. The log-likelihood only reflects the density at the sample points, so we also present an estimated integrated squared error (ISE) and an estimated weighted integrated squared error (WISE), which measure the fit over the whole domain.
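Here is a Python sketch of the log-likelihood comparison for the two parametric fits. It assumes the usual closed-form maximum likelihood estimates: mean and variance with denominator n for the normal, and the same quantities on the log scale for the lognormal. The function names are hypothetical. ISE and WISE are not sketched, because the page does not specify how they are estimated.

```python
import math


def log_likelihood(pdf, sample):
    """Sum of the logarithm of the estimated density at the sample points."""
    return sum(math.log(pdf(x)) for x in sample)


def mle_mean_sd(values):
    """ML estimates of mean and standard deviation (variance uses denominator n)."""
    n = len(values)
    mu = sum(values) / n
    return mu, math.sqrt(sum((v - mu) ** 2 for v in values) / n)


def fit_normal(sample):
    """Normal pdf with ML parameters."""
    mu, sigma = mle_mean_sd(sample)
    return lambda x: math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def fit_lognormal(sample):
    """Lognormal pdf with ML parameters estimated from the log losses."""
    mu, sigma = mle_mean_sd([math.log(x) for x in sample])
    return lambda x: math.exp(-0.5 * ((math.log(x) - mu) / sigma) ** 2) / (
        x * sigma * math.sqrt(2.0 * math.pi)
    )


sample = [0.01, 0.03, 0.05, 0.1, 0.2, 0.4, 1.0, 2.5]  # hypothetical losses
print("normal:   ", log_likelihood(fit_normal(sample), sample))
print("lognormal:", log_likelihood(fit_lognormal(sample), sample))
```

Higher values indicate a better fit at the sample points. As explained above, this comparison is less informative for kernel estimates.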

Download the R script here.

Download the results here.

REFERENCES

[1] Bolancé, C. (2010) Optimal Inverse Beta(3,3) Transformation in kernel density estimation, SORT-Statistics and Operations Research Transactions, 34, 223-238.

[2] Bolancé, C., Guillén, M., Gustafsson, J. and Nielsen, J.P. (2012) Quantitative Operational Risk Models, Chapman & Hall/CRC.

[3] Bolancé, C., Guillén, M. and Nielsen, J.P. (2009) Transformation kernel estimation of insurance claim cost distribution, in Corazza, M. and Pizzi, C. (Eds), Mathematical and Statistical Methods for Actuarial Sciences and Finance, Springer, Roma, 223-231.

[4] Bolancé, C., Guillén, M. and Nielsen, J.P. (2008) Inverse Beta transformation in kernel density estimation, Statistics & Probability Letters, 78, 1757-1764.

[5] Bolancé, C., Guillén, M. and Nielsen, J.P. (2003) Kernel density estimation of actuarial loss functions, Insurance: Mathematics and Economics, 32, 19-36.

[6] Bolancé, C., Guillén, M. and Pitt, D. (2014) Non-parametric models for univariate claim severity distributions - an approach using R, UB Riskcenter Working Papers Series 2014-01.

[7] Buch-Larsen, T., Guillén, M., Nielsen, J.P. and Bolancé, C. (2005) Kernel density estimation for heavy-tailed distributions using the Champernowne transformation, Statistics, 39, 503-518.

[8] Clements, A.E., Hurn, A.S. and Lindsay, K.A. (2003) Möbius-like mappings and their use in kernel density estimation, Journal of the American Statistical Association, 98, 993-1000.

Universitat de Barcelona - Last Updated: 05-23-2014