ALPHA-SKEW-NORMAL DISTRIBUTION

The main objective of this paper is to introduce an alternative way of generating asymmetry in the normal distribution that allows both unimodal and bimodal data sets to be fitted. Basic properties of this new distribution, such as its stochastic representation, moments, maximum likelihood estimation and the singularity of the Fisher information matrix, are studied. The methodology developed is illustrated with a real application.


Introduction
The univariate skew-normal (SN) distribution has been studied by Azzalini (1985, 1986), Henze (1986), Pewsey (2000), and others, and synthesized in the book edited by Genton (2004). In the univariate case, distributions of this type are called skew-symmetric, and they have been used in many applications to study the asymmetric behavior of empirical data sets from different research areas. Thus, in recent years, different families of skew-symmetric distributions have been generated, some of which are related to the SN model introduced by Azzalini (1985). Examples of these families are those considered by Arellano-Valle et al. (2004) and Gómez et al. (2006). Most of those classes include the normal distribution as a particular case and satisfy properties similar to those of the normal family. Mudholkar and Hutson (2000) proposed an asymmetric normal family of distributions with a structure different from that of the SN class considered by Azzalini (1985), which is called epsilon-skew-normal (ESN) and is denoted by {ESN(ε) : |ε| < 1}, where ε represents the asymmetry parameter, so that ESN(0) corresponds to the normal distribution. On the other hand, there have been a number of works exploring bimodality arising from skew distributions; see, for example, Ma and Genton (2004) and Azzalini and Capitanio (2003).
The aim of this article is to introduce a new family of distributions that is flexible enough to support both unimodal and bimodal shapes. Many data sets arising in practice can be adequately modeled in this way, and so the proposal plays a unifying role in this context. This new family is called alpha-skew-normal (ASN) and is denoted by {ASN(α) : α ∈ R}, where α represents the asymmetry parameter, which also controls uni-bimodality, so that ASN(0) corresponds to the normal distribution.
The rest of this article is organized as follows. Section 2 presents the new family and develops its main properties; in particular, we show how unimodal and bimodal shapes are obtained. Section 3 shows important probabilistic properties of the new family of distributions, including the stochastic representation. Section 4 considers maximum likelihood estimation, with emphasis on the derivation of the Fisher information matrix. A real data application is reported in Section 5. In what follows, the density and cumulative distribution functions of the standard normal distribution are denoted by φ(·) and Φ(·), respectively.

Alpha-Skew-Normal Distribution
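Definition 2.1. If a random variable Y has density function
$f(y) = y^2 \phi(y), \quad y \in \mathbb{R},$
then we say that Y is a bimodal-normal random variable. We denote this as Y ∼ BN.
Definition 2.2. If a random variable X has density function
$f(x) = \frac{(1 - \alpha x)^2 + 1}{2 + \alpha^2}\, \phi(x), \quad x \in \mathbb{R},$
where α ∈ R, then we say that X is an alpha-skew-normal random variable with parameter α. We denote this as X ∼ ASN(α).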
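As a quick numerical check of Definition 2.2, the following sketch evaluates the ASN density and confirms that it integrates to one. It assumes the density has the form ((1 − αx)² + 1) φ(x) / (2 + α²); the function name `asn_pdf` is hypothetical, and SciPy is used only for φ and for numerical quadrature.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def asn_pdf(x, alpha):
    """Assumed ASN(alpha) density: ((1 - alpha*x)^2 + 1) / (2 + alpha^2) * phi(x)."""
    return ((1.0 - alpha * x) ** 2 + 1.0) / (2.0 + alpha ** 2) * norm.pdf(x)


# Total mass for a few values of alpha; each should be numerically equal to 1.
totals = {
    alpha: quad(asn_pdf, -np.inf, np.inf, args=(alpha,))[0]
    for alpha in (-3.0, 0.0, 0.5, 2.0)
}

# With alpha = 0 the assumed density reduces to the standard normal density.
normal_gap = abs(asn_pdf(0.3, 0.0) - norm.pdf(0.3))
```

Under this assumed form, the normalizing constant 2 + α² arises because the integral of ((1 − αx)² + 1) φ(x) equals 2 + α² (the cross term −2αx integrates to zero).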
If X ∼ ASN(α), the following properties are deduced immediately from the definition:
thus, f′(x) has at most three zeros, and hence f(x) has at most two modes.
Remark 2.1. Considering the cubic term inside the square brackets in f′(x), in the proof above, and applying a numerical method, we conclude that the transition from bimodality to unimodality of f(x) occurs around α = ±1.34.
Proposition 2.2. Let X ∼ ASN(α), then
on the other hand
The definition of $\gamma_1$ and $\gamma_2$ leads to
and, applying a numerical method, we obtain (2.5)
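The threshold quoted in Remark 2.1 can be checked numerically with a short sketch. It assumes the ASN density is proportional to ((1 − αx)² + 1) φ(x), so that f′(x) is proportional to the cubic −α²x³ + 2αx² + (2α² − 2)x − 2α times φ(x). The density is bimodal exactly when this cubic has three distinct real roots, that is, when its discriminant is positive. The function names are hypothetical, and bisection is a stand-in for whichever numerical method the remark refers to.

```python
def cubic_discriminant(alpha):
    """Discriminant of -a^2 x^3 + 2a x^2 + (2a^2 - 2) x - 2a (assumed cubic factor of f')."""
    a, b, c, d = -alpha ** 2, 2.0 * alpha, 2.0 * alpha ** 2 - 2.0, -2.0 * alpha
    return (18 * a * b * c * d - 4 * b ** 3 * d + b ** 2 * c ** 2
            - 4 * a * c ** 3 - 27 * a ** 2 * d ** 2)


def is_bimodal(alpha):
    """Three distinct real critical points (mode, antimode, mode) iff discriminant > 0."""
    return alpha != 0 and cubic_discriminant(alpha) > 0


def transition_alpha(lo=1.0, hi=2.0, tol=1e-8):
    """Bisection for the positive alpha at which the density switches to bimodal."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_bimodal(mid):
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


alpha_star = transition_alpha()
```

Because the discriminant depends on α only through α², the same threshold applies with the opposite sign, in agreement with the ± in the remark.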

Stochastic Representation
The next proposition shows a stochastic representation for the BN model.
Proposition 3.1. Let T and V be two independent random variables, where T ∼ $\chi^2(3)$ and V is such that P(V = 1) = P(V = −1) = 1/2. If Y = $\sqrt{T}\,V$, then Y ∼ BN.
Proof. Let H = $\sqrt{T}$, so that H has density $2h^2\phi(h)$, h > 0. On the other hand, if Y = HV, it is easy to show that the density function of Y is $f(y) = y^2\phi(y)$, y ∈ R, which proves the required result.
Remark 3.1. From the above proof we can deduce that Y $\overset{d}{=} \pm\chi_3$, where $\pm\chi_3$ is the square root of the $\chi^2_3$ random variable, with a random sign attached.
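The representation in Remark 3.1 translates directly into a sampler. The sketch below draws the square root of a chi-square variable with 3 degrees of freedom, attaches a random sign, and compares sample moments with the values implied by the density y²φ(y), namely E[Y²] = 3 and E[Y⁴] = 15. The function name `sample_bn` and the seed are illustrative choices.

```python
import numpy as np


def sample_bn(size, rng):
    """Draw Y = V * sqrt(T) with T ~ chi-square(3) and V = +/-1 with probability 1/2 each."""
    t = rng.chisquare(df=3, size=size)
    v = rng.choice([-1.0, 1.0], size=size)
    return v * np.sqrt(t)


rng = np.random.default_rng(12345)
y = sample_bn(200_000, rng)

first_moment = np.mean(y)        # symmetric about zero, so close to 0
second_moment = np.mean(y ** 2)  # E[Y^2] = E[T] = 3
fourth_moment = np.mean(y ** 4)  # E[Y^4] = E[T^2] = Var(T) + E[T]^2 = 6 + 9 = 15
```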
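The density function of the ASN(α) model can be represented as the sum of two functions, as shown below,
$f(x) = \frac{2 + \alpha^2 x^2}{2 + \alpha^2}\,\phi(x) - \frac{2\alpha x}{2 + \alpha^2}\,\phi(x),$
where the first summand is a symmetric density function, which is considered in the following definition.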
Definition 3.1. If a random variable S has density function
$f_1(s) = \frac{2 + \alpha^2 s^2}{2 + \alpha^2}\,\phi(s), \quad s \in \mathbb{R}, \qquad (3.1)$
where α ∈ R, then we say that S is a symmetric-component random variable of the ASN(α) model. We denote this as S ∼ SCASN(α).
If S ∼ SCASN(α), the following properties are deduced immediately from the definition:
Remark 3.3. Note that the density function (3.1) is a mixture of a normal density and a bimodal-normal density, as shown below,
$f_1(s) = \frac{2}{2 + \alpha^2}\,\phi(s) + \frac{\alpha^2}{2 + \alpha^2}\,s^2\phi(s).$
Let $M_Z(t)$ and $M_Y(t)$ be the moment generating functions of Z ∼ N(0, 1) and Y ∼ BN, respectively. Considering the previous remark, we can deduce that
$M_S(t) = \frac{2}{2 + \alpha^2}\,M_Z(t) + \frac{\alpha^2}{2 + \alpha^2}\,M_Y(t).$
The next proposition shows a stochastic representation for the SCASN(α) model.
Proposition 3.4. Let Y ∼ BN and Z ∼ N(0, 1) be independent random variables. If
which is the moment generating function of the SCASN(α) model.
The stochastic representations of the BN and SCASN(α) models, presented in Propositions 3.1 and 3.4, make it possible to apply the acceptance-rejection method to generate random numbers from the ASN(α) model. The algorithm is described in the next proposition.
Proposition 3.5 (The acceptance-rejection algorithm). Let f(x) be the density function of X ∼ ASN(α) and $f_1(x)$ the density function of S ∼ SCASN(α), with $M = \sup_x f(x)/f_1(x)$. To generate a random variable X ∼ ASN(α):
a. Generate S ∼ SCASN(α) and U ∼ U(0, 1), independent of each other.
b. If $U \le \frac{2\,[(1 - \alpha S)^2 + 1]}{(2 + \sqrt{2})(2 + \alpha^2 S^2)}$, set X = S; otherwise, return to step a.
Remark 3.4. It is easy to prove that $f(x)/f_1(x) \le (2 + \sqrt{2})/2$ for all x, so the number of trials needed to generate one X is a geometric(1/M) random variable, and $M = (2 + \sqrt{2})/2 \approx 1.707$ is the expected number of trials.
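The acceptance-rejection scheme can be sketched as follows. The sketch assumes that the ASN density is ((1 − αx)² + 1) φ(x) / (2 + α²), that the SCASN density is (2 + α²x²) φ(x) / (2 + α²), and that SCASN draws can be obtained as a mixture taking a BN variable with probability α²/(2 + α²) and a standard normal variable otherwise. The function names are hypothetical, and candidates are generated in batches rather than one at a time purely for speed.

```python
import numpy as np

M = (2.0 + np.sqrt(2.0)) / 2.0  # bound on f / f1 quoted in the text, about 1.707


def sample_scasn(size, alpha, rng):
    """Assumed mixture form: BN with probability alpha^2 / (2 + alpha^2), else N(0, 1)."""
    use_bn = rng.random(size) < alpha ** 2 / (2.0 + alpha ** 2)
    z = rng.standard_normal(size)
    y = rng.choice([-1.0, 1.0], size=size) * np.sqrt(rng.chisquare(3, size))
    return np.where(use_bn, y, z)


def sample_asn(size, alpha, rng):
    """Acceptance-rejection with SCASN proposals; returns (draws, acceptance rate)."""
    accepted = np.empty(0)
    trials = 0
    while accepted.size < size:
        batch = 2 * (size - accepted.size) + 16
        s = sample_scasn(batch, alpha, rng)
        u = rng.random(batch)
        # f(s) / f1(s) under the assumed densities; the factor phi(s) cancels.
        ratio = ((1.0 - alpha * s) ** 2 + 1.0) / (2.0 + alpha ** 2 * s ** 2)
        accepted = np.concatenate([accepted, s[u <= ratio / M]])
        trials += batch
    return accepted[:size], accepted.size / trials


x, acceptance_rate = sample_asn(100_000, 2.0, np.random.default_rng(7))
```

Under the assumed density, the first two moments are E[X] = −2α/(2 + α²) and E[X²] = (2 + 3α²)/(2 + α²), which gives a simple check on the draws; the acceptance rate should be close to 1/M ≈ 0.586.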
The distribution function of (3.2) is given by , and the moment of order n by where X ∼ ASN (α).

Maximum likelihood estimation
This section concerns likelihood inference about the parameter θ = (µ, σ, α) of the location-scale family defined in (3.2). In particular, the Fisher information matrix for the maximum likelihood estimators (MLEs) of these parameters is obtained. Let $Z_1, Z_2, \ldots, Z_n$ be a random sample from ASN(θ). Thus, the likelihood function is given by
$L(\theta) = \prod_{i=1}^{n} f(Z_i \mid \theta), \qquad (4.1)$
where $f(Z_i \mid \theta)$ is given by (3.2). The MLE of θ is obtained by maximizing (4.1), for which a numerical algorithm is necessary. In this work, the subroutine nlminb of the software S-PLUS is used.
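A minimal sketch of the numerical maximization is given below. It is not the S-PLUS nlminb routine used in the paper; SciPy's bounded L-BFGS-B optimizer is swapped in as a stand-in. The log-density assumes the location-scale form log f = log((1 − αz)² + 1) − log(2 + α²) − log σ + log φ(z), with z = (x − µ)/σ, and the names `asn_negloglik` and `fit_asn` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def asn_negloglik(theta, x):
    """Negative log-likelihood of the assumed ASN(mu, sigma, alpha) location-scale density."""
    mu, sigma, alpha = theta
    z = (x - mu) / sigma
    logf = (np.log((1.0 - alpha * z) ** 2 + 1.0) - np.log(2.0 + alpha ** 2)
            - np.log(sigma) - 0.5 * z ** 2 - 0.5 * np.log(2.0 * np.pi))
    return -np.sum(logf)


def fit_asn(x, alpha_starts=(-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)):
    """Multi-start L-BFGS-B fit; returns the scipy result with the smallest objective."""
    x = np.asarray(x, dtype=float)
    bounds = [(None, None), (1e-8, None), (None, None)]  # keep sigma positive
    best = None
    for alpha0 in alpha_starts:
        start = np.array([x.mean(), x.std(), alpha0])
        res = minimize(asn_negloglik, start, args=(x,), method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return best
```

The starting values deliberately avoid α = 0: under the assumed density, the point (sample mean, sample standard deviation, 0) has zero gradient, so a gradient-based optimizer started there would not move. Calling `fit_asn(data).x` returns the estimates in the order (µ, σ, α).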
The Fisher information matrix for the parameter θ = (µ, σ, α) is easily computed, but its entries have to be evaluated numerically.
On the other hand, in the normal model, that is, when α = 0, the columns of the Fisher information matrix corresponding to the parameters µ and α are linearly dependent, implying that the matrix is singular. This irregularity is discussed by Azzalini (1985) in the context of the SN model, and was later studied systematically by Chiogna (2005) in other contexts. Di Ciccio and Monti (2004) studied this singularity problem in the context of the skew-exponential power distribution, and Salinas et al. (2007) generalized it to the extended skew-exponential power distribution. In this latter context, the authors make use of the methodology considered in Rotnitzky et al. (2000), so that an adequate reparametrization can be found for which the asymptotic properties of the maximum likelihood estimators remain valid.
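The singularity at α = 0 can be illustrated numerically. The sketch below assumes the location-scale log-density log((1 − αz)² + 1) − log(2 + α²) − log σ + log φ(z), with z = (x − µ)/σ, computes the scores by central finite differences on simulated normal data, and forms the empirical information matrix as the average outer product of the scores. All names, the seed, and the values µ = 1, σ = 2 are illustrative.

```python
import numpy as np


def asn_logpdf(x, mu, sigma, alpha):
    """Assumed log-density of the ASN(mu, sigma, alpha) location-scale family."""
    z = (x - mu) / sigma
    return (np.log((1.0 - alpha * z) ** 2 + 1.0) - np.log(2.0 + alpha ** 2)
            - np.log(sigma) - 0.5 * z ** 2 - 0.5 * np.log(2.0 * np.pi))


def scores(x, theta, h=1e-5):
    """Per-observation score vectors (columns: mu, sigma, alpha) by central differences."""
    theta = np.asarray(theta, dtype=float)
    cols = []
    for j in range(3):
        step = np.zeros(3)
        step[j] = h
        cols.append((asn_logpdf(x, *(theta + step)) - asn_logpdf(x, *(theta - step))) / (2 * h))
    return np.column_stack(cols)


rng = np.random.default_rng(0)
mu0, sigma0 = 1.0, 2.0
x = rng.normal(mu0, sigma0, size=50_000)

S = scores(x, (mu0, sigma0, 0.0))
info = S.T @ S / len(x)            # empirical information matrix at alpha = 0
eigvals = np.linalg.eigvalsh(info)  # ascending order; the smallest is numerically zero
dependence = np.max(np.abs(S[:, 2] + sigma0 * S[:, 0]))  # S_alpha = -sigma * S_mu
```

Under the assumed density, the score for α at α = 0 equals −z while the score for µ equals z/σ, so the two are proportional observation by observation, and the matrix has rank two regardless of the sample.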
Using the same procedure as in Chiogna (2005), Di Ciccio and Monti (2004) and Salinas et al. (2007), we start by supposing that the parameter value is θ* = (µ*, σ*, 0), that is, the data set has been generated by a normal distribution with mean µ* and variance σ*². Evaluating the score vector at θ = θ* gives $S_\theta(\theta^*) = (S^*_\mu, S^*_\sigma, S^*_\alpha)$. The linear dependence between the components $S^*_\mu$ and $S^*_\alpha$ causes the singularity of the information matrix. The general result in Rotnitzky et al. (2000) concerning the maximum likelihood estimator of a parameter vector $\theta = (\theta_1, \theta_2, \ldots, \theta_q)$ can be applied in this context to establish consistency and asymptotic normality of the MLEs. Rotnitzky et al. (2000) derive the asymptotic distributions of the MLEs $\hat\theta = (\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_q)$ under two conditions: first, one component of the score vector, say $S_{\theta_1}$, is zero at θ = θ*; second, higher-order partial derivatives of $S_{\theta_1}$ with respect to $\theta_1$ are possibly zero at this point; however, the first derivative is not a linear combination of the other components $S_{\theta_2}, S_{\theta_3}, \ldots, S_{\theta_q}$. Even though the above conditions are not valid for the ASN model, it is possible to use the iterative procedure proposed in Rotnitzky et al. (2000) to find a reparametrization for which such conditions are satisfied. This procedure has been detailed in Chiogna (2005) and in Di Ciccio and Monti (2004). After extensive algebraic manipulations, such a reparametrization can be obtained, under which:
1. The MLE of θ is unique with probability tending to 1, and it is consistent.
2. The likelihood ratio statistic for testing the simple null hypothesis $H_0: \theta = \theta^*$ converges in distribution to the $\chi^2_3$ distribution.
3. The random vector
where $(Z_1, Z_2, Z_3)$ is a normal random vector with mean zero and covariance matrix equal to the inverse of the covariance matrix of the vector
which is given by

An illustrative application
In this section, we illustrate the use of the estimation procedures described in the previous section. The variable considered is the average length of stay of patients who are in hospital for acute care because of problems of the hepatobiliary system and pancreas, and who die from this cause. The sample under study corresponds to 1082 hospitals in 10 states of the United States. For more information, see column 4 in http://lib.stat.cmu.edu/data-expo/1997/ascii/p07.dat. Table 1 shows descriptive statistics for the data set. The models N(µ, σ), ASN(µ, σ, α) and SN(µ, σ, α) are fitted to the data set using the maximum likelihood approach, where SN(µ, σ, α) is the skew-normal model introduced by Azzalini (1985). Results are reported in Table 2, with the standard errors of the estimators being estimated using the observed information matrix. The likelihood ratio test (LRT) for the hypothesis $H_0: \alpha = 0$ (N(µ, σ) model) versus $H_1: \alpha \neq 0$ (ASN(µ, σ, α) model) is such that −2 log(LRT) = −2(−2023.653 + 2018.41) = 10.486. Hence, comparing this quantity with the 95% critical value, namely $\chi^2(1) = 3.84$, there is sufficient evidence to reject the null hypothesis; that is, the parameter α is significantly different from zero, and we conclude that the ASN model gives a better fit to the data than the normal model. Since the models ASN(µ, σ, α) and SN(µ, σ, α) are not nested, the Akaike Information Criterion (AIC) has been used for comparison. According to this criterion, the ASN model provides a better fit than the SN model (4042.82 < 4053.306). These conclusions are also corroborated by the plots of the fitted densities (using the maximum likelihood estimators), presented in Figure 2.
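The arithmetic behind the likelihood ratio test and the AIC comparison can be reproduced with a few lines. The log-likelihood values and the SN AIC are those quoted above; the variable names are illustrative, and the p-value is an extra quantity not reported in the text, computed from the same chi-square reference distribution with one degree of freedom.

```python
from scipy.stats import chi2

loglik_normal = -2023.653  # maximized log-likelihood of N(mu, sigma), from the text
loglik_asn = -2018.41      # maximized log-likelihood of ASN(mu, sigma, alpha), from the text

# Likelihood ratio statistic for H0: alpha = 0 against H1: alpha != 0.
lrt_stat = -2.0 * (loglik_normal - loglik_asn)

# 95% critical value and p-value from the chi-square distribution with 1 degree of freedom.
critical_value = chi2.ppf(0.95, df=1)
p_value = chi2.sf(lrt_stat, df=1)

# AIC = 2k - 2 log L with k = 3 parameters for the ASN model.
aic_asn = 2 * 3 - 2.0 * loglik_asn
aic_sn = 4053.306          # AIC of the SN model, as quoted in the text
asn_preferred = aic_asn < aic_sn
```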

Figure 1: Plots of the skewed density ASN(α) for different choices of α.

Figure 2: The histograms correspond to the average length of stay of patients in 1082 hospitals. The lines represent fitted distributions using maximum likelihood: ASN(µ̂, σ̂, α̂) (solid line), N(µ̂, σ̂) (dotted line in (a)) and SN(µ̂, σ̂, α̂) (dotted line in (b)).

Table 5.1: Summary statistics of the average length of stay of patients in 1082 hospitals, where $g_1$ and $g_2$ represent the coefficients of asymmetry and kurtosis, respectively.