
# Goodness-of-fit testing for the inverse Gaussian distribution based on new entropy estimation using ranked set sampling and double ranked set sampling

*Environmental Systems Research*
**volume 1**, Article number: 8 (2012)

## Abstract

### Background

Entropy is a measure of uncertainty and dispersion associated with a random variable. Several goodness-of-fit tests based on entropy are available in the literature, and entropy has been widely used in many applications.

### Results

A goodness-of-fit test for the inverse Gaussian distribution is studied, based on a new entropy estimator, using simple random sampling (SRS), ranked set sampling (RSS) and double ranked set sampling (DRSS). The critical values of the new tests are obtained by Monte Carlo simulation. Power values of the suggested tests against several alternative hypotheses under SRS, RSS and DRSS are also presented. The proposed tests under RSS and DRSS are more powerful than the test under SRS, and the test based on DRSS is superior to the RSS test in all cases considered in this study.

### Conclusion

Since the suggested goodness-of-fit tests for the inverse Gaussian distribution are more efficient under DRSS than under RSS, they may also be considered under multistage RSS.

## Background

Entropy is a measure of uncertainty and dispersion associated with a random variable. It is not uniquely defined; several axiom systems justify particular entropies. Shannon (1948) defined the entropy *H*(*f*) of the random variable *X* as

$$H(f)=-\int_{-\infty}^{\infty}f(x)\log f(x)\,dx,$$

where *X* is a continuous random variable with probability density function (pdf) *f*(*x*) and cumulative distribution function (cdf) *F*(*x*). Vasicek (1976) expressed *H*(*f*) as

$$H(f)=\int_{0}^{1}\log\left(\frac{d}{dp}F^{-1}(p)\right)dp.$$

Let $X_{1},X_{2},\dots,X_{n}$ be a simple random sample of size *n* from *F*(*x*) and let $X_{(1)}\le X_{(2)}\le\cdots\le X_{(n)}$ be the order statistics of the sample. The Vasicek (1976) estimator of *H*(*f*) is given by

$$VE_{(m,n)}=\frac{1}{n}\sum_{i=1}^{n}\log\left\{\frac{n}{2m}\left(X_{(i+m)}-X_{(i-m)}\right)\right\},$$

where *m* is a positive integer, known as the window size, with *m* < *n*/2. Here *X*_{(i)} = *X*_{(1)} if *i* < 1 and *X*_{(i)} = *X*_{(n)} if *i* > *n*. It is of interest to note that $VE_{(m,n)}\overset{P}{\to}H(f)$ as *n* → ∞, *m* → ∞ and *m*/*n* → 0.
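The spacing estimator described above can be sketched in a few lines of Python. This is only an illustrative sketch, assuming the usual form of the Vasicek estimator (the average of log{(n/2m)(X_(i+m) − X_(i−m))}) with indices clamped to the smallest and largest order statistics; the function name `vasicek_entropy` and the chosen sample size and window are hypothetical.

```python
import numpy as np

def vasicek_entropy(x, m):
    """Vasicek-type spacing estimate of entropy with window size m."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if not 1 <= m < n / 2:
        raise ValueError("window size m must satisfy 1 <= m < n/2")
    idx = np.arange(n)                       # 0-based positions of the order statistics
    upper = x[np.minimum(idx + m, n - 1)]    # X_(i+m), clamped to X_(n)
    lower = x[np.maximum(idx - m, 0)]        # X_(i-m), clamped to X_(1)
    return float(np.mean(np.log(n / (2.0 * m) * (upper - lower))))

# Illustration on uniform(0,1) data, whose exact entropy is 0.
rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 1.0, size=1000)
estimate = vasicek_entropy(sample, m=15)
print(estimate)
```

For continuous data ties occur with probability zero, so the logarithm is well defined; with rounded data a zero spacing would give minus infinity.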

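Van Es (1992) suggested another entropy estimator based on spacings, given by

$$VEs_{(m,n)}=\frac{1}{n-m}\sum_{i=1}^{n-m}\log\left\{\frac{n+1}{m}\left(X_{(i+m)}-X_{(i)}\right)\right\}+\sum_{k=m}^{n}\frac{1}{k}+\log(m)-\log(n+1).$$

He proved the consistency and asymptotic normality of this estimator under some conditions.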

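Ebrahimi et al. (1994) modified the Vasicek (1976) entropy estimator by assigning different weights to the spacings, and proposed the estimator

$$EE_{(m,n)}=\frac{1}{n}\sum_{i=1}^{n}\log\left\{\frac{n}{c_{i}m}\left(X_{(i+m)}-X_{(i-m)}\right)\right\},$$

where

$$c_{i}=\begin{cases}1+\dfrac{i-1}{m}, & 1\le i\le m,\\ 2, & m+1\le i\le n-m,\\ 1+\dfrac{n-i}{m}, & n-m+1\le i\le n.\end{cases}$$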

Their simulation study shows that this estimator has smaller bias and mean square error than the Vasicek (1976) entropy estimator. They also proved that *EE*_{(m,n)} converges in probability to *H*(*f*) as *n* → ∞, *m* → ∞ and *m/n* → 0.
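To make the effect of the boundary weights concrete, the following Python sketch implements a weighted spacing estimator. It assumes the commonly quoted form of the Ebrahimi et al. weights (c_i = 1 + (i − 1)/m for i ≤ m, c_i = 2 in the interior, c_i = 1 + (n − i)/m for i ≥ n − m + 1); the function name `ebrahimi_entropy` is hypothetical.

```python
import numpy as np

def ebrahimi_entropy(x, m):
    """Weighted spacing estimate of entropy (assumed Ebrahimi et al. weights)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if not 1 <= m < n / 2:
        raise ValueError("window size m must satisfy 1 <= m < n/2")
    i = np.arange(1, n + 1)                  # 1-based ranks
    c = np.full(n, 2.0)                      # interior weight
    low = i <= m
    high = i >= n - m + 1
    c[low] = 1.0 + (i[low] - 1) / m          # lower boundary weights
    c[high] = 1.0 + (n - i[high]) / m        # upper boundary weights
    upper = x[np.minimum(i + m, n) - 1]      # X_(i+m), clamped to X_(n)
    lower = x[np.maximum(i - m, 1) - 1]      # X_(i-m), clamped to X_(1)
    return float(np.mean(np.log(n / (c * m) * (upper - lower))))

# Equally spaced points: every weighted term equals log(5) when m = 1.
value = ebrahimi_entropy([1, 2, 3, 4, 5], m=1)
print(value)
```

Because every weight is at most 2, this estimate is never smaller than the unweighted Vasicek-type estimate computed on the same sample and window.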

Al-Omari (2012; *Modified entropy estimators using simple random sampling, ranked set sampling and double ranked set sampling*, submitted) suggested a modified estimator $AE_{(m,n)}$ of the entropy of an unknown continuous pdf *f*(*x*) as

where

Alizadeh (2010) proposed a new estimator of entropy and studied its application in testing normality. Park and Park (2003) considered correcting moments for goodness-of-fit tests for two entropy estimates.

### Inverse Gaussian distribution

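A random variable *X* is said to have an inverse Gaussian distribution *IG*(*x*; *μ, β*) if its pdf is of the form

$$f(x;\mu,\beta)=\sqrt{\frac{\beta}{2\pi x^{3}}}\exp\left\{-\frac{\beta\left(x-\mu\right)^{2}}{2\mu^{2}x}\right\},\quad x>0,$$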

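where *μ* > 0 is the mean and *β* > 0 is the shape parameter. The variance of *X* is *μ*^{3}/*β*. Its characteristic function is given by

$$E\left(e^{itX}\right)=\exp\left\{\frac{\beta}{\mu}\left(1-\sqrt{1-\frac{2i\mu^{2}t}{\beta}}\right)\right\}.$$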

The *IG*(*x*; *μ, β*) distribution has many applications; see, for example, Seshadri (1999) and Folks and Chhikara (1998).
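As a quick numerical illustration of this parameterization, the sketch below draws inverse Gaussian variates with NumPy and compares the sample moments with the mean *μ* and the variance *μ*^{3}/*β*. It assumes that *β* plays the role of the `scale` argument of NumPy's `Generator.wald`; the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, beta = 2.0, 3.0                          # arbitrary illustrative values

# NumPy's Wald distribution is the inverse Gaussian with mean `mean`
# and shape `scale` (assumed here to correspond to beta).
x = rng.wald(mean=mu, scale=beta, size=200_000)

sample_mean = x.mean()
sample_var = x.var(ddof=1)
theoretical_var = mu**3 / beta               # variance under this parameterization

print(sample_mean, mu)
print(sample_var, theoretical_var)
```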

## Method

### The test procedure

Let $X_{1},X_{2},\dots,X_{n}$ be a random sample of size *n* drawn from the pdf *f*(*x*) and let $X_{(1)}\le X_{(2)}\le\cdots\le X_{(n)}$ be the order statistics of this sample. We wish to test whether this random sample comes from an inverse Gaussian population. Thus, the composite null hypothesis is *H*_{0}: *X* ~ *IG*(*x*; *μ, β*).

The following corollary is due to Mahdizadeh and Arghami (2010).

*Corollary 1*: Assume that the random variable *X* has an inverse Gaussian distribution *IG*(*x*; *μ, β*) and let $Y=1/\sqrt{X}$. Then the entropy of *Y* is given by $H\left(f(y)\right)=\log\left(0.5\,\varphi\sqrt{2\pi e}\right)$, where $\varphi^{2}=1/\beta=E\left(Y^{2}\right)-1/E\left(Y^{-2}\right)$.
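The moment identity in Corollary 1 can be checked by simulation. The sketch below is illustrative only: it assumes, as above, that *β* corresponds to the `scale` argument of NumPy's `Generator.wald`, and the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, beta = 1.5, 4.0                          # arbitrary illustrative values

x = rng.wald(mean=mu, scale=beta, size=500_000)
y = 1.0 / np.sqrt(x)                         # Y = 1 / sqrt(X)

# phi^2 = E(Y^2) - 1 / E(Y^-2), estimated by sample moments
phi_sq = np.mean(y**2) - 1.0 / np.mean(y**(-2))

# Entropy of Y from Corollary 1, evaluated at the estimated phi
entropy_y = np.log(0.5 * np.sqrt(phi_sq) * np.sqrt(2.0 * np.pi * np.e))

print(phi_sq, 1.0 / beta)
print(entropy_y)
```

Note that the identity does not involve *μ*, which is why the entropy of *Y* depends on *β* alone.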

The following corollary is due to Mudholkar and Tian (2002).

*Corollary 2*: The random variable *X* with inverse Gaussian distribution *IG*(*x*; *μ, β*) is characterized by the property that $1/\sqrt{X}$ attains the maximum entropy among all nonnegative, absolutely continuous random variables *Y* with a given value of $E\left(Y^{2}\right)-1/E\left(Y^{-2}\right)$.

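Let $VE_{(m,n)}\left(f_{y}\right)$ be the sample estimate of $VE\left(f_{y}\right)$ for the distribution of $Y=1/\sqrt{X}$, defined as

$$VE_{(m,n)}\left(f_{y}\right)=\frac{1}{n}\sum_{i=1}^{n}\log\left\{\frac{n}{2m}\left(y_{(i+m)}-y_{(i-m)}\right)\right\},$$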

where $y_{(i)}=\left(x_{(n-i+1)}\right)^{-1/2}$, $i=1,2,\dots,n$.
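Because the map from *x* to $x^{-1/2}$ is decreasing, the order statistics of *Y* are obtained by reversing the ordered *x* values before transforming. A small Python sketch of this bookkeeping follows; the variable names are hypothetical and the inverse Gaussian draw is used only to supply positive data.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.wald(mean=1.0, scale=2.0, size=8)    # any positive sample would do

x_sorted = np.sort(x)                        # x_(1) <= ... <= x_(n)
y_from_formula = x_sorted[::-1] ** -0.5      # y_(i) = x_(n-i+1)^(-1/2)
y_sorted = np.sort(1.0 / np.sqrt(x))         # direct ordering of the transformed sample

print(np.allclose(y_from_formula, y_sorted))
```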

Mahdizadeh and Arghami (2010) followed Vasicek (1976) and proposed rejecting the null hypothesis *H*_{0}: *X* ~ *IG*(*x*; *μ, β*) if

where $\psi^{2}$ is the uniformly minimum variance unbiased (UMVU) estimate of $\varphi^{2}$, defined as

$$\psi^{2}=\frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{1}{X_{i}}-\frac{1}{\bar{X}}\right).$$

### Suggested test

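Let *X*_{i(i)} denote the *i*th order statistic from the *i*th sample ($i=1,2,\dots,n$). Then, the measured RSS units are denoted by *X*_{1(1)}, *X*_{2(2)}, …, *X*_{n(n)}. The cumulative distribution function of *X*_{i(i)} is given by

$$F_{(i)}(x)=\sum_{j=i}^{n}\binom{n}{j}F(x)^{j}\left[1-F(x)\right]^{n-j},$$

with probability density function defined as

$$f_{(i)}(x)=n\binom{n-1}{i-1}F(x)^{i-1}\left[1-F(x)\right]^{n-i}f(x).$$

The mean and the variance of the *i*th order statistic *X*_{i(i)} can be written, respectively, as

$$\mu_{(i)}=\int_{-\infty}^{\infty}x\,f_{(i)}(x)\,dx,\qquad \sigma_{(i)}^{2}=\int_{-\infty}^{\infty}\left(x-\mu_{(i)}\right)^{2}f_{(i)}(x)\,dx.$$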

The ranked set sampling method was suggested by McIntyre (1952) for estimating the mean of pasture and forage yields. The RSS can be described as follows:

Step 1: Select *n* simple random samples each of size *n* from the target population.

Step 2: Rank the units within each sample visually, at negligible cost, with respect to the variable of interest.

Step 3: For actual measurement, from the *i*th sample of *n* units ($i=1,2,\dots,n$), select the *i*th smallest ranked unit. The method is repeated *h* times if needed to increase the sample size to *hn* units.
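The three steps above can be sketched in Python as follows. This is a simulation sketch under the assumption of perfect ranking, in which units are ranked by their actual values rather than by visual judgment; the function name `ranked_set_sample` and the `draw` callback are hypothetical.

```python
import numpy as np

def ranked_set_sample(draw, n, rng, cycles=1):
    """Return cycles * n measured units from an RSS design with perfect ranking.

    draw(rng, size) must return independent population draws of the given shape.
    """
    measured = []
    for _ in range(cycles):
        sets = draw(rng, (n, n))             # Step 1: n samples, each of size n
        ranked = np.sort(sets, axis=1)       # Step 2: rank units within each sample
        measured.append(np.diagonal(ranked).copy())  # Step 3: i-th smallest of i-th sample
    return np.concatenate(measured)

rng = np.random.default_rng(3)
rss = ranked_set_sample(lambda r, size: r.uniform(size=size), n=4, rng=rng, cycles=2)
print(rss.shape)  # 8 measured units from 2 cycles of set size 4
```

Each cycle draws *n*^{2} units but measures only *n* of them, which is where the design saves measurement cost.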

Al-Saleh and Al-Kadiri (2000) suggested the double ranked set sampling (DRSS) method for estimating the population mean. The DRSS can be described by the following steps:

Step 1: Randomly select *n*^{2} samples each of size *n* from the target population.

Step 2: Apply the RSS method to the *n*^{2} samples obtained in Step 1. This step yields *n* samples each of size *n*.

Step 3: Reapply the RSS method to the *n* samples obtained in Step 2 to obtain a DRSS sample of size *n*. The cycle can be repeated *h* times if needed to obtain a sample of size *hn* units.
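A simulation sketch of the DRSS steps is given below. As with any simulated ranked set design, it assumes perfect ranking at both stages; the helper names `rss_once` and `double_ranked_set_sample` and the `draw` callback are hypothetical.

```python
import numpy as np

def rss_once(draw, n, rng):
    """One RSS sample of size n from n sets of size n (perfect ranking)."""
    sets = np.sort(draw(rng, (n, n)), axis=1)
    return np.diagonal(sets).copy()

def double_ranked_set_sample(draw, n, rng, cycles=1):
    """Return cycles * n units from a DRSS design."""
    measured = []
    for _ in range(cycles):
        # Steps 1-2: n^2 sets of size n, reduced by RSS to n samples of size n
        stage1 = np.stack([rss_once(draw, n, rng) for _ in range(n)])
        # Step 3: rank each RSS sample and keep its i-th smallest unit
        stage2 = np.sort(stage1, axis=1)
        measured.append(np.diagonal(stage2).copy())
    return np.concatenate(measured)

rng = np.random.default_rng(7)
drss = double_ranked_set_sample(lambda r, size: r.uniform(size=size), n=3, rng=rng, cycles=4000)
cycle_means = drss.reshape(4000, 3).mean(axis=1)
print(cycle_means.mean(), cycle_means.var(ddof=1))
```

For uniform(0,1) data with *n* = 3, the variance of the SRS mean is 1/36, so the printed variance gives a rough feel for the gain from the double ranking.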

The SRS estimator of the population mean is given by $\widehat{\mu}_{\mathit{SRS}}=\sum_{i=1}^{n}X_{i}/n$, with variance $Var\left(\widehat{\mu}_{\mathit{SRS}}\right)=\sigma^{2}/n$. The RSS estimator of the population mean is defined as $\widehat{\mu}_{\mathit{RSS}}=\sum_{i=1}^{n}X_{i(i)}/n$, with variance given by $Var\left(\widehat{\mu}_{\mathit{RSS}}\right)=\frac{\sigma^{2}}{n}-\frac{1}{n^{2}}\sum_{i=1}^{n}\left(\mu_{(i)}-\mu\right)^{2}$. The relative precision (RP) of RSS relative to SRS for estimating the population mean is

$$RP=\frac{Var\left(\widehat{\mu}_{\mathit{SRS}}\right)}{Var\left(\widehat{\mu}_{\mathit{RSS}}\right)}=\left[1-\frac{1}{n\sigma^{2}}\sum_{i=1}^{n}\left(\mu_{(i)}-\mu\right)^{2}\right]^{-1}.$$

Takahasi and Wakimoto (1968) showed that the parent pdf *f*(*x*) and the population mean can be expressed as $f(x)=\frac{1}{n}\sum_{i=1}^{n}f_{(i)}(x)$ and $\mu=\frac{1}{n}\sum_{i=1}^{n}\mu_{(i)}$, respectively. They also showed that $1\le RP\le\frac{n+1}{2}$, where the lower bound is attained if and only if the underlying distribution is degenerate, while the upper bound is attained if and only if the underlying distribution of the data is rectangular (uniform).
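The upper bound can be verified directly for the uniform(0,1) distribution, for which the order-statistic means are *i*/(*n* + 1) and the population variance is 1/12. The short Python sketch below plugs these known values into the SRS and RSS variance expressions given earlier; the function name `rp_uniform` is hypothetical.

```python
import numpy as np

def rp_uniform(n):
    """Relative precision of RSS to SRS for the mean of uniform(0,1), set size n."""
    i = np.arange(1, n + 1)
    mu_i = i / (n + 1)                       # mean of the i-th order statistic
    mu, sigma2 = 0.5, 1.0 / 12.0             # population mean and variance
    var_srs = sigma2 / n
    var_rss = sigma2 / n - np.sum((mu_i - mu) ** 2) / n**2
    return var_srs / var_rss

for n in range(2, 7):
    print(n, rp_uniform(n), (n + 1) / 2)
```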

Al-Saleh and Al-Omari (2002) extended DRSS to the multistage RSS method to increase the efficiency of the estimators for a fixed sample size. Al-Omari and Raqab (2012) suggested a truncation-based RSS method for estimating the population mean and median. Al-Omari (2011) suggested double robust extreme RSS for estimating the population mean. Haq and Shabbir (2010) proposed a family of ratio estimators of the population mean using extreme RSS based on two auxiliary variables.

A goodness-of-fit test for the *IG*(*x*; *μ, β*) distribution is considered using the SRS, RSS and DRSS methods. Our composite null hypothesis is *H*_{0}: *X* ~ *IG*(*x*; *μ, β*). Following Mudholkar and Tian (2002), we reject *H*_{0} if

where

and

Note that $AE_{(m,n)}\left(f_{y}\right)$ is the sample estimate of $AE\left(f_{y}\right)$. Since the entropy estimators are functions of order statistics, entropy estimation using RSS and DRSS involves ordering the RSS units.

## Results and discussion

In this section, a Monte Carlo experiment is presented to investigate the performance of the entropy estimators *AE*_{(m,n)} and *VE*_{(m,n)}, and to study the power of the suggested tests under different alternative hypotheses. The root mean square errors (RMSEs) and bias values of the estimators are obtained from 10,000 samples of sizes *n* = 10, 20, 30 with window sizes 1 ≤ *m* ≤ 5, 1 ≤ *m* ≤ 10 and 1 ≤ *m* ≤ 15, respectively.
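A Monte Carlo experiment of this kind can be sketched as follows for the Vasicek-type estimator under SRS. This is an illustrative sketch and not the simulation code used for the tables: it uses uniform(0,1) data (exact entropy 0), a reduced number of replications, and the hypothetical helper `vasicek_rows`.

```python
import numpy as np

def vasicek_rows(samples, m):
    """Vasicek-type entropy estimate for each row of a (reps, n) array."""
    x = np.sort(samples, axis=1)
    n = x.shape[1]
    idx = np.arange(n)
    upper = x[:, np.minimum(idx + m, n - 1)]  # X_(i+m), clamped to X_(n)
    lower = x[:, np.maximum(idx - m, 0)]      # X_(i-m), clamped to X_(1)
    return np.mean(np.log(n / (2.0 * m) * (upper - lower)), axis=1)

rng = np.random.default_rng(4)
n, reps = 10, 2000                            # fewer replications than in the paper
true_entropy = 0.0                            # entropy of uniform(0,1)

results = {}
for m in range(1, 6):
    est = vasicek_rows(rng.uniform(size=(reps, n)), m)
    bias = est.mean() - true_entropy
    rmse = np.sqrt(np.mean((est - true_entropy) ** 2))
    results[m] = (bias, rmse)
    print(m, round(bias, 4), round(rmse, 4))
```

Replacing the uniform draws with RSS or DRSS samples, and the estimator with another spacing estimator, gives the other cells of such a comparison.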

### Comparison between *VE*_{(m,n)} and *AE*_{(m,n)}

The samples are selected from the uniform, exponential and standard normal distributions using the SRS, RSS and DRSS methods. From Tables 1 to 7 we can see that $AE_{(m,n)}$ is more efficient than $VE_{(m,n)}$ in all cases considered in this study. Also, DRSS is superior to SRS and RSS. For more details about this comparison, see Al-Omari (2012; *Modified entropy estimators using simple random sampling, ranked set sampling and double ranked set sampling*, submitted).

We can see that these optimal values differ from those of Mahdizadeh and Arghami (2010), whose suggested test is based on the Vasicek (1976) entropy estimator. We conclude that the optimal window size depends on the entropy estimator used for the goodness-of-fit test.

### Power of the tests

The power of the suggested goodness-of-fit tests using SRS, RSS and DRSS is considered here against the same alternatives considered by Mahdizadeh and Arghami (2010), namely the exponential(1), uniform(0,1), Weibull(2,1), lognormal(0,2), beta(2,2) and beta(5,2) distributions. For each method, 10,000 samples of sizes *n* = 10, 20, 30 are generated at the significance level 0.05.

Based on Tables 8 and 9, we conclude that the new suggested tests gain in performance under the sampling methods considered in this paper. DRSS is superior to both RSS and SRS for the same sample size, and RSS performs better than SRS in all cases considered here. The bold entries in Tables 8 and 9 are the optimal power values for each design at the same sample size. These optimal power values occur at window sizes *m* < *n*/2; specifically, the optimal window sizes are 2, 3, 4 and 5. For fixed *n*, the power decreases as *m* increases, while it increases with *n*.

## Conclusion

In this paper, new goodness-of-fit tests for the inverse Gaussian distribution are suggested using SRS, RSS and DRSS, based on the maximum entropy characterization. The new tests are found to be more powerful under RSS and DRSS than under SRS, and the test under DRSS is superior to the tests under RSS and SRS. We recommend using the suggested goodness-of-fit tests for the inverse Gaussian distribution. As DRSS is better than RSS, the current work can be extended to the multistage RSS design and to other probability distributions.

## References

Alizadeh HN: **A new estimator of entropy and its application in testing normality.** *J Stat Comput Simul* 2010, **80:** 1151–1162. 10.1080/00949650903005656

Al-Omari AI: **Estimation of mean based on modified robust extreme ranked set sampling.** *J Stat Comput Simul* 2011, **81**(8): 1055–1066. 10.1080/00949651003649161

Al-Omari AI, Raqab MZ: **Estimation of the population mean and median using truncation-based ranked set samples.** *Accepted in J Stat Comput Simul* 2012. 10.1080/00949655.2012.662684

Al-Saleh MF, Al-Kadiri MA: **Double ranked set sampling.** *Stat Probability Lett* 2000, **48**(2): 205–212. 10.1016/S0167-7152(99)00206-0

Al-Saleh MF, Al-Omari AI: **Multistage ranked set sampling.** *J Stat Planning and Inference* 2002, **102**(2): 273–286. 10.1016/S0378-3758(01)00086-6

Ebrahimi N, Pflughoeft K, Soofi E: **Two measures of sample entropy.** *Stat Probability Lett* 1994, **20:** 225–234. 10.1016/0167-7152(94)90046-9

Folks JL, Chhikara RS: **The inverse Gaussian distribution and its statistical application-a review.** *J R Soc, Series B* 1998, **40:** 263–289.

Haq A, Shabbir J: **A family of ratio estimators for population mean in extreme ranked set sampling using two auxiliary variables.** *SORT* 2010, **34**(1): 45–64.

Mahdizaheh M, Arghami NR: **Efficiency of ranked set sampling in entropy estimation and goodness-of-fit testing for the inverse Gaussian law.** *J Stat Comput Simul* 2010, **80**(7): 761–774. 10.1080/00949650902773551

McIntyre GA: **A method for unbiased selective sampling using ranked sets.** *Australian J Agricultural Res* 1952, **3:** 385–390. 10.1071/AR9520385

Mudholkar GS, Tian L: **An entropy characterization of the inverse Gaussian distribution and related goodness-of-fit test.** *J Stat Planning and Inference* 2002, **102:** 211–221. 10.1016/S0378-3758(01)00099-4

Park S, Park D: **Correcting moments for goodness of fit tests based on two entropy estimates.** *J Stat Comput Simul* 2003, **73**(9): 685–694. 10.1080/0094965031000070367

Seshadri V: *The inverse Gaussian distribution: Statistical theory and applications*. Springer, New York; 1999.

Shannon CE: **A mathematical theory of communications.** *Bell System Technical J* 1948, **27:** 379–423, 623–656.

Takahasi K, Wakimoto K: **On the unbiased estimates of the population mean based on the sample stratified by means of ordering.** *Annals of the Institute of Statistical Mathematics* 1968, **20:** 1–31. 10.1007/BF02911622

Van Es B: **Estimating functionals related to a density by class of statistics based on spacings.** *Scand J Stat* 1992, **19:** 61–72.

Vasicek O: **A test for normality based on sample entropy.** *J Royal Stat Soc B* 1976, **38:** 54–59.

## Acknowledgment

The authors are grateful to the editors and the anonymous reviewers for their valuable comments and suggestions.

## Additional information

### Competing interests

Both authors declare that they have no competing interests.

### Authors’ contribution

The work presented here was carried out in collaboration between the authors. AA carried out the theoretical work and the discussion in this paper. AH carried out the Monte Carlo simulations. All authors read and approved the final manuscript.

## Rights and permissions

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## About this article

### Keywords

- Entropy
- Goodness-of-fit test
- Inverse Gaussian
- Root mean square error
- Simple random sampling
- Ranked set sampling
- Double ranked set sampling