# Goodness-of-fit testing for the inverse Gaussian distribution based on new entropy estimation using ranked set sampling and double ranked set sampling

## Abstract

### Background

Entropy is a measure of uncertainty and dispersion associated with a random variable. Several goodness-of-fit tests based on entropy are available in the literature, and entropy has been widely used in many applications.

### Results

A goodness-of-fit test for the inverse Gaussian distribution is studied based on a new entropy estimator using simple random sampling (SRS), ranked set sampling (RSS) and double ranked set sampling (DRSS). The critical values of the new tests are obtained by Monte Carlo simulation. The power values of the suggested tests against several alternative hypotheses under SRS, RSS and DRSS are also presented. The proposed tests under RSS and DRSS are more powerful than the test under SRS, and the test based on DRSS is superior to the RSS test in all cases considered in this study.

### Conclusion

Since the suggested goodness-of-fit test for the inverse Gaussian distribution under DRSS is more efficient than the one based on RSS, the tests may also be considered under multistage RSS.

## Background

Entropy is a measure of uncertainty and dispersion associated with a random variable. It is not uniquely defined; several axiom systems justify particular entropies. Shannon (1948) defined the entropy H(f) of the random variable X as

$H(f)=-\int_{-\infty}^{\infty}f(x)\log f(x)\,dx,$
(1)

where X is a continuous random variable with probability density function (pdf) f(x) and cumulative distribution function (cdf) F(x). Vasicek (1976) defined H(f) as

$H(f)=\int_{0}^{1}\log\left(\frac{d}{dp}F^{-1}(p)\right)dp.$
(2)

Let $X_{1},X_{2},\ldots,X_{n}$ be a simple random sample of size n from F(x) and let $X_{(1)}\le X_{(2)}\le\cdots\le X_{(n)}$ be the order statistics of the sample. The Vasicek (1976) estimator of H(f) is given by

$VE_{(m,n)}=\frac{1}{n}\sum_{i=1}^{n}\log\left\{\frac{n}{2m}\left(X_{(i+m)}-X_{(i-m)}\right)\right\},$
(3)

where m is a positive integer, known as the window size, with m < n/2. Here $X_{(i)}=X_{(1)}$ if i < 1 and $X_{(i)}=X_{(n)}$ if i > n. Note that $VE_{(m,n)}\xrightarrow{P}H(f)$ as n → ∞, m → ∞ and m/n → 0.
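As an illustration, estimator (3) can be sketched in a few lines of Python. The function name `vasicek_entropy` is hypothetical, and the sketch assumes the sample has no ties wide enough to make a spacing zero (which would make the logarithm undefined).

```python
import math


def vasicek_entropy(sample, m):
    """Vasicek spacing estimator VE(m, n) of differential entropy, eq. (3)."""
    x = sorted(sample)
    n = len(x)
    if not 1 <= m < n / 2:
        raise ValueError("window size m must satisfy 1 <= m < n/2")
    total = 0.0
    for i in range(n):                  # i is the 0-based index of X_(i+1)
        upper = x[min(i + m, n - 1)]    # X_(i+m), replaced by X_(n) past the end
        lower = x[max(i - m, 0)]        # X_(i-m), replaced by X_(1) before the start
        total += math.log(n / (2 * m) * (upper - lower))
    return total / n
```

The clamping of indices implements the boundary convention stated above for i < 1 and i > n.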

Van Es (1992) suggested another entropy estimator based on spacings, given by

$VE_{(m,n)}=\frac{1}{n-m}\sum_{i=1}^{n-m}\log\left(\frac{n+1}{m}\left(X_{(i+m)}-X_{(i)}\right)\right)+\sum_{k=m}^{n}\frac{1}{k}+\log\left(\frac{m}{n+1}\right).$
(4)

Van Es (1992) proved the consistency and asymptotic normality of this estimator under some conditions.
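A minimal sketch of estimator (4) follows, again assuming distinct sample values; the name `van_es_entropy` is hypothetical.

```python
import math


def van_es_entropy(sample, m):
    """Van Es (1992) spacing estimator of entropy, eq. (4)."""
    x = sorted(sample)
    n = len(x)
    if not 1 <= m < n:
        raise ValueError("window size m must satisfy 1 <= m < n")
    # Average log of the scaled m-spacings X_(i+m) - X_(i), i = 1..n-m
    spacing_term = sum(
        math.log((n + 1) / m * (x[i + m] - x[i])) for i in range(n - m)
    ) / (n - m)
    # Bias-correction terms: partial harmonic sum and log(m / (n + 1))
    harmonic_term = sum(1.0 / k for k in range(m, n + 1))
    return spacing_term + harmonic_term + math.log(m / (n + 1))
```

Unlike (3), this estimator uses one-sided spacings, so no boundary convention is needed.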

Ebrahimi et al. (1994) modified the Vasicek (1976) entropy estimator by assigning different weights to the spacings, and proposed the following estimator

$EE_{(m,n)}=\frac{1}{n}\sum_{i=1}^{n}\log\left(\frac{n}{c_{i}m}\left(X_{(i+m)}-X_{(i-m)}\right)\right),$
(5)

where

$c_{i}=\begin{cases}1+\frac{i-1}{m}, & 1\le i\le m,\\ 2, & m+1\le i\le n-m,\\ 1+\frac{n-i}{m}, & n-m+1\le i\le n.\end{cases}$

Their simulation study shows that this estimator has smaller bias and mean square error than the Vasicek (1976) entropy estimator. They also proved that $EE_{(m,n)}$ converges in probability to H(f) as n → ∞, m → ∞ and m/n → 0.

Al-Omari (2012; Modified entropy estimators using simple random sampling, ranked set sampling and double ranked set sampling, submitted) suggested a modified estimator of the entropy of an unknown continuous pdf f(x) as

$AE_{(m,n)}=\frac{1}{n}\sum_{i=1}^{n}\log\left(\frac{n}{c_{i}m}\left(X_{(i+m)}-X_{(i-m)}\right)\right),$
(6)

where

$c_{i}=\begin{cases}1+\frac{1}{2}, & 1\le i\le m,\\ 2, & m+1\le i\le n-m,\\ 1+\frac{1}{2}, & n-m+1\le i\le n.\end{cases}$
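Estimators (5) and (6) differ from (3) only in the weights $c_i$, so one routine can serve all three. The sketch below uses hypothetical helper names and assumes m < n/2 and distinct sample values.

```python
import math


def ebrahimi_weights(n, m):
    """Weights c_i of Ebrahimi et al. (1994), eq. (5), for i = 1..n."""
    weights = []
    for i in range(1, n + 1):
        if i <= m:
            weights.append(1 + (i - 1) / m)
        elif i <= n - m:
            weights.append(2.0)
        else:
            weights.append(1 + (n - i) / m)
    return weights


def al_omari_weights(n, m):
    """Weights c_i of the modified estimator AE(m, n), eq. (6)."""
    return [1.5 if (i <= m or i > n - m) else 2.0 for i in range(1, n + 1)]


def weighted_spacing_entropy(sample, m, weights):
    """Spacing estimator with general weights; weights of 2 give eq. (3)."""
    x = sorted(sample)
    n = len(x)
    total = 0.0
    for i in range(n):
        spacing = x[min(i + m, n - 1)] - x[max(i - m, 0)]
        total += math.log(n / (weights[i] * m) * spacing)
    return total / n
```

Because only the 2m boundary terms change, AE(m,n) exceeds the Vasicek estimate on the same sample by the constant (2m/n) log(4/3).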

Alizadeh (2010) proposed a new estimator of entropy and studied its application in testing normality. Park and Park (2003) considered correcting moments for goodness-of-fit tests for two entropy estimates.

### Inverse Gaussian distribution

A random variable X is said to have an inverse Gaussian distribution IG(x; μ, β) if its pdf is of the form

$f(x)=\sqrt{\frac{\beta}{2\pi x^{3}}}\exp\left(-\frac{\beta}{2\mu^{2}x}(x-\mu)^{2}\right),\quad\text{for } x>0,$
(7)

where μ > 0 is the mean and β > 0 is the shape parameter. The variance of X is $\mu^{3}/\beta$. Its characteristic function is given by

$\phi_{X}(t)=\exp\left(\frac{\beta}{\mu}-\sqrt{\beta}\sqrt{\frac{\beta}{\mu^{2}}-2it}\right).$

The IG(x; μ, β) distribution has many applications; see, for example, Seshadri (1999) and Folks and Chhikara (1998).
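For simulation work, draws from IG(x; μ, β) can be generated with the standard library alone. The sketch below uses the transformation method of Michael, Schucany and Haas, which is a substitute chosen here for illustration and is not described in this paper; the function name is hypothetical.

```python
import math
import random


def sample_inverse_gaussian(mu, beta, rng):
    """One draw from IG(mu, beta) via the Michael-Schucany-Haas transformation."""
    y = rng.gauss(0.0, 1.0) ** 2            # chi-square variate with 1 df
    # Smaller root of the quadratic linking y to the inverse Gaussian variate
    x = mu + mu * mu * y / (2 * beta) - mu / (2 * beta) * math.sqrt(
        4 * mu * beta * y + mu * mu * y * y
    )
    # Choose between the two roots x and mu^2 / x with the proper probability
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x


rng = random.Random(2012)                    # seed chosen arbitrarily
draws = [sample_inverse_gaussian(2.0, 3.0, rng) for _ in range(10000)]
sample_mean = sum(draws) / len(draws)        # should be close to mu = 2
```

With μ = 2 and β = 3, the sample mean should be near 2 and the sample variance near $\mu^{3}/\beta = 8/3$.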

## Method

### The test procedure

Let $X_{1},X_{2},\ldots,X_{n}$ be a random sample of size n drawn from the pdf f(x) and let $X_{(1)}\le X_{(2)}\le\cdots\le X_{(n)}$ be the order statistics of this sample. Our interest is to test whether this random sample comes from an inverse Gaussian population. Thus, the composite null hypothesis is H0: X ~ IG(x; μ, β).

The following corollary is due to Mahdizaheh and Arghami (2010).

Corollary 1: Assume that the random variable X has an inverse Gaussian distribution IG(x; μ, β) and let $Y=1/\sqrt{X}$. Then the entropy of Y is given by $H(f(y))=\log\left(0.5\,\phi\sqrt{2\pi e}\right)$, where $\phi^{2}=1/\beta=E(Y^{2})-1/E(Y^{-2})$.
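The identity for $\phi^{2}$ in Corollary 1 can be checked from two standard moments of the inverse Gaussian law, E(X) = μ and E(1/X) = 1/μ + 1/β. The short derivation below is a sketch added for the reader and is not taken from the cited proof.

```latex
\begin{aligned}
E(Y^{2}) &= E(1/X) = \frac{1}{\mu} + \frac{1}{\beta},
\qquad E(Y^{-2}) = E(X) = \mu, \\
E(Y^{2}) - \frac{1}{E(Y^{-2})}
  &= \frac{1}{\mu} + \frac{1}{\beta} - \frac{1}{\mu}
   = \frac{1}{\beta} = \phi^{2}, \\
H(f(y)) &= \log\left(0.5\,\phi\sqrt{2\pi e}\right)
   = \frac{1}{2}\log\frac{\pi e}{2\beta}.
\end{aligned}
```

The entropy of Y therefore depends on β only, which is what makes a scale-free test statistic possible once β is estimated.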

The following corollary is due to Mudholkar and Tian (2002).

Corollary 2: The random variable X with inverse Gaussian distribution IG(x; μ, β) is characterized by the property that $1/\sqrt{X}$ attains the maximum entropy among all nonnegative, absolutely continuous random variables Y with a given value of $E(Y^{2})-1/E(Y^{-2})$.

Let $VE_{(m,n)}(f_{y})$ be the sample estimate of $VE(f_{y})$ for the distribution of $Y=1/\sqrt{X}$, defined as

$VE_{(m,n)}(f_{y})=\frac{1}{n}\sum_{i=1}^{n}\log\left(\frac{n}{2m}\left(y_{(i+m)}-y_{(i-m)}\right)\right),$
(8)

where $y_{(i)}=\left(x_{(n-i+1)}\right)^{-1/2}$, $i=1,2,\ldots,n$.

Mahdizaheh and Arghami (2010) followed Vasicek (1976) and proposed rejecting the null hypothesis H0: X ~ IG(x; μ, β) if

$K_{(m,n)}(f_{y})=\frac{2\exp\left(VE_{(m,n)}(f_{y})\right)}{\psi}\le K_{(m,n,\alpha)}^{*}(f_{y}),$
(9)

where $\psi^{2}$ is the uniformly minimum variance unbiased (UMVU) estimate of $\phi^{2}$, defined as

$\psi^{2}=\frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{1}{x_{i}}-\frac{1}{\bar{x}}\right)=\frac{1}{n-1}\left(\sum_{i=1}^{n}y_{i}^{2}-n^{2}\left(\sum_{i=1}^{n}y_{i}^{-2}\right)^{-1}\right).$
(10)
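Equations (8) to (10) can be combined into a short routine. This is a sketch under SRS with hypothetical function names; it assumes positive, distinct observations.

```python
import math


def vasicek_entropy(values, m):
    """Vasicek estimator, eq. (8), applied to the values passed in."""
    v = sorted(values)
    n = len(v)
    total = 0.0
    for i in range(n):
        spacing = v[min(i + m, n - 1)] - v[max(i - m, 0)]
        total += math.log(n / (2 * m) * spacing)
    return total / n


def psi(x):
    """Square root of the UMVU estimate of 1/beta, eq. (10)."""
    n = len(x)
    x_bar = sum(x) / n
    psi_sq = sum(1.0 / xi - 1.0 / x_bar for xi in x) / (n - 1)
    return math.sqrt(psi_sq)


def k_statistic(x, m):
    """Test statistic K(m, n) of eq. (9); small values are evidence against H0."""
    # Sorting y = x^(-1/2) is equivalent to y_(i) = x_(n-i+1)^(-1/2)
    y = [xi ** -0.5 for xi in x]
    return 2.0 * math.exp(vasicek_entropy(y, m)) / psi(x)
```

Multiplying every observation by a positive constant rescales both the numerator and ψ by the same factor, so the statistic is scale invariant.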

### Suggested test

Let $X_{i(i)}$ denote the i th order statistic from the i th sample ($i=1,2,\ldots,n$). Then the measured RSS units are denoted by $X_{1(1)},X_{2(2)},\ldots,X_{n(n)}$. The cumulative distribution function of $X_{i(i)}$ is given by

$F_{(i)}(x)=\sum_{j=i}^{n}\binom{n}{j}F^{j}(x)\left(1-F(x)\right)^{n-j},\quad -\infty<x<\infty,$

with probability density function defined as

$f_{(i)}(x)=n\binom{n-1}{i-1}F^{i-1}(x)\left(1-F(x)\right)^{n-i}f(x),\quad -\infty<x<\infty.$

The mean and the variance of the i th order statistic $X_{i(i)}$ can be written, respectively, as

$\mu_{(i)}=\int_{-\infty}^{\infty}xf_{(i)}(x)\,dx\quad\text{and}\quad\sigma_{(i)}^{2}=\int_{-\infty}^{\infty}\left(x-\mu_{(i)}\right)^{2}f_{(i)}(x)\,dx.$
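The order-statistic cdf and pdf above are easy to evaluate numerically. The sketch below takes the parent values F(x) and f(x) as inputs; the function names are hypothetical.

```python
import math


def order_stat_cdf(F, i, n):
    """cdf of the i-th order statistic of n draws, given the parent cdf value F."""
    return sum(
        math.comb(n, j) * F ** j * (1 - F) ** (n - j) for j in range(i, n + 1)
    )


def order_stat_pdf(F, f, i, n):
    """pdf of the i-th order statistic, given parent cdf value F and pdf value f."""
    return n * math.comb(n - 1, i - 1) * F ** (i - 1) * (1 - F) ** (n - i) * f


# Uniform(0, 1) parent at x = 0.3, so F = 0.3 and f = 1
average_pdf = sum(order_stat_pdf(0.3, 1.0, i, 5) for i in range(1, 6)) / 5
```

For any parent distribution, averaging the n order-statistic densities recovers the parent density, so `average_pdf` equals 1 here up to rounding.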

The ranked set sampling method was suggested by McIntyre (1952) for estimating the mean of pasture and forage yields. The RSS can be described as follows:

Step 1: Select n simple random samples each of size n from the target population.

Step 2: Without cost, visually rank the units within each sample with respect to the variable of interest.

Step 3: For actual measurement, from the i th ($i=1,2,\ldots,n$) sample of n units, select the i th smallest ranked unit. The procedure can be repeated h times if needed to increase the sample size to hn units.
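The three RSS steps above can be sketched as follows. The sketch assumes perfect ranking, that is, ranking by the true values rather than by visual judgment; `draw` is a hypothetical callable returning one unit from the population.

```python
import random


def ranked_set_sample(draw, n, rng, cycles=1):
    """Ranked set sample of size n * cycles under perfect ranking."""
    measured = []
    for _ in range(cycles):
        for i in range(n):
            # Steps 1 and 2: draw a fresh set of n units and rank it
            ranked = sorted(draw(rng) for _ in range(n))
            # Step 3: measure the (i+1)-th smallest unit of the (i+1)-th set
            measured.append(ranked[i])
    return measured


rng = random.Random(1952)                    # seed chosen arbitrarily
rss = ranked_set_sample(lambda g: g.random(), 4, rng, cycles=3)
```

Each cycle inspects n² units but measures only n of them, which is where the cost saving of RSS comes from.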

Al-Saleh and Al-Kadiri (2000) suggested the double ranked set sampling (DRSS) method for estimating the population mean. DRSS can be described by the following steps:

Step 1: Randomly select $n^{2}$ samples each of size n from the target population.

Step 2: Apply the RSS method to the $n^{2}$ samples obtained in Step 1. This step yields n samples each of size n.

Step 3: Reapply the RSS method to the n samples obtained in Step 2 to obtain a DRSS sample of size n. The cycle can be repeated h times if needed to obtain a sample of size hn units.
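A minimal sketch of the DRSS steps, again assuming perfect ranking at both stages and using hypothetical function names:

```python
import random


def ranked_set_sample(draw, n, rng):
    """One RSS cycle: the (i+1)-th smallest of the (i+1)-th set of n units."""
    return [sorted(draw(rng) for _ in range(n))[i] for i in range(n)]


def double_ranked_set_sample(draw, n, rng, cycles=1):
    """Double ranked set sample of size n * cycles under perfect ranking."""
    measured = []
    for _ in range(cycles):
        for i in range(n):
            # Stage 1: one RSS sample of size n, built from n * n fresh units
            stage_one = ranked_set_sample(draw, n, rng)
            # Stage 2: keep the (i+1)-th smallest unit of the (i+1)-th RSS sample
            measured.append(sorted(stage_one)[i])
    return measured


rng = random.Random(2000)                    # seed chosen arbitrarily
drss = double_ranked_set_sample(lambda g: g.random(), 3, rng, cycles=2)
```

One cycle consumes n³ population units to produce n measured units.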

The SRS estimator of the population mean is given by $\hat{\mu}_{SRS}=\sum_{i=1}^{n}X_{i}/n$, with variance $Var\left(\hat{\mu}_{SRS}\right)=\sigma^{2}/n$. The RSS estimator of the population mean is defined as $\hat{\mu}_{RSS}=\sum_{i=1}^{n}X_{i(i)}/n$, with variance given by $Var\left(\hat{\mu}_{RSS}\right)=\frac{\sigma^{2}}{n}-\frac{1}{n^{2}}\sum_{i=1}^{n}\left(\mu_{(i)}-\mu\right)^{2}$. The relative precision (RP) of RSS relative to SRS for estimating the population mean is

$RP=\frac{Var\left(\hat{\mu}_{SRS}\right)}{Var\left(\hat{\mu}_{RSS}\right)}=\left[1-\frac{\sum_{i=1}^{n}\left(\mu_{(i)}-\mu\right)^{2}}{n\sigma^{2}}\right]^{-1}.$

Takahasi and Wakimoto (1968) showed that the parent pdf f(x) and the population mean can be expressed as $f(x)=\frac{1}{n}\sum_{i=1}^{n}f_{(i)}(x)$ and $\mu=\frac{1}{n}\sum_{i=1}^{n}\mu_{(i)}$, respectively. They also showed that $1\le RP\le\frac{n+1}{2}$, where the lower bound is attained if and only if the underlying distribution is degenerate, while the upper bound is attained if and only if the underlying distribution of the data is rectangular (uniform).
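The upper bound can be checked exactly for the rectangular case. The sketch below assumes a Uniform(0, 1) parent, for which the i th order statistic of n draws has mean i/(n+1) and the variance is 1/12; the function name is hypothetical.

```python
from fractions import Fraction


def relative_precision_uniform(n):
    """Exact RP of RSS relative to SRS for a Uniform(0, 1) parent and set size n."""
    mu = Fraction(1, 2)
    sigma_sq = Fraction(1, 12)
    # Sum of squared deviations of the order-statistic means from mu
    spread = sum((Fraction(i, n + 1) - mu) ** 2 for i in range(1, n + 1))
    return 1 / (1 - spread / (n * sigma_sq))


rp_values = [relative_precision_uniform(n) for n in range(1, 6)]
```

Exact rational arithmetic shows that RP equals (set size + 1)/2 for every set size tried, matching the bound for the rectangular distribution.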

Al-Saleh and Al-Omari (2002) extended DRSS to the multistage RSS method to increase the efficiency of the estimators for a fixed sample size. Al-Omari and Raqab (2012) suggested a truncation-based RSS method for estimating the population mean and median. Al-Omari (2011) suggested double robust extreme RSS for estimating the population mean. Haq and Shabbir (2010) proposed a family of ratio estimators of the population mean using extreme RSS based on two auxiliary variables.

A goodness-of-fit test for the IG(x; μ, β) distribution is considered using the SRS, RSS and DRSS methods. Our composite null hypothesis is H0: X ~ IG(x; μ, β). Following Mudholkar and Tian (2002), we reject H0 if

$K_{(m,n)}(f_{y})=\frac{2\exp\left[AE_{(m,n)}(f_{y})\right]}{\psi}\le K_{(m,n,\alpha)}^{*}(f_{y}),$
(11)

where

$AE_{(m,n)}=\frac{1}{n}\sum_{i=1}^{n}\log\left(\frac{n}{c_{i}m}\left(X_{(i+m)}-X_{(i-m)}\right)\right)$

and

$c_{i}=\begin{cases}1+\frac{1}{2}, & 1\le i\le m,\\ 2, & m+1\le i\le n-m,\\ 1+\frac{1}{2}, & n-m+1\le i\le n.\end{cases}$

Note that $AE_{(m,n)}(f_{y})$ is the sample estimate of $AE(f_{y})$. Since the entropy estimators are functions of order statistics, entropy estimation under RSS and DRSS involves ordering the measured units.
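The decision rule (11) can be sketched as below. The function names are hypothetical, the critical value must be supplied by the caller (for example from a Monte Carlo study), and the measured units are simply sorted, whether they come from SRS, RSS or DRSS.

```python
import math


def ae_entropy(values, m):
    """Modified estimator AE(m, n): weight 1.5 at both boundaries, 2 in the middle."""
    y = sorted(values)
    n = len(y)
    total = 0.0
    for i in range(n):                        # 0-based; 1-based index is i + 1
        c = 2.0 if m <= i < n - m else 1.5
        spacing = y[min(i + m, n - 1)] - y[max(i - m, 0)]
        total += math.log(n / (c * m) * spacing)
    return total / n


def k_statistic_ae(x, m):
    """Statistic of eq. (11) computed from positive observations x."""
    n = len(x)
    y = [xi ** -0.5 for xi in x]
    # Second form of eq. (10), written in terms of y
    psi_sq = (sum(v * v for v in y) - n * n / sum(v ** -2 for v in y)) / (n - 1)
    return 2.0 * math.exp(ae_entropy(y, m)) / math.sqrt(psi_sq)


def reject_null(x, m, critical_value):
    """Reject H0 when the statistic does not exceed the critical value."""
    return k_statistic_ae(x, m) <= critical_value
```

As with the Vasicek-based statistic, rescaling the data leaves the statistic unchanged.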

## Results and discussion

In this section, a Monte Carlo experiment is presented to investigate the performance of the entropy estimators AE(m,n) and VE(m,n), and to study the power of the suggested tests under different alternative hypotheses. The root mean square errors (RMSEs) and bias values of the estimators are obtained from 10,000 samples of sizes n = 10, 20, 30 with window sizes 1 ≤ m ≤ 5, 1 ≤ m ≤ 10 and 1 ≤ m ≤ 15, respectively.
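A bias and RMSE experiment of this kind can be sketched as follows. The sketch covers SRS only, uses far fewer replications than the study, and relies on the known entropy of the Uniform(0, 1) distribution (zero); all names and the seed are illustrative assumptions.

```python
import math
import random


def vasicek_entropy(sample, m):
    """Vasicek estimator VE(m, n) with boundary clamping."""
    x = sorted(sample)
    n = len(x)
    total = 0.0
    for i in range(n):
        spacing = x[min(i + m, n - 1)] - x[max(i - m, 0)]
        total += math.log(n / (2 * m) * spacing)
    return total / n


def bias_and_rmse(estimator, draw, true_entropy, n, m, reps, rng):
    """Monte Carlo bias and RMSE of an entropy estimator under SRS."""
    errors = []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        errors.append(estimator(sample, m) - true_entropy)
    bias = sum(errors) / reps
    rmse = math.sqrt(sum(e * e for e in errors) / reps)
    return bias, rmse


rng = random.Random(2012)                     # seed chosen arbitrarily
bias, rmse = bias_and_rmse(
    vasicek_entropy, lambda g: g.random(), 0.0, n=10, m=3, reps=2000, rng=rng
)
```

Swapping in another estimator, or an RSS or DRSS sampler in place of the inner list comprehension, gives the other cells of such a comparison.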

### Comparison between VE(m,n) and AE(m,n)

The samples are drawn from the uniform, exponential and standard normal distributions using the SRS, RSS and DRSS methods. Tables 1, 2, 3, 4, 5, 6 and 7 show that $AE_{(m,n)}$ is more efficient than $VE_{(m,n)}$ in all cases considered in this study. Also, DRSS is superior to SRS and RSS. For more details about this comparison see Al-Omari (2012; Modified entropy estimators using simple random sampling, ranked set sampling and double ranked set sampling, submitted).

We can see that these optimal values differ from those of Mahdizaheh and Arghami (2010), whose test is based on the Vasicek (1976) entropy estimator. We therefore conclude that the optimal window size depends on the entropy estimator used for the goodness-of-fit test.

### Power of the tests

The power of the suggested goodness-of-fit tests under SRS, RSS and DRSS is assessed against the same alternatives considered by Mahdizaheh and Arghami (2010): exponential(1), uniform(0,1), Weibull(2,1), lognormal(0,2), beta(2,2) and beta(5,2). For each method, 10,000 samples of sizes n = 10, 20, 30 are generated at the significance level 0.05.
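A critical value and a power estimate can be simulated along the following lines. This is a sketch under SRS only, with several assumptions that are not from the paper: the null distribution is simulated at μ = β = 1 (the statistic is scale invariant, but its null distribution may still depend on the ratio β/μ), the replication count is small, and all function names are hypothetical.

```python
import math
import random


def sample_ig(mu, beta, rng):
    """One IG(mu, beta) draw (Michael-Schucany-Haas transformation)."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = mu + mu * mu * y / (2 * beta) - mu / (2 * beta) * math.sqrt(
        4 * mu * beta * y + mu * mu * y * y
    )
    return x if rng.random() <= mu / (mu + x) else mu * mu / x


def k_statistic(x, m):
    """AE-based statistic of eq. (11) for positive observations x."""
    n = len(x)
    y = sorted(xi ** -0.5 for xi in x)
    entropy = 0.0
    for i in range(n):
        c = 2.0 if m <= i < n - m else 1.5
        spacing = y[min(i + m, n - 1)] - y[max(i - m, 0)]
        entropy += math.log(n / (c * m) * spacing) / n
    psi_sq = (sum(v * v for v in y) - n * n / sum(v ** -2 for v in y)) / (n - 1)
    return 2.0 * math.exp(entropy) / math.sqrt(psi_sq)


def critical_value(n, m, alpha, reps, rng, mu=1.0, beta=1.0):
    """Empirical lower alpha-quantile of the statistic under IG(mu, beta)."""
    stats = sorted(
        k_statistic([sample_ig(mu, beta, rng) for _ in range(n)], m)
        for _ in range(reps)
    )
    return stats[max(int(alpha * reps) - 1, 0)]


def power(draw, n, m, crit, reps, rng):
    """Proportion of simulated samples for which H0 is rejected."""
    rejections = sum(
        k_statistic([draw(rng) for _ in range(n)], m) <= crit for _ in range(reps)
    )
    return rejections / reps


rng = random.Random(2012)                     # seed chosen arbitrarily
crit = critical_value(n=10, m=3, alpha=0.05, reps=2000, rng=rng)
power_exp = power(lambda g: g.expovariate(1.0), 10, 3, crit, 2000, rng)
```

Replacing the inner list comprehension with an RSS or DRSS sampler would give the corresponding critical values and powers for those designs.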

Based on Tables 8 and 9, we conclude that the new suggested tests gain in performance under the sampling methods considered in this paper. DRSS is superior to both RSS and SRS for the same sample size, and RSS performs better than SRS in all cases considered here. The bold entries in Tables 8 and 9 are the optimal power values for each design with the same sample size, and the corresponding optimal values of the window size are 2, 3, 4, 5. For fixed n, the power decreases as m increases, while for fixed m it increases with n.

## Conclusion

In this paper, new goodness-of-fit tests for the inverse Gaussian distribution are suggested using SRS, RSS and DRSS, based on the maximum entropy characterization. The new tests are more powerful under RSS and DRSS than under SRS, and the test under DRSS is superior to the tests under RSS and SRS. We recommend using the suggested goodness-of-fit tests for the inverse Gaussian distribution. As DRSS is better than RSS, the current work can be extended to the multistage RSS design and to other probability distributions.

## References

• Alizadeh HN: A new estimator of entropy and its application in testing normality. J Stat Comput Simul 2010, 80: 1151–1162. 10.1080/00949650903005656

• Al-Omari AI: Estimation of mean based on modified robust extreme ranked set sampling. J Stat Comput Simul 2011, 81(8): 1055–1066. 10.1080/00949651003649161

• Al-Omari AI, Raqab MZ: Estimation of the population mean and median using truncation-based ranked set samples. J Stat Comput Simul 2012 (accepted). 10.1080/00949655.2012.662684

• Al-Saleh MF, Al-Kadiri MA: Double ranked set sampling. Stat Probability Lett 2000, 48(2): 205–212. 10.1016/S0167-7152(99)00206-0

• Al-Saleh MF, Al-Omari AI: Multistage ranked set sampling. J Stat Planning and Inference 2002, 102(2): 273–286. 10.1016/S0378-3758(01)00086-6

• Ebrahimi N, Pflughoeft K, Soofi E: Two measures of sample entropy. Stat Probability Lett 1994, 20: 225–234. 10.1016/0167-7152(94)90046-9

• Folks JL, Chhikara RS: The inverse Gaussian distribution and its statistical application-a review. J Royal Stat Soc B 1998, 40: 263–289.

• Haq A, Shabbir J: A family of ratio estimators for population mean in extreme ranked set sampling using two auxiliary variables. SORT 2010, 34(1): 45–64.

• Mahdizaheh M, Arghami NR: Efficiency of ranked set sampling in entropy estimation and goodness-of-fit testing for the inverse Gaussian law. J Stat Comput Simul 2010, 80(7): 761–774. 10.1080/00949650902773551

• McIntyre GA: A method for unbiased selective sampling using ranked sets. Australian J Agricultural Res 1952, 3: 385–390. 10.1071/AR9520385

• Mudholkar GS, Tian L: An entropy characterization of the inverse Gaussian distribution and related goodness-of-fit test. J Stat Planning and Inference 2002, 102: 211–221. 10.1016/S0378-3758(01)00099-4

• Park S, Park D: Correcting moments for goodness of fit tests based on two entropy estimates. J Stat Comput Simul 2003, 73(9): 685–694. 10.1080/0094965031000070367

• Seshadri V: The inverse Gaussian distribution: Statistical theory and applications. Springer, New York; 1999.

• Shannon CE: A mathematical theory of communication. Bell System Technical J 1948, 27: 379–423, 623–656.

• Takahasi K, Wakimoto K: On the unbiased estimates of the population mean based on the sample stratified by means of ordering. Annals of the Institute of Statistical Mathematics 1968, 20: 1–31. 10.1007/BF02911622

• Van Es B: Estimating functionals related to a density by a class of statistics based on spacings. Scand J Stat 1992, 19: 61–72.

• Vasicek O: A test for normality based on sample entropy. J Royal Stat Soc B 1976, 38: 54–59.

## Acknowledgment

The authors are grateful to the editors and the anonymous reviewers for their valuable comments and suggestions.

## Author information

### Corresponding author

Correspondence to Amer Ibrahim Al-Omari.

### Competing interests

Both authors declare that they have no competing interests.

### Authors' contribution

The work presented here was carried out in collaboration between the authors. AA carried out the theoretical work and the discussion of this paper. AH carried out the Monte Carlo simulations. All authors read and approved the final manuscript.

## Rights and permissions

Al-Omari, A.I., Haq, A. Goodness-of-fit testing for the inverse Gaussian distribution based on new entropy estimation using ranked set sampling and double ranked set sampling. Environ Syst Res 1, 8 (2012). https://doi.org/10.1186/2193-2697-1-8
