Goodness-of-fit testing for the inverse Gaussian distribution based on new entropy estimation using ranked set sampling and double ranked set sampling

Al-Omari, Amer Ibrahim; Haq, Abdul

doi:10.1186/2193-2697-1-8

Research
Open access
Published: 15 September 2012


Environmental Systems Research volume 1, Article number: 8 (2012)


Abstract

Background

Entropy is a measure of uncertainty and dispersion associated with a random variable. Several goodness-of-fit tests based on entropy are available in the literature, and entropy has been widely used in many applications.

Results

A goodness-of-fit test for the inverse Gaussian distribution is studied based on a new entropy estimator using simple random sampling (SRS), ranked set sampling (RSS) and double ranked set sampling (DRSS). The critical values of the new tests are obtained by Monte Carlo simulation, and their power against several alternative hypotheses is reported for SRS, RSS and DRSS. The proposed tests under RSS and DRSS are more powerful than the test under SRS, and the DRSS-based test is superior to the RSS-based test in all of the cases considered in this study.

Conclusion

Since the suggested goodness-of-fit tests for the inverse Gaussian distribution are more efficient under DRSS than under RSS, one may consider extending them to multistage RSS.

Background

Entropy is a measure of uncertainty and dispersion associated with a random variable. It is not uniquely defined; there are axiom systems that justify particular entropies. Shannon (1948) defined the entropy H(f) of a random variable X as

H(f) = -\int_{-\infty}^{\infty} f(x) \log f(x) \, dx,

(1)

where X is a continuous random variable with probability density function (pdf) f(x) and cumulative distribution function (cdf) F(x). Vasicek (1976) expressed H(f) as

H(f) = \int_{0}^{1} \log\left(\frac{d}{dp} F^{-1}(p)\right) dp.

(2)

Let $X_{1}, X_{2}, \dots, X_{n}$ be a simple random sample of size n from F(x) and let $X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(n)}$ be the order statistics of the sample. Vasicek's (1976) estimator of H(f) is

VE_{(m,n)} = \frac{1}{n} \sum_{i=1}^{n} \log\left\{\frac{n}{2m}\left(X_{(i+m)} - X_{(i-m)}\right)\right\},

(3)

where the window size m is a positive integer with m < n/2, and $X_{(i)} = X_{(1)}$ if i < 1 and $X_{(i)} = X_{(n)}$ if i > n. Note that $VE_{(m,n)} \xrightarrow{P} H(f)$ (convergence in probability) as n → ∞, m → ∞ and m/n → 0.
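A minimal NumPy sketch of Vasicek's estimator (3), assuming the boundary convention stated above (indices below 1 map to $X_{(1)}$, indices above n map to $X_{(n)}$). The function name `vasicek_entropy` is hypothetical, and the window check allows m = n/2 because the simulations described later use window sizes up to n/2.

```python
import numpy as np

def vasicek_entropy(x, m):
    """Vasicek (1976) spacing estimator VE_(m,n) of equation (3)."""
    x = np.sort(np.asarray(x, dtype=float))   # order statistics X_(1) <= ... <= X_(n)
    n = x.size
    # The text states m < n/2; m = n/2 is also allowed because later simulations use it.
    if not 1 <= m <= n // 2:
        raise ValueError("window size m must satisfy 1 <= m <= n/2")
    i = np.arange(n)                          # 0-based position of X_(i+1)
    upper = x[np.minimum(i + m, n - 1)]       # X_(i+m), replaced by X_(n) past the end
    lower = x[np.maximum(i - m, 0)]           # X_(i-m), replaced by X_(1) before the start
    return np.mean(np.log(n / (2 * m) * (upper - lower)))

# Usage: the evenly spaced sample 1, 2, ..., 10 with window m = 1
ve = vasicek_entropy(np.arange(1, 11), m=1)
```

For this sample the two boundary spacings equal 1 and the eight interior spacings equal 2, so the estimate is (2 log 5 + 8 log 10)/10.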

Van Es (1992) suggested another entropy estimator based on spacings, denoted here by $VS_{(m,n)}$ to distinguish it from Vasicek's estimator:

VS_{(m,n)} = \frac{1}{n-m} \sum_{i=1}^{n-m} \log\left(\frac{n+1}{m}\left(X_{(i+m)} - X_{(i)}\right)\right) + \sum_{k=m}^{n} \frac{1}{k} + \log\left(\frac{m}{n+1}\right).

(4)

Van Es (1992) proved the consistency and asymptotic normality of this estimator under some conditions.
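The Van Es estimator (4) can be sketched the same way. Unlike (3), it uses one-sided spacings $X_{(i+m)} - X_{(i)}$ and adds a bias-correction term; the name `van_es_entropy` is illustrative only.

```python
import numpy as np

def van_es_entropy(x, m):
    """Van Es (1992) spacing estimator of equation (4)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if not 1 <= m < n:
        raise ValueError("window size m must satisfy 1 <= m < n")
    spacings = x[m:] - x[:-m]                          # X_(i+m) - X_(i), i = 1, ..., n-m
    main = np.mean(np.log((n + 1) / m * spacings))     # average over the n-m spacings
    correction = np.sum(1.0 / np.arange(m, n + 1)) + np.log(m / (n + 1))
    return main + correction

vs = van_es_entropy(np.arange(1, 11), m=1)
```

For the sample 1, ..., 10 with m = 1 every spacing equals 1, the logarithmic terms cancel, and the estimate reduces to the harmonic sum 1 + 1/2 + ... + 1/10.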

Ebrahimi et al. (1994) modified Vasicek's (1976) entropy estimator by assigning different weights to the spacings, and proposed the estimator

EE_{(m,n)} = \frac{1}{n} \sum_{i=1}^{n} \log\left(\frac{n}{c_{i} m}\left(X_{(i+m)} - X_{(i-m)}\right)\right),

(5)

where

c_{i} = \begin{cases} 1 + \frac{i-1}{m}, & 1 \le i \le m, \\ 2, & m+1 \le i \le n-m, \\ 1 + \frac{n-i}{m}, & n-m+1 \le i \le n. \end{cases}

Their simulation study shows that this estimator has smaller bias and mean squared error than Vasicek's (1976) estimator. They also proved that $EE_{(m,n)}$ converges in probability to H(f) as n → ∞, m → ∞ and m/n → 0.

Al-Omari (2012; Modified entropy estimators using simple random sampling, ranked set sampling and double ranked set sampling, submitted) suggested a modified estimator of the entropy of an unknown continuous pdf f(x):

AE_{(m,n)} = \frac{1}{n} \sum_{i=1}^{n} \log\left(\frac{n}{c_{i} m}\left(X_{(i+m)} - X_{(i-m)}\right)\right),

(6)

where

c_{i} = \begin{cases} 1 + \frac{1}{2}, & 1 \le i \le m, \\ 2, & m+1 \le i \le n-m, \\ 1 + \frac{1}{2}, & n-m+1 \le i \le n. \end{cases}
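Estimators (5) and (6) differ from (3) only through the weights $c_i$, so a single hypothetical helper can take the weight scheme as an argument. This is a sketch under the boundary convention used for (3), not the authors' code.

```python
import numpy as np

def weighted_spacing_entropy(x, m, weights):
    """(1/n) * sum_i log(n / (c_i m) * (X_(i+m) - X_(i-m))) with c_i = weights(n, m)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(n)
    upper = x[np.minimum(i + m, n - 1)]   # X_(i+m) -> X_(n) past the end
    lower = x[np.maximum(i - m, 0)]       # X_(i-m) -> X_(1) before the start
    c = weights(n, m)
    return np.mean(np.log(n / (c * m) * (upper - lower)))

def ebrahimi_weights(n, m):
    """c_i of Ebrahimi et al. (1994), equation (5)."""
    i = np.arange(1, n + 1)               # 1-based index as in the text
    c = np.full(n, 2.0)
    left, right = i <= m, i >= n - m + 1
    c[left] = 1 + (i[left] - 1) / m
    c[right] = 1 + (n - i[right]) / m
    return c

def al_omari_weights(n, m):
    """c_i of the modified estimator AE, equation (6): 3/2 at both ends, 2 in the middle."""
    c = np.full(n, 2.0)
    c[:m] = 1.5
    c[n - m:] = 1.5
    return c

x = np.arange(1, 11)
ee = weighted_spacing_entropy(x, 1, ebrahimi_weights)
ae = weighted_spacing_entropy(x, 1, al_omari_weights)
```

Because the weights in (6) replace Vasicek's constant 2 by 3/2 only in the first and last m positions, $AE_{(m,n)} = VE_{(m,n)} + (2m/n)\log(4/3)$ for every sample. The two estimators therefore have the same variance and differ only in bias.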

Alizadeh (2010) proposed a new estimator of entropy and studied its application to testing normality. Park and Park (2003) considered correcting moments for goodness-of-fit tests based on two entropy estimates.

Inverse Gaussian distribution

A random variable X is said to have an inverse Gaussian distribution, IG(x; μ, β), if its pdf is

f(x) = \sqrt{\frac{β}{2π x^{3}}} \exp\left(-\frac{β (x - μ)^{2}}{2 μ^{2} x}\right), \quad x > 0,

(7)

where μ > 0 is the mean and β > 0 is the shape parameter. The variance of X is μ³/β, and its characteristic function is

ϕ_{X}(t) = \exp\left(\frac{β}{μ} - \sqrt{β} \sqrt{\frac{β}{μ^{2}} - 2it}\right).

The IG(x; μ, β) distribution has many applications; see, for example, Seshadri (1999) and Folks and Chhikara (1998).
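A quick numerical check of the density (7) and of the variance μ³/β. The sketch assumes NumPy's `Generator.wald(mean, scale)`, whose density has the same form as (7) with `scale` playing the role of β; the parameter values, grid and seed are arbitrary choices.

```python
import numpy as np

def ig_pdf(x, mu, beta):
    """Inverse Gaussian density (7) evaluated on an array x; returns 0 for x <= 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    pos = x > 0
    xp = x[pos]
    out[pos] = np.sqrt(beta / (2 * np.pi * xp**3)) * np.exp(-beta * (xp - mu) ** 2 / (2 * mu**2 * xp))
    return out

mu, beta = 1.0, 2.0

# The density should integrate to about 1 (midpoint rule on (0, 20); the tail beyond 20 is negligible here).
step = 1e-4
grid = (np.arange(200_000) + 0.5) * step
mass = ig_pdf(grid, mu, beta).sum() * step

# NumPy's Wald generator draws from (7) with scale = beta.
rng = np.random.default_rng(2012)
sample = rng.wald(mu, beta, size=200_000)
print(sample.mean(), "vs", mu)              # mean mu
print(sample.var(), "vs", mu**3 / beta)     # variance mu^3 / beta
```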

Method

The test procedure

Let $X_{1}, X_{2}, \dots, X_{n}$ be a random sample of size n drawn from the pdf f(x) and let $X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(n)}$ be its order statistics. We wish to test whether this sample comes from an inverse Gaussian population. Thus, the composite null hypothesis is H₀: X ~ IG(x; μ, β), with μ and β unspecified.

The following corollary is due to Mahdizadeh and Arghami (2010).

Corollary 1: Assume that the random variable X has an inverse Gaussian distribution IG(x; μ, β), and let $Y = 1/\sqrt{X}$. Then the entropy of Y is $H(f(y)) = \log(0.5\,ϕ\sqrt{2πe})$, where $ϕ^{2} = 1/β = E(Y^{2}) - 1/E(Y^{-2})$.

The following corollary is due to Mudholkar and Tian (2002).

Corollary 2: The inverse Gaussian distribution IG(x; μ, β) is characterized by the property that $1/\sqrt{X}$ attains the maximum entropy among all nonnegative, absolutely continuous random variables Y with a given value of $E(Y^{2}) - 1/E(Y^{-2})$.
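Corollary 1 can be checked by simulation. The density of Y used below, $f_Y(y) = 2\sqrt{β/(2π)}\exp\{-(β/2)(y - 1/(μy))^2\}$ for y > 0, is not given in the text; it is derived here from (7) by the change of variables $x = y^{-2}$. The parameter values and seed are arbitrary, and `Generator.wald` is assumed to sample from (7).

```python
import numpy as np

rng = np.random.default_rng(1)
mu, beta = 1.0, 2.0
x = rng.wald(mu, beta, size=200_000)   # X ~ IG(mu, beta)
y = 1.0 / np.sqrt(x)                   # Y = 1 / sqrt(X)

# phi^2 = E(Y^2) - 1/E(Y^-2) should be close to 1/beta.
phi2_hat = np.mean(y**2) - 1.0 / np.mean(y**-2.0)

# Closed-form entropy of Y from Corollary 1.
phi = np.sqrt(1.0 / beta)
h_closed = np.log(0.5 * phi * np.sqrt(2 * np.pi * np.e))

# Monte Carlo entropy -E[log f_Y(Y)] using the change-of-variables density of Y.
log_fy = np.log(2 * np.sqrt(beta / (2 * np.pi))) - 0.5 * beta * (y - 1.0 / (mu * y)) ** 2
h_mc = -np.mean(log_fy)
```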

Let $V E_{(m, n)} (f_{y})$ be the sample estimate of $V E (f_{y})$ for the distribution of $Y = 1 / \sqrt{X}$ defined as

VE_{(m,n)}(f_{y}) = \frac{1}{n} \sum_{i=1}^{n} \log\left(\frac{n}{2m}\left(y_{(i+m)} - y_{(i-m)}\right)\right),

(8)

where $y_{(i)} = x_{(n-i+1)}^{-1/2}$, i = 1, 2, ..., n.

Mahdizadeh and Arghami (2010) followed Vasicek (1976) and proposed rejecting the null hypothesis H₀: X ~ IG(x; μ, β) if

K_{(m,n)}(f_{y}) = \frac{2 \exp\left(VE_{(m,n)}(f_{y})\right)}{ψ} \le K^{*}_{(m,n,α)}(f_{y}),

(9)

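where $K^{*}_{(m,n,α)}(f_{y})$ is the critical value of the test at significance level α, and $ψ^{2}$ is the uniformly minimum variance unbiased (UMVU) estimator of $ϕ^{2}$, defined as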

\begin{aligned} ψ^{2} &= \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{1}{x_{i}} - \frac{1}{\bar{x}}\right) \\ &= \frac{1}{n-1} \left(\sum_{i=1}^{n} y_{i}^{2} - n^{2} \left(\sum_{i=1}^{n} y_{i}^{-2}\right)^{-1}\right). \end{aligned}

(10)
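Under H₀, Corollary 1 gives $2\exp(H(f_y))/ϕ = \sqrt{2πe} \approx 4.133$, and by Corollary 2 no distribution of Y with the same ϕ can exceed this value, which is why small values of K lead to rejection. The sketch below, with hypothetical function names, checks that the two forms of (10) agree and evaluates (9) on one simulated IG sample (assuming `Generator.wald` samples from (7)).

```python
import numpy as np

def psi2_from_x(x):
    """First form of (10): (1/(n-1)) * sum(1/x_i - 1/xbar)."""
    x = np.asarray(x, dtype=float)
    return np.sum(1.0 / x - 1.0 / x.mean()) / (x.size - 1)

def psi2_from_y(y):
    """Second form of (10), written in terms of y_i = x_i^(-1/2)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    return (np.sum(y**2) - n**2 / np.sum(y**-2.0)) / (n - 1)

def k_vasicek(x, m):
    """Statistic (9): 2 exp(VE_(m,n)(f_y)) / psi, with y = 1/sqrt(x)."""
    y = np.sort(1.0 / np.sqrt(np.asarray(x, dtype=float)))
    n = y.size
    i = np.arange(n)
    spacing = y[np.minimum(i + m, n - 1)] - y[np.maximum(i - m, 0)]   # boundary convention of (3)
    ve = np.mean(np.log(n / (2 * m) * spacing))
    return 2.0 * np.exp(ve) / np.sqrt(psi2_from_y(y))

rng = np.random.default_rng(7)
x = rng.wald(1.0, 1.0, size=30)            # an illustrative sample drawn under H0
same = np.isclose(psi2_from_x(x), psi2_from_y(1.0 / np.sqrt(x)))
k = k_vasicek(x, m=3)
```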

Suggested test

Let $X_{i(i)}$ denote the ith order statistic from the ith sample (i = 1, 2, ..., n). The measured RSS units are then denoted by $X_{1(1)}, X_{2(2)}, \dots, X_{n(n)}$. The cumulative distribution function of $X_{i(i)}$ is

F_{(i)}(x) = \sum_{j=i}^{n} \binom{n}{j} F^{j}(x) \left(1 - F(x)\right)^{n-j}, \quad -\infty < x < \infty,

with probability density function defined as

f_{(i)}(x) = n \binom{n-1}{i-1} F^{i-1}(x) \left(1 - F(x)\right)^{n-i} f(x), \quad -\infty < x < \infty.

The mean and variance of the ith order statistic $X_{i(i)}$ are, respectively,

μ_{(i)} = \int_{-\infty}^{\infty} x f_{(i)}(x) \, dx \quad \text{and} \quad σ_{(i)}^{2} = \int_{-\infty}^{\infty} \left(x - μ_{(i)}\right)^{2} f_{(i)}(x) \, dx.
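A numerical check of the order-statistic density and of $μ_{(i)}$ for a standard exponential parent, for which the closed form $μ_{(i)} = \sum_{j=n-i+1}^{n} 1/j$ is known. The midpoint-rule integration on (0, 40), the set size n = 5 and the function name are choices made for this sketch; the last lines also check that averaging the n densities $f_{(i)}$ recovers f, the identity attributed to Takahasi and Wakimoto (1968) in this section.

```python
import math
import numpy as np

def order_stat_pdf(x, i, n, cdf, pdf):
    """Density f_(i)(x) of the ith order statistic out of n (1-based i)."""
    F = cdf(x)
    return n * math.comb(n - 1, i - 1) * F ** (i - 1) * (1 - F) ** (n - i) * pdf(x)

# Standard exponential parent: F(x) = 1 - exp(-x), f(x) = exp(-x) for x >= 0.
cdf = lambda x: 1.0 - np.exp(-x)
pdf = lambda x: np.exp(-x)

n, step = 5, 1e-4
grid = (np.arange(400_000) + 0.5) * step      # midpoints covering (0, 40)

# mu_(i) by the midpoint rule, and the exact exponential value sum_{j=n-i+1}^{n} 1/j.
mu_num = [np.sum(grid * order_stat_pdf(grid, i, n, cdf, pdf)) * step for i in range(1, n + 1)]
mu_exact = [sum(1.0 / j for j in range(n - i + 1, n + 1)) for i in range(1, n + 1)]

# Averaging the n order-statistic densities recovers the parent density.
avg_pdf = np.mean([order_stat_pdf(grid, i, n, cdf, pdf) for i in range(1, n + 1)], axis=0)
```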

The ranked set sampling method was suggested by McIntyre (1952) for estimating the mean of pasture and forage yields. The RSS can be described as follows:

Step 1: Select n simple random samples each of size n from the target population.

Step 2: Rank the units within each sample with respect to the variable of interest by visual inspection, at negligible cost.

Step 3: From the ith sample (i = 1, 2, ..., n), select the ith smallest ranked unit for actual measurement. The cycle may be repeated h times, if needed, to obtain a sample of hn units.
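The three steps can be simulated as follows, assuming perfect ranking: Step 2 is mimicked by sorting the true values, so there are no ranking errors. `draw` is a hypothetical callable that returns i.i.d. population draws of a given shape.

```python
import numpy as np

def rss_sample(draw, n, h=1):
    """Ranked set sample of size h*n under perfect (error-free) ranking."""
    idx = np.arange(n)
    cycles = []
    for _ in range(h):
        sets = np.sort(draw((n, n)), axis=1)   # Steps 1-2: n sets of n units, ranked within each set
        cycles.append(sets[idx, idx])          # Step 3: ith smallest unit of the ith set
    return np.concatenate(cycles)

# Usage: compare the variances of the RSS and SRS means for a uniform(0, 1) population.
rng = np.random.default_rng(42)
draw = lambda shape: rng.uniform(size=shape)
n, reps = 5, 20_000
rss_means = np.array([rss_sample(draw, n).mean() for _ in range(reps)])
srs_means = rng.uniform(size=(reps, n)).mean(axis=1)
rp_hat = srs_means.var() / rss_means.var()   # estimated relative precision of RSS
```

For the uniform distribution with n = 5 the estimated relative precision should be close to (n + 1)/2 = 3.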

Al-Saleh and Al-Kadiri (2000) suggested the double ranked set sampling (DRSS) method for estimating the population mean. DRSS proceeds as follows:

Step 1: Randomly select n² samples, each of size n, from the target population.

Step 2: Apply the RSS method to the n² samples obtained in Step 1, taken in n groups of n samples. This step yields n RSS samples, each of size n.

Step 3: Apply the RSS method again to the n samples obtained in Step 2 to obtain a DRSS sample of size n. The cycle may be repeated h times, if needed, to obtain a sample of hn units.
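A sketch of the DRSS cycle, again assuming perfect ranking at both stages and using a hypothetical `draw(shape)` helper for i.i.d. population draws.

```python
import numpy as np

def drss_sample(draw, n, h=1):
    """Double ranked set sample of size h*n under perfect ranking at both stages."""
    idx = np.arange(n)
    cycles = []
    for _ in range(h):
        units = np.sort(draw((n, n, n)), axis=2)  # Step 1: n groups of n sets of n units, ranked within sets
        rss = units[:, idx, idx]                  # Step 2: one RSS sample of size n per group -> shape (n, n)
        rss = np.sort(rss, axis=1)                # Step 3: rank the units within each RSS sample ...
        cycles.append(rss[idx, idx])              # ... and keep the ith smallest of the ith RSS sample
    return np.concatenate(cycles)

rng = np.random.default_rng(0)
draw = lambda shape: rng.uniform(size=shape)
n, reps = 3, 20_000
drss_means = np.array([drss_sample(draw, n).mean() for _ in range(reps)])
srs_means = rng.uniform(size=(reps, n)).mean(axis=1)
rp_drss = srs_means.var() / drss_means.var()   # estimated relative precision of DRSS vs SRS
```

For a uniform(0, 1) population with n = 3 this estimate should come out at roughly 3, above the RSS value (n + 1)/2 = 2 for the same set size.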

The SRS estimator of the population mean is $\hat{μ}_{SRS} = \sum_{i=1}^{n} X_{i}/n$, with variance $Var(\hat{μ}_{SRS}) = σ^{2}/n$. The RSS estimator of the population mean is $\hat{μ}_{RSS} = \sum_{i=1}^{n} X_{i(i)}/n$, with variance $Var(\hat{μ}_{RSS}) = \frac{σ^{2}}{n} - \frac{1}{n^{2}} \sum_{i=1}^{n} (μ_{(i)} - μ)^{2}$. The relative precision (RP) of RSS with respect to SRS for estimating the population mean is

RP = \frac{Var(\hat{μ}_{SRS})}{Var(\hat{μ}_{RSS})} = \left(1 - \frac{1}{nσ^{2}} \sum_{i=1}^{n} \left(μ_{(i)} - μ\right)^{2}\right)^{-1}.

Takahasi and Wakimoto (1968) showed that the parent pdf f(x) and the population mean can be expressed as $f(x) = \frac{1}{n} \sum_{i=1}^{n} f_{(i)}(x)$ and $μ = \frac{1}{n} \sum_{i=1}^{n} μ_{(i)}$, respectively. They also showed that $1 \le RP \le \frac{n+1}{2}$, where the lower bound is attained if and only if the underlying distribution is degenerate, and the upper bound is attained if and only if the underlying distribution is rectangular (uniform).
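For a uniform(0, 1) parent the upper bound (n + 1)/2 can be verified exactly with rational arithmetic, using $μ_{(i)} = i/(n+1)$, μ = 1/2 and σ² = 1/12 in the variance expressions for the SRS and RSS means. The function name `rp_uniform` is illustrative.

```python
from fractions import Fraction

def rp_uniform(n):
    """Exact RP = Var(SRS mean) / Var(RSS mean) for a uniform(0, 1) parent with set size n."""
    mu, sigma2 = Fraction(1, 2), Fraction(1, 12)
    # mu_(i) = i/(n+1) is the mean of the ith uniform order statistic.
    ssq = sum((Fraction(i, n + 1) - mu) ** 2 for i in range(1, n + 1))
    var_srs = sigma2 / n
    var_rss = sigma2 / n - ssq / n**2
    return var_srs / var_rss

print([rp_uniform(n) for n in range(2, 6)])
# → [Fraction(3, 2), Fraction(2, 1), Fraction(5, 2), Fraction(3, 1)]
```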

Al-Saleh and Al-Omari (2002) extended DRSS to multistage RSS to increase the efficiency of the estimators for a fixed sample size. Al-Omari and Raqab (2012) suggested truncation-based RSS for estimating the population mean and median, Al-Omari (2011) suggested double robust extreme RSS for estimating the population mean, and Haq and Shabbir (2010) proposed a family of ratio estimators of the population mean in extreme RSS using two auxiliary variables.

We consider a goodness-of-fit test for the IG(x; μ, β) distribution under SRS, RSS and DRSS. The composite null hypothesis is H₀: X ~ IG(x; μ, β). Following Mudholkar and Tian (2002), we reject H₀ if

K_{(m,n)}(f_{y}) = \frac{2 \exp\left[AE_{(m,n)}(f_{y})\right]}{ψ} \le K^{*}_{(m,n,α)}(f_{y}),

(11)

where

AE_{(m,n)}(f_{y}) = \frac{1}{n} \sum_{i=1}^{n} \log\left(\frac{n}{c_{i} m}\left(y_{(i+m)} - y_{(i-m)}\right)\right)

and

c_{i} = \begin{cases} 1 + \frac{1}{2}, & 1 \le i \le m, \\ 2, & m+1 \le i \le n-m, \\ 1 + \frac{1}{2}, & n-m+1 \le i \le n. \end{cases}

Note that $AE_{(m,n)}(f_{y})$ is the sample estimate of the entropy of $Y = 1/\sqrt{X}$ based on (6). Since the entropy estimators are functions of order statistics, entropy estimation under RSS and DRSS requires ordering the measured units, because $X_{1(1)}, X_{2(2)}, \dots, X_{n(n)}$ need not be in increasing order.
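A sketch of statistic (11) with hypothetical function names, computing ψ from (10) and using the boundary convention of (3). For RSS or DRSS data the same function would be applied to the measured units; the sort inside `ae_entropy` supplies the ordering step just described. The simulated sample assumes `Generator.wald` samples from (7).

```python
import numpy as np

def ae_entropy(y, m):
    """AE_(m,n) of (6): spacing estimator with c_i = 3/2 in the first and last m positions, 2 elsewhere."""
    y = np.sort(np.asarray(y, dtype=float))   # ordering step; RSS/DRSS units arrive unsorted
    n = y.size
    i = np.arange(n)
    upper = y[np.minimum(i + m, n - 1)]       # y_(i+m) -> y_(n) past the end
    lower = y[np.maximum(i - m, 0)]           # y_(i-m) -> y_(1) before the start
    c = np.full(n, 2.0)
    c[:m] = 1.5
    c[n - m:] = 1.5
    return np.mean(np.log(n / (c * m) * (upper - lower)))

def psi_hat(x):
    """Square root of the UMVU estimate (10): sum(1/x_i - 1/xbar) / (n - 1)."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.sum(1.0 / x - 1.0 / x.mean()) / (x.size - 1))

def k_statistic(x, m):
    """Statistic (11) with y = 1/sqrt(x); H0 is rejected when it is at or below the critical value."""
    y = 1.0 / np.sqrt(np.asarray(x, dtype=float))
    return 2.0 * np.exp(ae_entropy(y, m)) / psi_hat(x)

rng = np.random.default_rng(11)
x = rng.wald(1.0, 1.0, size=20)   # illustrative sample generated under H0
k = k_statistic(x, m=3)
```

Rescaling X by a constant c leaves K unchanged: the estimate $AE_{(m,n)}(f_y)$ shifts by −(1/2) log c while ψ is multiplied by $c^{-1/2}$.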

Results and discussion

In this section, a Monte Carlo experiment is used to compare the entropy estimators $AE_{(m,n)}$ and $VE_{(m,n)}$ and to study the power of the suggested tests under different alternative hypotheses. The root mean square errors (RMSEs) and biases of the estimators are computed from 10,000 samples of sizes n = 10, 20 and 30 with window sizes 1 ≤ m ≤ 5, 1 ≤ m ≤ 10 and 1 ≤ m ≤ 15, respectively.
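A minimal sketch of the RMSE and bias computation for one configuration (standard normal parent, n = 10, m = 2, 10,000 replications). It draws SRS samples only; an RSS or DRSS version would replace `draw` with a ranked-set sampler. Function names and the seed are hypothetical, and no values from Tables 1–5 are reproduced here.

```python
import numpy as np

def spacing_entropy(x, m, boundary_c):
    """Spacing estimator with c_i = boundary_c in the first/last m positions and 2 elsewhere.

    boundary_c = 2.0 gives Vasicek's VE_(m,n); boundary_c = 1.5 gives AE_(m,n).
    """
    x = np.sort(x)
    n = x.size
    i = np.arange(n)
    spacing = x[np.minimum(i + m, n - 1)] - x[np.maximum(i - m, 0)]
    c = np.full(n, 2.0)
    c[:m] = boundary_c
    c[n - m:] = boundary_c
    return np.mean(np.log(n / (c * m) * spacing))

def rmse_and_bias(draw, true_h, n, m, boundary_c, reps=10_000):
    """Monte Carlo RMSE and bias from `reps` simple random samples of size n."""
    est = np.array([spacing_entropy(draw(n), m, boundary_c) for _ in range(reps)])
    bias = est.mean() - true_h
    rmse = np.sqrt(np.mean((est - true_h) ** 2))
    return rmse, bias

rng = np.random.default_rng(3)
draw = lambda size: rng.standard_normal(size)
h_normal = 0.5 * np.log(2 * np.pi * np.e)    # 1.419 for the standard normal
ve_rmse, ve_bias = rmse_and_bias(draw, h_normal, n=10, m=2, boundary_c=2.0)
ae_rmse, ae_bias = rmse_and_bias(draw, h_normal, n=10, m=2, boundary_c=1.5)
```

Because $AE_{(m,n)}$ exceeds $VE_{(m,n)}$ by the constant (2m/n) log(4/3) for every sample, the AE bias here should be about 0.115 larger than the VE bias, up to Monte Carlo noise (the two calls use different samples).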

Comparison between VE_(m,n) and AE_(m,n)

The samples are drawn from the uniform, exponential and standard normal distributions using SRS, RSS and DRSS. Tables 1–5 show that $AE_{(m,n)}$ is more efficient than $VE_{(m,n)}$ in all cases considered in this study, and that DRSS is superior to both SRS and RSS. The critical values of the test statistics at α = 0.05 and the optimal window sizes are given in Tables 6 and 7, respectively. For more details of this comparison, see Al-Omari (2012; Modified entropy estimators using simple random sampling, ranked set sampling and double ranked set sampling, submitted).

Table 1 Monte Carlo RMSEs and bias values of the entropy estimators $VE_{(m,n)}$ and $AE_{(m,n)}$ for the uniform distribution, H(f) = 0


Table 2 Monte Carlo RMSEs and bias values of the entropy estimators $VE_{(m,n)}$ and $AE_{(m,n)}$ for the exponential distribution, H(f) = 1


Table 3 Monte Carlo RMSEs and bias values of the entropy estimators $VE_{(m,n)}$ and $AE_{(m,n)}$ for the standard normal distribution, H(f) = 1.419


Table 4 Monte Carlo RMSEs and bias values of the entropy estimators $VE_{(m,n)}$ and $AE_{(m,n)}$ for the uniform (H(f) = 0) and exponential (H(f) = 1) distributions using DRSS


Table 5 Monte Carlo RMSEs and bias values of the entropy estimators $VE_{(m,n)}$ and $AE_{(m,n)}$ for the standard normal distribution, H(f) = 1.419, using DRSS


Table 6 Critical values of the test statistics at significance level α = 0.05 using SRS, RSS and DRSS


Table 7 Optimal window sizes


The optimal window sizes in Table 7 differ from those reported by Mahdizadeh and Arghami (2010), whose test is based on Vasicek's (1976) entropy estimator. Hence the optimal window size depends on the entropy estimator used in the goodness-of-fit test.

Power of the tests

The power of the suggested goodness-of-fit tests under SRS, RSS and DRSS is evaluated against the same alternatives used by Mahdizadeh and Arghami (2010): exponential(1), uniform(0,1), Weibull(2,1), lognormal(0,2), beta(2,2) and beta(5,2). For each sampling method, 10,000 samples of sizes n = 10, 20 and 30 are generated, and the tests are carried out at the 0.05 significance level.
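The critical-value and power computations can be sketched as follows for SRS. Two assumptions are made loudly here: the null samples are drawn from IG(1, 1), because the parameter values behind Table 6 are not stated in this section (K is unchanged when X is rescaled, so only the ratio β/μ could matter), and `Generator.wald` is assumed to sample from (7). Function names, n = 20, m = 3, the replication count and the seed are illustrative, and the output is not meant to reproduce Tables 6, 8 or 9.

```python
import numpy as np

def ae_entropy(y, m):
    """AE_(m,n) of (6) with c_i = 3/2 in the first and last m positions and 2 elsewhere."""
    y = np.sort(y)
    n = y.size
    i = np.arange(n)
    spacing = y[np.minimum(i + m, n - 1)] - y[np.maximum(i - m, 0)]
    c = np.full(n, 2.0)
    c[:m] = 1.5
    c[n - m:] = 1.5
    return np.mean(np.log(n / (c * m) * spacing))

def k_stat(x, m):
    """Statistic (11): 2 exp(AE_(m,n)(f_y)) / psi, with y = 1/sqrt(x) and psi^2 from (10)."""
    psi = np.sqrt(np.sum(1.0 / x - 1.0 / x.mean()) / (x.size - 1))
    return 2.0 * np.exp(ae_entropy(1.0 / np.sqrt(x), m)) / psi

def critical_value(n, m, alpha, reps, rng, mu=1.0, beta=1.0):
    """Lower alpha-quantile of K under H0, simulated from IG(mu, beta); mu = beta = 1 is an assumption."""
    k0 = [k_stat(rng.wald(mu, beta, size=n), m) for _ in range(reps)]
    return np.quantile(k0, alpha)

def rejection_rate(draw, n, m, crit, reps):
    """Fraction of samples from `draw` with K <= crit: estimated power (or size under H0)."""
    return np.mean([k_stat(draw(n), m) <= crit for _ in range(reps)])

rng = np.random.default_rng(2012)
n, m, alpha, reps = 20, 3, 0.05, 5_000
crit = critical_value(n, m, alpha, reps, rng)
size_hat = rejection_rate(lambda size: rng.wald(1.0, 1.0, size=size), n, m, crit, reps)       # near alpha
power_exp = rejection_rate(lambda size: rng.exponential(1.0, size=size), n, m, crit, reps)    # exponential(1)
```

For RSS or DRSS versions, both the null samples behind `crit` and the alternative samples would be drawn with the corresponding ranked-set scheme; which variable is used for ranking is an assumption the sketch leaves open.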

Tables 8 and 9 show that the suggested tests gain power under the ranked set sampling designs considered in this paper. For a given sample size, the DRSS-based test is more powerful than both the RSS- and SRS-based tests, and the RSS-based test is more powerful than the SRS-based test in all cases considered. Bold entries in Tables 8 and 9 mark the highest power for each design and sample size. The window sizes at which these highest powers are attained are all less than n/2, namely 2, 3, 4 or 5. For fixed n, the power decreases as m increases, and it increases with n.

Table 8 Power comparison for the entropy tests at the significance level α = 0.05


Table 9 Power comparison for the entropy tests at the significance level α = 0.05


Conclusion

In this paper, new goodness-of-fit tests for the inverse Gaussian distribution are suggested under SRS, RSS and DRSS based on the maximum entropy characterization. The new tests are more powerful under RSS and DRSS than under SRS, and the DRSS-based test is superior to the tests under RSS and SRS. We recommend the suggested goodness-of-fit tests for the inverse Gaussian distribution. Since DRSS outperforms RSS, this work can be extended to multistage RSS and to other probability distributions.

References

Alizadeh HN: A new estimator of entropy and its application in testing normality. J Stat Comput Simul 2010, 80: 1151–1162. 10.1080/00949650903005656
Al-Omari AI: Estimation of mean based on modified robust extreme ranked set sampling. J Stat Comput Simul 2011, 81(8): 1055–1066. 10.1080/00949651003649161
Al-Omari AI, Raqab MZ: Estimation of the population mean and median using truncation-based ranked set samples. J Stat Comput Simul 2012 (accepted). 10.1080/00949655.2012.662684
Al-Saleh MF, Al-Kadiri MA: Double ranked set sampling. Stat Probab Lett 2000, 48(2): 205–212. 10.1016/S0167-7152(99)00206-0
Al-Saleh MF, Al-Omari AI: Multistage ranked set sampling. J Stat Plan Inference 2002, 102(2): 273–286. 10.1016/S0378-3758(01)00086-6
Ebrahimi N, Pflughoeft K, Soofi E: Two measures of sample entropy. Stat Probab Lett 1994, 20: 225–234. 10.1016/0167-7152(94)90046-9
Folks JL, Chhikara RS: The inverse Gaussian distribution and its statistical application: a review. J R Stat Soc Series B 1998, 40: 263–289.
Haq A, Shabbir J: A family of ratio estimators for population mean in extreme ranked set sampling using two auxiliary variables. SORT 2010, 34(1): 45–64.
Mahdizadeh M, Arghami NR: Efficiency of ranked set sampling in entropy estimation and goodness-of-fit testing for the inverse Gaussian law. J Stat Comput Simul 2010, 80(7): 761–774. 10.1080/00949650902773551
McIntyre GA: A method for unbiased selective sampling using ranked sets. Aust J Agric Res 1952, 3: 385–390. 10.1071/AR9520385
Mudholkar GS, Tian L: An entropy characterization of the inverse Gaussian distribution and related goodness-of-fit test. J Stat Plan Inference 2002, 102: 211–221. 10.1016/S0378-3758(01)00099-4
Park S, Park D: Correcting moments for goodness of fit tests based on two entropy estimates. J Stat Comput Simul 2003, 73(9): 685–694. 10.1080/0094965031000070367
Seshadri V: The inverse Gaussian distribution: Statistical theory and applications. Springer, New York; 1999.
Shannon CE: A mathematical theory of communication. Bell Syst Tech J 1948, 27: 379–423, 623–656.
Takahasi K, Wakimoto K: On the unbiased estimates of the population mean based on the sample stratified by means of ordering. Ann Inst Stat Math 1968, 20: 1–31. 10.1007/BF02911622
Van Es B: Estimating functionals related to a density by a class of statistics based on spacings. Scand J Stat 1992, 19: 61–72.
Vasicek O: A test for normality based on sample entropy. J R Stat Soc Series B 1976, 38: 54–59.


Acknowledgment

The authors are grateful to the editors and the anonymous reviewers for their valuable comments and suggestions.

Author information

Authors and Affiliations

Department of Mathematics, Faculty of Science, Al al-Bayt University, Mafraq, 25113, Jordan
Amer Ibrahim Al-Omari
Department of Statistics, Quaid-i-Azam University, Islamabad, 45320, Pakistan
Abdul Haq


Corresponding author

Correspondence to Amer Ibrahim Al-Omari.

Additional information

Competing interests

Both authors declare that they have no competing interests.

Authors’ contribution

The work presented here was carried out in collaboration between the authors. AA carried out the theoretical work and the discussion. AH carried out the Monte Carlo simulations. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Al-Omari, A.I., Haq, A. Goodness-of-fit testing for the inverse Gaussian distribution based on new entropy estimation using ranked set sampling and double ranked set sampling. Environ Syst Res 1, 8 (2012). https://doi.org/10.1186/2193-2697-1-8


Received: 26 July 2012
Accepted: 29 August 2012
Published: 15 September 2012
DOI: https://doi.org/10.1186/2193-2697-1-8
