Last year's top ten most cited papers
Top ten most cited papers in 2025 according to Web of Science (WoS)
-
Twenty years of P-splines
Abstract: P-splines first appeared in the limelight twenty years ago. Since then, they have become popular in applications and in theoretical work. The combination of a rich B-spline basis and a simple difference penalty lends itself well to a variety of generalizations, because it is based on regression. In effect, P-splines allow the building of a “backbone” for the “mixing and matching” of a variety of additive smooth structure components, while inviting all sorts of extensions: varying-coefficient effects, signal (functional) regressors, two-dimensional surfaces, non-normal responses, quantile (expectile) modelling, among others. Strong connections with mixed models and Bayesian analysis have been established. We give an overview of many of the central developments during the first two decades of P-splines.
Keywords: B-splines, penalty, additive model, mixed model, multidimensional smoothing
Pages: 149–186
DOI: 10.2436/20.8080.02.25
Volume 39 (2) 2015
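As a companion to the abstract above, here is a minimal sketch of a P-spline smoother in Python, assuming cubic B-splines on equally spaced knots, a second-order difference penalty and a fixed smoothing parameter lam; the function name and toy data are illustrative only, not code from the paper.
```python
# Minimal P-spline smoother: B-spline basis + difference penalty (illustrative sketch).
import numpy as np
from scipy.interpolate import BSpline

def pspline_fit(x, y, n_segments=20, degree=3, lam=1.0, pen_order=2):
    """Penalized B-spline (P-spline) smoother of y on x."""
    xl, xr = x.min(), x.max()
    dx = (xr - xl) / n_segments
    # equally spaced knots, extended beyond the data range so the basis covers [xl, xr]
    knots = xl + dx * np.arange(-degree, n_segments + degree + 1)
    n_bases = len(knots) - degree - 1                  # = n_segments + degree
    B = BSpline(knots, np.eye(n_bases), degree)(x)     # n x n_bases basis matrix
    D = np.diff(np.eye(n_bases), n=pen_order, axis=0)  # difference-penalty matrix
    # penalized least squares: (B'B + lam * D'D) a = B'y
    a = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ a

# Toy usage
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
y_smooth = pspline_fit(x, y, lam=5.0)
```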
-
Thirty years of progeny from Chao's inequality: Estimating and comparing richness with incidence data and incomplete sampling
Abstract: In the context of capture-recapture studies, Chao (1987) derived an inequality among capture frequency counts to obtain a lower bound for the size of a population based on individuals' capture/non-capture records for multiple capture occasions. The inequality has been applied to obtain a non-parametric lower bound of species richness of an assemblage based on species incidence (detection/non-detection) data in multiple sampling units. The inequality implies that the number of undetected species can be inferred from the species incidence frequency counts of the uniques (species detected in only one sampling unit) and duplicates (species detected in exactly two sampling units). In their pioneering paper, Colwell and Coddington (1994) gave the name “Chao2” to the estimator for the resulting species richness. (The “Chao1” estimator refers to a similar type of estimator based on species abundance data.) Since then, the Chao2 estimator has been applied to many research fields and led to fruitful generalizations. Here, we first review Chao's inequality under various models and discuss some related statistical inference questions: (1) Under what conditions is the Chao2 estimator an unbiased point estimator? (2) How many additional sampling units are needed to detect any arbitrary proportion (including 100%) of the Chao2 estimate of asymptotic species richness? (3) Can other incidence frequency counts be used to obtain similar lower bounds? We then show how the Chao2 estimator can also be used to guide a non-asymptotic analysis in which species richness estimators can be compared for equally large or equally complete samples via sample-size-based and coverage-based rarefaction and extrapolation. We also review the generalization of Chao's inequality to estimate species richness under other sampling-without-replacement schemes (e.g. a set of quadrats, each surveyed only once), to obtain a lower bound of undetected species shared between two or multiple assemblages, and to allow inferences about undetected phylogenetic richness (the total length of undetected branches of a phylogenetic tree connecting all species), with associated rarefaction and extrapolation. A small empirical dataset for Australian birds is used for illustration with the online software SpadeR, iNEXT, and PhD.
Keywords: Cauchy-Schwarz inequality, Chao2 estimator, extrapolation, Good-Turing frequency formula, incidence data, phylogenetic diversity, rarefaction, sampling effort, shared species richness, species richness
Pages: 3–54
DOI: 10.2436/20.8080.02.49
Volume 41 (1) 2017
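To make the incidence-based lower bound in the abstract concrete, here is a small Python sketch of the Chao2 estimator, assuming a species-by-sampling-unit 0/1 incidence matrix; the bias-corrected form is used when no duplicates are observed. This is an illustration only; for real analyses the paper points to SpadeR and iNEXT.
```python
# Chao2 lower bound for species richness from incidence (detection/non-detection) data.
import numpy as np

def chao2(incidence):
    """Chao2 lower bound from a 0/1 incidence matrix (rows = species, columns = units)."""
    incidence = np.asarray(incidence, dtype=bool)
    T = incidence.shape[1]                  # number of sampling units
    counts = incidence.sum(axis=1)          # incidence frequency per species
    S_obs = int((counts > 0).sum())
    Q1 = int((counts == 1).sum())           # uniques
    Q2 = int((counts == 2).sum())           # duplicates
    if Q2 > 0:
        return S_obs + (T - 1) / T * Q1 ** 2 / (2 * Q2)
    # bias-corrected form when there are no duplicates
    return S_obs + (T - 1) / T * Q1 * (Q1 - 1) / 2

# Toy example: 6 species observed in 4 sampling units (Q1 = 4, Q2 = 2)
X = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [1, 1, 0, 0],
     [0, 0, 1, 1]]
print(chao2(X))   # 6 + (3/4) * 16 / 4 = 9.0
```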
-
The exponentiated discrete Weibull distribution
Vahid Nekoukhou and Hamid Bidram
Abstract: In this paper, the exponentiated discrete Weibull distribution is introduced. This new generalization of the discrete Weibull distribution can also be considered as a discrete analogue of the exponentiated Weibull distribution. A special case of this exponentiated discrete Weibull distribution defines a new generalization of the discrete Rayleigh distribution for the first time in the literature. In addition, discrete generalized exponential and geometric distributions are some special sub-models of the new distribution. Here, some basic distributional properties, moments, and order statistics of this new discrete distribution are studied. We will see that the hazard rate function can be increasing, decreasing, bathtub, and upside-down bathtub shaped. Estimation of the parameters is illustrated using the maximum likelihood method. The model is also examined using a real data set.
Keywords: Discrete generalized exponential distribution, exponentiated discrete Weibull distribution, exponentiated Weibull distribution, geometric distribution, infinite divisibility, order statistics, resilience parameter family, stress-strength parameter
Pages: 127–146
DOI: 10.2436/20.8080.02.24
Volume 39 (1) 2015
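A brief numerical sketch of the distribution discussed above, assuming the cdf F(x) = (1 - q^((x+1)^beta))^gamma for x = 0, 1, 2, ... (a type-I discrete Weibull cdf raised to a resilience parameter gamma); the paper's notation may differ, and the functions below are illustrative rather than a reimplementation of its results.
```python
# pmf and discrete hazard of an exponentiated discrete Weibull (assumed parametrization).
import numpy as np

def edw_cdf(x, q, beta, gamma):
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, 0.0, (1.0 - q ** ((np.floor(x) + 1) ** beta)) ** gamma)

def edw_pmf(x, q, beta, gamma):
    return edw_cdf(x, q, beta, gamma) - edw_cdf(np.asarray(x) - 1, q, beta, gamma)

def edw_hazard(x, q, beta, gamma):
    # discrete hazard h(x) = P(X = x) / P(X >= x)
    return edw_pmf(x, q, beta, gamma) / (1.0 - edw_cdf(np.asarray(x) - 1, q, beta, gamma))

xs = np.arange(0, 15)
print(edw_pmf(xs, q=0.8, beta=1.5, gamma=2.0).sum())   # close to 1 over a long enough grid
print(edw_hazard(xs, q=0.8, beta=0.7, gamma=0.5))      # shape changes with (beta, gamma)
```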
-
On the interpretation of differences between groups for compositional data
Josep-Antoni Martín-Fernández, Josep Daunis-i-Estadella, and Glòria Mateu-Figueras
Abstract: Social policies are designed using information collected in surveys, such as the Catalan Time Use Survey. Time use data are commonly compared among population groups by means of statistical methods. The total daily time expended on different activities by a single person is equal to 24 hours. Because this type of data is compositional, its sample space has particular properties that statistical methods should respect. The key points required to interpret differences between groups are identified and described in terms of log-ratio methods. These techniques facilitate the interpretation of the relative differences detected in multivariate and univariate analyses.
Keywords: Log-ratio transformations, MANOVA, perturbation, simplex, subcomposition
Pages: 231–252
DOI: 10.2436/20.8080.02.28
Volume 39 (2) 2015
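The log-ratio comparison of groups described above can be sketched as follows, using the centred log-ratio (clr) transform and the perturbation difference between group centres; the four-part composition and all names are simulated and illustrative, not the Catalan survey data.
```python
# Comparing two groups of compositions via log-ratios (illustrative sketch).
import numpy as np

def closure(x):
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)

def geometric_centre(x):
    # compositional centre: closed vector of part-wise geometric means
    return closure(np.exp(np.log(closure(x)).mean(axis=0)))

rng = np.random.default_rng(0)
group_a = rng.dirichlet([6, 3, 2, 1], size=50)   # e.g. sleep, work, leisure, other
group_b = rng.dirichlet([5, 4, 2, 1], size=50)

centre_a, centre_b = geometric_centre(group_a), geometric_centre(group_b)
diff = closure(centre_a / centre_b)              # perturbation difference (a composition itself)
print(np.round(clr(diff), 3))                    # which parts are relatively larger in group A
```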
-
Stress-strength reliability of Weibull distribution based on progressively censored samples
Akbar Asgharzadeh, Reza Valiollahi, and Mohammad Z. Raqab
Abstract: Based on progressively Type-II censored samples, this paper deals with inference for the stress-strength reliability R = P(Y < X) when X and Y are two independent Weibull random variables with different scale parameters but a common shape parameter. The maximum likelihood estimator and the approximate maximum likelihood estimator of R are obtained, and different confidence intervals are presented. The Bayes estimator of R and the corresponding credible interval using the Gibbs sampling technique are also proposed. Further, we consider the estimation of R when the common shape parameter is known. The results for exponential and Rayleigh distributions with different scale parameters follow as special cases. Analysis of a real data set and a Monte Carlo simulation study are presented for illustrative purposes.
Keywords: Maximum likelihood estimator, approximate maximum likelihood estimator, bootstrap confidence interval, Bayesian estimation, Metropolis-Hastings method, progressive Type-II censoring
Pages: 103–124
Volume 35 (2) 2011
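For the complete-sample case with a common shape parameter, the stress-strength reliability has a simple closed form, sketched below together with a Monte Carlo check; this is not the paper's progressively censored inference, only a plug-in illustration with made-up parameter values.
```python
# R = P(Y < X) for two Weibull variables with a common shape parameter.
import numpy as np

def weibull_stress_strength(shape, scale_x, scale_y):
    """Closed form of R = P(Y < X) when X, Y are Weibull with common shape."""
    return scale_x ** shape / (scale_x ** shape + scale_y ** shape)

rng = np.random.default_rng(42)
shape, scale_x, scale_y = 2.0, 1.5, 1.0
x = scale_x * rng.weibull(shape, size=200_000)
y = scale_y * rng.weibull(shape, size=200_000)
print(weibull_stress_strength(shape, scale_x, scale_y))   # exact: 2.25 / 3.25 = 0.6923...
print((y < x).mean())                                     # Monte Carlo estimate, close to exact
```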
-
A simheuristic for routing electric vehicles with limited driving ranges and stochastic travel times
Lorena Reyes-Rubiano, Daniele Ferone, Angel A. Juan and Javier Faulin
Abstract: Green transportation is becoming relevant in the context of smart cities, where the use of electric vehicles represents a promising strategy to support sustainability policies. However, the use of electric vehicles also has some drawbacks, such as their limited driving-range capacity. This paper analyses a realistic vehicle routing problem in which both driving-range constraints and stochastic travel times are considered. Thus, the main goal is to minimize the expected time-based cost required to complete the freight distribution plan. To design reliable routing plans, a simheuristic algorithm is proposed. It combines Monte Carlo simulation with a multi-start metaheuristic, which also employs biased-randomization techniques. By including simulation, simheuristics extend the capabilities of metaheuristics to deal with stochastic problems. A series of computational experiments is performed to test our solving approach as well as to analyse the effect of uncertainty on the routing plans.
Keywords: Vehicle routing problem, electric vehicles, green transport and logistics, smart cities, simheuristics, biased-randomized heuristics
Pages: 3–24
DOI: 10.2436/20.8080.02.77
Volume 43 (1) 2019
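The two ingredients named in the abstract, biased-randomized selection and Monte Carlo evaluation under stochastic travel times, can be sketched schematically as follows; this is not the authors' algorithm, and all distributions, parameters and names are illustrative assumptions.
```python
# Schematic fragments of a simheuristic: geometric-biased selection + Monte Carlo route evaluation.
import numpy as np

rng = np.random.default_rng(7)

def biased_pick(sorted_candidates, beta=0.3):
    """Pick an element skewed towards the top of a ranked list (geometric bias)."""
    idx = int(rng.geometric(beta)) - 1
    return sorted_candidates[min(idx, len(sorted_candidates) - 1)]

def expected_route_time(leg_times, max_range, n_sims=2000, penalty=100.0):
    """Mean simulated route duration; routes exceeding the driving range pay a penalty."""
    legs = np.asarray(leg_times, dtype=float)
    # lognormal noise around the deterministic leg times (an illustrative choice)
    sims = legs * rng.lognormal(mean=0.0, sigma=0.25, size=(n_sims, legs.size))
    totals = sims.sum(axis=1)
    return float(np.mean(np.where(totals > max_range, totals + penalty, totals)))

ranked_edges = ["e1", "e2", "e3", "e4", "e5"]        # e.g. edges sorted by savings
print(biased_pick(ranked_edges))                     # usually "e1" or "e2"
print(expected_route_time([1.0, 2.5, 1.8], max_range=6.0))
```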
-
On developing ridge regression parameters: a graphical investigation
Gisela Muniz, B. M. Golam Kibria, Kristofer Mansson and Ghazi Shukur
Abstract: In this paper we review some existing estimators of the ridge parameter and propose some new ones. In all, 19 different estimators have been studied. The investigation has been carried out using Monte Carlo simulations. A large number of different models has been investigated, in which the variance of the random error, the number of variables included in the model, the correlations among the explanatory variables, the sample size and the unknown coefficient vector were varied. For each model we performed 2000 replications and present the results in terms of both figures and tables. Based on the simulation study, we found that increasing the number of correlated variables, the variance of the random error and the correlation between the independent variables has a negative effect on the mean squared error. When the sample size increases, the mean squared error decreases even when the correlation between the independent variables and the variance of the random error are large. In all situations, the proposed estimators have a smaller mean squared error than ordinary least squares and the other existing estimators.
Keywords: Linear model, LSE, MSE, Monte Carlo simulations, multicollinearity, ridge regression
Pages: 115–138
Volume 36 (2) 2012
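To fix ideas, the sketch below computes the ridge estimator for a data-driven ridge parameter using the classical Hoerl-Kennard choice k = sigma2_hat / max(alpha_hat^2); the 19 estimators compared in the paper, including the new proposals, are not reproduced here, and the toy data are illustrative.
```python
# Ridge regression with one classical data-driven ridge parameter (Hoerl-Kennard).
import numpy as np

def ridge_fit(X, y, k):
    """Ridge estimator (X'X + kI)^{-1} X'y for standardized X."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def hoerl_kennard_k(X, y):
    n, p = X.shape
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma2 = np.sum((y - X @ beta_ols) ** 2) / (n - p)
    # canonical form: alpha = Q' beta, with X'X = Q L Q'
    _, Q = np.linalg.eigh(X.T @ X)
    alpha = Q.T @ beta_ols
    return sigma2 / np.max(alpha ** 2)

# Toy data with correlated regressors
rng = np.random.default_rng(3)
n, p = 100, 4
Z = rng.normal(size=(n, p))
X = Z + 0.9 * Z[:, [0]]              # induce multicollinearity
X = (X - X.mean(0)) / X.std(0)       # standardize
y = X @ np.array([1.0, 0.5, 0.0, -0.5]) + rng.normal(scale=2.0, size=n)
k = hoerl_kennard_k(X, y)
print(k, ridge_fit(X, y, k))
```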
-
Abstract: In order to apply group sequential methods for interim analysis and early stopping in clinical trials, the joint distribution of test statistics over time has to be known. Often the distribution is multivariate normal or asymptotically so, and an application of group sequential methods requires multivariate integration to determine the group sequential boundaries. However, if the increments between successive test statistics are independent, the multivariate integration reduces to a univariate integration involving simple recursion based on convolution. This allows application of standard group sequential methods. In this paper we review group sequential methods and the development that established independent increments in test statistics for the primary outcomes of longitudinal or failure time data.
Keywords: Failure time data, interim analysis, longitudinal data, clinical trials, repeated significance tests, sequential methods
Pages: 223–264
DOI: 10.2436/20.8080.02.101
Volume 44 (2) 2020
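The "univariate integration involving simple recursion based on convolution" mentioned in the abstract can be illustrated numerically: under independent increments, the probability of ever crossing a constant two-sided boundary is obtained by repeatedly convolving the score-process density and discarding stopped paths. The grid, boundary and information times below are illustrative choices, not the paper's examples.
```python
# Crossing probability of a constant two-sided boundary under independent increments (null case).
import numpy as np
from scipy.stats import norm

def crossing_probability(c, info_times):
    """P(|Z_k| >= c for some k) under H0, via grid convolution of the score process."""
    grid = np.linspace(-10 * np.sqrt(info_times[-1]), 10 * np.sqrt(info_times[-1]), 2001)
    dx = grid[1] - grid[0]
    dens = norm.pdf(grid, scale=np.sqrt(info_times[0]))   # density of S_1 ~ N(0, I_1)
    crossed = 0.0
    for k, t in enumerate(info_times):
        if k > 0:
            inc_sd = np.sqrt(t - info_times[k - 1])
            # convolve with the independent increment S_k - S_{k-1} ~ N(0, t - t_{k-1})
            dens = norm.pdf(grid[:, None] - grid[None, :], scale=inc_sd) @ dens * dx
        stop = np.abs(grid) >= c * np.sqrt(t)              # |Z_k| >= c  <=>  |S_k| >= c*sqrt(I_k)
        crossed += dens[stop].sum() * dx
        dens = np.where(stop, 0.0, dens)                   # only non-stopped paths continue
    return crossed

# With a single look, the crossing probability of |Z| >= 1.96 is about 0.05;
# with five equally spaced looks at the same boundary it inflates to roughly 0.14.
print(round(crossing_probability(1.96, [1.0]), 4))
print(round(crossing_probability(1.96, [0.2, 0.4, 0.6, 0.8, 1.0]), 4))
```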
-
On interpretations of tests and effect sizes in regression models with a compositional predictor
Abstract: Compositional data analysis is concerned with the relative importance of positive variables, expressed through their log-ratios. The literature has proposed a range of ways to compute log-ratios, some of whose interrelationships have never been reported when used as explanatory variables in regression models. This article examines their similarities and differences in interpretation, based on the notion that one log-ratio has to be interpreted keeping all others constant. It shows that centred, additive, pivot, balance and pairwise log-ratios lead to simple reparametrizations of the same model, which can be combined to provide useful tests and comparable effect size estimates.
Keywords: Compositional regression models, CoDa, composition as explanatory, centred log-ratios, pivot coordinates, pairwise log-ratios, additive log-ratios, effect size
Pages: 201–220
DOI: 10.2436/20.8080.02.100
Volume 44 (1) 2020
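A quick check of the reparametrization point made in the abstract: with an intercept, regressing on additive log-ratio (alr) coordinates or on pivot coordinates of the same compositional predictor yields identical fitted values; only the coefficients change interpretation. The simulated data and function names below are illustrative.
```python
# alr and pivot coordinates of a compositional predictor give the same fitted regression.
import numpy as np

def alr(x):
    x = np.asarray(x, dtype=float)
    return np.log(x[:, :-1] / x[:, [-1]])

def pivot_coords(x):
    x = np.asarray(x, dtype=float)
    logx, D = np.log(x), x.shape[1]
    cols = []
    for j in range(D - 1):
        scale = np.sqrt((D - j - 1) / (D - j))
        cols.append(scale * (logx[:, j] - logx[:, j + 1:].mean(axis=1)))
    return np.column_stack(cols)

rng = np.random.default_rng(11)
comp = rng.dirichlet([4, 3, 2, 1], size=120)                  # compositional predictor
y = 1.0 + alr(comp) @ np.array([0.8, -0.3, 0.5]) + rng.normal(scale=0.4, size=120)

def fitted(design, y):
    X = np.column_stack([np.ones(len(y)), design])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

print(np.allclose(fitted(alr(comp), y), fitted(pivot_coords(comp), y)))   # True
```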
-
A goodness-of-fit test for the multivariate Poisson distribution
Abstract: Bivariate count data arise in several different disciplines and the bivariate Poisson distribution is commonly used to model them. This paper proposes and studies a computationally convenient goodness-of-fit test for this distribution, which is based on an empirical counterpart of a system of equations. The test is consistent against fixed alternatives. The null distribution of the test can be consistently approximated by a parametric bootstrap and by a weighted bootstrap. The accuracy of these bootstrap approximations and the power of the test for finite sample sizes are studied numerically. It is shown that the proposed test can be naturally extended to the multivariate Poisson distribution.
Keywords: Bivariate Poisson distribution, goodness-of-fit, empirical probability generating function, parametric bootstrap, weighted bootstrap, multivariate Poisson distribution
Pages: 113–138
DOI: 10.2436/20.8080.02.37
Volume 40 (1) 2016
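The parametric-bootstrap idea can be sketched as follows for the bivariate case, using a simple distance between the empirical and fitted probability generating functions and moment estimators of the parameters; this is not the paper's test statistic or estimation method, only an illustration of the bootstrap logic.
```python
# Parametric bootstrap for a pgf-distance statistic under the bivariate Poisson model (sketch).
import numpy as np

rng = np.random.default_rng(5)
T1, T2 = np.meshgrid(np.linspace(0.2, 0.9, 8), np.linspace(0.2, 0.9, 8))  # evaluation grid

def rbivpois(n, lam1, lam2, lam0):
    # X1 = N1 + N0, X2 = N2 + N0 with independent Poisson components
    n0 = rng.poisson(lam0, n)
    return rng.poisson(lam1, n) + n0, rng.poisson(lam2, n) + n0

def fit_moments(x1, x2):
    lam0 = max(np.cov(x1, x2)[0, 1], 0.0)
    return max(x1.mean() - lam0, 1e-8), max(x2.mean() - lam0, 1e-8), lam0

def pgf_statistic(x1, x2):
    lam1, lam2, lam0 = fit_moments(x1, x2)
    emp = np.mean(T1[..., None] ** x1 * T2[..., None] ** x2, axis=-1)      # empirical pgf
    fit = np.exp(lam1 * (T1 - 1) + lam2 * (T2 - 1) + lam0 * (T1 * T2 - 1)) # fitted pgf
    return len(x1) * np.sum((emp - fit) ** 2), (lam1, lam2, lam0)

def bootstrap_pvalue(x1, x2, n_boot=200):
    stat, (lam1, lam2, lam0) = pgf_statistic(x1, x2)
    boot = [pgf_statistic(*rbivpois(len(x1), lam1, lam2, lam0))[0] for _ in range(n_boot)]
    return float(np.mean(np.asarray(boot) >= stat))

x1, x2 = rbivpois(150, lam1=1.0, lam2=1.5, lam0=0.5)   # data generated under the null model
print(bootstrap_pvalue(x1, x2))                         # typically not small
```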