Last year's top ten most cited papers

Top ten most cited papers in 2020 according to Web of Science (WOS)

  • Abstract: P-splines first appeared in the limelight twenty years ago. Since then they have become popular in applications and in theoretical work. The combination of a rich B-spline basis and a simple difference penalty lends itself well to a variety of generalizations, because it is based on regression. In effect, P-splines allow the building of a “backbone” for the “mixing and matching” of a variety of additive smooth structure components, while inviting all sorts of extensions: varying-coefficient effects, signal (functional) regressors, two-dimensional surfaces, non-normal responses, quantile (expectile) modelling, among others. Strong connections with mixed models and Bayesian analysis have been established. We give an overview of many of the central developments during the first two decades of P-splines.

    Keywords: B-splines, penalty, additive model, mixed model, multidimensional smoothing.

    Pages: 149–186

    DOI: 10.2436/20.8080.02.25

    Vol 39 (2) 2015
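
The penalized regression structure the abstract describes can be sketched numerically: a rich B-spline basis combined with a simple second-order difference penalty on the coefficients. This is a minimal illustration, not the authors' implementation; the knot count and penalty weight are arbitrary choices.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, n_segments=20, degree=3):
    # B-spline design matrix on equally spaced knots covering [min(x), max(x)]
    xl, xr = x.min(), x.max()
    dx = (xr - xl) / n_segments
    knots = np.linspace(xl - degree * dx, xr + degree * dx,
                        n_segments + 2 * degree + 1)
    n_basis = len(knots) - degree - 1
    B = np.empty((x.size, n_basis))
    for j in range(n_basis):
        coeffs = np.zeros(n_basis)
        coeffs[j] = 1.0
        B[:, j] = BSpline(knots, coeffs, degree)(x)
    return B

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)

B = bspline_basis(x)
D = np.diff(np.eye(B.shape[1]), n=2, axis=0)   # second-order differences
lam = 1.0                                      # smoothing parameter (arbitrary)
alpha = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
fit = B @ alpha                                # penalized (P-spline) fit
```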

  • Abstract: Social policies are designed using information collected in surveys, such as the Catalan Time Use Survey. Comparisons of time use data among population groups are commonly analysed using statistical methods. The total daily time spent on different activities by a single person is equal to 24 hours. Because this type of data is compositional, its sample space has particular properties that statistical methods should respect. The critical points required to interpret differences between groups are provided and described in terms of log-ratio methods. These techniques facilitate the interpretation of the relative differences detected in multivariate and univariate analyses.

    Keywords: Log-ratio transformations, MANOVA, perturbation, simplex, subcomposition.

    Pages: 231–252

    DOI: 10.2436/20.8080.02.28

    Vol 39 (2) 2015
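
As a sketch of the log-ratio approach (my example, not the paper's code): the centred log-ratio (clr) transform maps a 24-hour composition into real coordinates where ordinary multivariate methods apply, and a difference between groups becomes a perturbation, i.e. a component-wise ratio. The activity labels and hours below are invented.

```python
import numpy as np

def clr(x):
    # centred log-ratio: log parts minus the log geometric mean of each row
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)

# hypothetical daily hours on (work, care, leisure, sleep) for two groups
group_a = np.array([8.0, 2.0, 6.0, 8.0])
group_b = np.array([6.0, 4.0, 6.0, 8.0])

diff = clr(group_a) - clr(group_b)   # relative (log-ratio) difference
pert = group_a / group_b             # perturbation difference between groups
```

A useful property: the clr difference of two compositions equals the clr of their perturbation, so group contrasts stay inside the log-ratio geometry.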

  • Abstract: Scale mixtures of normal (SMN) distributions are used for modeling symmetric data. Members of this family have appealing properties such as robust estimates, easy number generation, and efficient computation of the ML estimates via the EM-algorithm. The Birnbaum-Saunders (BS) distribution is a positively skewed model that is related to the normal distribution and has received considerable attention. We introduce a type of BS distributions based on SMN models, produce a lifetime analysis, develop the EM-algorithm for ML estimation of parameters, and illustrate the obtained results with real data showing the robustness of the estimation procedure.

    Keywords: Birnbaum-Saunders distribution, EM-algorithm, kurtosis, maximum likelihood methods, robust estimation, scale mixtures of normal distribution.

    Pages: 171–192

    Vol 33 (2) 2009
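
For context, the classical construction the paper builds on (not its SMN extension): a Birnbaum-Saunders variate has the stochastic representation T = beta * (alpha*Z/2 + sqrt((alpha*Z/2)^2 + 1))^2 with Z ~ N(0, 1); the SMN-based family replaces Z by a scale mixture of normals, e.g. a Student-t. A quick check against the known mean beta*(1 + alpha^2/2):

```python
import numpy as np

def rbs(n, alpha, beta, seed=0):
    # classical Birnbaum-Saunders draws via the normal representation
    # T = beta * (a*Z/2 + sqrt((a*Z/2)**2 + 1))**2,  Z ~ N(0, 1)
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    w = alpha * z / 2.0
    return beta * (w + np.sqrt(w**2 + 1.0))**2

t = rbs(200_000, alpha=0.5, beta=2.0)
# theoretical mean is beta * (1 + alpha**2 / 2) = 2.25
```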

  • Abstract: Phenomena with a constrained sample space appear frequently in practice. This is the case, for example, with strictly positive data, or with compositional data, such as percentages or proportions. If the natural measure of difference is not the absolute one, simple algebraic properties show that it is more convenient to work with a geometry different from the usual Euclidean geometry in real space, and with a measure different from the usual Lebesgue measure, leading to alternative models that better fit the phenomenon under study. The general approach is presented and illustrated using the normal distribution, both on the positive real line and on the D-part simplex. The original ideas of McAlister in his introduction to the lognormal distribution in 1879 are recovered and updated.

    Keywords: Additive logistic normal distribution, Aitchison measure, Lebesgue measure, lognormal distribution, orthonormal basis, simplex.

    Pages: 29–56

    Vol 37 (1) 2013
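
A minimal numerical illustration of the point about changing the measure (my example, not the paper's): for strictly positive data, where differences are naturally relative, the geometric mean, i.e. the arithmetic mean after a log transform, is the natural centre, while the ordinary arithmetic mean is pulled upward.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 1.0, 0.8
x = rng.lognormal(mu, sigma, 100_000)   # strictly positive data

arith = x.mean()                        # estimates exp(mu + sigma**2 / 2)
geo = np.exp(np.log(x).mean())          # estimates exp(mu): the centre
                                        # under the relative (log) geometry
```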

  • Abstract: In the context of capture-recapture studies, Chao (1987) derived an inequality among capture frequency counts to obtain a lower bound for the size of a population based on individuals’ capture/non-capture records for multiple capture occasions. The inequality has been applied to obtain a non-parametric lower bound of species richness of an assemblage based on species incidence (detection/non-detection) data in multiple sampling units. The inequality implies that the number of undetected species can be inferred from the species incidence frequency counts of the uniques (species detected in only one sampling unit) and duplicates (species detected in exactly two sampling units). In their pioneering paper, Colwell and Coddington (1994) gave the name “Chao2” to the resulting species richness estimator. (The “Chao1” estimator refers to a similar type of estimator based on species abundance data.) Since then, the Chao2 estimator has been applied to many research fields and led to fruitful generalizations. Here, we first review Chao’s inequality under various models and discuss some related statistical inference questions: (1) Under what conditions is the Chao2 estimator an unbiased point estimator? (2) How many additional sampling units are needed to detect any arbitrary proportion (including 100%) of the Chao2 estimate of asymptotic species richness? (3) Can other incidence frequency counts be used to obtain similar lower bounds? We then show how the Chao2 estimator can also be used to guide a non-asymptotic analysis in which species richness estimators can be compared for equally-large or equally-complete samples via sample-size-based and coverage-based rarefaction and extrapolation. We also review the generalization of Chao’s inequality to estimate species richness under other sampling-without-replacement schemes (e.g. a set of quadrats, each surveyed only once), to obtain a lower bound of undetected species shared between two or multiple assemblages, and to allow inferences about undetected phylogenetic richness (the total length of undetected branches of a phylogenetic tree connecting all species), with associated rarefaction and extrapolation. A small empirical dataset for Australian birds is used for illustration, using the online software SpadeR, iNEXT, and PhD.

    Keywords: Cauchy-Schwarz inequality, Chao2 estimator, extrapolation, Good-Turing frequency formula, incidence data, phylogenetic diversity, rarefaction, sampling effort, shared species richness, species richness.

    Pages: 3–54

    DOI: 10.2436/20.8080.02.49

    Vol 41 (1) 2017
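
The lower bound at the heart of the abstract can be computed directly from the uniques Q1 and duplicates Q2. A minimal sketch of the bias-corrected form with the (m-1)/m factor (not the SpadeR implementation):

```python
import numpy as np

def chao2(incidence):
    # incidence: 0/1 matrix, rows = sampling units, columns = species
    m = incidence.shape[0]                  # number of sampling units
    counts = incidence.sum(axis=0)
    counts = counts[counts > 0]
    s_obs = counts.size                     # observed species richness
    q1 = int(np.sum(counts == 1))           # uniques
    q2 = int(np.sum(counts == 2))           # duplicates
    if q2 > 0:
        return s_obs + (m - 1) / m * q1**2 / (2 * q2)
    return s_obs + (m - 1) / m * q1 * (q1 - 1) / 2

# 4 sampling units, 4 species with incidence counts 1, 1, 2, 3:
# s_obs = 4, q1 = 2, q2 = 1, so Chao2 = 4 + (3/4) * 4/2 = 5.5
inc = np.array([[1, 0, 1, 1],
                [0, 1, 1, 1],
                [0, 0, 0, 1],
                [0, 0, 0, 0]])
```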

  • Abstract: Metaheuristics are approximation methods used to solve combinatorial optimization problems. Their performance usually depends on a set of parameters that need to be adjusted. The selection of appropriate parameter values causes a loss of efficiency, as it requires time and advanced analytical and problem-specific skills. This paper provides an overview of the principal approaches to tackle the Parameter Setting Problem, focusing on the statistical procedures employed so far by the scientific community. In addition, a novel methodology is proposed, which is tested using an already existing algorithm for solving the Multi-Depot Vehicle Routing Problem.

    Keywords: Parameter fine-tuning, metaheuristics, statistical learning, biased randomization.

    Pages: 201–224

    DOI: 10.2436/20.8080.02.41

    Vol 40 (1) 2016
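
The Parameter Setting Problem the paper surveys can be made concrete with the simplest possible tuner: a full-factorial scan that scores every parameter combination by its mean objective over a set of training instances. This toy sketch (the names and the stand-in objective are invented) is only a baseline for the statistical procedures the paper reviews.

```python
import itertools

def tune(run, param_grid, instances):
    # evaluate every parameter combination on all training instances;
    # keep the one with the lowest mean objective value
    names = list(param_grid)
    best, best_score = None, float("inf")
    for combo in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        score = sum(run(inst, **params) for inst in instances) / len(instances)
        if score < best_score:
            best, best_score = params, score
    return best, best_score

# toy stand-in for a metaheuristic run: lower is better, optimum at alpha = 0.3
def run(instance, alpha):
    return (alpha - 0.3)**2 + instance

best, score = tune(run, {"alpha": [0.1, 0.2, 0.3, 0.4, 0.5]}, [0.0, 1.0])
```

Real tuners replace the exhaustive scan with designed experiments or racing procedures, since each evaluation of a metaheuristic is expensive and stochastic.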

  • Abstract: Based on progressively Type-II censored samples, this paper deals with inference for the stress-strength reliability R = P(Y < X) when X and Y are two independent Weibull distributions with different scale parameters, but having the same shape parameter. The maximum likelihood estimator and the approximate maximum likelihood estimator of R are obtained. Different confidence intervals are presented. The Bayes estimator of R and the corresponding credible interval using the Gibbs sampling technique are also proposed. Further, we consider the estimation of R when the common shape parameter is known. The results for exponential and Rayleigh distributions can be obtained as special cases with different scale parameters. Analysis of a real data set as well as Monte Carlo simulation have been presented for illustrative purposes.

    Keywords: Maximum likelihood estimator, Approximate maximum likelihood estimator, Bootstrap confidence interval, Bayesian estimation, Metropolis-Hastings method, Progressive Type-II censoring.

    Pages: 103–124

    Vol 35 (2) 2011
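
For Weibull variables with a common shape k and scales lam_x, lam_y, the reliability has the closed form R = lam_x**k / (lam_x**k + lam_y**k), which a quick Monte Carlo check confirms. This is a sketch under complete sampling; the paper's setting is progressive Type-II censoring.

```python
import numpy as np

k, lam_x, lam_y = 2.0, 2.0, 1.0             # common shape, two scales
R_exact = lam_x**k / (lam_x**k + lam_y**k)  # closed form: 4 / 5 = 0.8

rng = np.random.default_rng(7)
n = 200_000
x = lam_x * rng.weibull(k, n)               # X ~ Weibull(shape k, scale lam_x)
y = lam_y * rng.weibull(k, n)               # Y ~ Weibull(shape k, scale lam_y)
R_mc = np.mean(y < x)                       # Monte Carlo estimate of P(Y < X)
```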

  • Abstract: This paper considers several confidence intervals for estimating the population coefficient of variation based on parametric, nonparametric and modified methods. A simulation study has been conducted to compare the performance of the existing and newly proposed interval estimators. Many intervals were modified in our study by estimating the variance with the median instead of the mean and these modifications were also successful. Data were generated from normal, chi-square, and gamma distributions for CV = 0.1, 0.3, and 0.5. We reported coverage probability and interval length for each estimator. The results were applied to two public health data sets: child birth weight and cigarette smoking prevalence. Overall, good intervals included an interval for chi-square distributions by McKay (1932), an interval estimator for normal distributions by Miller (1991), and our proposed interval.

    Keywords: Average width, coefficient of variation, inverted coefficient of variation, confidence interval, coverage probability, simulation study, skewed distributions.

    Pages: 45–68

    Vol 36 (1) 2012
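
A nonparametric baseline of the kind compared in the paper can be sketched with a percentile bootstrap (my sketch, not one of the paper's proposed intervals):

```python
import numpy as np

def cv_bootstrap_ci(x, level=0.95, n_boot=2000, seed=0):
    # percentile bootstrap interval for the coefficient of variation sd/mean
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    boot = x[idx]                            # resampled data sets, one per row
    cvs = boot.std(axis=1, ddof=1) / boot.mean(axis=1)
    a = 100 * (1 - level) / 2
    return tuple(np.percentile(cvs, [a, 100 - a]))

rng = np.random.default_rng(1)
x = rng.normal(10.0, 3.0, 500)               # true CV = 3/10 = 0.3
cv_hat = x.std(ddof=1) / x.mean()
lo, hi = cv_bootstrap_ci(x)
```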

  • Abstract: Ng and Kotz (1995) introduced a distribution that provides greater flexibility to extremes. We define and study a new class of distributions called the Kummer beta generalized family to extend the normal, Weibull, gamma and Gumbel distributions, among several other well-known distributions. Some special models are discussed. The ordinary moments of any distribution in the new family can be expressed as linear functions of probability weighted moments of the baseline distribution. We examine the asymptotic distributions of the extreme values. We derive the density function of the order statistics, mean absolute deviations and entropies. We use maximum likelihood estimation to fit the distributions in the new class and illustrate its potential with an application to a real data set.

    Keywords: Generalized distribution, Kummer beta distribution, likelihood ratio test, moment, order statistic, Weibull distribution.

    Pages: 153–180

    Vol 36 (2) 2012

  • Abstract: A Poisson model is typically assumed for count data. In many cases, because of the many zeros in the response variable, the mean is not equal to the variance of the dependent variable. Therefore, the Poisson model is no longer suitable for this kind of data. Thus, we suggest using a hurdle negative binomial regression model to overcome the problem of overdispersion. Furthermore, the response variable in such cases is censored for some values. In this paper, a censored hurdle negative binomial regression model is introduced for count data with many zeros. The estimation of regression parameters using maximum likelihood is discussed and the goodness-of-fit for the regression model is examined.

    Keywords: Hurdle negative binomial regression, censored data, maximum likelihood method, simulation.

    Pages: 181–194

    Vol 36 (2) 2012
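
The data-generating mechanism behind the hurdle model can be sketched as follows (simulation only; the paper's contribution is the censored regression model and its ML estimation): zeros come from a separate hurdle process, and positive counts from a zero-truncated negative binomial, which is what lets the zero probability and the overdispersion be modelled separately.

```python
import numpy as np

def sample_hurdle_nb(n, p_zero, r, p, seed=0):
    # zeros with probability p_zero; otherwise a zero-truncated NB(r, p)
    # draw, obtained here by rejection sampling of positive values
    rng = np.random.default_rng(seed)
    y = np.zeros(n, dtype=int)
    positive = rng.random(n) >= p_zero
    k = int(positive.sum())
    draws = []
    while len(draws) < k:
        d = rng.negative_binomial(r, p, size=2 * k)
        draws.extend(int(v) for v in d if v > 0)
    y[positive] = draws[:k]
    return y

y = sample_hurdle_nb(10_000, p_zero=0.4, r=2, p=0.5)
```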