Last year's top ten most cited papers

Top ten most cited papers in 2025 according to Web of Science (WOS)

  • Abstract: P-splines first appeared in the limelight twenty years ago. Since then, they have become popular in applications and in theoretical work. The combination of a rich B-spline basis and a simple difference penalty lends itself well to a variety of generalizations, because it is based on regression. In effect, P-splines allow the building of a “backbone” for the “mixing and matching” of a variety of additive smooth structure components, while inviting all sorts of extensions: varying-coefficient effects, signal (functional) regressors, two-dimensional surfaces, non-normal responses, quantile (expectile) modelling, among others. Strong connections with mixed models and Bayesian analysis have been established. We give an overview of many of the central developments during the first two decades of P-splines.

    Keywords: B-splines, penalty, additive model, mixed model, multidimensional smoothing

    Pages: 149–186

    DOI: 10.2436/20.8080.02.25

    Volume 39 (2) 2015

  • Abstract: In the context of capture-recapture studies, Chao (1987) derived an inequality among capture frequency counts to obtain a lower bound for the size of a population based on individuals' capture/non-capture records for multiple capture occasions. The inequality has been applied to obtain a non-parametric lower bound of species richness of an assemblage based on species incidence (detection/non-detection) data in multiple sampling units. The inequality implies that the number of undetected species can be inferred from the species incidence frequency counts of the uniques (species detected in only one sampling unit) and duplicates (species detected in exactly two sampling units). In their pioneering paper, Colwell and Coddington (1994) gave the name “Chao2” to the estimator for the resulting species richness. (The “Chao1” estimator refers to a similar type of estimator based on species abundance data). Since then, the Chao2 estimator has been applied to many research fields and led to fruitful generalizations. Here, we first review Chao's inequality under various models and discuss some related statistical inference questions: (1) Under what conditions is the Chao2 estimator an unbiased point estimator? (2) How many additional sampling units are needed to detect any arbitrary proportion (including 100%) of the Chao2 estimate of asymptotic species richness? (3) Can other incidence frequency counts be used to obtain similar lower bounds? We then show how the Chao2 estimator can also be used to guide a non-asymptotic analysis in which species richness estimators can be compared for equally large or equally complete samples via sample-size-based and coverage-based rarefaction and extrapolation. We also review the generalization of Chao's inequality to estimate species richness under other sampling-without-replacement schemes (e.g. a set of quadrats, each surveyed only once), to obtain a lower bound of undetected species shared between two or multiple assemblages, and to allow inferences about undetected phylogenetic richness (the total length of undetected branches of a phylogenetic tree connecting all species), with associated rarefaction and extrapolation. A small empirical dataset for Australian birds is used for illustration, using online software SpadeR, iNEXT, and PhD.

    Keywords: Cauchy-Schwarz inequality, Chao2 estimator, extrapolation, Good-Turing frequency formula, incidence data, phylogenetic diversity, rarefaction, sampling effort, shared species richness, species richness

    Pages: 3–54

    DOI: 10.2436/20.8080.02.49

    Volume 41 (1) 2017

  • Abstract: In this paper, the exponentiated discrete Weibull distribution is introduced. This new generalization of the discrete Weibull distribution can also be considered as a discrete analogue of the exponentiated Weibull distribution. A special case of this exponentiated discrete Weibull distribution defines a new generalization of the discrete Rayleigh distribution for the first time in the literature. In addition, discrete generalized exponential and geometric distributions are special sub-models of the new distribution. Here, some basic distributional properties, moments, and order statistics of this new discrete distribution are studied. We will see that the hazard rate function can be increasing, decreasing, bathtub, and upside-down bathtub shaped. Estimation of the parameters is illustrated using the maximum likelihood method. The model is also applied to a real data set.

    Keywords: Discrete generalized exponential distribution, exponentiated discrete Weibull distribution, exponentiated Weibull distribution, geometric distribution, infinite divisibility, order statistics, resilience parameter family, stress-strength parameter

    Pages: 127–146

    DOI: 10.2436/20.8080.02.24

    Volume 39 (1) 2015
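
To make the sub-model structure mentioned in the abstract above concrete, here is a small Python sketch. It assumes one common parametrization, in which the discrete Weibull distribution on 0, 1, 2, ... has CDF 1 - q^((x+1)^beta) and the exponentiated version raises that CDF to a power gamma; the paper's own notation may differ. The names `edw_cdf` and `edw_pmf` are hypothetical.

```python
def edw_cdf(x, q, beta, gamma):
    """CDF at integer x under the assumed parametrization (0 < q < 1, beta > 0, gamma > 0)."""
    if x < 0:
        return 0.0
    return (1.0 - q ** ((x + 1) ** beta)) ** gamma


def edw_pmf(x, q, beta, gamma):
    """Probability mass at integer x, obtained as a difference of CDF values."""
    return edw_cdf(x, q, beta, gamma) - edw_cdf(x - 1, q, beta, gamma)


# With beta = 1 and gamma = 1 the mass function reduces to the geometric form (1 - q) * q**x.
geometric_case = [edw_pmf(x, 0.5, 1.0, 1.0) for x in range(4)]
print(geometric_case)

# beta = 2 corresponds to the Rayleigh-type special case under this parametrization.
rayleigh_type = [edw_pmf(x, 0.9, 2.0, 1.5) for x in range(4)]
print(rayleigh_type)
```

Defining the mass function through CDF differences keeps the sketch short and guarantees that the probabilities sum to one in the limit.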

  • Abstract: Social policies are designed using information collected in surveys, such as the Catalan Time Use survey. Accurate comparisons of time use data among population groups are commonly analysed using statistical methods. The total daily time expended on different activities by a single person is equal to 24 hours. Because this type of data is compositional, its sample space has particular properties that statistical methods should respect. The critical points required to interpret differences between groups are provided and described in terms of log-ratio methods. These techniques facilitate the interpretation of the relative differences detected in multivariate and univariate analysis.

    Keywords: Log-ratio transformations, MANOVA, perturbation, simplex, subcomposition

    Pages: 231–252

    DOI: 10.2436/20.8080.02.28

    Volume 39 (2) 2015

  • Abstract: Based on progressively Type-II censored samples, this paper deals with inference for the stress-strength reliability R = P(Y < X) when X and Y are two independent Weibull random variables with different scale parameters but the same shape parameter. The maximum likelihood estimator and the approximate maximum likelihood estimator of R are obtained. Different confidence intervals are presented. The Bayes estimator of R and the corresponding credible interval using the Gibbs sampling technique are also proposed. Further, we consider the estimation of R when the common shape parameter is known. The results for exponential and Rayleigh distributions can be obtained as special cases with different scale parameters. An analysis of a real data set and a Monte Carlo simulation are also presented for illustrative purposes.

    Keywords: Maximum likelihood estimator, Approximate maximum likelihood estimator, Bootstrap confidence interval, Bayesian estimation, Metropolis-Hastings method, Progressive Type-II censoring

    Pages: 103–124

    Volume 35 (2) 2011
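
The quantity R = P(Y < X) in the abstract above has a closed form when both variables are Weibull with a common shape. The sketch below assumes the scale parametrization with survival function exp(-(x/scale)^shape), under which R = scale_x^shape / (scale_x^shape + scale_y^shape); the paper may use a different parametrization. It works with known parameters and complete samples only, so it does not cover progressive Type-II censoring or any of the estimators in the paper. The function names are hypothetical.

```python
import random


def weibull_stress_strength(scale_x, scale_y, shape):
    """Closed-form R = P(Y < X) for independent Weibull X and Y with a common shape."""
    sx = scale_x ** shape
    sy = scale_y ** shape
    return sx / (sx + sy)


def monte_carlo_r(scale_x, scale_y, shape, n=100_000, seed=0):
    """Simulation check of R using complete (uncensored) samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        y = rng.weibullvariate(scale_y, shape)   # arguments are (scale, shape)
        x = rng.weibullvariate(scale_x, shape)
        hits += y < x
    return hits / n


# shape = 1 gives the exponential case, shape = 2 the Rayleigh case.
exact = weibull_stress_strength(2.0, 1.0, 2.0)
approx = monte_carlo_r(2.0, 1.0, 2.0)
print(exact, approx)
```

The simulated proportion should agree with the closed form up to Monte Carlo error, which is a quick sanity check on the parametrization.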

  • Abstract: Green transportation is becoming relevant in the context of smart cities, where the use of electric vehicles represents a promising strategy to support sustainability policies. However, the use of electric vehicles shows some drawbacks as well, such as their limited driving-range capacity. This paper analyses a realistic vehicle routing problem in which both driving-range constraints and stochastic travel times are considered. Thus, the main goal is to minimize the expected time-based cost required to complete the freight distribution plan. To design reliable routing plans, a simheuristic algorithm is proposed. It combines Monte Carlo simulation with a multi-start metaheuristic, which also employs biased-randomization techniques. By including simulation, simheuristics extend the capabilities of metaheuristics to deal with stochastic problems. A series of computational experiments are performed to test our solving approach as well as to analyse the effect of uncertainty on the routing plans.

    Keywords: Vehicle routing problem, electric vehicles, green transport and logistics, smart cities, simheuristics, biased-randomized heuristics

    Pages: 3–24

    DOI: 10.2436/20.8080.02.77

    Volume 43 (1) 2019

  • Abstract: In this paper we review some existing estimators of the ridge parameter and propose some new ones. In all, 19 different estimators have been studied. The investigation has been carried out using Monte Carlo simulations. A large number of different models have been investigated where the variance of the random error, the number of variables included in the model, the correlations among the explanatory variables, the sample size and the unknown coefficient vector were varied. For each model we have performed 2000 replications and presented the results in terms of both figures and tables. Based on the simulation study, we found that increasing the number of correlated variables, the variance of the random error, and the correlation between the independent variables has a negative effect on the mean squared error. When the sample size increases, the mean squared error decreases, even when the correlation between the independent variables and the variance of the random error are large. In all situations, the proposed estimators have smaller mean squared error than the ordinary least squares and other existing estimators.

    Keywords: Linear model, LSE, MSE, Monte Carlo simulations, multicollinearity, ridge regression

    Pages: 115–138

    Volume 36 (2) 2012

  • Abstract: In order to apply group sequential methods for interim analysis for early stopping in clinical trials, the joint distribution of test statistics over time has to be known. Often the distribution is multivariate normal or asymptotically so, and an application of group sequential methods requires multivariate integration to determine the group sequential boundaries. However, if the increments between successive test statistics are independent, the multivariate integration reduces to a univariate integration involving simple recursion based on convolution. This allows application of standard group sequential methods. In this paper we review group sequential methods and the development that established independent increments in test statistics for the primary outcomes of longitudinal or failure time data.

    Keywords: Failure time data, interim analysis, longitudinal data, clinical trials, repeated significance tests, sequential methods

    Pages: 223–264

    DOI: 10.2436/20.8080.02.101

    Volume 44 (2) 2020

  • Abstract: Compositional data analysis is concerned with the relative importance of positive variables, expressed through their log-ratios. The literature has proposed a range of ways to compute log-ratios, some of whose interrelationships have never been reported when used as explanatory variables in regression models. This article shows their similarities and differences in interpretation based on the notion that one log-ratio has to be interpreted keeping all others constant. The article shows that centred, additive, pivot, balance and pairwise log-ratios lead to simple reparametrizations of the same model which can be combined to provide useful tests and comparable effect size estimates.

    Keywords: Compositional regression models, CoDa, composition as explanatory, centred log-ratios, pivot coordinates, pairwise log-ratios, additive log-ratios, effect size

    Pages: 201–220

    DOI: 10.2436/20.8080.02.100

    Volume 44 (1) 2020
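
Three of the log-ratio types named in the abstract above (centred, additive and pairwise) can be computed directly from their standard definitions, as in the minimal Python sketch below. The function names and the three-part daily time budget are hypothetical illustrations; pivot coordinates and balances are omitted, and nothing here reproduces the regression models of the article.

```python
import math


def clr(parts):
    """Centred log-ratios: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def alr(parts, ref=-1):
    """Additive log-ratios: log of each part over a reference part (last part by default)."""
    ref_index = ref % len(parts)
    log_ref = math.log(parts[ref_index])
    return [math.log(p) - log_ref for i, p in enumerate(parts) if i != ref_index]


def pairwise_log_ratio(parts, i, j):
    """Log-ratio of part i to part j."""
    return math.log(parts[i] / parts[j])


# Hypothetical composition: hours of sleep, work and leisure in one day.
hours = [8.0, 9.0, 7.0]
shares = [h / 24.0 for h in hours]

print(clr(hours))                        # coordinates sum to zero
print(clr(shares))                       # same values: log-ratios ignore the total
print(alr(hours))                        # [log(8/7), log(9/7)]
print(pairwise_log_ratio(hours, 0, 1))   # equals clr(hours)[0] - clr(hours)[1]
```

Because each set of coordinates is a linear function of the log-parts, one set can be written in terms of another, for example a pairwise log-ratio as a difference of two centred log-ratios.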

  • Abstract: Bivariate count data arise in several different disciplines and the bivariate Poisson distribution is commonly used to model them. This paper proposes and studies a computationally convenient goodness-of-fit test for this distribution, which is based on an empirical counterpart of a system of equations. The test is consistent against fixed alternatives. The null distribution of the test can be consistently approximated by a parametric bootstrap and by a weighted bootstrap. The goodness of these bootstrap estimators and the power for finite sample sizes are numerically studied. It is shown that the proposed test can be naturally extended to the multivariate Poisson distribution.

    Keywords: Bivariate Poisson distribution, goodness-of-fit, empirical probability generating function, parametric bootstrap, weighted bootstrap, multivariate Poisson distribution

    Pages: 113–138

    DOI: 10.2436/20.8080.02.37

    Volume 40 (1) 2016