Last year's top ten most cited papers

Top ten most cited papers in 2023 according to Web of Science (WOS)

  • Abstract: P-splines first appeared in the limelight twenty years ago. Since then they have become popular in applications and in theoretical work. The combination of a rich B-spline basis and a simple difference penalty lends itself well to a variety of generalizations, because it is based on regression. In effect, P-splines allow the building of a “backbone” for the “mixing and matching” of a variety of additive smooth structure components, while inviting all sorts of extensions: varying-coefficient effects, signal (functional) regressors, two-dimensional surfaces, non-normal responses, quantile (expectile) modelling, among others. Strong connections with mixed models and Bayesian analysis have been established. We give an overview of many of the central developments during the first two decades of P-splines.

    Keywords: B-splines, penalty, additive model, mixed model, multidimensional smoothing.

    Pages: 149–186

    DOI: 10.2436/20.8080.02.25

    Vol 39 (2) 2015
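As a rough illustration of the P-spline idea summarized above (not the paper's own code), a rich B-spline basis combined with a simple difference penalty reduces to one ridge-type regression solve. The data, knot grid, and smoothing parameter below are all hypothetical:

```python
import numpy as np
from scipy.interpolate import BSpline

# Hypothetical noisy observations of a smooth signal
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

# Rich cubic B-spline basis on equally spaced knots
n_seg, deg = 20, 3
inner = np.linspace(0.0, 1.0, n_seg + 1)
knots = np.concatenate((np.repeat(0.0, deg), inner, np.repeat(1.0, deg)))
n_basis = len(knots) - deg - 1
B = BSpline.design_matrix(x, knots, deg).toarray()

# Simple second-order difference penalty on adjacent coefficients
D = np.diff(np.eye(n_basis), n=2, axis=0)
lam = 1.0                                    # smoothing parameter (chosen arbitrarily here)
coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
fitted = B @ coef
```

Because the penalty acts on the regression coefficients, the same backbone extends directly to the generalizations the abstract lists (additive terms, surfaces, non-normal responses).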

  • Abstract: Social policies are designed using information collected in surveys, such as the Catalan Time Use Survey. Accurate comparisons of time use data among population groups are commonly carried out using statistical methods. The total daily time spent on different activities by a single person equals 24 hours. Because this type of data is compositional, its sample space has particular properties that statistical methods should respect. The critical points required to interpret differences between groups are provided and described in terms of log-ratio methods. These techniques facilitate the interpretation of the relative differences detected in multivariate and univariate analyses.

    Keywords: Log-ratio transformations, MANOVA, perturbation, simplex, subcomposition.

    Pages: 231–252

    DOI: 10.2436/20.8080.02.28

    Vol 39 (2) 2015
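A minimal sketch of the log-ratio approach mentioned above, with hypothetical time-use compositions (the activity labels and values are invented for illustration). The centred log-ratio (clr) expresses each part relative to the geometric mean of all parts, so group differences are read in relative terms:

```python
import numpy as np

def clr(x):
    """Centred log-ratio: log of each part relative to the geometric mean."""
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)

# Hypothetical daily time-use compositions (hours summing to 24):
# sleep, work, leisure, other
person_a = np.array([8.0, 9.0, 3.0, 4.0])
person_b = np.array([7.0, 10.0, 4.0, 3.0])

# Differences between individuals or groups are expressed on the clr scale
diff = clr(person_a) - clr(person_b)
```

Two properties make this suitable for compositional data: clr coordinates sum to zero, and rescaling a composition (e.g. recording minutes instead of hours) leaves them unchanged.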

  • Abstract: In this paper, the exponentiated discrete Weibull distribution is introduced. This new generalization of the discrete Weibull distribution can also be considered as a discrete analogue of the exponentiated Weibull distribution. A special case of this exponentiated discrete Weibull distribution defines a new generalization of the discrete Rayleigh distribution for the first time in the literature. In addition, the discrete generalized exponential and geometric distributions are special sub-models of the new distribution. Here, some basic distributional properties, moments, and order statistics of this new discrete distribution are studied. We will see that the hazard rate function can be increasing, decreasing, bathtub-shaped, and upside-down bathtub-shaped. Estimation of the parameters is illustrated using the maximum likelihood method. The fit of the model to a real data set is also examined.

    Keywords: Discrete generalized exponential distribution, exponentiated discrete Weibull distribution, exponentiated Weibull distribution, geometric distribution, infinite divisibility, order statistics, resilience parameter family, stress-strength parameter.

    Pages: 127–146

    DOI: 10.2436/20.8080.02.24

    Vol 39 (1) 2015
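Assuming the standard discrete Weibull cdf F(x) = 1 − q^((x+1)^β) on x = 0, 1, 2, …, the exponentiated version raises this cdf to a power γ, and the pmf follows by differencing. This is a sketch of that construction, not the paper's own code; the parameter values are arbitrary:

```python
import numpy as np

def edw_cdf(x, q, beta, gamma):
    """CDF of the exponentiated discrete Weibull (assumed parameterization):
    F(x) = [1 - q**((x + 1)**beta)]**gamma for x = 0, 1, 2, ..."""
    return (1.0 - q ** ((np.floor(x) + 1.0) ** beta)) ** gamma

def edw_pmf(x, q, beta, gamma):
    """PMF via successive differences of the CDF."""
    x = np.asarray(x, dtype=float)
    lower = np.where(x > 0, edw_cdf(x - 1.0, q, beta, gamma), 0.0)
    return edw_cdf(x, q, beta, gamma) - lower

# gamma = 1 recovers the discrete Weibull; beta = 1 gives a geometric-type law
support = np.arange(0, 200)
p = edw_pmf(support, q=0.5, beta=1.2, gamma=2.0)
```

Setting γ = 1 and β = 1 simultaneously reduces the pmf to that of a geometric distribution with success probability 1 − q, consistent with the sub-models the abstract mentions.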

  • Abstract: Many decision-making processes in our society involve NP-hard optimization problems. The large scale, dynamism, and uncertainty of these problems constrain the potential use of stand-alone optimization methods. The same applies to isolated simulation models, which do not have the potential to find optimal solutions in a combinatorial environment. This paper discusses the utilization of modelling and solving approaches based on the integration of simulation with metaheuristics. These ‘simheuristic’ algorithms, which constitute a natural extension of both metaheuristics and simulation techniques, should be used as a ‘first-resort’ method when addressing large-scale and NP-hard optimization problems under uncertainty, which is frequently the case in real-life applications. We outline the benefits and limitations of simheuristic algorithms, provide numerical experiments that validate our arguments, review some recent publications, and outline the best practices to consider during their design and implementation stages.

    Keywords: Simulation, metaheuristics, combinatorial optimization, simheuristics

    Pages: 311–334

    DOI: 10.2436/20.8080.02.104

    Vol 44 (2) 2020
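The core simheuristic loop described above (metaheuristic search whose objective is estimated by simulation) can be sketched on a toy problem. Everything below is hypothetical: five jobs with invented mean processing times, lognormal noise as an assumed disturbance, and a deliberately simple multi-start swap search:

```python
import random

random.seed(1)

# Toy single-machine problem: sequence five jobs to minimise the expected
# total completion time; processing times are stochastic (hypothetical
# lognormal noise around known means).
means = [3.0, 5.0, 2.0, 7.0, 4.0]

def sim_cost(order, n_runs=200):
    """Monte Carlo estimate of the expected total completion time."""
    total = 0.0
    for _ in range(n_runs):
        clock, acc = 0.0, 0.0
        for j in order:
            clock += means[j] * random.lognormvariate(0.0, 0.25)
            acc += clock
        total += acc
    return total / n_runs

def simheuristic(n_starts=5, n_moves=40):
    """Multi-start swap local search in which every candidate solution is
    scored by simulation rather than by a deterministic objective."""
    best, best_cost = None, float("inf")
    for _ in range(n_starts):
        order = list(range(len(means)))
        random.shuffle(order)
        cost = sim_cost(order)
        for _ in range(n_moves):
            i, k = random.sample(range(len(order)), 2)
            order[i], order[k] = order[k], order[i]
            new_cost = sim_cost(order)
            if new_cost < cost:
                cost = new_cost                  # keep the improving swap
            else:
                order[i], order[k] = order[k], order[i]
        if cost < best_cost:
            best, best_cost = list(order), cost
    return best, best_cost

best_order, best_cost = simheuristic()
```

The design point is the one the abstract makes: the metaheuristic supplies the search mechanics, while the embedded Monte Carlo evaluation lets it rank solutions under uncertainty.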

  • Abstract: Compositional data analysis is concerned with the relative importance of positive variables, expressed through their log-ratios. The literature has proposed a range of ways to compute log-ratios, some of whose interrelationships have never been reported when they are used as explanatory variables in regression models. This article shows their similarities and differences in interpretation based on the notion that one log-ratio has to be interpreted keeping all others constant. The article shows that centred, additive, pivot, balance and pairwise log-ratios lead to simple reparametrizations of the same model which can be combined to provide useful tests and comparable effect size estimates.

    Keywords: Compositional regression models, CoDa, composition as explanatory, centred log-ratios, pivot coordinates, pairwise log-ratios, additive log-ratios, effect size

    Pages: 201–220

    DOI: 10.2436/20.8080.02.100
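The reparametrization claim above can be checked numerically: different log-ratio coordinate systems built from the same parts span the same column space, so an ordinary least-squares regression gives identical fitted values. The composition and response below are simulated for illustration only:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.dirichlet([2.0, 3.0, 4.0], size=100)    # hypothetical 3-part composition
y = 1.0 + np.log(X[:, 0] / X[:, 2]) + rng.normal(scale=0.1, size=100)

def fitted(Z):
    """OLS fitted values for the response y on log-ratio coordinates Z."""
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

# Additive log-ratios, using the third part as reference
alr = np.log(X[:, :2] / X[:, 2:3])

# Pivot (ilr-type) coordinates built from the same three parts
piv = np.column_stack([
    np.sqrt(2.0 / 3.0) * np.log(X[:, 0] / np.sqrt(X[:, 1] * X[:, 2])),
    np.sqrt(1.0 / 2.0) * np.log(X[:, 1] / X[:, 2]),
])
```

Only the coefficients and their interpretation change between coordinate systems; the fitted model is the same.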

  • Abstract: In the context of capture-recapture studies, Chao (1987) derived an inequality among capture frequency counts to obtain a lower bound for the size of a population based on individuals’ capture/non-capture records for multiple capture occasions. The inequality has been applied to obtain a non-parametric lower bound of species richness of an assemblage based on species incidence (detection/non-detection) data in multiple sampling units. The inequality implies that the number of undetected species can be inferred from the species incidence frequency counts of the uniques (species detected in only one sampling unit) and duplicates (species detected in exactly two sampling units). In their pioneering paper, Colwell and Coddington (1994) gave the name “Chao2” to the resulting species richness estimator. (The “Chao1” estimator refers to a similar type of estimator based on species abundance data.) Since then, the Chao2 estimator has been applied to many research fields and has led to fruitful generalizations. Here, we first review Chao’s inequality under various models and discuss some related statistical inference questions: (1) Under what conditions is the Chao2 estimator an unbiased point estimator? (2) How many additional sampling units are needed to detect any arbitrary proportion (including 100%) of the Chao2 estimate of asymptotic species richness? (3) Can other incidence frequency counts be used to obtain similar lower bounds? We then show how the Chao2 estimator can also be used to guide a non-asymptotic analysis in which species richness estimators can be compared for equally large or equally complete samples via sample-size-based and coverage-based rarefaction and extrapolation. We also review the generalization of Chao’s inequality to estimate species richness under other sampling-without-replacement schemes (e.g. a set of quadrats, each surveyed only once), to obtain a lower bound of undetected species shared between two or multiple assemblages, and to allow inferences about undetected phylogenetic richness (the total length of undetected branches of a phylogenetic tree connecting all species), with associated rarefaction and extrapolation. A small empirical dataset for Australian birds is used for illustration, using the online software SpadeR, iNEXT, and PhD.

    Keywords: Cauchy-Schwarz inequality, Chao2 estimator, extrapolation, Good-Turing frequency formula, incidence data, phylogenetic diversity, rarefaction, sampling effort, shared species richness, species richness.

    Pages: 3–54

    DOI: 10.2436/20.8080.02.49

    Vol 41 (1) 2017
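The classical Chao2 lower bound described above depends only on the observed richness and the counts of uniques and duplicates. A minimal implementation (with an invented incidence vector for the example; the bias-corrected branch follows the standard form used when no duplicates occur):

```python
def chao2(incidence, n_units):
    """Chao2 lower bound for species richness.

    incidence: for each observed species, the number of sampling units
    (out of n_units) in which it was detected."""
    s_obs = sum(1 for f in incidence if f > 0)
    q1 = sum(1 for f in incidence if f == 1)        # uniques
    q2 = sum(1 for f in incidence if f == 2)        # duplicates
    m = n_units
    if q2 > 0:
        return s_obs + (m - 1) / m * q1 ** 2 / (2 * q2)
    # bias-corrected variant when no species occurs in exactly two units
    return s_obs + (m - 1) / m * q1 * (q1 - 1) / 2

# Hypothetical example: 5 observed species across 10 sampling units,
# with 2 uniques and 1 duplicate
estimate = chao2([1, 1, 2, 3, 5], n_units=10)
```

With two uniques and one duplicate the undetected-species term is (9/10) · 2²/2 = 1.8, so the lower bound is 6.8 species; when there are no uniques, the estimate collapses to the observed richness.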

  • Abstract: Green transportation is becoming relevant in the context of smart cities, where the use of electric vehicles represents a promising strategy to support sustainability policies. However, the use of electric vehicles has some drawbacks as well, such as their limited driving-range capacity. This paper analyses a realistic vehicle routing problem in which both driving-range constraints and stochastic travel times are considered. Thus, the main goal is to minimize the expected time-based cost required to complete the freight distribution plan. In order to design reliable routing plans, a simheuristic algorithm is proposed. It combines Monte Carlo simulation with a multi-start metaheuristic, which also employs biased-randomization techniques. By including simulation, simheuristics extend the capabilities of metaheuristics to deal with stochastic problems. A series of computational experiments are performed to test our solving approach as well as to analyse the effect of uncertainty on the routing plans.

    Keywords: Vehicle routing problem, electric vehicles, green transport and logistics, smart cities, simheuristics, biased-randomized heuristics

    Pages: 3–24

    DOI: 10.2436/20.8080.02.77

    Vol 43 (1) 2019
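The biased-randomization ingredient mentioned in the abstract is commonly implemented with a skewed (e.g. geometric) distribution over a ranked candidate list, so a greedy heuristic becomes probabilistic without losing its greedy bias. A generic sketch of that idea (not the paper's algorithm; the parameter `beta` and function name are illustrative):

```python
import math
import random

def biased_pick(ranked, beta=0.3, rng=random):
    """Geometric biased randomization: sample an index from a ranked
    candidate list so that better-ranked entries are favoured but any
    entry can still be chosen. beta in (0, 1) controls greediness:
    values near 1 behave almost deterministically greedy."""
    u = 1.0 - rng.random()                       # u in (0, 1], avoids log(0)
    idx = int(math.log(u) / math.log(1.0 - beta))
    return ranked[idx % len(ranked)]
```

In a multi-start routing heuristic, each restart would use `biased_pick` instead of always taking the best savings move, diversifying the search while keeping solution quality.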

  • Abstract: This paper considers several confidence intervals for estimating the population coefficient of variation based on parametric, nonparametric and modified methods. A simulation study has been conducted to compare the performance of the existing and newly proposed interval estimators. Many intervals were modified in our study by estimating the variance with the median instead of the mean, and these modifications were also successful. Data were generated from normal, chi-square, and gamma distributions for CV = 0.1, 0.3, and 0.5. We report the coverage probability and interval length for each estimator. The results were applied to two public health data sets: child birth weight and cigarette smoking prevalence. Overall, good intervals included an interval for chi-square distributions by McKay (1932), an interval estimator for normal distributions by Miller (1991), and our proposed interval.

    Keywords: Average width, coefficient of variation, inverted coefficient of variation, confidence interval, coverage probability, simulation study, skewed distributions.

    Pages: 45–68

    Vol 36 (1) 2012
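Among the nonparametric methods the abstract alludes to, a bootstrap percentile interval for the CV is easy to sketch (this is a generic illustration, not one of the paper's specific intervals; the simulated sample below has true CV = 0.1):

```python
import numpy as np

def cv_bootstrap_ci(data, level=0.95, n_boot=2000, seed=0):
    """Nonparametric bootstrap percentile interval for the coefficient
    of variation (sample sd / sample mean)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    cvs = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(data, size=data.size, replace=True)
        cvs[b] = resample.std(ddof=1) / resample.mean()
    alpha = (1.0 - level) / 2.0
    return float(np.quantile(cvs, alpha)), float(np.quantile(cvs, 1.0 - alpha))

# Hypothetical sample with true CV = 0.1 (normal, mean 10, sd 1)
sample = np.random.default_rng(7).normal(loc=10.0, scale=1.0, size=100)
lo, hi = cv_bootstrap_ci(sample)
```

Parametric intervals such as McKay's or Miller's exploit distributional assumptions to get shorter intervals; the bootstrap trades some width for robustness to skewed data.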

  • Abstract: Since their appearance in the 1990s, horizontal collaboration (HC) practices have proved to be catalysts for optimizing the distribution of goods in freight transport logistics. After introducing the main concepts related to HC, this paper offers a literature review on the topic and provides a classification of best practices in HC. Then, the paper analyses the main benefits and optimization challenges associated with the use of HC at the strategic, tactical, and operational levels. Emerging trends such as the concept of ‘green’ or environmentally-friendly HC in freight transport logistics are also introduced. Finally, the paper discusses the need for hybrid optimization methods, such as simheuristics and learnheuristics, in solving some of the previously identified challenges in real-life scenarios dominated by uncertainty and dynamic conditions.

    Keywords: Horizontal collaboration, freight transport, sustainable logistics, supply chain management, combinatorial optimization.

    Pages: 393–414

    DOI: 10.2436/20.8080.02.65

    Vol 41 (2) 2017

  • Abstract: Little attention has so far been paid to the problems inherent in interpreting the meaning of results from standard impact analyses using symmetric input-output tables. Impacts as well as drivers of these impacts must be either of the product type or of the industry type. Interestingly, since supply-use tables distinguish products and industries, they can cope with product impacts driven by changes in industries, and vice versa. This paper contributes in two ways. Firstly, the demand-driven Leontief quantity model, both for industry-by-industry as well as for product-by-product tables, is formalised on the basis of supply-use tables, thus leading to impact multipliers, both for industries and products. Secondly, we demonstrate how the supply-use formulation can improve the incorporation of disparate satellite data into input-output models, by offering both industry and product representation. Supply-use blocks can accept any mix of industry and product satellite data, as long as these are not overlapping.

    Keywords: Technology assumptions, supply-use framework, multipliers

    Pages: 139–152

    Vol 36 (2) 2012
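The demand-driven Leontief quantity model underlying the multipliers discussed above reduces to one matrix inversion. A minimal sketch with an invented two-industry technical coefficient matrix and final demand vector (the numbers are purely illustrative):

```python
import numpy as np

# Hypothetical 2-industry technical coefficient matrix A:
# A[i, j] = input from industry i needed per unit of output of industry j
A = np.array([[0.20, 0.30],
              [0.10, 0.25]])

# Demand-driven Leontief quantity model: gross output x solves x = A x + f
L = np.linalg.inv(np.eye(2) - A)          # Leontief inverse

# Output multipliers: column sums give the total output generated
# economy-wide per unit of final demand for each industry
multipliers = L.sum(axis=0)

f = np.array([100.0, 50.0])               # hypothetical final demand
x = L @ f                                  # gross output required to meet it
```

In the supply-use formulation the paper develops, the same calculation is carried out on rectangular supply and use blocks, which is what allows impacts and satellite data to be expressed in either industry or product terms.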