Journal SORT

Last year's top ten most cited papers

Top ten most cited papers in 2024 according to Web of Science (WOS)

Why simheuristics? Benefits, limitations, and best practices when combining metaheuristics with simulation

Manuel Chica (ORCID ID: orcid.org/), Angel A. Juan (ORCID ID: orcid.org/), Christopher Bayliss (ORCID ID: orcid.org/), Oscar Cordón (ORCID ID: orcid.org/) and W. David Kelton (ORCID ID: orcid.org/)
Abstract: Many decision-making processes in our society involve NP-hard optimization problems. The largescale, dynamism, and uncertainty of these problems constrain the potential use of stand-alone optimization methods. The same applies for isolated simulation models, which do not have the potential to find optimal solutions in a combinatorial environment. This paper discusses the utilization of modelling and solving approaches based on the integration of simulation with metaheuristics. These "simheuristic" algorithms, which constitute a natural extension of both metaheuristics and simulation techniques, should be used as a "first-resort" method when addressing large-scale and NP-hard optimization problems under uncertainty —which is a frequent case in real-life applications. We outline the benefits and limitations of simheuristic algorithms, provide numerical experiments that validate our arguments, review some recent publications, and outline the best practices to consider during their design and implementation stages.
Keywords: Simulation, metaheuristics, combinatorial optimization, simheuristics
Pages: 311–334
DOI: 10.2436/20.8080.02.104
Volume 44 (2) 2020
- PDF
On interpretations of tests and effect sizes in regression models with a compositional predictor

Germà Coenders (ORCID ID: orcid.org/) and Vera Pawlowsky-Glahn (ORCID ID: orcid.org/)
Abstract: Compositional data analysis is concerned with the relative importance of positive variables, expressed through their log-ratios. The literature has proposed a range of manners to compute log-ratios, some of whose interrelationships have never been reported when used as explanatory variables in regression models. This article shows their similarities and differences in interpretation based on the notion that one log-ratio has to be interpreted keeping all others constant. The article shows that centred, additive, pivot, balance and pairwise log-ratios lead to simple reparametrizations of the same model which can be combined to provide useful tests and comparable effect size estimates.
Keywords: Compositional regression models, CoDa, composition as explanatory, centred log-ratios, pivot coordinates, pairwise log-ratios, additive log-ratios, effect size
Pages: 201–220
DOI: 10.2436/20.8080.02.100
Volume 44 (1) 2020
- PDF
A simheuristic for routing electric vehicles with limited driving ranges and stochastic travel times

Lorena Reyes-Rubiano (ORCID ID: orcid.org/), Daniele Ferone (ORCID ID: orcid.org/), Angel A. Juan (ORCID ID: orcid.org/) and Javier Faulin (ORCID ID: orcid.org/)
Abstract: Green transportation is becoming relevant in the context of smart cities, where the use of electric vehicles represents a promising strategy to support sustainability policies. However the use of electric vehicles shows some drawbacks as well, such as their limited driving-range capacity. This paper analyses a realistic vehicle routing problem in which both driving-range constraints and stochastic travel times are considered. Thus, the main goal is to minimize the expected time-based cost required to complete the freight distribution plan. In order to design reliable Routing plans, a simheuristic algorithm is proposed. It combines Monte Carlo simulation with a multi-start metaheuristic, which also employs biased-randomization techniques. By including simulation, simheuristics extend the capabilities of metaheuristics to deal with stochastic problems. A series of computational experiments are performed to test our solving approach as well as to analyse the effect of uncertainty on the routing plans.
Keywords: Vehicle routing problem, electric vehicles, green transport and logistics, smart cities, simheuristics, biased-randomized heuristics
Pages: 3–24
DOI: 10.2436/20.8080.02.77
Volume 43 (1) 2019
- PDF
Twenty years of P-splines

Paul H.C. Eilers, Brian D. Marx and Maria Durbán
Abstract: P-splines first appeared in the limelight twenty years ago. Since then they have become popular in applications and in theoretical work. The combination of a rich B-spline basis and a simple difference penalty lends itself well to a variety of generalizations, because it is based on regression. In effect, P-splines allow the building of a “backbone” for the “mixing and matching” of a variety of additive smooth structure components, while inviting all sorts of extensions: varying-coefficient effects, signal (functional) regressors, two-dimensional surfaces, non-normal responses, quantile (expectile) modelling, among others. Strong connections with mixed models and Bayesian analysis have been established. We give an overview of many of the central developments during the first two decades of P-splines.
Keywords: B-splines, penalty, additive model, mixed model, multidimensional smoothing.
Pages: 149–186
DOI: 10.2436/20.8080.02.25
Volume 39 (2) 2015
- PDF
Data wrangling, computational burden, automation, robustness and accuracy in ecological inference forecasting of RxC tables

Jose M. Pavía (ORCID ID: orcid.org/) and Rafael Romero
Abstract: This paper assesses the two current major alternatives for ecological inference, based on a multinomial-Dirichlet Bayesian model and on mathematical programming. Their performance is evaluated in a database made up of almost 2000 real datasets for which the actual cross-distributions are known. The analysis reveals both approaches as complementarity, each one of them performing better in a different area of the simplex space, although with Bayesian solutions deteriorating when the amount of information is scarce. After offering some guidelines regarding the appropriate contexts for employing each one of the algorithms, we conclude with some ideas for exploiting their complementarities.
Keywords: Ecological inference; Voter transitions; US voting rights; two-way contingency tables; ei.MD.bayes; lphom; R-packages
Pages: 151–186
DOI: 10.57645/20.8080.02.4
Volume 47 (1) 2023
- PDF
Efficiency of propensity score adjustment and calibration on the estimation from non-probabilistic online surveys

Ramón Ferri-García (ORCID ID: orcid.org/) and Maria del Mar Rueda (ORCID ID: orcid.org/)
Abstract: One of the main sources of inaccuracy in modern survey techniques, such as online and smartphone surveys, is the absence of an adequate sampling frame that could provide a probabilistic sampling. This kind of data collection leads to the presence of high amounts of bias in final estimates of the survey, specially if the estimated variables (also known as target variables) have some influence on the decision of the respondent to participate in the survey. Various correction techniques, such as calibration and propensity score adjustment or PSA, can be applied to remove the bias. This study attempts to analyse the efficiency of correction techniques in multiple situations, applying a combination of propensity score adjustment and calibration on both types of variables (correlated and not correlated with the missing data mechanism) and testing the use of a reference survey to get the population totals for calibration variables. The study was performed using a simulation of a fictitious population of potential voters and a real volunteer survey aimed to a population for which a complete census was available. Results showed that PSA combined with calibration results in a bias removal considerably larger when compared with calibration with no prior adjustment. Results also showed that using population totals from the estimates of a reference survey instead of the available population data does not make a difference in estimates accuracy, although it can contribute to slightly increment the variance of the estimator.
Keywords: Online surveys, Smartphone surveys, propensity score adjustment, calibration, simulation
Pages: 159–182
DOI: 10.2436/20.8080.02.73
Volume 42 (2) 2018
- PDF
Small area estimation of the proportion of single-person households: Application to the Spanish Household Budget Survey

María Bugallo (ORCID ID: orcid.org/), Domingo Morales (ORCID ID: orcid.org/) and María Dolores Esteban (ORCID ID: orcid.org/)
Abstract: Household composition reveals vital aspects of the socioeconomic situation and major changes in developed countries for decision-making and mapping the distribution of single-person households is highly relevant and useful. Driven by the Spanish Household Budget Survey data, we propose a new statistical methodology for small area estimation of proportions and total counts of single-person households. Estimation domains are defined as crosses of province, sex and age group of the main breadwinner of the household. Predictors are based on area-level zero-inflated Poisson mixed models. Model parameters are estimated by maximum likelihood and mean squared errors by parametric bootstrap. Several simulation experiments are carried out to empirically investigate the properties of these estimators and predictors. Finally, the paper concludes with an application to real data from 2016.
Keywords: Small area estimation, zero-inflated Poisson mixed model, area-level data, Household Budget Survey, single-person household
Pages: 125–152

DOI: 10.57645/20.8080.02.16
Volume 39 (1) 2015
- PDF
Modelling consumer credit risk via survival analysis

Ricardo Cao, Juan M. Vilar and Andrés Devia
Abstract: Credit risk models are used by financial companies to evaluate in advance the insolvency risk caused by credits that enter into default. Many models for credit risk have been developed over the past few decades. In this paper, we focus on those models that can be formulated in terms of the probability of default by using survival analysis techniques. With this objective three different mechanisms are proposed based on the key idea of writing the default probability in terms of the conditional distribution function of the time to default. The first method is based on a Cox's regression model, the second approach uses generalized linear models under censoring and the third one is based on nonparametric kernel estimation, using the product-limit conditional distribution function estimator by Beran. The resulting nonparametric estimator of the default probability is proved to be consistent and asymptotically normal. An empirical study, based on modified real data, illustrates the three methods.
Keywords: Probability of default, Basel II, nonparametric regression, conditional survival function, generalized product-limit estimator.
Pages: 3–30

Volume 33 (1) 2009
- PDF
Independent increments in group sequential tests: a review (invited article)

KyungMann Kim (ORCID ID: orcid.org/) and Anastasios A. Tsiatis (ORCID ID: orcid.org/)
Abstract: In order to apply group sequential methods for interim analysis for early stopping in clinical trials, the joint distribution of test statistics over time has to be known. Often the distribution is multivariate normal or asymptotically so, and an application of group sequential methods requires multivariate integration to determine the group sequential boundaries. However, if the increments between successive test statistics are independent, the multivariate integration reduces to a univariate integration involving simple recursion based on convolution. This allows application of standard group sequential methods. In this paper we review group sequential methods and the development that established independent increments in test statistics for the primary outcomes of longitudinal or failure time data.
Keywords: Failure time data, interim analysis, longitudinal data, clinical trials, repeated significance tests, sequential methods
Pages: 223–264
DOI: 10.2436/20.8080.02.101
Volume 44 (2) 2020
- PDF
Thirty years of progeny from Chao's inequality: Estimating and comparing richness with incidence data and incomplete sampling (invited article)

Anne Chao (ORCID ID: orcid.org/0000-0002-4364-8101) and Robert K. Colwell (ORCID ID: orcid.org/0000-0002-1384-0354)
Abstract: In the context of capture-recapture studies, Chao (1987) derived an inequality among capture frequency counts to obtain a lower bound for the size of a population based on individuals' capture/non-capture records for multiple capture occasions. The inequality has been applied to obtain a non-parametric lower bound of species richness of an assemblage based on species incidence (detection/non-detection) data in multiple sampling units. The inequality implies that the number of undetected species can be inferred from the species incidence frequency counts of the uniques (species detected in only one sampling unit) and duplicates (species detected in exactly two sampling units). In their pioneering paper, Colwell and Coddington (1994) gave the name “Chao2” to the estimator for the resulting species richness. (The “Chao1” estimator refers to a similar type of estimator based on species abundance data). Since then, the Chao2 estimator has been applied to many research fields and led to fruitful generalizations. Here, we first review Chao's inequality under various models and discuss some related statistical inference questions: (1) Under what conditions is the Chao2 estimator an unbiased point estimator? (2) How many additional sampling units are needed to detect any arbitrary proportion (including 100%) of the Chao2 estimate of asymptotic species richness? (3) Can other incidence frequency counts be used to obtain similar lower bounds? We then show how the Chao2 estimator can be also used to guide a non-asymptotic analysis in which species richness estimators can be compared for equally-large or equally-complete samples via sample-size-based and coverage-based rarefaction and extrapolation. We also review the generalization of Chao's inequality to estimate species richness under other sampling-without-replacement schemes (e.g. a set of quadrats, each surveyed only once), to obtain a lower bound of undetected species shared between two or multiple assemblages, and to allow inferences about undetected phylogenetic richness (the total length of undetected branches of a phylogenetic tree connecting all species), with associated rarefaction and extrapolation. A small empirical dataset for Australian birds is used for illustration, using online software SpadeR, iNEXT, and PhD.
Keywords: Cauchy-Schwarz inequality, Chao2 estimator, extrapolation, Good-Turing frequency, formula, incidence data, phylogenetic diversity, rarefaction, sampling effort, shared species richness, species richness.
Pages: 3– 54
DOI: 10.2436/20.8080.02.49
Volume 41 (1) 2017
- PDF

Last year's top ten most cited papers

Top ten most cited papers in 2024 according to Web of Science (WOS)

Why simheuristics? Benefits, limitations, and best practices when combining metaheuristics with simulation

On interpretations of tests and effect sizes in regression models with a compositional predictor

A simheuristic for routing electric vehicles with limited driving ranges and stochastic travel times

Twenty years of P-splines

Data wrangling, computational burden, automation, robustness and accuracy in ecological inference forecasting of RxC tables

Efficiency of propensity score adjustment and calibration on the estimation from non-probabilistic online surveys

Small area estimation of the proportion of single-person households: Application to the Spanish Household Budget Survey

Modelling consumer credit risk via survival analysis

Independent increments in group sequential tests: a review (invited article)

Thirty years of progeny from Chao's inequality: Estimating and comparing richness with incidence data and incomplete sampling (invited article)