Journal SORT

Downloadable articles

The full text of articles can be downloaded by clicking the PDF button. These may contain supplementary material that can be downloaded by clicking the ZIP button.

Volume 49 (2), July-December 2025

A stochastic partial differential equation for Bayesian spatio-temporal modelling of crime

Julia Calatayud, Marc Jornet, Javier Platero and Jorge Mateu
Abstract: We propose a stochastic partial differential equation to model geo-referenced data in the plane, with spatially correlated noise and a temporal log-normal evolution. Discretization in space permits us to develop the model in a finite-dimensional framework, reducing it to a set of stochastic differential equations coupled by correlated Wiener processes. The correlations are considered time-varying and stochastic, with a transformed log-normal distribution. The final model is framed within a hierarchical structure, and parameter inference is conducted jointly using Bayesian methods. The statistical methodology is illustrated by analyzing crime activity in the city of Valencia, Spain.
Keywords: Bayesian inference, crime time series, lattice data, space-time correlation, space-time intensity, stochastic log-Gaussian model, stochastic partial differential equation
Pages: 149–178
DOI: 10.57645/20.8080.02.26
- PDF
Optimism correction of the area under the ROC curve, with missing data

Susana Rafaela Martins, María del Carmen Iglesias-Pérez and Jacobo de Uña-Álvarez
Abstract: The area under the ROC curve (AUC) plays an important role in the study of the predictive capacity of regression models. It is well known that an inflated AUC may result when the same data are used for training and testing the model. In this paper optimism correction of the AUC in the presence of missing data is investigated. Complete case analysis, inverse probability weighting and multiple imputation are employed to address the issue of missing data. For each of these approaches, split-sample, K-fold cross-validation and leave-one-out cross-validation are employed to correct for the optimism of the AUC. The methods are compared through intensive Monte Carlo simulations in the particular setting of binary regression. Results suggest that all estimators are consistent with the exception of complete case analysis, which may be biased when data are not missing completely at random. In general, a combined application of multiple imputation and leave-one-out cross-validation is recommended.
Keywords: cross-validation, logistic regression, missing values, multiple imputation, prediction
Pages: 179–212
DOI: 10.57645/20.8080.02.27
- PDF
On generalized Gower distance for mixed-type data: extensive simulation study and new software tools

Aurea Grané and Fabio Scielzo-Ortiz
Abstract: Data scientists address real-world problems using multivariate and heterogeneous data sets, characterized by multiple variables of different natures. Selecting a suitable distance function between units is crucial, as many statistical techniques and machine learning algorithms depend on this concept. Traditional distances, such as Euclidean or Manhattan, are unsuitable for mixed-type data, and although Gower distance was designed to handle this kind of data, it may lead to suboptimal results in the presence of outlying units or underlying correlation structure. In this work robust distances for mixed-type data are defined and explored, namely robust generalized Gower and robust related metric scaling. A new Python package is developed, which enables the computation of these robust proposals as well as classical ones.
Keywords: distances, generalized Gower, multivariate heterogeneous data, outliers, robust Mahalanobis, related metric scaling
Pages: 213–244
DOI: 10.57645/20.8080.02.28
- PDF
Bayesian estimation for conditional probabilities associated to directed acyclic graphs: study of hospitalization of severe influenza cases

Lesly Acosta and Carmen Armero
Abstract: This paper presents a Bayesian framework to estimate joint, conditional, and marginal probabilities in directed acyclic graphs to study the progression of hospitalized patients with confirmed severe influenza. Using data from the PIDIRAC retrospective cohort in Catalonia, we model patient pathways from admission to discharge, death, or transfer. Transition probabilities are estimated using a Bayesian Dirichlet-multinomial approach, while posterior distributions for absorbing states or inverse probabilities are assessed via simulation. Bayesian methodology quantifies uncertainty through posterior distributions, offering insights into disease progression and improving hospital planning. These findings support more effective patient management and informed decision making during seasonal influenza outbreaks.
Keywords: confirmed influenza hospitalization, directed acyclic graphs (DAGs), Dirichlet-multinomial Bayesian inferential process, healthcare decision-making, transition probabilities
Pages: 245–264
DOI: 10.57645/20.8080.02.29
- PDF

Volume 49 (1), January-June 2025

Recent advances in copula-based methods for dependent censoring (invited article)

Gilles Crommen, Negera Wakgari Deresa, Myrthe D’Haen, Jie Ding, Ilias Willems and Ingrid Van Keilegom
Abstract: When modeling time-to-event data that are subject to right censoring, it is commonly assumed that the survival time T and the censoring time C are independent. However, this assumption frequently fails in practice, leading to biased estimators and testing procedures having invalid type 1 error rates. To overcome this issue, several models relaxing the independent censoring assumption have been proposed in the literature. Among these, copula-based approaches have become popular due to their ability to separately model the marginal distributions of T and C and their dependence structure. This review paper gives a comprehensive overview of recent advances in copula-based methods for dependent censoring, along with a discussion of the most important historical papers on this topic. As it is well known that the distribution of (T, C) (and hence of T) is not identified in a fully nonparametric way, we examine different strategies to achieve model identifiability. These strategies consist of imposing assumptions on either the copula or the marginal distributions of T and C. Both of these approaches will be discussed, with and without covariates. We also consider the case where a dependent censoring time is accompanied by an additional latent independent censoring time. Lastly, we briefly explain alternative approaches that are not based on copulas.
Keywords: copula, dependent censoring, identifiability, survival analysis
Pages: 3–42
DOI: 10.57645/20.8080.02.21
- PDF
On statistical model extensions based on randomly stopped extremes

Jordi Valero and Josep Ginebra
Abstract: The maxima and the minima of a randomly stopped sample of a random variable, X, together with two newly defined random variables that make X into the maxima or minima of a randomly stopped sample of them, can be used to define statistical model transformation mechanisms. These transformations can be used to define models for extreme-value data that are not grounded on large sample theory. The relationship between the stopping model and characteristics of the corresponding model transformations obtained is investigated. In particular, one looks into which stopping models make these model transformations into model extensions, and which stopping models lead to statistically stable extensions in the sense that using the model extension a second time leaves the extended model unchanged. The stopping models under which the extensions based on randomly stopped maxima and their inverses coincide with the extensions based on randomly stopped minima and their inverses are also characterized. The advantages of using models obtained through these model extension mechanisms instead of resorting to extreme-value models grounded on asymptotic arguments are illustrated by way of examples.
Keywords: Marshall-Olkin extension, extreme value, randomly stopped maximum, randomly stopped minimum, statistical stability, stopping model
Pages: 43–72
DOI: 10.57645/20.8080.02.22
- PDF
- ZIP
Lattice structures for the stochastic comparison of call ratio backspread derivatives with an application

María Concepción López-Díaz, Miguel López-Díaz and Sergio Martínez-Fernández
Abstract: The comparison of investments in financial derivatives is an appealing topic in the optimization of resources. A relevant derivative is the call ratio backspread. Motivated by the need to compare investments in such derivatives, a new family of stochastic orders is introduced. This makes it possible to reach decisions on the allocation of funds in those derivatives under general conditions and without assuming specific probability distributions of the asset prices. Characterizations of the orders are developed. Special emphasis is placed on the existence of infima and suprema in such dominance criteria, which leads to lattice structures on some special spaces and to the reduction of some optimization problems with stochastic dominance constraints. The method is illustrated with an application using real data from financial markets.
Keywords: call ratio backspread derivative, integrated survival function, lattice, stochastic order
Pages: 73–92
DOI: 10.57645/20.8080.02.23
- PDF
Spatial autoregressive modelling of epidemiological data: geometric mean model proposal

Mabel Morales-Otero, Christel Faes and Vicente Núñez-Antón
Abstract: We propose the geometric mean spatial conditional model for fitting spatial public health data, assuming that the disease incidence in one region depends on that of neighbouring regions, and incorporating an autoregressive spatial term based on their geometric mean. We explore alternative spatial weights matrices, including those based on contiguity, distance, covariate differences and individuals’ mobility. A simulation study assesses the model’s performance with mobility-based spatial correlation. We illustrate our proposals by analysing the COVID-19 spread in Flanders, Belgium, and comparing the proposed model with other commonly used spatial models. Our approach demonstrates advantages in interpretability, computational efficiency, and flexibility over the commonly used and previously existing methods.
Keywords: Bayesian approaches, COVID-19 incidence, epidemiology, spatial modelling
Pages: 93–120
DOI: 10.57645/20.8080.02.24
- PDF
- ZIP
Leave-group-out cross-validation for latent Gaussian models

Zhedong Liu, Janet Van Niekerk and Håvard Rue
Abstract: Evaluating the predictive performance of a statistical model is commonly done using cross-validation. Among the various methods, leave-one-out cross-validation (LOOCV) is frequently used. Originally designed for exchangeable observations, LOOCV has since been extended to other cases such as hierarchical models. However, it focuses primarily on short-range prediction and may not fully capture long-range prediction scenarios. For structured hierarchical models, particularly those involving multiple random effects, the concepts of short- and long-range predictions become less clear, which can complicate the interpretation of LOOCV results. In this paper, we propose a complementary cross-validation framework specifically tailored for longer-range prediction in latent Gaussian models, including those with structured random effects. Our approach differs from LOOCV by excluding a carefully constructed set from the training set, which better emulates longer-range prediction conditions. Furthermore, we achieve computational efficiency by adjusting the full joint posterior for this modified cross-validation, thus eliminating the need for model refitting. This method is implemented in the R-INLA package (www.r-inla.org) and can be adapted to a variety of inferential frameworks.
Keywords: Bayesian Cross-Validation, Latent Gaussian Models, R-INLA
Pages: 121–146
DOI: 10.57645/20.8080.02.25
- PDF

Volume 48 (2), July-December 2024

Patient-reported outcomes and survival analysis of chronic obstructive pulmonary disease patients: a two-stage joint modelling approach

Cristina Galán-Arcicollar, Josu Najera-Zuloaga and Dae-Jin Lee
Abstract: Joint modelling has gained attention in longitudinal studies incorporating biomarkers and survival data. In the context of chronic diseases, patient evolution is often tracked through multiple assessments, with patient-reported outcomes playing a crucial role. The Beta-Binomial distribution is suggested as a suitable model for these longitudinal variables. However, its integration into joint modelling remains unexplored. This study introduces an estimation procedure for analyzing longitudinal patient-reported outcomes and survival data together. We compare different estimation approaches through simulation experiments, including the proposed model. Furthermore, the methodologies are applied to real data from a follow-up study on chronic obstructive pulmonary disease patients.
Keywords: joint modelling, Beta-Binomial regression, patient-reported outcomes, survival analysis
Pages: 155–182
DOI: 10.57645/20.8080.02.17
- PDF
Non-parametric estimation of the covariate-dependent bivariate distribution for censored gap times

Ewa Strzalkowska-Kominiak, Elisa M. Molanes-López and Emilio Letón
Abstract: In many biomedical studies, recurrent or consecutive events may occur during the follow up of the individuals. This situation can be found, for example, in transplant studies, where there are two consecutive events which give rise to two times of interest subject to a common random right-censoring time, the first one being the elapsed time from acceptance into the transplantation program to transplant, and the second one the time from transplant to death. In this work, we incorporate the information of a continuous covariate into the bivariate distribution of the two gap times of interest and propose a non-parametric method to cope with it. We prove the asymptotic properties of the proposed method and carry out a simulation study to see the performance of this approach. Additionally, we illustrate its use with Stanford heart transplant data and colon cancer data.
Keywords: bivariate distribution, copula function, covariate, serial dependence, random censoring, kernel estimation
Pages: 183–208
DOI: 10.57645/20.8080.02.18
- PDF
Second-order Markov multistate models

Mireia Besalú and Guadalupe Gómez Melis
Abstract: Multistate models are well developed for continuous and discrete times under a first order Markov assumption. Motivated by a cohort of COVID-19 patients, a multistate model was designed based on 14 transitions among 7 states of a patient. Since a preliminary analysis showed that the first-order Markov condition was not met for some transitions, we have developed a second-order Markov model where the future evolution not only depends on the state at the current time but also on the state at the preceding time. Under a discrete time analysis, assuming homogeneity and that past information is restricted to two consecutive times, we expanded the transition probability matrix and proposed an extension of the Chapman-Kolmogorov equations. We propose two estimators for the second-order transition probabilities and illustrate them within the cohort of COVID-19 patients.
Keywords: multistate models, non-Markov, COVID-19
Pages: 209–234
DOI: 10.57645/20.8080.02.19
- PDF
Conditional likelihood based inference on single-index models for motor insurance claim severity

Catalina Bolancé, Ricardo Cao and Montserrat Guillen
Abstract: Prediction of a traffic accident cost is one of the major problems in motor insurance. To identify the factors that influence costs is one of the main challenges of actuarial modelling. Telematics data about individual driving patterns could help calculate the expected claim severity in motor insurance. We propose using single-index models to assess the marginal effects of covariates on the claim severity conditional distribution. Thus, drivers with a claim cost distribution that has a long tail can be identified. These are risky drivers, who should pay a higher insurance premium and for whom preventative actions can be designed. A new kernel approach to estimate the covariance matrix of coefficients’ estimator is outlined. Its statistical properties are described and an application to an innovative data set containing information on driving styles is presented. The method provides good results when the response variable is skewed.
Keywords: covariance matrix of estimator, kernel estimator, marginal effects, telematics covariates, right-skewed cost variable
Pages: 235–258
DOI: 10.57645/20.8080.02.20
- PDF

Volume 48 (1), January-June 2024

A diffusion-based spatio-temporal extension of Gaussian Matérn fields (invited article with discussion)

Finn Lindgren, Haakon Bakka, David Bolin, Elias Krainski and Håvard Rue
Abstract: Gaussian random fields with Matérn covariance functions are popular models in spatial statistics and machine learning. In this work, we develop a spatio-temporal extension of the Gaussian Matérn fields formulated as solutions to a stochastic partial differential equation. The spatially stationary subset of the models has marginal spatial Matérn covariances, and the model also extends to Whittle-Matérn fields on curved manifolds, and to more general non-stationary fields. In addition to the parameters of the spatial dependence (variance, smoothness, and practical correlation range) it has parameters controlling the practical correlation range in time, the smoothness in time, and the type of non-separability of the spatio-temporal covariance. Through the separability parameter, the model also allows for separable covariance functions. We provide a sparse representation based on a finite element approximation, that is well suited for statistical inference and which is implemented in the R-INLA software. The flexibility of the model is illustrated in an application to spatio-temporal modeling of global temperature data.
Keywords: Stochastic partial differential equations, diffusion, Gaussian fields, non-separable space-time models, INLA, finite element methods
Pages: 3–66
DOI: 10.57645/20.8080.02.13
- PDF
Estimation of logistic regression parameters for complex survey data: simulation study based on real survey data

Amaia Iparragirre, Irantzu Barrio, Jorge Aramendi and Inmaculada Arostegui
Abstract: In complex survey data, each sampled observation is assigned a sampling weight, indicating the number of units that it represents in the population. Whether sampling weights should or should not be considered in the estimation process of model parameters is a question that still continues to generate much discussion among researchers in different fields. We aim to contribute to this debate by means of a real data based simulation study in the framework of logistic regression models. In order to study their performance, three methods have been considered for estimating the coefficients of the logistic regression model: a) the unweighted model, b) the weighted model, and c) the unweighted mixed model. The results suggest the use of the weighted logistic regression model is superior, showing the importance of using sampling weights in the estimation of the model parameters.
Keywords: complex survey data, sampling weights, logistic regression, estimation of model parameters, real data based simulation study
Pages: 67–92
DOI: 10.57645/20.8080.02.14
- PDF
Kernel weighting for blending probability and non-probability survey samples

María del Mar Rueda, Beatriz Cobo, Jorge Luis Rueda-Sánchez, Ramon Ferri-García and Luis Castro-Martín
Abstract: In this paper we review some methods proposed in the literature for combining a nonprobability and a probability sample with the purpose of obtaining an estimator with a smaller bias and standard error than the estimators that can be obtained using only the probability sample. We propose a new methodology based on the kernel weighting method. We discuss the properties of the new estimator when there is only selection bias and when there are both coverage and selection biases. We perform an extensive simulation study to better understand the behaviour of the proposed estimator.
Keywords: Kernel weighting, survey sampling, non-probability sample, coverage bias, selection bias
Pages: 93–124
DOI: 10.57645/20.8080.02.15
- PDF
Small area estimation of the proportion of single-person households: Application to the Spanish Household Budget Survey

María Bugallo, Domingo Morales and María Dolores Esteban
Abstract: Household composition reveals vital aspects of the socioeconomic situation and major changes in developed countries, and mapping the distribution of single-person households is highly relevant and useful for decision-making. Driven by the Spanish Household Budget Survey data, we propose a new statistical methodology for small area estimation of proportions and total counts of single-person households. Estimation domains are defined as crosses of province, sex and age group of the main breadwinner of the household. Predictors are based on area-level zero-inflated Poisson mixed models. Model parameters are estimated by maximum likelihood and mean squared errors by parametric bootstrap. Several simulation experiments are carried out to empirically investigate the properties of these estimators and predictors. Finally, the paper concludes with an application to real data from 2016.
Keywords: Small area estimation, zero-inflated Poisson mixed model, area-level data, Household Budget Survey, single-person household
Pages: 125–152
DOI: 10.57645/20.8080.02.16
- PDF
- ZIP

Volume 47 (2), July-December 2023. Special issue devoted to 9th International Workshop on Compositional Data Analysis (CODAWORK, 2022). Guest editors: Germà Coenders and Javier Palarea-Albaladejo

40 years after Aitchison’s article “The statistical analysis of compositional data”. Where we are and where we are heading

Germà Coenders, Juan José Egozcue, Kamila Fačevicová, Carolina Navarro-López, Javier Palarea-Albaladejo, Vera Pawlowsky-Glahn and Raimon Tolosana-Delgado
Abstract: The year 2022 marked 40 years since Aitchison published the article “The statistical analysis of compositional data”. It is considered to be the foundation of contemporary compositional data analysis. It is time to review what has been accomplished in the field and what needs to be addressed. Astonishingly enough, many aspects seen as challenging in 1982 continue to lead to fruitful scholarly work. We commence with a bibliometric study and continue with some hot topics such as multi-way compositions, compositional regression models, dealing with zero values, non-logratio transformations, new application fields, and a number of current loose ends. Finally, a tentative future research agenda is outlined.
Keywords: Compositional data (CoDa), logratios, Aitchison geometry, multi-way compositions, zero replacement, compositional regression
Pages: 207–228
DOI: 10.57645/20.8080.02.6
- PDF
Subcompositional coherence and a novel proportionality index of parts

Juan José Egozcue and Vera Pawlowsky-Glahn
Abstract: Research in compositional data analysis was motivated by spurious (Pearson) correlation. Spurious results are due to semantic incoherence, but the question of ways to relate parts in a statistically consistent way remains open. To solve this problem, we first define a coherent system of functions with respect to a subcomposition and analyze the space of parts. This leads to understanding why measures like covariance and correlation depend on the subcomposition considered, while measures like the distance between parts are independent of the same. It allows the definition of a novel index of proportionality between parts.
Keywords: Compositional data analysis, Aitchison geometry, simplex, compositional parts, proportionality, dominance, correlation
Pages: 229–244
DOI: 10.57645/20.8080.02.7
- PDF
- ZIP
Compositional covariance shrinkage and regularised partial correlations

Suzanne Jin, Cédric Notredame and Ionas Erb
Abstract: We propose an estimation procedure for covariation in wide compositional data sets. For compositions, widely-used logratio variables are interdependent due to a common reference. Logratio uncorrelated compositions are linearly independent before the unit-sum constraint is imposed. We show how they are used to construct bespoke shrinkage targets for logratio covariance matrices and test a simple procedure for partial correlation estimates on both a simulated and a single-cell gene expression data set. For the underlying counts, different zero imputations are evaluated. The partial correlation induced by the closure is derived analytically. Data and code are available from GitHub.
Keywords: Compositional covariance structure, logratio analysis, partial correlation, James-Stein shrinkage
Pages: 245–268
DOI: 10.57645/20.8080.02.8
- PDF
- ZIP
Simple enough, but not simpler: reconsidering additive logratio coordinates in compositional analysis

Viktorie Nesrstová, Paulína Jašková, Ivana Pavlů, Karel Hron, Javier Palarea-Albaladejo, Aleš Gába, Jana Pelclová and Kamila Fačevicová
Abstract: Compositional data, multivariate observations carrying relative information, are popularly expressed in additive logratio coordinates which are easily interpretable as they use one of the components as ratioing part to produce pairwise logratios. These coordinates are however oblique and they lead to issues when applying multivariate methods on them, including widely-used techniques such as principal component analysis and linear regression. In this paper we propose a way to redefine alr coordinates with respect to an orthonormal system and we also extend the idea to the case of compositional tables. The new approach is demonstrated in an application to movement behavior data.
Keywords: Compositional data, compositional tables, regression, principal component analysis
Pages: 269–294
DOI: 10.57645/20.8080.02.9
- PDF
- ZIP
Classification of probability density functions in the framework of Bayes spaces: methods and applications

Ivana Pavlů, Alessandra Menafoglio, Enea Bongiorno and Karel Hron
Abstract: The process of supervised classification when the data set consists of probability density functions is studied. Due to the relative information contained in densities, it is necessary to convert the functional data analysis methods into an appropriate framework, here represented by the Bayes spaces. This work develops Bayes space counterparts to a set of commonly used functional methods with a focus on classification. Hereby, a clear guideline is provided on how some classification approaches can be adapted for the case of densities. Comparison of the methods is based on simulation studies and real-world applications, reflecting their respective strengths and weaknesses.
Keywords: Probability density functions, Bayes spaces, classification, functional data analysis
Pages: 295–322
DOI: 10.57645/20.8080.02.10
- PDF
Fundamentals of convex optimization for compositional data

Jordi Saperas Riera, Josep Antoni Martín Fernández and Glòria Mateu Figueras
Abstract: Many of the most popular statistical techniques incorporate optimisation problems in their inner workings. A convex optimisation problem is defined as the problem of minimising a convex function over a convex set. When traditional methods are applied to compositional data, misleading and incoherent results could be obtained. In this paper, we fill a gap in the specialised literature by introducing and rigorously defining novel concepts of convex optimisation for compositional data according to the Aitchison geometry. Convex sets and convex functions on the simplex are defined and illustrated.
Keywords: Compositional data, logratio, simplex, proportion, function, convexity, optimisation
Pages: 323–344
DOI: 10.57645/20.8080.02.11
- PDF
Interpretation of coal compositional data on whole-coal versus ash bases through the weighted symmetric pivot coordinates method

Na Xu, Ru Wang, Mark A. Engle, Wei Zhu, Qiang Li and Zhiwei Wang
Abstract: In addition to approaches based on a number of physical and chemical analyses, statistical methods have been commonly used for determining the modes of occurrence of elements in coal. The Pearson correlation coefficient of element concentrations vs. ash yields is the simplest method that has been widely used. Concentrations of elements in coal are usually reported on two bases: whole-coal and ash bases. Coal compositional data on whole-coal basis can be converted back to ash basis. However, in many cases, the correlation between corresponding pairs of elements in coal is inconsistent when reported on whole-coal versus ash bases. Therefore, traditional statistical methods, such as correlation analysis, based on whole-coal and ash bases can sometimes lead to misleading or confusing results. Previous investigations have suggested using logratio variance or related parameters (i.e., stability) to examine these data, as they provide consistent results regardless of the sample basis. However, logratio variance based approaches are unable to distinguish the inverse relationships between parts. To provide more clarity on the relationships between parts, weighted symmetric pivot coordinates are used to analyze the correlation between elements in coal on whole-coal basis and ash basis. To illustrate this approach, 106 late Paleozoic coal samples from the Datanhao and Adaohai coal mines, Daqingshan Coalfield, northern China, are used for performance evaluation. Experimental results show that the weighted symmetric pivot coordinates method is more effective than the stability method in predicting the modes of occurrence of elements in coal for these samples, providing deeper insight than logratio variance based approaches.
Keywords: whole-coal basis, ash basis, correlation, WSPC method
Pages: 345–362
DOI: 10.57645/20.8080.02.12
- PDF
- ZIP

Volume 47 (1), January-June 2023

Transport systems analysis: models and data (invited article)

Jaume Barceló
Abstract: Rapid advancements in new technologies, especially information and communication technologies (ICT), have significantly increased the number of sensors that capture data, namely those embedded in mobile devices. This wealth of data has garnered particular interest in analyzing transport systems, with some researchers arguing that the data alone are sufficient enough to render transport models unnecessary. However, this paper takes a contrary position and holds that models and data are not mutually exclusive but rather depend upon each other. Transport models are built upon established families of optimization and simulation approaches, and their development aligns with the scientific principles of operations research, which involves acquiring knowledge to derive modeling hypotheses. We provide an overview of these modeling principles and their application to transport systems, presenting numerous models that vary according to study objectives and corresponding modeling hypotheses. The data required for building, calibrating, and validating selected models are discussed, along with examples of using data analytics techniques to collect and handle the data supplied by ICT applications. The paper concludes with some comments on current and future trends.
Keywords: Optimization, Simulation, Data Analytics, Traffic Assignment, Traffic Simulation
Pages: 3–80
DOI: 10.57645/20.8080.02.1
- PDF
Data science, analytics and artificial intelligence in e-health: trends, applications and challenges

Juliana Castaneda, Laura Calvet, Sergio Benito, Abtin Tondar and Angel A. Juan
Abstract: More than ever, healthcare systems can use data, predictive models, and intelligent algorithms to optimize their operations and the service they provide. This paper reviews the existing literature regarding the use of data science/analytics methods and artificial intelligence algorithms in healthcare. The paper also discusses how healthcare organizations can benefit from these tools to efficiently deal with a myriad of new possibilities and strategies. Examples of real applications are discussed to illustrate the potential of these methods. Finally, the paper highlights the main challenges regarding the use of these methods in healthcare, as well as some open research lines.
Keywords: e-health, data science, analytics, artificial intelligence, machine learning
Pages: 81–128
DOI: 10.57645/20.8080.02.2
- PDF
Optimal threshold of data envelopment analysis in bankruptcy prediction

Michaela Staňková and David Hampel
Abstract: Data envelopment analysis is not typically used for bankruptcy prediction. However, this paper shows that a correctly set up model for this approach can be very useful in that context. A superefficiency model was applied to classify bankrupt and active manufacturing companies in the European Union. To select an appropriate threshold, the Youden index and the distance from the corner were used in addition to the total accuracy. The results indicate that selecting a suitable threshold improves specificity visibly with only a small reduction in the total accuracy. The thresholds of the best models appear to be robust enough for predictions in different time periods and economic sectors.
Keywords: Bankruptcy prediction, data envelopment analysis, ROC curve, threshold optimization, validation
Pages: 129–150
DOI: 10.57645/20.8080.02.3
- PDF
Data wrangling, computational burden, automation, robustness and accuracy in ecological inference forecasting of RxC tables

Jose M. Pavía and Rafael Romero
Abstract: This paper assesses the two current major alternatives for ecological inference, based on a multinomial-Dirichlet Bayesian model and on mathematical programming. Their performance is evaluated in a database made up of almost 2000 real datasets for which the actual cross-distributions are known. The analysis reveals both approaches as complementary, each one of them performing better in a different area of the simplex space, although with Bayesian solutions deteriorating when the amount of information is scarce. After offering some guidelines regarding the appropriate contexts for employing each one of the algorithms, we conclude with some ideas for exploiting their complementarities.
Keywords: Ecological inference; Voter transitions; US voting rights; two-way contingency tables; ei.MD.bayes; lphom; R-packages
Pages: 151–186
DOI: 10.57645/20.8080.02.4
- PDF
- ZIP
Inference on the symmetry point-based optimal cut-off point and associated sensitivity and specificity with application to SARS-CoV-2 antibody data

Alba María Franco-Pereira, M. Carmen Pardo Llorente, Christos T. Nakas and Benjamin Reiser
Abstract: In the presence of a continuous response test/biomarker, it is often necessary to identify a cut-off point value to aid binary classification between diseased and non-diseased subjects. The symmetry-point approach which maximizes simultaneously both types of correct classification is one way to determine an optimal cut-off point. In this article, we study methods for constructing confidence intervals independently for the symmetry point and its corresponding sensitivity, as well as respective joint nonparametric confidence regions. We illustrate using data on the generation of antibodies elicited two weeks post-injection after the second dose of the Pfizer/BioNTech vaccine in adult healthcare workers.
Keywords: Empirical likelihood function, Empirical chi-square function, Box-Cox transformation, Confidence regions, Sensitivity, Specificity
Pages: 187–204
DOI: 10.57645/20.8080.02.5
- PDF
- ZIP

Volume 46 (2), July-December 2022

Granger causality and time series regression for modelling the migratory dynamics of influenza into Brazil

Aline Foerster Grande, Guilherme Pumi and Gabriela Bettella Cybis
Abstract: In this work we study the problem of modelling and forecasting the dynamics of the influenza virus in Brazil at a given month, from data on reported cases and genetic diversity collected from previous months, in other locations. Granger causality is employed as a tool to assess possible predictive relationships between covariates. For modelling and forecasting purposes, a time series regression approach is applied considering lagged information regarding reported cases and genetic diversity in other regions. Three different models are analysed, including stepwise time series regression and LASSO.
Keywords: Flu, time series regression, variable selection, genetic diversity, Granger causality
Pages: 161–188
DOI: 10.2436/20.8080.02.122
- PDF
- ZIP
Compositional combination and selection of forecasters

Antonio Martín Arroyo and Aránzazu de Juan Fernández
Abstract: The Split-Then-Combine approach has previously been used to generate the weights of forecasts in a combination in the Euclidean space. This paper extends this approach to combine forecasts inside the simplex space, the sample space of positive weights adding up to one. As it turns out, the simplicial statistic given by the sample centre compares favourably against the fixed-weight, average forecast. Besides, we also develop a Combination-After-Selection method to get rid of redundant forecasters. We apply these approaches to make out-of-sample one-step ahead combinations and subcombinations of forecasts for several economic variables. This methodology is particularly useful when the sample size is smaller than the number of forecasts, a case where other methods (e.g., ordinary least squares or principal component analysis) are not applicable.
Keywords: Aitchison geometry, Combination-After-Selection, Dimensionality problem, Simplex, Split-Then-Combine
Pages: 189–216
DOI: 10.2436/20.8080.02.123
- PDF
Missing data analysis and imputation via latent Gaussian Markov random fields

Virgilio Gómez Rubio, Michela Cameletti and Marta Blangiardo
Abstract: This paper recasts the problem of missing values in the covariates of a regression model as a latent Gaussian Markov random field (GMRF) model in a fully Bayesian framework. The proposed approach is based on the definition of the covariate imputation sub-model as a latent effect with a GMRF structure. This formulation works for continuous covariates but for categorical covariates a typical multiple imputation approach is employed. Both techniques can be easily combined for the case in which continuous and categorical variables have missing values. The resulting Bayesian hierarchical model naturally fits within the integrated nested Laplace approximation (INLA) framework, which is used for model fitting. Hence, this work fills an important gap in the INLA methodology as it allows models with missing values in the covariates to be treated. As in any other fully Bayesian framework, by relying on INLA for model fitting it is possible to formulate a joint model for the data, the imputed covariates and their missingness mechanism. In this way, it is possible to tackle the more general problem of assessing the missingness mechanism by conducting a sensitivity analysis on the different alternatives to model the non-observed covariates. Finally, the proposed approach is illustrated in two examples on modeling health risk factors and disease mapping.
Keywords: Imputation, missing values, GMRF, INLA, sensitivity analysis
Pages: 217–244
DOI: 10.2436/20.8080.02.124
- PDF
Alternate-wrapped circular distributions

Savitri Joshi and R. N. Rattihalli
Abstract: To generate a circular distribution, we use the alternate-wrapping technique (unlike the usual wrapping), by wrapping in alternate directions after each single wrapping. The resulting distribution is called the alternate-wrapped distribution. Some general properties and distinctions between the two wrapping schemes are indicated. As an illustration, the alternate-wrapped-exponential distribution and the alternate-wrapped-normal distribution are considered. The moment and maximum likelihood estimators of the parameters of the alternate-wrapped-exponential distribution are obtained and their performance is evaluated using simulation. Maximum likelihood estimators are obtained for the parameters of the alternate-wrapped-normal distribution and a simulation study is conducted. This distribution is also used to analyse a real-life data set and is compared with the wrapped normal distribution.
Keywords: Akaike information criterion, Bayesian information criterion, circular distribution, exponential distribution, trigonometric moments, wrapped normal distribution
Pages: 245–262
DOI: 10.2436/20.8080.02.125
- PDF
- ZIP

Volume 46 (1), January-June 2022

Fifty years later: new directions in Hawkes processes (invited article)

John Worrall, Raiha Browning, Paul Wu and Kerrie Mengersen
Abstract: The Hawkes process is a self-exciting Poisson point process, characterised by a conditional intensity function. Since its introduction fifty years ago, it has been the subject of numerous research directions and continues to inspire new methodological and theoretical developments as well as new applications. This paper marks half a century of interest in Hawkes processes by presenting a snapshot of four state-of-the-art research directions, categorised as frequentist and Bayesian methods, other modelling approaches and notable theoretical developments. A particular focus is on nonparametric approaches, with advances in kernel estimation and computational efficiencies. A survey of real world applications is provided to illustrate the breadth of application of this remarkable approach.
Keywords: Hawkes process, point process, nonparametric
Pages: 3–38
DOI: 10.2436/20.8080.02.116
- PDF
Unusual-event processes for count data

Wanrudee Skulpakdee and Mongkol Hunkrajok
Abstract: At least one unusual event appears in some count datasets. It will lead to a more concentrated (or dispersed) distribution than the Poisson, gamma, Weibull, Conway-Maxwell-Poisson (CMP), and Faddy (1997) models can accommodate. These well-known count models are based on the monotonic rates of interarrival times between successive events. Under the assumption of non-monotonic rates and independent exponential interarrival times, a new class of parametric models for unusual-event (UE) count data is proposed. These models are applied to two empirical applications, the number of births and the number of bids, and yield considerably better results than the above well-known count models.
Keywords: Poisson count model, Gamma count model, Weibull count model, Conway-Maxwell-Poisson count model, Faddy count model
Pages: 39–66
DOI: 10.2436/20.8080.02.117
- PDF
- ZIP
Estimation of finite population distribution function with auxiliary information in a complex survey sampling

Mohsin Abbas and Abdul Haq
Abstract: In this paper, we consider the problem of estimating the finite population cumulative distribution function (CDF) in complex survey sampling, which includes two-stage and three-stage cluster sampling schemes with and without stratification. We propose two new families of CDF estimators using supplementary information on a single auxiliary variable. Explicit mathematical expressions of the biases and mean squared errors of the proposed CDF estimators are developed to the first order of approximation. Real datasets are also considered to support the proposed theory.
Keywords: Ratio estimator, exponential ratio estimator, auxiliary information, stratification, two-stage and three-stage cluster sampling, relative efficiencies, bias, mean-squared error
Pages: 67–94
DOI: 10.2436/20.8080.02.118
- PDF
Penalized spline smoothing using Kaplan-Meier weights in semiparametric censored regression models

Jesus Orbe and Jorge Virto
Abstract: In this article we consider an extension of the penalized splines approach in the context of censored semiparametric modelling using Kaplan-Meier weights to take into account the effect of censorship. We propose an estimation method and develop statistical inferences in the model. Using various simulation studies we show that the performance of the method is quite satisfactory. A real data set is used to illustrate that the proposed method is comparable to parametric approaches when assuming a probability distribution of the response variable and/or the functional form. However, our proposal does not need these assumptions since it avoids model specification problems.
Keywords: Censored data, Kaplan-Meier weights, P-splines, semiparametric models, survival analysis
Pages: 95–114
DOI: 10.2436/20.8080.02.119
- PDF
Topological Data Analysis and its usefulness for precision medicine studies

Raquel Iniesta, Ewan Carr, Mathieu Carrière, Naya Yerolemou, Bertrand Michel and Frédéric Chazal
Abstract: Precision medicine allows the extraction of information from complex datasets to facilitate clinical decision-making at the individual level. Topological Data Analysis (TDA) offers promising tools that complement current analytical methods in precision medicine studies. We introduce the fundamental concepts of the TDA corpus (the simplicial complex, the Mapper graph, the persistence diagram and persistence landscape). We show how these can be used to enhance the prediction of clinical outcomes and to identify novel subpopulations of interest, particularly applied to understand remission of depression in data from the GENDEP clinical trial.
Keywords: Precision medicine, data shape, topology, topological data analysis, persistence diagram, Mapper, persistence landscapes, machine learning
Pages: 115–136
DOI: 10.2436/20.8080.02.120
- PDF
Estimation of cut-off points under complex-sampling design data

Amaia Iparragirre, Irantzu Barrio, Jorge Aramendi and Inmaculada Arostegui
Abstract: In the context of logistic regression models, a cut-off point is usually selected to dichotomize the estimated predicted probabilities based on the model. The techniques proposed in the literature to estimate optimal cut-off points are commonly developed to be applied in simple random samples and their applicability to complex sampling designs could be limited. Therefore, in this work we propose a methodology to incorporate sampling weights in the estimation process of the optimal cut-off points, and we evaluate its performance using a real data-based simulation study. The results suggest the convenience of considering sampling weights for estimating optimal cut-off points.
Keywords: Optimal cut-off points, complex survey data, sampling weights
Pages: 137–158
DOI: 10.2436/20.8080.02.121
- PDF

Volume 45 (2), July-December 2021

Nonparametric estimation of the probability of default with double smoothing

Rebeca Peláez, Ricardo Cao and Juan M. Vilar
Abstract: In this paper, a general nonparametric estimator of the probability of default is proposed and studied. It is derived from an estimator of the conditional survival function for censored data obtained with a double smoothing, on the covariate and on the variable of interest. An empirical study, based on modified real data, illustrates its practical application, and a simulation study shows the performance of the proposed estimator and compares its behaviour with that of estimators smoothed only in the covariate. Asymptotic expressions for the bias and the variance of the probability of default estimator are found and asymptotic normality is proved.
Keywords: Censored data, kernel method, probability of default, risk analysis, survival analysis
Pages: 93–120
DOI: 10.2436/20.8080.02.111
- PDF
Modified almost unbiased two-parameter estimator for the Poisson regression model with an application to accident data

Mustafa I. Alheety, Muhammad Qasim, Kristofer Månsson and B. M. Golam Kibria
Abstract: Due to the large number of accidents negatively affecting the wellbeing of the survivors and their families, a substantial amount of research is conducted to determine the causes of road accidents. This type of data comes in the form of non-negative integers and may be modelled using the Poisson regression model. Unfortunately, the commonly used maximum likelihood estimator is unstable when the explanatory variables of the Poisson regression model are highly correlated. Therefore, this paper proposes a new almost unbiased estimator which reduces the instability of the maximum likelihood estimator and at the same time produces a smaller mean squared error. We study the statistical properties of the proposed estimator, and a simulation study is conducted to compare the performance of the estimators in the mean squared error sense. Finally, Swedish traffic fatality data are analyzed to show the benefit of the proposed method.
Keywords: Applied traffic modeling, Maximum likelihood estimator, mean squared error matrix, Poisson regression, simulation study, traffic fatality
Pages: 121–142
DOI: 10.2436/20.8080.02.112
- PDF
- ZIP
Bayesian hierarchical nonlinear modelling of intra-abdominal volume during pneumoperitoneum for laparoscopic surgery

Gabriel Calvo, Carmen Armero, Virgilio Gómez-Rubio and Guido Mazzinari
Abstract: Laparoscopy is an operation carried out in the abdomen through small incisions with visual control by a camera. This technique needs the abdomen to be insufflated with carbon dioxide to obtain a working space for surgical instruments’ manipulation. Identifying the critical point at which insufflation should be limited is crucial to maximizing surgical working space and minimizing injurious effects. A Bayesian nonlinear growth mixed-effects model for the relationship between the insufflation pressure and the intra-abdominal volume generated is discussed as well as its plausibility to represent the data.
Keywords: Intra-abdominal pressure, logistic growth function, Markov chain Monte Carlo methods, random effects
Pages: 143–162
DOI: 10.2436/20.8080.02.113
- PDF
Median bilinear models in presence of extreme values

Miguel Santolino
Abstract: Bilinear regression models involving a nonlinear interaction term are applied in many fields (e.g., Goodman’s RC model, Lee-Carter mortality model or CAPM financial model). In many of these contexts data often exhibit extreme values. We propose the use of bilinear models to estimate the median of the conditional distribution in the presence of extreme values. The aim of this paper is to provide alternative methods to estimate median bilinear models. A calibration strategy based on an iterative estimation process of a sequence of median linear regressions is developed. Mean and median bilinear models are compared in two applications with extreme observations. The first application deals with simulated data. The second application refers to Spanish mortality data involving years with atypically high mortality (Spanish flu, civil war and HIV/AIDS). The performance of the median bilinear model was superior to that of the mean bilinear model. Median bilinear models may be a good alternative to mean bilinear models in the presence of extreme values when the centre of the conditional distribution is of interest.
Keywords: Outliers, quantile regression, single factor models, nonlinear, multiplicative
Pages: 163–180
DOI: 10.2436/20.8080.02.114
- PDF
Exponentiated power Maxwell distribution with quantile regression and applications

Francisco A. Segovia, Yolanda M. Gómez and Diego I. Gallardo
Abstract: In this paper we introduce an extension of the power Maxwell distribution. We also discuss a reparametrized version of this model applied to quantile regression. Some properties of the model and estimation based on the maximum likelihood method are studied. We also present a simulation study to assess the performance of the estimators in finite samples, and two applications to real data sets to illustrate the model.
Keywords: Maxwell distribution, exponentiated distributions, maximum likelihood, quantile regression
Pages: 181–200
DOI: 10.2436/20.8080.02.115
- PDF
- ZIP

Volume 45 (1), January-June 2021

The radiant diagrams of Florence Nightingale (invited article)

Michael Friendly and RJ Andrews
Abstract: This article is a tribute to the contributions of Florence Nightingale to statistics and statistical graphics on her bicentennial. We start with her most famous “rose” diagram and describe how she came to this graphic, designed to influence medical practice in the British army. But this study takes us backward in time to consider where and when the ideas of radial diagrams arose, why they were useful, and why we call these her “radiant diagrams.”
Keywords: Data visualization, polar area diagram, radial diagram, nursing, sanitation
Pages: 3–18
DOI: 10.2436/20.8080.02.106
- PDF
Verifying compliance with ballast water standards: a decision-theoretic approach

Eliardo G. Costa, Carlos Daniel Paulino and Julio M. Singer
Abstract: We construct credible intervals to estimate the mean organism (zooplankton and phytoplankton) concentration in ballast water via a decision-theoretic approach. To obtain the required optimal sample size, we use a total cost minimization criterion defined as the sum of the sampling cost and the Bayes risk either under a Poisson or a negative binomial model for organism counts, both with a gamma prior distribution. Such credible intervals may be employed to verify whether the ballast water discharged from a ship is in compliance with international standards. We also conduct a simulation study to evaluate the credible interval lengths associated with the proposed optimal sample sizes.
Keywords: Optimal sample size, Bayes risk, Poisson distribution, negative binomial distribution
Pages: 19–32
DOI: 10.2436/20.8080.02.107
- PDF
- ZIP
Bayesian classification for dating archaeological sites via projectile points

Carmen Armero, Gonzalo García-Donato, Joaquín Jimenez-Puerto, Salvador Pardo-Gordó and Joan Bernabeu
Abstract: Dating is a key element for archaeologists. We propose a Bayesian approach to provide chronology to sites that have neither radiocarbon dating nor clear stratigraphy and whose only information comes from lithic arrowheads. This classifier is based on the Dirichlet-multinomial inferential process and posterior predictive distributions. The procedure is applied to predict the period of a set of undated sites located in the east of the Iberian Peninsula during the 4th and 3rd millennia cal BC.
Keywords: Bifacial flint arrowheads, chronological model, Dirichlet-multinomial process, posterior predictive distribution, radiocarbon dating
Pages: 33–46
DOI: 10.2436/20.8080.02.108
- PDF
Joint outlier detection and variable selection using discrete optimization

Mahdi Jammal, Stephane Canu and Maher Abdallah
Abstract: In regression, the quality of estimators is known to be very sensitive to the presence of spurious variables and outliers. Unfortunately, this is a frequent situation when dealing with real data. To handle outlier proneness and achieve variable selection, we propose a robust method performing the outright rejection of discordant observations together with the selection of relevant variables. A natural way to define the corresponding optimization problem is to use the ℓ0 norm and recast it as a mixed integer optimization problem. To retrieve this global solution more efficiently, we suggest the use of additional constraints as well as a clever initialization. To this end, an efficient and scalable non-convex proximal alternate algorithm is introduced. An empirical comparison between the ℓ0 norm approach and its ℓ1 relaxation is presented as well. Results on both synthetic and real data sets show that the mixed integer programming approach and its discrete first order warm start provide high quality solutions.
Keywords: Robust optimization, statistical learning, linear regression, variable selection, outlier detection, mixed integer programming
Pages: 47–66
DOI: 10.2436/20.8080.02.109
- PDF
The unilateral spatial autoregressive process for the regular lattice two-dimensional spatial discrete data

Azmi Chutoo, Dimitris Karlis, Naushad Mamode Khan and Vandna Jowaheer
Abstract: This paper proposes a generalized framework to analyze spatial count data under a unilateral regular lattice structure based on thinning type models. We start from the simple spatial integer-valued auto-regressive model of order 1. We extend this model in certain directions. First, we consider various distributions as choices for the innovation distribution to allow for additional overdispersion. Second, we allow for the use of covariate information, leading to a non-stationary model. Finally, we derive and use other models related to this simple one by considering simplifications of the existing model. Inference is based on a conditional maximum likelihood approach. We provide simulation results under different scenarios to understand the behaviour of the conditional maximum likelihood estimator. A real data application is also provided. Remarks on how the results extend to other families of models are also given.
Keywords: Unilateral, spatial, regular, lattice, thinning
Pages: 67–90
DOI: 10.2436/20.8080.02.110
- PDF

Volume 44 (2), July-December 2020

Independent increments in group sequential tests: a review (invited article)

KyungMann Kim and Anastasios A. Tsiatis
Abstract: In order to apply group sequential methods for interim analysis for early stopping in clinical trials, the joint distribution of test statistics over time has to be known. Often the distribution is multivariate normal or asymptotically so, and an application of group sequential methods requires multivariate integration to determine the group sequential boundaries. However, if the increments between successive test statistics are independent, the multivariate integration reduces to a univariate integration involving simple recursion based on convolution. This allows application of standard group sequential methods. In this paper we review group sequential methods and the development that established independent increments in test statistics for the primary outcomes of longitudinal or failure time data.
Keywords: Failure time data, interim analysis, longitudinal data, clinical trials, repeated significance tests, sequential methods
Pages: 223–264
DOI: 10.2436/20.8080.02.101
- PDF
Discrete generalized half-normal distribution and its applications in quantile regression

Diego I. Gallardo, Emilio Gómez-Déniz and Héctor W. Gómez
Abstract: A new discrete two-parameter distribution is introduced by discretizing a generalized half-normal distribution. The model is useful for fitting overdispersed as well as underdispersed data. The failure function can be decreasing, bathtub shaped or increasing. A reparameterization of the distribution is introduced for use in a regression model based on the median. The behaviour of the maximum likelihood estimates is studied numerically, showing good performance in finite samples. Three real data set applications reveal that the new model can provide a better explanation than some other competitors.
Keywords: Discretizing, generalized half-normal distribution, failure function, health, quantile regression, stochastic order
Pages: 265–284
DOI: 10.2436/20.8080.02.102
- PDF
A simheuristic algorithm for time-dependent waste collection management with stochastic travel times

Aljoscha Gruler, Antoni Perez-Navarro, Laura Calvet and Angel A. Juan
Abstract: A major operational task in city logistics is related to waste collection. Due to large problem sizes and numerous constraints, the optimization of real-life waste collection problems on a daily basis requires the use of metaheuristic solving frameworks to generate near-optimal collection routes in low computation times. This paper presents a simheuristic algorithm for the time-dependent waste collection problem with stochastic travel times. By combining Monte Carlo simulation with a biased randomized iterated local search metaheuristic, time-varying and stochastic travel speeds between different network nodes are accounted for. The algorithm is tested using real instances in a medium-sized city in Spain.
Keywords: Waste collection management, vehicle routing problem, stochastic optimization, simheuristics, biased randomization, case study
Pages: 285–310
DOI: 10.2436/20.8080.02.103
- PDF
Why simheuristics? Benefits, limitations, and best practices when combining metaheuristics with simulation

Manuel Chica, Angel A. Juan, Christopher Bayliss, Oscar Cordón and W. David Kelton
Abstract: Many decision-making processes in our society involve NP-hard optimization problems. The large scale, dynamism, and uncertainty of these problems constrain the potential use of stand-alone optimization methods. The same applies to isolated simulation models, which do not have the potential to find optimal solutions in a combinatorial environment. This paper discusses the utilization of modelling and solving approaches based on the integration of simulation with metaheuristics. These ‘simheuristic’ algorithms, which constitute a natural extension of both metaheuristics and simulation techniques, should be used as a ‘first-resort’ method when addressing large-scale and NP-hard optimization problems under uncertainty, which is a frequent case in real-life applications. We outline the benefits and limitations of simheuristic algorithms, provide numerical experiments that validate our arguments, review some recent publications, and outline the best practices to consider during their design and implementation stages.
Keywords: Simulation, metaheuristics, combinatorial optimization, simheuristics
Pages: 311–334
DOI: 10.2436/20.8080.02.104
- PDF
Modelling multivariate, overdispersed count data with correlated and non-normal heterogeneity effects

Iraj Kazemi and Fatemeh Hassanzadeh
Abstract: Mixed Poisson models are most relevant to the analysis of longitudinal count data in various disciplines. A conventional specification of such models relies on the normality of unobserved heterogeneity effects. In practice, such an assumption may be invalid, and non-normal cases are appealing. In this paper, we propose a modelling strategy by allowing the vector of effects to follow the multivariate skew-normal distribution. It can produce dependence between the correlated longitudinal counts by imposing several structures of mixing priors. In a Bayesian setting, the estimation process proceeds by sampling variates from the posterior distributions. We highlight the usefulness of our approach by conducting a simulation study and analysing two real-life data sets taken from the German Socioeconomic Panel and the US Centers for Disease Control and Prevention. By a comparative study, we indicate that the new approach can produce more reliable results compared to traditional mixed models to fit correlated count data.
Keywords: Bayesian computation, correlated random effects, hierarchical representation, longitudinal data, multivariate skew-normal distribution, over-dispersion
Pages: 335–356
DOI: 10.2436/20.8080.02.105
- PDF
- ZIP

Volume 44 (1), January-June 2020

Small area estimation of additive parameters under unit-level generalized linear mixed models

Tomáš Hobza, Yolanda Marhuenda and Domingo Morales
Abstract: Average incomes and poverty proportions are additive parameters obtained as averages of a given function of an income variable. As the variable income has an asymmetric distribution, it is not properly modelled via normal distributions. When dealing with this type of variable, a first option is to apply transformations that approximate normality. A second option is to use nonsymmetric distributions from the exponential family. This paper proposes unit-level generalized linear mixed models for modelling asymmetric positive variables and for deriving three types of predictors of small area additive parameters, called empirical best, marginal and plug-in. The parameters of the introduced model are estimated by applying the maximum likelihood method to the Laplace approximation of the likelihood. The mean squared errors of the predictors are estimated by parametric bootstrap. The introduced methodology is applied and illustrated under unit-level gamma mixed models. Some simulation experiments are carried out to study the behaviour of the fitting algorithm, the small area predictors and the bootstrap estimator of the mean squared errors. By using data of the Spanish living condition survey of 2013, an application to the estimation of average incomes and poverty proportions in counties of the region of Valencia is given.
Keywords: Average income, poverty proportion, generalized linear mixed models, empirical best predictor, mean squared error, bootstrap
Pages: 3–38
DOI: 10.2436/20.8080.02.93
- PDF
Finding archetypal patterns for binary questionnaires

Ismael Cabero and Irene Epifanio
Abstract: Archetypal analysis is an exploratory tool that explains a set of observations as mixtures of pure (extreme) patterns. If the patterns are actual observations of the sample, we refer to them as archetypoids. For the first time, we propose to use archetypoid analysis for binary observations. This tool can contribute to the understanding of a binary data set, as in the multivariate case. We illustrate the advantages of the proposed methodology in a simulation study and two applications, one exploring objects (rows) and the other exploring items (columns). One is related to determining student skill set profiles and the other to describing item response functions.
Keywords: Dichotomous item test, archetypal analysis, functional data analysis, item response theory, skill profile
Pages: 39–66
DOI: 10.2436/20.8080.02.94
- PDF
- ZIP
Integer constraints for enhancing interpretability in linear regression

Emilio Carrizosa, Alba V. Olivares-Nadal and Pepa Ramírez-Cobo
Abstract: One of the main challenges researchers face is to identify the most relevant features in a prediction model. As a consequence, many regularized methods seeking sparsity have flourished. Although sparse, their solutions may not be interpretable in the presence of spurious coefficients and correlated features. In this paper we aim to enhance interpretability in linear regression in the presence of multicollinearity by: (i) forcing the sign of the estimated coefficients to be consistent with the sign of the correlations between predictors, and (ii) avoiding spurious coefficients so that only significant features are represented in the model. This will be addressed by modelling constraints and adding them to an optimization problem expressing some estimation procedure such as ordinary least squares or the lasso. The so-obtained constrained regression models will become Mixed Integer Quadratic Problems. The numerical experiments carried out on real and simulated datasets show that tightening the search space of some standard linear regression models by adding the constraints modelling (i) and/or (ii) helps to improve the sparsity and interpretability of the solutions with competitive predictive quality.
Keywords: Linear regression, Multicollinearity, Sparsity, Cardinality constraint, Mixed Integer Non Linear Programming
Pages: 67–98
DOI: 10.2436/20.8080.02.95
- PDF
Modelling count data using the logratio-normal-multinomial distribution

Marc Comas-Cufí, Josep Antoni Martín-Fernández, Glòria Mateu-Figueras and Javier Palarea-Albaladejo
Abstract: The logratio-normal-multinomial distribution is a count data model resulting from compounding a multinomial distribution for the counts with a multivariate logratio-normal distribution for the multinomial event probabilities. However, the logratio-normal-multinomial probability mass function does not admit a closed form expression and, consequently, numerical approximation is required for parameter estimation. In this work, different estimation approaches are introduced and evaluated. We concluded that estimation based on a quasi-Monte Carlo Expectation-Maximisation algorithm provides the best overall results. Building on this, the performances of the Dirichlet-multinomial and logratio-normal-multinomial models are compared through a number of examples using simulated and real count data.
Keywords: Count data, Compound probability distribution, Dirichlet Multinomial, Logratio coordinates, Monte Carlo method, Simplex
Pages: 99–126
DOI: 10.2436/20.8080.02.96
- PDF
Bartlett and Bartlett-type corrections for censored data from a Weibull distribution

Tiago M. Magalhães and Diego I. Gallardo
Abstract: In this paper, we obtain the Bartlett factor for the likelihood ratio statistic and the Bartlett-type correction factor for the score and gradient test in censored data from a Weibull distribution. The expressions derived are simple, as we only have to define a few matrices. We conduct an extensive Monte Carlo study to evaluate the performance of the corrected tests in small sample sizes and we show how they improve the original versions. Finally, we apply the results to a real data set with a small sample size illustrating that conclusions about the regressors could be different if corrections were not applied to the three mentioned classical statistics for the hypothesis test.
Keywords: Bartlett correction, censored data, Weibull distribution, chi-squared distribution, maximum likelihood estimates, type I and II censoring
Pages: 127–140
DOI: 10.2436/20.8080.02.97
- PDF
- ZIP
Green hybrid fleets using electric vehicles: solving the heterogeneous vehicle routing problem with multiple driving ranges and loading capacities

Sara Hatami, Majid Eskandarpour, Manuel Chica, Angel A. Juan and Djamila Ouelhadj
Abstract: The introduction of Electric Vehicles (EVs) in modern fleets facilitates green road transportation. However, the driving ranges of EVs are limited by the duration of their batteries, which raises new operational challenges. Hybrid fleets of gas and EVs might be heterogeneous both in loading capacities and in driving-range capabilities, which makes the design of efficient routing plans a difficult task. In this paper, we propose a new Multi-Round Iterated Greedy (MRIG) metaheuristic to solve the Heterogeneous Vehicle Routing Problem with Multiple Driving ranges and loading capacities (HeVRPMD). MRIG uses a successive approximations method to offer the decision maker a set of alternative fleet configurations, with different distance-based costs and green levels. The numerical experiments show that MRIG is able to outperform previous works dealing with the homogeneous version of the problem, which assumes the same loading capacity for all vehicles in the fleet. The numerical experiments also confirm that the proposed MRIG approach extends previous works by solving a more realistic HeVRPMD and provides the decision-maker with fleets with higher green levels.
Keywords: Vehicle Routing Problem, Electric Vehicles, Heterogeneous Fleet, Multiple Driving Ranges, Iterated Greedy heuristic, Successive Approximations Method
Pages: 141–170
DOI: 10.2436/20.8080.02.98
- PDF
Bayesian structured antedependence model proposals for longitudinal data

Edwin Castillo-Carreno, Edilberto Cepeda-Cuervo and Vicente Núñez-Antón
Abstract: An important problem in Statistics is the study of longitudinal data taking into account the effect of other explanatory variables, such as treatments and time and, simultaneously, the incorporation into the model of the time dependence between observations on the same individual. The latter is especially relevant in the case of nonstationary correlations, and nonconstant variances for the different time points at which measurements are taken. Antedependence models constitute a well-known, commonly used set of models that can accommodate this behaviour. These covariance models can include too many parameters and estimation can be a complicated optimization problem requiring the use of complex algorithms and programming. In this paper, a new Bayesian approach to analyse longitudinal data within the context of antedependence models is proposed. This innovative approach takes into account the possibility of having nonstationary correlations and variances, and proposes a robust and computationally efficient estimation method for this type of data. We consider the joint modelling of the mean and covariance structures for the general antedependence model, estimating their parameters in a longitudinal data context. Our Bayesian approach is based on a generalization of the Gibbs sampling and Metropolis-Hastings by blocks algorithm, properly adapted to the antedependence models longitudinal data settings. Finally, we illustrate the proposed methodology by analysing several examples where antedependence models have been shown to be useful: the small mice, the speech recognition and the race data sets.
Keywords: Antedependence models, Bayesian methods, Gibbs sampling, Mean-covariance modelling, Nonstationary correlation
Pages: 171–200
DOI: 10.2436/20.8080.02.99
- PDF
- ZIP
On interpretations of tests and effect sizes in regression models with a compositional predictor

Germà Coenders and Vera Pawlowsky-Glahn
Abstract: Compositional data analysis is concerned with the relative importance of positive variables, expressed through their log-ratios. The literature has proposed a range of manners to compute log-ratios, some of whose interrelationships have never been reported when used as explanatory variables in regression models. This article shows their similarities and differences in interpretation based on the notion that one log-ratio has to be interpreted keeping all others constant. The article shows that centred, additive, pivot, balance and pairwise log-ratios lead to simple reparametrizations of the same model which can be combined to provide useful tests and comparable effect size estimates.
Keywords: Compositional regression models, CoDa, composition as explanatory, centred log-ratios, pivot coordinates, pairwise log-ratios, additive log-ratios, effect size
Pages: 201–220
DOI: 10.2436/20.8080.02.100
- PDF

Volume 43 (2), July-December 2019

Modelling human network behaviour using simulation and optimization tools: the need for hybridization

Aljoscha Gruler, Jesica de Armas, Angel A. Juan and David Goldsman
Abstract: The inclusion of stakeholder behaviour in Operations Research / Industrial Engineering (OR/IE) models has gained much attention in recent years. Behavioural and cognitive traits of people and groups have been integrated in simulation models (mainly through agent-based approaches) as well as in optimization algorithms. However, especially the influence of relations between different actors in human networks is a broad and interdisciplinary topic that has not yet been fully investigated. This paper analyses, from an OR/IE point of view, the existing literature on behaviour-related factors in human networks. This review covers different application fields, including: supply chain management, public policies in emergency situations, and Internet-based human networks. The review reveals that the methodological approach of choice (either simulation or optimization) is highly dependent on the application area. However, an integrated approach combining simulation and optimization is rarely used. Thus, the paper proposes the hybridization of simulation with optimization as one of the best strategies to incorporate human behaviour in human networks and the resulting uncertainty, randomness, and dynamism in related OR/IE models.
Keywords: Modelling human behaviour, human networks, simulation, optimization, simheuristics
Pages: 193–222
DOI: 10.2436/20.8080.02.85
- PDF
Tail risk measures using flexible parametric distributions

José María Sarabia, Montserrat Guillen, Helena Chuliá and Faustino Prieto
Abstract: We propose a new type of risk measure for non-negative random variables that focuses on the tail of the distribution. The measure is inspired by general parametric distributions that are well-known in the statistical analysis of the size of income. We derive simple expressions for the conditional moments of these distributions, and we show that they are suitable for analysis of tail risk. The proposed method can easily be implemented in practice because it provides a simple one-step way to compute value-at-risk and tail value-at-risk. We show an illustration with currency exchange data. The data and implementation are open access for reproducibility.
Keywords: Moments, multi-period risk assessment, value-at-risk
Pages: 223–236
DOI: 10.2436/20.8080.02.86
- PDF
False discovery rate control for grouped or discretely supported p-values with application to a neuroimaging study

Hien Nguyen, Yohan Yee, Geoffrey McLachlan and Jason Lerch
Abstract: False discovery rate (FDR) control is important in multiple testing scenarios that are common in neuroimaging experiments, and p-values from such experiments may often arise from some discretely supported distribution or may be grouped in some way. Two situations that may lead to discretely supported distributions are when the p-values arise from Monte Carlo tests or from permutation tests. Grouped p-values may occur when p-values are quantized for storage. In the neuroimaging context, grouped p-values may occur when data are stored in an integer-encoded form. We present a method for FDR control that is applicable in cases where only p-values are available for inference, and when those p-values are discretely supported or grouped. We assess our method via a comprehensive set of simulation scenarios and find that our method can outperform commonly used FDR control schemes in various cases. An implementation to a mouse imaging data set is used as an example to demonstrate the applicability of our approach.
Keywords: Censored data, data quantization, discrete support, empirical-Bayes, false discovery rate control, grouped data, incompletely observed data, mixture model
Pages: 237–258
DOI: 10.2436/20.8080.02.87
- PDF
- ZIP
Kernel distribution estimation for grouped data

Miguel Reyes, Mario Francisco-Fernández, Ricardo Cao and Daniel Barreiro-Ures
Abstract: Interval-grouped data appear when the observations are not obtained in continuous time, but monitored in periodical time instants. In this framework, a nonparametric kernel distribution estimator is proposed and studied. The asymptotic bias, variance and mean integrated squared error of the new approach are derived. From the asymptotic mean integrated squared error, a plug-in bandwidth is proposed. Additionally, a bootstrap selector to be used in this context is designed. Through a comprehensive simulation study, the behaviour of the estimator and the bandwidth selectors considering different scenarios of data grouping is shown. The performance of the different approaches is also illustrated with a real grouped emergence data set of Avena sterilis (wild oat).
Keywords: Bootstrap bandwidth, cumulative distribution function estimator, interval data, plug-in bandwidth
Pages: 259–288
DOI: 10.2436/20.8080.02.88
- PDF
- ZIP
Detecting outliers in multivariate volatility models: A wavelet procedure

Aurea Grané, Belén Martín-Barragán and Helena Veiga
Abstract: It is well known that outliers can affect both the estimation of parameters and volatilities when fitting a univariate GARCH-type model. Similar biases and impacts are expected to be found on correlation dynamics in the context of multivariate time series. We study the impact of outliers on the estimation of correlations when fitting multivariate GARCH models and propose a general detection algorithm based on wavelets, that can be applied to a large class of multivariate volatility models. Its effectiveness is evaluated through a Monte Carlo study before it is applied to real data. The method is both effective and reliable, since it detects very few false outliers.
Keywords: Correlations, multivariate GARCH models, outliers, wavelets
Pages: 289–316
DOI: 10.2436/20.8080.02.89
- PDF
A class of goodness-of-fit tests for circular distributions based on trigonometric moments

Sreenivasa Rao Jammalamadaka, M. Dolores Jiménez-Gamero and Simos G. Meintanis
Abstract: We propose a class of goodness-of-fit test procedures for arbitrary parametric families of circular distributions with unknown parameters. The tests make use of the specific form of the characteristic function of the family being tested, and are shown to be consistent. We derive the asymptotic null distribution and suggest that the new method be implemented using a bootstrap resampling technique that approximates this distribution consistently. As an illustration, we then specialize this method to testing whether a given data set is from the von Mises distribution, a model that is commonly used and for which considerable theory has been developed. An extensive Monte Carlo study is carried out to compare the new tests with other existing omnibus tests for this model. An application involving five real data sets is provided in order to illustrate the new procedure.
Keywords: Goodness-of-fit, Circular data, Empirical characteristic function, Maximum likelihood estimation, von Mises distribution
Pages: 317–336
DOI: 10.2436/20.8080.02.90
- PDF
- ZIP
Data envelopment analysis efficiency of public services: bootstrap simultaneous confidence region

Jesús A. Tapia, Bonifacio Salvador and Jesús M. Rodríguez
Abstract: Public services, such as higher education, medical services, libraries or public administration offices, provide services to their customers. To obtain opinion-satisfaction indices of customers, it would be necessary to survey all the customers of the service (census), which is impossible. What is possible is to estimate the indices by surveying a random customer sample. The efficiency obtained with the classic data envelopment analysis models, considering the opinion indices of the customers of the public service as output data estimated with a user sample, will be an estimation of the obtained efficiency if the census is available. This paper proposes a bootstrap methodology to build a confidence region to simultaneously estimate the population data envelopment analysis efficiency score vector of a set of public service-producing units, with a fixed confidence level and using deterministic input data and estimated customer opinion indices as output data. The usefulness of the result is illustrated by describing a case study comparing the efficiency of libraries.
Keywords: Data envelopment analysis, sampling survey research, public sector, bootstrap, simultaneous confidence region
Pages: 337–354
DOI: 10.2436/20.8080.02.91
- PDF
Forecasting with two generalized integer-valued autoregressive processes of order one in the mutual random environment

Predrag M. Popović, Petra N. Laketa and Aleksandar S. Nastić
Abstract: In this article, we consider two univariate random environment integer-valued autoregressive processes driven by the same hidden process. A model of this kind is capable of describing two correlated non-stationary counting time series using its marginal variable parameter values. The properties of the model are presented. Some parameter estimators are described and implemented on the simulated time series. The introduction of this bivariate integer-valued autoregressive model with a random environment is justified at the end of the paper, where its real-life data-fitting performance was checked and compared to some other appropriate models. The forecasting properties of the model are tested on a few data sets, and forecasting errors are discussed through the residual analysis of the components that comprise the model.
Keywords: INAR, negative binomial thinning, random states, time series of count, non-stationary process
Pages: 355–384
DOI: 10.2436/20.8080.02.92
- PDF

Volume 43 (1), January-June 2019

A simheuristic for routing electric vehicles with limited driving ranges and stochastic travel times

Lorena Reyes-Rubiano, Daniele Ferone, Angel A. Juan and Javier Faulin
Abstract: Green transportation is becoming relevant in the context of smart cities, where the use of electric vehicles represents a promising strategy to support sustainability policies. However, the use of electric vehicles shows some drawbacks as well, such as their limited driving-range capacity. This paper analyses a realistic vehicle routing problem in which both driving-range constraints and stochastic travel times are considered. Thus, the main goal is to minimize the expected time-based cost required to complete the freight distribution plan. In order to design reliable routing plans, a simheuristic algorithm is proposed. It combines Monte Carlo simulation with a multi-start metaheuristic, which also employs biased-randomization techniques. By including simulation, simheuristics extend the capabilities of metaheuristics to deal with stochastic problems. A series of computational experiments are performed to test our solving approach as well as to analyse the effect of uncertainty on the routing plans.
Keywords: Vehicle routing problem, electric vehicles, green transport and logistics, smart cities, simheuristics, biased-randomized heuristics
Pages: 3–24
DOI: 10.2436/20.8080.02.77
- PDF
New L²-type exponentiality tests

Marija Cuparić, Bojana Milosević and Marko Obradović
Abstract: We introduce new consistent and scale-free goodness-of-fit tests for the exponential distribution based on the Puri-Rubin characterization. For the construction of test statistics we employ weighted L² distance between V-empirical Laplace transforms of random variables that appear in the characterization. We derive the asymptotic behaviour under the null hypothesis as well as under fixed alternatives. We compare our tests, in terms of the Bahadur efficiency, to the likelihood ratio test, as well as some recent characterization based goodness-of-fit tests for the exponential distribution. We also compare the power of our tests to the power of some recent and classical exponentiality tests. According to both criteria, our tests are shown to be strong and outperform most of their competitors.
Keywords: Goodness-of-fit, exponential distribution, Laplace transform, Bahadur efficiency, V-statistics with estimated parameters
Pages: 25–50
DOI: 10.2436/20.8080.02.78
- PDF
Bayesian joint spatio-temporal analysis of multiple diseases

Virgilio Gómez-Rubio, Francisco Palmí-Perales, Gonzalo López-Abente, Rebeca Ramis-Prieto and Pablo Fernández-Navarro
Abstract: In this paper we propose a Bayesian hierarchical spatio-temporal model for the joint analysis of multiple diseases which includes specific and shared spatial and temporal effects. Dependence on shared terms is controlled by disease-specific weights so that their posterior distribution can be used to identify diseases with similar spatial and temporal patterns. The model proposed here has been used to study three different causes of death (oral cavity, esophagus and stomach cancer) in Spain at the province level. Shared and specific spatial and temporal effects have been estimated and mapped in order to study similarities and differences among these causes. Furthermore, estimates using Markov chain Monte Carlo and the integrated nested Laplace approximation are compared.
Keywords: Bayesian modelling, Joint modelling, Multivariate disease mapping, Shared components, Spatio-temporal epidemiology
Pages: 51–74
DOI: 10.2436/20.8080.02.79
- PDF
Internalizing negative externalities in vehicle routing problems through green taxes and green tolls

Adrián Serrano-Hernández and Javier Faulín
Abstract: Road freight transportation includes various internal and external costs that need to be accounted for in the construction of efficient routing plans. Typically, the resulting optimization problem is formulated as a vehicle routing problem in any of its variants. While the traditional focus of the vehicle routing problem was the minimization of internal routing costs such as travel distance or duration, numerous approaches to include external factors related to environmental routing aspects have been recently discussed in the literature. However, internal and external routing costs are often treated as competing objectives. This paper discusses the internalization of external routing costs through the consideration of green taxes and green tolls. Numeric experiments with a biased-randomization savings algorithm show benefits of combining internal and external costs in delivery route planning.
Keywords: Vehicle routing problem, biased randomization, green logistics, negative road externalities, internalization
Pages: 75–94
DOI: 10.2436/20.8080.02.80
- PDF
A probabilistic model for explaining the points achieved by a team in football competition. Forecasting and regression with applications to the Spanish competition

Emilio Gómez-Déniz, Nancy Dávila Cárdenes and José María Pérez Sánchez
Abstract: In the last decades, a lot of research papers applying statistical methods for analysing sports data have been published. Football, also called soccer, is one of the most popular sports all over the world. It is organised in national championships in a round robin format in which the team reaching the most points at the end of the tournament wins the competition. The aim of this work is to develop a suitable probability model for studying the points achieved by a team in a football match. For this purpose, we built a discrete probability distribution taking the values zero for a loss, one for a draw and three for a victory. We test its performance using data from the Spanish Football League (First division) during the 2013-14 season. Furthermore, the model provides an attractive framework for predicting points and incorporating covariates in order to study the factors affecting the points achieved by the teams.
Keywords: Covariate, football data, forecasting, regression, sport statistics, truncated distribution, weighted distribution
Pages: 95–112
DOI: 10.2436/20.8080.02.81
- PDF
Automatic regrouping of strata in the goodness-of-fit chi-square test

Vicente Núñez-Antón, Juan Manuel Pérez-Salamero González, Marta Regúlez-Castillo, Manuel Ventura-Marco and Carlos Vidal-Meliá
Abstract: Pearson’s chi-square test is widely employed in social and health sciences to analyse categorical data and contingency tables. For the test to be valid, the sample size must be large enough to provide a minimum number of expected elements per category. This paper develops functions for regrouping strata automatically, thus enabling the goodness-of-fit test to be performed within an iterative procedure. The usefulness and performance of these functions are illustrated by means of a simulation study and the application to different datasets. Finally, the iterative use of the functions is applied to the Continuous Sample of Working Lives, a dataset that has been used in a considerable number of studies, especially on labour economics and the Spanish public pension system.
Keywords: Goodness-of-fit chi-square test, statistical software, Visual Basic for Applications, Mathematica, Continuous Sample of Working Lives
Pages: 113–142
DOI: 10.2436/20.8080.02.83
- PDF
- ZIP
On the optimism correction of the area under the receiver operating characteristic curve in logistic prediction models

Amaia Iparragirre, Irantzu Barrio and María Xosé Rodríguez-Álvarez
Abstract: When the same data are used to fit a model and estimate its predictive performance, this estimate may be optimistic, and its correction is required. The aim of this work is to compare the behaviour of different methods proposed in the literature when correcting for the optimism of the estimated area under the receiver operating characteristic curve in logistic regression models. A simulation study (where the theoretical model is known) is conducted considering different numbers of covariates, sample sizes, prevalences and correlations among covariates. The results suggest the use of k-fold cross-validation with replication and bootstrap.
Keywords: Prediction models, logistic regression, area under the receiver operating characteristic curve, validation, bootstrap
Pages: 145–162
DOI: 10.2436/20.8080.02.82
- PDF
Efficient algorithms for constructing D- and I-optimal exact designs for linear and non-linear models in mixture experiments

Raúl Martín Martín, Irene García-Camacha Gutiérrez and Bernard Torsney
Abstract: The problem of finding optimal exact designs is more challenging than that of approximate optimal designs. In the present paper, we develop two efficient algorithms to numerically construct exact designs for mixture experiments. The first is a novel approach to the well-known multiplicative algorithm based on sets of permutation points, while the second uses genetic algorithms. Using (i) linear and non-linear models, (ii) D- and I-optimality criteria, and (iii) constraints on the ingredients, both approaches are explored through several practical problems arising in the chemical, pharmaceutical and oil industry.
Keywords: Optimal experimental design, D-optimality, I-optimality, mixture experiments, multiplicative algorithm, genetic algorithm, exact designs
Pages: 163–190
DOI: 10.2436/20.8080.02.84
- PDF
- ZIP

Volume 42 (2), July-December 2018

Evidence functions: a compositional approach to information (invited article)

Juan-José Egozcue and Vera Pawlowsky-Glahn
Abstract: The discrete case of Bayes’ formula is considered the paradigm of information acquisition. Prior and posterior probability functions, as well as likelihood functions, called evidence functions, are compositions following the Aitchison geometry of the simplex, and have thus vector character. Bayes’ formula becomes a vector addition. The Aitchison norm of an evidence function is introduced as a scalar measurement of information. A fictitious fire scenario serves as illustration. Two different inspections of affected houses are considered. Two questions are addressed: (a) which is the information provided by the outcomes of inspections, and (b) which is the most informative inspection.
Keywords: Evidence function, Bayes’ formula, Aitchison geometry, compositions, orthonormal basis, simplex, scalar information
Pages: 101–124
DOI: 10.2436/20.8080.02.71
- PDF
A contingency table approach based on nearest neighbour relations for testing self and mixed correspondence

Elvan Ceyhan
Abstract: Nearest neighbour methods are employed for drawing inferences about spatial patterns of points from two or more classes. We introduce a new pattern called correspondence which is motivated by (spatial) niche/habitat specificity and segregation, and define an associated contingency table called a correspondence contingency table, and examine the relation of correspondence with the motivating patterns (namely, segregation and niche specificity). We propose tests based on the correspondence contingency table for testing self and mixed correspondence and determine the appropriate null hypotheses and the underlying conditions appropriate for these tests. We compare finite sample performance of the tests in terms of empirical size and power by extensive Monte Carlo simulations and illustrate the methods on two artificial data sets and one real-life ecological data set.
Keywords: Association, complete spatial randomness, habitat/niche specificity, independence, random labelling, segregation
Pages: 125–158
DOI: 10.2436/20.8080.02.72
- PDF
Efficiency of propensity score adjustment and calibration on the estimation from non-probabilistic online surveys

Ramón Ferri-García and Maria del Mar Rueda
Abstract: One of the main sources of inaccuracy in modern survey techniques, such as online and smartphone surveys, is the absence of an adequate sampling frame that could provide a probabilistic sampling. This kind of data collection leads to the presence of high amounts of bias in final estimates of the survey, especially if the estimated variables (also known as target variables) have some influence on the decision of the respondent to participate in the survey. Various correction techniques, such as calibration and propensity score adjustment or PSA, can be applied to remove the bias. This study attempts to analyse the efficiency of correction techniques in multiple situations, applying a combination of propensity score adjustment and calibration on both types of variables (correlated and not correlated with the missing data mechanism) and testing the use of a reference survey to get the population totals for calibration variables. The study was performed using a simulation of a fictitious population of potential voters and a real volunteer survey aimed at a population for which a complete census was available. Results showed that PSA combined with calibration results in a bias removal considerably larger when compared with calibration with no prior adjustment. Results also showed that using population totals from the estimates of a reference survey instead of the available population data does not make a difference in the accuracy of the estimates, although it can contribute to slightly increasing the variance of the estimator.
Keywords: Online surveys, Smartphone surveys, propensity score adjustment, calibration, simulation
Pages: 159–182
DOI: 10.2436/20.8080.02.73
- PDF
Field rules and bias in random surveys with quota samples. An assessment of CIS surveys

José M. Pavía and Cristina Aybar
Abstract: Surveys applying quota sampling in their final step are widely used in opinion and market research all over the world. This is also the case in Spain, where the surveys carried out by CIS (a public institution for sociological research supported by the government) have become a point of reference. The rules used by CIS to select individuals within quotas, however, could be improved as they lead to biases in age distributions. Analysing more than 545,000 responses collected in the 220 monthly barometers conducted between 1997 and 2016 by CIS, we compare the empirical distributions of the barometers with the expected distributions from the sample design and/or target populations. Among other results, we find, as a consequence of the rules used, significant overrepresentations in the observed proportions of respondents with ages equal to the minimum and maximum of each quota (age and gender group). Furthermore, in line with previous literature, we also note a significant overrepresentation of ages ending in zero. After offering simple solutions to avoid all these biases, we discuss some of their consequences for modelling and inference, as well as the limitations and potential of CIS data.
Keywords: Centre for Sociological Research, quota sampling, fieldwork rules, age and gender groups, inter-quota distributions, intra-quota distributions
Pages: 183–206
DOI: 10.2436/20.8080.02.74
- PDF
- ZIP
Effect of agro-climatic conditions on near infrared spectra of extra virgin olive oils

María Isabel Sánchez-Rodríguez, Elena M. Sánchez-López, José Mª Caridad, Alberto Marinas and Francisco José Urbano
Abstract: Authentication of extra virgin olive oil requires fast and cost-effective analytical procedures, such as near infrared spectroscopy. Multivariate analysis and chemometrics have been successfully applied in several papers to gather qualitative and quantitative information of extra virgin olive oils from near infrared spectra. Moreover, there are many examples in the literature analysing the effect of agro-climatic conditions on food content, in general, and in olive oil components, in particular. But the majority of these studies considered a factor, a non-numerical variable, containing this meteorological information. The present work uses all the agro-climatic data with the aim of highlighting the linear relationships between them and the near infrared spectra. The study begins with a graphical motivation, continues with a bivariate analysis and, finally, applies redundancy analysis to extend and confirm the previous conclusions.
Keywords: Extra virgin olive oil, infrared spectroscopy, agro-climatic data, linear correlations, redundancy analysis
Pages: 209–236
DOI: 10.2436/20.8080.02.75
- PDF
Poisson excess relative risk models: new implementations and software

Manuel Higueras and Adam Howes
Abstract: Two new implementations for fitting Poisson excess relative risk methods are proposed for assumed simple models. This allows for estimation of the excess relative risk associated with a unique exposure, where the background risk is modelled by a unique categorical variable, for example gender or attained age levels. Additionally, it is shown how to fit general Poisson linear relative risk models in R. Both simple methods and the R fitting are illustrated in three examples. The first two examples are from the radiation epidemiology literature. Data in the third example are randomly generated with the purpose of sharing it jointly with the R scripts.
Keywords: Radiation epidemiology, Poisson non-linear regression, improper priors, R programming
Pages: 237–252
DOI: 10.2436/20.8080.02.76
- PDF
- ZIP

Volume 42 (1), January-June 2018

Using a Bayesian change-point statistical model with autoregressive terms to study the monthly number of dispensed asthma medications by public health services

José André Mota de Queiroz, Davi Casale Aragon, Luane Marques de Mello, Isolde Terezinha Santos Previdelli and Edson Martinez
Abstract: In this paper, a Bayesian analysis of a time series in the presence of a random change-point and autoregressive terms is proposed. The development of this model was motivated by a data set related to the monthly number of asthma medications dispensed by the public health services of Ribeirão Preto, Southeast Brazil, from 1999 to 2011. A pronounced increasing trend has been observed from 1999 to a specific change-point, with a subsequent decrease until the end of the series. In order to obtain estimates for the parameters of interest, a Bayesian Markov Chain Monte Carlo (MCMC) simulation procedure using the Gibbs sampler algorithm was developed. The Bayesian model with autoregressive terms of order 1 fits the data well, allowing the change-point to be estimated at July 2007, and probably reflecting the results of the new health policies and previously adopted programs directed toward patients with asthma. The results imply that the present model is useful to analyse the monthly number of dispensed asthma medications and it can be used to describe a broad range of epidemiological time series data where a change-point is present.
Keywords: Time series, regression models, Bayesian methods, change-point model, epidemiological data
Pages: 3–26
DOI: 10.2436/20.8080.02.66
- PDF
Evaluating the complexity of some families of functional data

Enea Bongiorno, Aldo Goia and Philippe Vieu
Abstract: In this paper we study the complexity of a functional data set drawn from particular processes by means of a two-step approach. The first step considers a new graphical tool for assessing to which family the data belong: the main aim is to detect whether a sample comes from a monomial or an exponential family. This first tool is based on a nonparametric kNN estimation of small ball probability. Once the family is specified, the second step consists in evaluating the extent of complexity by estimating some specific indexes related to the assigned family. It turns out that the developed methodology is fully free from assumptions on model, distribution as well as dominating measure. Computational issues are addressed by means of simulations and finally the method is applied to analyse a real dataset of financial curves.
Keywords: Small ball probability, log-Volugram, random processes, complexity class, complexity index, knn estimation, functional data analysis
Pages: 27–44
DOI: 10.2436/20.8080.02.67
- PDF
Preliminary test and Stein-type shrinkage LASSO-based estimators

Mina Norouzirad and Mohammad Arashi
Abstract: Suppose the regression vector-parameter is subjected to lie in a subspace hypothesis in a linear regression model. In situations where the use of least absolute shrinkage and selection operator (LASSO) is desired, we propose a restricted LASSO estimator. To improve its performance, LASSO-type shrinkage estimators are also developed and their asymptotic performance is studied. For numerical analysis, we used relative efficiency and mean prediction error to compare the estimators; the shrinkage estimators performed better than the LASSO.
Keywords: Double shrinking, LASSO, preliminary test LASSO, restricted LASSO, Stein-type shrinkage LASSO
Pages: 45–58
DOI: 10.2436/20.8080.02.68
- PDF
Heteroscedasticity irrelevance when testing means difference

Pablo Flores and Jordi Ocaña
Abstract: Heteroscedasticity produces a lack of type I error control in Student’s t test for difference between means. Pretesting for it (e.g., by means of Levene’s test) should be avoided as this also induces type I error. These pretests are inadequate for their objective: not rejecting the null hypothesis is not a proof of homoscedasticity; and rejecting it may simply suggest an irrelevant heteroscedasticity. We propose a method to establish irrelevance limits for the ratio of variances. In conjunction with a test for dispersion equivalence, this appears to be a more affordable pretesting strategy.
Keywords: Homoscedasticity, equivalence test, indifference zone, pretest, Student’s t test
Pages: 59–72
DOI: 10.2436/20.8080.02.69
- PDF
Empirical analysis of daily cash flow time-series and its implications for forecasting

Francisco Salas-Molina, Juan A. Rodríguez-Aguilar, Joan Serrà, Montserrat Guillen and Francisco J. Martin
Abstract: Usual assumptions on the statistical properties of daily net cash flows include normality, absence of correlation and stationarity. We provide a comprehensive study based on a real-world cash flow data set showing that: (i) the usual assumptions of normality, absence of correlation and stationarity hardly ever hold; (ii) non-linearity is often relevant for forecasting; and (iii) typical data transformations have little impact on linearity and normality. This evidence may lead one to consider a more data-driven approach such as time-series forecasting in an attempt to provide cash managers with expert systems in cash management.
Keywords: Statistics, forecasting, cash flow, non-linearity, time-series
Pages: 73–98
DOI: 10.2436/20.8080.02.70
- PDF

Volume 41 (2), July-December 2017

Hierarchical models with normal and conjugate random effects: a review (invited article)

Geert Molenberghs, Geert Verbeke and Clarice G.B. Demétrio
Abstract: Molenberghs, Verbeke, and Demétrio (2007) and Molenberghs et al. (2010) proposed a general framework to model hierarchical data subject to within-unit correlation and/or overdispersion. The framework extends classical overdispersion models as well as generalized linear mixed models. Subsequent work has examined various aspects that lead to the formulation of several extensions. A unified treatment of the model framework and key extensions is provided. Particular extensions discussed are: explicit calculation of correlation and other moment-based functions, joint modelling of several hierarchical sequences, versions with direct marginally interpretable parameters, zero-inflation in the count case, and influence diagnostics. The basic models and several extensions are illustrated using a set of key examples, one per data type (count, binary, multinomial, ordinal, and time-to-event).
Keywords: Conjugacy, frailty, joint modelling, marginalized multilevel model, mixed model, overdispersion, underdispersion, variance component, zero-inflation.
Pages: 191–254
DOI: 10.2436/20.8080.02.58
- PDF
A bivariate response model for studying the marks obtained in two jointly-dependent modules in higher education

Emilio Gómez-Déniz, Nancy Dávila Cárdenes and María D. García Artiles
Abstract: We study the factors which may affect students’ marks in two modules, mathematics and statistics, taught consecutively in the first year of a Business Administration Studies degree course. For this purpose, we introduce a suitable bivariate regression model in which the dependent variables have bounded support and the marginal means are functions of explanatory variables. The marginal probability density functions have a classical beta distribution. Simulation experiments were performed to observe the behaviour of the maximum likelihood estimators. Comparisons with univariate beta regression models show the proposed bivariate regression model to be superior.
Keywords: Beta distribution, bivariate beta distribution, conditional distributions, covariate, marginal distributions, regression, mathematics, statistics, business studies.
Pages: 255–276
DOI: 10.2436/20.8080.02.59
- PDF
Bayesian hierarchical models for analysing the spatial distribution of bioclimatic indices

Xavier Barber, David Conesa, Antonio López-Quílez, Asunción Mayoral, Javier Morales and Antoni Barber
Abstract: A methodological approach for modelling the spatial distribution of bioclimatic indices is proposed in this paper. The value of the bioclimatic index is modelled with a hierarchical Bayesian model that incorporates both structured and unstructured random effects. Selection of prior distributions is also discussed in order to better incorporate any possible prior knowledge about the parameters that could refer to the particular characteristics of bioclimatic indices. MCMC methods and distributed programming are used to obtain an approximation of the posterior distribution of the parameters and also the posterior predictive distribution of the indices. One main outcome of the proposal is the spatial bioclimatic probability distribution of each bioclimatic index, which allows researchers to obtain the probability of each location belonging to different bioclimates. The methodology is evaluated on two indices in the Island of Cyprus.
Keywords: Bioclimatology, geostatistics, parallel computation, spatial prediction.
Pages: 277–296
DOI: 10.2436/20.8080.02.60
- PDF
The Pareto IV power series cure rate model with applications

Diego I. Gallardo, Yolanda M. Gómez, Barry C. Arnold and Héctor W. Gómez
Abstract: Cutaneous melanoma is thought to be triggered by intense, occasional exposure to ultraviolet radiation, either from the sun or tanning beds, especially in people who are genetically predisposed to the disease. When skin cells are damaged by ultraviolet light in this way, often showing up as a sunburn, they are more prone to genetic defects that cause them to rapidly multiply and form potentially fatal (malignant) tumors. Melanoma originates in a type of skin cell called a melanocyte; such cells help produce the pigments of our skin, hair, and eyes. We propose a new cure rate survival regression model for predicting cutaneous melanoma. We assume that the unknown number of competing causes that can influence the survival time is governed by a power series distribution and that the time until the tumor cells are activated follows the Pareto IV distribution. The parameter estimation is based on the EM algorithm which for this model can be implemented in a simple way in computational terms. Simulation studies are presented, showing the good performance of the proposed estimation procedure. Finally, two real applications related to a cutaneous melanoma and melanoma data sets are presented.
Keywords: Competing risks, cure rate models, EM algorithm, Pareto IV distribution, power series distribution.
Pages: 297–318
DOI: 10.2436/20.8080.02.61
- PDF
- ZIP
Estimating regional social accounting matrices to analyse rural development

Alfredo Mainar-Causapé, José Manuel Rueda Cantuche, M. Alejandro Cardenete, Patricia Fuentes-Saguar, M. Carmen Delgado, Fabien Santini, Sébastien Mary and Sergio Gómez y Paloma
Abstract: This paper has two complementary objectives: on the one hand, it introduces the EURO method for the estimation of (regional) Social Accounting Matrices. This method is widely used by Eurostat for the estimation of missing national Supply, Use and Input-output tables but it has not been used before within the context of social accounting matrices or of regional statistics and/or regional impact analyses. On the other hand, this work discusses the possibility of producing non-survey based regional Social Accounting Matrices that may eventually allow the user to carry out impact analyses such as those of rural development policies, among others. The analysis is carried out for 12 selected European regions based on clusters.
Keywords: Social accounting matrices, rural development, European regions, impact analysis.
Pages: 319–346
DOI: 10.2436/20.8080.02.62
- PDF
Joint models for longitudinal counts and left-truncated time-to event data with applications to health insurance

Xavier Piulachs, Ramon Alemany, Montserrat Guillén and Dimitris Rizopoulos
Abstract: Aging societies have given rise to important challenges in the field of health insurance. Elderly policyholders need to be provided with fair premiums based on their individual health status, whereas insurance companies want to plan for the potential costs of tackling lifetimes above mean expectations. In this article, we focus on a large cohort of policyholders in Barcelona (Spain), aged 65 years and over. A shared-parameter joint model is proposed to analyse the relationship between annual demand for emergency claims and time until death outcomes, which are subject to left truncation. We compare different functional forms of the association between both processes, and, furthermore, we illustrate how the fitted model provides time-dynamic predictions of survival probabilities. The parameter estimation is performed under the Bayesian framework using Markov chain Monte Carlo methods.
Keywords: Joint models, panel count data, left truncation, Bayesian framework, health insurance.
Pages: 347–372
DOI: 10.2436/20.8080.02.63
- PDF
Statistical and machine learning approaches for the minimization of trigger errors in parametric earthquake catastrophe bonds

Laura Calvet, Madeleine Lopeman, Jésica de Armas, Guillermo Franco and Angel A. Juan
Abstract: Catastrophe bonds are financial instruments designed to transfer risk of monetary losses arising from earthquakes, hurricanes, or floods to the capital markets. The insurance and reinsurance industry, governments, and private entities employ them frequently to obtain coverage. Parametric catastrophe bonds base their payments on physical features. For instance, given parameters such as magnitude of the earthquake and the location of its epicentre, the bond may pay a fixed amount or not pay at all. This paper reviews statistical and machine learning techniques for designing trigger mechanisms and includes a computational experiment. Several lines of future research are discussed.
Keywords: Catastrophe bonds, risk of natural hazards, classification techniques, earthquakes, insurance.
Pages: 373–392
DOI: 10.2436/20.8080.02.64
- PDF
Horizontal collaboration in freight transport: concepts, benefits and environmental challenges

Adrián Serrano-Hernández, Angel A. Juan, Javier Faulin and Elena Perez-Bernabeu
Abstract: Since their appearance in the 1990s, horizontal collaboration (HC) practices have revealed themselves as catalyzers for optimizing the distribution of goods in freight transport logistics. After introducing the main concepts related to HC, this paper offers a literature review on the topic and provides a classification of best practices in HC. Then, the paper analyses the main benefits and optimization challenges associated with the use of HC at the strategic, tactical, and operational levels. Emerging trends such as the concept of ‘green’ or environmentally-friendly HC in freight transport logistics are also introduced. Finally, the paper discusses the need to use hybrid optimization methods, such as simheuristics and learnheuristics, in solving some of the previously identified challenges in real-life scenarios dominated by uncertainty and dynamic conditions.
Keywords: Horizontal collaboration, freight transport, sustainable logistics, supply chain management, combinatorial optimization.
Pages: 393–414
- PDF

Volume 41 (1), January-June 2017

Thirty years of progeny from Chao’s inequality: Estimating and comparing richness with incidence data and incomplete sampling (invited article)

Anne Chao and Robert K. Colwell
Abstract: In the context of capture-recapture studies, Chao (1987) derived an inequality among capture frequency counts to obtain a lower bound for the size of a population based on individuals’ capture/non-capture records for multiple capture occasions. The inequality has been applied to obtain a non-parametric lower bound of species richness of an assemblage based on species incidence (detection/non-detection) data in multiple sampling units. The inequality implies that the number of undetected species can be inferred from the species incidence frequency counts of the uniques (species detected in only one sampling unit) and duplicates (species detected in exactly two sampling units). In their pioneering paper, Colwell and Coddington (1994) gave the name “Chao2” to the estimator for the resulting species richness. (The “Chao1” estimator refers to a similar type of estimator based on species abundance data). Since then, the Chao2 estimator has been applied to many research fields and led to fruitful generalizations. Here, we first review Chao’s inequality under various models and discuss some related statistical inference questions: (1) Under what conditions is the Chao2 estimator an unbiased point estimator? (2) How many additional sampling units are needed to detect any arbitrary proportion (including 100%) of the Chao2 estimate of asymptotic species richness? (3) Can other incidence frequency counts be used to obtain similar lower bounds? We then show how the Chao2 estimator can be also used to guide a non-asymptotic analysis in which species richness estimators can be compared for equally-large or equally-complete samples via sample-size-based and coverage-based rarefaction and extrapolation. We also review the generalization of Chao’s inequality to estimate species richness under other sampling-without-replacement schemes (e.g. a set of quadrats, each surveyed only once), to obtain a lower bound of undetected species shared between two or multiple assemblages, and to allow inferences about undetected phylogenetic richness (the total length of undetected branches of a phylogenetic tree connecting all species), with associated rarefaction and extrapolation. A small empirical dataset for Australian birds is used for illustration, using online software SpadeR, iNEXT, and PhD.
Keywords: Cauchy-Schwarz inequality, Chao2 estimator, extrapolation, Good-Turing frequency formula, incidence data, phylogenetic diversity, rarefaction, sampling effort, shared species richness, species richness.
Pages: 3–54
DOI: 10.2436/20.8080.02.49
- PDF
On a property of Lorenz curves with monotone elasticity and its application to the study of inequality by using tax data

Miguel A. Sordo, Angel Berihuete, Carmen Dolores Ramos and Héctor M. Ramos
Abstract: The Lorenz curve is the most widely used graphical tool for describing and comparing inequality of income distributions. In this paper, we show that the elasticity of this curve is an indicator of the effect, in terms of inequality, of a truncation of the income distribution. As an application, we consider tax returns as equivalent to the truncation from below of a hypothetical income distribution. Then, we replace this hypothetical distribution by the income distribution obtained from a general household survey and use the dual Lorenz curve to anticipate this effect.
Keywords: Lorenz curve, tax data, truncation, inequality.
Pages: 55–72
DOI: 10.2436/20.8080.02.50
- PDF
Comparison of two discrimination indexes in the categorisation of continuous predictors in time-to-event studies

Irantzu Barrio, María Xosé Rodríguez-Álvarez, Luis Meira-Machado, Cristóbal Esteban and Inmaculada Arostegui
Abstract: The Cox proportional hazards model is the most widely used survival prediction model for analysing time-to-event data. To measure the discrimination ability of a survival model the concordance probability index is widely used. In this work we studied and compared the performance of two different estimators of the concordance probability when a continuous predictor variable is categorised in a Cox proportional hazards regression model. In particular, we compared the c-index and the concordance probability estimator. We evaluated the empirical performance of both estimators through simulations. To categorise the predictor variable we propose a methodology which considers the maximal discrimination attained for the categorical variable. We applied this methodology to a cohort of patients with chronic obstructive pulmonary disease; in particular, we categorised the predictor variable forced expiratory volume in one second in percentage.
Keywords: Categorisation, prediction models, cutpoint, Cox model.
Pages: 73–92
DOI: 10.2436/20.8080.02.51
- PDF
- ZIP
Bayesian correlated models for assessing the prevalence of viruses in organic and non-organic agroecosystems

Elena Lázaro, Carmen Armero and Luis Rubio
Abstract: Cultivation of horticultural species under organic management has increased in importance in recent years. However, the sustainability of this new production method needs to be supported by scientific research, especially in the field of virology. We studied the prevalence of three important virus diseases in agroecosystems with regard to their management system: organic versus non-organic, with and without greenhouse. Prevalence was assessed by means of a Bayesian correlated binary model which connects the risk of infection of each virus within the same plot and was defined in terms of a logit generalized linear mixed model (GLMM). Model robustness was checked through a sensitivity analysis based on different hyperprior scenarios. Inferential results were examined in terms of changes in the marginal posterior distributions, both for fixed and for random effects, through the Hellinger distance and a derived measure of sensitivity. Statistical results suggested that organic systems show a prevalence lower than or similar to that of non-organic ones in both single and multiple infections, and highlighted the relevance of the prior specification of the random effects in the inferential process.
Keywords: Hellinger distance, model robustness, risk infection, sensitivity analysis, virus epidemiology.
Pages: 93–116
DOI: 10.2436/20.8080.02.52
- PDF
Corrigendum to "Transmuted geometric distribution with applications in modelling and regression analysis of count data"

Subrata Chakraborty and Deepesh Bhati
Pages: 117–118
DOI: 10.2436/20.8080.02.53
- PDF
Goodness-of-fit test for randomly censored data based on maximum correlation

Ewa Strzalkowska-Kominiak and Aurea Grané
Abstract: In this paper we study a goodness-of-fit test based on the maximum correlation coefficient, in the context of randomly censored data. We construct a new test statistic under general right-censoring and prove its asymptotic properties. Additionally, we study a special case, when the censoring mechanism follows the well-known Koziol-Green model. We present an extensive simulation study on the empirical power of these two versions of the test statistic, showing their advantages over the widely used Pearson-type test. Finally, we apply our test to the head-and-neck cancer data.
Keywords: Goodness-of-fit, Kaplan-Meier estimator, maximum correlation, random censoring.
Pages: 119–138
DOI: 10.2436/20.8080.02.54
- PDF
A quadtree approach based on European geographic grids: reconciling data privacy and accuracy

Raymond Lagonigro, Ramon Oller and Joan Carles Martori
Abstract: Methods to preserve confidentiality when publishing geographic information conflict with the need to publish accurate data. The goal of this paper is to create a European geographic grid framework to disseminate statistical data over maps. We propose a methodology based on quadtree hierarchical geographic data structures. We create a varying size grid adapted to local area densities. High populated zones are disaggregated in small squares to allow dissemination of accurate data. Alternatively, information on low populated zones is published in big squares to avoid identification of individual data. The methodology has been applied to the 2014 population register data in Catalonia.
Keywords: Official statistics, confidentiality, disclosure limitation, dissemination, geographic information systems, hierarchical data structures, small area geography.
Pages: 139–158
DOI: 10.2436/20.8080.02.55
- PDF
A Bayesian stochastic SIRS model with a vaccination strategy for the analysis of respiratory syncytial virus

Marc Jornet-Sanz, Ana Corberán-Vallet, Francisco Santonja and Rafael Villanueva
Abstract: Our objective in this paper is to model the dynamics of respiratory syncytial virus in the region of Valencia (Spain) and analyse the effect of vaccination strategies from a health-economic point of view. Compartmental mathematical models based on differential equations are commonly used in epidemiology to both understand the underlying mechanisms that influence disease transmission and analyse the impact of vaccination programs. However, a recently proposed Bayesian stochastic susceptible-infected-recovered-susceptible model in discrete-time provided an improved and more natural description of disease dynamics. In this work, we propose an extension of that stochastic model that allows us to simulate and assess the effect of a vaccination strategy that consists of vaccinating a proportion of newborns.
Keywords: Infectious diseases, respiratory syncytial virus (RSV), discrete-time epidemic model, stochastic compartmental model, Bayesian analysis, intervention strategies.
Pages: 159–176
DOI: 10.2436/20.8080.02.56
- PDF
Statistical modeling of warm-spell duration series using hurdle models

Jesper Rydén
Abstract: Regression models for counts could be applied to the earth sciences, for instance when studying trends of extremes of climatological quantities. Hurdle models are modified count models which can be regarded as mixtures of distributions. In this paper, hurdle models are applied to model the sums of lengths of periods of high temperatures. A modification to the common versions presented in the literature is presented, as left truncation as well as a particular treatment of zeros is needed for the problem. The outcome of the model is compared to those of simpler count models.
Keywords: Count data, hurdle models, Poisson regression, negative binomial distribution, climate.
Pages: 177–188
DOI: 10.2436/20.8080.02.57
- PDF

Volume 40 (2), July-December 2016

Improving the resolution of the simple assembly line balancing problem type E

Albert Corominas, Alberto García-Villoria and Rafael Pastor
Abstract: The simple assembly line balancing problem type E (abbreviated as SALBP-E) occurs when the number of workstations and the cycle time are variables and the objective is to maximise the line efficiency. In contrast with other types of SALBPs, SALBP-E has received little attention in the literature. In order to solve SALBP-E optimally, we propose a mixed integer linear programming model and an iterative procedure. Since SALBP-E is NP-hard, we also propose heuristics derived from the aforementioned procedures for solving larger instances. Extensive experimentation is carried out and its results show the improvement of the SALBP-E resolution.
Keywords: Assembly line balancing, SALBP, manufacturing optimisation.
Pages: 227–242
DOI: 10.2436/20.8080.02.42
- PDF
Kernel-based estimation of P(X > Y) in ranked set sampling

Mahdi Mahdizadeh and Ehsan Zamanzade
Abstract: This article is directed at the problem of reliability estimation using ranked set sampling. A nonparametric estimator based on kernel density estimation is developed. The estimator is shown to be superior to its analog in simple random sampling. Monte Carlo simulations are employed to assess performance of the proposed estimator. Two real data sets are analysed for illustration.
Keywords: Bandwidth selection, Judgment ranking, Stress-strength model.
Pages: 243–266
DOI: 10.2436/20.8080.02.43
- PDF
- ZIP
A construction of continuous-time ARMA models by iterations of Ornstein-Uhlenbeck processes

Argimiro Arratia, Alejandra Cabaña and Enrique M. Cabaña
Abstract: We present a construction of a family of continuous-time ARMA processes based on p iterations of the linear operator that maps a Lévy process onto an Ornstein-Uhlenbeck process. The construction resembles the procedure to build an AR(p) from an AR(1). We show that this family is in fact a subfamily of the well-known CARMA(p,q) processes, with several interesting advantages, including a smaller number of parameters. The resulting processes are linear combinations of Ornstein-Uhlenbeck processes all driven by the same Lévy process. This provides a straightforward computation of covariances, a state-space model representation and methods for estimating parameters. Furthermore, the discrete and equally spaced sampling of the process turns to be an ARMA(p, p−1) process. We propose methods for estimating the parameters of the iterated Ornstein-Uhlenbeck process when the noise is either driven by a Wiener or a more general Lévy process, and show simulations and applications to real data.
Keywords: Ornstein-Uhlenbeck process, Lévy process, Continuous ARMA, stationary process.
Pages: 267–302
DOI: 10.2436/20.8080.02.44
- PDF
Modelling extreme values by the residual coefficient of variation

Joan del Castillo and Maria Padilla
Abstract: The possibilities of the use of the coefficient of variation over a high threshold in tail modelling are discussed. The paper also considers multiple threshold tests for a generalized Pareto distribution, together with a threshold selection algorithm. One of the main contributions is to extend the methodology based on moments to all distributions, even without finite moments. These techniques are applied to euro/dollar daily exchange rates and to Danish fire insurance losses.
Keywords: Statistics of extremes, heavy tails, high quantile estimation, value at risk.
Pages: 303–320
DOI: 10.2436/20.8080.02.45
- PDF
Using robust FPCA to identify outliers in functional time series, with applications to the electricity market

Juan M. Vilar, Paula Raña and Germán Aneiros
Abstract: This study proposes two methods for detecting outliers in functional time series. Both methods take dependence in the data into account and are based on robust functional principal component analysis. One method seeks outliers in the series of projections on the first principal component. The other obtains uncontaminated forecasts for each data set and determines that those observations whose residuals have an unusually high norm are considered outliers. A simulation study shows the performance of these proposed procedures and the need to take dependence in the time series into account. Finally, the usefulness of our methodology is illustrated in two real datasets from the electricity market: daily curves of electricity demand and price in mainland Spain, for the year 2012.
Keywords: Functional data analysis, functional principal component analysis, functional time series, outlier detection, electricity demand and price.
Pages: 321–348
DOI: 10.2436/20.8080.02.46
- PDF
Log-ratio methods in mixture models for compositional data sets

Marc Comas-Cufí, Josep Antoni Martín-Fernández and Glòria Mateu-Figueras
Abstract: When traditional methods are applied to compositional data, misleading and incoherent results could be obtained. Finite mixtures of multivariate distributions are becoming increasingly important nowadays. In this paper, traditional strategies to fit a mixture model into compositional data sets are revisited and the major difficulties are detailed. A new proposal using a mixture of distributions defined on orthonormal log-ratio coordinates is introduced. A real data set analysis is presented to illustrate and compare the different methodologies.
Keywords: Compositional data, Finite Mixture, Log ratio, Model-based clustering, Normal distribution, Orthonormal coordinates, Simplex.
Pages: 340–374
DOI: 10.2436/20.8080.02.47
- PDF
Smoothed landmark estimators of the transition probabilities

Luís Meira-Machado
Abstract: One important goal in clinical applications of multi-state models is the estimation of transition probabilities. Recently, landmark estimators were proposed to estimate these quantities, and their superiority with respect to the competing estimators has been proved in situations in which the Markov condition is violated. As a weakness, they provide large standard errors in estimation in some circumstances. In this article, we propose two approaches that can be used to reduce the variability of the proposed estimator. Simulations show that the proposed estimators may be much more efficient than the unsmoothed estimator. A real data illustration is included.
Keywords: Kaplan-Meier, Multi-state model, Nonparametric estimation, Presmoothing, Survival Analysis.
Pages: 375–398
DOI: 10.2436/20.8080.02.48
- PDF

Volume 40 (1), January-June 2016

The relevance of multi-country input-output tables in measuring emissions trade balance of countries: the case of Spain

Teresa Sanz, Rocío Yñiguez and José Manuel Rueda-Cantuche
Abstract: As part of national accounts, input-output tables are becoming crucial statistical tools to study the economic, social and environmental impacts of globalization and international trade. In particular, global input-output tables extend the national dimension to the international dimension by relating individual countries’ input-output tables to each other, thus providing an opportunity to balance the global economy as a whole. Concerning emissions of greenhouse gases, the relative position that countries hold among their main trade partners at the global level is a key issue in terms of international climate negotiations. With this purpose, we show that (official) multi-country input-output tables are crucial to analyse the greenhouse gas emission trade balance of individual countries. Spain has a negative trade emissions balance for all three gases analysed, with the most negative balances being those associated with bilateral trade with China, Russia, the United States and the rest of the European Union as a whole.
Keywords: WIOD, Emissions Trade Balance, Spain, GHG footprint, GHG.
Pages: 3–30
DOI: 10.2436/20.8080.02.33
- PDF
Two alternative estimation procedures for the negative binomial cure rate model with a latent activation scheme

Diego I. Gallardo and Heleno Bolfarine
Abstract: In this paper two alternative estimation procedures based on the EM algorithm are proposed for the flexible negative binomial cure rate model with a latent activation scheme. The Weibull model as well as the log-normal and gamma distributions are also considered for the time-to-event data for the non-destroyed cells. Simulation studies show the satisfactory performance of the proposed methodology. The impact of misspecifying the survival function on both components of the model (cured and susceptible) is also evaluated. The use of the new methodology is illustrated with a real data set related to a clinical trial on Phase III cutaneous melanoma patients.
Keywords: Competing risks, EM algorithm, latent activation scheme.
Pages: 31–54
DOI: 10.2436/20.8080.02.34
- PDF
A test for normality based on the empirical distribution function

Hamzeh Torabi, Narges H. Montazeri and Aurea Grané
Abstract: In this paper, a goodness-of-fit test for normality based on the comparison of the theoretical and empirical distributions is proposed. Critical values are obtained via Monte Carlo for several sample sizes and different significance levels. We study and compare the power of forty selected normality tests for a wide collection of alternative distributions. The new proposal is compared to some traditional test statistics, such as Kolmogorov-Smirnov, Kuiper, Cramér-von Mises, Anderson-Darling, Pearson Chi-square, Shapiro-Wilk, Shapiro-Francia, Jarque-Bera, SJ, Robust Jarque-Bera, and also to entropy-based test statistics. From the simulation study results it is concluded that the best performance against asymmetric alternatives with support on the whole real line and alternative distributions with support on the positive real line is achieved by the new test. Other findings derived from the simulation study are that SJ and Robust Jarque-Bera tests are the most powerful ones for symmetric alternatives with support on the whole real line, whereas entropy-based tests are preferable for alternatives with support on the unit interval.
Keywords: Empirical distribution function, entropy estimator, goodness-of-fit tests, Monte Carlo simulation, Robust Jarque-Bera test, Shapiro-Francia test, SJ test, test for normality.
Pages: 55–88
DOI: 10.2436/20.8080.02.35
- PDF
Point and interval estimation for the logistic distribution based on record data

Akbar Asgharzadeh, Reza Valiollahi and Mousa Abdi
Abstract: In this paper, based on record data from the two-parameter logistic distribution, the maximum likelihood and Bayes estimators for the two unknown parameters are derived. The maximum likelihood estimators and Bayes estimators cannot be obtained in explicit forms. We present a simple method of deriving explicit maximum likelihood estimators by approximating the likelihood function. Also, an approximation based on the Gibbs sampling procedure is used to obtain the Bayes estimators. Asymptotic confidence intervals, bootstrap confidence intervals and credible intervals are also proposed. Monte Carlo simulations are performed to compare the performances of the different proposed methods. Finally, one real data set has been analysed for illustrative purposes.
Keywords: Logistic distribution, record data, maximum likelihood estimator, Bayes estimator, Gibbs sampling.
Pages: 89–112
DOI: 10.2436/20.8080.02.36
- PDF
A goodness-of-fit test for the multivariate Poisson distribution

Francisco Novoa-Muñoz and María Dolores Jiménez-Gamero
Abstract: Bivariate count data arise in several different disciplines and the bivariate Poisson distribution is commonly used to model them. This paper proposes and studies a computationally convenient goodness-of-fit test for this distribution, which is based on an empirical counterpart of a system of equations. The test is consistent against fixed alternatives. The null distribution of the test can be consistently approximated by a parametric bootstrap and by a weighted bootstrap. The goodness of these bootstrap estimators and the power for finite sample sizes are numerically studied. It is shown that the proposed test can be naturally extended to the multivariate Poisson distribution.
Keywords: Bivariate Poisson distribution, goodness-of-fit, empirical probability generating function, parametric bootstrap, weighted bootstrap, multivariate Poisson distribution.
Pages: 113–138
DOI: 10.2436/20.8080.02.37
- PDF
Exploring Bayesian models to evaluate control procedures for plant disease

Danilo Alvares, Carmen Armero, Anabel Forte and Luis Rubio
Abstract: Tigernut tubers are the main ingredient in the production of orxata in Valencia, a popular sweet white soft drink. In recent years, the appearance of black spots in the skin of tigernuts has led to important economic losses in orxata production because severely diseased tubers must be discarded. In this paper, we discuss three complementary statistical models to assess the disease incidence of harvested tubers from selected or treated seeds, and propose a measure of effectiveness for different treatments against the disease based on the probability of germination and the incidence of the disease. Statistical methods for these studies are approached from Bayesian reasoning and include mixed-effects models, Dirichlet-multinomial inferential processes and mixed-effects logistic regression models. Statistical analyses provide relevant information to carry out measures to palliate the black spot disease and achieve a high-quality production. For instance, the study shows that avoiding affected seeds increases the probability of harvesting asymptomatic tubers. It is also revealed that the best chemical treatment, when prioritizing germination, is disinfection with hydrochloric acid while sodium hypochlorite performs better if the priority is to have a reduced disease incidence. The reduction of the incidence of the black spots syndrome by disinfection with chemical agents supports the hypothesis that the causal agent is a pathogenic organism.
Keywords: Dirichlet-multinomial model, logistic regression, measures of effectiveness, tigernuts tubers.
Pages: 139–152
DOI: 10.2436/20.8080.02.38
- PDF
Transmuted geometric distribution with applications in modeling and regression analysis of count data

Subrata Chakraborty and Deepesh Bhati
Abstract: A two-parameter transmuted geometric distribution is proposed as a new generalization of the geometric distribution by employing the quadratic transmutation techniques of Shaw and Buckley. The additional parameter plays the role of controlling the tail length. Distributional properties of the proposed distribution are investigated. Maximum likelihood estimation method is discussed along with some data fitting experiments to show its advantages over some existing distributions in literature. The tail flexibility of density of aggregate loss random variable assuming the proposed distribution as primary distribution is outlined and presented along with an illustrative modelling of aggregate claim of a vehicle insurance data. Finally, we present a count regression model based on the proposed distribution and carry out its comparison with some established models.
Keywords: Aggregate claim, count regression, geometric distribution, transmuted distribution.
Pages: 153–176
DOI: 10.2436/20.8080.02.39
- PDF
Compound distributions motivated by linear failure rate

Narjes Gitifar, Sadegh Rezaei and Saralees Nadarajah
Abstract: Motivated by three failure data sets (lifetime of patients, failure time of hard drives and failure time of a product), we introduce three different three-parameter distributions, study basic mathematical properties, address estimation by the method of maximum likelihood and investigate finite sample performance of the estimators. We show that one of the new distributions provides a better fit to each data set than eight other distributions each having three parameters and three distributions each having two parameters.
Keywords: Linear failure rate distribution, maximum likelihood estimation, Poisson distribution.
Pages: 177–200
DOI: 10.2436/20.8080.02.40
- PDF
A statistical learning based approach for parameter fine-tuning of metaheuristics

Laura Calvet, Angel A. Juan, Carles Serrat and Jana Ries
Abstract: Metaheuristics are approximation methods used to solve combinatorial optimization problems. Their performance usually depends on a set of parameters that need to be adjusted. The selection of appropriate parameter values causes a loss of efficiency, as it requires time, and advanced analytical and problem-specific skills. This paper provides an overview of the principal approaches to tackle the Parameter Setting Problem, focusing on the statistical procedures employed so far by the scientific community. In addition, a novel methodology is proposed, which is tested using an already existing algorithm for solving the Multi-Depot Vehicle Routing Problem.
Keywords: Parameter fine-tuning, metaheuristics, statistical learning, biased randomization.
Pages: 201–224
DOI: 10.2436/20.8080.02.41
- PDF

Volume 39 (2), July-December 2015

Twenty years of P-splines (invited article)

Paul H.C. Eilers, Brian D. Marx and Maria Durbán
Abstract: P-splines first appeared in the limelight twenty years ago. Since then they have become popular in applications and in theoretical work. The combination of a rich B-spline basis and a simple difference penalty lends itself well to a variety of generalizations, because it is based on regression. In effect, P-splines allow the building of a “backbone” for the “mixing and matching” of a variety of additive smooth structure components, while inviting all sorts of extensions: varying-coefficient effects, signal (functional) regressors, two-dimensional surfaces, non-normal responses, quantile (expectile) modelling, among others. Strong connections with mixed models and Bayesian analysis have been established. We give an overview of many of the central developments during the first two decades of P-splines.
Keywords: B-splines, penalty, additive model, mixed model, multidimensional smoothing.
Pages: 149–186
DOI: 10.2436/20.8080.02.25
- PDF
Likelihood-based inference for the power regression model

Guillermo Martínez-Flórez, Heleno Bolfarine and Héctor W. Gómez
Abstract: In this paper we investigate an extension of the power-normal model, called the alpha-power model, and specialize it to linear and nonlinear regression models, with and without correlated errors. Maximum likelihood estimation is considered with explicit derivation of the observed and expected Fisher information matrices. Applications are considered for the Australian athletes data set and also to a data set studied in Xie et al. (2009). The main conclusion is that the proposed model can be a viable alternative in situations where the normal distribution is not the most adequate model.
Keywords: Correlation, maximum likelihood, power-normal distribution, regression.
Pages: 187–208
DOI: 10.2436/20.8080.02.26
- PDF
On the bivariate Sarmanov distribution and copula. An application on insurance data using truncated marginal distributions

Zuhair Bahraoui, Catalina Bolancé, Elena Pelican and Raluca Vernic
Abstract: The Sarmanov family of distributions can provide a good model for bivariate random variables and it is used to model dependency in a multivariate setting with given marginals. In this paper, we focus our attention on the bivariate Sarmanov distribution and copula with different truncated extreme value marginal distributions. We compare a global estimation method based on maximizing the full log-likelihood function with the estimation based on maximizing the pseudo-log-likelihood function for copula (or partial estimation). Our aim is to estimate two statistics that can be used to evaluate the risk of the sum exceeding a given value. Numerical results using a real data set from the motor insurance sector are presented.
Keywords: Bivariate Sarmanov distribution, truncated marginal distributions, copula representation, risk measures.
Pages: 209–230
DOI: 10.2436/20.8080.02.27
- PDF
On the interpretation of differences between groups for compositional data

Josep-Antoni Martín-Fernández, Josep Daunis-i-Estadella and Glòria Mateu-Figueras
Abstract: Social policies are designed using information collected in surveys, such as the Catalan Time Use survey. Accurate comparisons of time use data among population groups are commonly analysed using statistical methods. The total daily time expended on different activities by a single person is equal to 24 hours. Because this type of data is compositional, its sample space has particular properties that statistical methods should respect. The critical points required to interpret differences between groups are provided and described in terms of log-ratio methods. These techniques facilitate the interpretation of the relative differences detected in multivariate and univariate analysis.
Keywords: Log-ratio transformations, MANOVA, perturbation, simplex, subcomposition.
Pages: 231–252
DOI: 10.2436/20.8080.02.28
- PDF
Robust project management with the tilted beta distribution

Eugene D. Hahn and María del Mar López Martín
Abstract: Recent years have seen an increase in the development of robust approaches for stochastic project management methodologies such as PERT (Program Evaluation and Review Technique). These robust approaches allow for elevated likelihoods of outlying events, thereby widening interval estimates of project completion times. However, little attention has been paid to the fact that outlying events and/or expert judgments may be asymmetric. We propose the tilted beta distribution which permits both elevated likelihoods of outlying events as well as an asymmetric representation of these events. We examine the use of the tilted beta distribution in PERT with respect to other project management distributions.
Keywords: Activity times, finite mixture, PERT, tilted beta distribution, robust project management, sensitivity analysis.
Pages: 253–272
DOI: 10.2436/20.8080.02.29
- PDF
A note on "Double bounded Kumaraswamy-power series class of distributions"

Tibor K. Pogány and Saralees Nadarajah
Abstract: In a recent edition of SORT, Bidram and Nekoukhou proposed a novel class of distributions and derived its mathematical properties. Several of the mathematical properties are expressed as single infinite sums or double infinite sums. Here, we show that many of these properties can be expressed in terms of known special functions, functions for which in-built routines are widely available.
Keywords: Double bounded Kumaraswamy-power series class of distributions, Fox-Wright generalized hypergeometric function, generalized hypergeometric function.
Pages: 273–280
DOI: 10.2436/20.8080.02.30
- PDF
Parameter estimation of Poisson generalized linear mixed models based on three different statistical principles: a simulation study

Martí Casals, Klaus Langohr, Josep Lluís Carrasco and Lars Rönnegård
Abstract: Generalized linear mixed models are flexible tools for modeling non-normal data and are useful for accommodating overdispersion in Poisson regression models with random effects. Their main difficulty resides in the parameter estimation because there is no analytic solution for the maximization of the marginal likelihood. Many methods have been proposed for this purpose and many of them are implemented in software packages. The purpose of this study is to compare the performance of three different statistical principles —marginal likelihood, extended likelihood, Bayesian analysis— via simulation studies. Real data on contact wrestling are used for illustration.
Keywords: Estimation methods, overdispersion, Poisson generalized linear mixed models, simulation study, statistical principles, sport injuries.
Pages: 281–308
DOI: 10.2436/20.8080.02.31
- PDF
- ZIP
Multinomial logistic estimation in dual frame surveys

David Molina, Maria del Mar Rueda, Antonio Arcos and Maria Giovanna Ranalli
Abstract: We consider estimation techniques from dual frame surveys in the case of estimation of proportions when the variable of interest has multinomial outcomes. We propose to describe the joint distribution of the class indicators by a multinomial logistic model. Logistic generalized regression estimators and model calibration estimators are introduced for class frequencies in a population. Theoretical asymptotic properties of the proposed estimators are shown and discussed. Monte Carlo experiments are also carried out to compare the efficiency of the proposed procedures for finite size samples and in the presence of different sets of auxiliary variables. The simulation studies indicate that the multinomial logistic formulation yields better results than the classical estimators that implicitly assume individual linear models for the variables. The proposed methods are also applied in an attitude survey.
Keywords: Finite population, survey sampling, auxiliary information, model assisted inference, calibration.
Pages: 309–336
DOI: 10.2436/20.8080.02.32
- PDF

Volume 39 (1), January-June 2015

Inference on the parameters of the Weibull distribution using records

Ali Akbar Jafari and Hojatollah Zakerzadeh
Abstract: The Weibull distribution is a very applicable model for lifetime data. In this paper, we have investigated inference on the parameters of Weibull distribution based on record values. We first propose a simple and exact test and a confidence interval for the shape parameter. Then, in addition to a generalized confidence interval, a generalized test variable is derived for the scale parameter when the shape parameter is unknown. The paper presents a simple and exact joint confidence region as well. In all cases, simulation studies show that the proposed approaches are more satisfactory and reliable than previous methods. All proposed approaches are illustrated using a real example.
Keywords: Coverage probability, generalized confidence interval, generalized p-value, records, Weibull distribution.
Pages: 3–18
DOI: 10.2436/20.8080.02.17
- PDF
Small area estimation of poverty indicators under partitioned area-level time models

Domingo Morales, Maria Chiara Pagliarella and Renato Salvatore
Abstract: This paper deals with small area estimation of poverty indicators. Small area estimators of these quantities are derived from partitioned time-dependent area-level linear mixed models. The introduced models are useful for modelling the different behaviour of the target variable by sex or any other dichotomic characteristic. The mean squared errors are estimated by explicit formulas. An application to data from the Spanish Living Conditions Survey is given.
Keywords: Area-level models, small area estimation, time correlation, poverty indicators.
Pages: 19–34
DOI: 10.2436/20.8080.02.18
- PDF
A new class of Skew-Normal-Cauchy distribution

Jaime Arrué, Héctor W. Gomez, Hugo S. Salinas and Heleno Bolfarine
Abstract: In this paper we study a new class of skew-Cauchy distributions inspired by the family of extended two-piece skew-normal distributions. The new family of distributions encompasses three well known families of distributions, the normal, the two-piece skew-normal and the skew-normal-Cauchy distributions. Some properties of the new distribution are investigated, inference via maximum likelihood estimation is implemented and results of a real data application, which reveal good performance of the new model, are reported.
Keywords: Cauchy distribution, kurtosis, maximum likelihood estimation, singular information matrix, skewness, Skew-Normal-Cauchy distribution.
Pages: 35–50
DOI: 10.2436/20.8080.02.19
- PDF
Diagnostic plot for the identification of high leverage collinearity-influential observations

Arezoo Bagheri and Habshah Midi
Abstract: High leverage collinearity influential observations are those high leverage points that change the multicollinearity pattern of a data set. It is imperative to identify these points as they are responsible for misleading inferences on the fitting of a regression model. Moreover, identifying these observations may help statistics practitioners to solve the problem of multicollinearity, which is caused by high leverage points. A diagnostic plot is very useful for practitioners to quickly capture abnormalities in a data set. In this paper, we propose new diagnostic plots to identify high leverage collinearity influential observations. The merit of our proposed diagnostic plots is confirmed by some well-known examples and Monte Carlo simulations.
Keywords: Collinearity influential observation, diagnostic robust generalized potential, high leverage points, multicollinearity.
Pages: 51–70
DOI: 10.2436/20.8080.02.20
- PDF
Discrete Alpha-Skew-Laplace distribution

S. Shams Harandi and M. H. Alamatsaz
Abstract: Classical discrete distributions rarely support modelling data on the set of whole integers. In this paper, we shall introduce a flexible discrete distribution on this set, which can, in addition, cover bimodal as well as unimodal data sets. The proposed distribution can also be fitted to positive and negative skewed data. The distribution is indeed a discrete counterpart of the continuous alpha-skew-Laplace distribution recently introduced in the literature. The proposed distribution can also be viewed as a weighted version of the discrete Laplace distribution. Several distributional properties of this class such as cumulative distribution function, moment generating function, moments, modality, infinite divisibility and its truncation are studied. A simulation study is also performed. Finally, a real data set is used to show applicability of the new model compared to several rival models, such as the discrete normal and Skellam distributions.
Keywords: Discrete Laplace distribution, discretization, maximum likelihood estimation, uni-bimodality, weighted distribution.
Pages: 71–84
DOI: 10.2436/20.8080.02.21
- PDF
A mathematical programming approach for different scenarios of bilateral bartering

Stefano Nasini, Jordi Castro and Pau Fonseca
Abstract: The analysis of markets with indivisible goods and fixed exogenous prices has played an important role in economic models, especially in relation to wage rigidity and unemployment. This paper provides a novel mathematical programming based approach to study pure exchange economies where discrete amounts of commodities are exchanged at fixed prices. Barter processes, consisting of sequences of elementary reallocations of pairs of commodities among pairs of agents, are formalized as local searches converging to equilibrium allocations. A direct application of the analysed processes in the context of computational economics is provided, along with a Java implementation of the described approaches.
Keywords: Numerical optimization, combinatorial optimization, microeconomic theory.
Pages: 85–108
DOI: 10.2436/20.8080.02.22
- PDF
A comparison of computational approaches for maximum likelihood estimation of the Dirichlet parameters on high-dimensional data

Marco Giordan and Ron Wehrens
Abstract: Likelihood estimates of the Dirichlet distribution parameters can be obtained only through numerical algorithms. Such algorithms can provide estimates outside the correct range for the parameters and/or can require a large number of iterations to reach convergence. These problems can be aggravated if good starting values are not provided. In this paper we discuss several approaches that can partially avoid these problems, providing a good trade-off between efficiency and stability. The performances of these approaches are compared on high-dimensional real and simulated data.
Keywords: Levenberg-Marquardt algorithm, re-parametrization, starting values, metabolomics data.
Pages: 109–126
DOI: 10.2436/20.8080.02.23
- PDF
The exponentiated discrete Weibull distribution

Vahid Nekoukhou and Hamid Bidram
Abstract: In this paper, the exponentiated discrete Weibull distribution is introduced. This new generalization of the discrete Weibull distribution can also be considered as a discrete analogue of the exponentiated Weibull distribution. A special case of this exponentiated discrete Weibull distribution defines a new generalization of the discrete Rayleigh distribution for the first time in the literature. In addition, discrete generalized exponential and geometric distributions are some special sub-models of the new distribution. Here, some basic distributional properties, moments, and order statistics of this new discrete distribution are studied. We will see that the hazard rate function can be increasing, decreasing, bathtub, and upside-down bathtub shaped. Estimation of the parameters is illustrated using the maximum likelihood method. The model with a real data set is also examined.
Keywords: Discrete generalized exponential distribution, exponentiated discrete Weibull distribution, exponentiated Weibull distribution, geometric distribution, infinite divisibility, order statistics, resilience parameter family, stress-strength parameter.
Pages: 127–146
DOI: 10.2436/20.8080.02.24
- PDF

Volume 38, number 2 (July–December 2014)

Fulvio Gismondi, Jacques Janssen, Raimondo Manca and Ernesto Volpe di Prignano: Stochastic cash flows modelled by homogeneous and non-homogeneous discrete time backward semi-Markov reward processes, pp. 107–138. DOI: 10.2436/20.8080.02.7
Albert Roso-Llorach, Carles Forné, Francesc Macià, Jaume Galceran, Rafael Marcos-Gragera and Montserrat Rué: Assessing the impact of early detection biases on breast cancer survival of Catalan women, pp. 139–160. DOI: 10.2436/20.8080.02.8
Saeid Tahmasebi and Ali Akbar Jafari: Estimators for the parameter mean of Morgenstern type bivariate generalized exponential distribution using ranked set sampling, pp. 161–180. DOI: 10.2436/20.8080.02.9
Francisco López-Ramos: Integrating network design and frequency setting in public transportation networks: a survey, pp. 181–214. DOI: 10.2436/20.8080.02.10
Mario A. Rojas, Heleno Bolfarine and Héctor W. Gómez: An extension of the slash-elliptical distribution, pp. 215–230. DOI: 10.2436/20.8080.02.11
María Isabel Sánchez-Rodríguez, Elena M. Sánchez-López, Alberto Marinas, José Mª Caridad, Francisco José Urbano and José Mª Marinas: New approaches in the chemometric analysis of infrared spectra of extra-virgin olive oils, pp. 231–250. DOI: 10.2436/20.8080.02.12
Haroon Barakat, Elsayed Nigm and Ramy Aldallal: Exact prediction intervals for future current records and record range from any continuous distribution, pp. 251–270. DOI: 10.2436/20.8080.02.13
Arantza Urkaregi, Lorea Martinez-Indart and José Ignacio Pijoán: Balancing properties. A need for the application of propensity score methods in estimation of treatment effects, pp. 271–284. DOI: 10.2436/20.8080.02.14
Mónica Bécue-Bertaut, Jérôme Pagès and Belchin Kostov: Untangling the influence of several contextual variables on the respondents' lexical choices. A statistical approach, pp. 285–302. DOI: 10.2436/20.8080.02.15
Ana Eugenia Marín Jiménez and José Antonio Roldán Nofuentes: Global hypothesis test to compare the likelihood ratios of multiple binary diagnostic tests with ignorable missing data, pp. 305–324. DOI: 10.2436/20.8080.02.16

Volume 38, number 1 (January–June 2014)

Editor's report
Albert Corominas, Alberto García-Villoria and Rafael Pastor: Improving parametric Clarke and Wright algorithms by means of iterative empirically adjusted greedy heuristics, pp. 3–12. DOI: 10.2436/20.8080.02.1
Belén Nieto, Susan Orbe and Ainhoa Zarraga: Time-Varying Market Beta: Does the estimation methodology matter?, pp. 13–42. DOI: 10.2436/20.8080.02.2
Abdul Rasoul Ziaei, Ayyub Sheikhi and Vahid Amirzadeh: Regression analysis using order statistics and their concomitants, pp. 43–52. DOI: 10.2436/20.8080.02.3
Yijiang Li, Peter X.-K. Song, Alan B. Leichtman, Michael A. Rees, and John D. Kalbfleisch: Decision making in kidney paired donation programs with altruistic donors, pp. 53–72. DOI: 10.2436/20.8080.02.4
Guadalupe Gómez Melis and Moisés Gómez-Mateu: The asymptotic relative efficiency and the ratio of sample sizes when testing two different null hypotheses, pp. 73–88. DOI: 10.2436/20.8080.02.5
Zuhair Bahraoui, Catalina Bolancé and Ana M. Pérez-Marín: Testing extreme value copulas to estimate the quantile, pp. 89–102. DOI: 10.2436/20.8080.02.6

Volume 37, number 2 (July–December 2013)

Guillermo Henry, Andrés Muñoz and Daniela Rodriguez: Locally adaptive density estimation on Riemannian manifolds, pp. 111–130
Ali Satty and H. Mwambi: Selection and pattern mixture models for modelling longitudinal data with dropout: An application study, pp. 131–152
Jaap Spreeuw, Jens Perch Nielsen and Søren Fiig Jarner: A nonparametric visual test of mixed hazard models, pp. 153–174
Klaus Langohr, Guadalupe Gómez and Guillermo Hough: Quantile estimation of the rejection distribution of food products integrating assessor values and interval-censored consumer data, pp. 175–188
Housila P. Singh and Tanveer A. Tarray: An alternative to Kim and Warde's mixed randomized response model, pp. 189–210
Hamid Bidram and Vahid Nekoukhou: Double bounded Kumaraswamy-power series class of distributions, pp. 211–230
Héctor M. Ramos, Antonio Peinado, Jorge Ollero and María G. Ramos: Analysis of inequality in fertility curves fitted by Gamma distributions, pp. 233–240

Volume 37, number 1 (January–June 2013)

Mahdi Mahdizadeh and Nasser Reza Arghami: Improved entropy based test of uniformity using ranked set sample, pp. 3–18
Reinaldo B. Arellano-Valle, Héctor W. Gómez and Hugo S. Salinas: A note on the Fisher information matrix for the skew-generalized-normal model, pp. 19–28
Glòria Mateu-Figueras, Vera Pawlowsky-Glahn and Juan-José Egozcue: The normal distribution in some constrained sample spaces, pp. 29–56
María Isabel Sánchez-Rodríguez, Elena Sánchez-López, José Mª Caridad, Alberto Marinas, José Mª Marinas and Francisco José Urbano: New insights into evaluation of regression models through a decomposition of the prediction errors: application to near-infrared spectral data, pp. 57–78
Isabel Martínez-Silva, Javier Roca-Pardiñas, Vicente Lustres-Pérez, Altea Lorenzo-Arribas, Carmen Cadarso-Suárez: Flexible quantile regression models: application to the study of the purple sea urchin, pp. 81–94
Joan Simó, Marçal Plans, Francesc Casañas and Jose Sabaté: Modelling 'calçots' (Allium cepa L.) growth by Gompertz function, pp. 95–106

Volume 36, number 2 (July–December 2012)

Gisela Muniz, B. M. Golam Kibria, Kristofer Mansson and Ghazi Shukur: On developing ridge regression parameters: a graphical investigation, pp. 115–138
Manfred Lenzen and José M. Rueda-Cantuche: A note on the use of supply-use tables in impact analyses, pp. 139–152
Rodrigo R. Pescim, Gauss M. Cordeiro, Clarice G. B. Demétrio, Edwin M. M. Ortega and Saralees Nadarajah: The new class of Kummer beta generalized distributions, pp. 153–180
Seyed Ehsan Saffari, Robiah Adnan and William Greene: Hurdle negative binomial regression model with right censored count data, pp. 181–194
María Dolores Martínez-Miranda, Jens Perch Nielsen and Mario V. Wüthrich: Statistical modelling and forecasting of outstanding liabilities in non-life insurance, pp. 195–218
Claudio Flores, Mar Rodríguez-Girondo, Carmen Cadarso-Suárez, Thomas Kneib, Guadalupe Gómez and Luis Casanova: Flexible geoadditive survival analysis of non-Hodgkin lymphoma in Peru, pp. 221–230

Volume 36, number 1 (January–June 2012)

Editor's report
María Xosé Rodríguez-Álvarez, Carmen Cadarso-Suárez and Francisco González: Analysing visual receptive fields through generalised additive models with interactions (invited article with discussion: María L. Durbán and Thomas Kneib), pp. 3–44
Monika Gulhar, B. M. Golam Kibria, Ahmed N. Albatineh and Nasar U. Ahmed: A comparison of some confidence intervals for estimating the population coefficient of variation: a simulation study, pp. 45–68
Julián de la Horra and María Teresa Rodríguez-Bernal: Comparing and calibrating discrepancy measures for Bayesian model selection, pp. 69–80
José M. Merigó and Anna M. Gil-Lafuente: Decision making techniques with similarity measures and OWA operators, pp. 81–102
Jose M. Pavía, Francisco Morillas and Josep Lledó: Introducing migratory flows in life table construction, pp. 103–114

Volume 35, number 2 (July–December 2011)

Moustafa Omar Ahmed Abu-Shawiesh, Shipra Banik and B. M. Golam Kibria: A simulation study on some confidence intervals for the population standard deviation, pp. 83–102
Akbar Asgharzadeh, Reza Valiollahi, and Mohammad Z. Raqab: Stress-strength reliability of Weibull distribution based on progressively censored samples, pp. 103–124
Karim Zare and Abdolrahman Rasekh: Diagnostic measures for linear mixed measurement error models, pp. 125–144
Christian Blum: Iterative beam search for simple assembly line balancing with a fixed number of work stations, pp. 145–164
Edilberto Cepeda: Generalized spatio-temporal models, pp. 165–178

Special issue: Privacy in statistical databases. September 2011

Introduction to the special issue on Privacy in Statistical Databases
Philipp Bleninger, Jörg Drechsler and Gerd Ronning: Remote data access and the risk of disclosure from linear regression, pp. 7–24
Josep Domingo-Ferrer: Coprivacy: an introduction to the theory and applications of co-operative privacy, pp. 25–40
Arnau Erola, Jordi Castellà-Roca, Guillermo Navarro-Arribas and Vicenç Torra: Semantic microaggregation for the anonymization of query logs using the open directory project, pp. 41–58
Sarah Giessing and Jörg Höhne: Eliminating small cells from census counts tables: empirical vs. design transition probabilities, pp. 59–76
Jason Lucero, Michael Freiman, Lisa Singh, Jiashen You, Michael DePersio and Laura Zayatz: The microdata analysis system at the U.S. Census Bureau, pp. 77–98
Anna Oganian: Multiplicative noise for masking numerical microdata with constraints, pp. 99–112

Volume 35, number 1 (January–June 2011)

Jordi Castro: Extending controlled tabular adjustment for non-additive tabular data with negative protection levels, pp. 3–20
José M. Rueda-Cantuche: The choice of type of input-output table revisited: moving towards the use of supply-use tables in impact analysis, pp. 21–38
Wieger Coutinho, Ton de Waal and Marco Remmerswaal: Imputation of numerical data under linear edit restrictions, pp. 39–62
Miguel A. Sordo and Carmen D. Ramos: Poverty comparisons when TIP curves intersect, pp. 65–80

Volume 34, number 2 (July–December 2010)

Jesús Artalejo, Antonio Gómez-Corral and Qi-Ming He: Markovian arrivals in stochastic modeling: a survey and some new results (invited article with discussion: Rafael Pérez-Ocón, Miklos Telek and Yiqiang Q. Zhao), pp. 3–20
Rajesh Tailor, Housila P. Singh and Ritesh Tailor: On ratio and product methods with certain known population parameters of auxiliary variable in sample surveys, pp. 157–180
Ignacio Díaz Emparanza and Vicente Núñez-Antón: On the use of simulation methods to compute probabilities: application to the first division Spanish soccer league, pp. 181–200
Karl Gerald Van Den Boogart, Juan José Egozcue and Vera Pawlowsky-Glahn: Bayes linear spaces, pp. 201–222
Catalina Bolancé: Optimal inverse beta (3,3) transformation in kernel density estimation, pp. 223–238
Vicente Lustres-Pérez, María Xosé Rodríguez-Álvarez, María Pazos Pata, Eugenio Fernández Pulpeiro and Carmen Cadarso-Suárez: Application of Receiver Operating Characteristic (ROC) methodology in biological studies of marine resources: sex determination of Paracentrotus lividus (Lamarck, 1816), pp. 239–248

Volume 34, number 1 (January–June 2010)

Editor's report
Nicholas T. Longford: Small-sample inference about variance and its transformations, pp. 3–20
Pilar Abad and Sonia Benito: Variance reduction technique for calculating value at risk in fixed income portfolios, pp. 21–44
Abdul Haq and Javid Shabbir: A family of ratio estimators for population mean in extreme ranked set sampling using two auxiliary variables, pp. 45–66
María Pazos Pata, María Xosé Rodríguez-Álvarez, Vicente Lustres-Pérez, Eugenio Fernández Pulpeiro and Carmen Cadarso-Suárez: Modelling spatial patterns of distribution and abundance of mussel seed using STAR models, pp. 67–78
José Pablo Arias-Nicolás, Julio Mulero, Olga Núñez-Barrera and Alfonso Suárez-Llorens: New aging properties of the Clayton-Oakes model based on multivariate dispersion, pp. 79–93

Volume 33, number 2 (July–December 2009)

Claudio Fuentes and George Casella: Testing for the existence of clusters, pp. 115–158 (invited article with discussion: María Jesús Bayarri, Adolfo Álvarez and Daniel Peña)
Guglielmo d'Amico: Nonparametric estimation of the expected accumulated reward for semi-Markov chains, pp. 159–170
Narayanaswamy Balakrishnan, Víctor Leiva, Antonio Sanhueza and Filidor Vilca: Estimation in the Birnbaum-Saunders distribution based on scale-mixture of normals and the EM-algorithm, pp. 171–192
María Martel, Miguel Angel Negrín and Francisco José Vázquez-Polo: Eliciting expert opinion for cost-effectiveness analysis: a flexible family of prior distributions, pp. 193–212
Morteza Amini and Jafar Ahmadi: How much Fisher information is contained in record values and their concomitants in the presence of inter-record times, pp. 213–232
Zuhair Al-Hemyari: Some improved two-stage shrinkage testimators for the mean of normal distribution, pp. 233–248

Special issue. 30 years of Qüestiió-SORT (1977–2007) and centenary of the birth of Joaquim Torrens-Ibern (December 2009)

Anna Ventura, director of Idescat. Presentation

30 years of Qüestiió-SORT (1977–2007)

Centenary of the birth of Joaquim Torrens-Ibern

Ressenya biogràfica de Joaquim Torrens-Ibern
Manuel Martí Recober. Contribucions de J. Torrens-Ibern a l'estadística catalana
Jaume Bassa. La trajectòria vital del professor J. Torrens-Ibern
Miquel Siguan. Contribucions del professor Joaquim Torrens-Ibern als estudis de Psicologia a Catalunya
Josep M. Domènech. El professor J. Torrens-Ibern i l'ensenyament de l'estadística a la Llicenciatura de Psicologia
J. Torrens-Ibern. Los métodos estadísticos de control en los procesos industriales continuos (1a i 2a part). Cuadernos de Estadística Aplicada e Investigación Operativa, vol. II, fasc. 3, 1963 i vol. II, fasc. 4, 1963.

Volume 33, number 1 (January–June 2009)

Ricardo Cao, Juan M. Vilar and Andrés Devia: Modelling consumer credit risk via survival analysis, pp. 3–30 (invited article with discussion: Noël Veraverbeke, Jean-Philippe Boucher and Jan Beran)
María Dolores Ugarte, Tomás Goicoa, Ana Fernández Militino and Marina Sagaseta-López: Estimating unemployment in very small areas, pp. 49–70
Housila P. Singh and Sunil Kumar: A general procedure of estimating the population mean in the presence of non-response under double sampling using auxiliary information, pp. 71–84
Alex Costa, Albert Satorra and Eva Ventura: On the performance of small-area estimators: fixed vs. random area parameters, pp. 85–104

Volume 32, number 2 (July–December 2008)

Edwin M. M. Ortega, Vicente G. Cancho and Victor Hugo Lachos: Assessing influence in survival data with a cured fraction and covariates, pp. 115–140
Olga Julià and Josep Vives-Rego: A microbiology application of the skew-Laplace distribution, pp. 141–150
Jordi Ocaña, M. Pilar Sánchez O., Álex Sánchez and Josep Lluís Carrasco: On equivalence and bioequivalence testing, pp. 151–176
Montserrat Herrador, Domingo Morales, María Dolores Esteban, Ángel Sánchez, Laureano Santamaría, Yolanda Marhuenda and Agustín Pérez: Sampling design variance estimation of small area estimators in the Spanish Labour Force survey, pp. 177–198

Volume 32, number 1 (January–June 2008)

Editor's report
José María Sarabia and Emilio Gómez-Déniz: Construction of multivariate distributions: a review of some recent results, pp. 3–48 (invited article with discussion: M. del Carmen Pardo and Jorge Navarro)
M. Arefi, G. R. Mohtashami Borzadaran and Y. Vaghei: A note on interval estimation for the mean of inverse Gaussian distribution, pp. 49–56
Eliseo Martínez, Héctor Varela, Héctor W. Gómez and Heleno Bolfarine: A note on the likelihood and moments of the skew-normal distribution, pp. 57–66
Ana María Pérez-Marín: Empirical comparison between the Nelson-Aalen Estimator and the Naive Local Constant Estimator, pp. 67–76
Nicholas T. Longford: An alternative analysis of variance, pp. 77–92
Priscila Willems and M. Purificación Galindo Villardon: Canonical non-symmetrical correspondence analysis: an alternative in constrained ordination, pp. 93–111

Volume 31, number 2 (July–December 2007)

A. Alonso-Ayuso, L. F. Escudero and M.T. Ortuño: On modelling planning under uncertainty in manufacturing, pp. 109–150 (invited article with discussion: Monique Guinard, Gautam Mitra, Francisco Javier Prieto and Andrés Weintraub)
Miguel A. Sordo, Héctor M. Ramos and Carmen D. Ramos: Poverty measures and poverty orderings, pp. 169–180
Edilberto Cepeda-Cuervo and Vicente Núñez-Antón: Bayesian joint modelling of the mean and covariance structures for normal longitudinal data, pp. 181–200
Heinz Neudecker: A recursion formula for expected negative and positive powers of the central Wishart distribution, pp. 201–206

Volume 31, number 1 (January–June 2007)

José M. Bernardo: Objective Bayesian point and region estimation in location-scale models, pp. 3–44 (invited article with discussion: Miguel Ángel Gómez Villegas, Dennis V. Lindley and Mark J. Schervish)
Pedro Puig and Michael A. Stephens: Goodness of fit tests for the skew-Laplace distribution, pp. 45–54
I-Chun Chou, Harald Martens and Eberhard O. Voit: Parameter estimation of S-distributions with alternating regression, pp. 55–74
Carles Serrat and Guadalupe Gómez: Nonparametric bivariate estimation for successive survival times, pp. 75–96

Special issue. Albert Prat in memoriam (April 2006)

Volume 30, number 2 (July–December 2006)

Foreword
Gloria García and Josep M. Oller: What does intrinsic mean in statistical estimation?, pp. 125–170 (invited article with discussion: Jacob Burbea, Joan del Castillo, Wilfrid S. Kendall and Steven Thomas Smith)
Edwin M. M. Ortega, Vicente G. Cancho and Heleno Bolfarine: Influence diagnostics in exponentiated-Weibull regression models with censored data, pp. 171–192
Xavier Bardina, Laura Fernández, Elisabet Piñeiro, Jordi Surrallés and Antonia Velázquez: Statistical models to study subtoxic concentrations for some standard mutagens in three colon cancer cell lines, pp. 193–204
Albert Sorribas, José M. Muiño, Montserrat Rué and Joan Fibla: Univariate Parametric Survival Analysis using GS-distributions, pp. 205–218

Volume 30, number 1 (January–June 2006)

Elías Moreno and F. Javier Girón: On the frequentist and Bayesian approaches to hypothesis testing, pp. 3–54 (invited article with discussion: George Casella, Daniel Peña and Christian P. Robert)
Carles M. Cuadras: The importance of being the upper bound in the bivariate family, pp. 55–84
Heinz Neudecker: A matrix function useful in the estimation of linear continuous-time models, pp. 85–90
Mikhaïl Nikulin: About one problem of Bernoulli and Euler from the theory of statistical estimation, pp. 91–100
Àlex Costa, Albert Satorra and Eva Ventura: Improving small area estimation by combining surveys: new perspectives in regional statistics, pp. 101–122

Volume 29, number 2 (July–December 2005)

Youngjo Lee and John A. Nelder: Likelihood for random-effect models, pp. 141–182 (invited article)
Saralees Nadarajah and Samuel Kotz: Muliere and Scarsini's bivariate Pareto distribution: sums, products, and ratios, pp. 183–200
Carlos Tenreiro: On the role played by the fixed bandwidth in the Bickel-Rosenblatt goodness-of-fit test, pp. 201–216
Mekki Terbeche, Broderick O. Oluyede and Ahmed Barbour: On sequential and fixed designs for estimation with comparisons and applications, pp. 217–234
M. Mercè Claramunt, M. Teresa Mármol and Ramon Lacayo: On the probability of reaching a barrier in an Erlang(2) risk process, pp. 235–248
Simplice Dossou-Gbété and Walter Tinsson: Factorial experimental designs and generalized linear models, pp. 249–268
Ana Debón, Francisco Montes and Ramon Sala: A comparison of parametric models for mortality graduation. Application to mortality data of the Valencia Region, pp. 269–288

Volume 29, number 1 (January–June 2005)

Jajo, N. K. Graphical display in outlier diagnostics; adequacy and robustness, pp. 1–10
Léandre, R. Positivity theorem for a general manifold, pp. 11–26
Ciampi, A., González Marcos, A. and Castejón Limas, M. Correspondence analysis and two-way clustering, pp. 27–42
Nadarajah, S. and Kotz, S. Information matrices for some elliptically symmetric distributions, pp. 43–56
De Waal, T. Automatic error localisation for categorical, continuous and integer data, pp. 57–100
Demesh, N. N. and Chekhmenok, S. L. Estimation of the spectral density of a homogeneous random stable discrete time field, pp. 101–118
Artalejo, J. R. and López Herrero, M. J. The M/G/1 retrial queue: An information theoretic approach, pp. 119–138

Volume 28, number 2 (July–December 2004)

Kutoyants, Y. On invariant density estimation for ergodic diffusion processes, pp. 111–124
Pashkevich, M. and Kharin, Y. S. Robust estimation and forecasting for beta-mixed hierarchical models of grouped binary data, pp. 125–160
Martins, A. and Ferreira, H. Extremes of periodic moving averages of random variables with regularly varying tail probabilities, pp. 161–176
Le Breton, A.; Kleptsyna, M. L. and Viot, M. Asymptotically optimal filtering in linear systems with fractional Brownian noises, pp. 177–190
Neudecker, H. Estimation of the noncentrality matrix of a noncentral Wishart distribution with unit scale matrix. A matrix generalization of Leung's domination result, pp. 191–200
Kokonendji, C. C.; Dossou-Gbété, S. and Demétrio, C. Some discrete exponential dispersion models: Poisson-Tweedie and Hinde-Demétrio classes, pp. 201–214
Santamaría, L.; Morales, D. and Molina, I. A comparative study of small area estimators, pp. 215–230

Volume 28, number 1 (January–June 2004)

Khasminskii, R. On-line nonparametric estimation, pp. 1–8
Butucea, C. Asymptotic normality of the integrated square error of a density estimator in the convolution model, pp. 9–26
Neudecker, H. On best affine unbiased covariance-preserving prediction of factor scores, pp. 27–36
Bosq, D. and Blanke, D. Local superefficiency of data-driven projection density estimators in continuous time, pp. 37–54
Ferenstein, E. and Gasowski, M. Modelling stock returns with AR-GARCH processes, pp. 55–68
Costa, A.; Satorra, A. and Ventura, E. Improving both domain and total area estimation by composition, pp. 69–86
Vázquez-Polo, F. J. and Negrín-Hernández, M. A. Incorporating patients' characteristics in cost-effectiveness studies with clinical trial data: a flexible Bayesian approach, pp. 87–108

Volume 27, number 2 (July–December 2003)

Romero, J. E. and López, J. J. Partial cooperation and convex sets, pp. 139–152
Neudecker, H. On two matrix derivatives by Kollo and von Rosen, pp. 153–164
Fine, J. Asymptotic study of canonical correlation analysis: from matrix and analytic approach to operator and tensor approach, pp. 165–174
Oganian, A. and Domingo-Ferrer, J. A posteriori disclosure risk measure for tabular data based on conditional entropy, pp. 175–190

Volume 27, number 1 (January–June 2003)

Foreword
Commenges, D. Likelihood for interval-censored observations from multi-state models, pp. 1–12
Cook, R. J.; Lawless, J. F. and Lee, K. Cumulative processes related to event histories, pp. 13–30
Goetghebeur, E. and Loeys, T. A sensitivity analysis for causal parameters in structural proportional hazards models, pp. 31–40
Nielsen, S. F. Survival analysis with coarsely observed covariates, pp. 41–64
Prentice, R. L. and Kalbfleisch, J. D. Aspects of the analysis of multivariate failure time data, pp. 65–78
Turnbull, B. W. and Jiang, W. Indirect inference for survival data, pp. 79–94
Medina, J. R. and Yepes, V. Optimization of touristic distribution networks using genetic algorithms, pp. 95–112
Costa, A.; Satorra, A. and Ventura, E. An empirical evaluation of five small area estimators, pp. 113–136
