
Downloadable articles

The full text of each article can be downloaded by clicking the PDF button. Articles may include supplementary material, which can be downloaded by clicking the ZIP button.

Volume 43 (1), January-June 2019

  • A simheuristic for routing electric vehicles with limited driving ranges and stochastic travel times

    Lorena Reyes-Rubiano, Daniele Ferone, Angel A. Juan and Javier Faulin

    Abstract: Green transportation is becoming relevant in the context of smart cities, where the use of electric vehicles represents a promising strategy to support sustainability policies. However, the use of electric vehicles shows some drawbacks as well, such as their limited driving-range capacity. This paper analyses a realistic vehicle routing problem in which both driving-range constraints and stochastic travel times are considered. Thus, the main goal is to minimize the expected time-based cost required to complete the freight distribution plan. In order to design reliable routing plans, a simheuristic algorithm is proposed. It combines Monte Carlo simulation with a multi-start metaheuristic, which also employs biased-randomization techniques. By including simulation, simheuristics extend the capabilities of metaheuristics to deal with stochastic problems. A series of computational experiments are performed to test our solving approach as well as to analyse the effect of uncertainty on the routing plans.

    Keywords: Vehicle routing problem, electric vehicles, green transport and logistics, smart cities, simheuristics, biased-randomized heuristics

    Pages: 3–24

    DOI: 10.2436/20.8080.02.77
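The biased-randomization technique mentioned in this abstract is not spelled out here; a common implementation (assumed for illustration, not taken from the paper) skews a greedy choice over a ranked candidate list, e.g. a Clarke-Wright savings list, with a geometric-like distribution. A minimal Python sketch, where `beta` and the ranking rule are illustrative assumptions:

```python
import math
import random

def biased_random_index(n, beta=0.3, rng=random):
    """Pick an index in [0, n) with a geometric-like bias toward 0.

    Better-ranked candidates (small indices) are chosen more often; beta
    close to 1 is near-greedy, beta close to 0 near-uniform. The sampled
    geometric value is folded onto the n candidates with a modulo.
    """
    idx = int(math.log(1.0 - rng.random()) / math.log(1.0 - beta))
    return idx % n

def biased_randomized_selection(candidates, key, beta=0.3, seed=None):
    """Return the candidates in a biased-random order favouring high `key`.

    `key` ranks candidates, e.g. by their savings value; each call yields
    one randomized permutation usable inside a multi-start loop.
    """
    rng = random.Random(seed)
    pool = sorted(candidates, key=key, reverse=True)
    out = []
    while pool:
        out.append(pool.pop(biased_random_index(len(pool), beta, rng)))
    return out
```

Each multi-start iteration would reorder the candidate list this way, build routes from it, and then score the resulting plan by Monte Carlo simulation of the stochastic travel times.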

  • New L2-type exponentiality tests

    Marija Cuparić, Bojana Milosević and Marko Obradović

    Abstract: We introduce new consistent and scale-free goodness-of-fit tests for the exponential distribution based on the Puri-Rubin characterization. For the construction of test statistics we employ weighted L2 distance between V-empirical Laplace transforms of random variables that appear in the characterization. We derive the asymptotic behaviour under the null hypothesis as well as under fixed alternatives. We compare our tests, in terms of the Bahadur efficiency, to the likelihood ratio test, as well as some recent characterization-based goodness-of-fit tests for the exponential distribution. We also compare the power of our tests to the power of some recent and classical exponentiality tests. According to both criteria, our tests are shown to be strong and outperform most of their competitors.

    Keywords: Goodness-of-fit, exponential distribution, Laplace transform, Bahadur efficiency, V-statistics with estimated parameters

    Pages: 25–50

    DOI: 10.2436/20.8080.02.78
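The Puri-Rubin characterization states that X is exponential if and only if |X1 - X2| has the same distribution as X. As a rough illustration of the abstract's construction (not the authors' exact statistic), one can compare the empirical Laplace transform of the mean-scaled sample with the V-empirical Laplace transform of the pairwise absolute differences, integrating the squared difference with an exponential weight. The weight `a`, the grid and the truncation `tmax` below are arbitrary choices:

```python
import math

def exp_characterization_stat(x, a=1.0, grid=200, tmax=10.0):
    """Weighted L2 distance between two empirical Laplace transforms.

    Under the Puri-Rubin characterization, small values of this statistic
    support exponentiality. Scaling by the sample mean makes it scale-free;
    the integral over t is a trapezoid-rule approximation on [0, tmax].
    """
    m = sum(x) / len(x)
    y = [v / m for v in x]
    n = len(y)

    def l_emp(t):  # empirical Laplace transform of the scaled sample
        return sum(math.exp(-t * v) for v in y) / n

    def l_pairs(t):  # V-empirical Laplace transform of |Xi - Xj|
        return sum(math.exp(-t * abs(u - v)) for u in y for v in y) / n ** 2

    h = tmax / grid
    vals = [(l_emp(i * h) - l_pairs(i * h)) ** 2 * math.exp(-a * i * h)
            for i in range(grid + 1)]
    return h * sum((vals[i] + vals[i + 1]) / 2 for i in range(grid))
```

In practice the null distribution of such a statistic would be approximated by Monte Carlo resampling from the exponential distribution.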

  • Bayesian joint spatio-temporal analysis of multiple diseases

    Virgilio Gómez-Rubio, Francisco Palmí-Perales, Gonzalo López-Abente, Rebeca Ramis-Prieto and Pablo Fernández-Navarro

    Abstract: In this paper we propose a Bayesian hierarchical spatio-temporal model for the joint analysis of multiple diseases which includes specific and shared spatial and temporal effects. Dependence on shared terms is controlled by disease-specific weights so that their posterior distribution can be used to identify diseases with similar spatial and temporal patterns. The model proposed here has been used to study three different causes of death (oral cavity, esophagus and stomach cancer) in Spain at the province level. Shared and specific spatial and temporal effects have been estimated and mapped in order to study similarities and differences among these causes. Furthermore, estimates using Markov chain Monte Carlo and the integrated nested Laplace approximation are compared.

    Keywords: Bayesian modelling, Joint modelling, Multivariate disease mapping, Shared components, Spatio-temporal epidemiology

    Pages: 51–74

    DOI: 10.2436/20.8080.02.79

  • Internalizing negative externalities in vehicle routing problems through green taxes and green tolls

    Adrián Serrano-Hernández and Javier Faulín

    Abstract: Road freight transportation includes various internal and external costs that need to be accounted for in the construction of efficient routing plans. Typically, the resulting optimization problem is formulated as a vehicle routing problem in any of its variants. While the traditional focus of the vehicle routing problem was the minimization of internal routing costs such as travel distance or duration, numerous approaches to include external factors related to environmental routing aspects have been recently discussed in the literature. However, internal and external routing costs are often treated as competing objectives. This paper discusses the internalization of external routing costs through the consideration of green taxes and green tolls. Numerical experiments with a biased-randomization savings algorithm show the benefits of combining internal and external costs in delivery route planning.

    Keywords: Vehicle routing problem, biased randomization, green logistics, negative road externalities, internalization

    Pages: 75–94

    DOI: 10.2436/20.8080.02.80

  • A probabilistic model for explaining the points achieved by a team in football competition. Forecasting and regression with applications to the Spanish competition

    Emilio Gómez-Déniz, Nancy Dávila Cárdenes and José María Pérez Sánchez

    Abstract: In recent decades, many research papers applying statistical methods to the analysis of sports data have been published. Football, also called soccer, is one of the most popular sports all over the world, organised in national championships in a round-robin format in which the team with the most points at the end of the tournament wins the competition. The aim of this work is to develop a suitable probability model for studying the points achieved by a team in a football match. For this purpose, we build a discrete probability distribution taking the values zero for a loss, one for a draw and three for a victory. We test its performance using data from the Spanish Football League (First Division) during the 2013-14 season. Furthermore, the model provides an attractive framework for predicting points and incorporating covariates in order to study the factors affecting the points achieved by the teams.

    Keywords: Covariate, football data, forecasting, regression, sport statistics, truncated distribution, weighted distribution

    Pages: 95–112

    DOI: 10.2436/20.8080.02.81
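The support of the distribution described in this abstract is simple to write down. A minimal sketch of the per-match points distribution and the implied expected season total; the paper's covariate and truncation machinery is omitted, and `p_win` and `p_draw` are assumed inputs:

```python
def points_pmf(p_win, p_draw):
    """PMF of the points a team earns in one match: 3 win, 1 draw, 0 loss."""
    assert 0.0 <= p_win and 0.0 <= p_draw and p_win + p_draw <= 1.0
    return {0: 1.0 - p_win - p_draw, 1: p_draw, 3: p_win}

def expected_points(p_win, p_draw, n_matches=38):
    """Expected season total; 38 matches per team in the Spanish First Division."""
    per_match = sum(k * p for k, p in points_pmf(p_win, p_draw).items())
    return n_matches * per_match
```

Regression enters by letting `p_win` and `p_draw` depend on covariates, as the abstract suggests.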

  • Automatic regrouping of strata in the goodness-of-fit chi-square test

    Vicente Núñez-Antón, Juan Manuel Pérez-Salamero González, Marta Regúlez-Castillo, Manuel Ventura-Marco and Carlos Vidal-Meliá

    Abstract: Pearson’s chi-square test is widely employed in social and health sciences to analyse categorical data and contingency tables. For the test to be valid, the sample size must be large enough to provide a minimum number of expected elements per category. This paper develops functions for regrouping strata automatically, thus enabling the goodness-of-fit test to be performed within an iterative procedure. The usefulness and performance of these functions are illustrated by means of a simulation study and their application to different datasets. Finally, the iterative use of the functions is applied to the Continuous Sample of Working Lives, a dataset that has been used in a considerable number of studies, especially on labour economics and the Spanish public pension system.

    Keywords: Goodness-of-fit chi-square test, statistical software, Visual Basic for Applications, Mathematica, Continuous Sample of Working Lives

    Pages: 113–142

    DOI: 10.2436/20.8080.02.83
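The paper's functions are written in Visual Basic for Applications and Mathematica; a Python sketch of the underlying idea (merge adjacent strata until each expected count reaches the usual threshold of 5, then compute Pearson's statistic) might look as follows. The merge-with-next-neighbour rule is an assumption made for illustration:

```python
def regroup(observed, expected, min_expected=5.0):
    """Merge adjacent strata until every expected count reaches min_expected.

    Implements the usual rule of thumb of at least 5 expected elements per
    category; a deficient stratum is merged with the next neighbour (the
    previous one for the last stratum).
    """
    obs, exp = list(observed), list(expected)
    i = 0
    while i < len(exp):
        if exp[i] < min_expected and len(exp) > 1:
            nb = i + 1 if i + 1 < len(exp) else i - 1
            obs[nb] += obs[i]
            exp[nb] += exp[i]
            del obs[i], exp[i]
            i = 0  # rescan from the start after each merge
        else:
            i += 1
    return obs, exp

def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic; df = categories - 1 - fitted parameters."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```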

  • On the optimism correction of the area under the receiver operating characteristic curve in logistic prediction models

    Amaia Iparragirre, Irantzu Barrio and María Xosé Rodríguez-Álvarez

    Abstract: When the same data are used to fit a model and estimate its predictive performance, this estimate may be optimistic, and its correction is required. The aim of this work is to compare the behaviour of different methods proposed in the literature when correcting for the optimism of the estimated area under the receiver operating characteristic curve in logistic regression models. A simulation study (where the theoretical model is known) is conducted considering different numbers of covariates, sample sizes, prevalences and correlations among covariates. The results suggest the use of k-fold cross-validation with replication and bootstrap.

    Keywords: Prediction models, logistic regression, area under the receiver operating characteristic curve, validation, bootstrap

    Pages: 145–162

    DOI: 10.2436/20.8080.02.82
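Of the corrections compared in the paper, the bootstrap one (often attributed to Harrell) is easy to sketch: the optimism is the average gap between a bootstrap model's AUC on its own resample and on the original data, and it is subtracted from the apparent AUC. The one-covariate `fit` below is a toy placeholder, not a logistic fit:

```python
import random

def auc(scores, labels):
    """AUC via the Mann-Whitney statistic; assumes both classes are present."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit(xs, ys):
    """Toy univariate 'model': orient the covariate so higher scores mean y = 1."""
    m1 = sum(x for x, y in zip(xs, ys) if y == 1) / max(1, sum(ys))
    m0 = sum(x for x, y in zip(xs, ys) if y == 0) / max(1, len(ys) - sum(ys))
    sign = 1.0 if m1 >= m0 else -1.0
    return lambda x: sign * x

def optimism_corrected_auc(xs, ys, n_boot=200, seed=0):
    """Bootstrap optimism correction of the apparent AUC."""
    rng = random.Random(seed)
    apparent = auc([fit(xs, ys)(x) for x in xs], ys)
    n, optimism, used = len(xs), 0.0, 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        if len(set(by)) < 2:
            continue  # skip degenerate resamples containing a single class
        model = fit(bx, by)
        optimism += auc([model(x) for x in bx], by) - auc([model(x) for x in xs], ys)
        used += 1
    return apparent - (optimism / used if used else 0.0)
```

With a real logistic model, `fit` would refit the regression on each bootstrap resample; the correction loop is unchanged.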

  • Efficient algorithms for constructing D- and I-optimal exact designs for linear and non-linear models in mixture experiments

    Raúl Martín Martín, Irene García-Camacha Gutiérrez and Bernard Torsney

    Abstract: The problem of finding optimal exact designs is more challenging than that of finding approximate optimal designs. In the present paper, we develop two efficient algorithms to numerically construct exact designs for mixture experiments. The first is a novel approach to the well-known multiplicative algorithm based on sets of permutation points, while the second uses genetic algorithms. Using (i) linear and non-linear models, (ii) D- and I-optimality criteria, and (iii) constraints on the ingredients, both approaches are explored through several practical problems arising in the chemical, pharmaceutical and oil industries.

    Keywords: Optimal experimental design, D-optimality, I-optimality, mixture experiments, multiplicative algorithm, genetic algorithm, exact designs

    Pages: 163–190

    DOI: 10.2436/20.8080.02.84

Volume 42 (2), July-December 2018

  • Evidence functions: a compositional approach to information (invited article)

    Juan-José Egozcue and Vera Pawlowsky-Glahn

    Abstract: The discrete case of Bayes’ formula is considered the paradigm of information acquisition. Prior and posterior probability functions, as well as likelihood functions, called evidence functions, are compositions following the Aitchison geometry of the simplex, and thus have a vector character. Bayes’ formula becomes a vector addition. The Aitchison norm of an evidence function is introduced as a scalar measurement of information. A fictitious fire scenario serves as illustration. Two different inspections of affected houses are considered. Two questions are addressed: (a) what information is provided by the outcomes of the inspections, and (b) which inspection is the most informative.

    Keywords: Evidence function, Bayes’ formula, Aitchison geometry, compositions, orthonormal basis, simplex, scalar information

    Pages: 101–124

    DOI: 10.2436/20.8080.02.71
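The central construction, Bayes' formula as perturbation (vector addition) in the Aitchison geometry, can be sketched directly. The functions below follow standard compositional-data definitions; the two-part example in the test is illustrative, not the fire scenario of the paper:

```python
import math

def closure(p):
    """Normalize a positive vector to the simplex (a composition)."""
    s = sum(p)
    return [v / s for v in p]

def perturb(prior, likelihood):
    """Aitchison perturbation: Bayes' formula as vector addition in the simplex."""
    return closure([p * l for p, l in zip(prior, likelihood)])

def clr(p):
    """Centred log-ratio transform of a composition."""
    g = math.exp(sum(math.log(v) for v in p) / len(p))
    return [math.log(v / g) for v in p]

def aitchison_norm(p):
    """Aitchison norm: Euclidean norm of the clr coordinates.

    Used as a scalar measure of the information carried by an evidence
    (likelihood) function; a uniform evidence function has norm 0.
    """
    return math.sqrt(sum(c * c for c in clr(p)))
```

`perturb(prior, evidence)` reproduces the discrete Bayes formula up to normalization, and comparing `aitchison_norm(evidence)` across inspections addresses question (b).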

  • A contingency table approach based on nearest neighbour relations for testing self and mixed correspondence

    Elvan Ceyhan

    Abstract: Nearest neighbour methods are employed for drawing inferences about spatial patterns of points from two or more classes. We introduce a new pattern called correspondence, which is motivated by (spatial) niche/habitat specificity and segregation; we define an associated contingency table, called a correspondence contingency table, and examine the relation of correspondence to the motivating patterns (namely, segregation and niche specificity). We propose tests based on the correspondence contingency table for testing self and mixed correspondence, and determine the appropriate null hypotheses and the underlying conditions appropriate for these tests. We compare the finite sample performance of the tests in terms of empirical size and power by extensive Monte Carlo simulations, and illustrate the methods on two artificial data sets and one real-life ecological data set.

    Keywords: Association, complete spatial randomness, habitat/niche specificity, independence, random labelling, segregation

    Pages: 125–158

    DOI: 10.2436/20.8080.02.72

  • Efficiency of propensity score adjustment and calibration on the estimation from non-probabilistic online surveys

    Ramón Ferri-García and Maria del Mar Rueda

    Abstract: One of the main sources of inaccuracy in modern survey techniques, such as online and smartphone surveys, is the absence of an adequate sampling frame that could provide a probabilistic sampling. This kind of data collection leads to a large amount of bias in the final estimates of the survey, especially if the estimated variables (also known as target variables) have some influence on the decision of the respondent to participate in the survey. Various correction techniques, such as calibration and propensity score adjustment or PSA, can be applied to remove the bias. This study attempts to analyse the efficiency of correction techniques in multiple situations, applying a combination of propensity score adjustment and calibration on both types of variables (correlated and not correlated with the missing data mechanism) and testing the use of a reference survey to get the population totals for calibration variables. The study was performed using a simulation of a fictitious population of potential voters and a real volunteer survey aimed at a population for which a complete census was available. Results showed that PSA combined with calibration removes considerably more bias than calibration with no prior adjustment. Results also showed that using population totals from the estimates of a reference survey instead of the available population data does not make a difference in the accuracy of the estimates, although it can slightly increase the variance of the estimator.

    Keywords: Online surveys, Smartphone surveys, propensity score adjustment, calibration, simulation

    Pages: 159–182

    DOI: 10.2436/20.8080.02.73

  • Field rules and bias in random surveys with quota samples. An assessment of CIS surveys

    José M. Pavía and Cristina Aybar

    Abstract: Surveys applying quota sampling in their final step are widely used in opinion and market research all over the world. This is also the case in Spain, where the surveys carried out by CIS (a public institution for sociological research supported by the government) have become a point of reference. The rules used by CIS to select individuals within quotas, however, could be improved, as they lead to biases in age distributions. Analysing more than 545,000 responses collected in the 220 monthly barometers conducted between 1997 and 2016 by CIS, we compare the empirical distributions of the barometers with the expected distributions from the sample design and/or target populations. Among other results, we find, as a consequence of the rules used, significant overrepresentations in the observed proportions of respondents with ages equal to the minimum and maximum of each quota (age and gender group). Furthermore, in line with previous literature, we also note a significant overrepresentation of ages ending in zero. After offering simple solutions to avoid all these biases, we discuss some of their consequences for modelling and inference, as well as the limitations and potential of CIS data.

    Keywords: Centre for Sociological Research, quota sampling, fieldwork rules, age and gender groups, inter-quota distributions, intra-quota distributions

    Pages: 183–206

    DOI: 10.2436/20.8080.02.74

  • Effect of agro-climatic conditions on near infrared spectra of extra virgin olive oils

    María Isabel Sánchez-Rodríguez, Elena M. Sánchez-López, José Mª Caridad, Alberto Marinas and Francisco José Urbano

    Abstract: Authentication of extra virgin olive oil requires fast and cost-effective analytical procedures, such as near infrared spectroscopy. Multivariate analysis and chemometrics have been successfully applied in several papers to gather qualitative and quantitative information on extra virgin olive oils from near infrared spectra. Moreover, there are many examples in the literature analysing the effect of agro-climatic conditions on food content in general, and on olive oil components in particular. However, the majority of these studies consider a factor (a non-numerical variable) containing this meteorological information. The present work uses all the agro-climatic data, with the aim of highlighting the linear relationships between them and the near infrared spectra. The study begins with a graphical motivation, continues with a bivariate analysis and, finally, applies redundancy analysis to extend and confirm the previous conclusions.

    Keywords: Extra virgin olive oil, infrared spectroscopy, agro-climatic data, linear correlations, redundancy analysis

    Pages: 209–236

    DOI: 10.2436/20.8080.02.75

  • Poisson excess relative risk models: new implementations and software

    Manuel Higueras and Adam Howes

    Abstract: Two new implementations for fitting Poisson excess relative risk models are proposed for simple assumed models. This allows estimation of the excess relative risk associated with a single exposure, where the background risk is modelled by a single categorical variable, for example gender or attained-age levels. Additionally, it is shown how to fit general Poisson linear relative risk models in R. Both the simple methods and the R fitting are illustrated in three examples. The first two examples are from the radiation epidemiology literature. Data in the third example are randomly generated so that they can be shared together with the R scripts.

    Keywords: Radiation epidemiology, Poisson non-linear regression, improper priors, R programming

    Pages: 237–252

    DOI: 10.2436/20.8080.02.76
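A hypothetical minimal version of such a model, with a single background rate and a single exposure, is cases_i ~ Poisson(pyears_i * lam0 * (1 + beta * dose_i)); for fixed beta the baseline lam0 can be profiled out in closed form. A grid-search sketch (not the authors' implementation, which also handles categorical background strata):

```python
import math

def profile_loglik(beta, cases, pyears, dose):
    """Poisson log-likelihood with the baseline rate profiled out.

    Model: cases_i ~ Poisson(pyears_i * lam0 * (1 + beta * dose_i)).
    For fixed beta, the MLE of lam0 is total cases over risk-adjusted
    person-years.
    """
    rr = [1.0 + beta * d for d in dose]
    if min(rr) <= 0.0:
        return float("-inf")  # the relative risk must stay positive
    lam0 = sum(cases) / sum(p * r for p, r in zip(pyears, rr))
    mu = [p * lam0 * r for p, r in zip(pyears, rr)]
    return sum(y * math.log(m) - m for y, m in zip(cases, mu))

def fit_err(cases, pyears, dose, lo=-0.9, hi=10.0, steps=20000):
    """Crude grid-search MLE of the excess relative risk per unit dose."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda b: profile_loglik(b, cases, pyears, dose))
```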

Volume 42 (1), January-June 2018

  • Using a Bayesian change-point statistical model with autoregressive terms to study the monthly number of dispensed asthma medications by public health services

    José André Mota de Queiroz, Davi Casale Aragon, Luane Marques de Mello, Isolde Terezinha Santos Previdelli and Edson Martinez

    Abstract: In this paper, we propose a Bayesian analysis of a time series in the presence of a random change-point and autoregressive terms. The development of this model was motivated by a data set on the monthly number of asthma medications dispensed by the public health services of Ribeirão Preto, Southeast Brazil, from 1999 to 2011. A pronounced increasing trend was observed from 1999 up to a change-point, followed by a decrease until the end of the series. In order to obtain estimates for the parameters of interest, a Bayesian Markov chain Monte Carlo (MCMC) simulation procedure using the Gibbs sampler algorithm was developed. The Bayesian model with autoregressive terms of order 1 fits the data well, allowing the change-point to be estimated at July 2007, probably reflecting the results of the new health policies and previously adopted programs directed toward patients with asthma. The results imply that the present model is useful for analysing the monthly number of dispensed asthma medications and can be used to describe a broad range of epidemiological time series data where a change-point is present.

    Keywords: Time series, regression models, Bayesian methods, change-point model, epidemiological data

    Pages: 3–26

    DOI: 10.2436/20.8080.02.66

  • Evaluating the complexity of some families of functional data

    Enea Bongiorno, Aldo Goia and Philippe Vieu

    Abstract: In this paper we study the complexity of a functional data set drawn from particular processes by means of a two-step approach. The first step considers a new graphical tool for assessing the family to which the data belong: the main aim is to detect whether a sample comes from a monomial or an exponential family. This first tool is based on a nonparametric kNN estimation of the small ball probability. Once the family is specified, the second step consists in evaluating the extent of complexity by estimating some specific indexes related to the assigned family. It turns out that the developed methodology is entirely free of assumptions on the model, the distribution and the dominating measure. Computational issues are explored by means of simulations, and finally the method is applied to analyse a real dataset of financial curves.

    Keywords: Small ball probability, log-Volugram, random processes, complexity class, complexity index, knn estimation, functional data analysis

    Pages: 27–44

    DOI: 10.2436/20.8080.02.67

  • Preliminary test and Stein-type shrinkage LASSO-based estimators

    Mina Norouzirad and Mohammad Arashi

    Abstract: Suppose the regression vector-parameter is suspected to lie in a subspace under a linear regression model. In situations where the use of the least absolute shrinkage and selection operator (LASSO) is desired, we propose a restricted LASSO estimator. To improve its performance, LASSO-type shrinkage estimators are also developed and their asymptotic performance is studied. In the numerical analysis, we use relative efficiency and mean prediction error to compare the estimators; the shrinkage estimators turn out to perform better than the LASSO.

    Keywords: Double shrinking, LASSO, preliminary test LASSO, restricted LASSO, Stein-type shrinkage LASSO

    Pages: 45–58

    DOI: 10.2436/20.8080.02.68

  • Heteroscedasticity irrelevance when testing means difference

    Pablo Flores and Jordi Ocaña

    Abstract: Heteroscedasticity produces a lack of type I error control in Student’s t test for the difference between means. Pretesting for it (e.g., by means of Levene’s test) should be avoided, as this also induces a lack of type I error control. These pretests are inadequate for their objective: not rejecting the null hypothesis is not proof of homoscedasticity, and rejecting it may simply point to an irrelevant heteroscedasticity. We propose a method to establish irrelevance limits for the ratio of variances. In conjunction with a test for dispersion equivalence, this appears to be a more affordable pretesting strategy.

    Keywords: Homoscedasticity, equivalence test, indifference zone, pretest, Student’s t test

    Pages: 59–72

    DOI: 10.2436/20.8080.02.69

  • Empirical analysis of daily cash flow time-series and its implications for forecasting

    Francisco Salas-Molina, Juan A. Rodríguez-Aguilar, Joan Serrà, Montserrat Guillen and Francisco J. Martin

    Abstract: Usual assumptions on the statistical properties of daily net cash flows include normality, absence of correlation and stationarity. We provide a comprehensive study based on a real-world cash flow data set showing that: (i) the usual assumptions of normality, absence of correlation and stationarity rarely hold; (ii) non-linearity is often relevant for forecasting; and (iii) typical data transformations have little impact on linearity and normality. This evidence may lead to the consideration of a more data-driven approach, such as time-series forecasting, in an attempt to provide cash managers with expert systems for cash management.

    Keywords: Statistics, forecasting, cash flow, non-linearity, time-series

    Pages: 73–98

    DOI: 10.2436/20.8080.02.70
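Two of the assumptions examined here, absence of correlation and normality, can be screened with elementary sample statistics. A self-contained sketch (the paper's own analysis is considerably richer, with formal tests):

```python
def lag1_autocorr(x):
    """Sample lag-1 autocorrelation; values far from 0 contradict the usual
    assumption that daily net cash flows are uncorrelated."""
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x)
    return sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1)) / var

def skew_kurtosis(x):
    """Sample skewness and excess kurtosis; both are near 0 under normality."""
    n = len(x)
    m = sum(x) / n
    s2 = sum((v - m) ** 2 for v in x) / n
    skew = sum((v - m) ** 3 for v in x) / n / s2 ** 1.5
    kurt = sum((v - m) ** 4 for v in x) / n / s2 ** 2 - 3.0
    return skew, kurt
```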

Volume 41 (2), July-December 2017

  • Hierarchical models with normal and conjugate random effects: a review (invited article)

    Geert Molenberghs, Geert Verbeke and Clarice G.B. Demétrio

    Abstract: Molenberghs, Verbeke, and Demétrio (2007) and Molenberghs et al. (2010) proposed a general framework to model hierarchical data subject to within-unit correlation and/or overdispersion. The framework extends classical overdispersion models as well as generalized linear mixed models. Subsequent work has examined various aspects that lead to the formulation of several extensions. A unified treatment of the model framework and key extensions is provided. Particular extensions discussed are: explicit calculation of correlation and other moment-based functions, joint modelling of several hierarchical sequences, versions with direct marginally interpretable parameters, zero-inflation in the count case, and influence diagnostics. The basic models and several extensions are illustrated using a set of key examples, one per data type (count, binary, multinomial, ordinal, and time-to-event).

    Keywords: Conjugacy, frailty, joint modelling, marginalized multilevel model, mixed model, overdispersion, underdispersion, variance component, zero-inflation.

    Pages: 191–254

    DOI: 10.2436/20.8080.02.58

  • A bivariate response model for studying the marks obtained in two jointly-dependent modules in higher education

    Emilio Gómez-Déniz, Nancy Dávila Cárdenes and María D. García Artiles

    Abstract: We study the factors which may affect students’ marks in two modules, mathematics and statistics, taught consecutively in the first year of a Business Administration Studies degree course. For this purpose, we introduce a suitable bivariate regression model in which the dependent variables have bounded support and the marginal means are functions of explanatory variables. The marginal probability density functions have a classical beta distribution. Simulation experiments were performed to observe the behaviour of the maximum likelihood estimators. Comparisons with univariate beta regression models show the proposed bivariate regression model to be superior.

    Keywords: Beta distribution, bivariate beta distribution, conditional distributions, covariate, marginal distributions, regression, mathematics, statistics, business studies.

    Pages: 255–276

    DOI: 10.2436/20.8080.02.59

  • Bayesian hierarchical models for analysing the spatial distribution of bioclimatic indices

    Xavier Barber, David Conesa, Antonio López-Quílez, Asunción Mayoral, Javier Morales and Antoni Barber

    Abstract: A methodological approach for modelling the spatial distribution of bioclimatic indices is proposed in this paper. The value of the bioclimatic index is modelled with a hierarchical Bayesian model that incorporates both structured and unstructured random effects. Selection of prior distributions is also discussed in order to better incorporate any possible prior knowledge about the parameters that could refer to the particular characteristics of bioclimatic indices. MCMC methods and distributed programming are used to obtain an approximation of the posterior distribution of the parameters and also the posterior predictive distribution of the indices. One main outcome of the proposal is the spatial bioclimatic probability distribution of each bioclimatic index, which allows researchers to obtain the probability of each location belonging to different bioclimates. The methodology is evaluated on two indices in the Island of Cyprus.

    Keywords: Bioclimatology, geostatistics, parallel computation, spatial prediction.

    Pages: 277–296

    DOI: 10.2436/20.8080.02.60

  • The Pareto IV power series cure rate model with applications

    Diego I. Gallardo, Yolanda M. Gómez, Barry C. Arnold and Héctor W. Gómez

    Abstract: Cutaneous melanoma is thought to be triggered by intense, occasional exposure to ultraviolet radiation, either from the sun or tanning beds, especially in people who are genetically predisposed to the disease. When skin cells are damaged by ultraviolet light in this way, often showing up as a sunburn, they are more prone to genetic defects that cause them to rapidly multiply and form potentially fatal (malignant) tumors. Melanoma originates in a type of skin cell called a melanocyte; such cells help produce the pigments of our skin, hair, and eyes. We propose a new cure rate survival regression model for predicting cutaneous melanoma. We assume that the unknown number of competing causes that can influence the survival time is governed by a power series distribution and that the time until the tumor cells are activated follows the Pareto IV distribution. The parameter estimation is based on the EM algorithm, which for this model can be implemented simply in computational terms. Simulation studies are presented, showing the good performance of the proposed estimation procedure. Finally, two real applications related to cutaneous melanoma data sets are presented.

    Keywords: Competing risks, cure rate models, EM algorithm, Pareto IV distribution, power series distribution.

    Pages: 297–318

    DOI: 10.2436/20.8080.02.61

  • Estimating regional social accounting matrices to analyse rural development

    Alfredo Mainar-Causapé, José Manuel Rueda Cantuche, M. Alejandro Cardenete, Patricia Fuentes-Saguar, M. Carmen Delgado, Fabien Santini, Sébastien Mary and Sergio Gómez y Paloma

    Abstract: This paper has two complementary objectives: on the one hand, it introduces the EURO method for the estimation of (regional) Social Accounting Matrices. This method is widely used by Eurostat for the estimation of missing national Supply, Use and Input-output tables but it has not been used before within the context of social accounting matrices or of regional statistics and/or regional impact analyses. On the other hand, this work discusses the possibility of producing non-survey based regional Social Accounting Matrices that may eventually allow the user to carry out impact analyses such as those of rural development policies, among others. The analysis is carried out for 12 selected European regions based on clusters.

    Keywords: Social accounting matrices, rural development, European regions, impact analysis.

    Pages: 319–346

    DOI: 10.2436/20.8080.02.62

  • Joint models for longitudinal counts and left-truncated time-to-event data with applications to health insurance

    Xavier Piulachs, Ramon Alemany, Montserrat Guillén and Dimitris Rizopoulos

    Abstract: Aging societies have given rise to important challenges in the field of health insurance. Elderly policyholders need to be provided with fair premiums based on their individual health status, whereas insurance companies want to plan for the potential costs of tackling lifetimes above mean expectations. In this article, we focus on a large cohort of policyholders in Barcelona (Spain), aged 65 years and over. A shared-parameter joint model is proposed to analyse the relationship between annual demand for emergency claims and time until death outcomes, which are subject to left truncation. We compare different functional forms of the association between both processes, and, furthermore, we illustrate how the fitted model provides time-dynamic predictions of survival probabilities. The parameter estimation is performed under the Bayesian framework using Markov chain Monte Carlo methods.

    Keywords: Joint models, panel count data, left truncation, Bayesian framework, health insurance.


    DOI: 10.2436/20.8080.02.63

  • Statistical and machine learning approaches for the minimization of trigger errors in parametric earthquake catastrophe bonds

    Laura Calvet, Madeleine Lopeman, Jésica de Armas, Guillermo Franco and Angel A. Juan

    Abstract: Catastrophe bonds are financial instruments designed to transfer risk of monetary losses arising from earthquakes, hurricanes, or floods to the capital markets. The insurance and reinsurance industry, governments, and private entities employ them frequently to obtain coverage. Parametric catastrophe bonds base their payments on physical features. For instance, given parameters such as magnitude of the earthquake and the location of its epicentre, the bond may pay a fixed amount or not pay at all. This paper reviews statistical and machine learning techniques for designing trigger mechanisms and includes a computational experiment. Several lines of future research are discussed.

    Keywords: Catastrophe bonds, risk of natural hazards, classification techniques, earthquakes, insurance.

    Pages: 373–392

    DOI: 10.2436/20.8080.02.64

  • Horizontal collaboration in freight transport: concepts, benefits and environmental challenges

    Adrián Serrano-Hernández, Angel A. Juan, Javier Faulin and Elena Perez-Bernabeu

    Abstract: Since their appearance in the 1990s, horizontal collaboration (HC) practices have proven to be catalysts for optimizing the distribution of goods in freight transport logistics. After introducing the main concepts related to HC, this paper offers a literature review on the topic and provides a classification of best practices in HC. Then, the paper analyses the main benefits and optimization challenges associated with the use of HC at the strategic, tactical, and operational levels. Emerging trends, such as the concept of ‘green’ or environmentally-friendly HC in freight transport logistics, are also introduced. Finally, the paper discusses the need for hybrid optimization methods, such as simheuristics and learnheuristics, in solving some of the previously identified challenges in real-life scenarios dominated by uncertainty and dynamic conditions.

    Keywords: Horizontal collaboration, freight transport, sustainable logistics, supply chain management, combinatorial optimization.

    Pages: 393–414

    DOI: 10.2436/20.8080.02.65

Volume 41 (1), January-June 2017

  • Thirty years of progeny from Chao’s inequality: Estimating and comparing richness with incidence data and incomplete sampling (invited article)

    Anne Chao and Robert K. Colwell

    Abstract: In the context of capture-recapture studies, Chao (1987) derived an inequality among capture frequency counts to obtain a lower bound for the size of a population based on individuals’ capture/non-capture records for multiple capture occasions. The inequality has been applied to obtain a non-parametric lower bound of species richness of an assemblage based on species incidence (detection/non-detection) data in multiple sampling units. The inequality implies that the number of undetected species can be inferred from the species incidence frequency counts of the uniques (species detected in only one sampling unit) and duplicates (species detected in exactly two sampling units). In their pioneering paper, Colwell and Coddington (1994) gave the name “Chao2” to the resulting species richness estimator. (The “Chao1” estimator refers to a similar type of estimator based on species abundance data.) Since then, the Chao2 estimator has been applied to many research fields and has led to fruitful generalizations. Here, we first review Chao’s inequality under various models and discuss some related statistical inference questions: (1) Under what conditions is the Chao2 estimator an unbiased point estimator? (2) How many additional sampling units are needed to detect any arbitrary proportion (including 100%) of the Chao2 estimate of asymptotic species richness? (3) Can other incidence frequency counts be used to obtain similar lower bounds? We then show how the Chao2 estimator can also be used to guide a non-asymptotic analysis in which species richness estimators can be compared for equally-large or equally-complete samples via sample-size-based and coverage-based rarefaction and extrapolation. We also review the generalization of Chao’s inequality to estimate species richness under other sampling-without-replacement schemes (e.g. a set of quadrats, each surveyed only once), to obtain a lower bound of undetected species shared between two or multiple assemblages, and to allow inferences about undetected phylogenetic richness (the total length of undetected branches of a phylogenetic tree connecting all species), with associated rarefaction and extrapolation. A small empirical dataset for Australian birds is used for illustration, using the online software SpadeR, iNEXT, and PhD.

    Keywords: Cauchy-Schwarz inequality, Chao2 estimator, extrapolation, Good-Turing frequency formula, incidence data, phylogenetic diversity, rarefaction, sampling effort, shared species richness, species richness.

    Pages: 3–54

    DOI: 10.2436/20.8080.02.49
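    The abstract above spells out the key mechanism: undetected species are inferred from the counts of uniques (Q1) and duplicates (Q2). As a minimal illustration — not the authors' software (SpadeR, iNEXT and PhD implement the full methodology) — the classical Chao2 lower bound can be sketched in Python; the incidence matrix below is invented toy data:

```python
# Chao2 lower bound for species richness from incidence data.
# q1 = number of "uniques" (species detected in exactly one sampling unit),
# q2 = number of "duplicates" (species detected in exactly two units),
# t  = number of sampling units, s_obs = observed species count.
def chao2(s_obs, q1, q2, t):
    if q2 > 0:
        return s_obs + (t - 1) / t * q1 * q1 / (2 * q2)
    # bias-corrected form, used when there are no duplicates
    return s_obs + (t - 1) / t * q1 * (q1 - 1) / 2

# toy incidence matrix: rows = sampling units, columns = species (1 = detected)
incidence = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
]
counts = [sum(col) for col in zip(*incidence)]  # per-species unit counts
s_obs = sum(c > 0 for c in counts)
q1 = counts.count(1)
q2 = counts.count(2)
print(chao2(s_obs, q1, q2, len(incidence)))  # lower bound, about 4.11 here
```

The estimate exceeds the observed richness (4 species) because the single unique suggests that at least a fraction of a species remains undetected.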

  • On a property of Lorenz curves with monotone elasticity and its application to the study of inequality by using tax data

    Miguel A. Sordo, Angel Berihuete, Carmen Dolores Ramos and Héctor M. Ramos

    Abstract: The Lorenz curve is the most widely used graphical tool for describing and comparing inequality of income distributions. In this paper, we show that the elasticity of this curve is an indicator of the effect, in terms of inequality, of a truncation of the income distribution. As an application, we consider tax returns as equivalent to the truncation from below of a hypothetical income distribution. Then, we replace this hypothetical distribution by the income distribution obtained from a general household survey and use the dual Lorenz curve to anticipate this effect.

    Keywords: Lorenz curve, tax data, truncation, inequality.

    Pages: 55–72

    DOI: 10.2436/20.8080.02.50

  • Comparison of two discrimination indexes in the categorisation of continuous predictors in time-to-event studies

    Irantzu Barrio, María Xosé Rodríguez-Álvarez, Luis Meira-Machado, Cristóbal Esteban and Inmaculada Arostegui

    Abstract: The Cox proportional hazards model is the most widely used survival prediction model for analysing time-to-event data. To measure the discrimination ability of a survival model, the concordance probability index is widely used. In this work we studied and compared the performance of two different estimators of the concordance probability when a continuous predictor variable is categorised in a Cox proportional hazards regression model. In particular, we compared the c-index and the concordance probability estimator. We evaluated the empirical performance of both estimators through simulations. To categorise the predictor variable we propose a methodology which considers the maximal discrimination attained for the categorical variable. We applied this methodology to a cohort of patients with chronic obstructive pulmonary disease; in particular, we categorised the predictor variable forced expiratory volume in one second, expressed as a percentage.

    Keywords: Categorisation, prediction models, cutpoint, Cox model.

    Pages: 73–92

    DOI: 10.2436/20.8080.02.51

  • Bayesian correlated models for assessing the prevalence of viruses in organic and non-organic agroecosystems

    Elena Lázaro, Carmen Armero and Luis Rubio

    Abstract: Cultivation of horticultural species under organic management has increased in importance in recent years. However, the sustainability of this new production method needs to be supported by scientific research, especially in the field of virology. We studied the prevalence of three important virus diseases in agroecosystems with regard to their management system: organic versus non-organic, with and without greenhouse. Prevalence was assessed by means of a Bayesian correlated binary model which connects the risk of infection of each virus within the same plot and was defined in terms of a logit generalized linear mixed model (GLMM). Model robustness was checked through a sensitivity analysis based on different hyperprior scenarios. Inferential results were examined in terms of changes in the marginal posterior distributions, both for fixed and for random effects, through the Hellinger distance and a derived measure of sensitivity. Statistical results suggest that organic systems show a prevalence lower than or similar to that of non-organic ones, in both single and multiple infections, and highlight the relevance of the prior specification of the random effects in the inferential process.

    Keywords: Hellinger distance, model robustness, risk of infection, sensitivity analysis, virus epidemiology.

    Pages: 93–116

    DOI: 10.2436/20.8080.02.52

  • Corrigendum to "Transmuted geometric distribution with applications in modelling and regression analysis of count data"

    Subrata Chakraborty and Deepesh Bhati

    Pages: 117–118

    DOI: 10.2436/20.8080.02.53

  • Goodness-of-fit test for randomly censored data based on maximum correlation

    Ewa Strzalkowska-Kominiak and Aurea Grané

    Abstract: In this paper we study a goodness-of-fit test based on the maximum correlation coefficient, in the context of randomly censored data. We construct a new test statistic under general right-censoring and prove its asymptotic properties. Additionally, we study a special case, when the censoring mechanism follows the well-known Koziol-Green model. We present an extensive simulation study on the empirical power of these two versions of the test statistic, showing their advantages over the widely used Pearson-type test. Finally, we apply our test to the head-and-neck cancer data.

    Keywords: Goodness-of-fit, Kaplan-Meier estimator, maximum correlation, random censoring.

    Pages: 119–138

    DOI: 10.2436/20.8080.02.54

  • A quadtree approach based on European geographic grids: reconciling data privacy and accuracy

    Raymond Lagonigro, Ramon Oller and Joan Carles Martori

    Abstract: Methods to preserve confidentiality when publishing geographic information conflict with the need to publish accurate data. The goal of this paper is to create a European geographic grid framework to disseminate statistical data over maps. We propose a methodology based on quadtree hierarchical geographic data structures. We create a varying-size grid adapted to local area densities. Highly populated zones are disaggregated into small squares to allow dissemination of accurate data. Alternatively, information on sparsely populated zones is published in big squares to avoid identification of individual data. The methodology has been applied to the 2014 population register data in Catalonia.

    Keywords: Official statistics, confidentiality, disclosure limitation, dissemination, geographic information systems, hierarchical data structures, small area geography.

    Pages: 139–158

    DOI: 10.2436/20.8080.02.55
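    The quadtree idea in this abstract — small squares where population is high, big squares where it is low — can be sketched generically. This is not the authors' methodology (which is tied to European standard grids); the threshold, minimum cell size, and point data below are illustrative assumptions:

```python
THRESHOLD = 3   # assumed minimum population per published cell (illustrative)
MIN_SIZE = 1    # assumed smallest allowed square side (illustrative)

def aggregate(points, x, y, size):
    """Recursively split the square (x, y, size) into quadrants as long as
    every non-empty quadrant still meets the confidentiality threshold;
    otherwise publish the current square. Returns (x, y, size, count) cells."""
    inside = [(px, py) for px, py in points
              if x <= px < x + size and y <= py < y + size]
    if not inside:
        return []
    half = size / 2
    children = []
    for qx in (x, x + half):
        for qy in (y, y + half):
            sub = [(px, py) for px, py in inside
                   if qx <= px < qx + half and qy <= py < qy + half]
            if sub:
                children.append((qx, qy, sub))
    if size > MIN_SIZE and all(len(sub) >= THRESHOLD for _, _, sub in children):
        cells = []
        for qx, qy, sub in children:
            cells += aggregate(sub, qx, qy, half)
        return cells
    return [(x, y, size, len(inside))]

pts = [(0.5, 0.5), (0.5, 1.5), (1.5, 0.5), (1.5, 1.5),   # spread-out quadrant
       (2.5, 0.5), (2.5, 0.6), (2.5, 0.7)]               # tight cluster
cells = aggregate(pts, 0, 0, 4)
# the spread-out quadrant stays a 2x2 square; the cluster shrinks to a 1x1 cell
```

Every published cell holds at least THRESHOLD people, so accuracy adapts to density without disclosing small counts.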

  • A Bayesian stochastic SIRS model with a vaccination strategy for the analysis of respiratory syncytial virus

    Marc Jornet-Sanz, Ana Corberán-Vallet, Francisco Santonja and Rafael Villanueva

    Abstract: Our objective in this paper is to model the dynamics of respiratory syncytial virus in the region of Valencia (Spain) and analyse the effect of vaccination strategies from a health-economic point of view. Compartmental mathematical models based on differential equations are commonly used in epidemiology to both understand the underlying mechanisms that influence disease transmission and analyse the impact of vaccination programs. However, a recently proposed Bayesian stochastic susceptible-infected-recovered-susceptible model in discrete time provided an improved and more natural description of disease dynamics. In this work, we propose an extension of that stochastic model that allows us to simulate and assess the effect of a vaccination strategy that consists of vaccinating a proportion of newborns.

    Keywords: Infectious diseases, respiratory syncytial virus (RSV), discrete-time epidemic model, stochastic compartmental model, Bayesian analysis, intervention strategies.

    Pages: 159–176

    DOI: 10.2436/20.8080.02.56

  • Statistical modeling of warm-spell duration series using hurdle models

    Jesper Rydén

    Abstract: Regression models for counts can be applied in the earth sciences, for instance when studying trends in extremes of climatological quantities. Hurdle models are modified count models which can be regarded as mixtures of distributions. In this paper, hurdle models are applied to model the sums of lengths of periods of high temperatures. A modification of the common versions found in the literature is introduced, since the problem requires left truncation as well as a particular treatment of zeros. The outcome of the model is compared to those of simpler count models.

    Keywords: Count data, hurdle models, Poisson regression, negative binomial distribution, climate.

    Pages: 177–188

    DOI: 10.2436/20.8080.02.57

Volume 40 (2), July-December 2016

  • Improving the resolution of the simple assembly line balancing problem type E

    Albert Corominas, Alberto García-Villoria and Rafael Pastor

    Abstract: The simple assembly line balancing problem type E (abbreviated as SALBP-E) occurs when the number of workstations and the cycle time are variables and the objective is to maximise the line efficiency. In contrast with other types of SALBPs, SALBP-E has received little attention in the literature. In order to solve SALBP-E optimally, we propose a mixed integer linear programming model and an iterative procedure. Since SALBP-E is NP-hard, we also propose heuristics derived from the aforementioned procedures for solving larger instances. Extensive experimentation is carried out and the results show improvements in the resolution of SALBP-E.

    Keywords: Assembly line balancing, SALBP, manufacturing optimisation.

    Pages: 227–242

    DOI: 10.2436/20.8080.02.42

  • Kernel-based estimation of P(X >Y) in ranked set sampling

    Mahdi Mahdizadeh and Ehsan Zamanzade

    Abstract: This article is directed at the problem of reliability estimation using ranked set sampling. A nonparametric estimator based on kernel density estimation is developed. The estimator is shown to be superior to its analog in simple random sampling. Monte Carlo simulations are employed to assess performance of the proposed estimator. Two real data sets are analysed for illustration.

    Keywords: Bandwidth selection, Judgment ranking, Stress-strength model.

    Pages: 243–266

    DOI: 10.2436/20.8080.02.43
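    The reliability P(X > Y) in this abstract is a stress-strength probability. The paper's estimator is built for ranked set sampling; as a simpler point of reference, a kernel-smoothed analogue of the Mann-Whitney estimator under simple random sampling can be sketched. The Gaussian kernel and the Silverman-style bandwidth rule here are assumptions, not the authors' choices:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_x_gt_y(xs, ys, h=None):
    """Kernel-smoothed estimate of P(X > Y): the indicator 1{x_i > y_j}
    of the Mann-Whitney estimator is replaced by Phi((x_i - y_j) / h)."""
    if h is None:
        # crude Silverman-style bandwidth on the pooled sample (assumption;
        # requires a pooled sample with positive spread)
        pooled = xs + ys
        n = len(pooled)
        mean = sum(pooled) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in pooled) / (n - 1))
        h = 1.06 * sd * n ** (-1 / 5)
    return sum(norm_cdf((x - y) / h) for x in xs for y in ys) / (len(xs) * len(ys))

# identical samples are a useful sanity check: the estimate is exactly 1/2
print(p_x_gt_y([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

Smoothing the indicator reduces the variance of the plain Mann-Whitney estimate at the cost of a small bandwidth-dependent bias.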

  • A construction of continuous-time ARMA models by iterations of Ornstein-Uhlenbeck processes

    Argimiro Arratia, Alejandra Cabaña and Enrique M. Cabaña

    Abstract: We present a construction of a family of continuous-time ARMA processes based on p iterations of the linear operator that maps a Lévy process onto an Ornstein-Uhlenbeck process. The construction resembles the procedure to build an AR(p) from an AR(1). We show that this family is in fact a subfamily of the well-known CARMA(p,q) processes, with several interesting advantages, including a smaller number of parameters. The resulting processes are linear combinations of Ornstein-Uhlenbeck processes all driven by the same Lévy process. This provides a straightforward computation of covariances, a state-space model representation and methods for estimating parameters. Furthermore, the discrete and equally spaced sampling of the process turns out to be an ARMA(p, p−1) process. We propose methods for estimating the parameters of the iterated Ornstein-Uhlenbeck process when the noise is driven by either a Wiener or a more general Lévy process, and show simulations and applications to real data.

    Keywords: Ornstein-Uhlenbeck process, Lévy process, Continuous ARMA, stationary process.

    Pages: 267–302

    DOI: 10.2436/20.8080.02.44

  • Modelling extreme values by the residual coefficient of variation

    Joan del Castillo and Maria Padilla

    Abstract: The possibilities of the use of the coefficient of variation over a high threshold in tail modelling are discussed. The paper also considers multiple threshold tests for a generalized Pareto distribution, together with a threshold selection algorithm. One of the main contributions is to extend the methodology based on moments to all distributions, even without finite moments. These techniques are applied to euro/dollar daily exchange rates and to Danish fire insurance losses.

    Keywords: Statistics of extremes, heavy tails, high quantile estimation, value at risk.

    Pages: 303–320

    DOI: 10.2436/20.8080.02.45

  • Using robust FPCA to identify outliers in functional time series, with applications to the electricity market

    Juan M. Vilar, Paula Raña and Germán Aneiros

    Abstract: This study proposes two methods for detecting outliers in functional time series. Both methods take dependence in the data into account and are based on robust functional principal component analysis. One method seeks outliers in the series of projections on the first principal component. The other obtains uncontaminated forecasts for each data set and flags as outliers those observations whose residuals have an unusually high norm. A simulation study shows the performance of these proposed procedures and the need to take dependence in the time series into account. Finally, the usefulness of our methodology is illustrated in two real datasets from the electricity market: daily curves of electricity demand and price in mainland Spain, for the year 2012.

    Keywords: Functional data analysis, functional principal component analysis, functional time series, outlier detection, electricity demand and price.

    Pages: 321–348

    DOI: 10.2436/20.8080.02.46

  • Log-ratio methods in mixture models for compositional data sets

    Marc Comas-Cufí, Josep Antoni Martín-Fernández and Glòria Mateu-Figueras

    Abstract: When traditional methods are applied to compositional data, misleading and incoherent results can be obtained. Finite mixtures of multivariate distributions are becoming increasingly important nowadays. In this paper, traditional strategies to fit a mixture model to compositional data sets are revisited and the major difficulties are detailed. A new proposal using a mixture of distributions defined on orthonormal log-ratio coordinates is introduced. A real data set analysis is presented to illustrate and compare the different methodologies.

    Keywords: Compositional data, Finite Mixture, Log ratio, Model-based clustering, Normal distribution, Orthonormal coordinates, Simplex.

    Pages: 340–374

    DOI: 10.2436/20.8080.02.47

  • Smoothed landmark estimators of the transition probabilities

    Luís Meira-Machado

    Abstract: One important goal in clinical applications of multi-state models is the estimation of transition probabilities. Recently, landmark estimators were proposed to estimate these quantities, and their superiority with respect to the competing estimators has been proved in situations in which the Markov condition is violated. A weakness, however, is that they can yield large standard errors in estimation in some circumstances. In this article, we propose two approaches that can be used to reduce the variability of the proposed estimator. Simulations show that the proposed estimators may be much more efficient than the unsmoothed estimator. A real data illustration is included.

    Keywords: Kaplan-Meier, Multi-state model, Nonparametric estimation, Presmoothing, Survival Analysis.

    Pages: 375–398

    DOI: 10.2436/20.8080.02.48

Volume 40 (1), January-June 2016

  • The relevance of multi-country input-output tables in measuring emissions trade balance of countries: the case of Spain

    Teresa Sanz, Rocío Yñiguez and José Manuel Rueda-Cantuche

    Abstract: As part of national accounts, input-output tables are becoming crucial statistical tools to study the economic, social and environmental impacts of globalization and international trade. In particular, global input-output tables extend the national dimension to the international dimension by relating individual countries’ input-output tables among each other, thus providing an opportunity to balance the global economy as a whole. Concerning emissions of greenhouse gases, the relative position that countries hold among their main trade partners at the global level is a key issue in terms of international climate negotiations. With this purpose, we show that (official) multi-country input-output tables are crucial to analyse the greenhouse gas emission trade balance of individual countries. Spain has a negative trade emissions balance for all three gases analysed, with the most negative balances being those associated with the bilateral trade with China, Russia, the United States and the rest of the European Union as a whole.

    Keywords: WIOD, Emissions Trade Balance, Spain, GHG footprint, GHG.

    Pages: 3–30

    DOI: 10.2436/20.8080.02.33

  • Two alternative estimation procedures for the negative binomial cure rate model with a latent activation scheme

    Diego I. Gallardo and Heleno Bolfarine

    Abstract: In this paper two alternative estimation procedures based on the EM algorithm are proposed for the flexible negative binomial cure rate model with a latent activation scheme. The Weibull model as well as the log-normal and gamma distributions are also considered for the time-to-event data for the non-destroyed cells. Simulation studies show the satisfactory performance of the proposed methodology. The impact of misspecifying the survival function on both components of the model (cured and susceptible) is also evaluated. The use of the new methodology is illustrated with a real data set related to a clinical trial on Phase III cutaneous melanoma patients.

    Keywords: Competing risks, EM algorithm, latent activation scheme.

    Pages: 31–54

    DOI: 10.2436/20.8080.02.34

  • A test for normality based on the empirical distribution function

    Hamzeh Torabi, Narges H. Montazeri and Aurea Grané

    Abstract: In this paper, a goodness-of-fit test for normality based on the comparison of the theoretical and empirical distributions is proposed. Critical values are obtained via Monte Carlo for several sample sizes and different significance levels. We study and compare the power of forty selected normality tests for a wide collection of alternative distributions. The new proposal is compared to some traditional test statistics, such as Kolmogorov-Smirnov, Kuiper, Cramér-von Mises, Anderson-Darling, Pearson Chi-square, Shapiro-Wilk, Shapiro-Francia, Jarque-Bera, SJ, Robust Jarque-Bera, and also to entropy-based test statistics. From the simulation study results it is concluded that the best performance against asymmetric alternatives with support on the whole real line and alternative distributions with support on the positive real line is achieved by the new test. Other findings derived from the simulation study are that SJ and Robust Jarque-Bera tests are the most powerful ones for symmetric alternatives with support on the whole real line, whereas entropy-based tests are preferable for alternatives with support on the unit interval.

    Keywords: Empirical distribution function, entropy estimator, goodness-of-fit tests, Monte Carlo simulation, Robust Jarque-Bera test, Shapiro-Francia test, SJ test, test for normality.

    Pages: 55–88

    DOI: 10.2436/20.8080.02.35

  • Point and interval estimation for the logistic distribution based on record data

    Akbar Asgharzadeh, Reza Valiollahi and Mousa Abdi

    Abstract: In this paper, based on record data from the two-parameter logistic distribution, the maximum likelihood and Bayes estimators for the two unknown parameters are derived. The maximum likelihood estimators and Bayes estimators cannot be obtained in explicit forms. We present a simple method of deriving explicit maximum likelihood estimators by approximating the likelihood function. Also, an approximation based on the Gibbs sampling procedure is used to obtain the Bayes estimators. Asymptotic confidence intervals, bootstrap confidence intervals and credible intervals are also proposed. Monte Carlo simulations are performed to compare the performances of the different proposed methods. Finally, one real data set has been analysed for illustrative purposes.

    Keywords: Logistic distribution, record data, maximum likelihood estimator, Bayes estimator, Gibbs sampling.

    Pages: 89–112

    DOI: 10.2436/20.8080.02.36

  • A goodness-of-fit test for the multivariate Poisson distribution

    Francisco Novoa-Muñoz and María Dolores Jiménez-Gamero

    Abstract: Bivariate count data arise in several different disciplines and the bivariate Poisson distribution is commonly used to model them. This paper proposes and studies a computationally convenient goodness-of-fit test for this distribution, which is based on an empirical counterpart of a system of equations. The test is consistent against fixed alternatives. The null distribution of the test can be consistently approximated by a parametric bootstrap and by a weighted bootstrap. The goodness of these bootstrap estimators and the power for finite sample sizes are numerically studied. It is shown that the proposed test can be naturally extended to the multivariate Poisson distribution.

    Keywords: Bivariate Poisson distribution, goodness-of-fit, empirical probability generating function, parametric bootstrap, weighted bootstrap, multivariate Poisson distribution.

    Pages: 113–138

    DOI: 10.2436/20.8080.02.37

  • Exploring Bayesian models to evaluate control procedures for plant disease

    Danilo Alvares, Carmen Armero, Anabel Forte and Luis Rubio

    Abstract: Tigernut tubers are the main ingredient in the production of orxata in Valencia, a popular sweet white soft drink. In recent years, the appearance of black spots in the skin of tigernuts has led to important economic losses in orxata production because severely diseased tubers must be discarded. In this paper, we discuss three complementary statistical models to assess the disease incidence of harvested tubers from selected or treated seeds, and propose a measure of effectiveness for different treatments against the disease based on the probability of germination and the incidence of the disease. Statistical methods for these studies are approached from Bayesian reasoning and include mixed-effects models, Dirichlet-multinomial inferential processes and mixed-effects logistic regression models. Statistical analyses provide relevant information to carry out measures to palliate the black spot disease and achieve a high-quality production. For instance, the study shows that avoiding affected seeds increases the probability of harvesting asymptomatic tubers. It is also revealed that the best chemical treatment, when prioritizing germination, is disinfection with hydrochloric acid, while sodium hypochlorite performs better if the priority is to have a reduced disease incidence. The reduction of the incidence of the black spot syndrome by disinfection with chemical agents supports the hypothesis that the causal agent is a pathogenic organism.

    Keywords: Dirichlet-multinomial model, logistic regression, measures of effectiveness, tigernut tubers.

    Pages: 139–152

    DOI: 10.2436/20.8080.02.38

  • Transmuted geometric distribution with applications in modeling and regression analysis of count data

    Subrata Chakraborty and Deepesh Bhati

    Abstract: A two-parameter transmuted geometric distribution is proposed as a new generalization of the geometric distribution by employing the quadratic transmutation techniques of Shaw and Buckley. The additional parameter plays the role of controlling the tail length. Distributional properties of the proposed distribution are investigated. The maximum likelihood estimation method is discussed along with some data fitting experiments to show its advantages over some existing distributions in the literature. The tail flexibility of the density of the aggregate loss random variable, assuming the proposed distribution as primary distribution, is outlined and presented along with an illustrative modelling of aggregate claims from vehicle insurance data. Finally, we present a count regression model based on the proposed distribution and carry out its comparison with some established models.

    Keywords: Aggregate claim, count regression, geometric distribution, transmuted distribution.

    Pages: 153–176

    DOI: 10.2436/20.8080.02.39

  • Compound distributions motivated by linear failure rate

    Narjes Gitifar, Sadegh Rezaei and Saralees Nadarajah

    Abstract: Motivated by three failure data sets (lifetime of patients, failure time of hard drives and failure time of a product), we introduce three different three-parameter distributions, study basic mathematical properties, address estimation by the method of maximum likelihood and investigate finite sample performance of the estimators. We show that one of the new distributions provides a better fit to each data set than eight other distributions each having three parameters and three distributions each having two parameters.

    Keywords: Linear failure rate distribution, maximum likelihood estimation, Poisson distribution.

    Pages: 177–200

    DOI: 10.2436/20.8080.02.40

  • A statistical learning based approach for parameter fine-tuning of metaheuristics

    Laura Calvet, Angel A. Juan, Carles Serrat and Jana Ries

    Abstract: Metaheuristics are approximation methods used to solve combinatorial optimization problems. Their performance usually depends on a set of parameters that need to be adjusted. The selection of appropriate parameter values is a time-consuming task that requires advanced analytical and problem-specific skills. This paper provides an overview of the principal approaches to tackle the Parameter Setting Problem, focusing on the statistical procedures employed so far by the scientific community. In addition, a novel methodology is proposed, which is tested using an already existing algorithm for solving the Multi-Depot Vehicle Routing Problem.

    Keywords: Parameter fine-tuning, metaheuristics, statistical learning, biased randomization.

    Pages: 201–224

    DOI: 10.2436/20.8080.02.41

Volume 39 (2), July-December 2015

  • Twenty years of P-splines (invited article)

    Paul H.C. Eilers, Brian D. Marx and Maria Durbán

    Abstract: P-splines first appeared in the limelight twenty years ago. Since then they have become popular in applications and in theoretical work. The combination of a rich B-spline basis and a simple difference penalty lends itself well to a variety of generalizations, because it is based on regression. In effect, P-splines allow the building of a “backbone” for the “mixing and matching” of a variety of additive smooth structure components, while inviting all sorts of extensions: varying-coefficient effects, signal (functional) regressors, two-dimensional surfaces, non-normal responses, quantile (expectile) modelling, among others. Strong connections with mixed models and Bayesian analysis have been established. We give an overview of many of the central developments during the first two decades of P-splines.

    Keywords: B-splines, penalty, additive model, mixed model, multidimensional smoothing.

    Pages: 149–186

    DOI: 10.2436/20.8080.02.25

  • Likelihood-based inference for the power regression model

    Guillermo Martínez-Flórez, Heleno Bolfarine and Héctor W. Gómez

    Abstract: In this paper we investigate an extension of the power-normal model, called the alpha-power model, and specialize it to linear and nonlinear regression models, with and without correlated errors. Maximum likelihood estimation is considered with explicit derivation of the observed and expected Fisher information matrices. Applications are considered for the Australian athletes data set and also for a data set studied in Xie et al. (2009). The main conclusion is that the proposed model can be a viable alternative in situations where the normal distribution is not the most adequate model.

    Keywords: Correlation, maximum likelihood, power-normal distribution, regression.

    Pages: 187–208

    DOI: 10.2436/20.8080.02.26

  • On the bivariate Sarmanov distribution and copula. An application on insurance data using truncated marginal distributions

    Zuhair Bahraoui, Catalina Bolancé, Elena Pelican and Raluca Vernic

    Abstract: The Sarmanov family of distributions can provide a good model for bivariate random variables and it is used to model dependency in a multivariate setting with given marginals. In this paper, we focus our attention on the bivariate Sarmanov distribution and copula with different truncated extreme value marginal distributions. We compare a global estimation method based on maximizing the full log-likelihood function with the estimation based on maximizing the pseudo-log-likelihood function for copula (or partial estimation). Our aim is to estimate two statistics that can be used to evaluate the risk of the sum exceeding a given value. Numerical results using a real data set from the motor insurance sector are presented.

    Keywords: Bivariate Sarmanov distribution, truncated marginal distributions, copula representation, risk measures.

    Pages: 209–230

    DOI: 10.2436/20.8080.02.27

  • On the interpretation of differences between groups for compositional data

    Josep-Antoni Martín-Fernández, Josep Daunis-i-Estadella and Glòria Mateu-Figueras

    Abstract: Social policies are designed using information collected in surveys, such as the Catalan Time Use Survey. Accurate comparisons of time use data among population groups are commonly carried out using statistical methods. The total daily time spent on different activities by a single person is equal to 24 hours. Because this type of data is compositional, its sample space has particular properties that statistical methods should respect. The critical points required to interpret differences between groups are provided and described in terms of log-ratio methods. These techniques facilitate the interpretation of the relative differences detected in multivariate and univariate analyses.

    Keywords: Log-ratio transformations, MANOVA, perturbation, simplex, subcomposition.

    Pages: 231–252

    DOI: 10.2436/20.8080.02.28

  • Robust project management with the tilted beta distribution

    Eugene D. Hahn and María del Mar López Martín

    Abstract: Recent years have seen an increase in the development of robust approaches for stochastic project management methodologies such as PERT (Program Evaluation and Review Technique). These robust approaches allow for elevated likelihoods of outlying events, thereby widening interval estimates of project completion times. However, little attention has been paid to the fact that outlying events and/or expert judgments may be asymmetric. We propose the tilted beta distribution which permits both elevated likelihoods of outlying events as well as an asymmetric representation of these events. We examine the use of the tilted beta distribution in PERT with respect to other project management distributions.

    Keywords: Activity times, finite mixture, PERT, tilted beta distribution, robust project management, sensitivity analysis.

    Pages: 253–272

    DOI: 10.2436/20.8080.02.29

  • A note on "Double bounded Kumaraswamy-power series class of distributions"

    Tibor K. Pogány and Saralees Nadarajah

    Abstract: In a recent edition of SORT, Bidram and Nekoukhou proposed a novel class of distributions and derived its mathematical properties. Several of the mathematical properties are expressed as single infinite sums or double infinite sums. Here, we show that many of these properties can be expressed in terms of known special functions, functions for which in-built routines are widely available.

    Keywords: Double bounded Kumaraswamy-power series class of distributions, Fox-Wright generalized hypergeometric function, generalized hypergeometric function.

    Pages: 273–280

    DOI: 10.2436/20.8080.02.30

  • Parameter estimation of Poisson generalized linear mixed models based on three different statistical principles: a simulation study

    Martí Casals, Klaus Langohr, Josep Lluís Carrasco and Lars Rönnegård

    Abstract: Generalized linear mixed models are flexible tools for modeling non-normal data and are useful for accommodating overdispersion in Poisson regression models with random effects. Their main difficulty resides in the parameter estimation because there is no analytic solution for the maximization of the marginal likelihood. Many methods have been proposed for this purpose and many of them are implemented in software packages. The purpose of this study is to compare the performance of three different statistical principles —marginal likelihood, extended likelihood, Bayesian analysis— via simulation studies. Real data on contact wrestling are used for illustration.

    Keywords: Estimation methods, overdispersion, Poisson generalized linear mixed models, simulation study, statistical principles, sport injuries.

    Pages: 281–308

    DOI: 10.2436/20.8080.02.31

  • Multinomial logistic estimation in dual frame surveys

    David Molina, Maria del Mar Rueda, Antonio Arcos and Maria Giovanna Ranalli

    Abstract: We consider estimation techniques from dual frame surveys in the case of estimation of proportions when the variable of interest has multinomial outcomes. We propose to describe the joint distribution of the class indicators by a multinomial logistic model. Logistic generalized regression estimators and model calibration estimators are introduced for class frequencies in a population. Theoretical asymptotic properties of the proposed estimators are shown and discussed. Monte Carlo experiments are also carried out to compare the efficiency of the proposed procedures for finite size samples and in the presence of different sets of auxiliary variables. The simulation studies indicate that the multinomial logistic formulation yields better results than the classical estimators that implicitly assume individual linear models for the variables. The proposed methods are also applied in an attitude survey.

    Keywords: Finite population, survey sampling, auxiliary information, model assisted inference, calibration.

    Pages: 309–336

    DOI: 10.2436/20.8080.02.32

Volume 39 (1), January-June 2015

  • Inference on the parameters of the Weibull distribution using records

    Ali Akbar Jafari and Hojatollah Zakerzadeh

    Abstract: The Weibull distribution is a widely applicable model for lifetime data. In this paper, we investigate inference on the parameters of the Weibull distribution based on record values. We first propose a simple and exact test and a confidence interval for the shape parameter. Then, in addition to a generalized confidence interval, a generalized test variable is derived for the scale parameter when the shape parameter is unknown. The paper presents a simple and exact joint confidence region as well. In all cases, simulation studies show that the proposed approaches are more satisfactory and reliable than previous methods. All proposed approaches are illustrated using a real example.

    Keywords: Coverage probability, generalized confidence interval, generalized p-value, records, Weibull distribution.

    Pages: 3–18

    DOI: 10.2436/20.8080.02.17

  • Small area estimation of poverty indicators under partitioned area-level time models

    Domingo Morales, Maria Chiara Pagliarella and Renato Salvatore

    Abstract: This paper deals with small area estimation of poverty indicators. Small area estimators of these quantities are derived from partitioned time-dependent area-level linear mixed models. The introduced models are useful for modelling the different behaviour of the target variable by sex or any other dichotomic characteristic. The mean squared errors are estimated by explicit formulas. An application to data from the Spanish Living Conditions Survey is given.

    Keywords: Area-level models, small area estimation, time correlation, poverty indicators.

    Pages: 19–34

    DOI: 10.2436/20.8080.02.18

  • A new class of Skew-Normal-Cauchy distribution

    Jaime Arrué, Héctor W. Gomez, Hugo S. Salinas and Heleno Bolfarine

    Abstract: In this paper we study a new class of skew-Cauchy distributions inspired by the extended two-piece skew-normal family of distributions. The new family of distributions encompasses three well known families of distributions: the normal, the two-piece skew-normal and the skew-normal-Cauchy distributions. Some properties of the new distribution are investigated, inference via maximum likelihood estimation is implemented, and results of a real data application, which reveal good performance of the new model, are reported.

    Keywords: Cauchy distribution, kurtosis, maximum likelihood estimation, singular information matrix, skewness, Skew-Normal-Cauchy distribution.

    Pages: 35–50

    DOI: 10.2436/20.8080.02.19

  • Diagnostic plot for the identification of high leverage collinearity-influential observations

    Arezoo Bagheri and Habshah Midi

    Abstract: High leverage collinearity-influential observations are those high leverage points that change the multicollinearity pattern of a data set. It is imperative to identify these points as they are responsible for misleading inferences in the fitting of a regression model. Moreover, identifying these observations may help statistics practitioners to solve the problem of multicollinearity, which is caused by high leverage points. A diagnostic plot is very useful for practitioners to quickly capture abnormalities in a data set. In this paper, we propose new diagnostic plots to identify high leverage collinearity-influential observations. The merit of our proposed diagnostic plots is confirmed by some well-known examples and Monte Carlo simulations.

    Keywords: Collinearity influential observation, diagnostic robust generalized potential, high leverage points, multicollinearity.

    Pages: 51–70

    DOI: 10.2436/20.8080.02.20

  • Discrete Alpha-Skew-Laplace distribution

    S. Shams Harandi and M. H. Alamatsaz

    Abstract: Classical discrete distributions rarely support modelling data on the whole set of integers. In this paper, we introduce a flexible discrete distribution on this set, which can, in addition, cover bimodal as well as unimodal data sets. The proposed distribution can also be fitted to positively and negatively skewed data. The distribution is indeed a discrete counterpart of the continuous alpha-skew-Laplace distribution recently introduced in the literature. The proposed distribution can also be viewed as a weighted version of the discrete Laplace distribution. Several distributional properties of this class, such as the cumulative distribution function, moment generating function, moments, modality, infinite divisibility and truncation, are studied. A simulation study is also performed. Finally, a real data set is used to show the applicability of the new model compared to several rival models, such as the discrete normal and Skellam distributions.

    Keywords: Discrete Laplace distribution, discretization, maximum likelihood estimation, uni-bimodality, weighted distribution.

    Pages: 71–84

    DOI: 10.2436/20.8080.02.21

  • A mathematical programming approach for different scenarios of bilateral bartering

    Stefano Nasini, Jordi Castro and Pau Fonseca

    Abstract: The analysis of markets with indivisible goods and fixed exogenous prices has played an important role in economic models, especially in relation to wage rigidity and unemployment. This paper provides a novel mathematical programming based approach to study pure exchange economies where discrete amounts of commodities are exchanged at fixed prices. Barter processes, consisting of sequences of elementary reallocations of pairs of commodities among pairs of agents, are formalized as local searches converging to equilibrium allocations. A direct application of the analysed processes in the context of computational economics is provided, along with a Java implementation of the described approaches.

    Keywords: Numerical optimization, combinatorial optimization, microeconomic theory.

    Pages: 85–108

    DOI: 10.2436/20.8080.02.22

  • A comparison of computational approaches for maximum likelihood estimation of the Dirichlet parameters on high-dimensional data

    Marco Giordan and Ron Wehrens

    Abstract: Likelihood estimates of the Dirichlet distribution parameters can be obtained only through numerical algorithms. Such algorithms can provide estimates outside the correct range for the parameters and/or can require a large amount of iterations to reach convergence. These problems can be aggravated if good starting values are not provided. In this paper we discuss several approaches that can partially avoid these problems providing a good trade-off between efficiency and stability. The performances of these approaches are compared on high-dimensional real and simulated data.

    Keywords: Levenberg-Marquardt algorithm, re-parametrization, starting values, metabolomics data.

    Pages: 109–126

    DOI: 10.2436/20.8080.02.23

  • The exponentiated discrete Weibull distribution

    Vahid Nekoukhou and Hamid Bidram

    Abstract: In this paper, the exponentiated discrete Weibull distribution is introduced. This new generalization of the discrete Weibull distribution can also be considered as a discrete analogue of the exponentiated Weibull distribution. A special case of this exponentiated discrete Weibull distribution defines a new generalization of the discrete Rayleigh distribution for the first time in the literature. In addition, the discrete generalized exponential and geometric distributions are special sub-models of the new distribution. Here, some basic distributional properties, moments, and order statistics of this new discrete distribution are studied. We will see that the hazard rate function can be increasing, decreasing, bathtub, and upside-down bathtub shaped. Estimation of the parameters is illustrated using the maximum likelihood method. The model is also examined using a real data set.

    Keywords: Discrete generalized exponential distribution, exponentiated discrete Weibull distribution, exponentiated Weibull distribution, geometric distribution, infinite divisibility, order statistics, resilience parameter family, stress-strength parameter.

    Pages: 127–146

    DOI: 10.2436/20.8080.02.24

Volume 38 (2), July-December 2014