The full text of articles can be downloaded by clicking the PDF button. These may contain supplementary material that can be downloaded by clicking the ZIP button.
Next issue articles are papers that have been copy-edited and typeset but not yet paginated for inclusion in an issue of the journal. The final version of articles can be downloaded from the “Current issue” and "Downloadable articles" section.
Abstract: In the context of capture-recapture studies, Chao (1987) derived an inequality among capture frequency counts to obtain a lower bound for the size of a population based on individuals’ capture/non-capture records for multiple capture occasions. The inequality has been applied to obtain a non-parametric lower bound of species richness of an assemblage based on species incidence (detection/non-detection) data in multiple sampling units. The inequality implies that the number of undetected species can be inferred from the species incidence frequency counts of the uniques (species detected in only one sampling unit) and duplicates (species detected in exactly two sampling units). In their pioneering paper, Colwell and Coddington (1994) gave the name “Chao2” to the estimator for the resulting species richness. (The “Chao1” estimator refers to a similar type of estimator based on species abundance data). Since then, the Chao2 estimator has been applied to many research fields and led to fruitful generalizations. Here, we first review Chao’s inequality under various models and discuss some related statistical inference questions: (1) Under what conditions is the Chao2 estimator an unbiased point estimator? (2) How many additional sampling units are needed to detect any arbitrary proportion (including 100%) of the Chao2 estimate of asymptotic species richness? (3) Can other incidence frequency counts be used to obtain similar lower bounds? We then show how the Chao2 estimator can be also used to guide a non-asymptotic analysis in which species richness estimators can be compared for equally-large or equally-complete samples via sample-size-based and coverage-based rarefaction and extrapolation. We also review the generalization of Chao’s inequality to estimate species richness under other sampling-without-replacement schemes (e.g. a set of quadrats, each surveyed only once), to obtain a lower bound of undetected species shared between two or multiple assemblages, and to allow inferences about undetected phylogenetic richness (the total length of undetected branches of a phylogenetic tree connecting all species), with associated rarefaction and extrapolation. A small empirical dataset for Australian birds is used for illustration, using online software SpadeR, iNEXT, and PhD.
Keywords: Cauchy-Schwarz inequality, Chao2 estimator, extrapolation, Good-Turing frequency, formula, incidence data, phylogenetic diversity, rarefaction, sampling effort, shared species richness, species richness.
Pages: 3– 54
Abstract: The Lorenz curve is the most widely used graphical tool for describing and comparing inequality of income distributions. In this paper, we show that the elasticity of this curve is an indicator of the effect, in terms of inequality, of a truncation of the income distribution. As an application, we consider tax returns as equivalent to the truncation from below of a hypothetical income distribution. Then, we replace this hypothetical distribution by the income distribution obtained from a general household survey and use the dual Lorenz curve to anticipate this effect.
Keywords Lorenz curve, tax data, truncation, inequality.
Pages: 55– 72
Abstract: The Cox proportional hazards model is the most widely used survival prediction model for analysing time-to-event data. To measure the discrimination ability of a survival model the concordance probability index is widely used. In this work we studied and compared the performance of two different estimators of the concordance probability when a continuous predictor variable is categorised in a Cox proportional hazards regression model. In particular, we compared the c-index and the concordance probability estimator. We evaluated the empirical performance of both estimators through simulations. To categorise the predictor variable we propose a methodology which considers the maximal discrimination attained for the categorical variable. We applied this methodology to a cohort of patients with chronic obstructive pulmonary disease, in particular, we categorised the predictor variable forced expiratory volume in one second in percentage.
Keywords: Categorisation, prediction models, cutpoint, Cox model.
Pages: 73– 92
Abstract: Cultivation of horticultural species under organic management has increased in importance in recent years. However, the sustainability of this new production method needs to be supported by scientific research, especially in the field of virology. We studied the prevalence of three important virus diseases in agroecosystems with regard to its management system: organic versus non-organic, with and without greenhouse. Prevalence was assessed by means of a Bayesian correlated binary model which connects the risk of infection of each virus within the same plot and was defined in terms of a logit generalized linear mixed model (GLMM). Model robustness was checked through a sensitivity analysis based on different hyperprior scenarios. Inferential results were examined in terms of changes in the marginal posterior distributions, both for fixed and for random effects, through the Hellinger distance and a derived measure of sensitivity. Statistical results suggested that organic systems show lower or similar prevalence than non-organic ones in both single and multiple infections as well as the relevance of the prior specification of the random effects in the inferential process.
Keywords: Hellinger distance, model robustness, risk infection, sensitivity analysis, virus epidemiology.
Pages: 93– 116
Pages: 117– 118
Abstract: In this paper we study a goodness-of-fit test based on the maximum correlation coefficient, in the context of randomly censored data. We construct a new test statistic under general right- censoring and prove its asymptotic properties. Additionally, we study a special case, when the censoring mechanism follows the well-known Koziol-Green model. We present an extensive simulation study on the empirical power of these two versions of the test statistic, showing their ad- vantages over the widely used Pearson-type test. Finally, we apply our test to the head-and-neck cancer data.
Keywords: Goodness-of-fit, Kaplan-Meier estimator, maximum correlation, random censoring.
Pages: 119– 138
Abstract: Methods to preserve confidentiality when publishing geographic information conflict with the need to publish accurate data. The goal of this paper is to create a European geographic grid frame- work to disseminate statistical data over maps. We propose a methodology based on quadtree hierarchical geographic data structures. We create a varying size grid adapted to local area densities. High populated zones are disaggregated in small squares to allow dissemination of accurate data. Alternatively, information on low populated zones is published in big squares to avoid identification of individual data. The methodology has been applied to the 2014 population register data in Catalonia.
Keywords: Official statistics, confidentiality, disclosure limitation, dissemination, geographic information systems, hierarchical data structures, small area geography.
Pages: 139– 158
Abstract: Our objective in this paper is to model the dynamics of respiratory syncytial virus in the region of Valencia (Spain) and analyse the effect of vaccination strategies from a health-economic point of view. Compartmental mathematical models based on differential equations are commonly used in epidemiology to both understand the underlying mechanisms that influence disease transmission and analyse the impact of vaccination programs. However, a recently proposed Bayesian stochastic susceptible-infected-recovered-susceptible model in discrete-time provided an improved and more natural description of disease dynamics. In this work, we propose an extension of that stochastic model that allows us to simulate and assess the effect of a vaccination strategy that consists on vaccinating a proportion of newborns.
Keywords: Infectious diseases, respiratory syncytial virus (RSV), discrete-time epidemic model, stochastic compartmental model, Bayesian analysis, intervention strategies.
Pages: 159– 176
Abstract: Regression models for counts could be applied to the earth sciences, for instance when studying trends of extremes of climatological quantities. Hurdle models are modified count models which can be regarded as mixtures of distributions. In this paper, hurdle models are applied to model the sums of lengths of periods of high temperatures. A modification to the common versions presented in the literature is presented, as left truncation as well as a particular treatment of zeros is needed for the problem. The outcome of the model is compared to those of simpler count models.
Keywords: Count data, hurdle models, Poisson regression, negative binomial distribution, climate.
Pages: 177– 188