Bayesian Modeling of Sequential Discoveries

Zito, Alessandro; Rigon, Tommaso; Ovaskainen, Otso; Dunson, David B.

Zito_2022_Sequential_discoveries.pdf

acceptedVersion

Abstract

We aim to model the appearance of distinct tags in a sequence of labelled objects. Common examples of this type of data include words in a corpus or distinct species in a sample. These sequential discoveries are often summarised via accumulation curves, which count the number of distinct entities observed in an increasingly large set of objects. We propose a novel Bayesian method for species sampling modelling that directly specifies the probability of a new discovery, thereby allowing for flexible specifications. The asymptotic behavior and finite sample properties of this approach are extensively studied. Interestingly, our enlarged class of sequential processes includes highly tractable special cases. We present a subclass of models characterized by appealing theoretical and computational properties, including one that shares the same discovery probability with the Dirichlet process. Moreover, due to strong connections with logistic regression models, this subclass can naturally account for covariates. Finally, we test our proposal on both synthetic and real data, with special emphasis on a large fungal biodiversity study in Finland.
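As an illustration of the quantities the abstract mentions, the following is a minimal sketch and not the authors' implementation. It assumes the standard Dirichlet process discovery probability, alpha / (alpha + n) after n observed objects, and treats discoveries as independent Bernoulli indicators whose sum gives the accumulation curve. All function names and the parameter `alpha` are hypothetical, chosen here for illustration only.

```python
import random


def accumulation_curve(labels):
    """Number of distinct labels among the first i objects, for i = 1..n."""
    seen = set()
    curve = []
    for label in labels:
        seen.add(label)
        curve.append(len(seen))
    return curve


def dp_discovery_prob(n_seen, alpha):
    """Dirichlet process probability that the next object carries a new label,
    given that n_seen objects have already been observed (assumed form)."""
    return alpha / (alpha + n_seen)


def expected_curve(n, alpha):
    """Expected accumulation curve: running sum of the discovery probabilities."""
    total = 0.0
    curve = []
    for i in range(n):
        total += dp_discovery_prob(i, alpha)
        curve.append(total)
    return curve


def simulate_curve(n, alpha, seed=0):
    """Simulate one accumulation curve from independent Bernoulli discoveries."""
    rng = random.Random(seed)
    distinct = 0
    curve = []
    for i in range(n):
        if rng.random() < dp_discovery_prob(i, alpha):
            distinct += 1
        curve.append(distinct)
    return curve


if __name__ == "__main__":
    print(accumulation_curve(["oak", "birch", "oak", "pine"]))
    print(expected_curve(5, alpha=2.0))
    print(simulate_curve(5, alpha=2.0, seed=1))
```

Under these assumptions the number of distinct labels after n objects is a sum of independent, non-identical Bernoulli variables, which is the setting in which the Poisson-binomial distribution listed among the subjects arises.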

Main Authors

Zito, Alessandro; Rigon, Tommaso; Ovaskainen, Otso; Dunson, David B.

Format

Articles; Research article

Published

2022

Series

Journal of the American Statistical Association

Subjects

accumulation curves

Dirichlet process

logistic regression

Poisson-binomial distribution

species sampling models

statistical models

Bayesian method

species inventory

sampling

Publication in research information system

https://converis.jyu.fi/converis/portal/detail/Publication/117772805

Publisher

Taylor & Francis

The permanent address of the publication

https://urn.fi/URN:NBN:fi:jyu-202211285379

Review status

Peer reviewed

ISSN

0162-1459

DOI

https://doi.org/10.1080/01621459.2022.2060835

Language

English

Published in

Journal of the American Statistical Association

Citation

Zito, A., Rigon, T., Ovaskainen, O., & Dunson, D. B. (2022). Bayesian Modeling of Sequential Discoveries. Journal of the American Statistical Association, Early online. https://doi.org/10.1080/01621459.2022.2060835

Funder(s)

European Commission

Funding program(s)

ERC European Research Council, H2020

Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor EACEA can be held responsible for them.

Additional information about funding

This project has received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 856506).
