Project CEMAPRE internal
Title | Bayesian digit analysis |
Participants | Pedro Miguel Fonseca, Rui Paulo (Principal Investigator) |
Summary | Naturally occurring, non-manipulated numerical datasets are known to often exhibit distinctive patterns in their leading-digit frequencies. The best known is Benford's law, under which the probability of a leading digit d decays logarithmically as log10(1 + 1/d). Digit analysis uses statistical tests to assess how far the observed digit frequencies deviate from the expected pattern, so these empirical regularities can be used to screen numerical datasets for erroneous or fraudulent data. Rigorous digit analysis usually requires testing precise (point) hypotheses about Binomial or Multinomial parameters. In that context, classical significance tests at a fixed significance level are known to over-reject the null hypothesis in large samples. Their power grows with sample size and their acceptance region shrinks, so practically irrelevant deviations get flagged, which makes them prone to high false-positive rates. The Bayesian approach, on the other hand, requires the specification of prior distributions. In the absence of further information, it is natural to require these priors to be unimodal, symmetric, centred on the parameter value specified by the null hypothesis, and non-increasing as one moves away from that value. Conjugate priors are convenient for analytical purposes, as they yield closed-form expressions for Bayes factors and posterior distributions. However, they are usually not flexible enough to admit hyperparameter choices that meet all of these conditions without becoming too informative. Using MCMC methods, we intend to overcome this limitation by extending Bayesian model selection and Bayesian parameter estimation for digit analysis beyond the family of conjugate priors. |
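As a rough illustration of the two approaches contrasted in the summary, the following Python sketch (standard library only) computes the Benford first-digit probabilities, a Pearson chi-square p-value, and the closed-form log Bayes factor log BF01 of the point null theta = p against a conjugate Dirichlet alternative. The function names (chi2_pvalue, log_bf01, counts_from), the Dirichlet(c * p) prior and the concentration c = 10 are illustrative assumptions, not the project's method. Note that this conjugate prior has mean p but is neither symmetric about p nor, in general, modal at p. This is exactly the kind of limitation the summary describes.

```python
import math

# Benford's law for the first significant digit: P(D = d) = log10(1 + 1/d), d = 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def chi2_pvalue(counts, probs=BENFORD):
    """Pearson chi-square goodness-of-fit p-value for 9 digit categories (8 df)."""
    if len(counts) != 9:
        raise ValueError("expected counts for digits 1..9")
    n = sum(counts)
    x = sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))
    # Closed-form survival function of a chi-square with even df = 2m (here m = 4):
    # P(X > x) = exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
    h = x / 2
    return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(4))


def log_mv_beta(a):
    """Log of the multivariate Beta function B(a) = prod Gamma(a_k) / Gamma(sum a_k)."""
    return sum(math.lgamma(ak) for ak in a) - math.lgamma(sum(a))


def log_bf01(counts, probs=BENFORD, concentration=10.0):
    """log BF01 for H0: theta = probs vs H1: theta ~ Dirichlet(concentration * probs).

    The Dirichlet(concentration * probs) alternative is an assumed, hypothetical choice.
    """
    alpha = [concentration * p for p in probs]
    posterior = [a + n for a, n in zip(alpha, counts)]
    # The multinomial coefficient appears in both marginal likelihoods and cancels.
    log_m0 = sum(n * math.log(p) for n, p in zip(counts, probs))
    log_m1 = log_mv_beta(posterior) - log_mv_beta(alpha)
    return log_m0 - log_m1


def counts_from(props, n):
    """Hypothetical helper: round n * props to integer digit counts."""
    return [round(n * q) for q in props]


# A fixed, small deviation from Benford: 0.005 of probability moved from digit 1 to digit 2.
shifted = BENFORD[:]
shifted[0] -= 0.005
shifted[1] += 0.005

for n in (1_000, 100_000):
    c = counts_from(shifted, n)
    print(n, chi2_pvalue(c), log_bf01(c))
```

Because the relative deviation is held fixed, the chi-square statistic grows roughly linearly with n. In this example the p-value therefore falls below 0.05 at n = 100 000. With this diffuse prior, the Bayes factor still favours H0 at that sample size. Replacing the conjugate Dirichlet by a non-conjugate prior that satisfies the conditions listed in the summary would remove the closed form for log_m1. That is where MCMC estimation of the marginal likelihood would come in.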