Research projects

Project CEMAPRE internal

Title: Bayesian digit analysis
Participants: Pedro Miguel Fonseca, Rui Paulo (Principal Investigator)
Summary: Naturally occurring, non-manipulated numerical datasets often exhibit distinctive patterns
in their leading-digit frequencies. The best known is Benford's law, under which the frequency of
the leading digit decays logarithmically. Digit analysis uses statistical tests to evaluate the
observed deviations from the expected pattern, so that such empirical regularities can be used to
screen numerical datasets for erroneous or fraudulent data.
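
As a rough illustration of this screening idea, the Python sketch below computes the first-digit
probabilities implied by Benford's law, P(d) = log10(1 + 1/d) for d = 1, ..., 9, and a Pearson
chi-square statistic for observed digit counts. The function names and the example data (the first
100 powers of 2) are assumptions made for illustration and are not part of the project itself.

```python
import math

def benford_probs():
    """First-digit probabilities under Benford's law: P(d) = log10(1 + 1/d)."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]

def leading_digit(x):
    """First significant digit of a non-zero number."""
    if x == 0:
        raise ValueError("zero has no leading digit")
    # Scientific notation puts the first significant digit in position 0.
    return int(f"{abs(x):.15e}"[0])

def digit_counts(values):
    """Counts of leading digits 1..9 (index 0 holds the count for digit 1)."""
    counts = [0] * 9
    for v in values:
        counts[leading_digit(v) - 1] += 1
    return counts

def chi_square_statistic(counts, probs):
    """Pearson statistic: sum over digits of (observed - expected)^2 / expected."""
    n = sum(counts)
    return sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(counts, probs))

# Illustrative data only: the first 100 powers of 2.
counts = digit_counts(2 ** k for k in range(100))
stat = chi_square_statistic(counts, benford_probs())
```

Under the null hypothesis the statistic is approximately chi-square distributed with 8 degrees of
freedom, so a classical test at the 5% level would reject for values above roughly 15.51.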

Rigorous digit analysis usually requires testing precise (point null) Binomial or Multinomial
hypotheses. In that context, classical significance tests of fixed size (significance level) are
known to over-reject the null hypothesis in large samples: their power grows with sample size while
the acceptance region shrinks, which makes them prone to produce high false-positive rates. On the
other hand, the Bayesian approach requires the specification of prior distributions. Without any
further information available, it is natural to require that these prior distributions be unimodal,
symmetric, centred on the null hypothesis parameter value, and non-increasing away from that
value. Conjugate prior distributions are analytically convenient, as they yield closed-form
expressions for Bayes factors and posterior distributions, but they are usually not versatile
enough to admit hyperparameter specifications that meet all of these conditions without becoming
too informative.
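
To make the closed-form convenience of conjugate priors concrete, the sketch below computes the
log Bayes factor of a point null hypothesis theta = theta0 against a Multinomial model with a
Dirichlet(alpha) prior. In this conjugate case the marginal likelihood under the alternative is a
ratio of multivariate Beta functions, and the multinomial coefficient cancels in the Bayes factor.
The function names and the hyperparameters in the usage example are illustrative assumptions, not
the project's specification.

```python
import math

def log_multivariate_beta(a):
    """log B(a) = sum_i log Gamma(a_i) - log Gamma(sum_i a_i)."""
    return sum(math.lgamma(ai) for ai in a) - math.lgamma(sum(a))

def log_bf01(counts, theta0, alpha):
    """Log Bayes factor of H0: theta = theta0 against H1: theta ~ Dirichlet(alpha).

    Under H0 the marginal likelihood is prod_i theta0_i ** x_i; under H1 it is
    B(alpha + x) / B(alpha). The multinomial coefficient is common to both and cancels.
    """
    log_m0 = sum(x * math.log(t) for x, t in zip(counts, theta0))
    posterior_alpha = [a + x for a, x in zip(alpha, counts)]
    log_m1 = log_multivariate_beta(posterior_alpha) - log_multivariate_beta(alpha)
    return log_m0 - log_m1

# Hypothetical usage: Benford null, with a Dirichlet prior whose mean equals theta0.
theta0 = [math.log10(1 + 1 / d) for d in range(1, 10)]
alpha = [9 * t for t in theta0]                  # assumed hyperparameters, for illustration
counts = [30, 17, 13, 10, 7, 7, 6, 5, 5]         # made-up digit counts
log_bf = log_bf01(counts, theta0, alpha)
```

Setting alpha proportional to theta0 centres the prior mean on the null value, but it does not by
itself guarantee the other conditions listed above, which is the limitation the summary describes.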

Using Markov chain Monte Carlo (MCMC) methods, we intend to overcome this limitation in prior
specification by extending Bayesian model selection and Bayesian parameter estimation in digit
analysis beyond the family of conjugate prior distributions.
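
A minimal sketch of how MCMC removes the need for conjugacy, under stated assumptions: a
random-walk Metropolis sampler for a Binomial proportion (for example, the probability that the
leading digit is 1) with a non-conjugate prior, here a normal density centred on theta0 and
truncated to (0, 1). The prior, its scale tau, the proposal step and the data are all hypothetical
choices for illustration; this is a generic textbook sampler, not the method developed in the
project, and it covers parameter estimation only.

```python
import math
import random

def log_posterior(theta, successes, n, theta0, tau):
    """Unnormalised log posterior: Binomial likelihood times truncated-normal prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # outside the support of the prior
    log_lik = successes * math.log(theta) + (n - successes) * math.log(1.0 - theta)
    log_prior = -0.5 * ((theta - theta0) / tau) ** 2
    return log_lik + log_prior

def metropolis(successes, n, theta0, tau, n_iter=20000, burn_in=2000, step=0.05, seed=0):
    """Random-walk Metropolis draws from the posterior of theta (burn-in discarded)."""
    rng = random.Random(seed)
    theta = theta0
    current = log_posterior(theta, successes, n, theta0, tau)
    draws = []
    for i in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, n, theta0, tau)
        # Accept with probability min(1, posterior ratio); the proposal is symmetric.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            draws.append(theta)
    return draws

# Hypothetical data: 30 of 100 values have leading digit 1; the Benford value is log10(2).
draws = metropolis(successes=30, n=100, theta0=math.log10(2), tau=0.1)
posterior_mean = sum(draws) / len(draws)
```

Because only the unnormalised posterior is evaluated, the prior can be replaced by any density
that satisfies the desired shape conditions without changing the sampler.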