Category Archives: statistics

Modeling and Indexing

It has been well tested in the real world, and is generally accepted, that simple models of indexing perform really well. They have no problems scaling or dealing with gigantic vocabularies. Their biggest downside is that they can only match … Continue reading

Posted in modeling, statistics

Modeling atop a document representation

The paper “DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification” [1] describes a model that not only generates documents but also learns them by associating each document with a label. The discrimination of a document is a function of the generative … Continue reading

Posted in optimization, statistics

Entropic Priors

Dirichlet priors (on their own, as a mixture, or as a hierarchy) are by no means the only option for controlling the sparsity of topic mixtures. Entropic priors stand out as an interesting alternative. Given a probability distribution … Continue reading

Posted in optimization, statistics

Optimizing the Dirichlet hyperparameters

One thing you’ll notice in papers describing generative models of documents with a Dirichlet prior is that they simply fix the Dirichlet hyperparameter that controls the distributions of topic mixtures for each document. This isn’t ideal when you wish … Continue reading

Posted in optimization, statistics