I’m going to sketch out the next few things to plan and code for the DSL library. So far I have provided the ability to describe the network, but not yet a way to describe the distributions. For instance, in the HMM model we ought to be able to say what the distribution of Symbols is, and so on. I’ll start with the ability to pick between two distributions, Dirichlet and Multinomial, which together cover many models. Once I provide a way to specify each node’s distribution type, I should be able to change the distributions at will without affecting the network; for instance, using a continuous response as opposed to a discrete response in an HMM.
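As a rough sketch of what such a specification might look like, here is one way to attach a distribution type to each node while leaving the network untouched. All of the class and node names below are my own assumptions for illustration, not the library’s actual API:

```python
# Hypothetical sketch: assigning a distribution type to each node of an
# HMM-like network, independently of the network structure itself.
# These class names are illustrative, not the library's real API.

class Dirichlet:
    """Conjugate prior over the parameter vector of a Multinomial."""
    def __init__(self, alpha):
        self.alpha = alpha  # one concentration parameter per outcome

class Multinomial:
    """Discrete distribution whose parameters are drawn from a Dirichlet prior."""
    def __init__(self, prior):
        self.prior = prior

# The network (nodes and edges) stays fixed; only these assignments change.
# Swapping Multinomial for a continuous distribution would leave the
# network description untouched.
distributions = {
    "Transition": Multinomial(prior=Dirichlet(alpha=[1.0] * 5)),   # 5 hidden states
    "Symbols":    Multinomial(prior=Dirichlet(alpha=[1.0] * 26)),  # 26 output symbols
}
```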
After this, I will want to create a function in the Gibbs module that takes a Reader and samples the distributions. In the case of the HMM, this would mean sampling the Transition and Symbols distributions by reading the network to figure out their priors and support.
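The conjugate update this sampling step relies on can be sketched as follows, using only the standard library; the function names here are mine, and I am assuming Dirichlet–Multinomial conjugacy as in the distributions above:

```python
import random

def sample_dirichlet(alphas):
    """Draw a probability vector from Dirichlet(alphas) via normalized Gammas."""
    draws = [random.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def sample_posterior(prior_alphas, counts):
    """By conjugacy, a Dirichlet prior plus multinomial counts gives a
    Dirichlet posterior: the Gibbs step for, say, one row of the Transition
    distribution samples from Dirichlet(alpha_k + n_k), where n_k counts
    the observed transitions out of that state."""
    return sample_dirichlet([a + n for a, n in zip(prior_alphas, counts)])

# Example: resample one row of a 3-state transition distribution,
# given a uniform prior and observed transition counts.
row = sample_posterior(prior_alphas=[1.0, 1.0, 1.0], counts=[10, 2, 0])
```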
Finally, with the sampled distributions and a Reader, I will write a sampler that produces a new Reader. In the case of the HMM, this means sampling the new Topic variables. These steps cover the (uncollapsed) Gibbs sampling technique.
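The per-variable resampling step (producing the new Topic variables in the HMM) can be sketched like this; the flat-matrix representation and function name are my own assumptions, not the library’s API, and I only handle an interior position with both neighbors present:

```python
import random

def resample_state(prev, nxt, obs, trans, emit):
    """One uncollapsed Gibbs update for a single hidden state z_t of an HMM.

    Samples from p(z_t = k | rest) ∝ trans[prev][k] * emit[k][obs] * trans[k][nxt],
    where trans and emit stand for the Transition and Symbols distributions
    sampled in the previous step, represented as row-stochastic matrices.
    """
    states = range(len(trans))
    weights = [trans[prev][k] * emit[k][obs] * trans[k][nxt] for k in states]
    return random.choices(list(states), weights=weights, k=1)[0]

# Example with 2 hidden states and 2 symbols:
trans = [[0.9, 0.1], [0.1, 0.9]]
emit  = [[0.8, 0.2], [0.2, 0.8]]
z = resample_state(prev=0, nxt=0, obs=0, trans=trans, emit=emit)
```

A full sweep would apply this update to each position in turn, which is what makes the scheme a Gibbs sampler rather than a blocked forward–backward sampler.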
Looking even further ahead, I intend to write a method to compute the density of the observed variables, having marginalized out the latent variables. I will do this using annealed importance sampling, as described in the paper “Evaluation Methods for Topic Models” by Hanna M. Wallach et al. In the case of the HMM, this amounts to computing the probability of
Transition while marginalizing out the