DSL for Generative Models – Interpreter (54/365)

The next step is to give the library access to the latent variable data. I don’t want the library to decide how to store this data, because the user will have a much better idea of the most efficient representation. The library, however, needs both read and write access to it.

There are two kinds of data I need access to. In the case of the HMM, for example, the values of Topic and Symbol form the support of some distribution, while the values of Transition and Symbols identify which of their respective indexed distributions is currently active. So, consider the following signature

> import Prelude hiding (read)  -- we reuse the name read below
> 
> type Support = Int
> -- read :: Int -> a -> Either (a,Int) Support

For now, I’ll restrict Support to integers only. The commented function type takes an integer index and a label, and returns either the (label, index) of the currently active distribution or a value from the support, depending on the label being asked about. I’ll make this clear at the end with the HMM example.

For the library to have read access to the data I will provide this data type.

> data Reader a = Reader
>   {
>     size :: Int
>   , read :: Int -> a -> Either (a,Int) Support
>   , copy :: IO (Writer a)
>   }

The size field tells us how many indices there are, namely [0..size-1]; read is the function we just saw; and copy creates a writable copy of the data.

> data Writer a = Writer
>   {
>     write :: Int -> a -> Support -> IO ()
>   , readOnly :: IO (Reader a)
>   }

Here write lets us store a new value at a given Int index for a given label a. Let me take the HMM as an example again. For simplicity, let’s say we store the sequences as a list of lists of (topic, symbol) pairs.

> data HMMLabels = Alpha | Beta | Transition
>                | Initial | Topic | Symbols | Symbol
>     deriving (Show)
> 
> type Sequences = [[(Int,Int)]]

We can provide a reader.

> reader :: Sequences -> Reader HMMLabels
> reader ss = Reader
>   {
>     size = length (concat ss)
>   , read = \idx -> let (i,j) = indices !! idx
>                        (topic,symbol) = ss !! i !! j
>                        (prev_topic,_) = ss !! i !! (j-1)
>                    in \name -> case name of
>                                  Topic -> Right topic
>                                  Symbol -> Right symbol
>                                  Symbols -> Left (Topic,topic)
>                                  Transition -> if j==0
>                                                then Left (Initial,0)
>                                                else Left (Topic,prev_topic)
>                                  _ -> error "no data stored for this label"
>   , copy = error "undefined"
>   }
>   where indices = concat $
>           map (\(i,s) -> [(i,j) | j <- [0..length s-1]]) (zip [0..] ss)
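
To make the Left/Right convention concrete, here is a quick session on a toy corpus with a single two-step sequence; the numbers are invented and each pair is (topic, symbol).

ghci> let r = reader [[(0,5),(1,6)]]
ghci> let at = read r 1

ghci> at Topic
  Right 1

ghci> at Symbol
  Right 6

ghci> at Symbols
  Left (Topic,1)

ghci> at Transition
  Left (Topic,0)

ghci> read r 0 Transition
  Left (Initial,0)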

Note how the signature of read encourages caching: the library can supply just the index first and then repeatedly query the resulting partial function for various names. This seems to be alright so far, but I’ll only know whether it holds up when I look at managing the distributions in the next post.
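
Here is a rough sketch of what a Writer over this same representation could look like. It isn’t part of the library design yet: it assumes the sequences live in an IORef, that only Topic and Symbol are writable, and that the import goes at the top of the file.

> import Data.IORef
> 
> -- Sketch: writes go through an IORef holding the sequences.
> writerFor :: IORef Sequences -> Writer HMMLabels
> writerFor ref = Writer
>   {
>     write = \idx name v -> modifyIORef ref (update idx name v)
>   , readOnly = fmap reader (readIORef ref)  -- reuse the reader above
>   }
>   where
>     update idx name v ss =
>       let (i,j) = concatMap (\(i',s) -> [(i',j') | j' <- [0..length s-1]])
>                             (zip [0..] ss) !! idx
>           set f = adjust i (adjust j f) ss
>       in case name of
>            Topic  -> set (\(_,sym) -> (v,sym))  -- overwrite the topic
>            Symbol -> set (\(t,_)   -> (t,v))    -- overwrite the symbol
>            _      -> error "label is not writable"
>     -- apply f to the element at position k, leaving the rest unchanged
>     adjust k f xs = [ if k' == k then f x else x | (k',x) <- zip [0..] xs ]

With this in hand, the reader’s copy field could plausibly become fmap writerFor (newIORef ss).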


DSL for Generative Models – Interpreter (53/365)

In this post, I write some functions to interpret the DSL: finding the children and parents of a node, and discovering which variables are priors, which are observed, and which are latent.

> import Control.Monad (msum)
> import Data.Maybe (mapMaybe)
> import Data.List (nub,(\\))

Recapping the DSL.

> data Indexed a = Only a | a :@ [a]
> data Edge a = Indexed a :-> a
> type Network a = [Edge a]

Enumerating the names. Note that index labels (such as Initial in the HMM below) are deliberately not collected; only the source and target of each edge count as nodes.

> names :: Eq a => Network a -> [a]
> names = nub . concatMap f
>   where f (Only a :-> b) = [a,b]
>         f ((p :@ _) :-> a) = [p, a]

Enumerating the children. A node’s children include the targets of any edge whose source it indexes, not just the edges it heads directly.

> children :: Eq a => Network a -> a -> [a]
> children xs a = concatMap f xs
>   where f (Only p :-> c) | p == a = [c]
>         f ((p :@ is) :-> c) | p == a || elem a is = [c]
>         f _ = []

Enumerating the parents.

> parents :: Eq a => Network a -> a -> [a]
> parents xs a = concatMap f xs
>   where f (Only p :-> c) | c == a = [p]
>         f ((p :@ _) :-> c) | c == a = [p]
>         f _ = []

Enumerating the observed variables, i.e. the leaves of the network.

> observed :: Eq a => Network a -> [a]
> observed n = filter (null . children n) . names $ n

Enumerating the priors, i.e. the roots.

> prior :: Eq a => Network a -> [a]
> prior n = filter (null . parents n) . names $ n

Enumerating the latent variables.

> latent :: Eq a => Network a -> [a]
> latent xs = names xs \\ (prior xs ++ observed xs)

Finding the index of a random variable.

> indexOf :: Eq a => Network a -> a -> Maybe [a]
> indexOf xs a = msum (map f xs)
>   where f ((p :@ is) :-> _) | p == a = Just is
>         f _ = Nothing

Running on the hmm example.

> data HMMLabels = Alpha | Beta | Transition
>                | Initial | Topic | Symbols | Symbol
>     deriving (Show,Eq)
> 
> hmm :: Network HMMLabels
> hmm =
>   [
>     Only Alpha                      :-> Transition
>   , Only Beta                       :-> Symbols
>   , (Transition :@ [Initial,Topic]) :-> Topic
>   , (Symbols :@ [Topic])            :-> Symbol
>   ]
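
A check of the full name list first; note that the index label Initial never appears:

ghci> names hmm
  [Alpha,Transition,Beta,Symbols,Topic,Symbol]
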
ghci> observed hmm
  [Symbol]

ghci> prior hmm
  [Alpha,Beta]

ghci> latent hmm
  [Transition,Symbols,Topic]

ghci> indexOf hmm Alpha
  Nothing

ghci> indexOf hmm Transition
  Just [Initial,Topic]

ghci> indexOf hmm Symbols
  Just [Topic]

ghci> children hmm Alpha
  [Transition]

ghci> parents hmm Alpha
  []

ghci> children hmm Topic
  [Topic,Symbol]

ghci> parents hmm Topic
  [Transition]

Next time, I want to look at how to specify the distributions.


DSL for Generative Models – Examples (52/365)

In the previous post I attempted to introduce a DSL for probabilistic models inspired by the plate notation. Let’s see whether we can define LDA with it.

> data LDALabels = Alpha | Beta | Topics | Topic
>                | Doc | Symbols | Symbol
>     deriving (Show,Eq)
> 
> lda :: Network LDALabels
> lda =
>   [
>     Only Alpha           :-> Topics
>   , Only Beta            :-> Symbols
>   , (Topics :@ [Doc])    :-> Topic
>   , (Symbols :@ [Topic]) :-> Symbol
>   ]
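
Jumping ahead a little, here is a quick sanity check with the interpreter functions from post 53/365; the network classifies its variables as we’d expect.

ghci> observed lda
  [Symbol]

ghci> prior lda
  [Alpha,Beta]

ghci> latent lda
  [Topics,Symbols,Topic]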

Here is a topic model where the topics are arranged in the nodes of a fixed binary tree, one tree per document. Say the tree has depth d; the model is then parameterized by a TopicPath distribution (to select a leaf) and a TopicDepth distribution (to select a node along the path to that leaf).

> data LDATreeLabels =
>     Alpha1 | Alpha2 | Beta
>   | TopicDepth | TopicPath | Topic | Doc
>   | Symbols | Symbol
>     deriving (Show,Eq)
> 
> ldaTree :: Network LDATreeLabels
> ldaTree =
>   [
>     Only Alpha1           :-> TopicDepth
>   , Only Alpha2           :-> TopicPath
>   , Only Beta             :-> Symbols
>   , (TopicPath :@ [Doc])  :-> Topic
>   , (TopicDepth :@ [Doc]) :-> Topic
>   , (Symbols :@ [Topic])  :-> Symbol
>   ]
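
And the same check on the tree variant; note that Topic now picks up two parents, one per incoming edge.

ghci> parents ldaTree Topic
  [TopicPath,TopicDepth]

ghci> latent ldaTree
  [TopicDepth,TopicPath,Symbols,Topic]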

I think it looks pretty good so far. Let’s see how it holds up once I start interpreting the DSL.


DSL for Generative Models (51/365)

The backlog grows longer. I changed jobs two weeks ago and it has upset my routine. No matter, here we go again. I want to deviate from my problem-solving mode for a while and use up a few posts to develop a little library for inference with probabilistic models. Why?

I’ve spent a lot of time coding generative models from scratch, and it’s repetitive, painful, and error-prone. My current job has put me back in the thick of machine learning research, so I hope I’ll get to use this. The hard part of coding models from scratch is keeping track of the distributions and carefully constructing the conditional probabilities for each latent variable that needs to be sampled. For some reason, I wasn’t keen on using existing libraries; I wanted to have a go at making one myself.

After many false starts, I decided that I’d first write up a DSL that describes a generative model the way it’s usually drawn: in plate notation. The key to plate notation is that it makes it easy to represent indexed distributions on top of the underlying Bayesian network of nodes and edges.

I’ll keep the Hidden Markov Model as a running example. First, the user defines his own type that provides names for the various random variables.

> data HMMLabels = Alpha | Beta | Transition
>                | Initial | Topic | Symbols | Symbol

The library now needs to provide a way to define the generative model on top of this. As a first step, we need to be able to define the plates; that is, to tell when a name is indexed by another name. In the case of the HMM, the symbol distributions are indexed by a topic, and the topic distribution is either the initial one or is indexed by the previous topic.

Suppose the library provides the following

> data Indexed a = Only a | a :@ [a]

Then we can write

> -- Symbols :@ [Topic]
> -- Transition :@ [Initial,Topic]

And we can also define variables that stand on their own

> -- Only Alpha
> -- Only Beta

Next is to allow the edges to be defined. Suppose we provide

> data Edge a = Indexed a :-> a
> type Network a = [Edge a]

The whole network can now be defined

> hmm :: Network HMMLabels
> hmm =
>   [
>     Only Alpha                      :-> Transition
>   , Only Beta                       :-> Symbols
>   , (Transition :@ [Initial,Topic]) :-> Topic
>   , (Symbols :@ [Topic])            :-> Symbol
>   ]

Next time, I’ll try to define a couple more models with this language to see if I am on the right track and then start writing an interpreter.


Random Variables – Problem (50/365)

Moving on to the next chapter, “Random Variables – I”, take a look at the following problem: show that a random variable \theta is continuous if and only if P(\theta = x) = 0 for all x \in \mathbb{R}.
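
Both directions rest on the same identity relating the point mass at x to the jump of the distribution function there:

\displaystyle  P(\theta = x) = F(x) - F(x-), \text{ where } F(x-) = \lim_{y \uparrow x} F(y)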

(Forward direction) Suppose \theta is a continuous random variable; then its distribution function F is continuous by definition. Hence F(x-) = \lim_{y \uparrow x} F(y) = F(x), so P(\theta = x) = F(x) - F(x-) = 0.

(Reverse direction) Suppose P(\theta = x) = 0 for all x \in \mathbb{R} and let F be the corresponding distribution function. Fix x and let A_n = (x - \frac{1}{n}, x], so that A_n \supseteq A_{n+1} and \cap_{n=1}^\infty A_n = \{x\}. Countable additivity of P gives continuity from above, so \lim_n P(A_n) = P(\{x\}) = 0. Since P((a,b]) = F(b) - F(a), this means F(x) - F(x - \frac{1}{n}) \rightarrow 0, i.e. F(x-) = F(x). As F is continuous on the right by construction, F is continuous at every x, which is the definition of a continuous random variable.


Measurable Spaces – Problem (49/365)

Let \mu be the Lebesgue-Stieltjes measure generated by a continuous function. Show that if the set A is at most countable, then \mu(A) = 0. A Lebesgue-Stieltjes measure is a countably additive measure given by a generalized distribution function G(x), continuous on the right, via \mu([a,b)) = G(b) - G(a). So,

\displaystyle  A = \cup_{i=1}^\infty \{ a_i \} \\ \mu(A) = \sum_{i=1}^\infty \mu(\{ a_i \}) \text{ by countable additivity} \\ \mu(\{ a_i \}) = \lim_{n \rightarrow \infty} \mu\left(\left[a_i, a_i + \tfrac{1}{n}\right)\right) = \lim_{n \rightarrow \infty} \left[ G\left(a_i + \tfrac{1}{n}\right) - G(a_i) \right] = 0 \text{ by continuity of } G \\ \text{Thus } \mu(A) = 0


Measurable Spaces – Problem (48/365)

Another question on distribution functions. Show that each of the functions

\displaystyle  G(x,y) = 1 \text{ if } x+y \ge 0 \text{ and } 0 \text{ otherwise} \\ G(x,y) = [x+y] \text{, the integral part of } x+y

is continuous on the right but is not a distribution function in \mathbb{R}^2.

Take the first function. To show that it is continuous on the right, let \epsilon > 0 and let (x,y) \in \mathbb{R}^2. We need a \delta > 0 such that for all (a,b) \ge (x,y) within a distance \delta, we have |G(x,y) - G(a,b)| < \epsilon. If G(x,y) = 0, then x + y < 0; let \delta be the distance from (x,y) to the line p + q = 0. Any (a,b) \ge (x,y) within \delta of (x,y) still satisfies a + b < 0, so G(a,b) = 0 and the difference is less than \epsilon. If G(x,y) = 1, then x + y \ge 0, and any (a,b) \ge (x,y) satisfies a + b \ge x + y \ge 0, so G(a,b) = 1 as well. Thus, the first function is continuous on the right.

However, it is not a distribution function because it fails the requirement described in the last post that \Delta_{a_1 b_1} \Delta_{a_2 b_2} G \ge 0: taking (a_1, b_1] = (a_2, b_2] = (-1, 1], the mixed difference is G(1,1) - G(-1,1) - G(1,-1) + G(-1,-1) = 1 - 1 - 1 + 0 = -1.

The second function G(x,y) = [x+y] is not a distribution function because G(\infty, \infty) = \infty \ne 1. But it is continuous on the right: fix (x,y) and let n = [x+y]. Every point (a,b) \ge (x,y) with a + b < n + 1 satisfies [a+b] = n, so G is constant on a small region to the upper right of (x,y), and we can always find a \delta to satisfy any \epsilon > 0.
