## Response Variable (55/365)

A quick aside. I was thinking about how response variables are attached to generative models. For instance, if we want, say, binary classification on documents, we would normally 1) take the dot product of the topic vector with a global vector of coefficients and then 2) pass it through a logistic function. This is fine, but the problem is that if we were sampling the topic model with a Gibbs sampler, we would have to fall back on some form of gradient descent to fit the coefficients of the logistic regression.

This is painful. I’d rather have everything be probabilistic, so I was pondering ways to do this. Let’s say that the topic vector has dimension $n$. Define two multinomial distributions $\{t_i\}$ and $\{f_i\}$ over those $n$ dimensions. Now define the response through this procedure:

1. Given topic distribution $z$
2. $i \gets t$
3. $j \gets f$
4. response $\gets \text{Bernoulli}\left( \frac{z_i}{z_i + z_j} \right)$

Essentially, the idea is that $t$ and $f$ end up finding mutually exclusive dimensions, such that either one or the other has a high value, in order to produce the correct class label with high probability. I’d like to try it after I’m done with the Gibbs code.
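To make the procedure concrete, here is a sketch in Haskell; the arguments sampleCat and bernoulli are hypothetical stand-ins for whatever sampling primitives the Gibbs code ends up providing.

> -- Hypothetical sketch of the response procedure. sampleCat draws an
> -- index from a discrete distribution; bernoulli flips a biased coin.
> response :: Monad m
>          => ([Double] -> m Int)   -- sampleCat
>          -> (Double -> m Bool)    -- bernoulli
>          -> [Double]              -- t
>          -> [Double]              -- f
>          -> [Double]              -- topic distribution z
>          -> m Bool
> response sampleCat bernoulli t f z = do
>   i <- sampleCat t
>   j <- sampleCat f
>   bernoulli (z !! i / (z !! i + z !! j))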

## DSL for Generative Models – Interpreter (54/365)

The next step is for the library to have access to the latent variable data. I don’t want the library to decide how to store the data, because the user will have a much better idea of the most efficient representation. The library, however, will need both read and write access to this data.

There are two kinds of data I need access to. In the case of the HMM, for example, the values of Topic and Symbol form the support of some distribution, while the values of Transition and Symbols are the indices of whichever of their respective distributions is currently active. So, consider the following signature

> type Support = Int
> -- Int -> a -> Either (a,Int) Support


For now, I’ll restrict Support to integers only. The second type takes an integer index and a label, and returns either another (label, index) pair or a support value, depending on what is being asked. I’ll make this clear at the end with the HMM example.
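For instance (with made-up values), the two cases might look like this:

> -- read 7 Topic      == Right 3           (a support value)
> -- read 7 Transition == Left (Topic,2)    (the active Transition index)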

> data Reader a = Reader
>   {
>     size :: Int
>   , read :: Int -> a -> Either (a,Int) Support
>   , copy :: IO (Writer a)
>   }


The field size tells us how many indices there are ([0..size-1]); read is the function we just saw; and copy creates a writable copy of the data.
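As a sketch of how the library might consume a Reader (visitAll is a hypothetical helper, and I’m assuming Prelude’s read is hidden), note that read is curried, so the per-index lookup can be shared across several label queries:

> -- Hypothetical sketch: walk every index, handing the continuation k
> -- the partially applied lookup so it can query several labels cheaply.
> visitAll :: Reader a -> (Int -> (a -> Either (a,Int) Support) -> IO ()) -> IO ()
> visitAll r k = mapM_ (\idx -> k idx (read r idx)) [0..size r - 1]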

> data Writer a = Writer
>   {
>     write :: Int -> a -> Support -> IO ()
>   }


Here, write lets us store a new value for the label a at some Int index. Let me take the HMM as an example again. For simplicity, let’s say we store the sequences as a list of lists of (topic, symbol) pairs.

> data HMMLabels = Alpha | Beta | Transition
>                | Initial | Topic | Symbols | Symbol
>
>
> type Sequences = [[(Int,Int)]]
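To make the Writer concrete (writeTopic is a hypothetical usage, not library code), a Gibbs sweep would store a freshly sampled topic like this:

> -- Hypothetical usage: write a newly sampled topic assignment back
> -- at position idx of a writable copy w of the data.
> writeTopic :: Writer HMMLabels -> Int -> Support -> IO ()
> writeTopic w idx newTopic = write w idx Topic newTopic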


> reader :: Sequences -> Reader HMMLabels
> reader ss = Reader
>   {
>     size = length (concat ss)
>   , read = \idx -> let (i,j) = indices !! idx
>                        (topic,symbol) = ss !! i !! j
>                        (prev_topic,_) = ss !! i !! (j-1)
>                    in \name -> case name of
>                                  Topic -> Right topic
>                                  Symbol -> Right symbol
>                                  Symbols -> Left (Topic,topic)
>                                  Transition -> if j==0
>                                                then Left (Initial,0)
>                                                else Left (Topic,prev_topic)
>   , copy = error "undefined"
>   }
>   where indices = concat $ map (\(i,s) -> [(i,j) | j <- [0..length s-1]]) (zip [0..] ss)


Note how the signature of read encourages caching; that is, the library can first supply only the index and then repeatedly query the resulting partial function for various names. This seems to be alright so far, but I’ll only know if this holds up when I look at managing the distributions in the next post.

## DSL for Generative Models – Interpreter (53/365)

In this post, I write some functions to interpret the DSL. Specifically, I present some functions to figure out the children and parents of a node and to discover what the prior, observed, and latent variables are.

> import Control.Monad (msum)
> import Data.Maybe (mapMaybe)
> import Data.List (nub,(\\))


Recapping the DSL.

> data Indexed a = Only a | a :@ [a]
> data Edge a = Indexed a :-> a
> type Network a = [Edge a]


Enumerating the names.

> names :: Eq a => Network a -> [a]
> names = nub . concatMap f
>   where f (Only a :-> b) = [a,b]
>         f ((p :@ _) :-> a) = [p, a]


Enumerating the children.

> children :: Eq a => Network a -> a -> [a]
> children xs a = concatMap f xs
>   where f (Only p :-> c) | p == a = [c]
>         f ((p :@ is) :-> c) | p == a || elem a is = [c]
>         f _ = []


Enumerating the parents.

> parents :: Eq a => Network a -> a -> [a]
> parents xs a = concatMap f xs
>   where f (Only p :-> c) | c == a = [p]
>         f ((p :@ _) :-> c) | c == a = [p]
>         f _ = []


Enumerating the observed variables.

> observed :: Eq a => Network a -> [a]
> observed n = filter (null . children n) . names $ n


Enumerating the priors.

> prior :: Eq a => Network a -> [a]
> prior n = filter (null . parents n) . names $ n


Enumerating the latent variables.

> latent :: Eq a => Network a -> [a]
> latent xs = names xs \\ (prior xs ++ observed xs)


Index of a random variable

> indexOf :: Eq a => Network a -> a -> Maybe [a]
> indexOf xs a = msum (map f xs)
>   where f ((p :@ is) :-> _) | p == a = Just is
>         f _ = Nothing


Running on the hmm example.

> data HMMLabels = Alpha | Beta | Transition
>                | Initial | Topic | Symbols | Symbol
>     deriving (Show,Eq)
>
> hmm :: Network HMMLabels
> hmm =
>   [
>     Only Alpha                      :-> Transition
>   , Only Beta                       :-> Symbols
>   , (Transition :@ [Initial,Topic]) :-> Topic
>   , (Symbols :@ [Topic])            :-> Symbol
>   ]

ghci> observed hmm
[Symbol]

ghci> prior hmm
[Alpha,Beta]

ghci> latent hmm
[Transition,Symbols,Topic]

ghci> indexOf hmm Alpha
Nothing

ghci> indexOf hmm Transition
Just [Initial,Topic]

ghci> indexOf hmm Symbols
Just [Topic]

ghci> children hmm Alpha
[Transition]

ghci> parents hmm Alpha
[]

ghci> children hmm Topic
[Topic,Symbol]

ghci> parents hmm Topic
[Transition]


Next time, I want to look at how to specify the distributions.

## DSL for Generative Models – Examples (52/365)

In the previous post I attempted to introduce a DSL for probabilistic models inspired by the plate notation. Let’s try to see if we can define LDA with it.

> data LDALabels = Alpha | Beta | Topics | Topic
>                | Doc | Symbols | Symbol
> lda :: Network LDALabels
> lda =
>   [
>     Only Alpha :-> Topics
>   , Only Beta :-> Symbols
>   , (Topics :@ [Doc]) :-> Topic
>   , (Symbols :@ [Topic]) :-> Symbol
>   ]
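Read back as a generative story (my paraphrase; the DSL itself doesn’t name the distributions yet), the network says:

> -- For each document d:  Topics[d]  is drawn from a prior with parameter Alpha
> -- For each topic k:     Symbols[k] is drawn from a prior with parameter Beta
> -- For each word in d:   Topic  ~ Topics[Doc]
> --                       Symbol ~ Symbols[Topic]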


Here is a topic model where, for each document, topics are arranged in the nodes of a fixed binary tree. Let’s say the tree has depth $d$; then the model is parameterized by a TopicPath distribution (to select a leaf) and a TopicDepth distribution (to select a node along the path to that leaf).

> data LDATreeLabels =
>     Alpha1 | Alpha2 | Beta
>   | TopicDepth | TopicPath | Topic | Doc
>   | Symbols | Symbol
>
> ldaTree :: Network LDATreeLabels
> ldaTree =
>   [
>     Only Alpha1         :-> TopicDepth
>   , Only Alpha2         :-> TopicPath
>   , Only Beta           :-> Symbols
>   , (TopicPath :@ [Doc])  :-> Topic
>   , (TopicDepth :@ [Doc]) :-> Topic
>   , (Symbols :@ [Topic])  :-> Symbol
>   ]
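Again as a generative story (my paraphrase), the two parents of Topic correspond to the two sampling steps:

> -- For each document d:  TopicPath[d] and TopicDepth[d] are drawn from
> --                       priors with parameters Alpha2 and Alpha1
> -- For each word in d:   leaf  ~ TopicPath[Doc]
> --                       depth ~ TopicDepth[Doc]
> --                       Topic  = the node at that depth on the path to leaf
> --                       Symbol ~ Symbols[Topic]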


I think it looks pretty good so far. Let’s see how it holds up once I start interpreting the DSL.

## DSL for Generative Models (51/365)

The backlog keeps getting longer. I changed jobs two weeks ago and it has upset my routine. No matter. Here we go again. I want to step away from problem-solving mode for a while and use the next few posts to develop a little library for the inference and use of probabilistic models. Why?

I’ve spent a lot of time coding generative models from scratch, and it’s repetitive, painful, and error-prone. My current job has put me back in the thick of machine learning research, so I hope I’ll get to use this. The hard part of coding models from scratch is keeping track of the distributions and carefully constructing the conditional probability for each latent variable that needs to be sampled. For whatever reason, I wasn’t keen on using existing libraries and wanted to have a go at making one myself.

After many false starts, I decided that I’d first write up a DSL that can be used to describe a generative model in the way it’s usually represented by plate notation. The key to plate notation is that it makes it easy to represent indexed distributions on top of the underlying Bayesian network constructed by drawing nodes and edges.

I’ll keep the Hidden Markov Model as a running example. First, the user defines his own type that provides names for the various random variables.

> data HMMLabels = Alpha | Beta | Transition
>                | Initial | Topic | Symbols | Symbol


The library now needs to provide a way to define the generative model on top of this. As a first step, we need to be able to define the plates; that is, to say when a name is indexed by another name. In the case of the HMM, the symbol distributions are indexed by a topic, and the topic distribution is either the initial one or is indexed by the previous topic.

Suppose the library provides the following

> data Indexed a = Only a | a :@ [a]


Then we can write

> -- Symbols :@ [Topic]
> -- Transition :@ [Initial,Topic]


And we can also define variables that stand on their own

> -- Only Alpha
> -- Only Beta


Next is to allow the edges to be defined. Suppose we provide

> data Edge a = Indexed a :-> a
> type Network a = [Edge a]


The whole network can now be defined

> hmm :: Network HMMLabels
> hmm =
>   [
>     Only Alpha                      :-> Transition
>   , Only Beta                       :-> Symbols
>   , (Transition :@ [Initial,Topic]) :-> Topic
>   , (Symbols :@ [Topic])            :-> Symbol
>   ]
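Read back as a generative story (my paraphrase; the DSL doesn’t name the distributions yet), the network says:

> -- Transition[s] is drawn from a prior with parameter Alpha,
> --   for s = Initial or a previous Topic
> -- Symbols[k] is drawn from a prior with parameter Beta, for each topic k
> -- At each step:  Topic  ~ Transition[Initial] at the start,
> --                         Transition[previous Topic] afterwards
> --                Symbol ~ Symbols[Topic]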


Next time, I’ll try to define a couple more models with this language to see if I am on the right track and then start writing an interpreter.


## Random Variables – Problem (50/365)

Moving on to the next chapter, “Random Variables – I”, take a look at the following problem. Show that a random variable $\theta$ is continuous if and only if $P(\theta = x) = 0$ for all $x \in \mathbb{R}$.

(Forward direction) Suppose $\theta$ is a continuous random variable; then its distribution function $F$ is continuous by definition. Hence, $P(\theta = x) = F(x) - F(x-) = F(x) - \lim_{y \uparrow x} F(y) = 0$ by continuity of $F$ at $x$.

(Reverse direction) Suppose $P(\theta = x) = 0$ for all $x \in \mathbb{R}$ and let $F$ be the corresponding distribution function. Fix $x$ and let $A_n = (x - \frac{1}{n}, x]$, so that $A_n \supseteq A_{n+1}$ and $\cap_{n=1}^\infty A_n = \{x\}$. Then $P(\cap_{n=1}^\infty A_n) = \lim_n P(A_n)$ because a countably additive $P$ is continuous from above. Since $P(a,b] = F(b) - F(a)$, this gives $F(x) - F(x-) = \lim_n \left( F(x) - F(x - \tfrac{1}{n}) \right) = P(\theta = x) = 0$. A distribution function is always continuous on the right, so $F$ is continuous at every $x$, which makes $\theta$ continuous.

Let $\mu$ be the Lebesgue-Stieltjes measure generated by a continuous function. Show that if the set $A$ is at most countable, then $\mu(A) = 0$. A Lebesgue-Stieltjes measure is a countably additive measure given by a generalized distribution function $G(x)$, continuous on the right, such that $\mu([a,b)) = G(b) - G(a)$. So,
$\displaystyle \begin{aligned} A &= \cup_{i=1}^\infty \{ a_i \} \\ \mu(A) &= \sum_{i=1}^\infty \mu(\{ a_i \}) \quad \text{by countable additivity} \\ \mu(\{ a_i \}) &= \lim_{n \rightarrow \infty} \mu\big( [a_i, a_i + \tfrac{1}{n}) \big) \\ &= \lim_{n \rightarrow \infty} G(a_i + \tfrac{1}{n}) - G(a_i) \\ &= 0 \quad \text{by continuity of } G \end{aligned}$

Thus $\mu(A) = 0$.