## DSL for Generative Models – Next Steps (61/365)

I’m going to sketch out the next few things to plan and code for the DSL library. I have so far provided the ability to describe the network but I haven’t yet provided a way to describe the distributions. For instance, in the HMM Model, we ought to be able to say what the distribution of Symbols is and so on. I’ll start with the ability to pick two distributions: Dirichlet and Multinomial. We can cover many models with just these two. When I provide a way to specify what type of distribution each node is, I should be able to change the distributions at will without affecting the network; for instance, using a continuous response as opposed to a discrete response in a HMM.

After this, I will want to create a function in the Gibbs module that can take in a Reader and sample the distributions. In the case of HMM, this would mean sampling the Transition distributions and the Symbols distributions by reading the network to figure out their priors and support.

Finally, with the sampled distributions and a Reader I will write a sampler that produces a new Reader. In the case of HMM, this means sampling the new Topic variables. These steps cover the (uncollapsed) Gibbs sampling technique.

Looking ahead even further, I intend to write a method to compute the density of the observed variables (having marginalized out the latent variables). I will do this using the annealed importance sampling method as described in this paper “Evaluation Methods for Topic Models” by Hanna M. Wallah et. al. In the case of the HMM, this amounts to computing the probability of Symbol given Symbols and Transition while marginalizing out the Topic variables.

## Lexicographical Ordering (60/365)

> import Data.Ord (comparing)


This is completely random but I thought it was a neat use of laziness. You are familiar with lexicographical ordering? Haskell’s compare of lists implements this.

ghci> [1,2] < [1,3]
True

ghci> [1,3,4] < [1,3,4,5]
True

ghci> [2,4] < [2,3]
False


Note that this favors shorter strings.

ghci> [1,2] < [1,2,3]
True

ghci> [1,3] < [1,3,4]
True


For whatever reason, I wanted to favor longer strings. How can we do this? First, note that the above is equivalent to doing the comparison after appending an infinite list of zeros to each operand (assuming we are using only positive numbers).

ghci> let aug xs = xs ++ cycle [0]
ghci> comparing aug [1,2] [1,3]
LT

ghci> comparing aug [1,3,4] [1,3,4,5]
LT

ghci> comparing aug [2,4] [2,3]
GT


If I, instead, append an infinite list of a large number I can get what I want.

ghci> let aug xs = xs ++ cycle [9999999]
ghci> comparing aug [1,2] [1,2,3]
GT

ghci> comparing aug [1,3] [1,3,4]
GT


## Learning a Language (59/365)

I want to start another aside because this is something I spend a significant amount of time thinking about. That is, how do you learn a new language when you are approaching 30? Living in India, I think it’s a shame that people communicate only in English at work (if you are in the tech industry) because India is one of the few countries that still speaks in a vast number of tongues (though the number is perishing at an alarming rate).

Auto drivers in Bangalore speak at least four languages with ease: Hindi, Kannada, Tamil, and Telugu. And, English is a given. Everyone ought to speak as many languages as possible without inhibitions and should be encouraged to do so even if all you manage to speak is a “good morning” in that language. There is just so much to gain including new friends and perspectives and traditions and in the most beautiful way – through all these differences – it makes us aware of just how similar (not identical) we all are.

That brings me back to how one can learn a language quickly. I’ve always been fascinated with individual words in different languages. Part of that has to do with simply wanting to know the origin of a word and you end up tracing it back to different countries and sometimes end up recounting the history of two countries along the way. Ponder on the words “algebra” or “philosophy”.

I studied Hindi in school fifteen years ago and I hated it. It was boring and tiresome. There was no joy in the learning process at all. Now, I wish I could speak every language in the world. At this point I point you to this wonderful book by Anthony Burgess “Language Made Plain”. I want to start by making some random notes on the easy and difficult things I am facing as I learn Hindi once again

1. Learning a new alphabet is a problem. I really which all languages used a common alphabet. For me, this means, if I have to properly learn Kannada there is little point in me trying to read the script. That will take time.

2. Find out as many audio/video resources as possible. Once I refreshed the basics I scoured the web for children stories to listen to and try to listen to them while going and coming from work.

3. I find listening is the easiest and I have shown huge improvements on this in a short time. Speaking is harder mainly because I don’t get the chance to practice it. It takes time to construct a grammatically correct sentence. Forget writing for now.

4. Learning/picking up vocabulary is easiest. I used to think this is the main problem but it’s negligible compared to the problem of sentence structure, gender modifications, and more importantly learning phrases in the zeitgeist.

5. I need a way to measure my progress easily and to make sure improvement is not stalled. I haven’t yet found a good way. What I do now is while listening to some audio I write down words I don’t recognize and then look them up later. But again, right now my main issue is with figuring out how to improve my spoken form given that I am unable to practice out in the open regularly.

6. I like writing. My idea right now is to find a good and simple way to exploit this as an alternative to having to find people daily to practice on. What I want is a way to quickly exercise my ability to create short and grammatically correct sentences in various tenses.

Anyway, I’ll make a few posts now and then about approaches do and do not work for me when learning a language.

## DSL for Generative Models (58/365)

I’ve now cleaned up (in befb0f3cca0c212e368497e86f030aa96355be18) the Reader and Writer interfaces and added it to Statistics.GModeling.Gibbs. I’ve removed references to Support and simply parameterized using a key type k and value type v.

> data Reader k v = Reader
>   {
>     -- | Number of available indices
>     size :: Int
>     -- | Read the value at the given key
>   , readn :: Int -> k -> v
>     -- | Create a copy for writing only
>   , copy :: IO (Writer k v)
>   }
>
> data Writer k v = Writer
>   {
>     -- | Write the value at the given key
>     writen :: Int -> k -> v -> IO ()
>     -- | Create a read-only copy
>   }


I’ve also simplified the type of Indexed and added an implementation of Reader and Writer for HMM in Statistics.GModeling.Models.HMM.

## DSL for Generative Models (57/365)

I am putting together what I have so far in a repository here. So far, (133e22dc979d988706aafe52a346cee004f70ca5) it contains

1. Statistics.GModeling.DSL
2. Statistics.GModeling.Models.HMM
3. Statistics.GModeling.Models.LDA
4. Statistics.GModeling.Models.FixedTreeLDA

Will continue building the pieces in upcoming posts.

## Sharing a Birthday (56/365)

I think most have heard something like you only need suprisingly few people in a room before two people in the room end up sharing a birthday. But I never bothered to work it out. Let me do that.

First, forget leap years. Let a year have $365$ days. Note that if you have 366 people in the room you are guaranteed that someone will share a birthday. On the other end of the spectrum, if there are only two people in the room then the probability that the two of them share a birthday is given by one minus the number of ways they cannot share a birthday divided by the number of ways we can assign them a birthday: $\frac{(365)(364)}{(365)(365)}$.

In general, let’s say there are $k$ people in the room. The following gives the number of ways to assign different birthdays to each of the $k$ people.

$\displaystyle D(k) = (365)(365-1)(365-2)\dots(365-k+1) = (365)_k \\ \text{Note that when } k > 365 \text{ we get } D_k = 0$

The number of ways to assign any birthday to each of the $k$ people.

$\displaystyle N(k) = 365^k$

So, the probability that at least one pair out of $k$ will share a birthday is given by

$\displaystyle \frac{D(k)}{N(k)} = 1 - \frac{(365)_k}{365^k}$

Graphing it below. By $k=41$ the probability is already at $0.9$.

## Response Variable (55/365)

A quick aside. I was thinking about how response variables are attached to generative models. For instance, if we want to say have binary classification on documents we would normally 1) take the dot product the topic vector with a global vector of coefficients and then 2) pass it through a logistic function. This is fine, but the problem is that if we were sampling the topic model using a Gibbs sampler we have to use some form of gradient descent to compute the dot product coefficients and to compute the coefficients of the logistic function.

This is painful. I’d rather have everything be probabilistic. I was pondering on ways to do this. Let’s say that the topic vector has dimension $n$. Define two multinomial distributions $\{t_i\}$ and $\{f_i\}$. Now define the response through this procedure

1. Given topic distribution $z$
2. $i \gets t$
3. $j \gets f$
4. response $\gets \text{Bernoulli}\left( \frac{z_i}{z_i + z_j} \right)$

Essentially, the idea is $t$ and $f$ end up finding mutually exclusive dimensions such that either one or the other has a high value in order to produce the correct class label with high probability. I’d like to try it after I get done with the Gibbs code.