Symbols is and so on. I’ll start with the ability to pick two distributions: Dirichlet and Multinomial. We can cover many models with just these two. Once I provide a way to specify which type of distribution each node has, I should be able to change the distributions at will without affecting the network; for instance, using a continuous response instead of a discrete response in an HMM.
After this, I will want to create a function in the Gibbs module that can take in a Reader and sample the distributions. In the case of the HMM, this would mean sampling the Transition distributions and the Symbols distributions by reading the network to figure out their priors and support.
Finally, with the sampled distributions and a Reader, I will write a sampler that produces a new Reader. In the case of the HMM, this means sampling the new Topic variables. These steps cover the (uncollapsed) Gibbs sampling technique.
Looking ahead even further, I intend to write a method to compute the density of the observed variables (having marginalized out the latent variables). I will do this using the annealed importance sampling method described in the paper “Evaluation Methods for Topic Models” by Hanna M. Wallach et al. In the case of the HMM, this amounts to computing the probability of Symbol given Symbols and Transition while marginalizing out the Topic variables.
> import Data.Ord (comparing)
This is completely random but I thought it was a neat use of laziness. You are familiar with lexicographical ordering? Haskell’s compare of lists implements this.
ghci> [1,2] < [1,3]
True
ghci> [1,3,4] < [1,3,4,5]
True
ghci> [2,4] < [2,3]
False
Note that this favors shorter strings.
ghci> [1,2] < [1,2,3]
True
ghci> [1,3] < [1,3,4]
True
For whatever reason, I wanted to favor longer strings. How can we do this? First, note that the above is equivalent to doing the comparison after appending an infinite list of zeros to each operand (assuming we are using only positive numbers).
ghci> let aug xs = xs ++ cycle [0]
ghci> comparing aug [1,2] [1,3]
LT
ghci> comparing aug [1,3,4] [1,3,4,5]
LT
ghci> comparing aug [2,4] [2,3]
GT
If I, instead, append an infinite list of a large number I can get what I want.
ghci> let aug xs = xs ++ cycle [9999999]
ghci> comparing aug [1,2] [1,2,3]
GT
ghci> comparing aug [1,3] [1,3,4]
GT
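One caveat with the cycle trick: if the two lists are equal, the comparison never terminates, because it walks down two identical infinite lists. A finite-padding variant terminates on all inputs (the name longFirst and the explicit sentinel argument are my own choices, not from the original):

```haskell
-- Compare lists so that, on a shared prefix, the longer list wins.
-- Pad both lists to the same finite length with a large sentinel value;
-- unlike padding with `cycle`, equal inputs give EQ instead of looping.
longFirst :: Ord a => a -> [a] -> [a] -> Ordering
longFirst big xs ys = compare (pad xs) (pad ys)
  where n = max (length xs) (length ys)
        pad zs = zs ++ replicate (n - length zs) big
```

For example, longFirst 9999999 [1,2] [1,2,3] is GT, while longFirst 9999999 [1,2] [1,2] is EQ.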
Auto drivers in Bangalore speak at least four languages with ease: Hindi, Kannada, Tamil, and Telugu. And English is a given. Everyone ought to speak as many languages as possible without inhibitions and should be encouraged to do so even if all you manage to speak is a “good morning” in that language. There is just so much to gain, including new friends and perspectives and traditions, and in the most beautiful way – through all these differences – it makes us aware of just how similar (not identical) we all are.
That brings me back to how one can learn a language quickly. I’ve always been fascinated with individual words in different languages. Part of that has to do with simply wanting to know the origin of a word: you end up tracing it back to different countries, and sometimes recounting the history of two countries along the way. Ponder the words “algebra” or “philosophy”.
I studied Hindi in school fifteen years ago and I hated it. It was boring and tiresome. There was no joy in the learning process at all. Now, I wish I could speak every language in the world. At this point I point you to the wonderful book by Anthony Burgess, “Language Made Plain”. I want to start by making some random notes on the easy and difficult things I am facing as I learn Hindi once again.
Learning a new alphabet is a problem. I really wish all languages used a common alphabet. For me, this means that if I have to properly learn Kannada there is little point in trying to read the script. That will take time.
Find as many audio/video resources as possible. Once I refreshed the basics, I scoured the web for children’s stories and try to listen to them while commuting to and from work.
I find listening the easiest, and I have seen huge improvement here in a short time. Speaking is harder, mainly because I don’t get the chance to practice it. It takes time to construct a grammatically correct sentence. Forget writing for now.
Learning/picking up vocabulary is easiest. I used to think this was the main problem, but it’s negligible compared to the problems of sentence structure, gender modifications, and, more importantly, learning the phrases in the zeitgeist.
I need a way to measure my progress easily and to make sure improvement is not stalled. I haven’t yet found a good way. What I do now is while listening to some audio I write down words I don’t recognize and then look them up later. But again, right now my main issue is with figuring out how to improve my spoken form given that I am unable to practice out in the open regularly.
I like writing. My idea right now is to find a good and simple way to exploit this as an alternative to having to find people to practice with daily. What I want is a way to quickly exercise my ability to create short and grammatically correct sentences in various tenses.
Anyway, I’ll make a few posts now and then about approaches that do and do not work for me when learning a language.
I’ve updated (commit befb0f3cca0c212e368497e86f030aa96355be18) the Reader and Writer interfaces and added them to Statistics.GModeling.Gibbs. I’ve removed references to Support and simply parameterized using a key type k and a value type v.
> data Reader k v = Reader
> {
> -- | Number of available indices
> size :: Int
> -- | Read the value at the given key
> , readn :: Int -> k -> v
> -- | Create a copy for writing only
> , copy :: IO (Writer k v)
> }
>
> data Writer k v = Writer
> {
> -- | Write the value at the given key
> writen :: Int -> k -> v -> IO ()
> -- | Create a read-only copy
> , readOnly :: IO (Reader k v)
> }
I’ve also simplified the type of Indexed and added an implementation of Reader and Writer for the HMM in Statistics.GModeling.Models.HMM.
(Commit 133e22dc979d988706aafe52a346cee004f70ca5.) It contains
Statistics.GModeling.DSL
Statistics.GModeling.Models.HMM
Statistics.GModeling.Models.LDA
Statistics.GModeling.Models.FixedTreeLDA
Will continue building the pieces in upcoming posts.
First, forget leap years. Let a year have 365 days. Note that if you have 366 people in the room you are guaranteed that someone will share a birthday. On the other end of the spectrum, if there are only two people in the room then the probability that the two of them share a birthday is given by one minus the number of ways they cannot share a birthday divided by the number of ways we can assign them a birthday: $1 - \frac{365 \cdot 364}{365^2} = \frac{1}{365}$.
In general, let’s say there are $n$ people in the room. The following gives the number of ways to assign different birthdays to each of the $n$ people: $365 \cdot 364 \cdots (365 - n + 1)$.
The number of ways to assign any birthday to each of the $n$ people is $365^n$.
So, the probability that at least one pair out of $n$ will share a birthday is given by $p(n) = 1 - \frac{365 \cdot 364 \cdots (365 - n + 1)}{365^n}$.
Graphing it below. By $n = 23$ the probability is already at $0.5$.
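The curve is easy to compute directly; a sketch (the function name birthday is mine), assuming a 365-day year and uniformly distributed birthdays:

```haskell
-- Probability that at least one pair among n people shares a birthday:
-- one minus the probability that all n birthdays are distinct.
birthday :: Int -> Double
birthday n = 1 - product [fromIntegral (365 - k) / 365 | k <- [0 .. n - 1]]
```

For instance, birthday 22 is just under one half and birthday 23 just over.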
This is painful. I’d rather have everything be probabilistic. I was pondering on ways to do this. Let’s say that the topic vector has dimension . Define two multinomial distributions and . Now define the response through this procedure
Essentially, the idea is and end up finding mutually exclusive dimensions such that either one or the other has a high value in order to produce the correct class label with high probability. I’d like to try it after I get done with the Gibbs code.
There are two kinds of data I need access to. In the case of the HMM, for example, the values of Topic and Symbol form the Support of some distribution, while the values of Transition and Symbols are the indices of their respective distributions that are currently active. So, consider the following signature
> type Support = Int
> -- Int -> a -> Either (a,Int) Support
For now, I’ll restrict Support to only integers. The second type takes some integer index and a label, and returns either an index or the support, depending on what is being asked. I’ll make this clear in the end with the HMM example.
For the library to have read access to the data I will provide this data type.
> data Reader a = Reader
> {
> size :: Int
> , read :: Int -> a -> Either (a,Int) Support
> , copy :: IO (Writer a)
> }
Field size tells us how many indices there are [0..size-1]; read is the function we just saw, and copy creates a writable copy of the data.
> data Writer a = Writer
> {
> write :: Int -> a -> Support -> IO ()
> , readOnly :: IO (Reader a)
> }
Here write allows us to write a new value at some Int index for the label a. Let me take the HMM as an example again. For simplicity, let’s say we store the sequences as a list of lists.
> data HMMLabels = Alpha | Beta | Transition
> | Initial | Topic | Symbols | Symbol
>
>
> type Sequences = [[(Int,Int)]]
We can provide a reader.
> reader :: Sequences -> Reader HMMLabels
> reader ss = Reader
> {
> size = length (concat ss)
> , read = \idx -> let (i,j) = indices !! idx
> (topic,symbol) = ss !! i !! j
> (prev_topic,_) = ss !! i !! (j-1)
> in \name -> case name of
> Topic -> Right topic
> Symbol -> Right symbol
> Symbols -> Left (Topic,topic)
> Transition -> if j==0
> then Left (Initial,0)
> else Left (Topic,prev_topic)
> , copy = error "undefined"
> }
> where indices = concat $
> map (\(i,s) -> [(i,j) | j <- [0..length s-1]]) (zip [0..] ss)
Note how the signature of read encourages caching; that is, the library can first supply only the index and then repeatedly query the resulting partial function for various names. This seems alright so far, but I’ll only know if it holds up when I look at managing the distributions in the next post.
> import Control.Monad (msum)
> import Data.Maybe (mapMaybe)
> import Data.List (nub,(\\))
Recapping the DSL.
> data Indexed a = Only a | a :@ [a]
> data Edge a = Indexed a :-> a
> type Network a = [Edge a]
Enumerating the names.
> names :: Eq a => Network a -> [a]
> names = nub . concatMap f
> where f (Only a :-> b) = [a,b]
> f ((p :@ _) :-> a) = [p, a]
Enumerating the children.
> children :: Eq a => Network a -> a -> [a]
> children xs a = concatMap f xs
> where f (Only p :-> c) | p == a = [c]
> f ((p :@ is) :-> c) | p == a || elem a is = [c]
> f _ = []
Enumerating the parents.
> parents :: Eq a => Network a -> a -> [a]
> parents xs a = concatMap f xs
> where f (Only p :-> c) | c == a = [p]
> f ((p :@ _) :-> c) | c == a = [p]
> f _ = []
Enumerating the observed variables.
> observed :: Eq a => Network a -> [a]
> observed n = filter (null . children n) . names $ n
Enumerating the priors.
> prior :: Eq a => Network a -> [a]
> prior n = filter (null . parents n) . names $ n
Enumerating the latent variables.
> latent :: Eq a => Network a -> [a]
> latent xs = names xs \\ (prior xs ++ observed xs)
Index of a random variable
> indexOf :: Eq a => Network a -> a -> Maybe [a]
> indexOf xs a = msum (map f xs)
> where f ((p :@ is) :-> _) | p == a = Just is
> f _ = Nothing
Running on the hmm example.
> data HMMLabels = Alpha | Beta | Transition
> | Initial | Topic | Symbols | Symbol
> deriving (Show,Eq)
>
> hmm :: Network HMMLabels
> hmm =
> [
> Only Alpha :-> Transition
> , Only Beta :-> Symbols
> , (Transition :@ [Initial,Topic]) :-> Topic
> , (Symbols :@ [Topic]) :-> Symbol
> ]
ghci> observed hmm
[Symbol]
ghci> prior hmm
[Alpha,Beta]
ghci> latent hmm
[Transition,Symbols,Topic]
ghci> indexOf hmm Alpha
Nothing
ghci> indexOf hmm Transition
Just [Initial,Topic]
ghci> indexOf hmm Symbols
Just [Topic]
ghci> children hmm Alpha
[Transition]
ghci> parents hmm Alpha
[]
ghci> children hmm Topic
[Topic,Symbol]
ghci> parents hmm Topic
[Transition]
Next time, I want to look at how to specify the distributions.
> data LDALabels = Alpha | Beta | Topics | Topic
> | Doc | Symbols | Symbol
> deriving (Show,Eq)
> lda :: Network LDALabels
> lda =
> [
> Only Alpha :-> Topics
> , Only Beta :-> Symbols
> , (Topics :@ [Doc]) :-> Topic
> , (Symbols :@ [Topic]) :-> Symbol
> ]
Here is a topic model where, for each document, the topics are arranged in the nodes of a fixed binary tree. Let’s say the tree has depth ; then the distribution is parameterized by a TopicPath distribution (to select a leaf) and a TopicDepth distribution (to select a node along the path).
> data LDATreeLabels =
> Alpha1 | Alpha2 | Beta
> | TopicDepth | TopicPath | Topic | Doc
> | Symbols | Symbol
> deriving (Show,Eq)
>
> ldaTree :: Network LDATreeLabels
> ldaTree =
> [
> Only Alpha1 :-> TopicDepth
> , Only Alpha2 :-> TopicPath
> , Only Beta :-> Symbols
> , (TopicPath :@ [Doc]) :-> Topic
> , (TopicDepth :@ [Doc]) :-> Topic
> , (Symbols :@ [Topic]) :-> Symbol
> ]
I think it looks pretty good so far. Let’s see how it holds up once I start interpreting the DSL.
I’ve spent a lot of time coding generative models from scratch, and it’s repetitive and painful and error-prone. My current job has put me back in the thick of machine learning research, and I hope I’ll get to use this. The problem with coding models from scratch is keeping track of the distributions and carefully constructing the conditional probabilities for each latent variable that needs to be sampled. For some reason, I wasn’t keen on using existing libraries and I wanted to have a go at making one myself.
After many false starts, I decided that I’d first write up a DSL that can be used to describe a generative model the way it’s usually represented in plate notation. The key to plate notation is that it makes it easy to represent indexed distributions on top of the underlying Bayesian network constructed by drawing nodes and edges.
I’ll keep the Hidden Markov Model as a running example. First, the user defines his own type that provides names for the various random variables.
> data HMMLabels = Alpha | Beta | Transition
> | Initial | Topic | Symbols | Symbol
The library now needs to provide a way to define the generative model on top of this. As a first step, we need to be able to define the plates; that is, to say when a name is indexed by another name. In the case of the HMM, the symbol distributions are indexed by a topic, and each topic distribution is either the initial one or is indexed by the previous topic.
Suppose the library provides the following
> data Indexed a = Only a | a :@ [a]
Then we can write
> -- Symbols :@ [Topic]
> -- Transition :@ [Initial,Topic]
And we can also define variables that stand on their own
> -- Only Alpha
> -- Only Beta
Next is to allow the edges to be defined. Suppose we provide
> data Edge a = Indexed a :-> a
> type Network a = [Edge a]
The whole network can now be defined
> hmm :: Network HMMLabels
> hmm =
> [
> Only Alpha :-> Transition
> , Only Beta :-> Symbols
> , (Transition :@ [Initial,Topic]) :-> Topic
> , (Symbols :@ [Topic]) :-> Symbol
> ]
Next time, I’ll try to define a couple more models with this language to see if I am on the right track and then start writing an interpreter.
(Forward direction) Suppose is a continuous random variable, then its distribution function is also continuous by definition. Hence, by definition of continuity.
(Reverse direction) Suppose for all and let be the corresponding distribution function. Let be a sequence of sets such that . Then, because is countably additive. Thus, and since we see that for any , which is the definition of continuity.
is continuous on the right but is not a distribution function in .
Take the first function. To show that it is continuous on the right, let and let . We need to show that there exists such that for all and within a distance of , the following holds: . If , then let be the distance to the nearest point where . We see that in this case, picking a point within of will take on a value of and the difference will be less than . If , then and we can easily pick such that , meaning it will also take on a value of and satisfy . Thus, the first function is continuous on the right.
However, it is not a distribution function because it does not satisfy the requirement described in the last post that because if and the difference function evaluates to .
The second function is not a distribution function because . But it is continuous on the right because if we pick a point the function is constant on the interval , and it is open on the right, meaning we can always find a delta on the right to satisfy any .
And a difference function
Then show that
Just to make clear the notation above (which confused me for a while), take the example where , then
So this is true for . I won’t do the general case.
The way we do this is to start with a distribution function from which we derive a unique probability measure where . Here is a problem.
Let , then verify that .
Verify that where .
The proof for the following are similar: , , and .
If is a Borel set, then so is its complement . We know that all sets in must have this form for some and (proved in book and is simple). The function belongs to and therefore . Consider
Then, since the function belongs to . But clearly does not belong to . Hence is not a Borel set and neither is .
This is a Borel set because we can intersect the set of converging sequences and the set of sequences bounded from below.
This is the set of all sequences converging to a finite limit. My initial thought was to use the result from last time where we showed that the set of sequences bounded from above or below by
But we can’t then union all the sets of the above form for each possible limit because there are uncountably many choices. It would seem that we need a way to characterize limits without picking the value of the limit. Luckily, there is such a characterization for converging sequences of real numbers; namely, Cauchy sequences. A sequence $(x_n)$ is a Cauchy sequence if for all $\epsilon > 0$ there is a positive integer $N$ such that for all $m, n \ge N$, $|x_m - x_n| < \epsilon$. All sequences of real numbers converging to a finite limit are also Cauchy sequences.
The Cauchy condition holds if and only if for all $m \ge 1$ there exists $n$ such that $|x_k - x_l| < \frac{1}{m}$ for all $k, l \ge n$. We can write the set of all converging sequences as
$$\bigcap_{m \ge 1} \bigcup_{n \ge 1} \bigcap_{k, l \ge n} \left\{ x : |x_k - x_l| < \tfrac{1}{m} \right\}$$
As a result, the set of all sequences converging to a finite limit is measurable.
The book asks to show that certain sets are members of . Show that the following are Borel sets.
Take the first case. Note that is not satisfied if for every , . This can only happen if there are an infinite number of coordinates whose value is . Let
The set is a Borel set since we have constructed it as a countable union of Borel sets. Therefore, (this is also a countable union) is a Borel set. A similar argument is made for the other.
The answer is no. We can show this by showing that the natural numbers form a strict subset of . Every natural number can be written as where and only a finite number of (because if an infinite number of , the sum is ).
Note that since is a decomposition we can write every set in as a countable (because this is a -algebra) union of a subset of . This means we can encode every set in as where . First, since is countable there is a bijection between the natural numbers and , however, is not countable since we can have a countable number of .
Given a set , and a set of subsets , we say that is an algebra if and is closed under unions and complementation. A -algebra adds the requirement that it also be closed under countable unions. The pair is called a measurable space.
Let be -algebras of . Are the following systems of sets -algebras?
The intersection of -algebras is also a -algebra because , and in the intersection is contained in both and .
However, the union of -algebras is not always a -algebra. For instance, let , , then their union does not contain .
Since ,
To see that it is finitely additive, let be disjoint.
To show that it is not countably additive, consider the case where is the set of natural numbers. Then
This chapter introduces us to how we can extend the probability framework we had for finite sample spaces. The key problem we face is that in the finite case we were simply able to assign a probability to each and therefore get . But we can no longer follow this approach for an infinite sample space.
Anyway, the problem asks the following. Let be the set of rational numbers in . Let be the algebra of sets where each set takes on one of these forms: , , , and . Show that is a finitely additive set function but not countably additive.
Let be disjoint sets. Then, we see that is finitely additive.
To show that is not countably additive we need to show that we can come up with an infinite sequence of disjoint sets whose sum of probabilities is not equal to the probability of its union. This should bring back memories of converging sequences. Consider the sets . It is clear that the union of these sets is . But
What it clarifies for me is the step in the EM algorithm where one introduces auxiliary variables – one for each value that the hidden variable can take on – which somehow turn out to be the conditional probability of given everything else. Why this turns out to be the case has always been a little fuzzy to me, and Dan’s post clarifies it greatly. The step that determines the auxiliary variables comes from equating the derivative of the log-likelihood with the derivative of the simpler function involving ’s and solving for . Please have a read.
Consider the reading of a book. It’s an activity that proceeds in sequence as we read one word after another from left to right. Let be the sequence of words in a book. Let’s say that there are two actions we can take when we encounter a word .
What remains is to figure out what we mean by ‘know the word’. Let’s get to this slowly. For now, consider the simplest form of memory: let’s say that memory is a set to which we add an unknown word when learning and from which we remove a known word when recalling. I’ll end this post with some code and plots.
Let’s start with a typeclass for a memory model (for learning and recalling) that we can use again later. The learn method updates the model with an entry, and the recall method returns a new model together with True if the given a was recalled.
> {-# LANGUAGE BangPatterns #-}
>
> import Data.Hashable
> import qualified Data.HashSet as HS
> import Data.Char
>
> class Mem m where
> recall :: (Eq a, Hashable a) => m a -> a -> (m a, Bool)
> learn :: (Eq a, Hashable a) => a -> m a -> m a
We create an instance for the simple model I described above.
> newtype SimpleMem a = SimpleMem (HS.HashSet a)
>
> instance Mem SimpleMem where
> recall (SimpleMem mem) a | HS.member a mem = (SimpleMem (HS.delete a mem), True)
> | otherwise = (SimpleMem mem, False)
> learn a (SimpleMem mem) = SimpleMem (HS.insert a mem)
Given a sequence of words, we will now read it left to right and label each word 1 if we need to learn it and -1 if we are recalling it.
> walk :: (Mem m, Eq a, Hashable a) => m a -> [a] -> [Int]
> walk initial = go initial
> where go _ [] = []
> go !mem (a:as) =
> case recall mem a of
> (mem', False) -> 1 : go (learn a mem') as
> (mem', True) -> (-1) : go mem' as
Finally, let’s have a simple way to read a text file. We won’t bother with stemming and all that.
> readTextFile :: String -> IO [String]
> readTextFile fp = readFile fp >>= return . words . map clean
> where clean c | isLetter c = toLower c
> | isMark c = c
> | otherwise = ' '
ghci> rs <- readTextFile "frankenstein.txt" >>= return . walk (SimpleMem HS.empty)
ghci> take 50 rs
[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,-1,1,1,1,1,1,1,1,-1,1,1,1,-1,1,1,-1,1,-1,1,-1,-1,1,1,-1,-1,-1]
ghci> take 50 $ scanl1 (+) rs
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,23,24,25,26,27,28,29,30,29,30,31,32,31,32,33,32,33,32,33,32,31,32,33,32,31,30]
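As a sanity check, here is a standalone re-implementation of the same walk using Data.Set from containers instead of Data.HashSet (the name walk' is mine); at every step the running sum of the labels equals the current size of the memory set:

```haskell
import qualified Data.Set as S

-- 1 marks a word we had to learn, -1 a word we recalled (and removed).
walk' :: Ord a => [a] -> [Int]
walk' = go S.empty
  where go _ [] = []
        go mem (a:as)
          | S.member a mem = (-1) : go (S.delete a mem) as  -- recall
          | otherwise      = 1    : go (S.insert a mem) as  -- learn
```

For example, walk' (words "a b a c b a") is [1,1,-1,1,-1,1].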
For now, I leave you with a plot of the walk on frankenstein.txt and pride_and_prejudice.txt. Already note something curious
Number of words in frankenstein.txt is . Value of sum of random variables is .
Number of words in pride_and_prejudice.txt is . Value of sum of random variables is .
Let me give an example. As I mentioned, I am currently working out problems on random walks, martingales, and Markov chains. Take the basic random walk, which we construct as follows. Let the sample space be and where .
Now, we consider the sequence of these random variables
Here are some standard ways of interpreting this sequence of random variables
Given a 2-dimensional grid, represents the position after steps (starting from ) taken by either going up one or going to the right one.
Consider a gambling game with two players. If , let’s say player A gains one dollar from player B, and when , player B gains one dollar from player A. Suppose players A and B start with dollars and dollars. therefore represents the amount of money won by player A after turns. If then player B has lost all his money. So here we can ask questions like: what is the probability that player A or player B will be ruined (i.e., loses all his money)?
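As an aside, the ruin question for the fair game has a standard closed form (stated here without proof, and not derived in this post): the ruin probability is linear in the starting fortune, equal to 1 when player A starts with nothing and 0 when A already holds everything.

```haskell
-- Probability that player A (starting with a dollars, against player B's
-- b dollars) is eventually ruined in the fair game: p = b / (a + b).
-- Standard gambler's-ruin result; the name ruinA is my own.
ruinA :: Int -> Int -> Double
ruinA a b = fromIntegral b / fromIntegral (a + b)
```

So with equal fortunes, ruinA 1 1 gives 0.5.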
This is all well and good and the examples generalize to more general random walks but I want to consider some completely left-field examples. I can’t guarantee they will lead anywhere and may be utterly rubbish but I think it will be an interesting exercise. See you next time!
Since is a stochastic matrix, we can apply the Ergodic theorem, which tells us that there exists with such that as . Thus, we see that .
To show that all eigenvalues of have a magnitude less than , note that
As a result, if an eigenvalue for an eigenvector , then . Similarly, if , then . Therefore, all eigenvalues must satisfy .
Let be a Markov chain with values in and a function. Will the sequence form a Markov chain? Will the reversed sequence form a Markov chain?
For the first part, the answer is yes because
The reversed sequence will also form a Markov chain
Proceed as follows.
Proceed as follows.
As a quick note as to why ,
where are given functions, is a martingale.
Once again, we let the sequence of decompositions be . Then is -measurable because (1) is -measurable, (2) is -measurable, and (3) is -measurable if .
Next, we show that
is a martingale.
To show this we need to show that (1) is -measurable. This is clear because takes on a single value (the expectation) conditioned on each . Next, we need to show that (2) .
In terms of balanced parentheses this means we have extra open parentheses to make use of. Let’s say and then
There are ways to arrange votes for each candidate
For each of the valid ways, we can insert an extra open parenthesis in possible places. After this, we can insert the next extra open parenthesis in possible places. So, we now have arrangements.
The last extra parenthesis, we know, has to be placed at the beginning. But there are choices for the first parenthesis. So we now have possible arrangements.
Finally, since the extra parentheses are also indistinguishable, we have to divide by .
Thus the number of ways to arrange votes for candidate and votes for candidate , where candidate always has the higher number of votes, is
The probability that candidate always has a higher number of votes than candidate and ends up with and votes respectively is
We have in fact arrived at the solution without the use of martingales. Next time, let’s see why this example is used in the chapter on martingales.
There is a reason why I want to do this: when I was thinking of a solution, it didn’t strike me at all that the answer had to do with Catalan numbers, even after I realized that counting the number of balanced parentheses is an equivalent problem. Rustiness annoys me. So, this time I want to come up with a proof I’ll remember.
Suppose candidate ends up with only one vote more than candidate . We know that the first vote has to be for candidate . That leaves us with votes for each candidate that we have to arrange so that always stays on top. You will note that this is equivalent to arranging pairs of parentheses so that they remain balanced.
Below is a procedure for generating all sequences of balanced parentheses.
> import Data.List
>
> gen_valid :: Int -> [String]
> gen_valid n = loop n n 0
> where
> loop 0 0 _ = [""]
> loop 0 b _ = [')' : s | s <- loop 0 (b-1) 0]
> loop a b k = ['(' : s | s <- loop (a-1) b (k+1)] ++
> if k > 0 then [')' : s | s <- loop a (b-1) (k-1)] else []
ghci> gen_valid 1
["()"]
ghci> gen_valid 2
["(())","()()"]
ghci> gen_valid 3
["((()))","(()())","(())()","()(())","()()()"]
ghci> gen_valid 4
["(((())))","((()()))","((())())","((()))()","(()(()))","(()()())","(()())()","(())(())","(())()()","()((()))","()(()())","()(())()","()()(())","()()()()"]
We also know that the total number of possible arrangements (valid and invalid) is given by $\binom{2n}{n}$. The Catalan number gives the number of valid arrangements as a fraction of all possible arrangements: $C_n = \frac{1}{n+1}\binom{2n}{n}$.
One way to interpret this fraction is to say that for every valid arrangement there are $n$ corresponding invalid arrangements. Can we come up with a way to transform a valid arrangement into $n$ unique invalid arrangements?
I suspect that it should be possible, given that the invalid arrangements might have to do with inverting each of the parentheses in the sequence. For example, we can transform to or to these two and . What happens when we have nested parentheses? What does become? Note that flipping the internal parentheses alone is bad because we get ! It would seem that we should flip the parent before its children: that way we get these two and . So far so good. I now code the general procedure and check that it generates all arrangements from just the valid ones.
The following function extracts a top level balanced string and returns the rest.
> split :: String -> Maybe (String,String)
> split [] = Nothing
> split ss = Just $ splitAt (len+1) ss
> where len = length . takeWhile (>0) . tail . scanl (\x c -> if c=='(' then x+1 else x-1) 0 $ ss
ghci> split "(())()()"
Just ("(())","()()")
This function returns all top level balanced strings.
> splits :: String -> [String]
> splits = unfoldr split
ghci> splits "(())()()"
["(())","()","()"]
The following takes a valid sequence and generates invalid sequences.
> validToInvalids :: String -> [String]
> validToInvalids str = concat $ map (\i -> modAt i lst) [0..length lst-1]
> where lst = splits str
> change (_:xs) = ')' : init xs ++ "("
> modAt i xs = let (lhs,ss:rhs) = splitAt i xs
> in -- this flips the outer and leaves the inner the same
> concat (lhs ++ [change ss] ++ rhs) :
> -- this recurses into the inner and wraps with a flipped outer
> map (\x -> concat $ lhs ++ [")" ++ x ++ "("] ++ rhs) (validToInvalids (init . tail $ ss))
>
> choose :: Int -> Int -> Int
> choose n k = fact n `div` fact k `div` fact (n-k)
> where fact a = product [2..a]
Let’s check that all the invalid sequences generated are unique and that, together with the valid ones, they sum up to all possible arrangements.
ghci> validToInvalids "()"
[")("]
ghci> validToInvalids "()()"
[")(()","())("]
ghci> validToInvalids "(())"
[")()(","))(("]
ghci> choose 6 3 == (length . nub . concat . map (\x -> x : validToInvalids x) $ gen_valid 3)
True
ghci> choose 8 4 == (length . nub . concat . map (\x -> x : validToInvalids x) $ gen_valid 4)
True
ghci> choose 10 5 == (length . nub . concat . map (\x -> x : validToInvalids x) $ gen_valid 5)
True
Let’s start with this ballot problem. Let be a sequence of independently and identically distributed Bernoulli random variables. Let’s say each represents either a vote for candidate A () or a vote for candidate B (). Let . Suppose and candidate A receives a total of $a$ votes and candidate B receives a total of $b$ votes; compute the probability that candidate A was always ahead of candidate B.
Let’s try to attack this combinatorially. The total number of assignments is given by $\binom{a+b}{b}$, because we can place the $b$ votes for B in any of the $a+b$ positions.
> choose :: Int -> Int -> Double
> choose n k = fact n / fact k / fact (n-k)
> where fact a = product [2..fromIntegral a]
ghci> choose (10+4) 4
1001.0
Now, the number of sequences in which candidate A always has a higher number of votes is given by the following recursion.
> valid :: (Int,Int) -> Double
> valid (_,0) = 1
> valid (a,b) | a-b == 1 = valid (a,b-1)
> | otherwise = valid (a-1,b) + valid (a,b-1)
ghci> valid (10,4)
429.0
ghci> valid (10,4) / choose (10+4) 4 == (10-4) / (10+4)
True
We can easily speed up valid using memoization. A closed form solution should not be hard to come by. But next time let’s see how this chapter approaches this problem and how martingales play a part.
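The memoization mentioned above can be sketched with a lazy array (validMemo is my name for it, not something from the chapter):

```haskell
import Data.Array

-- Tabulate the counts over all (a,b) pairs up to the requested one.
-- Laziness means only the entries reachable from (amax,bmax) are ever
-- forced, so the out-of-domain cells (where a < b) are never evaluated.
validMemo :: (Int, Int) -> Double
validMemo (amax, bmax) = table ! (amax, bmax)
  where
    table = listArray ((0,0),(amax,bmax))
              [ go a b | a <- [0..amax], b <- [0..bmax] ]
    go _ 0 = 1
    go a b | a - b == 1 = table ! (a, b-1)
           | otherwise  = table ! (a-1, b) + table ! (a, b-1)
```

This agrees with the direct recursion, e.g. validMemo (10,4) gives 429.0, but each subproblem is computed once.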
Let be independent Bernoulli random variables with and and . If show that the sequence is a martingale.
Hence, the sequence of is a martingale. Show that the sequence is also a martingale.
Suppose that are two decompositions of the sample space where is finer than . Finer means that .
Let be a random variable. First, recall the expectation of a random variable with respect to a decomposition .
Note the special case when (i.e., when is -measurable).
Next, recall the generalized total probability formula
Suppose we took a conditional expectation instead
This gets simplified if is a finer decomposition than because is now decomposed by
Therefore if
And in general if
> add1 :: Integral a => a -> a -> a -> a
> add1 m x y = (x+y) `mod` m
>
> mult1 :: Integral a => a -> a -> a -> a
> mult1 m x y = (x*y) `mod` m
But this is error prone and cumbersome, because someone using these two functions might write something like
> example :: Integral a => a -> a -> a -> a
> example m x y = (((x+y) `mod` x) * y) `mod` m
when they actually intended to mod by m both times. You could add a newtype wrapper to m to fix this, but that still doesn’t stop the user from using two different moduli in example.
The paper suggests many different ways of solving this issue, and one of them is to use a reader monad, writing add as add :: (Integral a, MonadReader a m) => a -> a -> m a, which would certainly thread the same modulus through all operations. But it forces us to write monadic code when it is unnecessary.
What I realized was that we can still use the reader structure, but without the monad instance, if we roll our own type carrying the reader state.
> newtype M a = M (a -> a)
>
> withModulus :: Integral a => a -> M a -> a
> withModulus m (M f) = f m
>
> instance Integral a => Num (M a) where
> (M f) + (M g) = M $ \s -> (f s + g s) `mod` s
> (M f) - (M g) = M $ \s -> (f s - g s) `mod` s
> (M f) * (M g) = M $ \s -> (f s * g s) `mod` s
> negate (M f) = M $ \s -> (- f s) `mod` s
> abs _ = error "Modular numbers are not signed"
> signum _ = error "Modular numbers are not signed"
> fromInteger n = M $ \s -> fromIntegral n `mod` s
This is very convenient.
ghci> withModulus 7 $ (10+3)*(3-4) + 8
2
This solution seems as convenient and safe as what the paper does for this particular example. Of course, the paper is solving a much more general problem but this struck me as a good solution for modular arithmetic.
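As a bonus, the Prelude’s (^) only needs a Num instance for its base, and since it multiplies through our (*), every intermediate product is reduced by the modulus — so modular exponentiation falls out for free. A self-contained sketch reusing the definitions from the post (powMod is my name for the wrapper):

```haskell
newtype M a = M (a -> a)

withModulus :: Integral a => a -> M a -> a
withModulus m (M f) = f m

instance Integral a => Num (M a) where
  M f + M g     = M $ \s -> (f s + g s) `mod` s
  M f - M g     = M $ \s -> (f s - g s) `mod` s
  M f * M g     = M $ \s -> (f s * g s) `mod` s
  negate (M f)  = M $ \s -> (- f s) `mod` s
  abs _         = error "Modular numbers are not signed"
  signum _      = error "Modular numbers are not signed"
  fromInteger n = M $ \s -> fromInteger n `mod` s

-- (^) squares-and-multiplies through our (*), so intermediate
-- values never grow beyond the modulus.
powMod :: Integer -> Integer -> Integer -> Integer
powMod m base e = withModulus m (fromInteger base ^ e)
```

For instance, powMod 7 3 100 gives 4 (since 3^6 ≡ 1 mod 7 and 100 ≡ 4 mod 6).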
Before that though, I want to look at a certain technique using types that the paper uses to achieve what it does. Specifically, it needs the ability to reify integers as types and to reflect those types back into integers.
Let’s take a look at how we can reify integers and then reflect back their corresponding type. You’ll be aware that we can specify integers recursively
> {-# LANGUAGE ScopedTypeVariables #-}
> {-# LANGUAGE Rank2Types #-}
>
> data Zero
> data Succ a
> data Pred a
This allows one to write numbers like
> type One = Succ Zero
> type Two = Succ One
Note how the number $n$ will have Succ applied $n$ times. We can write numbers with fewer recursions if we introduce types to mimic binary encoding
> data Twice a
We can now write numbers using only $O(\log n)$ recursions. This is not necessary to demonstrate reification but I just wanted to mention it.
> type Four = Twice Two
> type Eight = Twice Four
> type Nine = Succ Eight
Remember that each number is a different type. To reflect each type back to its corresponding integer we need a typeclass, so that each type can have an instance that gives its integral representation.
> class ReflectNum s where
> reflectNum :: Num a => s -> a
And the following instances.
> instance ReflectNum Zero where
> reflectNum _ = 0
> instance ReflectNum s => ReflectNum (Succ s) where
> reflectNum _ = reflectNum (undefined :: s) + 1
> instance ReflectNum s => ReflectNum (Pred s) where
> reflectNum _ = reflectNum (undefined :: s) - 1
> instance ReflectNum s => ReflectNum (Twice s) where
> reflectNum _ = reflectNum (undefined :: s) * 2
Note that the local reference to type s requires the use of the language extension {-# LANGUAGE ScopedTypeVariables #-}. Let’s test it out.
ghci> reflectNum (undefined :: Zero) :: Int
0
ghci> reflectNum (undefined :: Nine) :: Int
9
ghci> reflectNum (undefined :: Twice Nine) :: Int
18
How do we now take an integer and reify it back to its corresponding type? You might think we just need a function like so
> -- reifyIntegral1 :: Int -> ???
But we can’t directly return the corresponding type of Int because each integer corresponds to a different type! One way to get around this is to not return! The programming idiom that doesn’t return is the continuation. Consider the following type signature
> -- reifyIntegral2 :: Int -> (s -> w) -> w
We are providing the function with a continuation (s -> w)
which allows us to continue the computation of the type by passing the result into the continuation.
> -- reifyIntegral2 n f | n == 0 = f (undefined :: Zero)
> -- | n > 0 = reifyIntegral (n-1) (\s -> f (undefined :: Succ s))
But this will give us the following error
reification_reflection.lhs:92:62:
Couldn't match expected type ‘s’ with actual type ‘Succ s0’
‘s’ is a rigid type variable bound by
the type signature for reifyIntegral :: Int -> (s -> w) -> w
at reification_reflection.lhs:87:20
Relevant bindings include
f :: s -> w (bound at reification_reflection.lhs:91:19)
reifyIntegral :: Int -> (s -> w) -> w
(bound at reification_reflection.lhs:91:3)
In the first argument of ‘f’, namely ‘(undefined :: Succ s)’
In the expression: f (undefined :: Succ s)
In the second argument of ‘reifyIntegral’, namely
‘(\ s -> f (undefined :: Succ s))’
Failed, modules loaded: none.
The problem is that we have fixed s in the continuation to inhabit only one type. We simply need to free it up by saying s can be any type that can be reflected.
> reifyIntegral :: Int -> (forall s. ReflectNum s => s -> w) -> w
> reifyIntegral n f =
> case n `quotRem` 2 of
> (0, 0) -> f (undefined :: Zero)
> (q, 0) -> reifyIntegral q (\(_ :: s) -> f (undefined :: Twice s))
> (q, 1) -> reifyIntegral q (\(_ :: s) -> f (undefined :: Succ (Twice s)))
> (q,-1) -> reifyIntegral q (\(_ :: s) -> f (undefined :: Pred (Twice s)))
Compiling the above will fail without the {-# LANGUAGE Rank2Types #-}
extension (the use of forall
). Let’s test it out.
ghci> reifyIntegral 138291 reflectNum :: Int
138291
There you have it – reification and reflection.
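To see why the paper wants this machinery: once a number lives at the type level it can parameterize a whole family of values, e.g. a phantom modulus. A hedged sketch, simplified to non-negative numbers (Mod, addMod, and the continuation-style usage are my own stand-ins, not the paper’s actual definitions):

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE Rank2Types #-}
{-# LANGUAGE EmptyDataDecls #-}

data Zero
data Succ a
data Twice a

class ReflectNum s where
  reflectNum :: Num a => s -> a
instance ReflectNum Zero where
  reflectNum _ = 0
instance ReflectNum s => ReflectNum (Succ s) where
  reflectNum _ = reflectNum (undefined :: s) + 1
instance ReflectNum s => ReflectNum (Twice s) where
  reflectNum _ = reflectNum (undefined :: s) * 2

-- Simplified reification: assumes n >= 0, so no Pred is needed.
reifyIntegral :: Int -> (forall s. ReflectNum s => s -> w) -> w
reifyIntegral n f = case n `quotRem` 2 of
  (0, 0) -> f (undefined :: Zero)
  (q, 0) -> reifyIntegral q (\(_ :: s) -> f (undefined :: Twice s))
  (q, _) -> reifyIntegral q (\(_ :: s) -> f (undefined :: Succ (Twice s)))

-- A number tagged with its modulus as a phantom type: two values can
-- only be combined when the compiler sees they share the same s.
newtype Mod s a = Mod a deriving Show

addMod :: forall s a. (ReflectNum s, Integral a) => Mod s a -> Mod s a -> Mod s a
addMod (Mod x) (Mod y) = Mod ((x + y) `mod` reflectNum (undefined :: s))
```

For example, reifyIntegral 7 (\(_ :: s) -> case addMod (Mod 10 :: Mod s Int) (Mod 4) of Mod v -> v) evaluates to 0, i.e. (10 + 4) mod 7, with the modulus threaded entirely through the type.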
The first two follow due to the variables being independently and identically distributed
The last two we compute by using 1) the generalized total probability formula as seen here and 2) the conditional variance formula as seen here.
where is a decomposition of the sample space. Show that
We can read this as follows. The variance of a random variable is the sum of the expectation of its conditional variances and the variance of the conditional expectations. For example, if the decomposition consists of a single event, then we ought to see no variance in the conditional expectations (since there is only one condition)
In general, the variance of conditional expectations expands to
and the expectation of conditional variances expands to
Adding the two
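In symbols, the identity these two pieces sum to is the law of total variance; writing $\mathcal{D}$ for the decomposition:

```latex
\operatorname{Var}(\xi)
  = \mathrm{E}\!\left[\operatorname{Var}(\xi \mid \mathcal{D})\right]
  + \operatorname{Var}\!\left(\mathrm{E}[\xi \mid \mathcal{D}]\right)
```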
A question now asks to give an example of random variables which are not independent but for which .
Start with a distribution for sample space and a random variable . The basic expectation tells us the value is likely to take on average.
Note that induces a decomposition of as follows
Instead of we could be given a distribution with respect to an event . The expectation over this is the value is likely to take on average conditioned on the event (i.e. restricted to).
Let’s suppose that we have an event and its conditional probabilities where is a decomposition of . We can write this as a random variable that takes on the value whenever .
Now that we have this random variable, we can once again take its expectation which in this case gives the probability of on average conditioned on an event from . This has a special name and is called the total probability.
Suppose now that we have a random variable inducing the decomposition where . We also have conditional probabilities . We can certainly take the following expectation from before
which is the expectation of conditioned on . We now do this for the entire decomposition to arrive at this random variable
which takes on a conditional expectation of at each . We can now generalize the total probability formula such that it gives the expectation of on average conditioned on an event from .
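Spelled out for a decomposition $\{D_1, \ldots, D_k\}$ of the sample space, the generalized total probability formula reads:

```latex
\mathrm{E}[\xi] = \sum_{i=1}^{k} \mathrm{E}[\xi \mid D_i]\, \mathrm{P}(D_i)
```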
The lesson here is to always translate the things we do with probabilities to expectations.
It says that on average, this estimator deduces the correct answer. Consider a different estimator which essentially starts with an assumed ‘tails’. Then
which, on average, slightly underestimates the success probability.
A problem asks the following. Let it be known a priori that has a value in the set . Construct an unbiased estimator for , taking values only in .
Consider the case where for . Then is an unbiased estimator because
which is unbiased because can only be . Now, I can’t seem to proceed further than this. For example, what is the estimator when ? I’ll have to return with an answer another day.
data Match = DowMatch (UArray Int Bool)
| MonthMatch (UArray Int Bool)
| DayMatch (UArray Int Bool)
| TodMatch (Int,Int)
| DTMatch (DT,DT)
| Not Match
| Or Match Match
| And Match Match
| Never
| Always
deriving (Show)
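Smart constructors like dowMatch presumably just populate the UArray behind the corresponding constructor. A minimal sketch of how the day-of-week table might be built (this Day type and dowTable are my stand-ins; the actual module’s definitions may differ):

```haskell
import Data.Array.Unboxed (UArray, accumArray, elems)

-- Stand-in day-of-week type; fromEnum Monday == 0 matches the
-- convention used by toDT later in the series.
data Day = Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday
  deriving (Show, Eq, Enum, Bounded)

-- Seven Bool slots, one per weekday, True for each requested day.
dowTable :: [Day] -> UArray Int Bool
dowTable days = accumArray (||) False (0,6) [ (fromEnum d, True) | d <- days ]
```

So dowTable [Tuesday, Thursday] has True in slots 1 and 3 and False everywhere else.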
The actual implementation and tests are here. I’ll just show what you can do with it. First off, you can check if a date matches a spec.
ghci> :l DateTime.hs
ghci> let Right dt = toDT 2016 09 22 10 12 0
ghci> Always `match` dt
True
ghci> Never `match` dt
False
ghci> dowMatch [Tuesday,Thursday] `match` dt
True
ghci> dowMatch [Monday .. Wednesday] `match` dt
False
ghci> todMatch [((3,10,0),(11,0,0))] `match` dt
True
ghci> dayMatch [1..10] `match` dt
False
ghci> monthMatch [7..11] `match` dt
True
We can also use the logical combinators.
ghci> dowMatch [Tuesday,Thursday] `And` monthMatch [7..11] `match` dt
True
ghci> Not (dowMatch [Thursday]) `And` monthMatch [7..11] `match` dt
False
The module also provides extracting matched ranges within a provided date range.
ghci> let Right dt2 = toDT 2016 10 16 12 0 0
ghci> mapM_ print $ split (dowMatch [Monday]) dt dt2
(False,308880,(2016-09-22 10:12:00,2016-09-25 23:59:59))
(True,86400,(2016-09-26 00:00:00,2016-09-26 23:59:59))
(False,518400,(2016-09-27 00:00:00,2016-10-02 23:59:59))
(True,86400,(2016-10-03 00:00:00,2016-10-03 23:59:59))
(False,518400,(2016-10-04 00:00:00,2016-10-09 23:59:59))
(True,86400,(2016-10-10 00:00:00,2016-10-10 23:59:59))
(False,475201,(2016-10-11 00:00:00,2016-10-16 12:00:00))
ghci> mapM_ print $ split (dowMatch [Monday] `And` todMatch [((3,10,0),(4,30,0)), ((18,0,0),(19,0,0))]) dt dt2
(False,320280,(2016-09-22 10:12:00,2016-09-26 03:09:59))
(True,4801,(2016-09-26 03:10:00,2016-09-26 04:30:00))
(False,48599,(2016-09-26 04:30:01,2016-09-26 17:59:59))
(True,3601,(2016-09-26 18:00:00,2016-09-26 19:00:00))
(False,547799,(2016-09-26 19:00:01,2016-10-03 03:09:59))
(True,4801,(2016-10-03 03:10:00,2016-10-03 04:30:00))
(False,48599,(2016-10-03 04:30:01,2016-10-03 17:59:59))
(True,3601,(2016-10-03 18:00:00,2016-10-03 19:00:00))
(False,547799,(2016-10-03 19:00:01,2016-10-10 03:09:59))
(True,4801,(2016-10-10 03:10:00,2016-10-10 04:30:00))
(False,48599,(2016-10-10 04:30:01,2016-10-10 17:59:59))
(True,3601,(2016-10-10 18:00:00,2016-10-10 19:00:00))
(False,493200,(2016-10-10 19:00:01,2016-10-16 12:00:00))
The test file does a QuickCheck using the Arbitrary instance (although I can now use the generic-random package!) and compares match and split against a brute-force implementation that uses the tick function to compute a match at every second.
unwrap . wrap using rules and fuse it to id). Furthermore, this method doesn’t necessarily help if we want to perform more complex tasks like the following: given a date range, pick out the sub-ranges that match Mondays. For this purpose, the original representation of year, month, and day is more helpful.
I have written some utilities to make these tasks simpler. Today, I will introduce the date-time representation and then provide a way to tick the date-time forward by one second or to tick it back by one second and then in the next post present a matching DSL.
> import Text.Printf
> import Data.Time.Calendar (fromGregorianValid)
> import Data.Time.Calendar.WeekDate (toWeekDate)
> import Control.Monad (guard, when)
>
> data DT = DT
> {
> year :: {-# UNPACK #-} !Int
> , month :: {-# UNPACK #-} !Int
> , day :: {-# UNPACK #-} !Int
> , dow :: {-# UNPACK #-} !Int
> , tod :: {-# UNPACK #-} !Int
> }
Some instances
> instance Eq DT where
> dt == dt' = tod dt == tod dt' &&
> day dt == day dt' &&
> month dt == month dt' &&
> year dt == year dt'
>
> instance Ord DT where
> compare dt dt' =
> compare (year dt,month dt,day dt,tod dt)
> (year dt',month dt',day dt',tod dt')
>
> instance Show DT where
> show dt = printf "%d-%02d-%02d %02d:%02d:%02d"
> (year dt) (month dt) (day dt)
> (tod dt `div` 3600) (((tod dt-s) `div` 60) `mod` 60) s
> where s = tod dt `mod` 60
For constructing it
> toDT :: Int -> Int -> Int -> Int -> Int -> Int -> Either String DT
> toDT year month day hour min sec = do
> dayObj <- maybe (Left "Invalid Year/Month/Day") return $ fromGregorianValid (fromIntegral year) month day
> when (0 > hour || hour > 23) $ Left "Invalid Hour"
> when (0 > min || min > 59) $ Left "Invalid Minute"
> when (0 > sec || sec > 59) $ Left "Invalid Second"
> let (_,_,dow) = toWeekDate dayObj
> return $ DT year month day (dow-1) (hour*3600 + min*60 + sec)
So far we have
ghci> toDT 2016 08 02 11 32 21
Right 2016-08-02 11:32:21
ghci> toDT 2016 06 31 11 32 21
Left "Invalid Year/Month/Day"
ghci> toDT 2016 08 02 11 32 21 > toDT 2016 08 02 11 32 19
True
Finally, the ability to tick and untick the date-time.
> tick :: DT -> DT
> tick = tickTOD
> where tickTOD (dt@DT{tod=s}) =
> let dt' = if s < secsInDay-1
> then dt{ tod = s+1 }
> else dt{ tod = 0 }
> in if s==(secsInDay-1) then tickDay dt' else dt'
>
> tickDay (dt@DT{day=d,dow=dayOfWeek,month=m,year=y}) =
> let dt' = dt{ dow = if dayOfWeek < 6 then dayOfWeek+1 else 0
> , day = day'}
> day' = if d < 27
> then d+1
> else if d == numDays y m
> then 1
> else d+1
> in if day'==1 then tickMonth dt' else dt'
>
> tickMonth (dt@DT{month=m}) =
> let dt' = dt{ month = if m < 12 then m+1 else 1}
> in if m==12 then tickYear dt' else dt'
>
> tickYear (dt@DT{year=y}) = dt{year = y+1}
>
>
> untick :: DT -> DT
> untick = untickTOD
> where untickTOD (dt@DT{tod=s}) =
> let dt' = if s == 0
> then dt{ tod = secsInDay-1 }
> else dt{ tod = s-1 }
> in if s==0 then untickDay dt' else dt'
>
> untickDay (dt@DT{day=d,dow=dayOfWeek,month=m,year=y}) =
> let dt' = dt{ dow = if dayOfWeek == 0 then 6 else dayOfWeek-1
> , day = day'}
> day' = if d == 1
> then if m==1 then numDays (y-1) 12 else numDays y (m-1)
> else d - 1
> in if d==1 then untickMonth dt' else dt'
>
> untickMonth (dt@DT{month=m}) =
> let dt' = dt{ month = if m == 1 then 12 else m-1}
> in if m==1 then untickYear dt' else dt'
> untickYear (dt@DT{year=y}) = dt{year = y-1}
>
> numDays :: Int -> Int -> Int
> numDays y m
> | m == 2 = if (mod y 4 == 0) && ((mod y 400 == 0) || not (mod y 100 == 0))
> then 29
> else 28
> | m == 1 || m == 3 || m == 5 || m == 7 || m == 8 || m == 10 || m == 12 = 31
> | otherwise = 30
>
> secsInDay :: Int
> secsInDay = 86400
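The leap-year logic in numDays is easy to get wrong, so it’s worth cross-checking against the time package (already a dependency here via Data.Time.Calendar). A standalone check, repeating numDays with the month guard condensed:

```haskell
import Data.Time.Calendar (gregorianMonthLength)

numDays :: Int -> Int -> Int
numDays y m
  | m == 2 = if (mod y 4 == 0) && ((mod y 400 == 0) || not (mod y 100 == 0))
               then 29
               else 28
  | m `elem` [1,3,5,7,8,10,12] = 31
  | otherwise = 30

-- Agree with the reference implementation across a span that
-- includes the 2000 (leap) and 2100 (non-leap) corner cases.
checkNumDays :: Bool
checkNumDays = and [ numDays y m == gregorianMonthLength (fromIntegral y) m
                   | y <- [1999..2101], m <- [1..12] ]
```

checkNumDays evaluates to True, confirming the divisible-by-100-but-not-400 rule is handled.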
Examples
ghci> let Right d = toDT 2016 08 02 11 32 21
ghci> mapM_ (print . head) $ take 10 $ iterate (drop 10000) $ iterate tick d
2016-08-02 11:32:21
2016-08-02 14:19:01
2016-08-02 17:05:41
2016-08-02 19:52:21
2016-08-02 22:39:01
2016-08-03 01:25:41
2016-08-03 04:12:21
2016-08-03 06:59:01
2016-08-03 09:45:41
2016-08-03 12:32:21
ghci>
ghci> let Right d = toDT 2016 08 03 12 32 21
ghci> mapM_ (print . head) $ take 10 $ iterate (drop 10000) $ iterate untick d
2016-08-03 12:32:21
2016-08-03 09:45:41
2016-08-03 06:59:01
2016-08-03 04:12:21
2016-08-03 01:25:41
2016-08-02 22:39:01
2016-08-02 19:52:21
2016-08-02 17:05:41
2016-08-02 14:19:01
2016-08-02 11:32:21
From the looks of it, the upper bound seems simple enough because it looks like a direct application of Chebyshev’s inequality. The lower bound, on the other hand, doesn’t look familiar. If we go back to the proof of Chebyshev’s inequality we can try to see if we can arrive at a lower bound instead of an upper bound. It happens that we can once we know that the random variable is bounded .
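For reference, the standard argument flips into a lower bound as soon as we assume $|\xi| \le C$ (my notation; the book’s symbols may differ):

```latex
\mathrm{E}\,\xi^2
  = \mathrm{E}\!\left[\xi^2;\, |\xi| < \varepsilon\right]
  + \mathrm{E}\!\left[\xi^2;\, |\xi| \ge \varepsilon\right]
  \le \varepsilon^2 + C^2\, \mathrm{P}(|\xi| \ge \varepsilon)
\quad\Longrightarrow\quad
\mathrm{P}(|\xi| \ge \varepsilon) \ge \frac{\mathrm{E}\,\xi^2 - \varepsilon^2}{C^2}
```

Replacing $\xi$ by $\xi - \mathrm{E}\,\xi$ gives the corresponding bound around the mean.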
Applying this quickly leads to the required solution. The problem points out the case where , which leads to
We are in essence bounding the probability that the random variable takes on a value close to its mean , which I should imagine is pretty useful to have not so much for computing the probabilities but for asymptotic analysis like we used for the law of large numbers.