Expectations (16/365)

Expectations and random variables appear in so many guises that I sometimes find them confusing. The book I am following has a neat diagram that helps quite a bit.

Expectations

Basic Expectation

Start with a distribution P(\cdot) on a sample space \Omega and a random variable \theta. The basic expectation tells us the value \theta takes on average.

\displaystyle  E\theta = \sum_\omega \theta(\omega) P(\omega) \\ \text{Notice that } E\theta = E[\theta I_\Omega]

Note that \theta induces a decomposition D_1, \dots, D_n of \Omega as follows

\displaystyle  E\theta = \sum_{i=1}^n x_i P(D_i) \text{ where } D_i = \{ \omega | \theta(\omega) = x_i \}
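
As a tiny concrete check (my own example, not the book's): take a fair die and let \theta(\omega) = 1 if \omega is odd and 0 otherwise. The induced decomposition is D_1 = \{1,3,5\}, D_2 = \{2,4,6\} and

\displaystyle  E\theta = 1 \cdot P(D_1) + 0 \cdot P(D_2) = \frac{1}{2}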

Conditional Expectation

Instead of P(\cdot) we could be given a distribution P(\cdot | D) with respect to an event D. The expectation over this is the value \theta takes on average conditioned on (i.e. restricted to) the event D.

\displaystyle  E(\theta | D) = \sum_\omega \theta(\omega) P(\omega | D) \\ \text{Notice that } E(\theta | D) = \frac{E[\theta I_D]}{P(D)}

Let’s suppose that we have an event A and its conditional probabilities P(A | D_1), \dots, P(A | D_n) where \{ D_i \} is a decomposition of \Omega. We can write this as a random variable that takes on the value P(A | D_i) whenever \omega \in D_i.

\displaystyle  P(A | \mathcal{D})(\omega) = \sum_{i=1}^n P(A | D_i) I_{D_i}(\omega)

Now that we have this random variable, we can once again take its expectation, which in this case gives the probability of A averaged over the events of \mathcal{D}. This has a special name: the total probability formula.

\displaystyle  E P(A | \mathcal{D}) = \sum_{i=1}^n P(A | D_i) P(D_i) = P(A)
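
As a quick sanity check (again my own toy example): with the fair die, take A = \{5,6\} and the parity decomposition D_1 = \{1,3,5\}, D_2 = \{2,4,6\}. Then P(A | D_1) = P(A | D_2) = \frac{1}{3} and

\displaystyle  EP(A | \mathcal{D}) = \frac{1}{3} \cdot \frac{1}{2} + \frac{1}{3} \cdot \frac{1}{2} = \frac{1}{3} = P(A)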

One More Generalization

Suppose now that we have a random variable \theta inducing the decomposition A_1, \dots, A_m where A_j = \{ \omega | \theta(\omega) = x_j \}. We also have conditional probabilities P(A_j | D_i). We can certainly take the following expectation from before

\displaystyle  E(\theta | D_i) = \sum_j x_j P(A_j | D_i)

which is the expectation of \theta conditioned on D_i. We now do this for the entire decomposition \mathcal{D} to arrive at this random variable

\displaystyle  E(\theta | \mathcal{D})(\omega) = \sum_i E(\theta | D_i) I_{D_i}(\omega)

which takes on a conditional expectation of \theta at each \omega. We can now generalize the total probability formula so that it gives the expectation of \theta on average conditioned on an event from \mathcal{D}.

\displaystyle  EP(A | \mathcal{D}) = P(A) \\ \text{generalizes to} \\ EE(\theta | \mathcal{D}) = E\theta
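
Everything here is finite, so the generalized formula can be checked mechanically. A minimal Haskell sketch (my own illustration; the names are made up):

import Data.Ratio ((%))

-- A finite sample space with its probabilities.
type Dist a = [(a, Rational)]

-- E theta under the distribution.
ev :: (a -> Rational) -> Dist a -> Rational
ev theta dist = sum [ theta w * p | (w, p) <- dist ]

-- E(theta | D), where the event D is given as a predicate.
evGiven :: (a -> Rational) -> (a -> Bool) -> Dist a -> Rational
evGiven theta d dist = sum [ theta w * p | (w, p) <- dist, d w ] / pd
  where pd = sum [ p | (w, p) <- dist, d w ]

-- EE(theta | D) over a decomposition given as a list of predicates.
towerLHS :: (a -> Rational) -> [a -> Bool] -> Dist a -> Rational
towerLHS theta ds dist =
  sum [ evGiven theta d dist * sum [ p | (w, p) <- dist, d w ] | d <- ds ]

For a fair die die = [ (w, 1 % 6) | w <- [1..6 :: Int] ] with the odd/even decomposition, both towerLHS fromIntegral [odd, even] die and ev fromIntegral die come out to 7 % 2.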

The lesson here is to always translate the things we do with probabilities to expectations.


Unbiased Estimator (15/365)

In the coin tossing setup, we saw that the fraction T_n(\omega) = \frac{S_n(\omega)}{n} of observed heads in n trials \omega approaches p as n \rightarrow \infty. The function T_n is called an estimator and takes values in [0,1]. It’s a special kind of estimator called an unbiased estimator because it satisfies

\displaystyle  E_\theta T_n = \theta \text{ for all } \theta \in [0,1]

It says that on average, this estimator deduces the correct answer. Consider a different estimator T'_n(\omega) = \frac{S_n(\omega)}{n+1}, which essentially starts with an assumed ‘tails’. Then

\displaystyle  E_\theta T'_n = \frac{n}{n+1} E_\theta \frac{S_n}{n} = \frac{n\theta}{n+1}

which, on average, slightly underestimates the success probability.
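
We can check both expectations exactly with a few lines of Haskell (a sketch of my own; binomPMF and eEst are made-up names, not from the post):

import Data.Ratio ((%))

-- P(S_n = k) for k = 0..n, under n independent tosses with success probability p.
binomPMF :: Integer -> Rational -> [(Integer, Rational)]
binomPMF n p = [ (k, fromInteger (n `choose` k) * p^k * (1-p)^(n-k)) | k <- [0..n] ]
  where choose m r = product [m-r+1..m] `div` product [1..r]

-- E_theta of an estimator that maps the head count k to an estimate.
eEst :: (Rational -> Rational) -> Integer -> Rational -> Rational
eEst est n p = sum [ q * est (fromInteger k) | (k, q) <- binomPMF n p ]

For instance, eEst (/ 10) 10 (1 % 2) gives 1 % 2 (unbiased), while eEst (/ 11) 10 (1 % 2) gives 5 % 11, matching \frac{n\theta}{n+1}.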

A problem asks the following. Let it be known a priori that \theta has a value in the set \Omega_0 \subseteq [0,1]. Construct an unbiased estimator for \theta, taking values only in \Omega_0.

Consider the case where \Omega_0 = \{ r \} for r \in [0,1]. Then T_n = r is an unbiased estimator because

\displaystyle  E_\theta T_n = r\theta + (1-\theta)r = r \text{ for } \theta \in \Omega_0

which is unbiased because \theta can only be r. Now, I can’t seem to proceed further than this. For example, what is the estimator when \Omega_0 = \{ r_1, r_2 \}? I’ll have to return with an answer another day.


More Time Utilities (14/365)

In the previous post I introduced a basic date-time type and said that I’ll provide a DSL to do much more with it. Here is a date-time matching DSL I will support.

data Match = DowMatch (UArray Int Bool)
           | MonthMatch (UArray Int Bool)
           | DayMatch (UArray Int Bool)
           | TodMatch (Int,Int)
           | DTMatch (DT,DT)
           | Not Match
           | Or Match Match
           | And Match Match
           | Never
           | Always
           deriving (Show)
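
Before the demos, here is a rough sketch of what the interpreter for this DSL might look like (my own reconstruction for intuition; the linked implementation is the real thing). It assumes the DT accessors from the previous post and arrays indexed by day-of-week (0-6), month (1-12), and day (1-31):

import Data.Array.Unboxed (UArray, (!))

-- Hypothetical evaluator: does this date-time satisfy the spec?
match :: Match -> DT -> Bool
match (DowMatch a)       dt = a ! dow dt
match (MonthMatch a)     dt = a ! month dt
match (DayMatch a)       dt = a ! day dt
match (TodMatch (lo,hi)) dt = lo <= tod dt && tod dt <= hi
match (DTMatch (lo,hi))  dt = lo <= dt && dt <= hi
match (Not m)            dt = not (match m dt)
match (Or  m1 m2)        dt = match m1 dt || match m2 dt
match (And m1 m2)        dt = match m1 dt && match m2 dt
match Never              _  = False
match Always             _  = True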

The actual implementation and tests are here. I’ll just show what you can do with it. First off, you can check if a date matches a spec.

ghci> :l DateTime.hs
ghci> let Right dt = toDT 2016 09 22 10 12 0
ghci> Always `match` dt
  True

ghci> Never `match` dt
  False

ghci> dowMatch [Tuesday,Thursday] `match` dt
  True

ghci> dowMatch [Monday .. Wednesday] `match` dt
  False

ghci> todMatch [((3,10,0),(11,0,0))] `match` dt
  True

ghci> dayMatch [1..10] `match` dt
  False

ghci> monthMatch [7..11] `match` dt
  True

We can also use the logical combinators.

ghci> dowMatch [Tuesday,Thursday] `And` monthMatch [7..11] `match` dt
  True

ghci> Not (dowMatch [Thursday]) `And` monthMatch [7..11] `match` dt
  False

The module also supports extracting the matched ranges within a provided date range.

ghci> let Right dt2 = toDT 2016 10 16 12 0 0
ghci> mapM_ print $ split (dowMatch [Monday]) dt dt2
  (False,308880,(2016-09-22 10:12:00,2016-09-25 23:59:59))
  (True,86400,(2016-09-26 00:00:00,2016-09-26 23:59:59))
  (False,518400,(2016-09-27 00:00:00,2016-10-02 23:59:59))
  (True,86400,(2016-10-03 00:00:00,2016-10-03 23:59:59))
  (False,518400,(2016-10-04 00:00:00,2016-10-09 23:59:59))
  (True,86400,(2016-10-10 00:00:00,2016-10-10 23:59:59))
  (False,475201,(2016-10-11 00:00:00,2016-10-16 12:00:00))

ghci> mapM_ print $ split (dowMatch [Monday] `And` todMatch [((3,10,0),(4,30,0)), ((18,0,0),(19,0,0))]) dt dt2
  (False,320280,(2016-09-22 10:12:00,2016-09-26 03:09:59))
  (True,4801,(2016-09-26 03:10:00,2016-09-26 04:30:00))
  (False,48599,(2016-09-26 04:30:01,2016-09-26 17:59:59))
  (True,3601,(2016-09-26 18:00:00,2016-09-26 19:00:00))
  (False,547799,(2016-09-26 19:00:01,2016-10-03 03:09:59))
  (True,4801,(2016-10-03 03:10:00,2016-10-03 04:30:00))
  (False,48599,(2016-10-03 04:30:01,2016-10-03 17:59:59))
  (True,3601,(2016-10-03 18:00:00,2016-10-03 19:00:00))
  (False,547799,(2016-10-03 19:00:01,2016-10-10 03:09:59))
  (True,4801,(2016-10-10 03:10:00,2016-10-10 04:30:00))
  (False,48599,(2016-10-10 04:30:01,2016-10-10 17:59:59))
  (True,3601,(2016-10-10 18:00:00,2016-10-10 19:00:00))
  (False,493200,(2016-10-10 19:00:01,2016-10-16 12:00:00))

The test file runs QuickCheck tests using a hand-written Arbitrary instance (although I can now use the generic-random package!) and compares match and split against a brute-force implementation that uses the tick function to compute a match at every second.
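
The brute-force oracle is conceptually just this (a sketch under the same module, not the actual test code):

-- Evaluate the spec at every second from lo to hi by repeated ticking.
bruteMatch :: Match -> DT -> DT -> [Bool]
bruteMatch m lo hi = map (match m) (takeWhile (<= hi) (iterate tick lo))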


More Time Utilities (13/365)

Suppose you want to add one second to the current date. One approach is to convert it to unix time (seconds since the epoch), add 1, and then convert it back. The cost of converting is a little too high if we frequently perform this wrapping and unwrapping (technically we could catch unwrap . wrap using rewrite rules and fuse it to id). Furthermore, this method doesn’t necessarily help if we want to perform more complex tasks like the following: given a date range, pick out the sub-ranges that fall on Mondays. For this purpose, the original representation of year, month, and day is more helpful.

I have written some utilities to make these tasks simpler. Today I will introduce the date-time representation along with a way to tick the date-time forward or backward by one second; in the next post I will present a matching DSL.

> import Text.Printf
> import Data.Time.Calendar          (fromGregorianValid)
> import Data.Time.Calendar.WeekDate (toWeekDate)
> import Control.Monad               (guard, when)
> 
> data DT = DT
>   {
>     year  :: {-# UNPACK #-} !Int
>   , month :: {-# UNPACK #-} !Int -- 1..12
>   , day   :: {-# UNPACK #-} !Int -- 1..31
>   , dow   :: {-# UNPACK #-} !Int -- day of week, 0 = Monday .. 6 = Sunday
>   , tod   :: {-# UNPACK #-} !Int -- time of day, in seconds since midnight
>   }

Some instances

> instance Eq DT where
>   dt == dt' = tod dt == tod dt' &&
>               day dt == day dt' &&
>               month dt == month dt' &&
>               year dt == year dt'
> 
> instance Ord DT where
>   compare dt dt' =
>     compare (year dt,month dt,day dt,tod dt)
>             (year dt',month dt',day dt',tod dt')
> 
> instance Show DT where
>   show dt = printf "%d-%02d-%02d %02d:%02d:%02d"
>                    (year dt) (month dt) (day dt)
>                    (tod dt `div` 3600) (((tod dt-s) `div` 60) `mod` 60) s
>     where s = tod dt `mod` 60

For constructing it

> toDT :: Int -> Int -> Int -> Int -> Int -> Int -> Either String DT
> toDT year month day hour min sec = do
>   dayObj <- maybe (Left "Invalid Year/Month/Day") return $ fromGregorianValid (fromIntegral year) month day
>   when (0 > hour || hour > 23) $ Left "Invalid Hour"
>   when (0 > min  || min  > 59) $ Left "Invalid Minute"
>   when (0 > sec  || sec  > 59) $ Left "Invalid Second"
>   let (_,_,dow) = toWeekDate dayObj
>   return $ DT year month day (dow-1) (hour*3600 + min*60 + sec)

So far we have

ghci> toDT 2016 08 02 11 32 21
  Right 2016-08-02 11:32:21

ghci> toDT 2016 06 31 11 32 21
  Left "Invalid Year/Month/Day"

ghci> toDT 2016 08 02 11 32 21 > toDT 2016 08 02 11 32 19
  True

Finally, the ability to tick and untick the date-time.

> tick :: DT -> DT
> tick = tickTOD
>   where tickTOD (dt@DT{tod=s}) =
>           let dt' = if s < secsInDay-1
>                     then dt{ tod = s+1 }
>                     else dt{ tod = 0 }
>           in if s==(secsInDay-1) then tickDay dt' else dt'
> 
>         tickDay (dt@DT{day=d,dow=dayOfWeek,month=m,year=y}) =
>           let dt' = dt{ dow = if dayOfWeek < 6 then dayOfWeek+1 else 0
>                       , day = day'}
>               day' = if d < 27
>                      then d+1
>                      else if d == numDays y m
>                           then 1
>                           else d+1
>           in if day'==1 then tickMonth dt' else dt'
> 
>         tickMonth (dt@DT{month=m}) =
>           let dt' = dt{ month = if m < 12 then m+1 else 1}
>           in if m==12 then tickYear dt' else dt'
> 
>         tickYear (dt@DT{year=y}) = dt{year = y+1}
> 
> 
> untick :: DT -> DT
> untick = untickTOD
>   where untickTOD (dt@DT{tod=s}) =
>           let dt' = if s == 0
>                     then dt{ tod = secsInDay-1 }
>                     else dt{ tod = s-1 }
>           in if s==0 then untickDay dt' else dt'
> 
>         untickDay (dt@DT{day=d,dow=dayOfWeek,month=m,year=y}) =
>           let dt' = dt{ dow = if dayOfWeek == 0 then 6 else dayOfWeek-1
>                       , day = day'}
>               day' = if d == 1
>                      then if m==1 then numDays (y-1) 12 else numDays y (m-1)
>                      else d - 1
>           in if d==1 then untickMonth dt' else dt'
> 
>         untickMonth (dt@DT{month=m}) =
>           let dt' = dt{ month = if m == 1 then 12 else m-1}
>           in if m==1 then untickYear dt' else dt'
>         untickYear (dt@DT{year=y}) = dt{year = y-1}
> 
> numDays :: Int -> Int -> Int
> numDays y m
>   | m == 2 = if (mod y 4 == 0) && ((mod y 400 == 0) || not (mod y 100 == 0))
>              then 29
>              else 28
>   | m == 1 || m == 3 || m == 5 || m == 7 || m == 8 || m == 10 || m == 12 = 31
>   | otherwise = 30
> 
> secsInDay :: Int
> secsInDay = 86400

Examples

ghci> let Right d = toDT 2016 08 02 11 32 21
ghci> mapM_ (print . head) $ take 10 $ iterate (drop 10000) $ iterate tick d
  2016-08-02 11:32:21
  2016-08-02 14:19:01
  2016-08-02 17:05:41
  2016-08-02 19:52:21
  2016-08-02 22:39:01
  2016-08-03 01:25:41
  2016-08-03 04:12:21
  2016-08-03 06:59:01
  2016-08-03 09:45:41
  2016-08-03 12:32:21

ghci> 
ghci> let Right d = toDT 2016 08 03 12 32 21
ghci> mapM_ (print . head) $ take 10 $ iterate (drop 10000) $ iterate untick d
  2016-08-03 12:32:21
  2016-08-03 09:45:41
  2016-08-03 06:59:01
  2016-08-03 04:12:21
  2016-08-03 01:25:41
  2016-08-02 22:39:01
  2016-08-02 19:52:21
  2016-08-02 17:05:41
  2016-08-02 14:19:01
  2016-08-02 11:32:21
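
A quick sanity check for this kind of code: tick and untick should be inverses at every point, including month, year, and leap-day boundaries. A sketch, assuming the module above is loaded:

-- Walk n seconds forward from d, checking untick . tick == id at each step.
roundtrips :: DT -> Int -> Bool
roundtrips d n = all (\x -> untick (tick x) == x && tick (untick x) == x)
                     (take n (iterate tick d))

Starting from, say, toDT 2016 02 28 23 59 50 and walking a few hundred thousand seconds exercises the leap-day boundary.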

Chebyshev’s Inequality For Bounding From Below? (12/365)

Problem 6.2 asks the following. Let f = f(x) be a nonnegative even function that is nondecreasing for positive x. Then for a random variable \theta with |\theta(\omega)| \le C,

\displaystyle  \frac{Ef(\theta) - f(\epsilon)}{f(C)} \le P(|\theta - E\theta | \ge \epsilon) \le \frac{Ef(\theta - E\theta)}{f(\epsilon)}

From the looks of it, the upper bound seems simple enough: it looks like a direct application of Chebyshev’s inequality. The lower bound, however, doesn’t look familiar. If we go back to the proof of Chebyshev’s inequality, we can check whether it can be turned into a lower bound instead of an upper bound. It happens that it can, once we know that the random variable is bounded: |\theta(\omega)| \le C.

\displaystyle  P(\theta \ge \epsilon) = EI(\theta \ge \epsilon) \\ \ge E\frac{\theta}{C} I(\theta \ge \epsilon) \\ = E\frac{\theta}{C} - E\frac{\theta}{C}I(\theta < \epsilon) \\ \ge E\frac{\theta}{C} - E\frac{\epsilon}{C}I(\theta < \epsilon) \\ \ge \frac{E(\theta - \epsilon)}{C}
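
To spell out the application (my own filling in of the step): since f is nonnegative, even, and nondecreasing for positive x, one can check pointwise that I(|x| \ge \epsilon) \ge \frac{f(x) - f(\epsilon)}{f(K)} whenever |x| \le K — on \{|x| \ge \epsilon\} the right side is at most 1, and on \{|x| < \epsilon\} it is at most 0. Taking expectations with x = \theta - E\theta and K a bound on |\theta - E\theta| gives

\displaystyle  P(|\theta - E\theta| \ge \epsilon) \ge \frac{Ef(\theta - E\theta) - f(\epsilon)}{f(K)}

which is the shape of the required inequality.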

The problem points out the case f(x) = x^2, which leads to

\displaystyle  \frac{E\theta^2 - \epsilon^2}{C^2} \le P( | \theta - E\theta | \ge \epsilon) \le \frac{V\theta}{\epsilon^2}

We are in essence bounding, from both sides, the probability that \theta deviates from its mean E\theta by at least \epsilon. I imagine this is useful not so much for computing probabilities as for asymptotic analysis, like the kind we used for the law of large numbers.


What’s Real About Probability? (11/365)

Suppose I tell you the axioms of elementary probability, and suppose further that I tell you that I have a coin that, on tossing, will show heads with probability p. Can you tell anything at all about the coin?

You might be tempted to say something like “if I toss the coin n times I’d expect to see roughly np heads”. But this is a loaded response that goes beyond the simple axioms we started with. I say this because p, as it has been stated, says nothing whatsoever about a fraction of tosses coming up heads.

In other words, is there any reality to the probability of heads p? As it turns out there is, and it is the content of Bernoulli’s Law of Large Numbers. Let’s see how this law helps assign an intuitive reality to p.

Let’s first codify our intuition that the fraction of heads in n tosses has something to do with p. Construct the following sample space for n tosses.

\displaystyle  \Omega_n = \{ \omega = (\omega_1, \dots, \omega_n) : \omega_i \in \{ 0,1 \} \}

Define its probabilities as those generated by n independent tosses of a coin with success probability p.

\displaystyle  P(\omega) = \prod_i P(\omega_i) = p^{\sum_i \omega_i} (1-p)^{n - \sum_i \omega_i}

Create a random variable to capture the number of heads.

\displaystyle  S_n(\omega) = \sum_{i=1}^n \omega_i

Our intuition says that after observing n tosses \omega, we might expect \frac{S_n(\omega)}{n} to be close to p. In the language of analysis, we would expect the following to hold for sufficiently large n.

\displaystyle  \left| \frac{S_n(\omega)}{n} - p \right| \le \epsilon \text{ for all } \omega \text{ and } \epsilon > 0

But this can’t work for arbitrary \epsilon, because we are requiring it to hold for all \omega. If 0 < p < 1, the probability of getting all heads or all tails is non-zero, and that prevents the fraction from getting arbitrarily close to p for every \omega.

However, note that as n increases, the probability of getting all heads or all tails becomes small. So we can instead define closeness to p by investigating the following probability.

\displaystyle  P \left( \left| \frac{S_n}{n} - p \right| \ge \epsilon \right)

Using Chebyshev’s inequality, which we saw here, we can provide an upper bound

\displaystyle  P \left( \left| \frac{S_n}{n} - p \right| \ge \epsilon \right) \\ \le \frac{1}{\epsilon^2} E\left( \frac{S_n}{n} - p \right)^2 \\ = \frac{VS_n}{n^2 \epsilon^2} \\ = \frac{npq}{n^2 \epsilon^2} \\ \le \frac{1}{4 n \epsilon^2}

Hence, we see that as n \rightarrow \infty the probability that the fraction deviates from p by more than \epsilon tends to zero (here q = 1-p, and pq \le \frac{1}{4}). This says that p indeed has a realistic interpretation as the limiting fraction of heads observed. I have used the following fact in the above derivation.

\displaystyle  E\frac{S_n}{n} \\ = E \left( \frac{\omega_1 + \dots + \omega_n}{n} \right) \\ = E\frac{\omega_1}{n} + \dots + E\frac{\omega_n}{n} \\ = p
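
To see the law in action numerically, here is a small simulation sketch in Haskell (my own, not from the post; it assumes the random package, and fractionHeads and the fixed seed are made up for illustration):

import System.Random (mkStdGen, randoms)

-- Fraction of heads in n simulated tosses of a p-coin.
fractionHeads :: Double -> Int -> Double
fractionHeads p n = fromIntegral heads / fromIntegral n
  where us    = take n (randoms (mkStdGen 42) :: [Double]) -- uniform draws in [0,1)
        heads = length (filter (< p) us)                   -- each draw below p is a head

Evaluating map (fractionHeads 0.3) [100, 10000, 1000000] should produce fractions settling around 0.3, in line with the bound above.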


Writing Probability as an Expectation (10/365)

Expectation E(\cdot) is such a ubiquitous construct in probability that it’s worth looking at the various situations in which it appears. It’s one of those constructs that was cheap to invent but pays off time and again.

Consider a random variable \theta and ask the following question.

\displaystyle  P(\theta \ge \epsilon) = ?

We can already write this as an expectation

\displaystyle  P(\theta \ge \epsilon) = EI(\theta \ge \epsilon)

where I(\theta \ge \epsilon) is the indicator function. If \epsilon > 0 we can make the following substitution

\displaystyle  EI(\theta \ge \epsilon) \le E\frac{\theta}{\epsilon} I(\theta \ge \epsilon)

After dividing \theta by \epsilon, the random variable takes on values \ge 1 whenever \theta \ge \epsilon, and thus we end up with the above inequality. Further, if \theta \ge 0 is a non-negative random variable we can continue

\displaystyle  P(\theta \ge \epsilon) \\ = EI(\theta \ge \epsilon) \\ \le E\frac{\theta}{\epsilon} I(\theta \ge \epsilon) \text{ if } \epsilon > 0\\ = \frac{1}{\epsilon} E\theta - \frac{1}{\epsilon} E\theta I(\theta < \epsilon) \\ \le \frac{1}{\epsilon} E\theta \text{ if } \theta \ge 0

This gives us one version of Chebyshev’s inequality

\displaystyle  P(\theta \ge \epsilon) \le E\frac{\theta}{\epsilon} \text{ when } \theta \ge 0 \text{ and } \epsilon > 0

Example

As an example, I have computed the Chebyshev bounds on the probabilities of a die showing a value of at least x, for two different dice.
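
The session below loads Stats.hs, which isn’t shown in the post. For following along, here is a plausible reconstruction of the two helpers it uses (my guess, consistent with the outputs below, not necessarily the actual file):

-- e dist xs: expectation of the values xs under the distribution dist
e :: [Rational] -> [Rational] -> Rational
e dist xs = sum (zipWith (*) dist xs)

-- cumul dist: running CDF of dist, starting at 0 and ending at 1
cumul :: [Rational] -> [Rational]
cumul = scanl (+) 0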

ghci> :l Stats.hs
ghci> let fair_die = replicate 6 (1/6 :: Rational)
ghci> let die2 = map (/21) [6,5..1 :: Rational]
ghci> let var = [1..6]
ghci> let error dist xs ys = e dist (zipWith (\x y -> (x-y)^2) xs ys)
ghci> 
ghci> print $ map (1-) (cumul fair_die)
  [1 % 1,5 % 6,2 % 3,1 % 2,1 % 3,1 % 6,0 % 1]

ghci> print $ map (e fair_die var /) var
  [7 % 2,7 % 4,7 % 6,7 % 8,7 % 10,7 % 12]

ghci> 
ghci> let error_fair = error fair_die (map (1-) (cumul fair_die)) $ map (e fair_die var /) var
ghci> let error_die2 = error die2 (map (1-) (cumul die2)) $ map (e die2 var /) var
ghci> error_fair > error_die2
  True

As you can see, the upper bound is pretty poor, but it gets closer as we move toward the tail. I’ll come back to this and a couple of other versions of Chebyshev’s inequality at a later time.
