Probability distribution


Chapters 2-3: Random Variables and Probability Distributions

  • 2.1 Random Variables \(X\) and \(Y\)
    • prob density function (PDF): \(f(x)\)
    • cumulative dist function(CDF): \(F(x)=\Pr(X\leq x)\)
    • survival function: \(G(x)=1-F(x)\)
    • hazard rate: \(h(x)=f(x)/G(x)\)
    • mean/expectation: \(E(X)=\int_x xf(x)dx\)
    • variance: \(Var(X)=E(X^2)-E(X)^2=E[(X-E(X))^2]\)
    • covariance: \(Cov(X,Y)=E[(X-E(X))(Y-E(Y))]=E(XY)-E(X)E(Y)\)
    • conditional dist: \(f_{X|Y}(x|y)=\frac{f(x,y)}{f_Y(y)}\)
    • variance decomp: \(Var(X)=E[Var(X|Y)] + Var[E(X|Y)]\)
    • variable transformation: for \(Y=g(X)\) with inverse \(X=h(Y)\), \(f_Y(y)=f_X(h(y))|h'(y)|\) (for vector \(X\), \(|h'(y)|\) is the absolute value of the determinant of the Jacobian matrix)
  • 2.2 Discrete Random Variables
    • Bernoulli
    • Binomial
      • sum of iid Bernoulli rvs (with same success prob \(p\)): \(X=\sum_{i=1}^nX_i\)
      • independent \(X_i|p_i\sim Bern(p_i)\), with the \(p_i\) iid and \(E(p_i)=p\) (potentially \(Var(p_i)>0\))
        • Easy to verify that marginally each \(X_i\) is Bern(\(p\)), so \(X\sim Binom(n,p)\).
        • Alternatively \(Var(X|p_i)=\sum_{i=1}^n p_i(1-p_i), E(X|p_i)=\sum_ip_i\)
        • \(E(X)=np, Var(X)=np(1-p)\). Note \(E(p_i^2)-Var(p_i)=E(p_i)^2=p^2\)
        • the binomial mean/var formula still holds
      • \(X|p\sim Binom(n,p)\), assume \(E(p)=p_0\)
        • \(E(X|p)=np, Var(X|p)=np(1-p)\). So \(E(X)=np_0\)
        • \(Var(X)=n^2Var(p) + nE(p) - nE(p^2)=np_0(1-p_0)+(n^2-n)Var(p)\)
        • So when \(Var(p)>0\) and \(n>1\), \(Var(X)>np_0(1-p_0)\) (over-dispersion could happen for Binom with random p)
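
A minimal simulation sketch of this over-dispersion (assuming, purely for illustration, \(p\sim Beta(a,b)\)):

## over-dispersion of Binom(n,p) with random p: sketch assuming p ~ Beta(a,b)
set.seed(1)
n = 20; a = 2; b = 6                          # E(p) = a/(a+b) = 0.25
B = 1e5
p = rbeta(B, a, b)                            # random success prob for each replicate
x = rbinom(B, n, p)                           # X | p ~ Binom(n, p)
p0 = a/(a+b); vp = a*b/((a+b)^2*(a+b+1))      # Beta mean and variance
c(empirical = var(x),
  formula   = n*p0*(1-p0) + (n^2-n)*vp,       # np0(1-p0) + (n^2-n)Var(p)
  fixed.p   = n*p0*(1-p0))                    # variance if p were fixed at p0
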
    • extension to Empirical DF (multinomial dist) based on \(n\) obs: \(\hat{F}_n\to F\)
      • its relation to Bootstrap: \(t(\hat{F}_n)\to t(F)\) for a statistic \(t()\)
      • Monte Carlo implementation of inference on \(t(\hat{F}_n)\)
        • MC samples from \(\hat{F}_n\) are just samples drawn with replacement from the observed data
        • compute statistic for each MC sample
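
A minimal bootstrap sketch in R (the data vector and the statistic, here the median, are only illustrative):

## bootstrap: MC sampling from F_n = resampling the data with replacement
set.seed(1)
x = rexp(50)                                        # observed sample (illustrative)
B = 2000
tb = replicate(B, median(sample(x, replace=TRUE)))  # statistic t() on each bootstrap sample
c(estimate = median(x), boot.se = sd(tb))           # bootstrap standard error
quantile(tb, c(0.025, 0.975))                       # simple percentile interval
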
    • Geometric: memoryless
      • \(G(n)=(1-p)^n, G(i+j)=G(i)G(j)\)
    • Poisson
      • over-dispersion in Poisson reg with heterogeneous \(\lambda\)
      • \(Var(X)=Var[E(X|\lambda)]+E[Var(X|\lambda)]=Var(\lambda)+E(\lambda)\)
      • \(E(X)=E[E(X|\lambda)]=E(\lambda)\)
      • \(Var(X)>E(X)\) when \(Var(\lambda)>0\)
      • Over-dispersion modeling: Quasi-likelihood; GEE; MM
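
A small simulation sketch of the over-dispersion (assuming \(\lambda\sim Gamma\) only for illustration):

## Poisson over-dispersion with heterogeneous lambda: sketch assuming lambda ~ Gamma
set.seed(1)
B = 1e5
lam = rgamma(B, shape=2, rate=1)    # E(lambda)=2, Var(lambda)=2
x = rpois(B, lam)                   # X | lambda ~ Poisson(lambda)
c(mean=mean(x), var=var(x))         # var ~ E(lambda)+Var(lambda) = 4 > mean ~ 2
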
    • Hypergeometric dist
      • \(m\) black balls and \(N-m\) white balls: select \(n\) balls, let X=# of black balls (sampling from a finite population)
      • \(\Pr(X=k)=(m,k)(N-m,n-k)/(N,n), k\leq\min(m,n)\)
      • var and mean calculation: sum of \(n\) dependent Bernoulli rvs
        • \(X=\sum_{i=1}^nX_i\): \(X_i=1\) if the \(i\)-th ball is black and zero otherwise
        • marginally \(\Pr(X_i=1)=m/N\) (the \(i\)-th ball is equally likely to be any of the \(N\) balls)
        • \(E(X_iX_j)=\Pr(X_i=1,X_j=1)=\Pr(X_i=1)\Pr(X_j=1|X_i=1)=\frac{m}{N}\frac{m-1}{N-1}\)
        • \(Cov(X_i,X_j)=\frac{m}{N}\frac{m-1}{N-1}-\frac{m^2}{N^2}\)
        • For large \(N\), \(Cov(X_i,X_j)\approx 0\), i.e., \(X_i\) can be roughly treated as iid Bern(\(m/N\)), and \(X\) as Binomial(\(n,m/N\)).
      • "Enrichment analysis": assess whether the sampling is not random but enriched, say, with more black balls (expecting, \(X/n>m/N\))
        • Hypergeometric dist can be used to compute enrichment p-value (exact test)
        • Alternatively 2x2 table independence test (say, Pearson or LRT chi-square tests) can be used for p-value calculation (approx test for large counts)
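
A small R sketch with made-up counts, comparing the exact hypergeometric p-value with the 2x2-table tests:

## enrichment p-value: exact hypergeometric vs 2x2-table tests (illustrative counts)
N = 1000; m = 100                  # urn: m black, N-m white
n = 50; x = 12                     # drew n, observed x black (x/n = 0.24 > m/N = 0.10)
phyper(x-1, m, N-m, n, lower.tail=FALSE)        # exact upper-tail p-value Pr(X >= x)
tab = matrix(c(x, n-x, m-x, N-m-(n-x)), 2, 2)   # rows: black/white, cols: drawn/not drawn
fisher.test(tab, alternative="greater")$p.value # same exact test via the 2x2 table
chisq.test(tab)$p.value                         # approximate (two-sided) chi-square test
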
  • 2.3 Continuous Random Variables
    • Uniform
    • Exponential: memoryless
      • \(G(x)=e^{-\lambda x}\)
      • double Exp: \(f(x)=\frac{\lambda}{2}e^{-\lambda|x|}\)
    • Gamma
      • \(f(x)=\frac{\lambda e^{-\lambda x}(\lambda x)^{\alpha-1}}{\Gamma(\alpha)}\)
      • \(\chi_n^2\) (chi-square dist with n-DF): \(\alpha=n/2,\lambda=1/2\)
    • Normal
      • \(f(x)=1/\sqrt{2\pi}e^{-x^2/2}\)
      • connection to penalized regression: a Normal prior on coefficients gives ridge; a double-Exp (Laplace) prior gives lasso
      • tuning parameters can be treated as variance parameters of random effects in LMM/GLMM
      • semi-parametric reg with penalized splines: ridge type penalty
      • In general, the Normal density tail decays much faster than the Exponential tail.
    • T-dist
      • \(f(x,n)=\frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\Gamma(n/2)}(1+x^2/n)^{-(n+1)/2}\) (n degree-of-freedom (DF))
      • ratio of independent normal and chi-square: \(N(0,1)/\sqrt{\chi_n^2/n}\)
    • F-dist
      • \(f(x,n,m)=\frac{\Gamma((n+m)/2)}{\Gamma(n/2)\Gamma(m/2)}(n/m)^{n/2}x^{n/2-1}(1+nx/m)^{-(n+m)/2}\)
      • ratio of independent chi-square rvs: \((\chi_n^2/n)/(\chi_m^2/m)\)
  • 2.5 Joint Distribution
    • joint CDF: \(F(x,y)=\Pr(X\leq x, Y\leq y)\)
    • joint PDF: \(p(x,y)=\Pr(X=x,Y=y)\) (discrete dist), \(\Pr(X\in A,Y\in B)=\int_A\int_Bf(x,y)dxdy\) (continuous dist)
      • marginal PDF: \(p_X(x)=\sum_yp(x,y), f_X(x)=\int_yf(x,y)dy\)
    • Independent Random Variables
      • CDF: \(F(x,y)=F_X(x)F_Y(y)\)
      • PDF: \(p(x,y)=p_X(x)p_Y(y), f(x,y)=f_X(x)f_Y(y)\)
      • given independent \(X\) and \(Y\): \(E[g(X)h(Y)]=E[g(X)]E[h(Y)],\forall g,h\)
    • Covariance and Variance of Sums of Random Variables
      • Ex 2.35 (p 51): hypergeometric dist
        • \(\Pr(X=k)=(m,k)(N-m,n-k)/(N,n)\)
        • \(m\) black balls and \(N-m\) white balls: select \(n\) balls, # of black balls (sampling from a finite population)
        • \(X=\sum_{i=1}^nX_i\): \(X_i=1\) if the \(i\)-th ball is black and zero otherwise
        • marginally \(\Pr(X_i=1)=m/N\) (the \(i\)-th ball is equally likely to be any of the \(N\) balls)
      • Ex 2.37 (p 53): sum of independent Poisson is still Poisson
        • \(X\sim Poisson(\lambda_1), Y\sim Poisson(\lambda_2)\)
        • \(\Pr(X+Y=n)=\sum_{k=0}^n\Pr(X=k)\Pr(Y=n-k)=\frac{e^{-(\lambda_1+\lambda_2)}}{n!}\sum_{k=0}^n(n,k)\lambda_1^k\lambda_2^{n-k}=e^{-(\lambda_1+\lambda_2)}\frac{(\lambda_1+\lambda_2)^n}{n!}\), i.e., \(X+Y\sim Poisson(\lambda_1+\lambda_2)\)
      • Ex 2.38 (p 54): order statistics
        • \(X_i\) are iid rvs with dist \(F(x), f(x)\) (i=1,…,n)
        • \(X_{(i)}\) denote the \(i\)-th smallest among \(X_i\)'s.
        • \(\Pr(X_{(i)}\leq x)=\sum_{k=i}^n(n,k)[F(x)]^k[1-F(x)]^{n-k}\) (at least i of them are smaller than x)
        • density (differentiate the CDF): \(f_{X_{(i)}}(x)=(n; i-1,1,n-i)f(x)[F(x)]^{i-1}[1-F(x)]^{n-i}\)
          • multinomial coefficient \((n; i-1,1,n-i)=\frac{n!}{(n-i)!(i-1)!}\)
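
A quick simulation check of the order-statistic density; for Uniform(0,1) rvs the formula reduces to the Beta(\(i,n-i+1\)) density:

## order statistics: i-th smallest of n iid U(0,1) vs the density formula (= Beta(i, n-i+1))
set.seed(1)
n = 10; i = 3; B = 1e4
xi = replicate(B, sort(runif(n))[i])
hist(xi, breaks=50, freq=FALSE, main="3rd smallest of 10 U(0,1)")
curve(dbeta(x, i, n-i+1), add=TRUE, col=2, lwd=2)
## same curve from the general formula with f(x)=1, F(x)=x
curve(choose(n,i)*i * x^(i-1) * (1-x)^(n-i), add=TRUE, col=4, lty=2)
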
    • Variable transformation: dist of functions of rvs
      • random vector \(X\) with PDF \(f(x)\). Let \(Y=g(X)\), and \(X=h(Y)\) (\(h=g^{-1}\)).
        • Jacobian: \(\nabla_yX=\partial h(y)/\partial y\)
        • \(f_{Y}(y)=f_X(h(y))|\det(\nabla_yX)|\) (P 56)
      • Ex: given \(X\sim N(0,1)\). Let \(Y=X^2\)
        • \(X=\pm\sqrt{Y}\); on each branch \(\nabla_yX=\frac{1}{2\sqrt{y}}\)
        • \(f_Y(y)=2\cdot\frac{1}{\sqrt{2\pi}}e^{-y/2}\cdot\frac{1}{2\sqrt{y}}=\frac{1}{\sqrt{2\pi}}y^{-1/2}e^{-y/2}\) (the \(\chi_1^2\) dist; the factor 2 accounts for the two branches \(\pm\sqrt{y}\))
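
A quick numeric check that the derived density matches the \(\chi_1^2\) density:

## check: density of Y = X^2, X ~ N(0,1), equals dchisq(y, 1)
y = seq(0.05, 5, by=0.05)
fy = exp(-y/2)/sqrt(2*pi*y)        # derived density (2*pi)^(-1/2) y^(-1/2) e^(-y/2)
max(abs(fy - dchisq(y, df=1)))     # ~ 0
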
      • Ex 2.39 (p 56): independent gamma to Beta
        • \(X\sim Gamma(\alpha,\lambda), Y\sim Gamma(\beta,\lambda)\). Study dist of \(X/(X+Y)\)
        • consider \(U=X+Y,V=X/U\). So \(X=UV,Y=U(1-V)\)
          • Jacobian \(|\partial(x,y)/\partial(u,v)|=u\); the joint density factorizes, so \(U\perp V\) with \(U\sim Gamma(\alpha+\beta,\lambda)\) and \(V\sim Beta(\alpha,\beta)\)
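
A short simulation check of the Beta result (the shape parameters are illustrative):

## check: X/(X+Y) ~ Beta(alpha, beta) for independent Gammas with common rate
set.seed(1)
a = 2; b = 5; B = 1e4
x = rgamma(B, shape=a, rate=1); y = rgamma(B, shape=b, rate=1)
v = x/(x+y)
hist(v, breaks=50, freq=FALSE, main="X/(X+Y)")
curve(dbeta(x, a, b), add=TRUE, col=2, lwd=2)
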

  • 3.1-3.3 Conditional distributions
    • discrete dist: \(p_{X|Y}(x|y) = \frac{p(x,y)}{p_Y(y)}\), where \(p_Y(y)=\Pr(Y=y), p(x,y)=\Pr(X=x,Y=y)\) (cond prob mass function)
    • continuous dist: \(f_{X|Y}(x|y)=\frac{f(x,y)}{f_Y(y)}\) (cond PDF)
      • \(f(x,y)\) is the joint PDF of \(X,Y\) and \(f_Y(y)=\int_xf(x,y)dx\) is the marginal PDF of \(Y\)
  • 3.2 Discrete dist
    • Ex 3.2 (p 95), independent Binoms to hypergeometric
      • \(X_1\sim\) Binom(\(n_1\),p) \(\perp X_2\sim\) Binom(\(n_2,p)\): \([X_1|X_1+X_2=m]\sim\) hypergeometric dist
      • \(X_1+X_2\sim Binom(n_1+n_2,p)\) (using iid Bernoulli decomposition)
      • \(\Pr(X_1=k|X_1+X_2=m)=\frac{\Pr(X_1=k,X_2=m-k)}{\Pr(X_1+X_2=m)}=(n_1,k)(n_2,m-k)/(n_1+n_2,m)\)
      • mean and var: \(mn_1/(n_1+n_2)\), \(mn_1n_2(n_1+n_2-m)/(n_1+n_2)^2/(n_1+n_2-1)\)
        • derivation of mean: by symmetry of the \(n_1+n_2\) iid Bernoulli trials, given \(m\) total successes each trial is a success with prob \(m/(n_1+n_2)\), so \(E[X_1|X_1+X_2=m]=n_1m/(n_1+n_2)\)
    • Ex 3.3 (P 96), independent Poissons to Binom
      • \(X_1\sim\) Poisson(\(\lambda_1)\perp X_2\sim\) Poisson(\(\lambda_2)\): \([X_1|X_1+X_2=m]\sim\) Binom(m,\(\lambda_1/(\lambda_1+\lambda_2)\))
      • \(X_1+X_2\sim Poisson(\lambda_1+\lambda_2)\)
      • \(\Pr(X_1=k|X_1+X_2=m)=\frac{\Pr(X_1=k)\Pr(X_2=m-k)}{\Pr(X_1+X_2=m)}=(m,k)\left(\frac{\lambda_1}{\lambda_1+\lambda_2}\right)^k\left(\frac{\lambda_2}{\lambda_1+\lambda_2}\right)^{m-k}\)
  • 3.3 Continuous dist
    • Ex 3.6 (P 98), t-dist from convoluting normal and chi-square dist
      • \(T=Z/\sqrt{Y/n}\), where \(Z\sim N(0,1), Y\sim \chi^2_n\)
      • \(f_T(t)=\int f_{T,Y}(t,y)dy=\int f_Y(y)f_{T|Y}(t|y)dy\)
      • \(f_Z(z)=\frac{1}{\sqrt{2\pi}}\exp(-z^2/2)\), \(f_Y(y)=\exp(-y/2)y^{n/2-1}/\Gamma(n/2)/2^{n/2}\)
      • \(f_T(t)=\int_0^\infty f_Y(y)\sqrt{y/n}\,f_Z(t\sqrt{y/n})\,dy=\frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\Gamma(n/2)}(1+t^2/n)^{-(n+1)/2}\) (the t density given in 2.3)
  • 3.4 Computing Expectations by Conditioning
    • \(E[X]=E[E[X|Y]]\)
      • the inner expectation can be treated as a function of rv \(Y\)
      • \(E[X|Y=y]=\sum_x x\Pr(X=x|Y=y)=\sum_x x\frac{\Pr(X=x,Y=y)}{\Pr(Y=y)}\) or \(\int_x xf_{X|Y=y}(x)dx=\int_x x\frac{f(x,y)}{f_Y(y)}dx\)
    • Proposition 3.1 (P 112) \(Var(X) = E[Var(X|Y)] + Var[ E[X|Y]]\)
      • the conditional expectation \(E[X|Y]\) always has variance no larger than the marginal variance: \(Var[E(X|Y)]\leq Var(X)\)
      • \(E[Var(X|Y)]=E[E(X^2|Y)] - E[E(X|Y)^2]=E(X^2)-E[E(X|Y)^2]\)
      • \(Var[E(X|Y)]=E[E(X|Y)^2]-E[E(X|Y)]^2=E[E(X|Y)^2]-E(X)^2\)
    • Ex 3.14 (P 104) covariance of multinomial dist
      • \(n\) independent trials with \(r\) outcomes, \((N_1,\ldots,N_r)\sim Multinom(n;p_1,\ldots,p_r)\)
      • \(N_i\sim Binom(n,p_i)\)
      • \(\Pr(j|not\, i)=p_j/(1-p_i)\), so \([N_j|N_i=k]\sim Binom(n-k,p_j/(1-p_i))\)
        • \(Cov(N_i,N_j)=-np_ip_j\)
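
A quick simulation check of the covariance formula (the values of \(n\) and \(p\) are illustrative):

## check: Cov(N_i, N_j) = -n p_i p_j for multinomial counts
set.seed(1)
n = 30; p = c(0.2, 0.3, 0.5); B = 1e5
cnt = rmultinom(B, size=n, prob=p)        # r x B matrix of counts
c(empirical = cov(cnt[1,], cnt[2,]), formula = -n*p[1]*p[2])
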
    • Ex 3.15 (P 107) number of trials \(N_k\) needed until \(k\) consecutive successes in iid Bernoulli trials with success prob \(p\).
      • \(N_1\) is geometric dist with mean \(1/p\)
      • \(N_k=N_{k-1}+A_{k-1,k}\) with \(A_{k-1,k}\) number of additional trials needed
      • Let \(I_k\) be the outcome (0 or 1) of the next trial after the first \(k-1\) consecutive successes
        • \(E[A_{k-1,k}|I_k=1]=1\), \(E[A_{k-1,k}|I_k=0]=1+E[N_k]\) (after a failure the count restarts)
        • so \(E[N_k]=E[N_{k-1}]+p+(1-p)(1+E[N_k])\), i.e., \(E[N_k]=(1+E[N_{k-1}])/p\)
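
A small sketch comparing the recursion for \(E[N_k]\) with simulation (the values of \(p\) and \(k\) are illustrative):

## E[trials until k consecutive successes]: recursion E[N_k] = (1 + E[N_{k-1}])/p vs simulation
set.seed(1)
p = 0.6; k = 3
en = 0; for(j in 1:k) en = (1 + en)/p     # recursion with E[N_0] = 0
sim1 = function(){
  run = 0; trials = 0
  while(run < k){
    trials = trials + 1
    run = if(runif(1) < p) run + 1 else 0 # a failure resets the run
  }
  trials
}
c(recursion = en, simulation = mean(replicate(2e4, sim1())))
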
  • 3.5 Computing Probabilities by Conditioning
    • Example 3.23 (P 116)
      • when each of a Poisson(\(\lambda\)) number of events is independently classified as type \(k\) with probability \(p_k\), the numbers of type-\(k\) events are independent Poisson random variables with means \(\lambda p_k\).
    • Example 3.24 (P 118): sum of independent Bernoulli rvs
      • dist of \(X=\sum_{i=1}^nX_i\), where \(X_i\sim Bern(p_i)\) are independent
      • consider \(P_k(j)=\Pr(\sum_{i=1}^kX_i=j)\) and condition on \(X_k\): \(P_k(j)=p_kP_{k-1}(j-1)+(1-p_k)P_{k-1}(j)\) (implemented in the R code below)
    • Example 3.29 (P 125): Ignatov’s theorem
      • \(X_i\) iid \(F(x),f(x)\)
      • \(N_k=\min\{n\geq k: X_n=\mbox{k-th largest of }X_1,\ldots,X_n\}\)
      • \(X_{N_k}\sim F\)
      • Consider \(N=N_2\)
        • \(A_i=\{X_i\neq \mbox{2nd largest of }X_1,\ldots,X_i\}\), \(\Pr(A_i)=(i-1)/i\)
        • \(\Pr(N=n)=\Pr(A_2A_3\ldots A_{n-1}A_n^c)=1/(n(n-1))\)
        • \(f_{X_N}(x)=\sum_{n\geq 2}\frac{1}{n(n-1)}f_{X_N|N}(x|n)\)
        • \(f_{X_N|N}(x|n)\): density of 2nd largest of \(n\) iid rvs (order statistics)
        • so \(f_{X_N}(x)=f(x)\)
## sum of independent Bernoulli rvs (Ex 3.24): exact recursion vs approximations
n = 100; p0 = 0.5; pr = 1:n/n*p0   # heterogeneous success probs p_i = (i/n)*p0

## MC: simulate X = sum_i 1{U_i < p_i} directly
B = 1e5; x = matrix(runif(B*n), n,B); x1 = colSums(x<pr)
dx1 = diff(c(0, ecdf(x1)(0:n)))    # empirical pmf of X on 0:n
## cond 2: iterative recursion P_k(j) = p_k*P_{k-1}(j-1) + (1-p_k)*P_{k-1}(j)
pkj = function(k, j, pr, pk1){     # pk1[j+1] holds P_{k-1}(j)
  if(k <= 1){
    res = j*pr[1] + (1-j)*(1-pr[1])
  } else{
    if((j > 0) & (j < k)){
      res = pr[k]*pk1[j] + (1-pr[k])*pk1[j+1]
    } else{
      if(j == 0) res = prod(1-pr[1:k])
      if(j == k) res = prod(pr[1:k])
    }
  }
  res
}

pk1 = rep(0, n+1); pk1[1] = 1-pr[1]; pk1[2] = pr[1]   # P_1(0), P_1(1)
res = pk1
for(k in 2:n){
  for(j in 0:k){
    res[j+1] = pkj(k, j, pr, pk1)
  }
  pk1 = res                        # roll forward to the next k
}

### Poisson approximation: X is approximately Poisson(sum(pr))
px1 = dpois(0:n, sum(pr))

### CLT: normal approx with continuity correction (nx1) and plain density (nx2)
tmp = pnorm(0:n+0.5, sum(pr), sqrt(sum(pr*(1-pr))))
nx1 = c(tmp[1], diff(tmp[-n-1]), 1-tmp[n])
nx2 = dnorm(0:n, sum(pr), sqrt(sum(pr*(1-pr))))

## compare the exact recursion (bars) with the MC, Poisson, and normal approximations
plot(0:n, res, type='h', lwd=2, ylab='density')
points(0:n, dx1, pch=1, col=2)   # MC
points(0:n, px1, pch=2, col=3)   # Poisson approx
points(0:n, nx1, pch=3, col=4)   # normal approx (continuity-corrected)
## curve(dnorm(x, sum(pr), sqrt(sum(pr*(1-pr)))), add=TRUE, col=4)

## correlations of the exact pmf (res) with the MC, Poisson, normal-CDF, and normal-density approximations
c(cor(res, dx1), cor(res, px1), cor(res, nx1), cor(res, nx2))
## n=100, p0=0.25: 0.9999203 0.9975972 0.9987090 0.9987035
## n=100, p0=0.05: 0.9999932 0.9998898 0.9890155 0.9899978
## n=100, p0=0.50: 0.9999095 0.9885093 0.9997475 0.9997452


## cond 1: direct recursion without memoization -- cost grows exponentially in k,
## so it is only practical for small n (kept for reference; use the iterative version above)
pkj = function(k, j, pr){
  if(k <= 1){
    res = j*pr[1] + (1-j)*(1-pr[1])
  } else{
    if((j > 0) & (j < k)){
      res = pr[k]*pkj(k-1, j-1, pr) + (1-pr[k])*pkj(k-1, j, pr)
    } else{
      if(j == 0) res = prod(1-pr[1:k])
      if(j == k) res = prod(pr[1:k])
    }
  }
  res
}
cx1 = rep(0, n+1)
for(i in 0:n) cx1[i+1] = pkj(n, i, pr)

  • Conjugate dists (3.6.3, P 141)
    • \([X|p]\sim Binom(n,p), p\sim Beta(\alpha,\beta)\)
      • marginal dist \(\Pr(X=k)=(n;k)/B(\alpha,\beta)\int_p p^{k+\alpha-1}(1-p)^{n-k+\beta-1}dp=(n;k)\frac{B(k+\alpha,n-k+\beta)}{B(\alpha,\beta)}\)
        • \(E(X),Var(X)\): direct evaluation …
        • \(E(X)=E[E(X|p)]=E[np]=n\alpha/(\alpha+\beta)\)
        • \(Var(X)=Var[E(X|p)]+E[Var(X|p)]=Var(np)+E[np(1-p)]=(n^2-n)Var(p)+nE(p)[1-E(p)]=\cdots\)
      • conditional dist \([p|X]\propto \Pr(p)\Pr(X|p)\propto p^{X+\alpha-1}(1-p)^{n-X+\beta-1}\)
        • i.e., \([p|X]\sim Beta(X+\alpha,n-X+\beta)\)
      • for \(\alpha=\beta=1\), \(p\) is uniform rv. and \([p|X]\sim Beta(X+1,n-X+1)\)
      • if \([Y|p]\sim Binom(n_1,p)\), and \([X\perp Y|p]\)
        • \(\Pr(Y=k|X)=\int_p\Pr(Y=k|p)\Pr(p|X)dp=(n_1;k)\frac{B(k+X+\alpha,n_1-k+n-X+\beta)}{B(X+\alpha,n-X+\beta)}\)
        • \(E(Y|X)=E[E[Y|p,X]]=E[E[Y|p]|X]=E[n_1p|X]=n_1\frac{X+\alpha}{n+\alpha+\beta}\)
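
A short R sketch of the Beta-Binomial relations above (prior parameters and counts are illustrative):

## Beta-Binomial conjugacy: marginal pmf, posterior, and predictive mean
a = 1; b = 1; n = 20; x = 14               # uniform prior; observed x successes in n trials
k = 0:n
pmf = choose(n, k) * beta(k + a, n - k + b) / beta(a, b)   # marginal Pr(X = k)
sum(pmf)                                                    # = 1
c(post.mean = (x + a)/(n + a + b))                          # E(p | X) under Beta(x+a, n-x+b)
curve(dbeta(p, x + a, n - x + b), 0, 1, xname = "p", ylab = "posterior density of p")
n1 = 10
c(pred.mean = n1*(x + a)/(n + a + b))                       # E(Y | X) for Y ~ Binom(n1, p)
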

  • 2.6 Moment Generating Functions
    • \(\phi(t)=E[e^{tX}]\) for r.v. \(X\)
      • \(\phi'(t)=E[\nabla_te^{tX}]=E[Xe^{tX}]\). So \(\phi'(0)=E[X]\)
      • In general, for \(n\)-th derivative, \(\phi^{(n)}(t)=E[X^ne^{tX}]\) and \(\phi^{(n)}(0)=E[X^n]\)
    • Common dists
      • Binom(n,p): \(\phi(t)=\sum_k e^{tk}(n,k)p^k(1-p)^{n-k}=(pe^t+1-p)^n\)
        • so \(\phi'(t)=n(pe^t+1-p)^{n-1}pe^t\), and hence \(E[X]=\phi'(0)=np\)
        • similarly \(E[X^2]=\phi''(0)=n(n-1)p^2+np\), and \(Var(X)=np(1-p)\)
      • Poisson(\(\lambda\)): \(\phi(t)=\sum_n e^{tn}e^{-\lambda}\lambda^n/n!=e^{-\lambda}e^{\lambda e^t}\)
        • \(\phi'(t)=\lambda e^te^{\lambda(e^t-1)}\), and \(E[X]=\phi'(0)=\lambda\)
        • \(E[X^2]=\phi''(0)=\lambda^2+\lambda\), \(Var(X)=\lambda\)
      • Exp(\(\lambda\)): \(\phi(t)=\int e^{tx}\lambda e^{-\lambda x}dx=\lambda/(\lambda-t), t<\lambda\)
        • \(\phi'(t)=\lambda/(\lambda-t)^2, \phi''(t)=2\lambda/(\lambda-t)^3\)
        • \(E[X]=1/\lambda, E[X^2]=2/\lambda^2, Var[X]=1/\lambda^2\)
      • Standard normal \(N(0,1)\): \(\phi(t)=\int e^{tx-x^2/2}/\sqrt{2\pi}dx=e^{t^2/2}\int e^{-(x-t)^2/2}/\sqrt{2\pi}dx=e^{t^2/2}\)
        • For \(X\sim N(\mu,\sigma^2)\), write \(X=\mu+\sigma Z\) with \(Z\sim N(0,1)\): \(\phi(t)=E[e^{t\mu+t\sigma Z}]=e^{t\mu}e^{(t\sigma)^2/2}=e^{t\mu+t^2\sigma^2/2}\)
        • so \(E[X]=\mu,Var[X]=\sigma^2\)
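
A tiny numeric check of the derivative identities, using the Exp(\(\lambda\)) MGF as an example:

## MGF derivatives at 0 recover moments: Exp(lambda) example
lam = 2
phi = function(t) lam/(lam - t)                # MGF, valid for t < lambda
h = 1e-4
c((phi(h) - phi(-h))/(2*h), 1/lam)             # phi'(0)  ~ E[X]   = 1/lambda
c((phi(h) - 2*phi(0) + phi(-h))/h^2, 2/lam^2)  # phi''(0) ~ E[X^2] = 2/lambda^2
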
    • Ex 2.44-46: sum of independent rvs.
      • sum of independent \(Binom(n_1,p), Binom(n_2,p)\)
        • \(\phi(t)=(pe^t+1-p)^{n_1}(pe^t+1-p)^{n_2}\), i.e., \(Binom(n_1+n_2,p)\)
      • sum of independent \(Poisson(\lambda_1), Poisson(\lambda_2)\)
        • \(\phi(t)=e^{-\lambda_1}e^{\lambda_1 e^t}e^{-\lambda_2}e^{\lambda_2 e^t}\), i.e., \(Poisson(\lambda_1+\lambda_2)\)
      • sum of independent \(N(\mu_1,\sigma_1^2), N(\mu_2,\sigma_2^2)\)
        • \(\phi(t)=e^{t\mu_1+t^2\sigma_1^2/2}e^{t\mu_2+t^2\sigma_2^2/2}\), i.e., \(N(\mu_1+\mu_2,\sigma_1^2+\sigma_2^2)\)
    • Poisson approx to Binomial
      • For \(X\sim Binom(n,p)\) with small \(p\), let \(\lambda=np\),
        • \(\Pr(X=k)=\frac{n!}{k!(n-k)!\,n^k}\lambda^k(1-\lambda/n)^{n-k}\)
        • large \(n\) small \(p\): \(n!/(n-k)!/n^k\approx 1\), \((1-\lambda/n)^{n-k}\approx e^{-\lambda}\)
        • so \(\Pr(X=k)\approx \lambda^k/k!e^{-\lambda}\), i.e., \(X\approx Poisson(\lambda)\)
      • independent \(X_i\sim Bern(p_i)\), let \(X=\sum_{i=1}^nX_i\)
        • \(E[e^{tX_i}]=1+p_i(e^t-1)\approx e^{p_i(e^t-1)}\) for \(p_i\) small
        • so \(E[e^{tX}]\approx e^{\sum_ip_i(e^t-1)}\), i.e., \(X\approx Poisson(\sum_ip_i)\)
    • Joint MGF: multivariate normal dist \(X=(X_1,\cdots,X_n)\)
      • \(E[e^{\sum_it_iX_i}]=e^{\sum_it_iE(X_i)+\sum_i\sum_jt_it_jCov(X_i,X_j)/2}\)
      • \(\sum_it_iX_i\sim N(\sum_it_i\mu_i, T'Cov(X)T)\), where \(T=(t_1,\cdots,t_n)'\)
  • 2.8 Limit Theorems
    • Markov's Inequality
      • \(\Pr[X\geq a] \leq E[X]/a\) if rv \(X\geq 0\)
      • \(E[X]=\int_0^a xf(x)dx + \int_a^{\infty}xf(x)dx\geq \int_{x\geq a} af(x)dx=a\Pr[X\geq a]\)
    • Chebyshev's Inequality
      • \(\Pr[|X-E(X)|\geq k]\leq Var(X)/k^2, k>0\)
      • apply Markov's inequality to rv \((X-E(X))^2\)
    • Given iid \(X_i\) with mean \(\mu\) and finite variance, let \(\bar{X}_n=\sum_{i=1}^nX_i/n\)
      • weak law: \(\lim_{n\to\infty}\Pr(|\bar{X}_n-\mu|>\epsilon)=0, \forall\epsilon>0\) (convergence in probability)
        • proof by Chebyshev's inequality, \(Var(\bar{X}_n)\to 0\)
        • for large \(n\), \(\bar{X}_n\) is close to \(\mu\), but the weak law alone still allows \(|\bar{X}_n-\mu|>\epsilon\) to occur for infinitely many \(n\) (with probabilities tending to zero)
      • Strong Law of Large Numbers: with prob 1, \(\bar{X}_n\to \mu\) as \(n\to\infty\)
        • \(\Pr(\lim_{n\to\infty}\bar{X}_n=\mu)=1\)
        • i.e., with prob 1: \(\forall\epsilon>0,\exists N\) s.t. \(|\bar{X}_n-\mu|<\epsilon\ \forall n>N\)
    • Central Limit Theorem
      • given iid \(X_i\) with mean \(\mu\) and variance \(\sigma^2\)
      • let \(Z_n=\sum_{i=1}^n(X_i-\mu)/\sigma/\sqrt{n}\)
      • \(\Pr(Z_n\leq a)\approx \Phi(a)\), where \(\Phi()\) is the CDF of \(N(0,1)\)
      • For Binom(n,p), sum of \(n\) iid Bern(p), \((X-np)/\sqrt{np(1-p)}\approx N(0,1)\)
        • so \(\Pr(X=k)\approx \Phi[(k+0.5-np)/\sqrt{np(1-p)}]-\Phi[(k-0.5-np)/\sqrt{np(1-p)}]\)
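
A quick check of the continuity-corrected normal approximation to the Binomial pmf (n and p are illustrative):

## normal approximation to Binom(n,p) with continuity correction
n = 40; p = 0.3; k = 0:n
exact  = dbinom(k, n, p)
approx = pnorm(k+0.5, n*p, sqrt(n*p*(1-p))) - pnorm(k-0.5, n*p, sqrt(n*p*(1-p)))
max(abs(exact - approx))
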
  • 2.9 Stochastic Processes
    • A stochastic process \(\{X(t),t\in T\}\) is a collection of random variables.
      • For each \(t\in T\), \(X(t)\) is a random variable.
      • The index \(t\) is often interpreted as time.
      • we refer to \(X(t)\) as the state of the process at time \(t\).
      • \(T\) is called the index set of the process.
        • When \(T\) is a countable set, the stochastic process is said to be a discrete-time process.
        • If \(T\) is an interval of the real line, the stochastic process is said to be a continuous-time process.
      • The state space of a stochastic process is defined as the set of all possible values that the random variables \(X(t)\) can assume.