Probability distribution


Chapters 2-3: Random Variables and Probability Distributions

  • 2.1 Random Variables \(X\) and \(Y\)
    • prob density function (PDF): \(f(x)\)
    • cumulative dist function(CDF): \(F(x)=\Pr(X\leq x)\)
    • survival function: \(G(x)=1-F(x)\)
    • hazard rate: \(h(x)=f(x)/G(x)\)
    • mean/expectation: \(E(X)=\int_x xf(x)dx\)
    • variance: \(Var(X)=E(X^2)-E(X)^2=E[(X-E(X))^2]\)
    • covariance: \(Cov(X,Y)=E[(X-E(X))(Y-E(Y))]=E(XY)-E(X)E(Y)\)
    • conditional dist: \(f_{X|Y}(x|y)=\frac{f(x,y)}{f_Y(y)}\)
    • variance decomp: \(Var(X)=E[Var(X|Y)] + Var[E(X|Y)]\)
    • variable transformation: for \(Y=g(X)\) with inverse \(X=h(Y)\), \(f_Y(y)=f_X(h(y))|h'(y)|\) (for vector \(X\), \(|h'(y)|\) is the absolute value of the determinant of the Jacobian matrix)
  • 2.2 Discrete Random Variables
    • Bernoulli
    • Binomial
      • sum of iid Bernoulli rvs (with same success prob \(p\)): \(X=\sum_{i=1}^nX_i\)
      • independent \(X_i|p_i\sim Bern(p_i)\), with the \(p_i\) iid and \(E(p_i)=p\) (potentially \(Var(p_i)>0\))
        • Easy to verify that marginally each \(X_i\) is Bern(\(p\)), so \(X\sim Binom(n,p)\).
        • Alternatively \(Var(X|p_i)=\sum_{i=1}^n p_i(1-p_i), E(X|p_i)=\sum_ip_i\)
        • \(E(X)=np, Var(X)=np(1-p)\). Note \(E(p_i^2)-Var(p_i)=E(p_i)^2=p^2\)
        • the binomial mean/var formula still holds
      • \(X|p\sim Binom(n,p)\), assume \(E(p)=p_0\)
        • \(E(X|p)=np, Var(X|p)=np(1-p)\). So \(E(X)=np_0\)
        • \(Var(X)=n^2Var(p) + nE(p) - nE(p^2)=np_0(1-p_0)+(n^2-n)Var(p)\)
        • So when \(Var(p)>0\) and \(n>1\), \(Var(X)>np_0(1-p_0)\) (over-dispersion could happen for Binom with random p)
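
A minimal simulation sketch of this over-dispersion (assuming, purely for illustration, \(p\sim Beta(a,b)\)):

## over-dispersion of Binom(n,p) with random p: sketch assuming p ~ Beta(a,b)
set.seed(1)
n = 20; a = 2; b = 6                          # E(p) = a/(a+b) = 0.25
B = 1e5
p = rbeta(B, a, b)                            # random success prob for each replicate
x = rbinom(B, n, p)                           # X | p ~ Binom(n, p)
p0 = a/(a+b); vp = a*b/((a+b)^2*(a+b+1))      # Beta mean and variance
c(empirical = var(x),
  formula   = n*p0*(1-p0) + (n^2-n)*vp,       # np0(1-p0) + (n^2-n)Var(p)
  fixed.p   = n*p0*(1-p0))                    # variance if p were fixed at p0
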
    • extension to Empirical DF (multinomial dist) based on \(n\) obs: \(\hat{F}_n\to F\)
      • its relation to Bootstrap: \(t(\hat{F}_n)\to t(F)\) for a statistic \(t()\)
      • Monte Carlo implementation of inference on \(t(\hat{F}_n)\)
        • MC samples from \(\hat{F}_n\) are just samples drawn with replacement from the observed data
        • compute statistic for each MC sample
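
A minimal bootstrap sketch in R (the data vector and the statistic, here the median, are only illustrative):

## bootstrap: MC sampling from F_n = resampling the data with replacement
set.seed(1)
x = rexp(50)                                        # observed sample (illustrative)
B = 2000
tb = replicate(B, median(sample(x, replace=TRUE)))  # statistic t() on each bootstrap sample
c(estimate = median(x), boot.se = sd(tb))           # bootstrap standard error
quantile(tb, c(0.025, 0.975))                       # simple percentile interval
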
    • Geometric: memoryless
      • \(G(n)=(1-p)^n, G(i+j)=G(i)G(j)\)
    • Poisson
      • over-dispersion in Poisson reg with heterogeneous \(\lambda\)
      • \(Var(X)=Var[E(X|\lambda)]+E[Var(X|\lambda)]=Var(\lambda)+E(\lambda)\)
      • \(E(X)=E[E(X|\lambda)]=E(\lambda)\)
      • \(Var(X)>E(X)\) when \(Var(\lambda)>0\)
      • Over-dispersion modeling: Quasi-likelihood; GEE; MM
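
A small simulation sketch of the over-dispersion (assuming \(\lambda\sim Gamma\) only for illustration):

## Poisson over-dispersion with heterogeneous lambda: sketch assuming lambda ~ Gamma
set.seed(1)
B = 1e5
lam = rgamma(B, shape=2, rate=1)    # E(lambda)=2, Var(lambda)=2
x = rpois(B, lam)                   # X | lambda ~ Poisson(lambda)
c(mean=mean(x), var=var(x))         # var ~ E(lambda)+Var(lambda) = 4 > mean ~ 2
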
    • Hypergeometric dist
      • \(m\) black balls and \(N-m\) white balls: select \(n\) balls, let X=# of black balls (sampling from a finite population)
      • \(\Pr(X=k)=(m,k)(N-m,n-k)/(N,n), k\leq\min(m,n)\)
      • var and mean calculation: sum of \(n\) dependent Bernoulli rvs
        • \(X=\sum_{i=1}^nX_i\): \(X_i=1\) if the \(i\)-th ball is black and zero otherwise
        • marginally \(\Pr(X_i=1)=m/N\) (the \(i\)-th ball is equally likely to be any of the \(N\) balls)
        • \(E(X_iX_j)=\Pr(X_i=1,X_j=1)=\Pr(X_i=1)\Pr(X_j=1|X_i=1)=\frac{m}{N}\frac{m-1}{N-1}\)
        • \(Cov(X_i,X_j)=\frac{m}{N}\frac{m-1}{N-1}-\frac{m^2}{N^2}\)
        • For large \(N\), \(Cov(X_i,X_j)\approx 0\), i.e., \(X_i\) can be roughly treated as iid Bern(\(m/N\)), and \(X\) as Binomial(\(n,m/N\)).
      • "Enrichment analysis": assess whether the sampling is not random but enriched, say, with more black balls (expecting, \(X/n>m/N\))
        • Hypergeometric dist can be used to compute enrichment p-value (exact test)
        • Alternatively 2x2 table independence test (say, Pearson or LRT chi-square tests) can be used for p-value calculation (approx test for large counts)
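
A small R sketch with made-up counts, comparing the exact hypergeometric p-value with the 2x2-table tests:

## enrichment p-value: exact hypergeometric vs 2x2-table tests (illustrative counts)
N = 1000; m = 100                  # urn: m black, N-m white
n = 50; x = 12                     # drew n, observed x black (x/n = 0.24 > m/N = 0.10)
phyper(x-1, m, N-m, n, lower.tail=FALSE)        # exact upper-tail p-value Pr(X >= x)
tab = matrix(c(x, n-x, m-x, N-m-(n-x)), 2, 2)   # rows: black/white, cols: drawn/not drawn
fisher.test(tab, alternative="greater")$p.value # same exact test via the 2x2 table
chisq.test(tab)$p.value                         # approximate (two-sided) chi-square test
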
  • 2.3 Continuous Random Variables
    • Uniform
    • Exponential: memoryless
      • \(G(x)=e^{-\lambda x}\)
      • double Exp: \(f(x)=\frac{\lambda}{2}e^{-\lambda|x|}\)
    • Gamma
      • \(f(x)=\frac{\lambda e^{-\lambda x}(\lambda x)^{\alpha-1}}{\Gamma(\alpha)}\)
      • \(\chi_n^2\) (chi-square dist with n-DF): \(\alpha=n/2,\lambda=1/2\)
    • Normal
      • \(f(x)=1/\sqrt{2\pi}e^{-x^2/2}\)
      • connection to penalized regression: a Normal prior on coefficients gives ridge; a double-Exp (Laplace) prior gives lasso
      • tuning parameters can be treated as variance parameters of random effects in LMM/GLMM
      • semi-parametric reg with penalized splines: ridge type penalty
      • In general, the Normal density tail decays much faster than the Exponential tail.
    • T-dist
      • \(f(x,n)=\frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\Gamma(n/2)}(1+x^2/n)^{-(n+1)/2}\) (n degree-of-freedom (DF))
      • ratio of independent normal and chi-square: \(N(0,1)/\sqrt{\chi_n^2/n}\)
    • F-dist
      • \(f(x,n,m)=\frac{\Gamma((n+m)/2)}{\Gamma(n/2)\Gamma(m/2)}(n/m)^{n/2}x^{n/2-1}(1+nx/m)^{-(n+m)/2}\)
      • ratio of independent chi-square rvs: \((\chi_n^2/n)/(\chi_m^2/m)\)
  • 2.5 Joint Distribution
    • joint CDF: \(F(x,y)=\Pr(X\leq x, Y\leq y)\)
    • joint PDF: \(p(x,y)=\Pr(X=x,Y=y)\) (discrete dist), \(\Pr(X\in A,Y\in B)=\int_A\int_Bf(x,y)dxdy\) (continuous dist)
      • marginal PDF: \(p_X(x)=\sum_yp(x,y), f_X(x)=\int_yf(x,y)dy\)
    • Independent Random Variables
      • CDF: \(F(x,y)=F_X(x)F_Y(y)\)
      • PDF: \(p(x,y)=p_X(x)p_Y(y), f(x,y)=f_X(x)f_Y(y)\)
      • given independent \(X\) and \(Y\): \(E[g(X)h(Y)]=E[g(X)]E[h(Y)],\forall g,h\)
    • Covariance and Variance of Sums of Random Variables
      • Ex 2.35 (p 51): hypergeometric dist
        • \(\Pr(X=k)=(m,k)(N-m,n-k)/(N,n)\)
        • \(m\) black balls and \(N-m\) white balls: select \(n\) balls, # of black balls (sampling from a finite population)
        • \(X=\sum_{i=1}^nX_i\): \(X_i=1\) if the \(i\)-th ball is black and zero otherwise
        • marginally \(\Pr(X_i=1)=m/N\) (the \(i\)-th ball is equally likely to be any of the \(N\) balls)
      • Ex 2.37 (p 53): sum of independent Poisson is still Poisson
        • \(X\sim Poisson(\lambda_1), Y\sim Poisson(\lambda_2)\)
        • \(\Pr(X+Y=n)=\sum_{k=0}^n\Pr(X=k)\Pr(Y=n-k)=\frac{e^{-(\lambda_1+\lambda_2)}}{n!}\sum_{k=0}^n(n,k)\lambda_1^k\lambda_2^{n-k}=e^{-(\lambda_1+\lambda_2)}\frac{(\lambda_1+\lambda_2)^n}{n!}\), i.e., \(X+Y\sim Poisson(\lambda_1+\lambda_2)\)
      • Ex 2.38 (p 54): order statistics
        • \(X_i\) are iid rvs with dist \(F(x), f(x)\) (i=1,…,n)
        • \(X_{(i)}\) denote the \(i\)-th smallest among \(X_i\)'s.
        • \(\Pr(X_{(i)}\leq x)=\sum_{k=i}^n(n,k)[F(x)]^k[1-F(x)]^{n-k}\) (at least i of them are smaller than x)
        • density (differentiate the CDF): \(f_{X_{(i)}}(x)=(n; i-1,1,n-i)f(x)[F(x)]^{i-1}[1-F(x)]^{n-i}\)
          • multinomial coefficient \((n; i-1,1,n-i)=\frac{n!}{(n-i)!(i-1)!}\)
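
A quick simulation check of the order-statistic density; for Uniform(0,1) rvs the formula reduces to the Beta(\(i,n-i+1\)) density:

## order statistics: i-th smallest of n iid U(0,1) vs the density formula (= Beta(i, n-i+1))
set.seed(1)
n = 10; i = 3; B = 1e4
xi = replicate(B, sort(runif(n))[i])
hist(xi, breaks=50, freq=FALSE, main="3rd smallest of 10 U(0,1)")
curve(dbeta(x, i, n-i+1), add=TRUE, col=2, lwd=2)
## same curve from the general formula with f(x)=1, F(x)=x
curve(choose(n,i)*i * x^(i-1) * (1-x)^(n-i), add=TRUE, col=4, lty=2)
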
    • Variable transformation: dist of functions of rvs
      • random vector \(X\) with PDF \(f(x)\). Let \(Y=g(X)\), and \(X=h(Y)\) (\(h=g^{-1}\)).
        • Jacobian: \(\nabla_yX=\partial h(y)/\partial y\)
        • \(f_{Y}(y)=f_X(h(y))|\det(\nabla_yX)|\) (P 56)
      • Ex: given \(X\sim N(0,1)\). Let \(Y=X^2\)
        • \(X=\pm\sqrt{Y}\); on each branch \(\nabla_yX=\frac{1}{2\sqrt{y}}\)
        • \(f_Y(y)=2\cdot\frac{1}{\sqrt{2\pi}}e^{-y/2}\cdot\frac{1}{2\sqrt{y}}=\frac{1}{\sqrt{2\pi}}y^{-1/2}e^{-y/2}\) (the \(\chi_1^2\) dist; the factor 2 accounts for the two branches \(\pm\sqrt{y}\))
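
A quick numeric check that the derived density matches the \(\chi_1^2\) density:

## check: density of Y = X^2, X ~ N(0,1), equals dchisq(y, 1)
y = seq(0.05, 5, by=0.05)
fy = exp(-y/2)/sqrt(2*pi*y)        # derived density (2*pi)^(-1/2) y^(-1/2) e^(-y/2)
max(abs(fy - dchisq(y, df=1)))     # ~ 0
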
      • Ex 2.39 (p 56): independent gamma to Beta
        • \(X\sim Gamma(\alpha,\lambda), Y\sim Gamma(\beta,\lambda)\). Study dist of \(X/(X+Y)\)
        • consider \(U=X+Y,V=X/U\). So \(X=UV,Y=U(1-V)\)
          • Jacobian \(|\partial(x,y)/\partial(u,v)|=u\); the joint density factorizes, so \(U\perp V\) with \(U\sim Gamma(\alpha+\beta,\lambda)\) and \(V\sim Beta(\alpha,\beta)\)
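
A short simulation check of the Beta result (the shape parameters are illustrative):

## check: X/(X+Y) ~ Beta(alpha, beta) for independent Gammas with common rate
set.seed(1)
a = 2; b = 5; B = 1e4
x = rgamma(B, shape=a, rate=1); y = rgamma(B, shape=b, rate=1)
v = x/(x+y)
hist(v, breaks=50, freq=FALSE, main="X/(X+Y)")
curve(dbeta(x, a, b), add=TRUE, col=2, lwd=2)
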

  • 3.1-3.3 Conditional distributions
    • discrete dist: \(p_{X|Y}(x|y) = \frac{p(x,y)}{p_Y(y)}\), where \(p_Y(y)=\Pr(Y=y), p(x,y)=\Pr(X=x,Y=y)\) (cond prob mass function)
    • continuous dist: \(f_{X|Y}(x|y)=\frac{f(x,y)}{f_Y(y)}\) (cond PDF)
      • \(f(x,y)\) is the joint PDF of \(X,Y\) and \(f_Y(y)=\int_xf(x,y)dx\) is the marginal PDF of \(Y\)
  • 3.2 Discrete dist
    • Ex 3.2 (p 95), independent Binoms to hypergeometric
      • \(X_1\sim\) Binom(\(n_1\),p) \(\perp X_2\sim\) Binom(\(n_2,p)\): \([X_1|X_1+X_2=m]\sim\) hypergeometric dist
      • \(X_1+X_2\sim Binom(n_1+n_2,p)\) (using iid Bernoulli decomposition)
      • \(\Pr(X_1=k|X_1+X_2=m)=\frac{\Pr(X_1=k,X_2=m-k)}{\Pr(X_1+X_2=m)}=(n_1,k)(n_2,m-k)/(n_1+n_2,m)\)
      • mean and var: \(mn_1/(n_1+n_2)\), \(mn_1n_2(n_1+n_2-m)/(n_1+n_2)^2/(n_1+n_2-1)\)
        • derivation of mean: by symmetry of the \(n_1+n_2\) iid Bernoulli trials, given \(m\) total successes each trial is a success with prob \(m/(n_1+n_2)\), so \(E[X_1|X_1+X_2=m]=n_1m/(n_1+n_2)\)
    • Ex 3.3 (P 96), independent Poissons to Binom
      • \(X_1\sim\) Poisson(\(\lambda_1)\perp X_2\sim\) Poisson(\(\lambda_2)\): \([X_1|X_1+X_2=m]\sim\) Binom(m,\(\lambda_1/(\lambda_1+\lambda_2)\))
      • \(X_1+X_2\sim Poisson(\lambda_1+\lambda_2)\)
      • \(\Pr(X_1=k|X_1+X_2=m)=\frac{\Pr(X_1=k)\Pr(X_2=m-k)}{\Pr(X_1+X_2=m)}=(m,k)\left(\frac{\lambda_1}{\lambda_1+\lambda_2}\right)^k\left(\frac{\lambda_2}{\lambda_1+\lambda_2}\right)^{m-k}\)
  • 3.3 Continuous dist
    • Ex 3.6 (P 98), t-dist from convoluting normal and chi-square dist
      • \(T=Z/\sqrt{Y/n}\), where \(Z\sim N(0,1), Y\sim \chi^2_n\)
      • \(f_T(t)=\int f_{T,Y}(t,y)dy=\int f_Y(y)f_{T|Y}(t|y)dy\)
      • \(f_Z(z)=\frac{1}{\sqrt{2\pi}}\exp(-z^2/2)\), \(f_Y(y)=\exp(-y/2)y^{n/2-1}/\Gamma(n/2)/2^{n/2}\)
      • \(f_T(t)=\int_0^\infty f_Y(y)\sqrt{y/n}\,f_Z(t\sqrt{y/n})\,dy=\frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\Gamma(n/2)}(1+t^2/n)^{-(n+1)/2}\) (the t density given in 2.3)
  • 3.4 Computing Expectations by Conditioning
    • \(E[X]=E[E[X|Y]]\)
      • the inner expectation can be treated as a function of rv \(Y\)
      • \(E[X|Y=y]=\sum_x x\Pr(X=x|Y=y)=\sum_x x\frac{\Pr(X=x,Y=y)}{\Pr(Y=y)}\) or \(\int_x xf_{X|Y=y}(x)dx=\int_x x\frac{f(x,y)}{f_Y(y)}dx\)
    • Proposition 3.1 (P 112) \(Var(X) = E[Var(X|Y)] + Var[ E[X|Y]]\)
      • the conditional expectation \(E[X|Y]\) always has variance no larger than the marginal variance: \(Var[E(X|Y)]\leq Var(X)\)
      • \(E[Var(X|Y)]=E[E(X^2|Y)] - E[E(X|Y)^2]=E(X^2)-E[E(X|Y)^2]\)
      • \(Var[E(X|Y)]=E[E(X|Y)^2]-E[E(X|Y)]^2=E[E(X|Y)^2]-E(X)^2\)
    • Ex 3.14 (P 104) covariance of multinomial dist
      • \(n\) independent trials with \(r\) outcomes, \((N_1,\ldots,N_r)\sim Multinom(n;p_1,\ldots,p_r)\)
      • \(N_i\sim Binom(n,p_i)\)
      • \(\Pr(j|not\, i)=p_j/(1-p_i)\), so \([N_j|N_i=k]\sim Binom(n-k,p_j/(1-p_i))\)
        • \(Cov(N_i,N_j)=-np_ip_j\)
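
A quick simulation check of the covariance formula (the values of \(n\) and \(p\) are illustrative):

## check: Cov(N_i, N_j) = -n p_i p_j for multinomial counts
set.seed(1)
n = 30; p = c(0.2, 0.3, 0.5); B = 1e5
cnt = rmultinom(B, size=n, prob=p)        # r x B matrix of counts
c(empirical = cov(cnt[1,], cnt[2,]), formula = -n*p[1]*p[2])
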
    • Ex 3.15 (P 107) number of trials \(N_k\) needed until \(k\) consecutive successes in iid Bernoulli trials with success prob \(p\).
      • \(N_1\) is geometric dist with mean \(1/p\)
      • \(N_k=N_{k-1}+A_{k-1,k}\) with \(A_{k-1,k}\) number of additional trials needed
      • Let \(I_k\) be the outcome (0 or 1) of the next trial after the first \(k-1\) consecutive successes
        • \(E[A_{k-1,k}|I_k=1]=1\), \(E[A_{k-1,k}|I_k=0]=1+E[N_k]\) (after a failure the count restarts)
        • so \(E[N_k]=E[N_{k-1}]+p+(1-p)(1+E[N_k])\), i.e., \(E[N_k]=(1+E[N_{k-1}])/p\)
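
A small sketch comparing the recursion for \(E[N_k]\) with simulation (the values of \(p\) and \(k\) are illustrative):

## E[trials until k consecutive successes]: recursion E[N_k] = (1 + E[N_{k-1}])/p vs simulation
set.seed(1)
p = 0.6; k = 3
en = 0; for(j in 1:k) en = (1 + en)/p     # recursion with E[N_0] = 0
sim1 = function(){
  run = 0; trials = 0
  while(run < k){
    trials = trials + 1
    run = if(runif(1) < p) run + 1 else 0 # a failure resets the run
  }
  trials
}
c(recursion = en, simulation = mean(replicate(2e4, sim1())))
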
  • 3.5 Computing Probabilities by Conditioning
    • Example 3.23 (P 116)
      • when each of a Poisson(\(\lambda\)) number of events is independently classified as type \(k\) with probability \(p_k\), the numbers of type-\(k\) events are independent Poisson random variables with means \(\lambda p_k\).
    • Example 3.24 (P 118): sum of independent Bernoulli rvs
      • dist of \(X=\sum_{i=1}^nX_i\), where \(X_i\sim Bern(p_i)\) are independent
      • consider \(P_k(j)=\Pr(\sum_{i=1}^kX_i=j)\) and condition on \(X_k\): \(P_k(j)=p_kP_{k-1}(j-1)+(1-p_k)P_{k-1}(j)\) (implemented in the R code below)
    • Example 3.29 (P 125): Ignatov’s theorem
      • \(X_i\) iid \(F(x),f(x)\)
      • \(N_k=\min\{n\geq k: X_n=\mbox{k-th largest of }X_1,\ldots,X_n\}\)
      • \(X_{N_k}\sim F\)
      • Consider \(N=N_2\)
        • \(A_i=\{X_i\neq \mbox{2nd largest of }X_1,\ldots,X_i\}\), \(\Pr(A_i)=(i-1)/i\)
        • \(\Pr(N=n)=\Pr(A_2A_3\ldots A_{n-1}A_n^c)=1/(n(n-1))\)
        • \(f_{X_N}(x)=\sum_{n\geq 2}\frac{1}{n(n-1)}f_{X_N|N}(x|n)\)
        • \(f_{X_N|N}(x|n)\): density of 2nd largest of \(n\) iid rvs (order statistics)
        • so \(f_{X_N}(x)=f(x)\)
## sum of independent Bernoulli rvs (Ex 3.24): exact recursion vs approximations
n = 100; p0 = 0.5; pr = 1:n/n*p0   # heterogeneous success probs p_i = (i/n)*p0

## MC: simulate X = sum_i 1{U_i < p_i} directly
B = 1e5; x = matrix(runif(B*n), n,B); x1 = colSums(x<pr)
dx1 = diff(c(0, ecdf(x1)(0:n)))    # empirical pmf of X on 0:n
## cond 2: iterative recursion P_k(j) = p_k*P_{k-1}(j-1) + (1-p_k)*P_{k-1}(j)
pkj = function(k, j, pr, pk1){     # pk1[j+1] holds P_{k-1}(j)
  if(k <= 1){
    res = j*pr[1] + (1-j)*(1-pr[1])
  } else{
    if((j > 0) & (j < k)){
      res = pr[k]*pk1[j] + (1-pr[k])*pk1[j+1]
    } else{
      if(j == 0) res = prod(1-pr[1:k])
      if(j == k) res = prod(pr[1:k])
    }
  }
  res
}

pk1 = rep(0, n+1); pk1[1] = 1-pr[1]; pk1[2] = pr[1]   # P_1(0), P_1(1)
res = pk1
for(k in 2:n){
  for(j in 0:k){
    res[j+1] = pkj(k, j, pr, pk1)
  }
  pk1 = res                        # roll forward to the next k
}

### Poisson approximation: X is approximately Poisson(sum(pr))
px1 = dpois(0:n, sum(pr))

### CLT: normal approx with continuity correction (nx1) and plain density (nx2)
tmp = pnorm(0:n+0.5, sum(pr), sqrt(sum(pr*(1-pr))))
nx1 = c(tmp[1], diff(tmp[-n-1]), 1-tmp[n])
nx2 = dnorm(0:n, sum(pr), sqrt(sum(pr*(1-pr))))

## compare the exact recursion (bars) with the MC, Poisson, and normal approximations
plot(0:n, res, type='h', lwd=2, ylab='density')
points(0:n, dx1, pch=1, col=2)   # MC
points(0:n, px1, pch=2, col=3)   # Poisson approx
points(0:n, nx1, pch=3, col=4)   # normal approx (continuity-corrected)
## curve(dnorm(x, sum(pr), sqrt(sum(pr*(1-pr)))), add=TRUE, col=4)

## correlations of the exact pmf (res) with the MC, Poisson, normal-CDF, and normal-density approximations
c(cor(res, dx1), cor(res, px1), cor(res, nx1), cor(res, nx2))
## n=100, p0=0.25: 0.9999203 0.9975972 0.9987090 0.9987035
## n=100, p0=0.05: 0.9999932 0.9998898 0.9890155 0.9899978
## n=100, p0=0.50: 0.9999095 0.9885093 0.9997475 0.9997452


## cond 1: direct recursion without memoization -- cost grows exponentially in k,
## so it is only practical for small n (kept for reference; use the iterative version above)
pkj = function(k, j, pr){
  if(k <= 1){
    res = j*pr[1] + (1-j)*(1-pr[1])
  } else{
    if((j > 0) & (j < k)){
      res = pr[k]*pkj(k-1, j-1, pr) + (1-pr[k])*pkj(k-1, j, pr)
    } else{
      if(j == 0) res = prod(1-pr[1:k])
      if(j == k) res = prod(pr[1:k])
    }
  }
  res
}
cx1 = rep(0, n+1)
for(i in 0:n) cx1[i+1] = pkj(n, i, pr)

  • Conjugate dists (3.6.3, P 141)
    • \([X|p]\sim Binom(n,p), p\sim Beta(\alpha,\beta)\)
      • marginal dist \(\Pr(X=k)=(n;k)/B(\alpha,\beta)\int_p p^{k+\alpha-1}(1-p)^{n-k+\beta-1}dp=(n;k)\frac{B(k+\alpha,n-k+\beta)}{B(\alpha,\beta)}\)
        • \(E(X),Var(X)\): direct evaluation …
        • \(E(X)=E[E(X|p)]=E[np]=n\alpha/(\alpha+\beta)\)
        • \(Var(X)=Var[E(X|p)]+E[Var(X|p)]=Var(np)+E[np(1-p)]=(n^2-n)Var(p)+nE(p)[1-E(p)]=\cdots\)
      • conditional dist \([p|X]\propto \Pr(p)\Pr(X|p)\propto p^{X+\alpha-1}(1-p)^{n-X+\beta-1}\)
        • i.e., \([p|X]\sim Beta(X+\alpha,n-X+\beta)\)
      • for \(\alpha=\beta=1\), \(p\) is uniform rv. and \([p|X]\sim Beta(X+1,n-X+1)\)
      • if \([Y|p]\sim Binom(n_1,p)\), and \([X\perp Y|p]\)
        • \(\Pr(Y=k|X)=\int_p\Pr(Y=k|p)\Pr(p|X)dp=(n_1;k)\frac{B(k+X+\alpha,n_1-k+n-X+\beta)}{B(X+\alpha,n-X+\beta)}\)
        • \(E(Y|X)=E[E[Y|p,X]]=E[E[Y|p]|X]=E[n_1p|X]=n_1\frac{X+\alpha}{n+\alpha+\beta}\)
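
A short R sketch of the Beta-Binomial relations above (prior parameters and counts are illustrative):

## Beta-Binomial conjugacy: marginal pmf, posterior, and predictive mean
a = 1; b = 1; n = 20; x = 14               # uniform prior; observed x successes in n trials
k = 0:n
pmf = choose(n, k) * beta(k + a, n - k + b) / beta(a, b)   # marginal Pr(X = k)
sum(pmf)                                                    # = 1
c(post.mean = (x + a)/(n + a + b))                          # E(p | X) under Beta(x+a, n-x+b)
curve(dbeta(p, x + a, n - x + b), 0, 1, xname = "p", ylab = "posterior density of p")
n1 = 10
c(pred.mean = n1*(x + a)/(n + a + b))                       # E(Y | X) for Y ~ Binom(n1, p)
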

  • 2.6 Moment Generating Functions
    • \(\phi(t)=E[e^{tX}]\) for r.v. \(X\)
      • \(\phi'(t)=E[\nabla_te^{tX}]=E[Xe^{tX}]\). So \(\phi'(0)=E[X]\)
      • In general, for \(n\)-th derivative, \(\phi^{(n)}(t)=E[X^ne^{tX}]\) and \(\phi^{(n)}(0)=E[X^n]\)
    • Common dists
      • Binom(n,p): \(\phi(t)=\sum_k e^{tk}(n,k)p^k(1-p)^{n-k}=(pe^t+1-p)^n\)
        • so \(\phi'(t)=n(pe^t+1-p)^{n-1}pe^t\), and hence \(E[X]=\phi'(0)=np\)
        • similarly \(E[X^2]=\phi''(0)=n(n-1)p^2+np\), and \(Var(X)=np(1-p)\)
      • Poisson(\(\lambda\)): \(\phi(t)=\sum_n e^{tn}e^{-\lambda}\lambda^n/n!=e^{-\lambda}e^{\lambda e^t}\)
        • \(\phi'(t)=\lambda e^te^{\lambda(e^t-1)}\), and \(E[X]=\phi'(0)=\lambda\)
        • \(E[X^2]=\phi''(0)=\lambda^2+\lambda\), \(Var(X)=\lambda\)
      • Exp(\(\lambda\)): \(\phi(t)=\int e^{tx}\lambda e^{-\lambda x}dx=\lambda/(\lambda-t), t<\lambda\)
        • \(\phi'(t)=\lambda/(\lambda-t)^2, \phi''(t)=2\lambda/(\lambda-t)^3\)
        • \(E[X]=1/\lambda, E[X^2]=2/\lambda^2, Var[X]=1/\lambda^2\)
      • Standard normal \(N(0,1)\): \(\phi(t)=\int e^{tx-x^2/2}/\sqrt{2\pi}dx=e^{t^2/2}\int e^{-(x-t)^2/2}/\sqrt{2\pi}dx=e^{t^2/2}\)
        • For \(X\sim N(\mu,\sigma^2)\), write \(X=\mu+\sigma Z\) with \(Z\sim N(0,1)\): \(\phi(t)=E[e^{t\mu+t\sigma Z}]=e^{t\mu}e^{(t\sigma)^2/2}=e^{t\mu+t^2\sigma^2/2}\)
        • so \(E[X]=\mu,Var[X]=\sigma^2\)
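
A tiny numeric check of the derivative identities, using the Exp(\(\lambda\)) MGF as an example:

## MGF derivatives at 0 recover moments: Exp(lambda) example
lam = 2
phi = function(t) lam/(lam - t)                # MGF, valid for t < lambda
h = 1e-4
c((phi(h) - phi(-h))/(2*h), 1/lam)             # phi'(0)  ~ E[X]   = 1/lambda
c((phi(h) - 2*phi(0) + phi(-h))/h^2, 2/lam^2)  # phi''(0) ~ E[X^2] = 2/lambda^2
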
    • Ex 2.44-46: sum of independent rvs.
      • sum of independent \(Binom(n_1,p), Binom(n_2,p)\)
        • \(\phi(t)=(pe^t+1-p)^{n_1}(pe^t+1-p)^{n_2}\), i.e., \(Binom(n_1+n_2,p)\)
      • sum of independent \(Poisson(\lambda_1), Poisson(\lambda_2)\)
        • \(\phi(t)=e^{-\lambda_1}e^{\lambda_1 e^t}e^{-\lambda_2}e^{\lambda_2 e^t}\), i.e., \(Poisson(\lambda_1+\lambda_2)\)
      • sum of independent \(N(\mu_1,\sigma_1^2), N(\mu_2,\sigma_2^2)\)
        • \(\phi(t)=e^{t\mu_1+t^2\sigma_1^2/2}e^{t\mu_2+t^2\sigma_2^2/2}\), i.e., \(N(\mu_1+\mu_2,\sigma_1^2+\sigma_2^2)\)
    • Poisson approx to Binomial
      • For \(X\sim Binom(n,p)\) with small \(p\), let \(\lambda=np\),
        • \(\Pr(X=k)=\frac{n!}{k!(n-k)!\,n^k}\lambda^k(1-\lambda/n)^{n-k}\)
        • large \(n\) small \(p\): \(n!/(n-k)!/n^k\approx 1\), \((1-\lambda/n)^{n-k}\approx e^{-\lambda}\)
        • so \(\Pr(X=k)\approx \lambda^k/k!e^{-\lambda}\), i.e., \(X\approx Poisson(\lambda)\)
      • independent \(X_i\sim Bern(p_i)\), let \(X=\sum_{i=1}^nX_i\)
        • \(E[e^{tX_i}]=1+p_i(e^t-1)\approx e^{p_i(e^t-1)}\) for \(p_i\) small
        • so \(E[e^{tX}]\approx e^{\sum_ip_i(e^t-1)}\), i.e., \(X\approx Poisson(\sum_ip_i)\)
    • Joint MGF: multivariate normal dist \(X=(X_1,\cdots,X_n)\)
      • \(E[e^{\sum_it_iX_i}]=e^{\sum_it_iE(X_i)+\sum_i\sum_jt_it_jCov(X_i,X_j)/2}\)
      • \(\sum_it_iX_i\sim N(\sum_it_i\mu_i, T'Cov(X)T)\), where \(T=(t_1,\cdots,t_n)'\)
  • 2.8 Limit Theorems
    • Markov's Inequality
      • \(\Pr[X\geq a] \leq E[X]/a\) if rv \(X\geq 0\)
      • \(E[X]=\int_0^a xf(x)dx + \int_a^{\infty}xf(x)dx\geq \int_{x\geq a} af(x)dx=a\Pr[X\geq a]\)
    • Chebyshev's Inequality
      • \(\Pr[|X-E(X)|\geq k]\leq Var(X)/k^2, k>0\)
      • apply Markov's inequality to rv \((X-E(X))^2\)
    • Given iid \(X_i\) with mean \(\mu\) and finite variance, let \(\bar{X}_n=\sum_{i=1}^nX_i/n\)
      • weak law: \(\lim_{n\to\infty}\Pr(|\bar{X}_n-\mu|>\epsilon)=0, \forall\epsilon>0\) (convergence in probability)
        • proof by Chebyshev's inequality, \(Var(\bar{X}_n)\to 0\)
        • for large \(n\), \(\bar{X}_n\) is close to \(\mu\), but the weak law alone still allows \(|\bar{X}_n-\mu|>\epsilon\) to occur for infinitely many \(n\) (with probabilities tending to zero)
      • Strong Law of Large Numbers: with prob 1, \(\bar{X}_n\to \mu\) as \(n\to\infty\)
        • \(\Pr(\lim_{n\to\infty}\bar{X}_n=\mu)=1\)
        • i.e., with prob 1: \(\forall\epsilon>0,\exists N\) s.t. \(|\bar{X}_n-\mu|<\epsilon\ \forall n>N\)
    • Central Limit Theorem
      • given iid \(X_i\) with mean \(\mu\) and variance \(\sigma^2\)
      • let \(Z_n=\sum_{i=1}^n(X_i-\mu)/\sigma/\sqrt{n}\)
      • \(\Pr(Z_n\leq a)\approx \Phi(a)\), where \(\Phi()\) is the CDF of \(N(0,1)\)
      • For Binom(n,p), sum of \(n\) iid Bern(p), \((X-np)/\sqrt{np(1-p)}\approx N(0,1)\)
        • so \(\Pr(X=k)\approx \Phi[(k+0.5-np)/\sqrt{np(1-p)}]-\Phi[(k-0.5-np)/\sqrt{np(1-p)}]\)
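
A quick check of the continuity-corrected normal approximation to the Binomial pmf (n and p are illustrative):

## normal approximation to Binom(n,p) with continuity correction
n = 40; p = 0.3; k = 0:n
exact  = dbinom(k, n, p)
approx = pnorm(k+0.5, n*p, sqrt(n*p*(1-p))) - pnorm(k-0.5, n*p, sqrt(n*p*(1-p)))
max(abs(exact - approx))
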
  • 2.9 Stochastic Processes
    • A stochastic process \(\{X(t),t\in T\}\) is a collection of random variables.
      • For each \(t\in T\), \(X(t)\) is a random variable.
      • The index \(t\) is often interpreted as time.
      • we refer to \(X(t)\) as the state of the process at time \(t\).
      • \(T\) is called the index set of the process.
        • When \(T\) is a countable set, the stochastic process is said to be a discrete-time process.
        • If \(T\) is an interval of the real line, the stochastic process is said to be a continuous-time process.
      • The state space of a stochastic process is defined as the set of all possible values that the random variables \(X(t)\) can assume.