- 2.1 Random Variables \(X\) and \(Y\)
- prob density function (PDF): \(f(x)\)
- cumulative dist function (CDF): \(F(x)=\Pr(X\leq x)\)
- survival function: \(G(x)=1-F(x)\)
- hazard rate: \(h(x)=f(x)/G(x)\)
- mean/expectation: \(E(X)=\int_x xf(x)dx\)
- variance: \(Var(X)=E(X^2)-E(X)^2=E[(X-E(X))^2]\)
- covariance: \(Cov(X,Y)=E[(X-E(X))(Y-E(Y))]=E(XY)-E(X)E(Y)\)
- conditional dist: \(f_{X|Y}(x|y)=\frac{f(x,y)}{f_Y(y)}\)
- variance decomp: \(Var(X)=E[Var(X|Y)] + Var[E(X|Y)]\)
- variable transformation: for \(Y=g(X), X=h(Y)\), \(f_Y(y)=f_X(h(y))|h'(y)|\) (in the multivariate case, \(|h'(y)|\) is the absolute value of the determinant of the Jacobian matrix)
- 2.2 Discrete Random Variables
- Bernoulli
- Binomial
- sum of iid Bernoulli rvs (with same success prob \(p\)): \(X=\sum_{i=1}^nX_i\)
- independent \(X_i|p_i\sim Bern(p_i)\) with independent random \(p_i\), where \(E(p_i)=p\) (potentially \(Var(p_i)>0\))
- Easy to verify that marginally \(X_i\) is a Bern(p), so \(X\sim Binom(n,p)\).
- Alternatively \(Var(X|p_i)=\sum_{i=1}^n p_i(1-p_i), E(X|p_i)=\sum_ip_i\)
- \(E(X)=np, Var(X)=np(1-p)\). Note \(E(p_i^2)-Var(p_i)=E(p_i)^2=p^2\)
- the binomial mean/var formula still holds
- \(X|p\sim Binom(n,p)\), assume \(E(p)=p_0\)
- \(E(X|p)=np, Var(X|p)=np(1-p)\). So \(E(X)=np_0\)
- \(Var(X)=n^2Var(p) + nE(p) - nE(p^2)=np_0(1-p_0)+(n^2-n)Var(p)\)
- So when \(Var(p)>0\) and \(n>1\), \(Var(X)>np_0(1-p_0)\) (over-dispersion could happen for Binom with random p)
- extension to Empirical DF (multinomial dist) based on \(n\) obs: \(\hat{F}_n\to F\)
- its relation to Bootstrap: \(t(\hat{F}_n)\to t(F)\) for a statistic \(t()\)
- Monte Carlo implementation of inference on \(t(\hat{F}_n)\)
- MC samples from the dist \(\hat{F}_n\) are just samples drawn with replacement from the data
- compute the statistic for each MC sample (see the R sketch below)
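- a minimal R sketch of the bootstrap Monte Carlo above, using simulated data, with the statistic \(t()\) taken to be the sample median for illustration:
## bootstrap sketch: MC sampling from F_n = sampling with replacement from the data
set.seed(1)
x = rexp(50)                          ## assumed observed sample (illustration only)
B = 2000
tb = replicate(B, median(sample(x, replace=TRUE)))   ## statistic on each MC sample
c(estimate=median(x), boot.se=sd(tb))                ## bootstrap SE of the median
quantile(tb, c(0.025, 0.975))                        ## percentile interval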
- Geometric: memoryless
- \(G(n)=(1-p)^n, G(i+j)=G(i)G(j)\)
- Poisson
- over-dispersion in Poisson reg with heterogeneous \(\lambda\)
- \(Var(X)=Var[E(X|\lambda)]+E[Var(X|\lambda)]=Var(\lambda)+E(\lambda)\)
- \(E(X)=E[E(X|\lambda)]=E(\lambda)\)
- \(Var(X)>E(X)\) when \(Var(\lambda)>0\)
- Over-dispersion modeling: Quasi-likelihood; GEE; MM
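- a quick R illustration of the over-dispersion above, assuming \(\lambda\sim\) Gamma (so marginally \(X\) is negative binomial); a simulation sketch only:
## over-dispersion: Poisson with random (heterogeneous) lambda
set.seed(1)
lam = rgamma(1e5, shape=2, rate=1)   ## E(lambda)=2, Var(lambda)=2
x = rpois(1e5, lam)
c(mean=mean(x), var=var(x))          ## var ~ E(lambda)+Var(lambda)=4 > mean ~ 2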
- Hypergeometric dist
- \(m\) black balls and \(N-m\) white balls: select \(n\) balls, let X=# of black balls (sampling from a finite population)
- \(\Pr(X=k)=(m,k)(N-m,n-k)/(N,n), k\leq\min(m,n)\)
- var and mean calculation: sum of \(n\) dependent Bernoulli rvs
- \(X=\sum_{i=1}^nX_i\): \(X_i=1\) if the \(i\)-th ball is black and zero otherwise
- marginally \(\Pr(X_i=1)=m/N\) (the \(i\)-th ball is equally likely to be any of the \(N\) balls)
- \(E(X_iX_j)=\Pr(X_i=1,X_j=1)=\Pr(X_i=1)\Pr(X_j=1|X_i=1)=\frac{m}{N}\frac{m-1}{N-1}\)
- \(Cov(X_i,X_j)=\frac{m}{N}\frac{m-1}{N-1}-\frac{m^2}{N^2}\)
- For large \(N\), \(Cov(X_i,X_j)\approx 0\), i.e., \(X_i\) can be roughly treated as iid Bern(m/N), and \(X\) as Binom(n,m/N).
- "Enrichment analysis": assess whether the sampling is not random but enriched, say, with more black balls (expecting, \(X/n>m/N\))
- Hypergeometric dist can be used to compute enrichment p-value (exact test)
- Alternatively 2x2 table independence test (say, Pearson or LRT chi-square tests) can be used for p-value calculation (approx test for large counts)
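- an R sketch of both p-value calculations, using made-up counts for illustration (\(N=1000\) balls, \(m=100\) black, \(n=50\) drawn, \(X=12\) black observed):
## enrichment p-values: hypergeometric exact test vs 2x2 table chi-square test
N = 1000; m = 100; n = 50; x = 12                 ## assumed counts
phyper(x-1, m, N-m, n, lower.tail=FALSE)          ## Pr(X >= x), exact one-sided
tab = matrix(c(x, n-x, m-x, N-m-(n-x)), 2, 2)     ## rows: black/white; cols: drawn/not drawn
fisher.test(tab, alternative='greater')$p.value   ## same exact tail probability
chisq.test(tab)$p.value                           ## chi-square approximation (two-sided)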
- 2.3 Continuous Random Variables
- Uniform
- Exponential: memoryless
- \(G(x)=e^{-\lambda x}\)
- double Exp: \(f(x)=\frac{\lambda}{2}e^{-\lambda|x|}\)
- Gamma
- \(f(x)=\frac{\lambda e^{-\lambda x}(\lambda x)^{\alpha-1}}{\Gamma(\alpha)}\)
- \(\chi_n^2\) (chi-square dist with n-DF): \(\alpha=n/2,\lambda=1/2\)
- Normal
- \(f(x)=1/\sqrt{2\pi}e^{-x^2/2}\)
- connection to ridge regression and lasso regression
- tuning parameters can be treated as variance parameters of random effects in an LMM/GLMM
- semi-parametric reg with penalized splines: ridge type penalty
- In general, the Normal density tails decay much faster than the Exponential tails.
- T-dist
- \(f(x,n)=\frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\Gamma(n/2)}(1+x^2/n)^{-(n+1)/2}\) (n degree-of-freedom (DF))
- ratio of independent normal and chi-square: \(N(0,1)/\sqrt{\chi_n^2/n}\)
- F-dist
- \(f(x,n,m)=\frac{\Gamma((n+m)/2)}{\Gamma(n/2)\Gamma(m/2)}(n/m)^{n/2}x^{n/2-1}(1+nx/m)^{-(n+m)/2}\)
- ratio of independent chi-square rvs: \((\chi_n^2/n)/(\chi_m^2/m)\)
- 2.5 Joint Distribution
- joint CDF: \(F(x,y)=\Pr(X\leq x, Y\leq y)\)
- joint PMF (discrete): \(p(x,y)=\Pr(X=x,Y=y)\); joint PDF (continuous): \(\Pr(X\in A,Y\in B)=\int_A\int_Bf(x,y)dxdy\)
- marginal PDF: \(p_X(x)=\sum_yp(x,y), f_X(x)=\int_yf(x,y)dy\)
- Independent Random Variables
- CDF: \(F(x,y)=F_X(x)F_Y(y)\)
- PDF: \(p(x,y)=p_X(x)p_Y(y), f(x,y)=f_X(x)f_Y(y)\)
- given independent \(X\) and \(Y\): \(E[g(X)h(Y)]=E[g(X)]E[h(Y)],\forall g,h\)
- Covariance and Variance of Sums of Random Variables
- Ex 2.35 (p 51): hypergeometric dist
- \(\Pr(X=k)=(m,k)(N-m,n-k)/(N,n)\)
- \(m\) black balls and \(N-m\) white balls: select \(n\) balls, # of black balls (sampling from a finite population)
- \(X=\sum_{i=1}^nX_i\): \(X_i=1\) if the \(i\)-th ball is black and zero otherwise
- marginally \(\Pr(X_i=1)=m/N\) (the \(i\)-th ball is equally likely to be any of the \(N\) balls)
- Ex 2.37 (p 53): sum of independent Poisson is still Poisson
- \(X\sim Poisson(\lambda_1), Y\sim Poisson(\lambda_2)\)
- \(\Pr(X+Y=n)=\sum_{k=0}^n\Pr(X=k,Y=n-k)=\sum_{k=0}^ne^{-\lambda_1}\frac{\lambda_1^k}{k!}e^{-\lambda_2}\frac{\lambda_2^{n-k}}{(n-k)!}=e^{-(\lambda_1+\lambda_2)}\frac{(\lambda_1+\lambda_2)^n}{n!}\)
- Ex 2.38 (p 54): order statistics
- \(X_i\) are iid rvs with dist \(F(x), f(x)\) (i=1,…,n)
- \(X_{(i)}\) denotes the \(i\)-th smallest among the \(X_i\)'s.
- \(\Pr(X_{(i)}\leq x)=\sum_{k=i}^n(n,k)[F(x)]^k[1-F(x)]^{n-k}\) (at least i of them are smaller than x)
- differentiating: \(f_{X_{(i)}}(x)=(n; i-1,1,n-i)f(x)[F(x)]^{i-1}[1-F(x)]^{n-i}\)
- multinomial coefficient \((n; i-1,1,n-i)=\frac{n!}{(n-i)!(i-1)!}\)
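- a small R check of this density; for iid Uniform(0,1) the formula reduces to a Beta(\(i,n-i+1\)) density:
## i-th smallest of n iid Uniform(0,1) has a Beta(i, n-i+1) density
set.seed(1)
n = 10; i = 3
xi = replicate(2e4, sort(runif(n))[i])
hist(xi, breaks=50, freq=FALSE, main='3rd smallest of 10 uniforms')
curve(dbeta(x, i, n-i+1), add=TRUE, col=2, lwd=2)   ## = n!/((i-1)!(n-i)!) x^(i-1) (1-x)^(n-i)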
- Variable transformation: dist of functions of rvs
- random vector \(X\) with PDF \(f(x)\). Let \(Y=g(X)\), and \(X=h(Y)\) (\(h=g^{-1}\)).
- \(\nabla_YX=(\partial h(Y)/\partial Y)\)
- \(f_{Y}(y)=f_X(h(y))|\det\nabla_YX|\) (P 56)
- Ex: given \(X\sim N(0,1)\). Let \(Y=X^2\)
- \(X=\pm\sqrt{Y}\), \(|\nabla_YX|=\frac{1}{2\sqrt{y}}\) for each branch
- \(f_Y(y)=2\cdot\frac{1}{\sqrt{2\pi}}e^{-y/2}\cdot\frac{1}{2\sqrt{y}}=\frac{1}{\sqrt{2\pi}}y^{-1/2}e^{-y/2}\) (the \(\chi_1^2\) dist)
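- quick MC check of this transformation:
## square of a standard normal is chi-square with 1 df
set.seed(1)
y = rnorm(1e5)^2
hist(y, breaks=100, freq=FALSE, xlim=c(0, 5))
curve(dchisq(x, df=1), add=TRUE, col=2, lwd=2)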
- Ex 2.39 (p 56): independent gamma to Beta
- \(X\sim Gamma(\alpha,\lambda), Y\sim Gamma(\beta,\lambda)\). Study dist of \(X/(X+Y)\)
- consider \(U=X+Y,V=X/U\), so \(X=UV,Y=U(1-V)\); the Jacobian is \(|\partial(x,y)/\partial(u,v)|=u\), and the joint density factorizes: \(U\sim Gamma(\alpha+\beta,\lambda)\) independent of \(V\sim Beta(\alpha,\beta)\)
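- an MC check of the Beta result, with arbitrary shape values \(\alpha=2,\beta=3\) and rate \(\lambda=1.5\):
## X/(X+Y) ~ Beta(alpha, beta) for independent Gammas with a common rate
set.seed(1)
a = 2; b = 3; lam = 1.5
x = rgamma(1e5, a, rate=lam); y = rgamma(1e5, b, rate=lam)
v = x/(x+y)
hist(v, breaks=50, freq=FALSE); curve(dbeta(x, a, b), add=TRUE, col=2, lwd=2)
cor(v, x+y)    ## near 0: V = X/(X+Y) is independent of U = X+Y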
- 3.1-3.3 Conditional distributions
- discrete dist: \(p_{X|Y}(x|y) = \frac{p(x,y)}{p_Y(y)}\), where \(p_Y(y)=\Pr(Y=y), p(x,y)=\Pr(X=x,Y=y)\) (cond prob mass function)
- continuous dist: \(f_{X|Y}(x|y)=\frac{f(x,y)}{f_Y(y)}\) (cond PDF)
- \(f(x,y)\) is the joint PDF of \(X,Y\) and \(f_Y(y)=\int_xf(x,y)dx\) is the marginal PDF of \(Y\)
- 3.2 Discrete dist
- Ex 3.2 (p 95), independent Binoms to hypergeometric
- \(X_1\sim\) Binom(\(n_1\),p) \(\perp X_2\sim\) Binom(\(n_2,p)\): \([X_1|X_1+X_2=m]\sim\) hypergeometric dist
- \(X_1+X_2\sim Binom(n_1+n_2,p)\) (using iid Bernoulli decomposition)
- \(\Pr(X_1=k|X_1+X_2=m)=\frac{\Pr(X_1=k,X_2=m-k)}{\Pr(X_1+X_2=m)}=(n_1,k)(n_2,m-k)/(n_1+n_2,m)\)
- mean and var: \(mn_1/(n_1+n_2)\), \(mn_1n_2(n_1+n_2-m)/(n_1+n_2)^2/(n_1+n_2-1)\)
- Ex 3.3 (P 96), independent Poissons to Binom
- \(X_1\sim\) Poisson(\(\lambda_1)\perp X_2\sim\) Poisson(\(\lambda_2)\): \([X_1|X_1+X_2=m]\sim\) Binom(m,\(\lambda_1/(\lambda_1+\lambda_2)\))
- \(X_1+X_2\sim Poisson(\lambda_1+\lambda_2)\)
- \(\Pr(X_1=k|X_1+X_2=m)=\frac{\Pr(X_1=k,X_2=m-k)}{\Pr(X_1+X_2=m)}=(m,k)\left(\frac{\lambda_1}{\lambda_1+\lambda_2}\right)^k\left(\frac{\lambda_2}{\lambda_1+\lambda_2}\right)^{m-k}\)
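- MC check of the conditional Binomial, with arbitrary rates \(\lambda_1=3,\lambda_2=5\) and conditioning on \(m=6\):
## [X1 | X1+X2 = m] ~ Binom(m, lambda1/(lambda1+lambda2))
set.seed(1)
l1 = 3; l2 = 5; m = 6
x1 = rpois(1e6, l1); x2 = rpois(1e6, l2)
sel = (x1 + x2 == m)
rbind(empirical = prop.table(table(factor(x1[sel], levels=0:m))),
      theory    = dbinom(0:m, m, l1/(l1+l2)))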
- 3.3 Continuous dist
- Ex 3.6 (P 98), t-dist by mixing a normal over a chi-square denominator
- \(T=Z/\sqrt{Y/n}\), where \(Z\sim N(0,1), Y\sim \chi^2_n\)
- \(f_T(t)=\int f_{T,Y}(t,y)dy=\int f_Y(y)f_{T|Y}(t|y)dy\)
- \(f_Z(z)=\frac{1}{\sqrt{2\pi}}\exp(-z^2/2)\), \(f_Y(y)=\frac{y^{n/2-1}e^{-y/2}}{\Gamma(n/2)2^{n/2}}\)
- \(f_T(t)=\int_0^\infty f_Y(y)\sqrt{y/n}\,f_Z(t\sqrt{y/n})\,dy=\frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\Gamma(n/2)}(1+t^2/n)^{-(n+1)/2}\), matching the t density in 2.3 (numerical check below)
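- a numerical check of this mixing integral against R's dt(), assuming 5 df:
## f_T(t) by numerical integration vs the t density
nu = 5
fT = function(t) sapply(t, function(tt)
       integrate(function(y) dchisq(y, nu) * sqrt(y/nu) * dnorm(tt*sqrt(y/nu)), 0, Inf)$value)
tt = seq(-4, 4, by=1)
cbind(t=tt, integral=fT(tt), dt=dt(tt, nu))   ## the two columns agree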
- 3.4 Computing Expectations by Conditioning
- \(E[X]=E[E[X|Y]]\)
- the inner expectation can be treated as a function of rv \(Y\)
- \(E[X|Y=y]=\sum_x x\Pr(X=x|Y=y)=\sum_x x\frac{\Pr(X=x,Y=y)}{\Pr(Y=y)}\) or \(\int_x xf_{X|Y=y}(x)dx=\int_x x\frac{f(x,y)}{f_Y(y)}dx\)
- Proposition 3.1 (P 112) \(Var(X) = E[Var(X|Y)] + Var[ E[X|Y]]\)
- the conditional expectation \(E[X|Y]\) always has variance no larger than the marginal variance \(Var(X)\)
- \(E[Var(X|Y)]=E[E(X^2|Y)] - E[E(X|Y)^2]=E(X^2)-E[E(X|Y)^2]\)
- \(Var[E(X|Y)]=E[E(X|Y)^2]-E[E(X|Y)]^2=E[E(X|Y)^2]-E(X)^2\)
- Ex 3.14 (P 104) covariance of multinomial dist
- \(n\) independent trials with \(r\) outcomes, \((N_1,\ldots,N_r)\sim Multinom(n;p_1,\ldots,p_r)\)
- \(N_i\sim Binom(n,p_i)\)
- \(\Pr(j|\mbox{not }i)=p_j/(1-p_i)\), so \([N_j|N_i=k]\sim Binom(n-k,p_j/(1-p_i))\)
- \(E(N_iN_j)=E[N_iE(N_j|N_i)]=\frac{p_j}{1-p_i}E[N_i(n-N_i)]=n(n-1)p_ip_j\), so \(Cov(N_i,N_j)=n(n-1)p_ip_j-n^2p_ip_j=-np_ip_j\) (MC check below)
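- quick MC check using rmultinom (arbitrary \(n=20\), \(p=(0.2,0.3,0.5)\)):
## Cov(N_i, N_j) = -n p_i p_j
set.seed(1)
n = 20; p = c(0.2, 0.3, 0.5)
N = rmultinom(1e5, n, p)          ## r x B matrix of counts
c(empirical = cov(N[1,], N[2,]), theory = -n*p[1]*p[2])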
- Ex 3.15 (P 107) number of trials \(N_k\) needed until \(k\) consecutive successes in iid Bernoulli trials with success prob \(p\).
- \(N_1\) is geometric dist with mean \(1/p\)
- \(N_k=N_{k-1}+A_{k-1,k}\) with \(A_{k-1,k}\) number of additional trials needed
- Let \(I_k\in\{0,1\}\) be the outcome of the next trial after trial \(N_{k-1}\)
- \(E[A_{k-1,k}|I_k=1]=1\), \(E[A_{k-1,k}|I_k=0]=1+E[N_k]\) (after a failure the count restarts)
- so \(E[N_k]=E[N_{k-1}]+1+(1-p)E[N_k]\), i.e., \(E[N_k]=(E[N_{k-1}]+1)/p=\sum_{i=1}^kp^{-i}\) (checked below)
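- recursion vs direct simulation for \(E[N_k]\); e.g. \(p=0.5,k=3\) gives \(2+4+8=14\):
## E[N_k] = (E[N_{k-1}]+1)/p, compared with a direct MC simulation
set.seed(1)
p = 0.5; k = 3
EN = 0; for(i in 1:k) EN = (EN + 1)/p
sim = replicate(1e4, { run = 0; n.trial = 0
        while(run < k){ n.trial = n.trial + 1; run = if(runif(1) < p) run + 1 else 0 }
        n.trial })
c(recursion = EN, formula = sum((1/p)^(1:k)), MC = mean(sim))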
- 3.5 Computing Probabilities by Conditioning
- Example 3.23 (P 116)
- when each of a Poisson(\(\lambda\)) number of events is independently classified as type \(k\) with probability \(p_k\), the numbers of type-\(k\) events are independent Poisson random variables with means \(\lambda p_k\).
- Example 3.24 (P 118): sum of independent Bernoulli rvs
- dist of \(X=\sum_{i=1}^nX_i\), where \(X_i\sim Bern(p_i)\) are independent
- consider \(P_k(j)=\Pr(\sum_{i=1}^kX_i=j)\) and condition on \(X_k\): \(P_k(j)=p_kP_{k-1}(j-1)+(1-p_k)P_{k-1}(j)\) (implemented in the R code below)
- Example 3.29 (P 125): Ignatov’s theorem
- \(X_i\) iid \(F(x),f(x)\)
- \(N_k=\min\{n\geq 2: X_n=\mbox{k-th largest of }X_1,\ldots,X_n\}\)
- \(X_{N_k}\sim F\)
- Consider \(N=N_2\)
- \(A_i=\{X_i\neq \mbox{2nd largest of }X_1,\ldots,X_i\}\), \(\Pr(A_i)=(i-1)/i\)
- \(\Pr(N=n)=\Pr(A_2A_3\ldots A_{n-1}A_n^c)=1/(n(n-1))\)
- \(f_{X_N}(x)=\sum_{n\geq 2}\frac{1}{n(n-1)}f_{X_N|N}(x|n)\)
- \(f_{X_N|N}(x|n)\): density of 2nd largest of \(n\) iid rvs (order statistics)
- so \(f_{X_N}(x)=f(x)\)
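- MC check of \(X_N\sim F\), taking \(F=\) Exp(1) for illustration (\(N\) has infinite mean since \(\Pr(N>n)=1/n\), so an occasional run can be long):
## Ignatov: the value at the first time the newest obs is the 2nd largest has dist F
set.seed(1)
one.run = function(){
  x = rexp(2)
  while(sum(x > x[length(x)]) != 1) x = c(x, rexp(1))   ## stop when the latest obs is 2nd largest
  x[length(x)]
}
xn = replicate(2000, one.run())
ks.test(xn, pexp)   ## consistent with Exp(1)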
## sum of independent bernoulli rvs (Example 3.24)
n = 100; p0 = 0.5; pr = 1:n/n*p0    ## heterogeneous success probs p_i = (i/n)*p0
## MC
B = 1e5; x = matrix(runif(B*n), n,B); x1 = colSums(x<pr)   ## B MC samples of X = sum_i Bern(p_i)
dx1 = diff(c(0, ecdf(x1)(0:n)))     ## MC estimate of the PMF on 0:n
## cond 2: iterate the recursion P_k(j) = p_k*P_{k-1}(j-1) + (1-p_k)*P_{k-1}(j)
pkj = function(k,j, pr, pk1){       ## pk1[j+1] stores P_{k-1}(j)
if(k<=1){
res = j*pr[1] + (1-j)*(1-pr[1])
} else{
if((j>0)&(j<k)){
res = pr[k]*pk1[j] + (1-pr[k])*pk1[j+1]
} else{
if(j==0) res = prod(1-pr[1:k])
if(j==k) res = prod(pr[1:k])
}
}
res
}
pk1 = rep(0, n+1); pk1[1] = 1-pr[1]; pk1[2] = pr[1]   ## initialize with P_1(0), P_1(1)
res = pk1
for(k in 2:n){
for(j in 0:k){
res[j+1] = pkj(k,j,pr,pk1)
}
pk1 = res
}
### Poisson approximation with mean sum(pr)
px1 = dpois(0:n,sum(pr))
### CLT: normal approx
tmp = pnorm(0:n+0.5, sum(pr), sqrt(sum(pr*(1-pr))))
nx1 = c(tmp[1], diff(tmp[-n-1]), 1-tmp[n])   ## continuity-corrected normal probs on 0:n
nx2 = dnorm(0:n, sum(pr), sqrt(sum(pr*(1-pr))))
plot(0:n, res, type='h', lwd=2, ylab='density')
points(0:n, dx1, pch=1, col=2)
points(0:n, px1, pch=2, col=3)
points(0:n, nx1, pch=3, col=4)
## curve(dnorm(x, sum(pr), sqrt(sum(pr*(1-pr)))), add=TRUE, col=4)
c(cor(res, dx1), cor(res, px1), cor(res,nx1), cor(res,nx2))   ## agreement of exact PMF with MC/Poisson/normal approximations
## example output for different settings of p0 (pr = 1:n/n*p0):
## n=100; p0 = 0.25
## [1] 0.9999203 0.9975972 0.9987090 0.9987035
## n=100; p0 = 0.05
## [1] 0.9999932 0.9998898 0.9890155 0.9899978
## n=100; p0 = 0.5
## [1] 0.9999095 0.9885093 0.9997475 0.9997452
## cond 1: plain recursion (no memoization; cost grows combinatorially, only practical for small n)
pkj = function(k,j, pr){
if(k<=1){
res = j*pr[1] + (1-j)*(1-pr[1])
} else{
if((j>0)&(j<k)){
res = pr[k]*pkj(k-1,j-1,pr) + (1-pr[k])*pkj(k-1,j,pr)
} else{
if(j==0) res = prod(1-pr[1:k])
if(j==k) res = prod(pr[1:k])
}
}
res
}
cx1 = rep(0, n+1)
for(i in 0:n) cx1[i+1] = pkj(n,i,pr)   ## exact PMF via the plain recursion (very slow for large n)
- Conjugate dists (3.6.3, P 141)
- \([X|p]\sim Binom(n,p), p\sim Beta(\alpha,\beta)\)
- marginal dist \(\Pr(X=k)=(n;k)/B(\alpha,\beta)\int_p p^{k+\alpha-1}(1-p)^{n-k+\beta-1}dp=(n;k)\frac{B(k+\alpha,n-k+\beta)}{B(\alpha,\beta)}\)
- \(E(X),Var(X)\): direct evaluation …
- \(E(X)=E[E(X|p)]=E[np]=n\alpha/(\alpha+\beta)\)
- \(Var(X)=Var[E(X|p)]+E[Var(X|p)]=Var(np)+E[np(1-p)]=(n^2-n)Var(p)+nE(p)[1-E(p)]=\cdots\)
- conditional dist \([p|X]\propto \Pr(p)\Pr(X|p)\propto p^{X+\alpha-1}(1-p)^{n-X+\beta-1}\)
- i.e., \([p|X]\sim Beta(X+\alpha,n-X+\beta)\)
- for \(\alpha=\beta=1\), \(p\) is uniform rv. and \([p|X]\sim Beta(X+1,n-X+1)\)
- if \([Y|p]\sim Binom(n_1,p)\), and \([X\perp Y|p]\)
- \(\Pr(Y=k|X)=\int_p\Pr(Y=k|p)\Pr(p|X)dp=(n_1;k)\frac{B(k+X+\alpha,n_1-k+n-X+\beta)}{B(X+\alpha,n-X+\beta)}\)
- \(E(Y|X)=E[E[Y|p,X]]=E[E[Y|p]|X]=E[n_1p|X]=n_1\frac{X+\alpha}{n+\alpha+\beta}\)
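- R sketch of the Beta-Binomial updates, assuming a uniform prior and made-up counts (\(n=20,X=14,n_1=10\)):
## posterior [p|X] ~ Beta(X+a, n-X+b) and predictive mean for a future Binom(n1, p)
a = 1; b = 1                       ## uniform prior on p
n = 20; X = 14                     ## observed count (assumed data)
c(post.a = X + a, post.b = n - X + b)
curve(dbeta(x, X + a, n - X + b), 0, 1, xlab='p', ylab='posterior density')
n1 = 10; n1*(X + a)/(n + a + b)    ## E(Y|X) for a future Y ~ Binom(n1, p)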
- 2.6 Moment Generating Functions
- \(\phi(t)=E[e^{tX}]\) for r.v. \(X\)
- \(\phi'(t)=E[\nabla_te^{tX}]=E[Xe^{tX}]\). So \(\phi'(0)=E[X]\)
- In general, for \(n\)-th derivative, \(\phi^{(n)}(t)=E[X^ne^{tX}]\) and \(\phi^{(n)}(0)=E[X^n]\)
- Common dists
- Binom(n,p): \(\phi(t)=\sum_k e^{tk}(n,k)p^k(1-p)^{n-k}=(pe^t+1-p)^n\)
- so \(\phi'(t)=n(pe^t+1-p)^{n-1}pe^t\), and hence \(E[X]=\phi'(0)=np\)
- similarly \(E[X^2]=\phi''(0)=n(n-1)p^2+np\), and \(Var(X)=np(1-p)\)
- Poisson(\(\lambda\)): \(\phi(t)=\sum_n e^{tn}e^{-\lambda}\lambda^n/n!=e^{-\lambda}e^{\lambda e^t}\)
- \(\phi'(t)=\lambda e^te^{\lambda(e^t-1)}\), and \(E[X]=\phi'(0)=\lambda\)
- \(E[X^2]=\phi''(0)=\lambda^2+\lambda\), \(Var(X)=\lambda\)
- Exp(\(\lambda\)): \(\phi(t)=\int e^{tx}\lambda e^{-\lambda x}dx=\lambda/(\lambda-t), t<\lambda\)
- \(\phi'(t)=\lambda/(\lambda-t)^2, \phi''(t)=2\lambda/(\lambda-t)^3\)
- \(E[X]=1/\lambda, E[X^2]=2/\lambda^2, Var[X]=1/\lambda^2\)
- Standard normal \(N(0,1)\): \(\phi(t)=\int e^{tx-x^2/2}/\sqrt{2\pi}dx=e^{t^2/2}\int e^{-(x-t)^2/2}/\sqrt{2\pi}dx=e^{t^2/2}\)
- For \(X\sim N(\mu,\sigma^2)\): write \(X=\mu+\sigma Z\) with \(Z\sim N(0,1)\), so \(\phi(t)=E[e^{t\mu+t\sigma Z}]=e^{t\mu}e^{(t\sigma)^2/2}=e^{t\mu+t^2\sigma^2/2}\)
- so \(E[X]=\mu,Var[X]=\sigma^2\)
- Ex 2.44-46: sum of independent rvs.
- sum of independent \(Binom(n_1,p), Binom(n_2,p)\)
- \(\phi(t)=(pe^t+1-p)^{n_1}(pe^t+1-p)^{n_2}\), i.e., \(Binom(n_1+n_2,p)\)
- sum of independent \(Poisson(\lambda_1), Poisson(\lambda_2)\)
- \(\phi(t)=e^{-\lambda_1}e^{\lambda_1 e^t}e^{-\lambda_2}e^{\lambda_2 e^t}\), i.e., \(Poisson(\lambda_1+\lambda_2)\)
- sum of independent \(N(\mu_1,\sigma_1^2), N(\mu_2,\sigma_2^2)\)
- \(\phi(t)=e^{t\mu_1+t^2\sigma_1^2/2}e^{t\mu_2+t^2\sigma_2^2/2}\), i.e., \(N(\mu_1+\mu_2,\sigma_1^2+\sigma_2^2)\)
- Poisson approx to Binomial
- For \(X\sim Binom(n,p)\) with small \(p\), let \(\lambda=np\),
- \(\Pr(X=k)=\frac{n!}{k!(n-k)!n^k}\lambda^k(1-\lambda/n)^{n-k}\)
- large \(n\), small \(p\): \(\frac{n!}{(n-k)!n^k}\approx 1\), \((1-\lambda/n)^{n-k}\approx e^{-\lambda}\)
- so \(\Pr(X=k)\approx \frac{\lambda^k}{k!}e^{-\lambda}\), i.e., \(X\approx Poisson(\lambda)\)
- independent \(X_i\sim Bern(p_i)\), let \(X=\sum_{i=1}^nX_i\)
- \(E[e^{tX_i}]=1+p_i(e^t-1)\approx e^{p_i(e^t-1)}\) for \(p_i\) small
- so \(E[e^{tX}]\approx e^{\sum_ip_i(e^t-1)}\), i.e., \(X\approx Poisson(\sum_ip_i)\)
- Joint MGF: multivariate normal dist \(X=(X_1,\cdots,X_n)\)
- \(E[e^{\sum_it_iX_i}]=e^{\sum_it_iE(X_i)+\sum_i\sum_jt_it_jCov(X_i,X_j)/2}\)
- \(\sum_it_iX_i\sim N(\sum_it_i\mu_i, T'Cov(X)T)\), where \(T=(t_1,\cdots,t_n)'\)
- 2.8 Limit Theorems
- Markov's Inequality
- \(\Pr[X\geq a] \leq E[X]/a\) if rv \(X\geq 0\)
- \(E[X]=\int_0^a xf(x)dx + \int_a^{\infty}xf(x)dx\geq \int_{x\geq a} af(x)dx=a\Pr[X\geq a]\)
- Chebyshev's Inequality
- \(\Pr[|X-E(X)|\geq k]\leq Var(X)/k^2, k>0\)
- apply Markov's inequality to rv \((X-E(X))^2\)
- Given iid \(X_i\) with mean \(\mu\) and finite variance, let \(\bar{X}_n=\sum_{i=1}^nX_i/n\)
- weak law: \(\lim_{n\to\infty}\Pr(|\bar{X}_n-\mu|>\epsilon)=0, \forall\epsilon>0\) (convergence in probability)
- proof by Chebyshev's inequality, \(Var(\bar{X}_n)\to 0\)
- for large \(n\), \(\bar{X}_n\) is close to \(\mu\), but the weak law alone still allows \(|\bar{X}_n-\mu|>\epsilon\) to occur for infinitely many \(n\) (though the probability of each such event tends to zero)
- Strong Law of Large Numbers: with prob 1, \(\bar{X}_n\to \mu\) as \(n\to\infty\)
- \(\Pr(\lim_{n\to\infty}\bar{X}_n=\mu)=1\)
- \(\forall\epsilon>0, \exists N\) s.t. \(|\bar{X}_n-\mu|<\epsilon, \forall n>N\) (with probability 1)
- Central Limit Theorem
- given iid \(X_i\) with mean \(\mu\) and variance \(\sigma^2\)
- let \(Z_n=\sum_{i=1}^n(X_i-\mu)/\sigma/\sqrt{n}\)
- \(\Pr(Z_n\leq a)\approx \Phi(a)\), where \(\Phi()\) is the CDF of \(N(0,1)\)
- For Binom(n,p), sum of \(n\) iid Bern(p), \((X-np)/\sqrt{np(1-p)}\approx N(0,1)\)
- so \(\Pr(X=k)\approx \Phi[(k+0.5-np)/\sqrt{np(1-p)}]-\Phi[(k-0.5-np)/\sqrt{np(1-p)}]\)
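- R check of the continuity-corrected approximation, using \(n=30,p=0.4\) for illustration:
## normal approximation with continuity correction vs the exact Binomial PMF
n = 30; p = 0.4; k = 0:n
exact  = dbinom(k, n, p)
approx = pnorm(k+0.5, n*p, sqrt(n*p*(1-p))) - pnorm(k-0.5, n*p, sqrt(n*p*(1-p)))
round(cbind(k, exact, approx)[9:17, ], 4)   ## close agreement around the mean np = 12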
- 2.9 Stochastic Processes
- A stochastic process \(\{X(t),t\in T\}\) is a collection of random variables.
- For each \(t\in T\), \(X(t)\) is a random variable.
- The index \(t\) is often interpreted as time.
- we refer to \(X(t)\) as the state of the process at time \(t\).
- \(T\) is called the index set of the process.
- When \(T\) is a countable set, the stochastic process is said to be a discrete-time process.
- If \(T\) is an interval of the real line, the stochastic process is said to be a continuous-time process.
- The state space of a stochastic process is defined as the set of all possible values that the random variables \(X(t)\) can assume.