MATH 5763 - Stochastic Processes, Section 990 - Fall 2015
Tue 1:30–4:20 p.m., Classroom Building 3104
Instructor:
Nikola Petrov, 802 PHSC, (405)325-4316, npetrov AT math.ou.edu
First day handout
Prerequisite:
Basic calculus-based probability theory (including axioms of probability, random variables,
expectation, probability distributions, independence, conditional probability).
The class will also require knowledge of elementary analysis (including sequences, series, continuity),
linear algebra (including linear spaces, eigenvalues, eigenvectors), and ordinary differential equations.
Course description:
The theory of stochastic processes studies systems that evolve
randomly in time;
it can be regarded as the "dynamical" part of probability theory.
It has many important practical applications,
as well as in other branches
in mathematics such as partial differential equations.
This course is a graduate-level
introduction to stochastic processes,
and should be of interest to students of mathematics,
statistics, physics, engineering, and economics.
The emphasis will be on the fundamental concepts,
but we will avoid using the theory of Lebesgue measure
and integration in any essential way.
Many examples of stochastic phenomena in applications
and some modeling issues will also be discussed in class
and given as homework problems.
Texts (all available free for OU students from the
OU Library):
-
[L]
M. Lefebvre, Applied Stochastic Processes,
1st edition, Springer, 2007
-
[BZ]
Z. Brzezniak, T. Zastawniak, Basic Stochastic Processes, Springer, 1999
-
[P]
E. Parzen, Stochastic Processes, SIAM, 1999
-
[D]
R. Durrett, Essentials of Stochastic Processes, 2nd edition, Springer, 2012
-
[R]
S. Ross, Introduction to Probability Models, 8th edition, Elsevier, 2003
Main topics (a tentative list):
-
a brief review of probability theory;
-
discrete Markov chains: Chapman-Kolmogorov equations,
persistence and transience, generating functions,
stationary distributions, reducibility, limit theorems, ergodicity;
-
continuous-time Markov processes:
Poisson process, birth-death and branching processes,
embedding of a discrete-time Markov chain
in a continuous-time Markov process;
-
conditional expectation, martingales;
-
stationary processes (autocorrelation function,
spectral representation);
-
renewal processes, queues;
-
diffusion processes, Wiener processes (Brownian motion);
-
introduction to stochastic differential equations, Itô calculus;
-
Fokker-Planck equation, Ornstein-Uhlenbeck process.
Homework:
-
Homework 1,
due by 4:30 p.m. on Wednesday, September 2, in Mrs. Wagenblatt's office.
-
Homework 2,
due by 4:30 p.m. on Wednesday, September 9, in Mrs. Wagenblatt's office.
-
Homework 3,
due by 1:30 p.m. on Tuesday, September 15, in class.
-
Homework 4,
due by 4:30 p.m. on Wednesday, September 23, in Ms. Arnett's office.
-
Homework 5,
due on Tuesday, October 6, in class.
-
Homework 6,
due by 4:30 p.m. on Wednesday, October 21, in Mrs. Wagenblatt's office.
-
Homework 7,
due by 12:00 p.m. on Friday, October 30, in Mrs. Wagenblatt's office.
-
Homework 8,
due by 12:00 p.m. on Thursday, November 12, in Mrs. Wagenblatt's office.
-
Homework 9,
due by 12:00 p.m. on Tuesday, November 24, in Mrs. Wagenblatt's office.
Content of the lectures:
-
Lecture 1 (Tue, Aug 25):
Review of probability:
sample space Ω, events as subsets of the sample space,
elementary events as elements of the sample space,
operations with events (complement, union,
intersection, difference, symmetric difference,
subset, impossible event);
σ-algebra (σ-field), examples;
De Morgan's laws, disjoint events,
distributivity properties of intersection and union;
probability (probability measure) P,
probability space,
elementary properties of probability measures
(including the inclusion-exclusion formula);
conditional probability P(A|B),
properties of conditional probability,
partitions of the sample space,
law of total probability, Bayes' formula,
independent events, independent family of events,
pairwise independent family of events;
an example of using conditioning
[pages 1-3, 5-8 of Sec. 1.1 of [L]]
Random variables:
random variables (RVs),
(cumulative) distribution function (c.d.f.) F_X(x)
of a RV X,
properties of c.d.f.s;
discrete RVs, probability mass function (p.m.f.)
p_X(x) of a discrete RV,
important discrete RVs (Bernoulli, binomial, Poisson, geometric)
[pages 8, 9, 11, 12 of Sec. 1.2 of [L]]
-
Lecture 2 (Tue, Sep 1):
Random variables (cont.):
continuous RVs; p.d.f. ƒ_X(x) of a continuous RV;
important continuous RVs (uniform, exponential, normal/Gaussian, standard normal);
p.d.f. ƒ_Y(y) of a function Y = g(X) of the RV X;
conditional c.d.f. F_X(x|A)
of a RV X conditioned on an event A;
conditional p.m.f. p_X(x|A)
or p.d.f. ƒ_X(x|A) of a RV X conditioned on an event A;
expectation E[X] of a RV X;
expectation E[g(X)] of a function of a RV X;
rth moment E[X^r] of a RV X
for r = 0,1,2,...;
variance Var X and standard deviation
σ_X = (Var X)^{1/2}
of a RV X; properties of E[X] and Var X;
conditional expectation E[X|A],
conditional moments E[X^r|A],
and conditional variance Var(X|A)
of a RV X given an event A; example;
characteristic function C_X(ω)
and moment-generating function M_X(t)
of a RV X, expressing E[X^r]
as a derivative of M_X(t)
[pages 10, 11, 13-20 of Sec. 1.2 of [L]]
Random vectors:
definition; (joint) c.d.f. F_{X,Y}(x,y)
of a random vector (X,Y); properties of F_{X,Y}(x,y);
marginal c.d.f.s F_X(x) = F_{X,Y}(x,∞)
and F_Y(y) = F_{X,Y}(∞,y);
marginal p.m.f.s p_X(x) and p_Y(y),
respectively p.d.f.s ƒ_X(x) and ƒ_Y(y),
of a random vector (X,Y);
independence of the components of a random vector;
proof that if the RVs X and Y are independent, then
M_{X+Y}(t) = M_X(t) M_Y(t);
conditional c.d.f. F_{X|Y}(x|Y=y_m)
and conditional p.m.f.
p_{X|Y}(x_k|Y=y_m)
of the discrete RV X conditioned on the discrete RV Y;
conditional c.d.f. F_{X|Y}(x|y)
and conditional p.d.f.
ƒ_{X|Y}(x|y)
of the continuous RV X conditioned on the continuous RV Y;
conditional expectation E[X|Y=y] of the RV X
conditioned on the RV Y;
the conditional expectation E[X|Y=y]
depends only on y (not on X), so it can be regarded as a function of Y;
therefore we can think of the conditional expectation E[X|Y]
as a new random variable which is a function of the RV Y: namely,
E[X|Y]: Ω → R is defined by
E[X|Y](ω) := E[X|Y=y],
where y = Y(ω);
tower rule E[E[X|Y]] = E[X]
[pages 21-27 of Sec. 1.3 of [L]]
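The tower rule above lends itself to a quick numerical check. The following short Python sketch (illustrative only; the model, with Y uniform on {1,...,6} and X given Y = y binomial, is a hypothetical choice, not an example from [L]) verifies E[E[X|Y]] = E[X] by Monte Carlo.

import numpy as np

rng = np.random.default_rng(0)
p, n = 0.3, 200_000

Y = rng.integers(1, 7, size=n)        # Y ~ Uniform{1,...,6}
X = rng.binomial(Y, p)                # given Y = y, X ~ Binomial(y, p)

E_X_given_Y = p * Y                   # E[X|Y] = p*Y is a function of the RV Y
print("E[X]      =", X.mean())            # close to p*E[Y] = 0.3*3.5 = 1.05
print("E[E[X|Y]] =", E_X_given_Y.mean())  # matches E[X], as the tower rule says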
-
Lecture 3 (Tue, Sep 8):
Stochastic processes:
definition of a stochastic process (random process);
classification of random processes:
discrete-time or continuous-time,
discrete-state space or continuous-state space
[pages 47-48 of Sec 2.1 of [L]]
Markov chains - introduction:
Markov property; Markov chain (MC);
example: simple 1-dimensional random walk (RW), symmetric simple 1-dim RW;
the future of a MC depends only on the most recent available information (Prop. 3.1.1);
more examples: 2-dimensional and d-dimensional RWs, the Ehrenfests' urn model,
birth-death processes
[pages 73-75 of Sec. 3.1 of [L]]
Discrete-time Markov chains - definitions and notations:
time-homogeneous discrete-time discrete-state space MCs;
stationary (time-homogeneous) MCs;
one-step and n-step transition probabilities;
one-step transition probability matrix P of a MC;
stochastic and doubly-stochastic matrices;
n-step transition probability matrices P^{(n)};
the Chapman-Kolmogorov equations in matrix form
(P^{(m+n)} = P^{(m)} P^{(n)})
and in components;
corollary: P^{(n)} = P^n;
an example of a MC with 2 states;
probability ρ_{ij}(n)
of visiting state j for the first time
in n steps starting from state i;
probability ρ_{ii}(n)
of first return to state i in n steps;
representation of p_{ij}(n)
as the sum over k from 1 to n of
ρ_{ij}(k) p_{jj}(n−k);
examples of direct computation of ρ_{ij}(n) for a simple MC;
initial distribution (p.m.f.) a = (a_0, a_1, a_2, ...),
a_i = P(X_0 = i), of a MC;
distribution (p.m.f.)
a^{(n)} = (a_0^{(n)}, a_1^{(n)}, a_2^{(n)}, ...),
a_i^{(n)} = P(X_n = i), of a MC at time n;
formula for evolution of the probability distribution:
a^{(n)} = a P^n;
examples: simple 1-dim random walk on Z,
simple 1-dim random walk on Z_+ with reflecting
or absorbing boundary condition at 0,
a Markov chain coming from sums of i.i.d. random variables (read Example 2 on page 85)
[Sec. 3.2.1 of [L]]
Properties of Markov chains:
accessibility of state j from state i, i→j;
communicating states i↔j;
properties of the relation ↔
(reflexivity, symmetry, transitivity),
↔ as an equivalence relation,
equivalence classes with respect to ↔,
closed sets of the state space (Def. 3.2.7)
[pages 85-86 of Sec. 3.2.2 of [L]]
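A minimal numerical sketch in Python of the material above (the 3-state transition matrix is a hypothetical example, not taken from [L]): it checks the Chapman-Kolmogorov corollary P^{(n)} = P^n, evolves an initial distribution via a^{(n)} = a P^n, and tests accessibility i → j by looking for a positive (i,j) entry of some power P^n.

import numpy as np

# hypothetical 3-state chain; state 2 is absorbing
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.0, 1.0]])

P2 = np.linalg.matrix_power(P, 2)
P3 = np.linalg.matrix_power(P, 3)
P5 = np.linalg.matrix_power(P, 5)
print(np.allclose(P5, P2 @ P3))      # Chapman-Kolmogorov: P^(5) = P^(2) P^(3)

a = np.array([1.0, 0.0, 0.0])        # start in state 0
print("a^(5) =", a @ P5)             # distribution after 5 steps

# accessibility 0 -> 2: is (P^n)_{02} > 0 for some n?
print("0 -> 2:", any(np.linalg.matrix_power(P, n)[0, 2] > 0 for n in range(1, 4)))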
-
Lecture 4 (Tue, Sep 15):
Properties of Markov chains (cont.):
irreducible MCs; irreducibility criteria; examples;
absorbing states;
probability ƒ_{ij}
of eventual visit of state j starting from state i;
probability ƒ_{ii}
of eventual return to state i;
expressing ƒ_{ij}
as a sum of the first-visit probabilities
ρ_{ij}(n);
recurrent (persistent) and transient states;
Decomposition Theorem;
an example of identifying closed irreducible sets of recurrent states
and sets of transient states, and structure of the stochastic
matrix; a necessary and sufficient criterion of recurrence
of state i in terms of the expected value E[N_i]
of the number N_i of returns to this state (Prop. 3.2.3);
a necessary and sufficient criterion of recurrence of state i
in terms of the sum of the (i,i) entries of
P^{(n)} over n (Prop. 3.2.4);
recurrence is a class property (Prop. 3.2.5);
average number μ_i of transitions for first
return to state i;
positive recurrent and null-recurrent states;
criterion for null-recurrence;
type of recurrence (positive or null) is a class property;
recurrent states of a finite MC are positive recurrent;
periodic and aperiodic states; remarks about periodicity; examples;
simple random walk on Z: computing the number of itineraries
by using combinatorial arguments, Stirling's formula,
recurrence in the symmetric case (p=1/2)
and transience otherwise
[pages 86-93 of Sec. 3.2.2 of [L]]
Limiting probabilities:
limiting probabilities π_i,
limiting probability distribution
π = (π_0, π_1, π_2, ...),
ergodic states,
Ergodic Theorem (giving conditions for existence and uniqueness
of a limiting probability distribution,
the relation between π_i and
μ_i,
and an algorithm for computing π)
[pages 94-95 of Sec. 3.2.3 of [L]]
Mathematical digression:
computing high powers of a matrix by diagonalizing it first;
general methods for solving linear recurrence relations.
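A small Python sketch of the digression (the 2-state matrix is a hypothetical example): compute a high power of a stochastic matrix by diagonalization, P^n = C D^n C^{-1}, and compare its rows with the limiting distribution π promised by the Ergodic Theorem.

import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])                      # irreducible, aperiodic

evals, C = np.linalg.eig(P)                     # P = C D C^{-1}, D = diag(evals)
n = 50
Pn = C @ np.diag(evals**n) @ np.linalg.inv(C)
print(np.allclose(Pn, np.linalg.matrix_power(P, n)))

# limiting distribution: normalized left eigenvector of P for eigenvalue 1
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()
print("pi   =", pi)                             # here (0.8, 0.2)
print("P^50 =", Pn)                             # each row is close to pi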
-
Lecture 5 (Tue, Sep 22):
a recap of the Ergodic Theorem; examples showing
the importance of the aperiodicity (otherwise
the limit of p_{ij}(n)
as n→∞ may not exist),
the irreducibility (otherwise the stationary distribution
may not be unique), and the recurrence
(otherwise the average number μ_j
of transitions for first return to state j will be infinite,
so that the relation π_j = 1/μ_j
will not make sense); an example of identifying the closed irreducible sets
of recurrent states and of the sets of transient states of a MC,
and the structure of the stochastic matrix of a MC;
stationary distribution of a doubly stochastic irreducible aperiodic MC
with a finite state space (Prop. 3.2.6);
simple RW on {0,1,2,...} with a partially reflecting boundary
- obtaining the stationary distribution π
when the probability of moving to the right is smaller than 1/2,
and showing that a stationary distribution does not exist
when the probability of moving to the right is greater than 1/2
(exercise: study this MC when the probability of moving to the right is equal to 1/2)
[pages 95-100 of Sec. 3.2.3 of [L]]
Absorption problems:
definition of the probability
r_i^{(n)}(C)
of absorption by the closed subset C of the state space S
after exactly n steps (starting from state i);
definition of the probability r_i(C)
of eventual absorption by the closed subset C of the state space S
(starting from state i);
a theorem giving r_i(C)
in terms of the (p_{ij}) (Theorem 3.2.2);
an example: the gambler's ruin problem
(see the numerical sketch after this lecture's summary);
martingales
[Sec. 3.2.4 of [L]]
Continuous-time discrete-state space MCs:
definition of a continuous-time discrete-state space MC;
Markov property;
transition functions
p_{ij}(s,t) = P(X_t = j | X_s = i) for t > s;
stationary (time-homogeneous) MCs - for which
p_{ij}(s,t) = p_{ij}(0, t−s),
notation:
p_{ij}(t) = p_{ij}(0,t) = P(X_t = j | X_0 = i) for t > 0;
a discrete-time MC {Y_n}_{n∈{0,1,2,...}}
embedded in the continuous-time MC {X_t}_{t≥0};
irreducibility; the analogue of the condition of being
a stochastic matrix for p_{ij}(t)
(the sum over j is 1);
evolution of the occupation probabilities
p_j(t) = P(X_t = j)
expressed in terms of the initial occupation probabilities
p_i(0) and the transition probabilities p_{ij}(t);
Chapman-Kolmogorov equations;
discussion of the meaning of the memorylessness properties of the geometric
and the exponential random variables (Prop. 3.3.1)
[pages 121-123 of Sec. 3.3.2 and pages 109-110 of Sec. 3.3.1 of [L]]
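The numerical sketch of the gambler's ruin problem referred to above (the parameters N = 10 and p = 0.45 are hypothetical): solve the linear system for the probabilities r_i of eventual absorption at N and compare with the classical closed-form answer.

import numpy as np

N, p = 10, 0.45
q = 1 - p

# boundary conditions r_0 = 0, r_N = 1; interior: r_i = p*r_{i+1} + q*r_{i-1}
A = np.zeros((N + 1, N + 1))
b = np.zeros(N + 1)
A[0, 0] = 1.0
A[N, N] = 1.0; b[N] = 1.0
for i in range(1, N):
    A[i, i - 1], A[i, i], A[i, i + 1] = -q, 1.0, -p
r = np.linalg.solve(A, b)

i = np.arange(N + 1)
r_exact = (1 - (q / p)**i) / (1 - (q / p)**N)   # closed form, valid for p != 1/2
print(np.allclose(r, r_exact))                  # True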
-
Lecture 6 (Tue, Sep 29):
Continuous-time discrete-state space MCs (cont.):
exponential random variable: definition,
proof that it is memoryless, moment-generating function,
other properties (Prop. 3.3.4 and the remarks after it, Prop. 3.3.5)
[pages 109-111, 112-114 of Sec. 3.3.1 of [L]]
Poisson process:
counting process;
"little o(h)" notation, examples;
definition of a Poisson process N as a nondecreasing process with N(0)=0,
certain short-time transition probabilities
p_{ij}(h) (for small h),
and independence of events occurring at a later time interval
from the events occurring at a non-overlapping earlier time interval;
derivation of the distribution of N(t)
for a Poisson process N by deriving an initial-value problem
for an infinite system of ODEs for p_{ij}(t)
and solving the system by induction and by the method of generating functions;
an "elementary" way of deriving that N(t) ∼ Poisson(λt)
by dividing the interval [0,t] into a large number n of short intervals
of length t/n and applying the binomial distribution
to the number k of events occurring in the n short intervals
[loosely following pages 231, 232, 236 of Sec. 5.1 of [L]]
Poisson process and distribution of interarrival times:
independence and exponential distribution of the interarrival times τ_j
of a Poisson process (Prop. 5.1.3);
arrival times T_j as sums of interarrival times;
basic properties of the Γ(α,λ) random variables;
the sum of n i.i.d. Exp(λ) random variables is a Γ(n,λ)
random variable (Prop. 3.3.6);
reconstructing a Poisson process from the interarrival times τ_j
[pages 237-240 of Sec. 5.1, pages 115-119 of Sec. 3.3.1 of [L]]
Miscellaneous facts about Poisson processes (optional):
a sum of independent Poisson processes is a Poisson process (Prop. 5.5.1);
decomposition of Poisson processes (Prop. 5.5.2);
distribution of T_1 given that N(t)=1 (Prop. 5.1.5);
generating a Poisson process by using a generator of uniform random variables (Example 5.1.2);
first occurrence of two or more independent Poisson processes (∼Prop. 3.3.4, Prop. 3.3.5)
[pages 234-236, 239, 240 of Sec. 5.1, pages 113, 114 of Sec. 3.3.1 of [L]]
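A short Python sketch related to the last point (generating a Poisson process), with hypothetical parameters: build arrival times from i.i.d. Exp(λ) interarrival times and check that N(t) has the Poisson(λt) mean, variance, and p.m.f.

import math
import numpy as np

rng = np.random.default_rng(1)
lam, t, n_paths = 2.0, 3.0, 100_000

tau = rng.exponential(1 / lam, size=(n_paths, 40))   # 40 interarrival times: plenty for t = 3
T = np.cumsum(tau, axis=1)                           # arrival times T_j = tau_1 + ... + tau_j
N_t = (T <= t).sum(axis=1)                           # N(t) = number of arrivals in [0, t]

print("mean N(t) =", N_t.mean(), " lambda*t =", lam * t)
print("var  N(t) =", N_t.var())
print("P(N(t)=6) =", (N_t == 6).mean(),
      " Poisson p.m.f.:", math.exp(-lam * t) * (lam * t)**6 / math.factorial(6))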
-
Lecture 7 (Tue, Oct 6):
Continuous-time discrete-state space MCs (cont.):
stochastic semigroup {P_t}_{t≥0};
standard semigroups;
generator
G = (ν_{ij}) := (dP_t/dt)|_{t=0}
of a stochastic semigroup;
properties of G
(the sum of the elements ν_{ij} in each row of G is zero);
holding time U_i of the ith state;
proof that U_i is an Exp(−ν_{ii}) random variable;
discussion of the meaning of ν_{ij}
expressed in terms of the rates ν_i
(such that the probability of leaving state i in a time interval of length h
is ν_i h + o(h))
and the matrix elements γ_{ij} of the 1-step transition
probability matrix of the jump chain {Y_n}_{n≥0};
obtaining the 1-step transition probability matrix (γ_{ij})
of the jump chain from the generator G;
obtaining P_t from G:
Kolmogorov forward and backward equations
P_t' = P_t G,
resp.
P_t' = G P_t,
initial condition
P_t|_{t=0} = I;
definition of the exponential of a matrix, e^A;
main properties:
e^{sA} e^{tA} = e^{(s+t)A},
d(e^{tA})/dt = A e^{tA};
computing e^A by simplifying A by a similarity
transformation, e.g., A = C^{-1} D C
for a diagonal matrix D,
and using that A^n = C^{-1} D^n C
to show that e^A = C^{-1} e^D C;
expressing the solution of the initial value problem
x'(t) = A x(t), x(0) = x^{(0)}
(for a constant matrix A) as
x(t) = e^{tA} x^{(0)};
expressing the stochastic semigroup P_t
through the generator G: P_t = e^{tG};
computing P_t for a continuous-time, two-state MC
(see the numerical sketch after this lecture's summary);
remarks on the Laplace transform and its use to solve initial-value problems for ODEs;
birth process; computing the expectation of the time T_n
for a birth process starting at X_0 = 1 to reach
X_t = n for the first time;
solving the Kolmogorov forward equations for the birth process by using generating functions
G_i(ξ,t);
using the generating function G_i(ξ,t)
to prove that the birth process is honest (G_i(1,t) = 1),
to compute the conditional average of the birth process given that X(0) = k
as E[X(t)|X(0)=k] = (∂G_k(ξ,t)/∂ξ)|_{ξ=1}
and the conditional variance Var(X(t)|X(0)=k);
another way of computing the conditional expectation E[X(t)|X(0)=k]
[roughly following Sec. 3.3.3, and pages 129-133 of Sec. 3.3.4 of [L]]
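The numerical sketch referred to above: for a two-state continuous-time MC with (hypothetical) jump rates a = 1.5 (state 0 to 1) and b = 0.7 (state 1 to 0), compute P_t = e^{tG} with a matrix exponential and compare with the well-known closed-form transition probabilities.

import numpy as np
from scipy.linalg import expm

a, b = 1.5, 0.7
G = np.array([[-a,  a],
              [ b, -b]])               # generator: each row sums to zero

t = 0.8
Pt = expm(t * G)                       # P_t = e^{tG}

e = np.exp(-(a + b) * t)               # closed form: p_00(t) = (b + a*e)/(a+b), etc.
Pt_exact = np.array([[b + a * e, a - a * e],
                     [b - b * e, a + b * e]]) / (a + b)
print(np.allclose(Pt, Pt_exact))       # True; rows of Pt sum to 1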
-
Lecture 8 (Tue, Oct 13):
Limiting probabilities and balance equations:
stationary distribution π
of a stochastic semigroup P_t;
reason for the term "stationary distribution":
if P(X(0)=i) = π_i,
where π is a stationary distribution,
then
P(X(t)=j) = π_j
for all j∈S and all t≥0;
recurrence time T_{ii},
mean recurrence time μ_{ii} = E[T_{ii}];
recurrent and transient states,
positive recurrent and null recurrent states of a continuous-time Markov chain;
irreducible Markov chains;
Ergodic Theorem for continuous-time Markov processes, remarks;
relation between the stationary distribution π_j,
the rate ν_j of leaving state j
(where the holding time for state j is U_j ∼ Exp(ν_j)),
and the mean recurrence time μ_{jj} = E[T_{jj}];
finding stationary distributions from the generator:
πG = 0 (see the numerical sketch after this lecture's summary);
balance equations and their interpretation
[pages 138-140 of Sec. 3.3.5 of [L]]
Birth and death processes:
birth-death-immigration-disaster process;
detailed derivation of the short-time transition probabilities p_{ij}(h)
(hence, of the generator G = (ν_{ij})) of a death-immigration process;
proving that the stationary distribution of a death-immigration process
is Poisson(ρ/μ), i.e.,
π_j = e^{−ρ/μ} (ρ/μ)^j / j!
[roughly following pages 135, 136 of Sec. 3.3.4 of [L]]
Nonhomogeneous Poisson processes:
nonhomogeneous Poisson process with intensity function λ(t);
the number of arrivals N(s+t)−N(s)
of a nonhomogeneous Poisson process with intensity function λ(t)
is Poisson(m(s+t)−m(s)),
where m(t) is the mean value function of the process
(m(0)=0, m'(t)=λ(t))
[pages 250, 252 of Sec. 5.2 of [L]]
Reading assignment (mandatory):
the p.d.f. of the first arrival time T_1
of a nonhomogeneous Poisson process {N(t)}_{t≥0}
given that N(t_1) = 1 (for some fixed t_1 > 0)
(Prop. 5.5.2) [pages 21, 22 of the lecture notes from Lecture 8, page 253 of [L]]
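The numerical sketch referred to above (hypothetical rates ρ = 2, μ = 1): find the stationary distribution of a death-immigration process from its generator by solving πG = 0 (truncating the state space at a large K), and compare with the Poisson(ρ/μ) distribution.

from math import exp, factorial
import numpy as np

rho, mu, K = 2.0, 1.0, 30
G = np.zeros((K + 1, K + 1))
for j in range(K + 1):
    if j < K:
        G[j, j + 1] = rho              # immigration: j -> j+1 at rate rho
    if j > 0:
        G[j, j - 1] = j * mu           # deaths: j -> j-1 at rate j*mu
    G[j, j] = -G[j].sum()              # rows of the generator sum to zero

# solve pi G = 0 together with sum_j pi_j = 1 (least squares on the stacked system)
A = np.vstack([G.T, np.ones(K + 1)])
b = np.zeros(K + 2); b[-1] = 1.0
pi = np.linalg.lstsq(A, b, rcond=None)[0]

poisson = np.array([exp(-rho / mu) * (rho / mu)**j / factorial(j) for j in range(K + 1)])
print(np.abs(pi - poisson).max())      # tiny (only the truncation at K matters)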
-
Lecture 9 (Tue, Oct 20):
Nonhomogeneous Poisson processes (cont.):
"homogenizing" a nonhomogeneous Poisson process N(t)
(with a strictly positive rate function λ(t) > 0)
by rescaling the time:
M(t) = N(m^{-1}(t))
is a Poisson process with rate 1
[pages 253, 254 of Sec. 5.2 of [L]]
Compound Poisson processes:
compound random variable;
derivation of the mean, variance, and moment generating function
of a compound random variable (Prop. 5.3.1);
definition of a compound Poisson process;
mean, variance, and moment generating function
of a compound Poisson process;
approximating the distribution of a compound Poisson process
for large times by using the Central Limit Theorem (Prop. 5.3.2);
the sum of two independent compound Poisson processes
Y_1(t) and Y_2(t)
corresponding to Poisson processes
N_1(t) and N_2(t)
with rates λ_1 and λ_2
is a compound Poisson process
corresponding to a Poisson process with rate
λ_1 + λ_2
(a numerical illustration of compound Poisson processes follows this lecture's summary)
[Sec. 5.3 of [L]]
Doubly stochastic Poisson processes:
definition of a conditional (or "mixed") Poisson process
(whose rate is a random variable, independent of time);
proof that a conditional Poisson process
has stationary, but not independent increments (Prop. 5.4.1);
best estimator of the rate of a Poisson process;
definition of a doubly stochastic Poisson process ("Cox process")
[pages 258-260, 262 of Sec. 5.4 of [L]]
Renewal processes:
definition of a renewal process;
modified ("delayed") renewal process
(when the distribution of τ_0
differs from the distributions of
τ_1, τ_2, ...);
relations between the process N(t),
the times of the events T_n,
and the interevent times τ_n;
expression for the p.m.f. of N(t)
in terms of the c.d.f. of T_n (Prop. 5.6.1);
renewal function m(t) = E[N(t)];
expression for the renewal function m(t)
in terms of the c.d.f.'s of T_n (Prop. 5.6.2)
[pages 267-269 of Sec. 5.6 of [L]]
Mathematical digression:
definition of Riemann integral;
definition of the Riemann-Stieltjes integral;
particular cases of the Riemann-Stieltjes integral
when g(t) is differentiable and non-decreasing,
and when g(t) is piecewise constant;
applications to computing expected values of discrete
and continuous random variables.
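A Monte Carlo sketch of the compound Poisson process discussed above (the jump-size distribution N(2, 0.25) and the rates are hypothetical choices): check E[Y(t)] = λt E[X_1] and Var Y(t) = λt E[X_1^2] from Prop. 5.3.1.

import numpy as np

rng = np.random.default_rng(2)
lam, t, n_paths = 1.5, 4.0, 50_000
mu_X, sd_X = 2.0, 0.5                  # jump sizes X_j ~ N(2, 0.25)

N = rng.poisson(lam * t, size=n_paths)                           # N(t) ~ Poisson(lam*t)
Y = np.array([rng.normal(mu_X, sd_X, size=k).sum() for k in N])  # Y(t) = X_1 + ... + X_{N(t)}

print("E[Y(t)]  =", Y.mean(), " lam*t*E[X]   =", lam * t * mu_X)
print("Var Y(t) =", Y.var(),  " lam*t*E[X^2] =", lam * t * (mu_X**2 + sd_X**2))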
-
Lecture 10 (Tue, Oct 27):
Mathematical digression (cont.):
expected value of an N-valued random variable X
as a sum (over n from 1 to infinity)
of the probabilities of X being greater than or equal to n;
expected value of a non-negative continuous random variable
X as an integral of
[1 − F_X(x)],
geometric meaning.
Renewal processes (cont.):
recursive formula for the c.d.f.'s of the arrival times T_n
in terms of the c.d.f. F_τ of the inter-arrival
times τ_j through Riemann-Stieltjes integrals;
derivation of an integral equation for the renewal function
m(t) = E[N(t)];
solving renewal-type equations by using the Laplace transform;
another derivation of the formula for the renewal function
m(t) by performing a Laplace transformation on the formula
representing m(t)
as a sum of the c.d.f.'s of all the T_n's;
computing the renewal function m(t)
for a Poisson process in three ways:
(1) by using the fact that N(t) is a Poisson(λt) random variable,
(2) by expressing it as a sum of the c.d.f.'s of T_n (Prop. 5.6.2),
(3) by solving the integral equation for m(t) using the Laplace transform;
the moment-generating function M_X
of a (0,∞)-valued random variable X: Ω → (0,∞)
is equal to the Laplace-Stieltjes transform of the c.d.f. F_X
of X and, if the random variable X is continuous,
equal to the Laplace transform of the p.d.f. ƒ_X of X.
Queues:
set-up of the problem, examples of queues
(queues with baulking, multiple servers, airline check-in, FIFO, LIFO,
group service, "student discipline", "continental queueing");
A/S/s/c/p/D classification of the queues,
where A and S are deterministic (D),
Markovian
(M - with exponentially distributed interarrival/service times),
Γ (or Erlang), or general (G) distributions,
s is the number of servers, c is the capacity of the system,
p is the size of the population, D is the discipline (i.e., service policy);
stability of a queue;
a detailed solution of the M(λ)/M(μ)/1 queue:
obtaining the stationary distribution π
(and the condition for existence of a stationary distribution),
computing the probability that the waiting time W will be zero,
the conditional average of W given that there are j customers
in the queue, and the average E[W]
(both averages for large t so that the stationary distribution
has been reached);
M(λ)/G/1 queue - construction of a discrete-time Markov
chain embedded in the queueing process
and derivation of the transition probability matrix of this Markov chain.
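A simulation sketch of the M(λ)/M(μ)/1 queue solved above (hypothetical rates λ = 1, μ = 1.5): simulate the chain through its jump chain and exponential holding times, and compare the long-run fraction of time spent with j customers to the stationary distribution π_j = (1 − ρ)ρ^j, ρ = λ/μ.

import numpy as np

rng = np.random.default_rng(3)
lam, mu = 1.0, 1.5
rho = lam / mu

T_total, t, x = 50_000.0, 0.0, 0
time_in_state = {}
while t < T_total:
    rate = lam if x == 0 else lam + mu            # total rate of leaving state x
    hold = rng.exponential(1 / rate)              # holding time ~ Exp(rate)
    time_in_state[x] = time_in_state.get(x, 0.0) + hold
    t += hold
    x = x + 1 if rng.random() < lam / rate else x - 1   # arrival vs departure

for j in range(5):
    print(j, round(time_in_state.get(j, 0.0) / t, 4), round((1 - rho) * rho**j, 4))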
-
Lecture 11 (Tue, Nov 3):
General properties of stochastic processes:
cumulative distribution function
F(x_1,...,x_k; t_1,...,t_k) = F_{X(t_1),...,X(t_k)}(x_1,...,x_k),
probability mass function
p(x_1,...,x_k; t_1,...,t_k) = p_{X(t_1),...,X(t_k)}(x_1,...,x_k),
and probability density function
ƒ(x_1,...,x_k; t_1,...,t_k) = ƒ_{X(t_1),...,X(t_k)}(x_1,...,x_k)
of order k of a stochastic process
X = {X_t : t ∈ [0,∞)};
mean
m_X(t) = E[X_t],
autocorrelation function
R_X(t_1,t_2) = E[X_{t_1} X_{t_2}],
autocovariance function
C_X(t_1,t_2) = R_X(t_1,t_2) − m_X(t_1) m_X(t_2),
variance
Var X(t) = C_X(t,t),
and autocorrelation coefficient
ρ_X(t_1,t_2)
of a stochastic process X;
processes with independent increments;
processes with stationary increments;
strict-sense stationary (SSS, strongly stationary) processes;
wide-sense stationary (WSS, weakly stationary) processes;
an example of a WSS stochastic process that is not SSS;
average power E[X_t^2] of a stochastic process;
E[X_t^2] of a WSS stochastic process
does not depend on t;
spectral density S_X(ω) of a WSS process;
properties of S_X(ω)
[Sec. 2.1 and 2.2 of [L]]
Gaussian and Markov processes:
multinormal distribution of a random vector
X = (X_1,...,X_n) ∼ N(m,K),
vector of the means m, covariance matrix
K = (cov(X_i,X_j));
characteristic function φ_X(ω) = E[exp(iωX)]
of a random variable X,
(joint) characteristic function
φ_X(ω) = E[exp(i ω⋅X)]
of a multinormal random vector X
(Prop. 2.4.1);
if two components of
X = (X_1,...,X_n) ∼ N(m,K)
are uncorrelated, then they are independent;
Gaussian process {X_t}
- a continuous-time stochastic process with
(X_{t_1},...,X_{t_n})
being multinormal for any n and times
t_1,...,t_n;
if {X_t} is a Gaussian process
such that its mean m_X(t)
does not depend on t
and its autocovariance function
C_X(t_1,t_2)
depends only on t_2 − t_1,
then the process is SSS (Prop. 2.4.2);
definition of a Markov (or Markovian) process, examples
(random walk, Poisson process);
(first-order) density function
ƒ(x;t) = ƒ_{X(t)}(x);
conditional transition density function
p(x,x_0;t,t_0) = ƒ_{X(t)|X(t_0)}(x|x_0);
the integrals of ƒ(x;t)
and
p(x,x_0;t,t_0)
over x are equal to 1;
expressing ƒ(x;t)
as an integral of
ƒ(x_0;t_0) p(x,x_0;t,t_0)
over x_0;
more on the meaning of the p.d.f. of a continuous RV:
P(X ∈ (x, x+Δx]) ≈ ƒ_X(x) Δx,
generalization for jointly continuous random vectors
P(X ∈ A) ≈ ƒ_X(x) vol(A),
where A is a small domain in R^k
containing x;
application to kth order p.d.f.'s of a random process:
P(X_{t_1} ∈ (x_1, x_1+Δx_1], ..., X_{t_k} ∈ (x_k, x_k+Δx_k]) ≈ ƒ_{X_{t_1},...,X_{t_k}}(x_1,...,x_k) Δx_1 ⋯ Δx_k;
Chapman-Kolmogorov equations for the
conditional transition density function
p(x,x_0;t,t_0) = ƒ_{X_t|X_{t_0}}(x|x_0)
[pages 58-63 of Sec. 2.4 of [L]]
A digression on generalized functions (distributions):
test functions (infinitely smooth compactly supported functions);
Dirac δ-function δ_a,
defined by δ_a(ƒ) := ƒ(a);
derivatives of generalized functions - defined by applying integration by parts,
treating the generalized function as a regular function
and using that for a test function ƒ
lim_{x→∞} ƒ(x) = 0
and
lim_{x→−∞} ƒ(x) = 0
(because ƒ has compact support):
this gives us
that the integral of ƒ times the kth derivative
ξ^{(k)} of a generalized function ξ
is equal to (−1)^k times the
integral of ξ times ƒ^{(k)},
which symbolically can be written as
ξ^{(k)}(ƒ) := (−1)^k ξ(ƒ^{(k)});
following this recipe, the derivatives of δ_a
are defined by δ_a'(ƒ) := −ƒ'(a),
δ_a''(ƒ) := (−1)^2 ƒ''(a),
and in general
δ_a^{(k)}(ƒ) := (−1)^k ƒ^{(k)}(a);
example: the generalized derivative
of the Heaviside (unit step) function:
H_a' = δ_a.
-
Lecture 12 (Tue, Nov 10):
A digression on generalized functions (distributions) (cont.):
interpretation of generalized functions as a "rough" signal,
and of the test function as a "smoothing" function
corresponding to the "smearing" due to the experimental device.
The Wiener process (Brownian motion):
normal (Gaussian) random variables N(μ,σ^2)
- p.d.f., mean, variance, characteristic function,
standard normal random variable Z ∼ N(0,1),
which can be obtained from X ∼ N(μ,σ^2)
as Z = (X − E[X])/σ_X;
computing the characteristic function of a symmetric simple random walk
on the state space ηZ = {..., −2η, −η, 0, η, 2η, ...},
with jumps (of size η) occurring as a Poisson process with rate λ/2,
proof that in the limit η→0, λ→∞, λη^2 = 1,
X_t ∼ N(0,t);
definition of a Brownian motion/Wiener process
W_t ∼ N(0, σ^2 t)
and a standard Wiener process
B_t ∼ N(0,t);
the Wiener process as a limit of simple random walk;
historical remarks (Robert Brown, Albert Einstein,
Marian Smoluchowski, Norbert Wiener, Andrey Kolmogorov);
p.d.f. of order k of a Wiener process;
moments of W_t:
mean E[W_t] = 0,
autocovariance function
C_W(t,s) = E[W_t W_s] = σ^2 min(t,s),
autocorrelation function
R_W(t,s) = E[W_t W_s] = σ^2 min(t,s);
proof that t W_{1/t}
is a Brownian motion (Example 4.1.2)
[pages 173-179 of Sec. 4.1 of [L]]
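A Python sketch (illustrative, not from [L]) of the standard Wiener process as a limit of simple random walk: B_t is approximated by (ξ_1 + ... + ξ_⌊nt⌋)/√n with i.i.d. ±1 steps, and the moments E[B_t] = 0, Var B_t = t, Cov(B_s, B_t) = min(s,t) are checked by Monte Carlo.

import numpy as np

rng = np.random.default_rng(4)
n, n_paths, T = 500, 10_000, 2.0            # n random-walk steps per unit time

steps = rng.choice([-1.0, 1.0], size=(n_paths, int(n * T)))
B = np.cumsum(steps, axis=1) / np.sqrt(n)   # B at times 1/n, 2/n, ...

s, t = 0.7, 1.6
Bs, Bt = B[:, int(n * s) - 1], B[:, int(n * t) - 1]
print("E[B_t]        =", Bt.mean())
print("Var B_t       =", Bt.var(), "  (t =", t, ")")
print("Cov(B_s, B_t) =", np.mean(Bs * Bt), "  (min(s,t) =", s, ")")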
-
Lecture 13 (Tue, Nov 17):
σ-algebras and probability measures:
sample space Ω; outcome (elementary event) ω - an element of Ω;
σ-algebra F of subsets of Ω;
event - an element of F; examples;
Borel σ-algebra B(R) of subsets of R;
sub-σ-algebra G ⊆ F
of F;
σ-algebra σ(A_1, A_2, ...)
generated by a collection A_1, A_2, ... of subsets of Ω;
examples;
probability measure P: F → [0,1] on (Ω,F);
Lebesgue measure L:B([0,1])→[0,1] on [0,1] defined by
L((a,b))=b−a;
probability space (Ω,F,P);
F-measurable functions
X:Ω→R - for which
{X∈B}∈F for all
B∈B(R);
random variable on (Ω,F)
- an F-measurable function X:Ω→R;
an example: a σ(∅)-measurable function is a constant function;
more examples;
σ-algebra σ(X) generated by a random variable X;
σ-algebra σ(F_1, ..., F_n)
generated by a collection of σ-algebras;
σ-algebra σ(X_1, ..., X_n)
generated by a family of random variables
X_1, ..., X_n;
filtration
F_1 ⊆ F_2 ⊆ F_3 ⊆ ... of σ-algebras generated by a sequence
X_1, X_2, X_3, ...
of functions X_k: Ω → R,
where F_k = σ(X_1, ..., X_k);
example: filtration of σ-algebras generated by a sequence of coin tosses;
distribution (cumulative distribution function, c.d.f.)
F_X(x) = P(X ≤ x) = P({ω∈Ω : X(ω) ≤ x}) of a random variable X;
expectation E[X] of a random variable X as an integral over R
of x dF_X(x)
or, equivalently, as an integral over Ω of X(ω) P(dω);
integrable (L^1-) random variables (for which E[|X|] < ∞).
Conditional expectation and martingales:
conditional expectation E[X|A]
of a random variable X conditioned on an event A;
conditional expectation E[X|F]
of a random variable X conditioned on a σ-algebra F;
conditional expectation E[X|Y]
of a random variable X conditioned on another random variable Y;
discussion of the meaning of the filtration
F_1 ⊆ F_2 ⊆ F_3 ⊆ ...
with
F_n = σ(X_1, ..., X_n)
in the context of "coin tossing" (where X_n is the result of the nth toss)
- F_n represents our knowledge at time n;
a sequence Y_1, Y_2, ... of random variables
adapted to the filtration
F_1 ⊆ F_2 ⊆ F_3 ⊆ ...
- each Y_n is
F_n-measurable
(i.e., can be determined from the values of the random variables
X_1, ..., X_n
generating the σ-algebra
F_n = σ(X_1, ..., X_n));
an example - the running averages S_n;
martingale M_1, M_2, ...
with respect to a filtration F_1 ⊆ F_2 ⊆ F_3 ⊆ ...
- a sequence of L^1-random variables adapted to the filtration
and such that E[M_{n+1}|F_n] = M_n;
an example of a martingale - the positions
Y_1, Y_2, ... of a particle in a simple symmetric random walk;
a continuous-time example of a martingale - for a Poisson process {N_t}_{t≥0}
with intensity λ,
M_t = N_t − λt is a martingale.
-
Lecture 14 (Tue, Nov 24):
Conditional expectation and martingales (cont.):
example - exponential martingale
exp(αB_t − α^2 t/2),
obtaining a family of polynomial martingales from the Taylor
expansion of the exponential martingale with respect to α around α = 0:
exp(αB_t − α^2 t/2) = 1 + B_t α + (1/2)(B_t^2 − t)α^2 + (1/6)(B_t^3 − 3tB_t)α^3 + (1/24)(B_t^4 − 6tB_t^2 + 3t^2)α^4 + (1/120)(B_t^5 − 10tB_t^3 + 15t^2 B_t)α^5 + ...
The Wiener process (Brownian motion) (cont.):
a brief review - definition of W_t and B_t,
increments, moments
(E[W_t^{odd power}] = 0,
Var W_t = E[W_t^2] = σ^2 t,
E[W_t^4] = 3σ^4 t^2,
E[W_t^6] = 15σ^6 t^3, ...),
autocorrelation and autocovariance functions,
probability density function
ƒ(x_1,...,x_k; t_1,...,t_k) = ƒ_{W(t_1),...,W(t_k)}(x_1,...,x_k)
of order k expressed as a product of the p.d.f.
ƒ(x_1; t_1) = ƒ_{W(t_1)}(x_1)
and the conditional p.d.f.'s
ƒ_{W(t_j)|W(t_{j−1})}(x_j|x_{j−1}) for j = 2,...,k,
explicit expressions for all p.d.f.'s;
short-time behavior: for Δt > 0 and
ΔB_t := B_{t+Δt} − B_t,
computing
E[(ΔB_t)^{odd power}] = 0 and
E[(ΔB_t/Δt)^2] = 1/Δt → ∞
as Δt → 0^+,
nondifferentiability of the Brownian motion;
Gaussian white noise ξ_t := dB_t/dt;
making sense of the derivative dB_t/dt
as a (random) generalized function acting on a test function φ,
definition of a functional Ξ(φ) as an integral of ξ_t φ(t)
over t from 0 to ∞
(meaning: a measurement "smeared by φ"),
and a proof that E[Ξ(φ)] = 0 and that E[Ξ(φ)^2]
equals the integral of φ(t)^2 over t from 0 to ∞,
interpreting these facts as
E[ξ_t] = 0 and E[ξ_t ξ_s] = δ(t−s).
Stochastic differential equations (SDEs):
the standard Brownian motion can be considered as the solution
of the initial value problem
dB_t/dt = ξ_t, B_0 = 0
for the unknown function B_t whose evolution
is driven by Gaussian white noise ξ_t;
on the meaning of an SDE - computing the transition probability density
ƒ_{B_t|B_s}(x|y) = ƒ(x,y|t,s)
for 0 ≤ s < t
as a solution of an initial-value problem for a partial differential equation,
e.g., ∂_t ƒ(x,y|t,s) = (1/2) ∂_{xx} ƒ(x,y|t,s),
with the limit of ƒ(x,y|t,s) as t → s^+ equal to δ(x−y);
a generalization:
dX_t/dt = ƒ(t,X_t) + g(t,X_t) ξ_t;
discretization by using the values at the left end:
ΔX_t ≈ ƒ(t,X_t) Δt + g(t,X_t) ΔB_t,
X_{t+Δt} = X_t + ΔX_t
(similar to the Euler method for integration of ODEs);
the main reason for using this discretization - the increment ΔB_t
is independent of the values of X_t and B_t;
Itô integrals as a limit (in some sense) of left Riemann sums.
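A sketch of the left-endpoint (Euler) discretization just described, in Python; the drift ƒ(t,x) = −x and noise coefficient g(t,x) = 1 below are hypothetical choices (an Ornstein-Uhlenbeck-type equation), not an example prescribed in [L].

import numpy as np

rng = np.random.default_rng(5)

def euler_maruyama(f, g, x0, T, dt):
    """Left-endpoint discretization dX ~ f(t,X)dt + g(t,X)dB of an SDE."""
    n_steps = int(T / dt)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        t = k * dt
        dB = rng.normal(0.0, np.sqrt(dt))          # increment dB ~ N(0, dt)
        x[k + 1] = x[k] + f(t, x[k]) * dt + g(t, x[k]) * dB
    return x

path = euler_maruyama(lambda t, x: -x, lambda t, x: 1.0, x0=2.0, T=5.0, dt=0.001)
print(path[::1000])          # decays toward 0, with fluctuations of size ~ 1/sqrt(2)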
-
Lecture 15 (Tue, Dec 1):
Stochastic differential equations and Itô integrals:
using left Riemann sums to approximate the solution of the SDE
dX_t = ƒ(t,X_t) dt + g(t,X_t) dB_t;
definition and examples of the L^1-limit
and the m.s.-limit (mean-square limit, L^2-limit) of sequences of random variables;
definition of the Itô integral as the m.s.-limit of the left Riemann sums
∑_i g(t_i, X_i) ΔB_i;
useful facts for calculations:
E[(ΔB_i)^{odd power}] = 0,
E[(ΔB_i)^2] = Δt_i,
E[(ΔB_i)^4] = 3(Δt_i)^2,
E[g(t_i, B_i) ΔB_i] = 0,
E[g(t_i, B_i)(ΔB_i)^2] = E[g(t_i, B_i)] Δt_i,
E[(ΔB_i)^k (ΔB_j)^m] = E[(ΔB_i)^k] E[(ΔB_j)^m] for i ≠ j;
computing the Itô integral from t_0 to t of B_s dB_s;
writing the result about the integral in the form
d(B_t^2) = 2 B_t dB_t + dt; similar results for
d(B_t^k) for k = 3, 4, 5, ...;
Itô formula for dΨ(t,X_t),
where Ψ(t,x) is a function of two variables
and X_t satisfies the SDE
dX_t = ƒ(t,X_t) dt + g(t,X_t) dB_t; mnemonic rules for deriving the formula;
remarks on the meaning of the solution X_t of an SDE;
non-anticipating functions and expectation of Itô integrals;
properties of Itô integrals; correlation formula; Itô isometry;
example - simple population growth at a noisy rate:
dX_t/dt = (r + α ξ_t) X_t
or, equivalently,
dX_t = r X_t dt + α X_t dB_t; obtaining the solution X_t = X_0 e^{(r − α^2/2)t + αB_t} by using the Itô formula; computing the average
E[X_t] = E[X_0] e^{rt};
discussion of the behavior of the solutions for
r > α^2/2 and for r < α^2/2;
remarks about the interpretation of the numerical simulations of the SDE.
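A Python sketch of the "noisy population growth" example above (hypothetical parameters r = 0.1, α = 0.3): compare an Euler path of dX_t = rX_t dt + αX_t dB_t with the exact solution X_t = X_0 e^{(r−α^2/2)t+αB_t} driven by the same noise, and check E[X_t] = E[X_0] e^{rt} by Monte Carlo.

import numpy as np

rng = np.random.default_rng(6)
r, alpha, x0, T, dt = 0.1, 0.3, 1.0, 1.0, 1e-3
n = int(T / dt)

dB = rng.normal(0.0, np.sqrt(dt), size=n)
x = x0
for k in range(n):                                   # Euler scheme on one path
    x += r * x * dt + alpha * x * dB[k]
x_exact = x0 * np.exp((r - alpha**2 / 2) * T + alpha * dB.sum())
print("Euler:", x, " exact:", x_exact)               # close for small dt

BT = rng.normal(0.0, np.sqrt(T), size=200_000)       # B_T ~ N(0, T)
XT = x0 * np.exp((r - alpha**2 / 2) * T + alpha * BT)
print("E[X_T] =", XT.mean(), " x0*exp(r*T) =", x0 * np.exp(r * T))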
-
Lecture 16 (Tue, Dec 8):
Stochastic differential equations and Itô integrals (cont.):
using the exponential martingale to analyze the average of the population
in the problem of simple population growth at a noisy rate
("geometric Brownian motion"), computing the variance of the population at time t;
example: linear growth with noise that is proportional to the population:
dX_t = dt + X_t dB_t,
solving the problem by "integration by parts" in stochastic calculus,
d(X_t Y_t) = X_t dY_t + Y_t dX_t + (dX_t)(dY_t),
computing the mean and the variance of the solution;
meaning and derivation of the Fokker-Planck equation for the conditional transition
density function p(x, x_0; t, t_0);
solution of the Fokker-Planck equation for the standard Brownian motion,
physical interpretation of the solution;
solution of the Fokker-Planck equation for the geometric Brownian motion
(lognormal distribution);
idea of the Stratonovich integral.
Grading:
Your grade will be determined by your performance
on the following coursework:
Homework (lowest grade dropped) | 50%
Take-home midterm exam | 20%
Take-home final exam | 30%
Homework:
Homework assignments will be given
regularly throughout the semester and will be posted on this web-site.
The homework will be due at the start of class on the due date.
Each homework will consist of several problems,
of which some pseudo-randomly chosen problems will be graded.
Your lowest homework grade will be dropped.
All homework should be written on 8.5"×11" paper
with your name clearly written, and should be stapled.
No late homework will be accepted!
You are encouraged to discuss the homework problems
with other students.
However, you have to write your solutions clearly
and in your own words - this is the only way to
achieve real understanding!
It is advisable that you first write a draft
of the solutions and then copy them neatly.
Please write the problems in the same order
in which they are given in the assignment.
There is no need to type the homework, but please use your best handwriting!
Exams:
There will be one take-home midterm and a comprehensive take-home final.
All tests must be taken at the scheduled times, except in extraordinary circumstances.
Please do not arrange travel plans that will prevent you
from taking any of the exams at the scheduled time.
Good to know: