2014年1月26日 星期日

Parameter Estimation – Cramer Rao Lower Bound

在 estimation theory 中,時常用到的 bound 是 Cramer-Rao Lower Bound, 簡稱 CRLB or CRB.

CRLB 的重要性在於 (i) 這個 lower bound 對不同的 estimators 能做的多好有一個比較的依據; (ii) 更重要的是 CRLB  常常是可以達到的 (也許在某一些條件下如 high SNR or asymptotically).  不像有一些數學上的 bound 只是好看而已。

Some Uses of CRLB

之後再補

What is CRLB?

CRLB is a lower bound on the variance of any unbiased parameter estimator:

image

所以到底 CRLB 為什麼存在以及如何定義?

Score, MLE, Fisher Information, CRLB

如何定義如下:

First, we define log likelihood function  LLF = ln{LF} = ln { p(x;q) } where p(x;q) is the pdf of x, or likelihood function of q.

image

V 稱為 score, 代表 LLF 的 first derivative, 也是 LF 的 sensitivity.

記得 ML estimator: q_hat 滿足 d(LLF)/dq = V = 0 代表找 LF or LLF 的最大值

事實上, ML estimator asymptotically achieves the Fisher information (and the CRLB) in the limit.

另外 under certain regularity condition, the first moment of V (即 expected value) is 0 .

image

The second moment of V is named Fisher Information (always > 0)

image

再回到 CRLB:  直覺來說,若是 LLF vs. q 很集中,estimation 的效果會比較好,Fisher Information (FI) 比較大,CRLB 就會比較小。相反如果 LLF vs. q 很平緩,FI 比較小,CRLB 就會比較大。 衡量集中或平緩的特性就是曲率,粗略來說,CRLB 的定義就是 LLF 曲率的倒數 * –1 (x-1 是轉正值).

注意 variance 大,FI 比較小。這和 Shannon Information (Entropy) 的定義相反。FI 是從估計的角度出發,variance 小則估計準,information 比較多。Shannon Information 是從 surprise 的角度出發,variance 小則沒有 surprise, information 比較少。

image

image

image

上述的第一項是 0 after E{}.   Therefore, the E{curvature (*-1)} is the same as Fisher information

CRLB = 1/(Fisher Inforamtion)

因此曲率大的  LLF 的 information 比較多,或是 CRLB 比較小。Estimator 有機會做的比較準確。

 

另一個 CRLB 有關的定理

image

 

CRLB 例子

Example 1: DC + AWGN

x[n] = A + w[n]      n = 0, 1, …, N-1

LLF function is as follows:

image

First derivative:  V = d(ln p)/dA = 1/sig^2 sum(x[n] – A)  

ML estimator:  A_hat =  1/N sum(x[n])

Second derivative:  d^2(ln p)/d^2 A = –N/sig^2

Fisher Information  I(A) = N / sig^2

CRLB = sig^2/N

image

ML estimator 和 MMSE estimator 相同,同時都可以達成 CRLB!

 

Example 2: Sine Wave + AWGN (only unknow is phase)

image

fo < 1/2 is to keep below Nyquist sampling frequency.

First, the pdf or likelihood function.  This is a general function even when A, fo, and phi are all unknown.

image 

Assuming the only unknown is phase,

image

 

image

image

image

 

The ML estimator looks very complicated (V=0) and no close-form (I think).   The efficient estimator does NOT exist based on the aforementioned theory.

Example 3: Complex Sine Wave + AWGN (amplitude, frequency, and  phase are unknowns)

If amplitude is unknown, 結果和 DC+AWGN 一樣。唯一的差別是 A –> A cos(.), 如果 N 夠大,A^2 –> A^2/2.   The CRLB doubles.

How about if both amplitude and phase are unknow?  It can be shown that amplitude and phase are orthorgonal.  Therefore,  the amplitude and phase estimation CRLB bound can be treated separately.

A more general CRLB with amplitude, frequency, and phase are unknown can be found in D.C. Rife’s paper.   

First, we consider the complex sine wave case, i.e. the sine wave contains both cos and sin.

image

image

可以看到 amplitude and phase 的 Fisher information 分別為 N/sig^2 and N*bo^2/sig^2.  這和之前的結論一致。amplitude 只和 sig^2 and N 有關;phase 和 SNR and N 有關。但是 complex sine wave 的 CRLB 比 sine wave only CRLB 小一半。

如果 frequency 也加入就更有趣。amplitude and phase 正交;amplitude and frequency 正交;但是 frequency and phase 非正交。直覺上 make sense, since frequency is phase derivative.  從 eq. (13) 也可以看出。J12=J21 <> 0

Complex sine wave (includes both sin and cos) 結論如下:

image

image

Amplitude 和 frequency and phase 正交,所以 eq.(18) hold for all cases.  eq.(19) 如上述討論 (frequency is known).  對於 frequency and phase all unknown 的情況最複雜。Basically, amplitude CRLB is proportional to sig^2/N; phase CRLB is proportional to 1/(SNR*N)); frequency is proportional to Fs^2/(SNR*N^3).  Note that N = T * Fs (?)

Example 4: Real Sine Wave + AWGN (amplitude, frequency, and phase are unknowns)

Next, we consider only real sine wave. 

image

image

image

image

The Fisher Information Matrix (FIM) is shown below:

image

Compared with the complex sine wave FIM, some second order items are ignored for simplicity (when N is large). 

image

The CRLB of real sine wave is very similiar to the complex sine wave.

Example 3 and 4 seems to imply that estimate frequency (1/N^3) is a lot more accurate than phase (1/N)!

下面的例子 Example 4 中的 method 3 and 4 似乎利用這個特性假設 perfect frequency estimation.

 

Example 5: Comlex Sine Wave with frequency offset and IQ imbalance (amplitude, frequency, and phase are unknowns)

image

Problem statement: A cos(wt + q) + j B sin(wt + q + f) + w(t) + j w’(t)   Estimate amplitude mismach (A/B) and phase mismatch f.  

The following method 1/2/3 employ a sine wave test tone (as a LIF) and estimate the amplitude and phase mismatch.   

Method 4 uses received preamble containing a known pilot sequence directly and a ZIF receiver to estimate the amplitude and phase mismatch assuming ideal frequency estimation and

Method 1: Separte estimate: A cos(wt + q)+w(t) and  B sin(wt + q + f)+w’(t)

var(A) > 2 sig^2/N   var(B) > 2 sig^2/N    var(A-B) > 4 sig^2/N

  1. var(q) > 4/(SNR*N)    var(q + f) > 4/(SNR*N)     var(f) > 8/(SNR*N)  (?)

 

Method 2: Cross correlation of I1’ and Q1’ 如下圖, then

LPF{(Acos(wt + q)+w(t))(Bsin(wt + q + f)+w’(t))} = AB/2[sin(f)]+Bw(t)sin()+Aw’(t)cos()+w(t)w’(t)

where LPF is simply a summation.

Therefore, it becomes DC + AWGN  (DC = AB/2 sin(f)   AWGN ~ Bw*sin+Aw’cos+w*w’)!  

var(sin(f)) ~ var(f) > 2*sig^2/NAB=2/(N*SNR)    (w*w’ is zero mean, but not Gaussian!)

這時的 CRLB 比較小,同時也不需要 estimate frequency!  How to estimate A/B? same as method 1? or square I and Q then average?

Method 3: Extension of method 2.

image

image

image

image

接下來假設 wd 可以準確 estimate, 以及在 digital domain 可以 create 90度 perfect 的 cos(wd) and sin(wd).  可以得到 II, IQ, QI, QQ.  這和 method 2 不同。method 2 直接 I1’ * Q1’, 因此不需要 estimate wd. 

image

image

image

image

image

 

How to derive the CRB? Wait for next time.  The result should be simiar to the next method 4.

 

Method 4: Let’s assume rRF(t) is a general received signal with preamble containing a known pilot sequence (instead of test tone).  Please refer this article.

image

image

where fd = fo - flo

Assuming ideal frequency estimation (fd = 0!?), g(t) is a match filter, and ideal timing recovery (?),  then

image

image

因此問題變成 given ak, bk, xi, and xq, estimate A, B, q, and f.  一旦 estimate 出 A, B, q, and f,

image

簡單來說,就是先用 preamble 中 known pilot sequency to estimate amplitude mismatch (A/B), phase mismatch(f), and phase offset (q).  再來用 eq (10) corrects mismatch and phase offset.  The CRLB is shown as follows:

image

Es/No = 1/sig^2   CRB(q) = 1/N*SNR   CRB(f) = 2/N*SNR  CRB(A)=CRB(B)=sig^2/N

 

Example 6: how about non-sine,  eye diagram?  harmonic estimation?  very low frequency (close to DC; or Nyquist)?

2014年1月18日 星期六

ML Estimation and MAP Estimation 差異

剛才看了 BAYESIAN ESTIMATION,   這篇文章把 MAP, ML, MMSE, MAVE 解釋和比較的很清楚 under the Bayesian estimation framework.  可以直接參考跳過本文。(Chap4 of “Advanced Digital Signal Processing and Noise Reduction”, Second Edition. by Saeed V. Vaseghi)

ML (Maximum Likelihood) and MAP (Maximum A posteriori Probability) estimation/detection 非常類似,也容易混淆。其實只有一點差異。

image

image

image

(2.1) 是 MAP detection;  (2.3) 是 ML detection.   MAP 和 ML 之間的關係是 (2.2) (Bayes theory). 

簡單來說,ML (eq. 2.3) 和 s 出現什麼的機率無關 P(s was sent).  意即完全不考慮 s 的 pdf; 或者假設所有的 s 有相同的機率。在這情況下,eq. 2.2 and eq. 2.3 are equivalent; ML estimation 等同 MAP estimation.  (P(v is observed) is a constant in eq. 2.2).

In summary,  ML (or MLE) 和 MAP estimator 的關係可以更簡化如下。唯一的差別就是 MAP 多乘了 theta 的  distribution.

 

舉例而言,s=+1 or –1 is a simple Bernoulli distribution. 

P(s=+1)=p  P(s=-1)=1-p=q           n ~ N(0, sig^2)

ML estimator:  s* = arg max P(v is observed | s was sent)

MAP estimator: s* = arg max P(v is observed | s was sent) * P(s was sent)

兩者的差異如下圖:

ML estimator:  v>0  s*=+1;  v<0  s*=-1   regardless p

MAP estimator: v>a  s*=+1;  v<a s*=-1   a depends on p (can be computed numerically)

如果 p=0.5,  a=0 and equivalent to ML estimator.  如果 p>0.5, a 就是負值,  s*=+1 的區間變大,反之亦然。直覺來說,p>0.5 代表 P(s=+1) 的機率比較大,因此在 estimate 時,比較 favor s=+1, 也就是 threshold a<0.

備註_20140118_213756_02(2)

2014年1月17日 星期五

MMSE estimator 和 ML estimator 差異


ML estimator 是 Maximum Likelihood estimator (最有可能)

MMSE estimator 是 Minimum Mean Square Error estimator (最小平方誤差)

嚴格來說,ML estimation 是  classic estimation.  MMSE 是 Bayesian estimation.  請另文參考。本文主要是以應用來比較。

這兩種 estimator 在實際應用上常遇到。 到底差異在那? 以及 when to apply what?
ML example: match filter (maximize decision SNR), digital communication detection (minimum BER- Bit Error Rate), etc.    ML 一般是在 no prior information 時的最佳選擇。

MMSE example: Wiener filter (minimize error, maximize total SNR), Kalman filter (minimize error), etc.   MMSE = min E(|x-x*|^2) 直譯為 minimum mean square.  通常 reserve 給 Bayesian estimation quardratic cost function with prior information.

簡單來說,ML estimator 主要應用是 estimate parameters based on what is most likely to create the oberved samples (但 parameter 本身 no prior information).

MMSE estimator 主要差別是 estimate waveform (Wiener filter) or sequences (Kalman filter) based on samples to minimize a cost function (mean square error function).

當然以上只是粗略的區分。也有 sequence ML estimator (Viterbi decoding algorithm 就是有名的例子)。反之 MMSE 也常用來 estimate parameters, 請 reference 前文 “Kalman filter: MMSE estimator”.

簡單例子 (類比通信和數位通信)

類比通信如 AM 或 FM, 接收時會被 noise 污染,理想的類比接收器就是 MMSE filter (如 Wiener filter), 儘可能還原原始的類比信號。實際上為了成本的考量,一般都用簡單的 LPF.  ML estimator 在此就無用武之地。


數位通信如 PSK 或 FSK, 接收時也會被 noise 污染,但理想的接收器不是儘可能還原原始信號。如下圖所示,重點在 maximize decision moment 時的 SNR, 才能準確判斷收到是 0 or 1. 即使 waveform 有 distortion 也無妨,因為數位通信並不 care waveform. 理想的數位接收器就是 ML estimator (match filer), 能準確判斷收到(或送出)是 0 or 1.   MMSE estimator and ML estimator 在這個例子顯然不同。

image

一個反例 (MMSE and ML estimator 相同結果): Linear Regression

==> 以下的例子並非是 MMSE, least square function or least square estimation (LSE) .  常常容易混淆 MMSE and least square optimzation, 以下誤導大家。

Least square estimation (LSE) and ML estimation 的差異可見本文

在 linear regression estimation, the goal is to estimate y = b_0 + b_1 x 中的 b_0 and b_1

image

注意這是 estimate parameters b_0 and b_1, 不是如 Kalman filter estimate waveform.

image

先 define square error function 如下:

image 

image

image

 

ML 的方法是先 define likelihood function 如下:

image

Then find what theta is most likely given Xi, i.e. find maximum of likelihood function.

image

更常用的是 log-likelihood function, take log of the likelihood function 再微分。

 

Assuming epi_i is Gaussian with i.i.d:

image

How to maximize the likelihood function?  It tunrs out equivalent to minimize

image

 

以下為錯誤結論 :

In this case, MMSE and ML estimation are the same!!

到底  MMSE 和 ML 在 parameter estimation 有什麼不同,還是每次都會有相同的結果?  從上面的例子,MMSE estimator 和 error (epson_i) 的 pdf 沒有關係,但 ML estimator 明顯 depends on pdf. 

舉例而言,如果 eps_i 不是 Gaussian, 而是 exponential distribution, ML 要 maximize likelihood function 就不會 equivalent to minimize “mean absolute error” (沒有 square). 

In general, MMSE estimator 不管 error 的 pdf, 基本上只 care 1st order and 2nd order statistics (mean and variance).   換言之,就是假設 error pdf 是 Gaussian, MMSE estimator 就是 optimal estimator and achieve the Cramer-Rao lower bound (CRLB).

反之,ML estimator 會針對 error pdf (prior information) 做 optimization.  理論上會比 MMSE 更優?  如果 error pdf 是 Gaussian,  MMSE and ML 就會得到相同的結果。

以下為正確的結論:

1. 上述 MMSE 應改為 least square estimation (LSE), 並沒有 mean (prior pdf) 的觀念。

2. ML estimator depends on pdf (稱為 likelihood function). 但 parameter 本身 no prior information.

3. Depending on the likelihood function, ML 需要 optimize 的 function 不同。For Gaussian likelihood function, 即為 least square function.   For other likelihood function,  就會有不同的 minimal function.

 

另一個例子 (Sine Wave Estimation)

Sine wave estimation 有兩種 flavors: (i) Parameter estimation (假設 amplitude, frequency, phase are fixed); (ii) Waveform estimation (amplitude, frequency, phase can be time varying).

For (ii) waveform estimation, 很顯然只適合 MMSE estimator.  For (i) parameter estimation, ML and MMSE 皆可用。

Sine Wave Parameter Estimation

這個例子在 digial signal processing 常常用到。Estimate 一個 sine wave 的頻率和 phase,不過這個 sine wave 被 additive Gaussian noise 所污染。這個問題有很多變形:sine wave 的 IQ imbalance; sine wave carrier synchronization; sine wave 包含了 harmonics; 或是 harmonic level 的 estimation, etc.

基本上有兩種做法:(i) Time domain or (ii) Frequency domain. 

Time Domain Linear Method

Time domain method is mainly based on linear regression estimation.  It shows accurate result at moderate to high SNRs.

Tretter suggests the following method:

image

以上 eq. (3), estimate wo and theta using x_t,  t=0, 1, . ,N-1.  可以証明 ML estimator and MMSE estimator are equivalent.   wo and theta are obtained by the equations in linear regression example.  

There are two problems: (i) eq. (2) is only approximate model and accurate only at moderate to high SNRs.; (ii) (3) needs to do phase unwrapped and it can be error prone.

One variation of the sine wave estimation is coherent sampling.  That is, the sampling frequency is integer multiple of the sine wave.  In this case, the sine wave frequency is known, the only parameter to be estimated is the phase.  Therefore, the phase is inversely proportional to the N^2.  

The following is the math of problem statement, CRLB, and Tretter’s model

image 

image

 

image

 

The following is the Tretter’s model.

image

 

 

 

Tretter’s algorithm involves unwrap(arctan) operation and achieve CRLB at a high SNR.  It can be used for IQ imbalance estimation withour prior knowledge of the frequency.  (IQ is arctan function).

In summary, Tretter’s algorithm

Pros: (i) No FFT, lower complexity;  (ii) Sequence estimation, low latency

Cons: (ii) phase unwrap add additional complexity and may fail at low SNR

 

Kay improves the algorithm by using the differenced phase of two adjacent samples. 

image

It is shown that Tretter and Kay’s algorithms are equivalent via Itoh’s phase unwrapping algorithm.

 

 

Time Domain Non-Linear Method

Zero crossing method, not surprising since arctan is nonlinear function.  We can avoid the arctan by using zero crossing

Can it generalize to non-sine waveform?  square wave or pulse like waveform?  It may be useful for asynchronous eye diagram analysis???

Frequency Domain

D.C. Rife

 

Waveform example:

MMSE is used due to close resamble the original waveform!

It is NOT required to be a fixed sine wave.  Some parameter could be time varying.

Example is like FM demodulation.  How about zero crossing MMSE for FM demodulation?

 

A sin(wt + theta)

For MMSE estimator

For ML estimator, estimate A, w, or theta

追蹤者