ENGG5301 Information Theory 2025 Midterm Exam P3: Causal Encoding
The problem is reconstructed from memory; the solution was written by GPT-5.
During the exam I went off track right at part (1), and seeing GPT-5's answer afterwards was infuriating: failing to think of the inequality sank parts (1)(2)(3) in a row, and as a result I never even got to (4) and (5).
My inequality skills, which have been terrible since middle school, struck again; but I honestly never put much effort into the textbook or exercises for this course, so this was about what I expected.
Problem
Let random variables \(X_1,\dots,X_n\) be i.i.d. with distribution \(p_X\).
We define an encoding sequence \(M_1,\dots,M_n\) subject to the following causality constraints:
- \(M_i\) is a function of \(X_1,\dots,X_i\), i.e., \(M_i = f_i(X_1,\dots,X_i)\);
- \(X_i\) is a function of \(M_1,\dots,M_i\), i.e., \(X_i = g_i(M_1,\dots,M_i)\).
Answer the following:
- Find the minimum value of \(\sum_{i=1}^n H(M_i)\) in terms of \(p_X\).
- Show that, under the optimal encoding, \(M_1,\dots,M_n\) are mutually independent.
- Show that, under the optimal encoding, \(H(M_i \mid X_i) = 0\).
- If we drop optimality but keep the causality and prefix-free constraints, show that there exists an encoding such that
\[\mathbb{E}\!\left(\frac{1}{n}\sum_{i=1}^n L(M_i)\right) < H(p_X) + 1. \]
- Under the same causality and prefix-free constraints, show that for any \(\varepsilon>0\), there exists an encoding such that
\[\mathbb{E}\!\left(\frac{1}{n}\sum_{i=1}^n L(M_i)\right) < H(p_X) + \varepsilon. \]
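To make the two constraints concrete before the solution, here is a minimal Python sketch of the encoder/decoder interface they describe; the function names and the identity scheme \(M_i = X_i\) are just an illustration, not part of the problem.

```python
# Minimal sketch of the causal encoding interface (identity scheme for illustration).
from typing import Any, List


def encode_causally(xs: List[Any]) -> List[Any]:
    """M_i = f_i(X_1, ..., X_i); here simply M_i = X_i."""
    messages = []
    for i in range(len(xs)):
        seen = xs[: i + 1]          # the encoder may only look at X_1..X_i
        messages.append(seen[-1])   # identity: emit X_i itself
    return messages


def decode_causally(ms: List[Any]) -> List[Any]:
    """X_i = g_i(M_1, ..., M_i); here simply X_i = M_i."""
    decoded = []
    for i in range(len(ms)):
        seen = ms[: i + 1]          # the decoder may only look at M_1..M_i
        decoded.append(seen[-1])    # identity: recover X_i = M_i
    return decoded


xs = ["a", "b", "b", "a", "c"]
assert decode_causally(encode_causally(xs)) == xs
```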
Solution
(a) Minimum of \(\sum_i H(M_i)\) under optimal encoding
Result:
\[\min \sum_{i=1}^{n} H(M_i) = n\,H(p_X).\]
Proof:
By the data processing inequality and lossless reconstruction (each \(X_i\) is a function of \(M_1,\dots,M_i\)),
\[\sum_{i=1}^{n} H(M_i) \ge H(M_1,\dots,M_n) \ge H(X_1,\dots,X_n).\]
Since \(X_1,\dots,X_n\) are i.i.d.,
\[H(X_1,\dots,X_n) = n\,H(p_X).\]
Thus,
\[\sum_{i=1}^{n} H(M_i) \ge n\,H(p_X).\]
Choosing \(M_i = X_i\) (or \(M_i = f(X_i)\) for any injective \(f\)), i.e., encoding each symbol separately, achieves \(H(M_i) = H(X)\) and satisfies both causality and recoverability, hence the bound is tight.
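As a numerical sanity check (with a made-up toy \(p_X\)), the sketch below computes \(H(p_X)\), shows the per-symbol encoding meeting the bound \(n\,H(p_X)\) with equality, and shows that a redundant but still causal and decodable encoding \(M_i = (X_{i-1}, X_i)\) overshoots it.

```python
# Numerical illustration of part (a) on a toy alphabet (p_X is made up here).
# The per-symbol encoding M_i = X_i attains sum_i H(M_i) = n * H(p_X),
# while a redundant causal encoding M_i = (X_{i-1}, X_i) exceeds the bound.

from math import log2

p_X = {"a": 0.5, "b": 0.25, "c": 0.25}   # assumed toy distribution
n = 10


def entropy(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)


H = entropy(p_X)
print(f"H(p_X) = {H:.4f}, lower bound n*H(p_X) = {n * H:.4f}")

# M_i = X_i: each H(M_i) = H(p_X), so the sum meets the bound with equality.
print(f"identity encoding:  sum H(M_i) = {n * H:.4f}")

# M_1 = X_1 and M_i = (X_{i-1}, X_i) for i >= 2: H(M_i) = 2 H(p_X) by independence,
# so the sum is strictly larger than n * H(p_X) -- the bound is only a minimum.
redundant = H + (n - 1) * 2 * H
print(f"redundant encoding: sum H(M_i) = {redundant:.4f}")
```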
(b) Independence of \(M_1,\dots,M_n\) under optimal encoding
Result:
Under the optimal encoding, \(M_1,\dots,M_n\) are mutually independent (and thus pairwise independent).
Proof:
Equality in part (a) requires that
\[\sum_{i=1}^{n} H(M_i) = H(M_1,\dots,M_n).\]
Equality between the sum of marginal entropies and the joint entropy holds if and only if the variables are mutually independent.
Hence, the optimal \(M_1,\dots,M_n\) are independent.
(c) Determinism: \(H(M_i\mid X_i)=0\) under optimal encoding
Result:
Under the optimal scheme, \(M_i\) is a deterministic function of \(X_i\).
Proof:
From the reconstruction condition, \(X_i\) is a function of \(M_1,\dots,M_i\), so
\[H(X_i \mid M_1,\dots,M_i) = 0.\]
Since \(X_i\) is independent of the previous messages \((M_1,\dots,M_{i-1})\) (they are functions of \(X_1,\dots,X_{i-1}\) alone),
\[H(X_i \mid M_1,\dots,M_{i-1}) = H(X_i).\]
Hence the mutual information satisfies
\[I(X_i; M_i \mid M_1,\dots,M_{i-1}) = H(X_i \mid M_1,\dots,M_{i-1}) - H(X_i \mid M_1,\dots,M_i) = H(X_i).\]
On the other hand,
\[I(X_i; M_i \mid M_1,\dots,M_{i-1}) = H(M_i \mid M_1,\dots,M_{i-1}) - H(M_i \mid X_i, M_1,\dots,M_{i-1}).\]
Under the optimal encoding, \(H(M_i \mid M_1,\dots,M_{i-1}) = H(X_i)\) (by the independence from part (b), together with \(H(M_i) = H(X)\) forced by equality in part (a)), so
\[H(M_i \mid X_i, M_1,\dots,M_{i-1}) = 0.\]
Because \(M_i\) is independent of \((M_1,\dots,M_{i-1})\), it follows that
\[H(M_i \mid X_i) = H(M_i \mid X_i, M_1,\dots,M_{i-1}) = 0.\]
Therefore, each \(M_i\) is a deterministic function of \(X_i\).
(d) Existence of a causal, prefix-free encoding with average length \(< H(p_X)+1\)
We want to construct \(M_1,\dots,M_n\) satisfying causality and a prefix-free constraint, such that
\[\mathbb{E}\!\left(\frac{1}{n}\sum_{i=1}^n L(M_i)\right) < H(p_X) + 1.\]
Construction (symbol-by-symbol / Huffman coding):
- Since \(X_1,\dots,X_n\) are i.i.d., we can encode each symbol independently.
- Build a prefix-free code \(C\) (e.g., Huffman code) for \(p_X\).
- For each \(i\), set
\[M_i = C(X_i).\]
This clearly satisfies the causality constraints:
- \(M_i\) depends only on \(X_i\) (and trivially on \(X_1,\dots,X_{i-1}\)), so \(M_i = f_i(X_1,\dots,X_i)\);
- \(X_i\) can be decoded from \(M_i\) alone, so \(X_i = g_i(M_1,\dots,M_i)\).
Average length: by the standard bound for Huffman coding,
\[\mathbb{E}[L(M_i)] = \mathbb{E}[L(C(X_i))] < H(p_X) + 1 \quad \text{for each } i.\]
Hence
\[\mathbb{E}\!\left(\frac{1}{n}\sum_{i=1}^n L(M_i)\right) = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[L(M_i)] < H(p_X) + 1.\]
This proves part (d). ∎
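Here is a small Python sketch of this construction: it builds a binary Huffman code for an assumed toy \(p_X\) (the `huffman_lengths` helper is illustrative, written from scratch rather than taken from any library) and checks \(H(p_X) \le \mathbb{E}[L] < H(p_X) + 1\).

```python
# Sketch of the symbol-by-symbol construction in part (d): build a Huffman
# code for an assumed toy p_X and check E[L(C(X))] < H(p_X) + 1.

import heapq
from math import log2

p_X = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.1}   # assumed toy distribution


def huffman_lengths(p):
    """Return codeword lengths of a binary Huffman code for distribution p."""
    # heap of (probability, counter, set_of_symbols); the counter breaks ties
    heap = [(q, i, {s}) for i, (s, q) in enumerate(p.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in p}
    counter = len(heap)
    while len(heap) > 1:
        q1, _, s1 = heapq.heappop(heap)
        q2, _, s2 = heapq.heappop(heap)
        for s in s1 | s2:          # every symbol under the merged node
            lengths[s] += 1        # gains one more bit of depth
        heapq.heappush(heap, (q1 + q2, counter, s1 | s2))
        counter += 1
    return lengths


H = -sum(q * log2(q) for q in p_X.values())
L = huffman_lengths(p_X)
avg_len = sum(p_X[s] * L[s] for s in p_X)
print(f"H(p_X) = {H:.4f}, E[L] = {avg_len:.4f}")
assert H <= avg_len < H + 1   # standard Huffman bound
```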
(e) Existence of a causal, prefix-free encoding with average length \(< H(p_X)+\varepsilon\)
To achieve \(H(p_X)+\varepsilon\), we need block coding.
- Treat the whole sequence \(X^n=(X_1,\dots,X_n)\) as a block.
- Use an \(n\)-symbol prefix-free code \(C_n\) (e.g., a Huffman code on the block alphabet, or arithmetic coding), which satisfies
\[\mathbb{E}\big[L(C_n(X^n))\big] < H(X^n) + 1.\]
Since \(X^n\) is i.i.d., \(H(X^n) = n\,H(p_X)\), so
\[\mathbb{E}\big[L(C_n(X^n))\big] < n\,H(p_X) + 1.\]
- Implement \(C_n\) sequentially (arithmetic coding emits its output incrementally, symbol by symbol):
- Let \(M_1,\dots,M_n\) be the incremental outputs, so that the concatenation \(M_1 M_2 \cdots M_n = C_n(X^n)\).
- Then causality is preserved:
- \(M_i\) depends on \(X_1,\dots,X_i\).
- \(X_i\) can be recovered from \(M_1,\dots,M_i\).
- Average length per symbol:
\[\mathbb{E}\!\left(\frac{1}{n}\sum_{i=1}^n L(M_i)\right) = \frac{1}{n}\,\mathbb{E}\big[L(C_n(X^n))\big] < H(p_X) + \frac{1}{n}.\]
- By choosing \(n > 1/\varepsilon\), we guarantee
\[\mathbb{E}\!\left(\frac{1}{n}\sum_{i=1}^n L(M_i)\right) < H(p_X) + \frac{1}{n} < H(p_X) + \varepsilon.\]
This proves part (e). ∎
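To see the \(1/n\) redundancy shrink numerically, the sketch below Huffman-codes blocks of \(k\) i.i.d. symbols from an assumed toy \(p_X\) (block Huffman stands in here for the arithmetic coder above; the per-block bound \(< H + 1\) is the same) and prints the per-symbol expected length, which stays below \(H(p_X) + 1/k\).

```python
# Sketch of the rate argument in part (e): per-symbol expected length of an
# optimal prefix-free code on blocks of k symbols is below H(p_X) + 1/k.
# (p_X is an assumed toy distribution.)

import heapq
from itertools import product
from math import log2

p_X = {"a": 0.6, "b": 0.3, "c": 0.1}


def huffman_avg_length(p):
    """Expected codeword length of a binary Huffman code for p."""
    heap = [(q, i, {s}) for i, (s, q) in enumerate(p.items())]
    heapq.heapify(heap)
    depth = {s: 0 for s in p}
    counter = len(heap)
    while len(heap) > 1:
        q1, _, s1 = heapq.heappop(heap)
        q2, _, s2 = heapq.heappop(heap)
        for s in s1 | s2:          # symbols under the merged node gain one bit
            depth[s] += 1
        heapq.heappush(heap, (q1 + q2, counter, s1 | s2))
        counter += 1
    return sum(p[s] * depth[s] for s in p)


H = -sum(q * log2(q) for q in p_X.values())
print(f"H(p_X) = {H:.4f}")

for k in (1, 2, 3, 4):
    # distribution of a block of k i.i.d. symbols
    block = {blk: 1.0 for blk in product(p_X, repeat=k)}
    for blk in block:
        for s in blk:
            block[blk] *= p_X[s]
    rate = huffman_avg_length(block) / k
    print(f"k = {k}: per-symbol E[L] = {rate:.4f}  (< H + 1/k = {H + 1/k:.4f})")
```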
