Proof of the KYP lemma

This page is incomplete and will be updated continuously.

This post summarizes a proof of the KYP (Kalman–Yakubovich–Popov) lemma, one of the most important results in control theory. The lemma first appeared as a tool for the absolute stability problem, where the solvability of the Lur'e resolving equations was characterized by a frequency-domain inequality. Today, the KYP lemma is a powerful tool for the analysis and synthesis of control systems.

Let \(A \in \mathbb{R}^{n \times n}\), \(B \in \mathbb{R}^{n \times m}\), \(C \in \mathbb{R}^{p \times n}\), and \(D \in \mathbb{R}^{p \times m}\) be fixed. These can be considered as the coefficient matrices of the LTI system described by

\[ \begin{aligned} \dot{x}(t) &= Ax(t) + Bu(t), \\ y(t) &= Cx(t) + Du(t), \end{aligned} \]

where \(x(t)\), \(u(t)\), and \(y(t)\) are the state, input, and output, respectively. The associated transfer matrix is given by the matrix of proper rational functions

\[ \mathbf{G}(s) = C (sI - A)^{-1} B + D. \]
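As a quick numerical aside, \(\mathbf{G}(s)\) can be evaluated pointwise from the state-space data. The following is a minimal NumPy sketch; the matrices are arbitrary illustrative values, not taken from any particular system.

```python
import numpy as np

# Arbitrary example data (n = 2 states, m = 1 input, p = 1 output).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

def G(s):
    """Evaluate G(s) = C (sI - A)^{-1} B + D at a complex point s."""
    n = A.shape[0]
    return C @ np.linalg.solve(s * np.eye(n) - A, B) + D

print(G(1j * 1.0))  # frequency response at omega = 1
```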

The KYP lemma states the equivalence between a frequency-domain property of the transfer matrix and a linear matrix inequality (LMI) involving its state-space realization. The statement is easy to grasp, but the proof is nontrivial. In this post, I want to summarize some ideas of the proof. An excellent proof based on linear algebra and convex analysis was given by Rantzer in this paper. For classical and modern perspectives, see this paper and this paper.

Key lemmas from linear algebra

We recall some preliminary results from Rantzer's paper.

Lemma 1. Let \(F,G \in \mathbb{C}^{m \times n}\). Then, the following statements are true:
(1) \(FF^* = GG^*\) if and only if there exists a matrix \(U\) such that \(UU^* = I\) and \(F = GU\).
(2) \(FG^* + GF^* = 0\) if and only if there exists a matrix \(U\) such that \(UU^* = I\) and \(F (I + U) = G (I - U)\).

Proof. Item (2) follows from item (1) with \(F\) and \(G\) replaced by \(G - F\) and \(F + G\), respectively: the difference \((G-F)(G-F)^* - (F+G)(F+G)^*\) equals \(-2(FG^* + GF^*)\), and the unitary \(U\) with \(G - F = (F + G)U\) provided by item (1) rearranges to \(F(I + U) = G(I - U)\). Thus, we prove only item (1).
First, we assume that \(m \le n\). In this case, instead of \(F\) and \(G\), we consider the enlarged square \(n \times n\) matrices

\[ \bar{F} := \begin{bmatrix} F \\ 0 \end{bmatrix}, \quad \bar{G} := \begin{bmatrix} G \\ 0 \end{bmatrix}. \]

The polar decompositions of \(\bar{F}\) and \(\bar{G}\) yield

\[ \begin{aligned} \bar{F} &= P_1 U_1, \quad P_1 \succeq 0, \quad U_1 \text{ unitary}, \\ \bar{G} &= P_2 U_2, \quad P_2 \succeq 0, \quad U_2 \text{ unitary}. \end{aligned} \]

The matrices \(P_1\) and \(P_2\) are uniquely determined by \(P_1 = (\bar{F} \bar{F}^*)^{1/2}\) and \(P_2 = (\bar{G} \bar{G}^*)^{1/2}\), respectively. Then, it is not difficult to see that

\[ FF^* = GG^* \iff \bar{F} \bar{F}^* = \bar{G} \bar{G}^* \iff P_1 = P_2 \iff \bar{F} = \bar{G} U \iff F = GU, \]

where \(U := U_2^* U_1\).
Next, we assume that \(m > n\). In this case, there exists a unitary matrix \(U_0\) such that

\[ F = U_0 \begin{bmatrix} F_0 \\ 0 \end{bmatrix} \]

with \(F_0 \in \mathbb{C}^{n \times n}\).

To confirm this, we note that the SVD of \(F\) yields

\[ F = U_0 \begin{bmatrix} \Delta_F \\ 0 \end{bmatrix} V_0^*, \quad \Delta_F := \mathrm{diag}(\sigma_1(F),\ldots,\sigma_n(F)), \]

where \(U_0\) and \(V_0\) are unitary matrices. Thus, we set \(F_0 := \Delta_F V_0^*\). Correspondingly, we decompose \(G\) as

\[ G = U_0 \begin{bmatrix} G_0 \\ \bar{G}_0 \end{bmatrix}. \]

Then,

\[ FF^* = GG^* \iff \begin{bmatrix} F_0 F_0^* & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} G_0 G_0^* & G_0 \bar{G}_0^* \\ \bar{G}_0 G_0^* & \bar{G}_0 \bar{G}_0^* \end{bmatrix}. \]

This means that \(\bar{G}_0 \bar{G}_0^* = 0\), i.e., \(\bar{G}_0 = 0\) must hold, together with \(F_0 F_0^* = G_0 G_0^*\). Since \(F_0\) and \(G_0\) are square, the case already proven applies: the condition is equivalent to the existence of a matrix \(U\) such that \(UU^* = I\) and \(F_0 = G_0 U\). Equivalently, we have \(F = GU\). (QED)
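The construction used in the case \(m \le n\) is easy to reproduce numerically. Below is a minimal sketch (assuming NumPy and SciPy are available) that recovers a unitary \(U\) with \(F = GU\) from the left polar decompositions, exactly as in the proof.

```python
import numpy as np
from scipy.linalg import polar

rng = np.random.default_rng(0)
m, n = 2, 4

# Build F and G with FF* = GG* by construction: F = G Q with Q unitary.
G = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
Q, _ = np.linalg.qr(rng.standard_normal((n, n))
                    + 1j * rng.standard_normal((n, n)))
F = G @ Q

# Enlarge to square matrices and take left polar decompositions,
# Fbar = P1 U1 and Gbar = P2 U2 (scipy's side='left' returns A = P U).
Fbar = np.vstack([F, np.zeros((n - m, n))])
Gbar = np.vstack([G, np.zeros((n - m, n))])
U1, P1 = polar(Fbar, side='left')
U2, P2 = polar(Gbar, side='left')

U = U2.conj().T @ U1          # the unitary constructed in the proof
print(np.allclose(F, G @ U))  # True up to numerical tolerance
```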

Lemma 2. Let \(M,N \in \mathbb{C}^{m \times n}\). If \(NWM^* + MWN^* = 0\) for some Hermitian \(W \in \mathbb{C}^{n \times n}\) with \(W \succeq 0\), then there exist \(w_1,\ldots,w_n \in \mathbb{C}^n\) such that

\[ W = \sum_{k=1}^n w_k w_k^* \]

and

\[ N w_k w_k^* M^* + M w_k w_k^* N^* = 0, \quad k \in \{1,\ldots,n\}. \]
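Lemma 2 is stated here without proof; see Rantzer's paper. Its converse direction is immediate, and a quick numerical check illustrates the setting in which the lemma is used later, with \(M = \begin{bmatrix} I & 0 \end{bmatrix}\) and \(N = \begin{bmatrix} A & B \end{bmatrix}\). This is a minimal NumPy sketch with arbitrary example data.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2

A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
M = np.hstack([np.eye(n), np.zeros((n, m))])  # M = [I 0]
N = np.hstack([A, B])                         # N = [A B]

# Assemble W from rank-one terms w w* that each satisfy the constraint:
# for w = (x, u) with j*omega*x = A x + B u, we have N w = j*omega*(M w),
# so N (w w*) M* + M (w w*) N* = 0 term by term.
W = np.zeros((n + m, n + m), dtype=complex)
for omega in [0.5, 1.0, 2.0]:
    u = rng.standard_normal((m, 1))
    x = np.linalg.solve(1j * omega * np.eye(n) - A, B @ u)
    w = np.vstack([x, u])
    W += w @ w.conj().T

# The aggregate W then satisfies the hypothesis of Lemma 2; the lemma
# asserts the nontrivial converse decomposition.
print(np.allclose(N @ W @ M.conj().T + M @ W @ N.conj().T, 0))
```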

Elimination of \(\omega\)

Lemma 3. Let \(\xi,\eta \in \mathbb{C}^n\). Then, \(\xi \eta^* + \eta \xi^* = 0\) if and only if \(\eta = 0\) or \(\xi = j \omega \eta\) for some \(\omega \in \mathbb{R}\).

Proof. Apply item (2) of Lemma 1 with \(F = \xi\) and \(G = \eta\), viewed as matrices with a single column. Then there exists a scalar \(U = e^{j\theta}\) such that \(\xi (1 + e^{j\theta}) = \eta (1 - e^{j\theta})\). If \(\theta = \pi \pmod{2\pi}\), this forces \(\eta = 0\). Otherwise, \(\xi = \frac{1 - e^{j\theta}}{1 + e^{j\theta}} \eta = -j \tan(\theta/2) \eta\), which is of the form \(j \omega \eta\) with \(\omega = -\tan(\theta/2) \in \mathbb{R}\). The converse direction is immediate. (QED)
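When \(\eta \ne 0\), the frequency can be recovered explicitly: multiplying \(\xi = j \omega \eta\) from the left by \(\eta^*\) gives \(\omega = \mathrm{Im}(\eta^* \xi)/\|\eta\|^2\). A minimal NumPy check:

```python
import numpy as np

rng = np.random.default_rng(2)
n, omega = 4, 1.7

eta = rng.standard_normal((n, 1)) + 1j * rng.standard_normal((n, 1))
xi = 1j * omega * eta

# xi eta* + eta xi* = 0 for this pair ...
print(np.allclose(xi @ eta.conj().T + eta @ xi.conj().T, 0))

# ... and omega is recovered as Im(eta* xi) / ||eta||^2.
omega_rec = (np.imag(eta.conj().T @ xi) / np.linalg.norm(eta) ** 2).item()
print(np.isclose(omega_rec, omega))
```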

Nonstrict inequalities

In the following, let \(\Pi \in \mathbf{S}^{n+m}\) be a given real symmetric matrix, where \(\mathbf{S}^k\) denotes the set of \(k \times k\) real symmetric matrices.

KYP Lemma: Assume that \((A,B)\) is controllable. Then, the following three assertions are equivalent:
(1) The inequality

\[ \begin{bmatrix} (j \omega I - A)^{-1} B \\ I \end{bmatrix}^* \Pi \begin{bmatrix} (j \omega I - A)^{-1} B \\ I \end{bmatrix} \preceq 0 \]

holds for all \(\omega \in \mathbb{R}\) such that \(\det (j \omega I - A) \ne 0\).
(2) There exists a real symmetric matrix \(P\) such that

\[ \begin{bmatrix} A^\prime P + PA & PB \\ B^\prime P & 0 \end{bmatrix} + \Pi \preceq 0. \]

(3) There exist real matrices \(P\), \(K\), and \(L\) with \(P\) symmetric such that

\[ \begin{aligned} A^\prime P + PA + K^\prime K + \Pi_{11} &= 0, \\ B^\prime P + L^\prime K + \Pi_{12}^\prime &= 0, \\ L^\prime L + \Pi_{22} &= 0, \end{aligned} \]

where \(\Pi\) is partitioned into \(\Pi = \begin{bmatrix} \Pi_{11} & \Pi_{12} \\ \Pi_{12}^\prime & \Pi_{22} \end{bmatrix}\) conformably with \((x,u)\).
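Before turning to the proof, the equivalence between items (1) and (2) can be explored numerically: solve the LMI of item (2) as a semidefinite feasibility problem and check the frequency-domain inequality of item (1) on a grid. The following is a minimal sketch assuming NumPy and CVXPY (with an SDP-capable solver) are installed; the system and the bounded-real-type \(\Pi\) are arbitrary illustrative choices.

```python
import numpy as np
import cvxpy as cp

# Arbitrary stable example (n = 2, m = 1) and a bounded-real-type
# Pi = [[C'C, C'D], [D'C, D'D - gamma^2 I]], with gamma larger than
# the H-infinity norm of the transfer function.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
n, m, gamma = 2, 1, 2.0
Pi = np.block([[C.T @ C, C.T @ D],
               [D.T @ C, D.T @ D - gamma**2 * np.eye(m)]])

# Item (2): search for a symmetric P satisfying the LMI.
P = cp.Variable((n, n), symmetric=True)
lmi = cp.bmat([[A.T @ P + P @ A, P @ B],
               [B.T @ P, np.zeros((m, m))]]) + Pi
prob = cp.Problem(cp.Minimize(0), [lmi << 0])
prob.solve()
print(prob.status)  # 'optimal' indicates the LMI is feasible

# Item (1): check the frequency-domain inequality on a grid.
for omega in np.logspace(-2, 2, 50):
    V = np.vstack([np.linalg.solve(1j * omega * np.eye(n) - A, B),
                   np.eye(m)])
    w = np.linalg.eigvalsh(V.conj().T @ Pi @ V)
    assert w.max() <= 1e-8  # nonpositive up to rounding
```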

Proof. Item (1) can be rewritten as follows: For all \(x \in \mathbb{C}^n\) and \(u \in \mathbb{C}^m\),

\[ [\exists \omega \in \mathbb{R} : j \omega x = Ax + B u] \implies \begin{bmatrix} x \\ u \end{bmatrix}^* \Pi \begin{bmatrix} x \\ u \end{bmatrix} \le 0. \]

Indeed, for \(\omega\) with \(\det(j \omega I - A) \ne 0\), the constraint \(j \omega x = Ax + Bu\) is equivalent to \(x = (j \omega I - A)^{-1} B u\); the finitely many frequencies at which \(\det(j \omega I - A) = 0\) require an additional limiting argument, which we omit here. Moreover, Lemma 3, applied with \(\xi = Ax + Bu\) and \(\eta = x\), states that

\[ [x = 0 \ \text{or} \ \exists \omega \in \mathbb{R} : j \omega x = Ax + Bu] \iff x (Ax + Bu)^* + (Ax + Bu) x^* = 0. \]

The case \(x = 0\) is also covered by item (1): letting \(\omega \to \infty\) there yields \(\Pi_{22} \preceq 0\), that is, \(u^* \Pi_{22} u \le 0\) for all \(u \in \mathbb{C}^m\).

We define

\[ \Theta := \left\{ \left( \begin{bmatrix} x \\ u \end{bmatrix}^* \Pi \begin{bmatrix} x \\ u \end{bmatrix},\ x (Ax + Bu)^* + (Ax + Bu) x^* \right) : x \in \mathbb{C}^n,\ u \in \mathbb{C}^m \right\} \subset \mathbb{R} \times \mathbf{H}^n, \]

where \(\mathbf{H}^n\) denotes the set of \(n \times n\) complex Hermitian matrices,

and

\[ \mathscr{P} := \{(a,0) : a > 0\} \subset \mathbb{R} \times \mathbf{H}^n. \]

Thus, item (1) is equivalent to \(\Theta \cap \mathscr{P} = \emptyset\).

Now, we consider the convex hull of \(\Theta\), denoted \(\mathrm{Conv}(\Theta)\). It can be shown that item (1) is equivalent to \(\mathrm{Conv}(\Theta) \cap \mathscr{P} = \emptyset\); the nontrivial direction follows from Lemma 2 applied with \(M = \begin{bmatrix} I & 0 \end{bmatrix}\) and \(N = \begin{bmatrix} A & B \end{bmatrix}\), which decomposes any element of \(\mathrm{Conv}(\Theta)\) with vanishing second component into rank-one terms to which item (1) applies. Since \(\mathrm{Conv}(\Theta)\) and \(\mathscr{P}\) are disjoint convex sets in a finite-dimensional space, there is a hyperplane which separates them. Let \((p,P) \in \mathbb{R} \times \mathbf{H}^n\) be a linear functional specifying such a hyperplane; since \(A\), \(B\), and \(\Pi\) are real, \(P\) can be taken in \(\mathbf{S}^n\). The functional is supposed to be nonnegative on \(\mathscr{P}\) and nonpositive on \(\Theta\). Thus, \(p \ge 0\) and

\[ p \begin{bmatrix} x \\ u \end{bmatrix}^* \Pi \begin{bmatrix} x \\ u \end{bmatrix} + \mathop{\mathrm{tr}} \bigl( P \bigl( x (Ax + Bu)^* + (Ax + Bu) x^* \bigr) \bigr) \le 0. \]

Because \(\mathop{\mathrm{tr}}(P (x w^* + w x^*)) = x^* P w + w^* P x\) with \(w := Ax + Bu\), the last inequality is equivalent to

\[ \begin{bmatrix} x \\ u \end{bmatrix}^* \left\{ p \Pi + \begin{bmatrix} A^\prime P + PA & PB \\ B^\prime P & 0 \end{bmatrix} \right\} \begin{bmatrix} x \\ u \end{bmatrix} \le 0 \]

for all \(x \in \mathbb{C}^n\) and \(u \in \mathbb{C}^m\). Suppose \(p = 0\); then \(P \ne 0\), because the functional \((p,P)\) is nonzero. With \(p = 0\), the left-hand side above is affine in \(u\) for fixed \(x\), so it is bounded above only if \(PB = 0\), and taking \(u = 0\) gives \(A^\prime P + PA \preceq 0\). These two conditions imply that \(\ker P\) is a proper \(A\)-invariant subspace containing the range of \(B\), so that, in suitable coordinates, \((A,B)\) is transformed into

\[ A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}, \quad B = \begin{bmatrix} B_1 \\ 0 \end{bmatrix}. \]

This contradicts the controllability assumption. Thus, \(p \ne 0\), and without loss of generality we can set \(p = 1\). The resulting inequality, holding for all \(x\) and \(u\), is exactly item (2). The equivalence between items (2) and (3) follows by factorizing the negative semidefinite left-hand side of item (2) as \(-\begin{bmatrix} K & L \end{bmatrix}^\prime \begin{bmatrix} K & L \end{bmatrix}\). (QED)
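The factorization step from item (2) to item (3) is also easy to carry out numerically. The sketch below continues the hypothetical CVXPY example given after the lemma statement, with `P_val` a feasible value of \(P\); it extracts \(K\) and \(L\) from an eigendecomposition of the negative semidefinite residual.

```python
import numpy as np

def extract_KL(A, B, Pi, P_val):
    """Given a feasible P for item (2), factor the negative semidefinite
    residual as -[K L]'[K L] to obtain the K and L of item (3)."""
    n, m = B.shape
    R = np.block([[A.T @ P_val + P_val @ A, P_val @ B],
                  [B.T @ P_val, np.zeros((m, m))]]) + Pi
    d, S = np.linalg.eigh(-R)        # -R = S diag(d) S', d >= 0
    d = np.clip(d, 0.0, None)        # clear tiny negative rounding
    KL = np.diag(np.sqrt(d)) @ S.T   # KL' KL = -R, i.e. R = -[K L]'[K L]
    return KL[:, :n], KL[:, n:]      # K and L

# Usage with the CVXPY sketch above: K, L = extract_KL(A, B, Pi, P.value);
# then A' P + P A + K' K + Pi_11, B' P + L' K + Pi_12', and L' L + Pi_22
# are all numerically zero.
```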

\(\mathcal{S}\)-procedure and Lagrangian duality

An interpretation based on Lagrangian duality was discussed in this paper.