Lecture Notes in
Operations Research and
Mathematical Systems
Economics, Computer Science, Information and Control
Edited by M. Beckmann, Providence, and H. P. Künzi, Zürich
40
Hilmar Drygas
Studiengruppe für Systemforschung, Heidelberg
The Coordinate-Free Approach
to Gauss-Markov Estimation
Springer-Verlag
Berlin · Heidelberg · New York 1970
Advisory Board
H. Albach · A. V. Balakrishnan · F. Ferschl
R. E. Kalman · W. Krelle · N. Wirth
ISBN-13: 978-3-540-05326-2 e-ISBN-13: 978-3-642-65148-9
DOI: 10.1007/978-3-642-65148-9
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned,
specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine
or similar means, and storage in data banks.
Under § 54 of the German Copyright Law, where copies are made for other than private use, a fee is payable to the publisher,
the amount of the fee to be determined by agreement with the publisher.
© by Springer-Verlag Berlin · Heidelberg 1970. Library of Congress Catalog Card Number 78-147405.
Offsetdruck: Julius Beltz, Weinheim/Bergstr.
Summary and Preface.
These notes originate from a series of lectures given in the Econometric Workshop of the Center for Operations Research and Econometrics (CORE) at the Catholic University of Louvain. The participants of the seminars were recommended to read the first four chapters of Seber's book [40], but the exposition of the material went beyond Seber's where this seemed necessary.
Coordinate-free methods are not new in Gauss-Markov estimation; besides Seber, the work of Kolmogorov [11], Scheffé [36], Kruskal [21], [22] and Malinvaud [25], [26] should be mentioned. Malinvaud's approach, however, differs a little from that of the other authors, because his optimality criterion is based on the ellipsoid of concentration. This criterion is, however, equivalent to the usual concept of a minimal covariance-matrix, and therefore the result must be the same in both cases. While the usual theory gives no indication of how small the covariance-matrix can be made before the optimal estimator is computed, Malinvaud can show how small the ellipsoid of concentration can be made: it is at most equal to the intersection of the ellipsoid of concentration of the observed random vector and the linear space in which the (unknown) expectation value of the observed random vector lies.
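In symbols, merely paraphrasing this statement: if E(y) denotes the ellipsoid of concentration of the observed random vector y and L the linear space containing its expectation value, then every linear unbiased estimator ŷ of Ey satisfies

    E(ŷ) ⊇ E(y) ∩ L,

and the optimal estimator attains equality.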
This exposition is based on the observation that in regression analysis and related fields two conclusions are, or should preferably be, applied repeatedly. The first important fundamental lemma is Farkas' theorem, which is closely related to the well-known Farkas-Minkowski theorem (see e.g. Gale [12], pp. 41-49). It is mainly based on the definition of the adjoint mapping or, expressed in terms of matrices, on the definition of the transposed matrix. Chipman [4] has already pointed out this close relationship.
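In symbols, the defining property of the adjoint mapping A* of a linear mapping A : H → K is

    ⟨Ax, y⟩ = ⟨x, A*y⟩    for all x ∈ H, y ∈ K;

in matrix terms, A* is represented by the transposed matrix A'.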
The second important lemma is the projection theorem, which says that a given point outside a linear manifold has minimal distance from a point on the linear manifold if and only if the line connecting the two points is perpendicular (orthogonal) to the linear subspace belonging to the linear manifold. The proof of this lemma rests on the theorem of Pythagoras, which is more than 2300 years old. Only a slight extension is made in this presentation. In regression analysis there sometimes appear covariance-matrices which are not regular. The quadratic form corresponding to such a covariance-matrix is then semi-definite and defines a semi-inner product rather than an inner product. Therefore the Cauchy-Schwarz inequality and the projection theorem are generalized to the case of a semi-inner product rather than an inner product.
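In symbols: a semi-inner product ⟨·,·⟩ satisfies all axioms of an inner product except that ⟨x,x⟩ = 0 may occur for x ≠ 0; nevertheless the Cauchy-Schwarz inequality remains valid in the form

    ⟨x,y⟩² ≤ ⟨x,x⟩ ⟨y,y⟩.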
The plan of these notes is as follows: In the first paragraph we give a justification of the coordinate-free approach by showing the manner in which regression models are usually built in economics. After this short introduction, the more technical concepts are developed in the second paragraph. We start with the definition of a vector-space and introduce the concept of a semi-inner product. Then the notions of linear independence, of a basis and of an orthonormal basis are given. Erhard Schmidt's orthonormalization method is discussed and applied to investigate orthogonal complements of linear subspaces. In a short section linear functions, linear mappings and adjoint mappings are studied. After this the simple but important Farkas' theorem can already be proved. It is followed by a corollary investigating the orthogonal complements of the sum and the intersection of linear subspaces. After this the projection theorem and the investigation of the properties of projections, together with the introduction of the generalized inverse of a linear mapping, finish this paragraph on vector-spaces. The paragraph is self-contained, i.e. it contains the proofs of all stated theorems.
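The corollary on orthogonal complements mentioned above takes, for linear subspaces U and V of a finite-dimensional vector-space, the familiar form

    (U + V)⊥ = U⊥ ∩ V⊥,    (U ∩ V)⊥ = U⊥ + V⊥.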
The third paragraph begins by defining what we mean by a linear statistical model M(L,Q), where L ⊂ H is a linear manifold of the linear vector-space H and Q : H → H is a (symmetric, positive-semidefinite) linear mapping. This is the set of all H-valued random vectors whose expectation value lies in L and whose covariance-matrix is equal to the given Q. A coordinate-free definition of the expectation value and the covariance-matrix is considered. The procedure of generalized least-squares estimation of the expectation value is shown to be a consequence of the projection theorem. After this, best linear unbiased estimators (BLUE) of a linear function of the expectation value are considered. Using the projection theorem, necessary and sufficient conditions are found. It is shown that if Q is regular, the generalized least-squares estimator has the property that each of its linear functions is the BLUE of the same linear function of the expectation value. Then, in the case of an arbitrary Q, necessary and sufficient conditions for an (inhomogeneous) linear mapping to be the BLUE of Ey in M(L,Q) are given. A definition of optimal (BLUE) estimation is given which allows one to compare linear estimators as a whole and not only their linear functions. This concept is compared with Malinvaud's concept, and a simplified proof of Malinvaud's version of the Gauss-Markov theorem is given. Finally, some remarks are made on the relation between Gauss-Markov estimation and least squares in the case that Q is singular. It is also shown that the least-squares estimator (Gauss-Markov estimator) is an admissible estimator in the class of linear estimators of the expectation value.
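As a numerical illustration of the projection argument described above (a sketch, not part of the original notes), the following Python fragment computes the generalized least-squares estimate for a regular Q; the design matrix X, the covariance-matrix Q and the data y are all invented for the example, and L = range(X) plays the role of the linear manifold.

    import numpy as np

    # Invented example data: n observations, r regressors.
    rng = np.random.default_rng(0)
    n, r = 8, 3
    X = rng.normal(size=(n, r))                 # design matrix; L = range(X)
    Q = np.diag(rng.uniform(0.5, 2.0, size=n))  # regular covariance matrix
    y = rng.normal(size=n)                      # observed random vector

    # Generalized least squares: minimize (y - Xb)' Q^{-1} (y - Xb).
    Qinv = np.linalg.inv(Q)
    beta_hat = np.linalg.solve(X.T @ Qinv @ X, X.T @ Qinv @ y)
    y_hat = X @ beta_hat

    # Projection theorem: the residual y - y_hat is orthogonal to range(X)
    # in the inner product <u, v> = u' Q^{-1} v, so this prints (near) zeros.
    print(X.T @ Qinv @ (y - y_hat))

By the Gauss-Markov theorem sketched above, every linear function of beta_hat is then the BLUE of the corresponding linear function of the expectation value.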
This general theory is then followed by a number of examples. First, restrictions on the expectation value are considered. In this case a very simple formula is obtained. It almost coincides with the well-known formula of the full-rank case; only an inverse has to be replaced by a generalized inverse. As regression can always be considered as restrictions, and vice versa, we also get a formula for regression models. The only difference is that in the full-rank case this formula does not coincide with the usual Aitken formula of estimation. The Aitken formula is then derived in a special case. The formula for the general regression model (Schönfeld's formula) is given after stepwise least squares. Restrictions on the parameters are also discussed, and an alternative proof of Plackett's formula is given. If the covariance-matrix of the observed random vector is equal to σ²Q rather than Q, where σ² > 0 is an unknown parameter, then σ² also has to be estimated. Under the assumption that the principal components of the observed random vector are independent and have kurtosis 0, optimal quadratic and optimal quadratic unbiased estimators of σ² are found. Stepwise least squares and stepwise Gauss-Markov estimation are finally investigated.
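For orientation, the classical matrix formulas alluded to in this paragraph, in their standard textbook form (not quoted from the text): with a design matrix X of full column rank the least-squares estimator is

    b = (X'X)^{-1} X'y,

in the singular case a generalized inverse (X'X)^- replaces the ordinary inverse, and for a regular covariance-matrix Q Aitken's formula reads

    b = (X'Q^{-1}X)^{-1} X'Q^{-1}y.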
The requirements for an understanding of these notes are very low. The reader should be familiar with R^n and the usual representation of linear mappings by matrices, so that he can understand the motivation of the coordinate-free definitions and concepts. All other results on vector-spaces are developed in § 2. A little more knowledge is required from probability theory. Here the reader should be familiar with the concept of a probability space and with the notions of expectation value, variance and covariance-matrix. Finally, knowledge of independence and uncorrelatedness is also required; but these concepts appear only in section e) of § 3 (estimation of σ²). The most important requirement of these notes is, however, that the reader be able and willing to think in the abstract categories and formulations presented here.
I am greatly indebted to Mr. Michel Mouchart from CORE, who has read large parts of the several versions of these notes with great care. His critical and always stimulating comments improved the presentation of these notes considerably. Mr. Manfred Deistler (University of Regensburg) also gave some useful advice. Last but not least I am grateful to the Research Director of CORE, Prof. Jacques H. Drèze, who not only suggested the topic of these notes as a subject for the Econometric Workshop but also suggested writing the material down in the form presented here. I also have to thank Miss Gina Rasschaert and Miss Jeanine De Ryck, who have done the painstaking work of typing this manuscript with great care.
Louvain, April 1970
CONTENTS
Summary and Preface........................................ III
§ 1. Justification of the coordinate-free approach................. 1
§ 2. Vector-spaces.............................................. 8
a) Definition of a vector-space............................ 8
b) Inner products and semi-inner products.................. 10
c) Bases of a vector-space, orthogonal complement.......... 15
d) Linear functions, linear mappings and adjoint mappings.. 24
e) Definition of set-operations in vector-spaces........... 31
f) The Farkas' theorem..................................... 33
g) Projections, generalized inverses and pseudo-inverses... 36
§ 3. Linear statistical models.................................. 46
a) Definition of linear statistical models................. 46
b) Least squares-estimators and Gauss-Markov estimators.... 50
c) Supplements to least squares and Gauss-Markov estimation 70
d) Examples: 1) Restrictions.............................. 77
2) The regression model...................... 78
3) Aitken's formula.......................... 79
4) Schönfeld's formula in the general
regression model.......................... 80
5) Restrictions on the parameters............ 81
e) The estimation of σ²................................... 87
f) Stepwise least squares and stepwise Gauss-Markov
estimation.............................................. 103
Bibliography...................................................... 110
§ 1. Justification of the coordinate-free approach.
Let us assume that we have an economic variable y which is explained by the exogenous economic variables x_1, ..., x_k, i.e.

(1.1)    y = φ(x_1, ..., x_k)

for some suitable (but possibly unknown) function φ. The econometrician is now interested in the determination of the form of φ and/or in the verification of the economic law. This may be desirable for explanation purposes, for prediction purposes or for devising economic policies to attain a certain economic or political aim.
Econometric theory, in so far as it is well established up to now, can treat only linear models. Therefore a linearization of the economic law (1.1) is made by introducing new mathematical variables z_1, ..., z_r which are related to the economic variables x_1, ..., x_k by certain relations

(1.2)    z_i = f_i(x_1, ..., x_k),    i = 1,2, ..., r.

Such variables could for example be

(1.3)
and so on. The second step after the linearization is the sampling procedure. By observing y, x_1, ..., x_k one can compute the z_i and can consider the computed values as a result of the sampling procedure, too. Let us therefore assume that we have n observations of the variables y, z_1, ..., z_r. Then one will realize that the relation
(1.4)    y = β_1 z_1 + β_2 z_2 + ... + β_r z_r,
which is assumed as our economic law after the linearization, will
in general not hold exactly. Therefore econometricians introduce
a random disturbance term ε and modify (1.4) into

(1.5)    y = β_1 z_1 + β_2 z_2 + ... + β_r z_r + ε.
This relation is also assumed to hold for the observations y_i, z_1i, ..., z_ri (i = 1,2, ..., n), and the disturbance term is now an unobservable random variable. So we get

(1.6)    y_i = β_1 z_1i + β_2 z_2i + ... + β_r z_ri + ε_i,    i = 1,2, ..., n.
The usual way of a statistical treatment of the system (1.6) is to
introduce matrices and vectors for the observed quantities. In this
first abstraction we introduce the n × 1 column-vectors

(1.7)    y = (y_1, ..., y_n)',    z_j = (z_j1, ..., z_jn)'  (j = 1, ..., r),    ε = (ε_1, ..., ε_n)'.
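To make the passage from the economic law (1.1) to the system (1.6) concrete, here is a minimal Python sketch; the data and the particular choice of the relations (1.2) are invented for the illustration.

    import numpy as np

    # Invented observations of the economic variables (n = 6).
    x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    x2 = np.array([0.5, 1.5, 1.0, 2.5, 2.0, 3.0])
    y  = np.array([2.1, 4.3, 5.9, 9.2, 10.1, 13.0])

    # One possible choice of the relations (1.2):
    #   z_1 = x_1,  z_2 = x_2,  z_3 = x_1 * x_2.
    Z = np.column_stack([x1, x2, x1 * x2])   # n x r matrix of z-observations

    # Least-squares fit of (1.6): y_i = b_1 z_1i + ... + b_r z_ri + e_i.
    beta_hat, residuals, rank, sv = np.linalg.lstsq(Z, y, rcond=None)
    print(beta_hat)

The columns of Z are exactly the column-vectors z_j of (1.7); the fit treats the computed z-values as observed quantities, as described in the text.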