The 3D Recognition, Generation, Fusion, Update and Refinement
(RG4) Concept.
David A. Maluf, Peter Cheeseman, Vadim N. Smelyanskiy,
Frank Kuehnel and Robin D. Morris.
NASA Ames Research Center, Mail Stop 269, Moffett Field, CA 94035
[email protected]
Abstract

This paper describes an active (real time) recognition strategy whereby information is inferred iteratively across several viewpoints in descent imagery. We will show how we use inverse theory within the context of parametric model generation, namely height and spectral reflection functions, to generate model assertions. Using this strategy in an active context implies that, from every viewpoint, the proposed system must refine its hypotheses taking into account the image and the effect of uncertainties as well. The proposed system employs probabilistic solutions to the problem of iteratively merging information (images) from several viewpoints. This involves feeding the posterior distribution from all previous images as a prior for the next view. Novel approaches will be developed to accelerate the inversion search using novel statistical implementations and to reduce the model complexity using foveated vision.

Foveated vision refers to imagery where the resolution varies across the image. In this paper, we allow the model to be foveated, where the highest resolution region is called the foveation region. Typically, the images will have dynamic control of the location of the foveation region. For descent imagery in the Entry, Descent and Landing (EDL) process, it is possible to have more than one foveation region.

1 Introduction and Background

The period of Entry, Descent and Landing is the mission's most critical period, with the highest risk factor for a potential Loss of Vehicle (LOV). Since distant missions such as Mars are constrained in payload and design, NASA must employ technology to intelligently use all available resources, optimally integrate sensor data and perform real-time decision making and reasoning for successful Entry, Descent and Landing.

Understanding the importance of Entry, Descent and Landing is best illustrated by describing the critical phases of an Entry, Descent and Landing process for a spacecraft. It is estimated that the spacecraft's descent from the time it hits the upper atmosphere until it lands takes no more than 4 minutes and a few seconds to accomplish the final landing, as in the case of the Mars Polar Lander. Enabling technologies such as active vision can continually operate and integrate the vision system to actively interpret images for enhanced model recognition, which can play a crucial role in mitigating major risk factors.

We estimate that the period where on-board intelligent systems can start capturing the landing site's topographic details starts about two minutes before landing, when the spacecraft is expected to be moving at about 1,000 miles per hour around 5 miles above the surface. About 70 to 100 seconds before landing a landing radar will be activated. To this end, we anticipate having our proposed 3-D Model Recognition, Generation, Fusion, Update and Refinement (RGFUR or RG4) system include radar readings and other sensor modalities (gyros and inertial guidance). The radar will be able to gauge the spacecraft's altitude about 40 seconds after it is turned on, at an altitude of about 1.5 miles above the surface. With a robust RG4 system, the spacecraft can rely on the on-board camera for final touch down.

[Figure: Entry, descent and landing.]
2 Similar Work and Comparison

Johnson's work described in [10] addresses the problem of autonomous operation close to a small body. The work described in our paper differs from, and is an advance over, the work in [10] in a number of ways. In this paper we argue for a unified model of the surface of interest, with all observations aimed at building up knowledge of this model, in contrast to an approach that builds up a model piecewise and in a manner dependent on the detection of features in the images. We also propose doing absolute location relative to the entire surface model, an approach that is much more robust and accurate than location relative to a small number of landmarks. It also does not rely on the presence of explicit landmarks on the object, but instead uses the entire surface essentially as one extended landmark. Finally, the approach we advocate gives explicit uncertainty estimates of the surface and position; the work in [10] provides uncertainty estimates by running Monte Carlo simulations. After all, a typical requirement of the landing process is to be able to resolve the surface to a level of detail that makes it possible to avoid a boulder, a ditch or a crack, any of which could result in a Loss of Vehicle (LOV).

3 Research Objectives

The ambition of this paper in active vision is to continually operate and integrate a vision system that can actively interpret images for enhanced model recognition. The proposed approach exploits super-resolution techniques [3][4] and focus of attention (foveated vision) to enable better model recognition in descent imagery.

This research initiative is directed towards descent imagery in connection with NASA's Entry Descent Landing (EDL) applications. 3-D Model Recognition, Generation, Fusion, Update and Refinement (RGFUR or RG4) for height and the spectral reflection characteristics are in focus for various reasons, one of which is the prospect that their interpretation will provide for real time active vision for automated EDL.

4 Model Recognition, Generation, Fusion, Update and Refinement (RG4) and Super-Resolution

We are investigating a Bayesian model-based approach to integrating information from multiple images of the same area into a unified model at a resolution higher than that of the contributing images (super-resolution). This model is a representation of the physical parameters describing the surface. The physical parameters we use are heights at each grid point and the surface reflectance properties at each grid point, such as albedo (for a Lambertian reflectance model) or, more generally, a parameterized bi-directional reflectance distribution function (BRDF). Each image is an independent sample of the area of interest, and by combining the information from these separate images, surface features smaller than the image pixel scale can be captured. Because the model is constructed at finer resolution than any image, it is possible to use it to accurately project what that surface would look like from any viewpoint, under any lighting conditions. This projection is computed by summing the contribution from each surface patch onto each synthesized image pixel, weighted by the camera point spread function (PSF). This projection process is called rendering in computer graphics, and the realism achieved by current computer graphics indicates the viability of accurate image projection from a surface model.

The essence of super-resolution in RG4 is to use Bayesian inference to invert the image rendering process. That is, in rendering, the surface and its reflectance properties are assumed known, as are the location and properties of the camera and of the lighting source (typically the sun), and this information is used to generate an image under those conditions. In the Bayesian model-based inference process, the rendering process is reversed. That is, given the images, we find the most likely surface that would have generated them.
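To make the forward (rendering) direction of this inversion concrete, the following minimal sketch projects a gridded surface model into a synthetic image by summing the contribution of every ground point onto each synthesized pixel, weighted by the camera point spread function. The orthographic camera model, Gaussian PSF and Lambertian shading used here are illustrative assumptions that stand in for the actual RG4 renderer; the function and parameter names are hypothetical.

    import numpy as np

    def render_image(heights, albedo, sun_dir, cam, psf_sigma=1.0):
        """Project a gridded surface model into a synthetic image.

        heights, albedo : (N, N) arrays of surface parameters on a regular grid
        sun_dir         : unit 3-vector pointing toward the light source
        cam             : dict with 'scale' (grid cells per pixel) and image 'shape'
        Each pixel intensity is a PSF-weighted sum of the radiance of the
        ground points that map into (or near) that pixel.
        """
        n = heights.shape[0]
        # Surface normals from finite differences of the height field.
        gy, gx = np.gradient(heights)
        normals = np.dstack([-gx, -gy, np.ones_like(heights)])
        normals /= np.linalg.norm(normals, axis=2, keepdims=True)

        # Lambertian radiance of each ground point (assumed reflectance model).
        radiance = albedo * np.clip(normals @ sun_dir, 0.0, None)

        rows, cols = cam['shape']
        image = np.zeros((rows, cols))
        weights = np.zeros((rows, cols))
        ys, xs = np.mgrid[0:n, 0:n]
        # Simple orthographic projection: grid coordinates scaled into pixel units.
        u = xs / cam['scale']
        v = ys / cam['scale']
        for i in range(rows):
            for j in range(cols):
                # Gaussian PSF weight of every ground point for pixel (i, j).
                w = np.exp(-((u - j) ** 2 + (v - i) ** 2) / (2.0 * psf_sigma ** 2))
                weights[i, j] = w.sum()
                image[i, j] = (w * radiance).sum()
        return image / np.maximum(weights, 1e-12)

    # Example: a 64x64 surface rendered into a 16x16 image (4 ground points per pixel).
    h = np.random.default_rng(0).normal(scale=0.1, size=(64, 64))
    sun = np.array([0.3, 0.3, 0.9]) / np.linalg.norm([0.3, 0.3, 0.9])
    img = render_image(h, albedo=np.full((64, 64), 0.3), sun_dir=sun,
                       cam={'scale': 4, 'shape': (16, 16)})

The Bayesian inversion described above runs this mapping in the opposite direction: the surface parameters are adjusted until the synthesized pixels agree with the observed ones.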
The model would consist of a discretized grid covering the area of interest, where each grid point stores the geophysical parameters of the corresponding ground location. These parameters mainly include elevation and reflectance spectral characteristics. This model is chosen so that what the camera is expected to see can be projected from the model. Model update consists of comparing the expected pixel values with the observed ones, and changing the model to better fit the data (including previous data). This update will be accomplished by computationally efficient Bayesian inference that inverts the image rendering process as used in computer graphics. The search for the most likely surface will be performed by a novel type of gradient descent, where the gradient is computed analytically.

NASA has developed this process of model-based inversion over the last few years, starting from simple 2-D models and working up to the full 3-D surface reconstruction problem [3][4]. We are now able to super-resolve the heights and albedos of the true surface from multiple images, where the images can be taken from any viewpoint and under any lighting conditions. On artificial images generated from the model, we are able to reconstruct the surface to essentially the noise level of the data.

Figure 1: Top image is one of the two images taken from Clementine imagery used to super-resolve the image on the right. With two images only (similar to the one on the left), the right image contains crisp, more detailed features (the right image is inferred from the 3D model). The bottom plot is the surface inferred from the images (not shown is the albedo field).

4.1 Research

Super-resolution is a very useful product for the Entry, Descent, and Landing process, where the resolved model is beyond what can be extracted using the best available image. The main reason for developing the super-resolution capability is to allow the integration of information from different images without the problem of aliasing and mismatched pixel grids. Super-resolution solves this problem because any pixel maps onto many ground points, so that the intensity of any pixel can be accurately computed by summing up the corresponding ground points. In fact, the surface model becomes the repository of the pixels' information, so that a system does not need to keep multiple images persistent in its memory, but rather a model. EDL processes and post-processes will thus interact with the surface model, and can view it from any direction or under any lighting conditions, including viewpoints that were not originally available!

In implementing this research, we extend 3-D super-resolution algorithms to solve a number of technical problems that arise in this application. In particular we will find workable solutions to the following problems using the approach outlined below.
Shape from Motion

A main objective of RG4 is to achieve surface inference in "real time". To that end we obtain a fast shape-from-motion algorithm whose output can be fed back into the inference as prior knowledge. Standard "shape from motion" algorithms [1] maintain the assumption of constant surface reflectance properties and do not naturally extend to super-resolution. We plan to use our new shape-from-motion technique to "bootstrap" a super-resolution inference for natural surface formations where varying albedo properties and shadows are correctly accounted for.
Multi-Spectral Integration

EDL on-board instruments with multiple spectral bands will have different coverages, i.e. different widths and ranges. Our approach to solving the problem posed by integrating this heterogeneous information is to describe the model's surface by a wavelength-dependent reflectance. That is, instead of a single number to represent the (Lambertian) surface reflectance for a particular band, we will represent the reflectance as a "smooth" function of wavelength, where the function is represented by a small number of coefficients that are estimated from the data. This function can then be integrated with each band's spectral response function (a property of the instrument) to get the expected reflectance for that band.
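As a minimal sketch of this representation, the code below assumes the smooth reflectance function is a low-order polynomial in wavelength and that each band's spectral response curve is tabulated on a common wavelength grid; the coefficients are estimated from per-band measurements by least squares. The polynomial basis, the Gaussian-shaped example bands and all names here are illustrative assumptions rather than the actual instrument characteristics.

    import numpy as np

    # Wavelength grid (nm) on which band response functions are tabulated (assumed).
    wl = np.linspace(400.0, 1000.0, 301)

    def band_reflectance(coeffs, response):
        """Expected reflectance seen by one band.

        The surface reflectance is a 'smooth' function of wavelength, here a
        low-order polynomial with few coefficients; it is integrated against
        the band's (normalized) spectral response function.
        """
        rho = np.polyval(coeffs, (wl - 700.0) / 300.0)    # scaled wavelength for conditioning
        response = response / np.trapz(response, wl)       # normalize the response curve
        return np.trapz(rho * response, wl)

    def fit_coefficients(band_responses, band_measurements, order=2):
        """Least-squares estimate of the polynomial coefficients from per-band data."""
        x = (wl - 700.0) / 300.0
        # Design matrix: each row integrates the polynomial basis against one band.
        A = np.array([[np.trapz((x ** k) * (r / np.trapz(r, wl)), wl)
                       for k in range(order, -1, -1)] for r in band_responses])
        coeffs, *_ = np.linalg.lstsq(A, np.asarray(band_measurements), rcond=None)
        return coeffs

    # Example: three hypothetical Gaussian-shaped bands and their measured reflectances.
    bands = [np.exp(-0.5 * ((wl - c) / 40.0) ** 2) for c in (450.0, 650.0, 850.0)]
    coeffs = fit_coefficients(bands, band_measurements=[0.12, 0.18, 0.25])
    print([round(band_reflectance(coeffs, b), 3) for b in bands])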
Super-Resolution

One of the major achievements in this research is the method to achieve a recursive linear minimization as part of the desired inference for three-dimensional surface reconstruction, to the extent that the resolution of the inferred surface mesh is higher than the spatial resolution of the input images. This technique also allows images to be super-resolved in either two or three dimensions (according to the nature of the data).

Figure 2: Top image is one of the twelve synthetic images of the Silicon Valley area used to super-resolve the second image. With twelve images only, the [...] in the resolution of pixel/model relationships.
Accelerated Search

In a statistical inference scheme, the solution for the gradient step involves a linear minimization over large sparse linear systems, for which solution methods such as Conjugate Gradient are expensive in terms of both time and storage cost. For the class of descent imagery problems using Bayesian inference for 3-D model parameter estimation, we plan to use a novel iterative technique which solves the search minimization problem efficiently in terms of storage and memory cost. This novel technique takes root in a recent discovery of a model prior which reduces the covariance matrix complexity from a quadratic to a linear representation. As a result, the amount of data will be roughly linear in the size of the model, which is essential especially in a scarce computing environment.
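The specific technique is not detailed here, but its flavor can be illustrated with a matrix-free solver: the linear system of the gradient step is solved by conjugate gradients using only matrix-vector products, and the smoothness prior is applied as a linear operator, so storage stays linear in the number of model parameters. The hand-rolled solver, the one-dimensional Laplacian prior and the synthetic Jacobian below are assumptions for illustration only.

    import numpy as np

    def conjugate_gradient(apply_A, b, n_iter=100, tol=1e-8):
        """Solve A x = b using only matrix-vector products.

        apply_A is a callable returning A @ v, so neither the sparse system
        nor the prior term is ever stored as a full matrix: memory stays
        linear in the number of model parameters.
        """
        x = np.zeros_like(b)
        r = b - apply_A(x)
        p = r.copy()
        rs_old = r @ r
        for _ in range(n_iter):
            Ap = apply_A(p)
            alpha = rs_old / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs_old) * p
            rs_old = rs_new
        return x

    # Illustrative gradient-step system: A = D^T D + lam * L, applied matrix-free.
    def make_apply_A(D, lam=0.1):
        def laplacian(v):                      # stand-in smoothness prior, linear storage
            out = 2.0 * v
            out[1:] -= v[:-1]
            out[:-1] -= v[1:]
            return out
        return lambda v: D.T @ (D @ v) + lam * laplacian(v)

    rng = np.random.default_rng(1)
    D = rng.normal(size=(300, 120)) * (rng.random((300, 120)) < 0.05)   # sparse-ish Jacobian
    b = D.T @ rng.normal(size=300)
    step = conjugate_gradient(make_apply_A(D), b)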
Foveated Vision

We also support a Foveated Vision capability with variable resolution; that is, the surface triangles may be very small in some areas (super-resolved) and very coarse in other areas (under-resolved). The primary value of foveated vision is in the model reconstruction, where high resolution information is transmitted in the regions of the image that are selected as important. On the other hand, low resolution information is processed at a second stage under constraints (e.g. time and computing resources). Foveated vision is crucial in descent imagery and will enable this control.

Figure 3: Left image is our planned 'spider-web' type mesh with a foveated center (not necessarily centered in the middle). Right image is a typical non-uniform grid.

We extend 3-D surface models to foveated models using a traditional triangulated surface, but the distribution of the heights would no longer be tied to a uniform grid but to a foveated model (Figure 3, left). This extension is not difficult in principle, but the changed representation affects triangle indexing, and so affects efficiency.

4.2 Active Recognition: Concepts and Technical Aspects

The key idea behind active recognition in a sequential recognition strategy is that of improving interpretation by accumulating evidence in real time. The important aspect in the Entry, Descent and Landing recognition problem is to compute on-line a 3D model from sensory data linked to the different sensor hardware which supports the different phases of the descent process (e.g. different cameras, FOV, RADAR, LADAR, altimeters, gyros, etc.).

It is clearly understood that the image resolution in the early stage does not guarantee enough information either for quantitative or for qualitative model recognition, but acquiring uncertainties serves to condition prior expectations about the model and establishes a quantitative representation.

Practically, a meaningful qualitative recognition for a 3D-model reconstruction can be achieved after only a few sequences of images have been collected. To achieve a quantitative recognition, the 3D model is optimally obtained by computing the probability of the 3D model given the image sequences, p(h, ρ | I_1...I_n), where h_i and ρ_i are the parameters of the height field and the albedo field.
At different stages along the descent process, image sequences with small frame-to-frame camera motion can be treated actively to provide an early 3D model. This real time behavior leverages small motions, which minimize the correspondence problem between successive images, and the knowledge of the camera trajectory. However, this sacrifices depth resolution because of the small baseline between consecutive image pairs [9]. A solution to this problem is readily sought through probabilistic incremental integration (e.g. a Kalman Filter). In this particular active recognition, we will employ a matching and extraction technique which takes advantage of the lateral motion of the camera and transforms the search problem into a one-dimensional search problem (the search is limited to the foveated region).
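A minimal sketch of such an incremental integration, in the spirit of the Kalman-filter depth estimation of [8], is given below: a stream of noisy per-point depth measurements is fused into a running estimate with an associated variance. The constant-depth state and the per-frame noise model are illustrative assumptions.

    import numpy as np

    def fuse_depth(measurements, variances, prior_depth, prior_var):
        """Recursive (Kalman-style) fusion of noisy depth measurements.

        measurements : (T, N) depth estimates for N surface points over T frames
        variances    : (T, N) measurement variances (e.g. larger for small baselines)
        Returns the posterior mean and variance after all frames.
        """
        depth = np.array(prior_depth, dtype=float)
        var = np.array(prior_var, dtype=float)
        for z, r in zip(measurements, variances):
            k = var / (var + r)               # Kalman gain for a static state
            depth = depth + k * (z - depth)   # pull the estimate toward the new measurement
            var = (1.0 - k) * var             # posterior variance shrinks each frame
        return depth, var

    # Example: 10 frames of noisy depths for 4 points, noisier early (small baseline).
    rng = np.random.default_rng(2)
    true_depth = np.array([5.0, 5.2, 4.8, 5.1])
    noise_var = np.linspace(1.0, 0.1, 10)[:, None] * np.ones((10, 4))
    z = true_depth + rng.normal(size=(10, 4)) * np.sqrt(noise_var)
    depth, var = fuse_depth(z, noise_var, prior_depth=np.full(4, 5.0),
                            prior_var=np.full(4, 10.0))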
Shape-from-shading-derived techniques provide gradient vector fields of the surface, ∇h(r), and can be readily obtained in "real-time" from a single image source under very simplifying assumptions. Our approach is to reconstruct the height field h(r) without the knowledge of the boundary conditions, which are directly obtained by the other sensor modalities and, in particular, the radar readings at a later stage. With a single radar reading (initial condition), the height field h(r) is readily reconstructed.

5 Recursive Super-resolution

Our current super-resolution system can address many problems: the images may be of differing resolutions (e.g. multiple concurrent cameras); the surface albedo is not assumed constant; and the density of the model is user and data driven.

The model that we are trying to infer is defined to be the topology and reflectance properties of the surface being observed. For simplicity we define the surface over a grid of points, and currently define a height value, h_i, and an albedo value, ρ_i, at each grid point. Bayes' theorem then states that to infer values for the heights and albedos from the image data, we use the expression

p(h, ρ | I_1...I_n) ∝ p(I_1...I_n | h, ρ) p(h, ρ),

which states that the posterior distribution of the heights and albedos is proportional to the likelihood (the probability of observing the image data, I, given the current values of the heights and albedos) multiplied by the prior distribution over the model.

For super-resolution, we make the assumption that the likelihood is due to zero mean Gaussian errors between the observed images, I, and the images synthesized from the model, î(h, ρ), resulting in the likelihood being

p(I_1...I_n | h, ρ) = Π_i exp[ -(I_i - î_i(h, ρ))² / (2σ²) ],

where the product is taken over all pixels in all the images in the data set. The prior used is based on penalizing the curvature of the surface. It is a penalty encouraging continuity in the inferred surface.

Because the likelihood is a function of the images synthesized from the model, it is clearly a non-linear function of the heights and albedos, and this makes optimizing the posterior distribution difficult. However, we have found that an optimal solution to the nonlinear function can be obtained by a novel Conjugate Gradient (CG) search.

We expand î(h, ρ) about the current estimate, h_0, ρ_0, and replace it by its local linear approximation,

î(h, ρ) ≈ î(h_0, ρ_0) + D [(h, ρ) - (h_0, ρ_0)],

where D is the matrix of derivatives evaluated at h_0, ρ_0,

D_ij = ∂ î_i / ∂ (height or albedo)_j,

i.e. the derivative of synthesized pixel i with respect to height (or albedo) parameter j. The minimization of the log-posterior then becomes the minimization of a quadratic form, and can be performed using the conjugate gradient method. This minimization finds the minimum of the local linear approximation. At the minimum, we recompute î(h, ρ) and D and minimize the log-posterior iteratively.
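This iterate-and-relinearize loop can be sketched as follows, with the renderer, its derivative matrix D and the curvature penalty supplied as callables, and each outer iteration solving the resulting quadratic form with a conjugate-gradient routine. The operators, the fixed noise level and the step control are illustrative assumptions and do not reproduce the RG4 implementation.

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    def map_estimate(render, jacobian, images, theta0, curvature,
                     sigma=1.0, lam=1.0, n_outer=5):
        """Iterative MAP search: linearize the renderer about the current
        estimate, minimize the quadratic log-posterior with conjugate
        gradients, re-render, and repeat.

        render(theta)   -> vector of synthesized pixel values i_hat(theta)
        jacobian(theta) -> matrix D of derivatives d(pixel_i)/d(parameter_j)
        curvature(v)    -> action of the curvature-penalty operator on v
        theta stacks the height and albedo parameters into one vector.
        """
        theta = np.asarray(theta0, dtype=float)
        for _ in range(n_outer):
            D = jacobian(theta)
            r = images - render(theta)           # residual at the current estimate
            n = theta.size

            def matvec(v):
                # Hessian of the local quadratic model: D^T D / sigma^2 + lam * C
                return D.T @ (D @ v) / sigma**2 + lam * curvature(v)

            grad = D.T @ r / sigma**2 - lam * curvature(theta)
            A = LinearOperator((n, n), matvec=matvec, dtype=float)
            step, _ = cg(A, grad, maxiter=100)
            theta = theta + step                 # move to the minimum of the local model
        return theta

    # Tiny synthetic example: a linear 'renderer' A @ theta with an identity prior.
    rng = np.random.default_rng(3)
    A = rng.normal(size=(200, 50))
    data = A @ rng.normal(size=50) + 0.01 * rng.normal(size=200)
    est = map_estimate(lambda t: A @ t, lambda t: A, data, np.zeros(50),
                       curvature=lambda v: v, lam=1e-3)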
5.1 The RG4 system: embedding stronger prior

For an Entry and Descent real-time process, a strong prior about the surface model is highly desirable, and we therefore plan to extend the super-resolution technique to include the shading information ∇h_s. Bayes' theorem then states that to infer values for the heights and albedos from the image data as well as from the slopes, we use the expression

p(h, ρ | I_1...I_n) ∝ p(I_1...I_n | h, ρ) p(h, ρ) p(∇h_s - ∇h) p(h_s - h).

Here, it shall be remarked that h_s and ∇h_s are independent prior information obtained separately (i.e. from shape from motion and from image-to-surface gradient mappings). Furthermore, h_s will be obtained directly from a fast shape-from-motion method. Using the form of the prior in the previous equation makes it feasible to account for uncertainties in the independent measurements of h_s and ∇h_s. In addition, we plan to use the prior h_s to integrate the radar and other altimeter readings whenever they become available. We therefore "bootstrap" the inference of the actual height field and albedos. Potentially, this leaves us with the advantage of rewriting the Bayesian inference process on the deviation (fluctuation) between the prior and the height field rather than on the height field itself; the parameters will thus be

δh = h - h_s,

and these are believed to be small, such that fast convergence of the inference process can be guaranteed.
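Assuming the additional factors p(∇h_s - ∇h) and p(h_s - h) are Gaussian (a natural reading, though their exact form is not fixed above), the extra terms they contribute to the negative log-posterior can be written as in the sketch below; the variances correspond to the assumed uncertainties of the shape-from-motion and shape-from-shading inputs, and the function name is hypothetical.

    import numpy as np

    def extended_prior_penalty(h, h_s, grad_h_s, var_h=1.0, var_grad=1.0):
        """Extra negative-log-prior terms from the bootstrap estimates.

        h        : (N, N) current height field
        h_s      : (N, N) height field from fast shape from motion
        grad_h_s : pair of (N, N) arrays (x-slope, y-slope) from shape from shading
        Gaussian penalties keep h close to h_s and its gradient close to
        grad_h_s; working with delta_h = h - h_s keeps both terms small.
        """
        delta_h = h - h_s
        gy, gx = np.gradient(h)
        dgx = gx - grad_h_s[0]
        dgy = gy - grad_h_s[1]
        return (0.5 * np.sum(delta_h**2) / var_h
                + 0.5 * (np.sum(dgx**2) + np.sum(dgy**2)) / var_grad)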
6 Final Remarks

An operational software system based on this proposed demonstration system would use images to update the surface model as soon as they are received.
The Bayesian approach gives a solution to the problem of how much the prior model should be believed when the new data disagrees with the prior model. Not only does this allow model update when there is conflicting information, but it can also serve as a change detection warning system. This is possible because the model projects expected values. If measurements are many standard deviations from expectations, then it is a signal for likely change.

Another planned operation of the RG4 system is to use the constructed model as a topographical map after the landing phase. The super-resolved model can be employed to focus the desired exploration phase of the mission. Models constructed from altitudes will provide a much wider scope of the landing site topography.
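This change-detection idea reduces to a standardized-residual test: compare each new measurement with the value the model projects and flag locations that deviate by more than a few predicted standard deviations. The sketch below is illustrative; the three-sigma threshold is an assumed default.

    import numpy as np

    def flag_changes(observed, predicted, predicted_std, k=3.0):
        """Flag pixels whose measurements sit far from the model's expectation.

        observed, predicted : arrays of pixel values (new image vs. model projection)
        predicted_std       : per-pixel standard deviation from the model's uncertainty
        Returns a boolean mask of likely-change pixels (|residual| > k sigma).
        """
        z = (observed - predicted) / np.maximum(predicted_std, 1e-12)
        return np.abs(z) > k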
References

1. B. Horn and M. Brooks. Shape from Shading. MIT Press, 1989.
2. Z. Zhang, R. Deriche, O. Faugeras and Q.T. Luong. A robust technique for matching two uncalibrated images through the recovery of the epipolar geometry. Technical Report No. 2273, INRIA, Sophia Antipolis, 1994.
3. Morris, R. D., Cheeseman, P., Smelyanskiy, V. N., Maluf, D. A. A Bayesian Approach to High Resolution 3D Surface Reconstruction from Multiple Images. Proceedings of the IEEE Signal Processing Workshop on Higher-Order Statistics, June 1999, pp. 140-143.
4. Smelyanskiy, V. N., Cheeseman, P., Maluf, D. A., Morris, R. D. Bayesian Super-Resolved Surface Reconstruction. Computer Vision and Pattern Recognition, 1999.
5. J.D. Foley, A. van Dam, S.K. Feiner and J.F. Hughes. Computer Graphics: Principles and Practice. Second Edition, Addison-Wesley, 1990.
6. M. Armstrong, A. Zisserman and R. Hartley. "Self-Calibration from Image Triplets". In Proceedings of the European Conference on Computer Vision, pp. 3-16, 1996.
7. Q. Zheng and R. Chellappa. Estimation of Illuminant Direction, Albedo and Shape from Shading. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, No. 7, July 1991.
8. L. Matthies, T. Kanade, R. Szeliski. Kalman Filter-based Algorithms for Estimating Depth from Image Sequences. International Journal of Computer Vision, Vol. 3, pp. 209-236, 1986.
9. Y. Ohta, T. Kanade. Stereo by Intra- and Inter-Scanline Search Using Dynamic Programming. IEEE Trans. PAMI, Vol. 7, pp. 139-154.
10. A.E. Johnson, Y. Cheng and L.H. Matthies. Machine Vision for Autonomous Small Body Navigation. In Proceedings of the IEEE Aerospace 2000 Conference.