0
Research Papers

Implicit Sampling for Path Integral Control, Monte Carlo Localization, and SLAM OPEN ACCESS

[+] Author and Article Information
Matthias Morzfeld

Department of Mathematics,
University of California, Berkeley;
Lawrence Berkeley National Laboratory,
Berkeley, CA 94720

Contributed by the Dynamic Systems Division of ASME for publication in the JOURNAL OF DYNAMIC SYSTEMS, MEASUREMENT, AND CONTROL. Manuscript received September 13, 2013; final manuscript received November 6, 2014; published online January 27, 2015. Assoc. Editor: John B. Ferris.

J. Dyn. Sys., Meas., Control 137(5), 051016 (Jan 27, 2015) (14 pages) Paper No: DS-13-1356; doi: 10.1115/1.4029064 History: Received September 13, 2013

Implicit sampling is a recently developed variationally enhanced sampling method that guides its samples to regions of high probability, so that each sample carries information. Implicit sampling may thus improve the performance of algorithms that rely on Monte Carlo (MC) methods. Here the applicability and usefulness of implicit sampling for improving the performance of MC methods in estimation and control is explored, and implicit sampling based algorithms for stochastic optimal control, stochastic localization, and simultaneous localization and mapping (SLAM) are presented. The algorithms are tested in numerical experiments where it is found that fewer samples are required if implicit sampling is used, and that the overall runtimes of the algorithms are reduced.

FIGURES IN THIS ARTICLE
<>

Many problems in physics and engineering require that one produces samples of a random variable with a given probability density function (PDF) p [1,2]. If p is difficult to sample, one can sample an easier-to-sample PDF p0 (the importance function), and attach to each sample Xj∼ p0, j = 1,…, M, the weight Wjp(Xj)/p0(Xj) (capital letters denote realizations of random variables, and these realizations are indexed in superscript; subscript indices label discrete time). The weighted samples Xj form an empirical estimate of the PDF p and, under mild assumptions, the empirical estimate converges weakly to p, i.e., expected values of a function h with respect to p can be approximated by weighted averages of the function values h(Xj). The difficulty is to find a “good” importance function p0: if p0 is not a good approximation of p, for example, if p0 is not large when p is large, then many of the samples one proposes using p0 may have a small probability with respect to p and, therefore, carry little information, and make sampling inefficient.

Implicit sampling is a general method for constructing useful importance functions [3,4]. The basic idea is to first locate the region of high probability with respect to p via numerical optimization, and then to map a reference variable to this region; this mapping is done by solving algebraic equations. While the cost of generating a sample with implicit sampling is larger than in many other MC schemes, implicit sampling can be efficient because the samples carry more information so that fewer samples are required. In Refs. [57], this tradeoff was examined in the context of geophysics, where it was found that implicit sampling indeed can be efficient, in particular when the dimension of the problem is large.

Here, implicit sampling is applied to estimation and control in robotics, and new algorithms for stochastic optimal control, MC localization (MCL), and SLAM are developed. Implementations and efficiencies of these algorithms are illustrated and explored with examples. In particular, it is investigated if implicit sampling, which requires fewer samples that are, however, more informative and more expensive, can be efficient compared to other sampling schemes that may require more samples, each of which is less informative, however cheaper to generate.

The optimal control of a stochastic control problem can be found by solving a stochastic Hamilton–Jacobi–Bellman (HJB) equation [8]. The dimension of the domain of this partial differential equation (PDE) equals the dimension of the state space of the control problem. Classical PDE solvers require a grid on the domain and, therefore, are impractical for control problems of moderate or large dimension. For a class of stochastic optimal control problems, one can use MC solvers instead of grid based PDE techniques and, since the MC approach avoids grids, it is in principle feasible to solve larger dimensional control problems within this class [912]. However, the sampling scheme must be chosen carefully or else MC based PDE solving will also fail (in the sense that unfeasibly many samples are required). It is shown in Sec. 3 how to apply implicit sampling to obtain a practical algorithm for stochastic optimal control that avoids many of the pitfalls one faces in MC based PDE solving. The method and algorithm are illustrated with the double slit problem (see Sec. 3.1 and Ref. [9]), which is a simple but vivid example of how things can go wrong, and how they can be fixed with implicit sampling. Using an inverse dynamics controller for a two degrees-of-freedom robotic arm as an example, it is also shown how to obtain a semi-analytic solution for a linear Gaussian problem via implicit sampling. Finally, it is indicated how implicit sampling and stochastic optimal control can help with being trapped in local minima of nonconvex optimization problems. An extension of the method presented here is also discussed in the conference paper [13].

In robotics, one often updates the predictions of a dynamic model of an autonomous robot with the output of the robot's sensors (e.g., radar or lidar scans). This problem is often called “localization,” and can be formulated as a sampling problem. Localization algorithms that rely on MC sampling for the computations are called MCL [14]. One can also learn the geometry and configuration of the map while simultaneously localizing the robot in it, which leads to the problem of “SLAM.” Efficient numerical solutions of the MCL and SLAM problems are a fundamental requirement for autonomous robots [15,16]. The various MCL and SLAM algorithms differ in their importance function p0, and some algorithms have been shown to be inefficient due to a poor choice of p0 [15]. Here it is shown how to use implicit sampling to generate an efficient importance function for MCL and SLAM. The implicit sampling based MCL and SLAM algorithms are convergent, i.e., as the number of samples goes to infinity one obtains an empirical estimate of the true posterior even if the underlying dynamics are nonlinear (whereas many other SLAM algorithms require linearity for convergence [15]). The memory requirements of the implicit sampling based SLAM algorithm scale linearly with the number of features in the map. The efficiencies of the new MCL and SLAM algorithms are compared to the efficiencies of competing MCL and SLAM algorithms in numerical experiments with the data set [17].

The efficiency of MC sampling depends on how well the importance function p0 approximates the target PDF p. In particular, p0 must be large where p is large, or the samples one produces using p0 are unlikely with respect to p. Implicit sampling is a general MC sampling technique that constructs an importance function that is large where p is large by mapping a reference variable to the region where p is large [37]. Here implicit sampling is described in general terms; below implementations of implicit sampling are described in the context of applications.

The region where p is large is the region where −log(p) is small (the logarithm is used here because in applications p often involves exponential functions). The region where p is large can thus be located via minimization of the function

(1)F(x)=-log(p(x))

If F is convex, the minimizer μ = argmin F (i.e., the mode of p) is the most likely value, and high-probability samples are in its neighborhood. One can obtain samples in this region by solving the stochastic algebraic equationDisplay Formula

(1)F(x)-φ=12ξTξ

where φ=minF,ξ~N(0,Im) is an m-dimensional Gaussian reference variable and where T denotes a transpose and Im is the identity matrix of order m; here, and for the remainder of this paper, N(a,B) is short-hand notation for a Gaussian PDF with mean a and covariance matrix B. Note that the right-hand side of Eq. (1) is likely to be small because ξ is close to the origin, which implies that the left-hand side is small, which in turn means that the solution of Eq. (1), i.e., the sample, is close to the mode and, thus, in the high-probability region.

The weights of the samples are proportional to the absolute value of the Jacobian determinant of the map that connects the sample Xj to the reference variable ξ

(2)W(Xj)|det(xξ(Xj))|

In practice, the weights are usually normalized so that their sum is one. Various ways of constructing a map from ξ to x have been reported in Refs. [4,7] two of which are summarized below. In summary, implicit sampling requires: (1) minimizing F = −log p(x) and (2) solving Eq. (1). This two-step approach makes efficient use of the available computational resources: the minimization identifies the high probability region and the samples are focused to lie in this region, so that (almost) all samples carry information. This is in contrast to other sampling schemes where an importance function is chosen ad hoc, which often means that many of the samples carry little or no information; the computations used for generating uninformative samples are wasted.

Finally, the assumption that F is convex can be relaxed. Implicit sampling can be used without modification if F is merely U-shaped, i.e., the target PDF, p, is unimodal. Multimodal target PDFs can be sampled, e.g., by using mixture models, for which one approximates each mode using the above recipe (see Refs. [4,7,18] for more details).

Generating Samples With Random Maps.

To generate samples, one solves Eq. (1) with a one-to-one and onto mapping. There are many choices for such a mapping, and one is to solve Eq. (1) in a random direction, i.e., one putsDisplay Formula

(2)Xj=μ+λjL-Tηj

where L is a Cholesky factor of the Hessian of F at the minimum, ηj=ξj/(ξj)Tξj is a vector which is uniformly distributed on the m-dimensional hypersphere, and where λj is a scalar. One then computes λj by substituting Eq. (2) into Eq. (1). This approach, called “random map,” requires solving a scalar equation to generate a sample [7]. Moreover, the Jacobian of this map is easy to evaluate with the formula (see Ref. [7] for a derivation)Display Formula

(3)|det(xξ)|=(ξTξ)(1-m)/2det(L)|λ(ξ)m-12FL-Tη|

where ∇F is the gradient of F with respect to x (an m-dimensional row vector).

In numerical implementations of this method, calculating ∇F may require repeated evaluations of F, e.g., if the gradient is approximated via finite differences. This can be avoided by noting that

(4)dλdρ=12FL-Tη

where ρ = ξTξ, so that the Jacobian becomesDisplay Formula

(4)|det(xξ)|=ρ(1-m)/2det(L)|λ(ξ)m-1dλdρ|

The scalar derivative / can be evaluated using finite differences with a few evaluations of F (the precise number of function evaluations depends on the finite difference scheme one chooses).

Generating Samples With Approximate Quadratic Equations.

Another path to solving Eq. (1) is to replace it with an approximate quadratic equationDisplay Formula

(5)F(x)-φ=12ξTξ

where

(6)F(x)=φ+12(x-μ)TH(x-μ)

is the Taylor expansion of order two of F about its minimizer μ, and where H is the Hessian of F at the minimum. A solution of Eq. (5) is

Xj=μ+L-Tξj

where L is the Cholesky factor of H = LLT. The weights areDisplay Formula

(6)Wjexp(-φ+F(Xj)-F(Xj))

and account for the error one makes by solving Eq. (5) instead of Eq. (1) (see Ref. [4] for more details).

The finite horizon stochastic optimal control problem considered here is as follows: find a control u (a p dimensional vector function of the state x) that minimizes the cost function

(7)C(x0,t0,u)=E[Φ(x(tf))+t0tfu(x(τ),τ)TRu(x(τ),τ)+V(x(τ),τ)dτ]

where x is an m-dimensional vector (the state), t0 ≤ t ≤ tf is time, tf is the final time (or the horizon), Φ is a scalar function that describes the “final cost,” R is a p × p symmetric positive definite matrix that specifies the control cost, and V is a scalar function which describes the “running cost” (which is also called “potential”); the expectation is taken over trajectories of the stochastic differential equation (SDE)Display Formula

(7)dx=f(x,t)dt+G(udt+QdW)

starting at x(t0) = x0, where f is a smooth m-dimensional vector function which describes the dynamics, and where G and Q are m × p and p × r matrices that describe how the uncertainty and controls are distributed within the system; W is Brownian motion (see, e.g., Ref. [8] for more details about stochastic optimal control; I closely follow Ref. [9] in the presentation of path integral control).

The “optimal cost-to-go” is defined as

(8)J(x,t)=minuC(x,t,u)

and satisfies the stochastic HJB equation [8]

tJ(x,t)=xJ(x,t)+Tr(QTGTxxJ(x,t)GQ)+V(x,t)-12xJ(x,t)GTR-1GxJ(x,t)

where Tr is the trace. If there exists a scalar γ such thatDisplay Formula

(8)γGR-1GT=GQQTGT

then the nonlinear change of variables

(9)J(x,t)=-γlogψ(x,t)

linearizes the stochastic HJB equation and one obtainsDisplay Formula

(9)tψ=V(x,t)γψ-f(x,t)Txψ-12Tr(QTGTxxψGQ)

with final condition ψ(x, tf) = exp(−Φ(x(tf))/γ) and with the optimal controlDisplay Formula

(10)u=-R-1xJG

see Refs. [9,19]. Thus, one can compute the optimal control by solving the HJB equation. Numerical PDE solvers typically require a grid on the domain of the PDE; however, the domain has the dimension of the state space of the control problem. Thus, grid based numerical PDE techniques only apply to control problems of a low dimension (or else the computational requirements become excessive).

For larger dimensional problems, one can use stochastic PDE solvers which do not require a grid. In particular, one can solve the adjoint equation

(11)tψ=-V(x,t)γψ-x(f(x,t)ψ)+12Tr(QGxxψGTQT)

forward in time (instead of solving Eq. (9) backward in time) with the Feynman–Kac formula (see, e.g., Ref. [1])Display Formula

(11)ψ(x,t)=E[exp(-1γ(Φ(y(tf),tf)+ttfV(y(τ),τ)dτ))]

where the expectation is over the trajectories ofDisplay Formula

(12)dy=f(y,τ)dτ+GQdW

starting at y(τ = t) = x(t). This is the path integral formulation of stochastic optimal control [912]. Note that the class of problems that can be tackled with path integral control is rather general, since the assumptions of: (1) a quadratic control cost and (2) the condition on the noise and the control cost in Eq. (8) are not restrictive. In particular, the dynamics f(x, t) and the potential V(x, t) in Eq. (7) can be nonlinear.

To evaluate the Feynman–Kac formula numerically, one approximates the infinite dimensional integral in Eq. (11) by a finite dimensional one. For example, one can discretize the integral on a regular grid in time with constant time step Δt, so thatDisplay Formula

(13)ψ(x,t)dy1dynp(y1,,yn)×exp(-1γΦ(yn,nΔt)-Δt2γi=1nV(yi,τi)+V(yi-1,τi-1))

where y1 = x(t), n = (tf − t)/Δt, τi = t + iΔt, i = 0, 1,…,n, and where p is the PDF of the discretized trajectory y1,…,yn. Here the trapezoidal rule is used to discretize the integral of the potential; however, other choices are also possible [20]. The SDE (12) implies that the increments Δyi = yi − yi−1 are independent Gaussians, e.g., for a forward Euler discretization [21] of Eq. (12)

(14)Δyi~N(f(yi,τi)Δt,ΣΔt)

where Σ = γGR−1GT= GQQTGT, so thatDisplay Formula

(14)p(y1,,yn)=Πi=1np(Δyi)=exp(-12i=1nΔt(ΔyiΔt-f(yi,τi))TΣ-1(ΔyiΔt-f(yi,τi)))(2πΔtdetΣ)-n/2

Thus,

(15)ψ(x,t)1(2πΔtdetΣ)n/2dy1dynexp(-F(y1,,yn))

whereDisplay Formula

(15)F(y1,,yn)=1γΦ(yn,nΔt)+Δt2γi=1nV(yi,τi)+V(yi-1,τi-1)+12i=1nΔt(ΔyiΔt-f(yi,τi))TΣ-1(ΔyiΔt-f(yi,τi))

Note that this F depends on how one discretizes the integral of the potential V, and the SDE (12); other discretization schemes will lead to different Fs.

For a given discretization, i.e., for a given F, MC sampling can be applied to compute the expectation in Eq. (13). For example, one can evaluate

(16)G(y1,,yn)=1γΦ(yn,nΔt)+Δt2γi=1nV(yi,τi)+V(yi-1,τi-1)

for discretized trajectories of Eq. (12), followed by averaging. However, this method can fail if the potential V has deep wells [911]. In this case, many trajectories of Eq. (12) can end up where V is large and, thus, contribute little to the approximation of the expected value ψ. For efficient sampling, one needs a method that guides the samples to remain where the potential is small.

This guiding of the samples can be achieved via implicit sampling. Recall that in implicit sampling one first locates the region of high probability by minimizing F = −log(p(x)), where p(x) is the PDF of the random variable one is interested in. The trajectories we wish to sample have the joint PDF (14), so that, in order to apply implicit sampling to path integral control, one needs to minimize F in Eq. (15). With this F, one solves the algebraic equation (1) with a one-to-one and onto map (2). The integral in Eq. (13) becomesDisplay Formula

(16)ψ(x,t)exp(-φ)(2πΔtdetΣ)n/2dξ1dξnexp(-ξTξ/2)|xξ|exp(-φ)(ΔtdetΣ)n/2Eξ[|xξ|]

where the expectation is over the reference variable ξ and where ∂x/∂ξ is the Jacobian of the map from x to ξ in Eq. (2). Combining Eq. (16) with the expression (3) for the Jacobian gives

(17)ψ(x,t)exp(-φ)2(ΔtdetΣ)n/2det(L)Eξ[(ξTξ)(1-m)/2|λ(ξ)m-1F(ξ)η|]

The expected value is now straightforward to compute via MC, i.e., sampling a the reference variable ξ to obtain M samples ξj, j = 1,…,M, computing, for each one, λ(ξj) and the gradient of F, followed by averaging. In numerical implementations, it may be more efficient to use the equivalent expression (3) for the Jacobian (which avoids computing the gradient of F). In this case, one obtains

ψ(x(t),t)exp(-φ)(ΔtdetΣ)n/2det(L)Eξ[(ξTξ)(1-m)/2|λ(ξ)m-1dλdρ(ξ)|]

Once ψ(x, t) is computed, we can compute the optimal control from Eq. (10), e.g., via finite differences. It is important to note that this defines the optimal control at time t, and that the computations have to be repeated at the next time step to compute ψ(x(t + dt), t + dt). The use of implicit sampling in stochastic optimal control is illustrated with three examples.

The Double Slit Problem.

The double slit problem is a simple example that illustrates the pitfalls one must avoid when using MC sampling for path integral control [9]. The problem is as follows. Suppose you observe a somehow confused person walking (randomly) toward a wall with two doors, and your job is to guide this person through either one of the two doors. The person and you are modeled by the controlled dynamics

dx=udt+σdW

the final cost is quadratic, Φ = x(tf)2/2, and the scalar R > 0, that defines the cost of the control, is given; the “wall” is modeled by the potential

V(x,t)={ift=t1andx<a,orb<x<c,ord<x0otherwise

for given a < b < c < d and t1 > 0. The stochastic HJB equation of the double slit problem is

ψt=-Qγ+12σψxx

where γ=rσ satisfies Eq. (8).

One can attempt to solve this equation using standard MC sampling as follows. Solve the SDE

dy=σdW

starting at y(τ = t) = x(t) repeatedly, e.g., using a uniform grid in time and the forward Euler scheme [21]

yn+1=yn+σΔtΔwn

where Δwn are independent Gaussians with mean 0 and variance 1. For each trajectory {y1,…, yn}, evaluate

G(y1,,yn)=12γyn2+Δt2γi=1nV(yi,τi)+V(yi-1,τi-1)

With a high probability, the trajectories will hit the potential wall at t1 and, thus, G is infinite, so that the contribution of a trajectory to the expected value ψ(x, t) in Eq. (11) is zero with a high probability. The situation is illustrated with a simulation using the parameters of Table 1. The results are shown in the left panel of Fig. 1. One observes that 5000 unguided random walks from Eq. (17), all starting at x(0), hit the potential wall, and, thus, score G = . All 5000 samples thus carry no probability and do not contribute to the approximation of ψ. Even when O(105) samples are considered, it is unlikely that sufficiently many make it past the potential wall [9]. We conclude that this method is unfeasible for this problem, since the number of samples required is extremely large due to the deep wells in the potential.

Implicit sampling can be applied here to fix these problems. In particular, one finds from the Feynman–Kac formula that

ψ(x,t)=E[exp(-y(tf)22γ-1γttfV(y(τ),τ)dτ)]

which one can compute with implicit sampling in two parts. For t ≥ t1, i.e., after one has passed the potential wall, the probability of each path is Gaussian with variance s = tf − t, so that

ψ(x,t)=12πsexp(-(z-y)22s)exp(-z22γ)dzfortt1

For implicit sampling, one defines

F(y)=(z-y)22s+z22γ

whose minimizer and minimum are μ=argminF=yγ/(s+γ),φ=minF=y2/(s+γ). Since F is quadratic, F=φ+H(y-μ2)/2, where H = (s + γ)/() is the Hessian of F at the minimum. The algebraic equation (1) can thus be solved by the linear map

y=μ+sγs+γξ

so that

ψ(x,t)=exp(-φ)γ(s+γ)fortt1

For t < t1, one splits the integration into two parts, first going from time t to t1 with probability p(y1,t1|y,t), and then from t1 onward to tf with probability p(z,tf|y1)=N(y1,s),s=tf-t1

ψ(x,t)=dzdy1exp(-z22γ)p(z,tf|y1)p(y1,t1|y,t)

Implicit sampling uses the information about the potential which is infinite at t1 except for the two slits, so that p(y1,t1|y,t)=0 outside the slits and Gaussian with mean y and variance s=t1-t otherwise. Since only the slits need to be explored with samples, the integration can be carried out over the slits

ψ(x,t)=dyy[a,b]×[c,d]dy1exp(-z22γ)exp(-(z-y1)22s)×exp(-(y1-y)22s)

The evaluation of the above integral using the same strategy as above for t ≥ t1 is tedious, but straightforward. Note that the implicit sampling strategy here is the key to solving this problem, because it locates the wells in the potential.

The above analytic solution is compared to a numerical implementation of implicit sampling, for which the paths are discretized with a constant time step Δt. Here the numerical integration is also split up into two parts. For t ≥ t1, the function F is quadratic because the probabilities are Gaussian and the potential has no effect. Thus,

F=yn22γ+12γΔti=1n(yi-yi-1)2

where n = (tf − t)/Δt and yi = y(t + iΔt), i = 1,…,n. Minimizing F is straightforward, and the algebraic equation (1) can be solved with a linear map

y=μ+L-Tξ

where y = (y0,…,yn) is a n-dimensional vector, μ is the minimizer of F and ξ is an n-dimensional vector whose elements are independent standard normal variates; L is a Cholesky factor of the Hessian at the minimum. Since the Jacobian of this linear map is constant, Eq. (16) gives

ψ(x,t)=exp(-φ)(γΔt)n/2detL

where φ is the minimum of F.

For t < t1, one obtains the same F, but needs to perform a constraint minimization over the slits. There are two (local) minima, one per slit, and both can be found easily using quadratic programming [22,23]. One can then generate a sample close to each of the minima using

y=μ+L-Tξj

where y again is a vector whose elements are the discretized path and where μ is the location of a local minimum of the constraint problem and L is a Cholesky factor of the unconstrained Hessian at a minimum. Equation (16) becomes

ψ(x,t)exp(-φ)(Δtγ)n/2Slits1detLdξ

The expected value of the Jacobian (1/det L) over the slits can be computed by MC as follows. Generate M samples ξj, j = 1,…,M, and, for each one, compute the corresponding trajectory and check if it hits the potential wall. The integral is 1/det(L) times the ratio of the number of trajectories that pass through the wall and M.

The right panel of Fig. 1 shows 50 trajectories one obtains with this approximation at t = Δt. Note that most of the trajectories pass through the slit, i.e., most of the samples carry a significant probability, score a small F, and, thus, contribute to the approximation of the expected value ψ(x, t). These “guiding” effects make it possible to solve the problem with O(10) samples, while the standard MC scheme fails even with O(105) samples [9]. The Laplace guided strategy presented in Ref. [9] (which uses similar ingredients as implicit sampling and is also related to the optimal nudging constructions in Refs. [24,25]) requires about 100 samples.

The numerical approximation obtained with implicit sampling (50 samples) is compared to the analytical solution in Fig. 2, where the optimal control and the trajectory under this control are shown. One observes that the numerical approximation of the optimal control is quite close to the control, one obtains from the analytical solution (right panel of Fig. 2) and, hence, the controlled trajectory one obtains with implicit sampling is also close to the one computed from the analytical solution (left panel of Fig. 2). This statement can be made more precise by solving the control problem repeatedly (each solution is a random event) and averaging.

The double slit problem is solved 500 times and the error of the numerical approximation is recorded for each run. The Euclidean norm of the difference of the analytical solution and the approximation via implicit sampling is used to measure an error, in particular in x (the trajectory under optimal control) and u (the optimal control). The number of samples of implicit sampling is varied to study the convergence of the algorithm. The results are shown in Table 2 where the mean and standard deviation of the Euclidean norm of the error in x and u, respectively, scaled by the mean of the norms of x and u, respectively, are listed. These numbers indicate the errors one should expect in a typical run. The errors are relatively small when 50 samples are used. The errors do not dramatically decrease for larger sample numbers, which indicates that the algorithm has converged. The small variance of the errors indicates that similar errors occur in each run, so that one concludes that implicit sampling is reliable.

The error in the numerical implementation of implicit sampling is mostly due to neglecting one of the local minima of F. In numerical tests, only a slight improvement was observed when both minima were used for sampling; however, the computations are twice as fast if one considers only the smaller of the two minima.

Stochastic Control of a Two Degrees-of-Freedom Robotic Arm.

Stochastic optimal control of the two degrees-of-freedom robotic arm shown in Fig. 3 is considered. The controller is an “inverse dynamics controller” that linearizes the system. This example demonstrates how implicit sampling based path integral control works in linear problems. Specifically, a semi-analytical solution is derived for which a numerical optimization is required; however, the expected value of the implicit sampling solution in Eq. (16) can be evaluated explicitly (i.e., no numerical sampling is needed). Moreover, only some of the dynamic equations of the first-order formulation of the dynamics are driven by noise, so that the dimension of the stochastic subspace is less than the actual dimension of the problem. Implicit sampling can exploit these features, and this example demonstrates how. The algorithm is tested in numerical experiments in which false parameters are given to the algorithm to demonstrate the robustness of the path integral controller to model error.

The goal is to compute two independent torques τ1 and τ2 that drive the arm to a desired position θ* (described by the two-angles θ1, θ2, the degrees-of-freedom). Neglecting energy dissipation, and assuming the robot is mounted on a horizontal table (no effects of gravity), the equation of motion is

M(θ)θ··+C(θ,θ·)θ·=τ

where θ = (θ1, θ2) is a two-dimensional vector and where dots denote derivatives with respect to time; the 2 × 2 matrices

M(θ)=(a1+a3cos(θ2)a2+a3cos(θ2)a2+a3cos(θ2)a2)C(θ,θ·)=(-a3sin(θ2)θ·2-a3(θ·1+θ·2)sin(θ2)a3sin(θ2)θ·10)

where

a1=m1lc12+m2lc12+m2lc22+I1+I2a2=m2lc22+I2a3=l1m2lc2

define the dynamics and depend on the lengths of the arms l1, l2 and the loads m1, m2 (see, e.g., Ref. [26] for more details about this model and Table 3 for the parameters used in simulations). One can compute the torques by inverting the dynamics and choosing τ=C(θ,θ·)+M(θ)u to derive the linear system

Display Formula

(17)(θ·θ··)=(0I00)(θθ·)+(0I)u

where 0 denotes the matrix whose elements are all zero (of appropriate dimensions), and I is the 2 × 2 identity matrix; u is a two-dimensional vector of controls which must be computed. To make the control robust to modeling errors, one can add noise and solve the stochastic optimal control problem

(18)dx=Adt+Gudt+GQdW

where x=(θ,θ·) is a four-dimensional vector and where the matrices A, G, and Q can be read from Eq. (17), and where W is a two-dimensional Brownian motion. The final cost is chosen as

Φ=θ(tf)-θ*2/2+θ·(tf)2/2

so that the robotic arm stops at tf > 0 at the desired position θ*. This linear problem can be solved with linear quadratic Gaussian control [8]; however, here a semi-analytical solution is obtained via implicit sampling.

As before, the time is discretized using a constant time step Δt = (tf − t)/n. The discretized dynamics are

zi+1=zi+yiΔtyi+1=yi+ΔB

where yi, zi are two-dimensional vectors whose elements are the discretized angular velocities (θ·) and the discretized angles (θ); note that only one of the above equations is driven by noise (ΔB), since there is no reason to inject noise into the (trivial) equation θ·=θ·. While the noise propagates via the coupling to all variables, the PDFs that define the path in Eq. (11) are in terms of yns only.

The function F in implicit sampling is thus a function of y only

F=12γyn2+12γ(zn-θ*)2+12Δti=1n(yi-yi-1)2

where γ = r is chosen to satisfy Eq. (8). Because F is quadratic the minimization is straightforward, and the algebraic equation (1) can be solved with a linear map whose Jacobian is constant and given by the determinant of a Cholesky factor L of the Hessian of F at the minimum (see above). Thus, Eq. (16) becomes

ψ(x,t)=exp(-φ)(Δt)ndetL

There is no need for generating samples, since the expected value is computed explicitly (i.e., by evaluating the integral analytically).

The robustness of the implicit sampling based path integral control is tested in numerical experiments. To simulate that the “real” robotic arm behaves differently from the numerical model that the path integral controller uses to find a control, one can give the controller false information about the parameters of the numerical model. Here the parameters m1 and m2 are perturbed values of the true parameters simulating that the robotic arm picked up a payload (of unknown weight). Thus, the controller works with values m1 = 1.4, m2 = 1, whereas the “true” robotic arm has parameters as in Table 3. The results of a simulation are shown in Fig. 4 where state trajectories and controls are shown for a truly optimal controller (i.e., one who has access to the exact model parameters), and for the path integral controller which uses false model parameters. One observes that the path integral controller can perform the desired task (moving the arm to a new position) and that its control and the resulting state trajectories closely follow those that are generated by a truly optimal controller. This example thus indicates that the path integral controller can perform reliably and quickly in applications where model error may be an issue.

Optimization Via Stochastic Control.

One can setup a conventional optimization problem, i.e., find min f(x) for a given smooth function f, as a stochastic optimal control problem as follows: find a control u to minimize the cost functionDisplay Formula

(18)C(x,tf)=E[f(x(tf))+0tfu(x,t)TRu(x,t)dt]

where the expectation is over trajectories of the SDE

(19)dx=udt+σdW

The idea is to make use of the stochastic component to explore valleys in the function f. The parameters one can tune are the noise level σ, the final time tf, and the control cost R.

Implicit sampling for this control problem requires minimization of the function

F=12σΔti=1n(yi-yi-1)2+1γf(yn)

where the discretization is done using a constant time step Δt = tf/n as before. Note that the control approach to this problem inserts a quadratic term through which the space is (randomly) explored.

These ideas are applied to minimize the Himmelblau function

f(x1,x2)=(x12+x2-11)2+(x1+x22-7)2

which is a popular test of the performance of optimization algorithms because of its many local minima [27]. The parameters are R = 0.01, σ = 0.01, Δt = 1, tf = 20, and the minimization is initialized at (−1, −4). Figure 5 shows the iterations obtained with implicit sampling and 50 samples, and, for comparison, the iterations of Newton's method.

In this example, the stochastic approach is successful and finds a much lower minimum than Newton's method. However, since each run of the stochastic approach is random, one may find another local minimum in another run. This could be useful when one needs to explore valleys or if one suspects the existence of other local minima.

To test the reliability of the stochastic control approach, 70 experiments were performed, each starting at the same initial condition. The iterations of five such experiments are shown in Table 4. All 70 runs ended up close to the local minimum around x1 = 2.5, x2 = −2.5. The average value of F is an approximation to C in Eq. (18) and, with 70 experiments was found to be C ≈ 1.75, which corresponds to a much lower value of f than the value fminNewton=178.34, that one obtains with Newton's method.

Consider a mobile robot that navigates autonomously and, as it moves, collects noisy measurements about its motion as well as scans of its environment. If the scans reveal locations of features that are known to the robot, i.e., if the robot has a map of its environment, then it can localize itself on this map. This problem is known as localization. When the robot has no map of its environment, then it must construct a map while localizing itself on it, leading to the problem of SLAM [2830]. Efficient solutions to the localization and SLAM problems are a fundamental requirement for autonomous robots, which must move through unknown environments where no global positioning data (GPS) are available, for example indoors, in abandoned mines, underwater, or on far-away planets [16,31,32]. Application of implicit sampling to localization and SLAM is the subject of Secs. 4.1 and 4.2, where the algorithms will also be tested using experimental data obtained by Nebot and colleagues at the University of Sydney [17].

MCL.

The localization problem can be formulated as follows. Information about the initial state of the robot is available in the form of a PDF p(x0), where x0 is an m-dimensional vector whose elements are the state variables (e.g., position and velocity of the robot). A probabilistic motion model defines the PDFDisplay Formula

(19)p(xn|xn-1,un)

where n = 1, 2,… is discrete time and where un is a p-dimensional vector of known “controls,” e.g., the odometry. A data equation describes the robot's sensors and defines the PDFDisplay Formula

(20)p(zn|xn,un,Θ)

where zn is a k-dimensional vector whose elements are the data (k ≤ m) and where Θ is the map. For example, one can use the “landmark model,” for which the map describes the coordinates of a collection of obstacles, called “landmarks;” the data are measurements of range and bearing of the position of the robot relative to a landmark. The landmark model and range and bearing data will be used in the applications below, but the algorithms described are more generally applicable.

The motion and sensor model jointly define the conditional PDF p(x0:n|z0:n,u0:n,Θ), where the short-hand notation x0:n={x0,x1,,xn} is used to denote a sequence of vectors. By using Bayes' rule repeatedly, one can derive the recursive formula

(21)p(x0:n|z0:n,u0:n,Θ)=p(x0:n-1|z0:n-1,u0:n-1,Θ)×p(xn|xn-1,un)p(zn|xn,un,Θ)p(zn|z0:n-1)

Approximations of this PDF are used to localize the robot. For example, methods based on the Kalman filter [33] (KF) construct a Gaussian approximation which is often problematic because of nonlinearities in the model and data. Moreover, KF-based localization algorithms can diverge, for example, in multimodal scenarios (i.e., if the data are ambiguous) or during relocalization after system failure [15]. The basic idea of MCL is to relax the Gaussian assumptions required by KF, and to apply importance sampling to the localization problem [14]. The method proceeds as follows.

Given M weighted samples {X0:n-1j,Wn-1j},j=1,,M, which form an empirical estimate of the PDF p(x0:n-1|z0:n-1,u0:n-1,Θ) at time n − 1, one updates each particle to time tn given the datum zn. In standard MCL [14], this is done by choosing the importance function

(p0)np(x0:n-1|z0:n-1,u0:n-1,Θ)p(xn|Xn-1j,un)

that is, the robot location is predicted with the motion model and then this prediction, Xnj, is weighted by

WnjWn-1jp(zn|Xnj,un,Θ)

to assess how well it fits the data. However, this choice of importance function can be inefficient, especially if the motion model is noisy and the data are accurate (as is often the case [16]). The reason is that many of the samples generated by the model are unlikely with respect to the data and the computations used to generate these samples are wasted.

Implicit sampling can be used to speed up the computations. This requires in particular that implicit sampling, as described in Sec. 2, is applied to the M functionsDisplay Formula

(21)Fj(xn)=-log(p(xn|Xn-1j,un)p(zn|xn,un,Θ))

One needs M functions Fj, j = 1,…,M, one per sample, because the recursive problem formulation requires a factorization of the importance function (compare to Secs. 2 and 3, where only one function was needed). Here, each function Fj is parametrized by the location of the sample at time n − 1, Xn-1j, the control un and the datum zn; the variables of these functions are the location of the robot at tn, xn.

For MCL with implicit sampling, each function Fj must be minimized, e.g., using Newton's method. After the minimization, one can generate samples by solving the stochastic equation (1) with the techniques described in Sec. 2. Using information from derivatives in sampling for MCL has also been considered in Refs. [34,35].

Implementation for the University Car Park Data Set.

The University Car Park data set [17] is used to demonstrate the efficiency of MCL with implicit sampling. The scenario is as follows. A robot is moving around a parking lot and steering and speed data are collected via a wheel encoder on the rear left wheel and a velocity encoder. An outdoor laser (SICK LMS 221) is mounted on the front bull-bar and is directed forward and horizontal (see Ref. [36] for more information on the hardware). The laser returns measurements of relative range and bearing to different features. Speed and steering data are the controls of a kinematic model of the robot and the laser data are used to update this model's predictions about the state of the robot.

The motion model is the forward Euler discretization of the continuous time two-dimensional-model in Ref. [36]

(22)xn+1=xn+R(xn,un)δ+ΔBn,ΔBn=N(0,(Σ1)n)

where R(xm, un) is a three-dimensional vector function, δ is the time step, and (Σ1)n is a given covariance matrix. The data equation is

zn=h(xn,Θ)+Vn,Vn~N(0,Σ2)

where h is a two-dimensional vector function that maps the position of the robot to relative range and bearing and where Σ2 is a 2 × 2 diagonal matrix (see Ref. [36] and the Appendix for the various model parameters).

With the above equations for motion model and data, the functions Fj of implicit sampling in Eq. (21) becomeDisplay Formula

(22)Fj(xn)=(xn-fnj)T((Σ1)n)-1(xn-fnj)+i=1p(h(xn)-zni)TΣ2-1(h(xn)-zni)

where fnj=Xn-1j+R(Xn-1j,un)δ and where zni denotes the measurement for the ith landmark at time n. These Fjs are minimized with Newton's method and samples are generated by solving a quadratic equation as explained in Sec. 2. The gradient and Hessian in Newton's method were computed analytically (see the Appendix). If uneven weights were observed, a “resampling” was done using Algorithm 2 in Ref. [37]. Resampling replaces samples with a small weight with samples with a larger weight without introducing significant bias. If no laser data are available, all samples evolve according to the model equation (19).

The results of MCL with implicit sampling are shown in the left panel of Fig. 6. The “ground truth” shown here is the result of an MCL run with GPS data (these GPS data however are not used in implicit sampling); the large dots are the positions of the landmarks. The reconstruction by the MCL algorithm (dashed line) is an approximation of the conditional mean, which is obtained by averaging the samples. One observes that MCL with implicit sampling gives accurate estimates whenever laser data are available. After periods during which no laser data were available, one can observe a strong “pull” toward the true state trajectory, as soon as data become available, as is indicated by the jumps in the trajectory, e.g., around x = 0, y = 0.

The efficiency of MCL with implicit sampling is studied by running the algorithm with 10, 20, 40, 80, 100, and 150 samples, i.e., with increasing computational cost, and comparing the reconstruction of the MCL with the true path. The error is measured by the Euclidean norm of the difference between the “ground truth” (see above) and the reconstruction by the MCL algorithm (scaled by the Euclidean norm of the ground truth). The standard MCL method was also applied with 10, 20, 40, 80, 100, 150, 250, and 300 samples. The results are shown in Table 5.

It is clear that the both MCL algorithms converge to the same error, since both algorithms converge to the conditional mean as the number of samples (and hence, the computational cost) goes to infinity. However the convergence rate of MCL with implicit sampling is faster, which can be seen from the right panel of Fig. 6, where the computing time is plotted as a function of the error. In this figure, the area between the lines corresponds to the improvement of implicit sampling over the standard method, and it is clear that implicit sampling is more efficient than the standard method. The improvement is particularly pronounced if a high accuracy is required, in which case MCL with implicit sampling can be orders of magnitudes faster than the standard method.

Implicit Sampling for Online SLAM.

In SLAM, one is given the probabilistic motion model (19) and a data equation (20), and one needs to estimate the position of the robot as well as the configuration and geometry of the map; the conditional PDF of interest is thusDisplay Formula

(23)p(x0:n,Θ|z0:n,u0:n,η0:n)

where the map, Θ, is a variable (not a given parameter as in MCL); the variable η0:n is the “data association,” i.e., it maps the data to the known features, or creates a new feature if the data are incompatible with current features [38,39]. Here it is assumed that the data association is done by another algorithm, so that η0:n in Eq. (23) can be assumed to be given (in the numerical implementation in Sec. 4.2.1, a maximum-likelihood algorithm is used [15]).

Note that the SLAM state-vector is different from the state of the localization problem: it is the number of variables needed to describe the robot's position, x, and the coordinates of the features of the map, Θ. If the map contains many features (which is typically the case), then the state dimension is large and this makes KF based SLAM prohibitively expensive. The reason is that KF SLAM requires dense matrix operations on matrices of size N (where N is the number of features), due to correlations between the robot's position and the features [38]. KF SLAM thus has N2 memory requirements which make it infeasible for realistic N.

MC approaches to SLAM reduce the computational cost by exploring conditional independencies [40]. In particular, the various features of the map conditioned on the robot path are independent, which implies the factorization

(24)p(Θ|x0:n,z0:n,u0:n,η0:n)=Πk=1Np(θk|x0:n,z0:n,u0:n,η0:n)

where θk, k = 1,…, N are the features of the map [15,41]. This factorization makes it possible to update one feature at a time and thus reduces memory requirements. Moreover, “online” SLAM will be considered here, i.e., the map and the robot's position are constructed recursively as data become available, usingDisplay Formula

(24)p(x0:n,Θ|z0:n,u0:n,η0:n)p(x0:n-1,Θ|z0:n-1,u0:n-1,η0:n-1)×p(xn|xn-1,un)p(zn|θn,xn,ηn)

where θn is the feature observed at time n. Alternatively, one can wait and collect all the data, and then assimilate this large data set in one sweep; such a “smoothing” approach is related to graph based SLAM (see, e.g., Ref. [42]) but will not be discussed further in this paper.

For online probabilistic SLAM, one assumes that M weighted samples {X0:n-1j,Θj,Wn-1j} approximate

p(x0:n-1,Θ|z0:n-1,u0:n-1,η0:n-1)

at time n − 1 and then updates the samples to time n by applying importance sampling to

p(xn|xn-1,un)p(zn|θn,xn,ηn)

Here, it is assumed that only one feature is observed at a time (which is realistic [15]); however, the extension to observing multiple features simultaneously is straightforward.

Various importance functions have been considered and tested in the literature. For example, the fastSLAM algorithm [43] was shown to be problematic in many cases due to the poor distribution of its samples [15,44]. The reason is that the samples are generated by the motion model and, thus, are not informed by the data. As a result, the samples are often unlikely with respect to the data. The fastSLAM 2.0 algorithm [15,44] ameliorates this problem by making use of an importance function that depends on the data. The algorithm was shown to outperform fastSLAM and was proven to converge (as the number of samples goes to infinity) to the true SLAM posterior for linear models. The convergence for nonlinear models is not well understood (because of the use of extended KFs (EKFs) to track and construct the map). Here, a new SLAM algorithm with implicit sampling is described. The algorithm converges for nonlinear models at a computational cost that is comparable to the cost of fastSLAM 2.0.

Online SLAM with implicit sampling requires that implicit sampling is applied to the M functions Fj (one per sample), whose variables are the position at time tn, xn, and the location of the feature θn, observed at time tn

Fj(xn,θ)=-logp(xn|Xn-1j,un)p(zn|θnj,xn,ηnj)

the position of the jth sample at time tn−1 is a parameter. As in MCL, one needs M functions here due to the recursive problem formulation. The minimization problems of implicit sampling are

φj=minxn,θnFj(xn,θn)

and samples {Xnj,θnj} are generated by solving the stochastic equation (1). One can use the same technique for solving these equations as described in Sec. 2. Moreover, if the feature θn is already known, the SLAM algorithm reduces to the MCL algorithm with implicit sampling described above.

Note that the memory requirements of SLAM with implicit sampling are linear in the number of features, because the algorithm makes use of the conditional independence of the features given the robot path, so that, at each step, only one feature needs to be considered. Moreover, SLAM with implicit sampling converges to the true SLAM posterior (under the usual assumptions about the supports of the importance function and target PDF [45]) as the number of samples goes to infinity; this convergence is independent of linearity assumptions about the model and data equations. Convergence for fastSLAM and fastSLAM 2.0, however can only be proven for linear Gaussian models [15].

Numerical Experiments.

The applicability and efficiency of SLAM with implicit sampling is demonstrated with numerical experiments that mimic the University Car Park data set [17]. Here speed and steering data of Ref. [17] are used, however the laser data are replaced by synthetic data, because the data in Ref. [17] are too sparse (in time) for successful online SLAM (even EKF SLAM, often viewed as a benchmark, does not give satisfactory results).

In these synthetic data experiments, a virtual laser scanner returns noisy range and bearing measurements of features in a half-circle with 15 m radius. If the laser detects more than one feature, one is selected at random and returned to the SLAM algorithm. Since each synthetic data set is a random event, 50 synthetic data sets are used to compute the average errors for various SLAM algorithms. These errors are defined as the norm of the difference of the ground truth and the conditional mean computed with a SLAM algorithm. The performances of EKF SLAM, fastSLAM, fastSLAM 2.0, and SLAM with implicit sampling are compared using these synthetic data sets. All SLAM methods make use of the same maximum likelihood algorithm for the data association. A Newton method was implemented for the optimization in implicit sampling, and the quadratic approximation of F was used to solve the algebraic equation (1). A typical outcome of a numerical experiment is shown in the left panel of Fig. 7, where the true path and true positions of the landmarks as well as their reconstructions via SLAM with implicit sampling (100 samples) are plotted.

The results of 50 numerical experiments with fastSLAM, fastSLAM 2.0, and SLAM with implicit sampling are shown in Table 6. It was observed in this example that EKF SLAM gives an average error of 3.89% at an average computation time of 0.43 ms and, thus, is computationally efficient and accurate. However, computational efficiency disappears in scenarios with larger maps due to the O(N2) memory requirements (N is the number of features in the map). The error of fastSLAM 2.0 (whose memory requirements scale linearly with N) decreases with the number of samples, and it seems as if the fastSLAM 2.0 solution converges (as the number of samples and, thus, the computation time increases) to the EKF solution. This is intuitive because fastSLAM 2.0 uses EKFs to track the map. The fastSLAM algorithm, on the other hand, converges to an error that is larger than in EKF SLAM or fastSLAM 2.0. SLAM with implicit sampling converges to an error that is smaller than in EKF SLAM. The reason is as follows: implicit sampling requires no linearity assumptions and, therefore, the empirical estimate it produces converges (as the number of samples goes to infinity) to the true SLAM posterior.

Moreover, the convergence rate of SLAM with implicit sampling is faster than for any other SLAM method, due to the data informed importance function. Thus, the approximation of the conditional mean (the minimum mean square error estimator) one obtains with implicit sampling is accurate even if the number of samples is relatively small. As a result, SLAM with implicit sampling is more efficient than the other SLAM algorithms considered here. This is illustrated in the right panel of Fig. 7, where the computing time is shown as a function of the average error. As in MCL (see above), the area between the various curves indicates the improvement in efficiency. Here implicit sampling is the most efficient method, giving a high accuracy (small error) at a small computational cost. Moreover, SLAM with implicit sampling can achieve an accuracy which is higher than the accuracy of all other methods (including EKF SLAM), because it is a convergent algorithm.

Implicit sampling is a MC scheme that localizes the high-probability region of the sample space via numerical optimization, and then produces samples in this region by solving algebraic equations with a stochastic right-hand side. The computational cost per sample in implicit sampling is larger than in many other MC sampling schemes that randomly explore the sample space. However, the minimization directs the computational power toward the relevant region of the sample space so that implicit sampling often requires fewer samples than other MC methods. There is a tradeoff between the cost-per-sample and the number of samples and this tradeoff was studied in this paper in the context of three applications: MCL and probabilistic online SLAM in robotics, as well as path integral control.

In path integral control one solves the stochastic HJB equation with a MC solver. This has the advantage that no grid on the state space is needed and the method is therefore (in principle) applicable to control problems of relatively large dimension. Implicit sampling was applied in this context to speed up the MC calculations. The applicability of this approach was demonstrated on two test problems. Path integral control was also used to find local minima in nonconvex optimization problems.

An implementation of implicit sampling for MCL was tested and it was found that implicit sampling performs better than standard MCL (in terms of computing time and accuracy), especially if the data are accurate and the motion model is noisy (which is the case typically encountered in practice [16]). Similarly, implicit sampling for SLAM outperformed standard algorithms (fastSLAM and fastSLAM 2.0). Under mild assumptions, but for nonlinear models, SLAM with implicit sampling converges to the true SLAM posterior as the number of samples goes to infinity, at a computational cost that is linear in the number of features of the map.

I thank Professor Alexandre J. Chorin of UC Berkeley for many interesting technical discussions and for bringing path integral control to my attention. I thank Dr. Robert Saye of Lawrence Berkeley National Laboratory for help with proofreading this manuscript. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Contract No. DE-AC02005CH11231, and by the National Science Foundation under Grant No. DMS-1217065.

Appendix

A kinematic model of the robot is described in Ref. [46] and shown for convenience in Fig. 8. The robot is controlled by the speed vc and the steering angle α. It can be shown that the derivatives of the position and orientation of the axle of the robot are

dxcdt=vccos(β),dycdt=vcsin(β),dβdt=vcLtan(α)

The velocity vl, however, is measured at the rear left wheel and, thus, must be translated to the axle

vc=LvlL-tanαH

The position of the laser can be obtained from the position of the axle by using

x=xc+acos(β)-bsin(β),y=yc+asin(β)+bcos(β)

so that the forward model (without noise) is

dxdt=vc(cos(β)-tanα(asin(β)-bcos(β)))dydt=vc(sin(β)+tanα(acos(β)-bsin(β)))dβdt=vcLtan(α)

which, in a more compact notation, can be written as

dxdt=R(x,u)

where x = (x, y, β)T and u = (vc, α)T. The uncertainty in the model is due to errors in the inputs and in the model [46]

dx=R(x,u(t))dt+QdWxu(t)dt=u(t)dt+PdWu

where Wu and Wx are two-, respectively, three-dimensional vectors whose components are independent Brownian motions, P and Q are real 2 × 2, respectively, 3 × 3 matrices and u(t) is the unperturbed control signal. The above equations are linearized in u to obtain

dx=R(x,u)dt+dRduPdWu+QdW

and the stochastic forward Euler method [21] with time step δ is used to discretize the equations

xn+1=xn+R(xn,un)δ+ΔWn,ΔWn=N(0,Σ1n)

where

Σ1n=δ(dRduPPTdRduT+QQT)

and xn= x(), un= u(). Table 7 lists the numerical values of all parameters of the probabilistic motion model. The steering data α and velocity data vc are taken from the data set [17].

The data are measurements of range and bearing of the features in the map θ relative to the robot and are obtained by the laser. Only “high intensity points” from the data set [17] are used. The laser is modeled by the data equation

zj,n=h(xn)+Vn,Vn~N(0,Σ2)

where Σ2 is a 2 × 2 diagonal matrix whose elements are σ1 = 0.05 and σ2 = 0.05π/180 and

h(xn)=(xn-mjatan2(m2n-x2nm1-x1n)-x3n+π2)

where xjn is the jth element of the vector xn, the “true” robot position, and m1j,m2j are the x- and y-coordinates of the jth element of a landmark; the norm · is the Euclidean norm.

The gradient and Hessian of Fj in Eq. (22) are

Fj=(Σ1n)-1(x-fn)+i=1p(x1-m1iσ1r(ri-z1i)+x2-m2iσ2ri2(z2i-θi)x2-m2iσ1r(ri-z1i)-x1-m1iσ2ri2(z2i-θi)z2i-θiσ2)Hj=(Σ1n)-1+i=1pHi

where the elements of the 3 × 3 matrices Hi are

H11i=ri-z1iσ1ri-(x1-m1i)2(ri-z1i)σ1ri3+x1-m1iσ1ri2+2(x2-m2i)3(z2i-θi)σ2(x1-m1i)ri4-2(x2-m2i)(z2i-θi)σ2(x2-m2i)ri2)+(x2-m2i)2σ2ri4H12i=H21i=(x1-m1i)(x2-m2i)(ri-z1i)σ1ri3+(x1-m1i)(x2-m2i)σ1ri2+2(x2-m2i)2(z2i-θi)σ2ri4+(z2i-θi)σ2ri2-(x1-m1i)(x2-m2i)σ2ri4H13i=H31i=x2-m2iσ2ri2H22i=(x2-m2i)2(ri-z1i)σ1ri3+(ri-z1i)σ1ri+(x2-m2i)2σ1ri2+2(x1-m1i)(x2-m2i)(z2i-θi)σ2r4+(x1-m1i)2σ2r4H23i=H32i=x1-m1iσ2ri2H33i=1σ2

and where

ri=xn-miθi=z2i-(atan2(mi-x2m1i-x1)-x3+π2)

Chorin, A. , and Hald, O. , 2013, Stochastic Tools in Mathematics and Science, 3rd ed., Springer, New York .
Doucet, A. , de Freitas, N. , and Gordon, N. , eds., 2001, Sequential Monte Carlo Methods in Practice, Springer, New York .
Chorin, A. , and Tu, X. , 2009, “Implicit Sampling for Particle Filters,” Proc. Natl. Acad. Sci., 106(41), pp. 17249–17254. [CrossRef]
Chorin, A. , Morzfeld, M. , and Tu, X. , 2010, “Implicit Particle Filters for Data Assimilation,” Commun. Appl. Math. Comput. Sci., 5(2), pp. 221–240. [CrossRef]
Atkins, E. , Morzfeld, M. , and Chorin, A. , 2013, “Implicit Particle Methods and Their Connection With Variational Data Assimilation,” Mon. Weather Rev., 141(6), pp. 1786–1803. [CrossRef]
Morzfeld, M. , and Chorin, A. , 2012, “Implicit Particle Filtering for Models With Partial Noise, and an Application to Geomagnetic Data Assimilation,” Nonlinear Process. Geophys., 19(3), pp. 365–382. [CrossRef]
Morzfeld, M. , Tu, X. , Atkins, E. , and Chorin, A. , 2012, “A Random Map Implementation of Implicit Filters,” J. Comput. Phys., 231(4), pp. 2049–2066. [CrossRef]
Stengel, R. , 1993, Optimal Control and Estimation, Dover, New York.
Kappen, H. , 2005, “Linear Theory for Control of Nonlinear Stochastic Systems,” Phys. Rev. Lett., 95(20), p. 200201. [CrossRef] [PubMed]
Kappen, H. , 2005, “Path Integrals and Symmetry Breaking for Optimal Control Theory,” J. Stat. Mech., 11, p. 011011. [CrossRef]
Kappen, H. , 2006, “An Introduction to Stochastic Control Theory, Path Integrals and Reinforcement Learning,” AIP Conference Proceedings.
Theodorou, E. , Buchli, J. , and Schaal, S. , 2010, “A Generalized Path Integral Control Approach to Reinforcement Learning,” J. Mach. Learn. Res., 11, pp. 3137–3181.
Yang, I. , Morzfeld, M. , Tomlin, C. , and Chorin, A. , 2014, “Path Integral Formulation of Stochastic Optimal Control With Generalized Costs,” Proceedings of the 19th IFAC World Congress, Cape Town, South Africa, Aug. 24–29.
Dellaert, F. , Fox, D. , Burgard, W. , and Thrun, S. , 1999, “Monte Carlo Localization for Mobile Robots,” IEEE International Conference on Robotics and Automation (ICRA’99), Detroit, MI, May 10–15, pp. 1322–1328.
Montemerlo, M. , and Thrun, S. , 2007, FastSLAM. A Scalable Method for the Simultaneous Localization and Mapping Problem in Robotics, Springer, Berlin, Germany.
Thrun, S. , Burgard, W. , and Fox, D. , 2005, Probabilistic Robotics, MIT Press, Cambridge, MA .
Nebot, E. , 2003, “University Car Park (Inertial/GPS) Data Set,” http://www-personal.acfr.usyd.edu.au/nebot/car_park.htm (Last Accessed Nov. 19, 2014)
Weir, B. , Miller, R. , and Spitz, Y. , 2013, “An Implicit Particle Smoother for High-Dimensional Systems,” Nonlinear Process. Geophys., 20(6), pp. 1047–1060. [CrossRef]
Fleming, W. , 1977, “Exit Probabilities and Optimal Stochastic Control,” Appl. Math. Optim., 4(1), pp. 329–346. [CrossRef]
Hammond, B. , Lester, W. A., Jr. , and Reynolds, P. , 1994, Monte Carlo Methods in Ab Initio Quantum Chemistry, World Scientific Publishing, Singapore.
Kloeden, P. , and Platen, E. , 1999, Numerical Solution of Stochastic Differential Equations, 3rd ed., Springer, Berlin, Germany.
Nocedal, J. , and Wright, S. , 2006, Numerical Optimization, 2nd ed., Springer New York.
Fletcher, R. , 1987, Practical Methods of Optimization, 2nd ed., Wiley, Hoboken, NJ.
Vanden-Eijnden, E. , and Weare, J. , 2012, “Rare Event Simulation and Small Noise Diffusions,” Commun. Pure Appl. Math., 65(12), pp. 1770–1803. [CrossRef]
Vanden-Eijnden, E. , and Weare, J. , 2013, “Data Assimilation in the Low Noise, Accurate Observation Regime With Application to the Kuroshio Current,” Mon. Weather Rev., 141(6), pp. 1822–1841. [CrossRef]
Slotine, J.-J. , and Li, W. , 1991, Applied Nonlinear Control, Prentice Hall, Upper Saddle River, NJ.
Himmelblau, D. , 1972, Applied Nonlinear Programming, McGraw-Hill, New York.
Kortenkamp, D. , Bonasso, R. , and Robin, R. M. , eds., 1998, AI-based Mobile Robots: Case Studies of Successful Robot Systems, MIT Press, Cambridge, MA.
Dissanayake, G. , Newman, P. , Clark, S. , and Durrant-Whyte, H. , 2001, “A Solution to the Simultaneous Localization and Map Building (SLAM) Problem,” IEEE Trans. Rob. Autom., 17(3), pp. 229–241. [CrossRef]
Thorpe, C. , and Durrant-Whyte, H. , 2001, Field Robots, ISRR, Springer, Berlin, Germany.
Golombek, M. , Cook, R. , Economou, T. , Folkner, W. , Haldemann, A. , Kallemeyn, P. , Knudsen, J. , Manning, R. , Moore, H. , Parker, T. , Rieder, R. , Schofield, J. , Smith, P. , and Vaughan, R. , 1997, “Overview of the Mars Pathfinder Mission and Assessment of Landing Site Predictions,” Science, 5344(278), pp. 1743–1748. [CrossRef]
Thrun, S. , Ferguson, D. , Haehnel, D. , Montemerlo, M. , Burgard, W. , and Triebel, R. , 2003, “A System for Volumetric Robotic Mapping of Abandoned Mines,” Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’03), Taipeh, Taiwan, Sept. 14–19, pp. 4270–4275.
Kalman, R. , 1960, “A New Approach to Linear Filtering and Prediction Theory,” ASME J. Basic Eng., 82(Series D), pp. 35–48. [CrossRef]
Biswas, J. , Coltin, B. , and Veloso, M. , 2011, “Corrective Gradient Refinement for Mobile Robot Localization,” Proceedings of the International Conference on Intelligent Robots and Systems, San Francisco, CA, pp. 73–78.
Biswas, J. , and Veloso, M. , 2013, “Localization and Navigation of the Cobots Over Long-Term Deployments,” Int. J. Rob. Res., 32(4), pp. 1679–1694. [CrossRef]
Nebot, E. , 2003, “Ute Documentation: Hardware Manual,” http://www-personal.acfr.usyd.edu.au/nebot/experimental_data/modeling_info/Ute_modeling_info.htm (Last Accessed Nov. 19, 2014)
Arulampalam, M. , Maskell, S. , Gordon, N. , and Clapp, T. , 2002, “A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking,” IEEE Trans. Signal Process., 50(2), pp. 174–188. [CrossRef]
Smith, R. , and Cheesman, P. , 1987, “On the Representation of Spatial Uncertainty,” Int. J. Rob. Res., 5(4), pp. 56–68. [CrossRef]
Durrant-Whyte, H. , 1988, “Uncertain Geometry in Robotics,” IEEE Trans. Rob. Autom., 4(1), pp. 23–31. [CrossRef]
Thrun, S. , Fox, D. , and Burgard, W. , 1998, “A Probabilistic Approach to Concurrent Mapping and Localization for Mobile Robots,” Mach. Learn., 31(1), pp. 29–53. [CrossRef]
Murphy, K. , 1999, “Bayesian Map Learning in Dynamic Environments,” Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA.
Grisetti, G. , Kummerle, R. , Stachniss, C. , and Burgard, W. , 2010, “A Tutorial on Graph-Based SLAM,” IEEE Intell. Transp. Syst. Mag., 2(4), pp. 31–43. [CrossRef]
Montemerlo, M. , Thrun, S. , Koller, D. , and Wegbreit, B. , 2002, “Fast-SLAM: A Factored Solution to the Simultaneous Localization and Mapping Problem,” Proceedings of the AAAI National Conference on Artificial Intelligence, Edmonton, Alberta, Canada, Jul. 28 – Aug. 1.
Montemerlo, M. , Thrun, S. , Koller, D. , and Wegbreit, B. , 2003, “FastSLAM 2.0: An Improved Particle Filtering Algorithm for Simultaneous Localization and Mapping That Provably Converges,” Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI), Acapulco, Mexico, Aug. 9 – 15.
Geweke, J. , 1989, “Bayesian Inference in Econometric Models Using Monte Carlo Integration,” Econometrica, 57(6), pp. 1317–1399. [CrossRef]
Guivant, J. , and Nebot, E. , 2001, “Optimization of Simultaneous Localization and Map-Building Algorithm for Real Time Implementation,” IEEE Trans. Rob. Autom., 17(3), pp. 241–257. [CrossRef]
Copyright © 2015 by ASME
View article in PDF format.

References

Chorin, A. , and Hald, O. , 2013, Stochastic Tools in Mathematics and Science, 3rd ed., Springer, New York .
Doucet, A. , de Freitas, N. , and Gordon, N. , eds., 2001, Sequential Monte Carlo Methods in Practice, Springer, New York .
Chorin, A. , and Tu, X. , 2009, “Implicit Sampling for Particle Filters,” Proc. Natl. Acad. Sci., 106(41), pp. 17249–17254. [CrossRef]
Chorin, A. , Morzfeld, M. , and Tu, X. , 2010, “Implicit Particle Filters for Data Assimilation,” Commun. Appl. Math. Comput. Sci., 5(2), pp. 221–240. [CrossRef]
Atkins, E. , Morzfeld, M. , and Chorin, A. , 2013, “Implicit Particle Methods and Their Connection With Variational Data Assimilation,” Mon. Weather Rev., 141(6), pp. 1786–1803. [CrossRef]
Morzfeld, M. , and Chorin, A. , 2012, “Implicit Particle Filtering for Models With Partial Noise, and an Application to Geomagnetic Data Assimilation,” Nonlinear Process. Geophys., 19(3), pp. 365–382. [CrossRef]
Morzfeld, M. , Tu, X. , Atkins, E. , and Chorin, A. , 2012, “A Random Map Implementation of Implicit Filters,” J. Comput. Phys., 231(4), pp. 2049–2066. [CrossRef]
Stengel, R. , 1993, Optimal Control and Estimation, Dover, New York.
Kappen, H. , 2005, “Linear Theory for Control of Nonlinear Stochastic Systems,” Phys. Rev. Lett., 95(20), p. 200201. [CrossRef] [PubMed]
Kappen, H. , 2005, “Path Integrals and Symmetry Breaking for Optimal Control Theory,” J. Stat. Mech., 11, p. 011011. [CrossRef]
Kappen, H. , 2006, “An Introduction to Stochastic Control Theory, Path Integrals and Reinforcement Learning,” AIP Conference Proceedings.
Theodorou, E. , Buchli, J. , and Schaal, S. , 2010, “A Generalized Path Integral Control Approach to Reinforcement Learning,” J. Mach. Learn. Res., 11, pp. 3137–3181.
Yang, I. , Morzfeld, M. , Tomlin, C. , and Chorin, A. , 2014, “Path Integral Formulation of Stochastic Optimal Control With Generalized Costs,” Proceedings of the 19th IFAC World Congress, Cape Town, South Africa, Aug. 24–29.
Dellaert, F. , Fox, D. , Burgard, W. , and Thrun, S. , 1999, “Monte Carlo Localization for Mobile Robots,” IEEE International Conference on Robotics and Automation (ICRA’99), Detroit, MI, May 10–15, pp. 1322–1328.
Montemerlo, M. , and Thrun, S. , 2007, FastSLAM. A Scalable Method for the Simultaneous Localization and Mapping Problem in Robotics, Springer, Berlin, Germany.
Thrun, S. , Burgard, W. , and Fox, D. , 2005, Probabilistic Robotics, MIT Press, Cambridge, MA .
Nebot, E. , 2003, “University Car Park (Inertial/GPS) Data Set,” http://www-personal.acfr.usyd.edu.au/nebot/car_park.htm (Last Accessed Nov. 19, 2014)
Weir, B. , Miller, R. , and Spitz, Y. , 2013, “An Implicit Particle Smoother for High-Dimensional Systems,” Nonlinear Process. Geophys., 20(6), pp. 1047–1060. [CrossRef]
Fleming, W. , 1977, “Exit Probabilities and Optimal Stochastic Control,” Appl. Math. Optim., 4(1), pp. 329–346. [CrossRef]
Hammond, B. , Lester, W. A., Jr. , and Reynolds, P. , 1994, Monte Carlo Methods in Ab Initio Quantum Chemistry, World Scientific Publishing, Singapore.
Kloeden, P. , and Platen, E. , 1999, Numerical Solution of Stochastic Differential Equations, 3rd ed., Springer, Berlin, Germany.
Nocedal, J. , and Wright, S. , 2006, Numerical Optimization, 2nd ed., Springer New York.
Fletcher, R. , 1987, Practical Methods of Optimization, 2nd ed., Wiley, Hoboken, NJ.
Vanden-Eijnden, E. , and Weare, J. , 2012, “Rare Event Simulation and Small Noise Diffusions,” Commun. Pure Appl. Math., 65(12), pp. 1770–1803. [CrossRef]
Vanden-Eijnden, E. , and Weare, J. , 2013, “Data Assimilation in the Low Noise, Accurate Observation Regime With Application to the Kuroshio Current,” Mon. Weather Rev., 141(6), pp. 1822–1841. [CrossRef]
Slotine, J.-J. , and Li, W. , 1991, Applied Nonlinear Control, Prentice Hall, Upper Saddle River, NJ.
Himmelblau, D. , 1972, Applied Nonlinear Programming, McGraw-Hill, New York.
Kortenkamp, D. , Bonasso, R. , and Robin, R. M. , eds., 1998, AI-based Mobile Robots: Case Studies of Successful Robot Systems, MIT Press, Cambridge, MA.
Dissanayake, G. , Newman, P. , Clark, S. , and Durrant-Whyte, H. , 2001, “A Solution to the Simultaneous Localization and Map Building (SLAM) Problem,” IEEE Trans. Rob. Autom., 17(3), pp. 229–241. [CrossRef]
Thorpe, C. , and Durrant-Whyte, H. , 2001, Field Robots, ISRR, Springer, Berlin, Germany.
Golombek, M. , Cook, R. , Economou, T. , Folkner, W. , Haldemann, A. , Kallemeyn, P. , Knudsen, J. , Manning, R. , Moore, H. , Parker, T. , Rieder, R. , Schofield, J. , Smith, P. , and Vaughan, R. , 1997, “Overview of the Mars Pathfinder Mission and Assessment of Landing Site Predictions,” Science, 5344(278), pp. 1743–1748. [CrossRef]
Thrun, S. , Ferguson, D. , Haehnel, D. , Montemerlo, M. , Burgard, W. , and Triebel, R. , 2003, “A System for Volumetric Robotic Mapping of Abandoned Mines,” Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’03), Taipeh, Taiwan, Sept. 14–19, pp. 4270–4275.
Kalman, R. , 1960, “A New Approach to Linear Filtering and Prediction Theory,” ASME J. Basic Eng., 82(Series D), pp. 35–48. [CrossRef]
Biswas, J. , Coltin, B. , and Veloso, M. , 2011, “Corrective Gradient Refinement for Mobile Robot Localization,” Proceedings of the International Conference on Intelligent Robots and Systems, San Francisco, CA, pp. 73–78.
Biswas, J. , and Veloso, M. , 2013, “Localization and Navigation of the Cobots Over Long-Term Deployments,” Int. J. Rob. Res., 32(4), pp. 1679–1694. [CrossRef]
Nebot, E. , 2003, “Ute Documentation: Hardware Manual,” http://www-personal.acfr.usyd.edu.au/nebot/experimental_data/modeling_info/Ute_modeling_info.htm (Last Accessed Nov. 19, 2014)
Arulampalam, M. , Maskell, S. , Gordon, N. , and Clapp, T. , 2002, “A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking,” IEEE Trans. Signal Process., 50(2), pp. 174–188. [CrossRef]
Smith, R. , and Cheesman, P. , 1987, “On the Representation of Spatial Uncertainty,” Int. J. Rob. Res., 5(4), pp. 56–68. [CrossRef]
Durrant-Whyte, H. , 1988, “Uncertain Geometry in Robotics,” IEEE Trans. Rob. Autom., 4(1), pp. 23–31. [CrossRef]
Thrun, S. , Fox, D. , and Burgard, W. , 1998, “A Probabilistic Approach to Concurrent Mapping and Localization for Mobile Robots,” Mach. Learn., 31(1), pp. 29–53. [CrossRef]
Murphy, K. , 1999, “Bayesian Map Learning in Dynamic Environments,” Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA.
Grisetti, G. , Kummerle, R. , Stachniss, C. , and Burgard, W. , 2010, “A Tutorial on Graph-Based SLAM,” IEEE Intell. Transp. Syst. Mag., 2(4), pp. 31–43. [CrossRef]
Montemerlo, M. , Thrun, S. , Koller, D. , and Wegbreit, B. , 2002, “Fast-SLAM: A Factored Solution to the Simultaneous Localization and Mapping Problem,” Proceedings of the AAAI National Conference on Artificial Intelligence, Edmonton, Alberta, Canada, Jul. 28 – Aug. 1.
Montemerlo, M. , Thrun, S. , Koller, D. , and Wegbreit, B. , 2003, “FastSLAM 2.0: An Improved Particle Filtering Algorithm for Simultaneous Localization and Mapping That Provably Converges,” Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI), Acapulco, Mexico, Aug. 9 – 15.
Geweke, J. , 1989, “Bayesian Inference in Econometric Models Using Monte Carlo Integration,” Econometrica, 57(6), pp. 1317–1399. [CrossRef]
Guivant, J. , and Nebot, E. , 2001, “Optimization of Simultaneous Localization and Map-Building Algorithm for Real Time Implementation,” IEEE Trans. Rob. Autom., 17(3), pp. 241–257. [CrossRef]

Figures

Grahic Jump Location
Fig. 1

Random walks for the double slit problem. Left: standard MC sampling with 5000 unguided random walks, all of which hit the potential wall. Right: 50 guided random walks obtained via implicit sampling, only four of which hit the potential wall.

Grahic Jump Location
Fig. 2

Comparison of the analytical solution and its approximation by implicit sampling with 50 samples. Left: the trajectory under optimal control (solid) and its numerical approximation (dashed). Right: the optimal control (solid) and its numerical approximation (dashed).

Grahic Jump Location
Fig. 3

Sketch of a two degrees-of-freedom robotic arm

Grahic Jump Location
Fig. 4

Simulation of a robotic arm under path integral control. Upper left: trajectories of angles θ1 (gray/gray dashed) and θ2 (light gray/light gray dashed); lower left: trajectories of angular velocities θ·2 (gray/gray dashed) and θ·2 (light gray/light gray dashed); right: applied torques τ1 (gray/gray dashed) and τ2 (light gray/light gray dashed). Solid gray and light gray lines: trajectories and torques under optimal control with exact model parameters; dashed red and blue lines: trajectories and torques under path integral control with false model parameters.

Grahic Jump Location
Fig. 5

Newton's method and stochastic optimization for the Himmelblau function

Grahic Jump Location
Fig. 6

Simulation results of MCL with implicit sampling and standard MCL. Left: the path of the robot and its reconstruction via MCL with implicit sampling (100 samples). Right: average computing time as a function of the average error for standard MCL and MCL with implicit sampling.