Approximating Markov Chain Approach to Optimal Feedback Control of a Flexible Needle

We present a computationally efficient approach for the intra-operative update of the feedback control policy for a steerable needle in the presence of motion uncertainty. The solution of the dynamic programming (DP) equations, required to obtain the optimal control policy, is difficult or intractable for nonlinear problems such as steering a flexible needle in soft tissue. We use the method of approximating Markov chains to approximate the continuous (and controlled) process with a discrete and locally consistent counterpart. This provides the ground to examine a linear programming (LP) approach to the imposed DP problem that significantly reduces the computational demand. A concrete example of two-dimensional (2D) needle steering is considered to investigate the effectiveness of the LP method for both deterministic and stochastic systems. We compare the performance of the LP-based policy with results obtained through a more computationally demanding algorithm, iterative policy space approximation. Finally, the reliability of the LP-based policy in dealing with motion and parametric uncertainties, as well as the effect of the insertion point/angle on the probability of success, is investigated. [DOI: 10.1115/1.4033834]


Introduction
Advancements in robotic minimally invasive surgery have extended its applications to a variety of diagnostic and therapeutic tasks. In addition to the sophisticated design and fabrication of surgical tools [1][2][3][4][5], estimation [6][7][8], motion planning, and control [9][10][11][12][13][14][15] of the surgical tool tip are crucial problems in many such procedures. Particularly, in needle biopsies and drug injections, the bevel tip of the needle, along with its flexibility, allows for driving the needle along a curved trajectory. This facilitates accessing organs that are not directly reachable using ordinary needles with a symmetric tip. Motion planning and control of this under-actuated system has received enormous attention among researchers in the last decade. The difficulty of this task arises from the complex, large deflection of the fine needle and its interaction with soft, nonhomogeneous tissue. Additionally, uncertainty induced by imperfect actuation of the needle, movement of human organs (due to respiratory and cardiac motions), patient-specific parameters, unknown parameters of the needle structure, etc., introduces further complications.
A variety of methods have been proposed in the literature to address the aforementioned problems. Alterovitz et al. [9] considered planning in a 2D imaging plane where the objectives were minimization of the insertion distance, avoidance of polygonal obstacles, and compensation of soft tissue deformation. A finite-element approach was utilized to take into account the friction and deflection of the needle. In other efforts, Alterovitz et al. [10,11] incorporated the effect of uncertainties on the needle motion and provided an optimal control policy by solving the DP problem. The needle tip angle was the only uncertain variable in their model. Park et al. [13] studied uncertainty propagation employing Gaussian models, and motion planning was performed through a path-of-probability algorithm. The planning problem when dealing with both sensing and motion uncertainty was addressed by Berg et al. [12]. Many studies address other aspects of motion planning for flexible needles, such as the development and experimental validation of unicycle and bicycle motion models in Ref. [14], sensorless motion planning in Ref. [15], and three-dimensional (3D) motion planning in soft tissue under motion uncertainty in Ref. [16].
Feedback control policy computation is typically of less interest due to the significant computational demand that makes intra-operative policy updates difficult or intractable. Hence, a variety of sampling-based motion planning algorithms (for example, see Refs. [17] and [18]) are typically applied to the needle insertion problem. However, feedback control becomes more appropriate in many surgical scenarios where feedback is available through different sensing modalities such as ultrasound imaging. This motivates investigating numerical approaches that significantly reduce the computational burden and facilitate real-time updates of the feedback control policy during the surgical procedure. In this paper, we use the analogy of a Dubins vehicle traveling in a stochastic wind field [19,20] for driving a needle in soft tissue subject to motion uncertainties. The main objective of this work is to computationally examine strategies that provide reliable control policies for such under-actuated systems in the presence of motion uncertainties. We formulate the needle equation of motion as a stochastic differential equation that includes motion uncertainty in all three states of the planar system, i.e., in the motion along the x and y directions as well as the needle tip orientation. One main source of these uncertainties is respiratory and cardiac motion. The integration of the Dubins model and the DP approach has proved effective in finding optimal control policies in guidance and tracking problems [19][20][21][22]. Relying on the successful application of the approximating Markov chain method to the counterpart of this problem (the Dubins vehicle problem), we construct the LP formulation. Using an LP formulation to solve the DP problem [23,24] offers a significant improvement in computational cost due to the power of LP in dealing with large-scale problems.
The reduced computational cost allows for intra-operative policy updates when adaptive or patient-specific motion models are used.
To this end, we aim to initiate an alternative approach (to the many sampling-based techniques) relying on Markov chain approximations coupled with a computationally efficient LP technique. To further emphasize the applicability of the LP-based approach, we compare the control policies generated by the LP solution of the DP problem against a computationally expensive solution obtained by iterative approximation in policy space. Furthermore, the probability of success or failure (collision with obstacles) is simply a variable of the LP model. This enables constructing different cost functions for optimal policy calculation based upon clinically motivated objectives.
The rest of this paper is organized as follows: Section 2 begins with a brief review of the deterministic needle kinematic model. Then, we discuss the stochastic model resulting from introducing motion uncertainty. In Sec. 3, the optimal control of the needle with the stochastic motion model is discussed. We construct the approximating Markov chain that is locally consistent with the actual process and subsequently discuss the iterative policy approximation as well as the construction of the LP problem. Results including the control policies and simulations of the controlled process are presented in Sec. 4. Finally, Sec. 5 provides a brief discussion and future work directions.

Kinematic Model
Three main steering approaches (or a combination of them) [25] may be employed to access remote targets by driving a flexible needle through soft tissue. The most common way is using a needle with a bevel tip, which generates a lateral bending force and, as a result, drives the needle on a constant-curvature path. The other two methods are lateral manipulation and tissue displacement, where the base of the needle and the tissue are moved, respectively, in order to access the target. More controllability and manipulation dexterity are associated with the first approach, which is also the interest of this paper. This section reviews the kinematics of such needles for both the deterministic and stochastic motions.

2.1 Deterministic Motion Model. Similar to a unicycle vehicle moving in the plane, the motion of a needle with a bevel tip (here we consider planar motion) can be modeled as a nonholonomic system. The sidewise motion (along ỹ in Fig. 1) is constrained through V_ỹ = 0; however, the needle tip can assume any pose through a sequence of motions. Although it has been experimentally demonstrated [14] that bicycle models offer higher fidelity than unicycle ones, the unicycle model is frequently used for design and control studies due to its simplicity and adequate accuracy. In the local coordinate system (x̃-ỹ), we have

V_x̃ = v,   V_ỹ = 0,   ω = βκv    (1)

and in the global frame, the motion of the needle tip is simply described by a coordinate transformation as

ẋ = v cos θ,   ẏ = v sin θ,   θ̇ = βκv    (2)

where v is the insertion speed, κ is the needle curvature parameter, and β ∈ {−1, +1} selects the bevel direction.

2.2 Stochastic Motion Model. Equation (2) can accurately describe the motion in the absence of uncertainty. However, this does not hold for real systems, where different sources of uncertainty affect the actual behavior. Unknown needle-tissue interaction, failure in accurate actuation, uncertain anatomical movements due to respiratory and cardiac motions, etc., are some of the main sources of uncertainty. In this paper, we consider a bevel-tip needle moving in a 2D plane to reach a target relatively far from the initial position while avoiding obstacles. It is assumed that the obstacles and the target area are fixed with respect to the global frame; however, the tissue in which the needle is steered moves with respect to the global frame. The motion of the tissue is assumed to be

V_x^t(t) = σ_x w_x(t)    (3)

V_y^t(t) = V̄_y^t(t) + σ_y w_y(t)    (4)

where w_x(t) and w_y(t) are white Gaussian noise and V̄_y^t(t) is the time-varying mean of the organ/tissue velocity. The movement of the tissue in the x direction is induced only by the noise. Hence, the stochastic differential equation (SDE) representing the stochastic motion of the needle in the global frame can be written as

dx = b(x(t), u(x(t)), t) dt + σ(x) dW    (5)

where x = [x, y, θ]^T and u(x) = [v(x), β(x)]^T are the state and feedback control vectors, respectively, b(x(t), u(x(t)), t) = [v cos θ, v sin θ + V̄_y^t(t), βκv]^T, dW = [dW_x, dW_y, dW_θ]^T is the vector of (mutually independent) increments of the Wiener process, and σ(x) = diag(σ_x, σ_y, σ_θ). The objective is to find an optimal control policy that maximizes the probability of success, i.e., reaching the target without collision with the obstacles, while minimizing the expectation of a cost functional (e.g., minimization of time, insertion distance, etc.).
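For illustration, the SDE above can be simulated with the Euler-Maruyama scheme. The sketch below is ours, not the paper's implementation: the parameter values, the zero tissue-velocity mean `vty`, and the `policy` callback (returning β ∈ {−1, +1}) are illustrative assumptions.

```python
import numpy as np

def simulate_needle(x0, policy, dt=1e-3, t_end=2.0,
                    v=0.5, kappa=10.0, sigma=(0.0, 0.02, 0.0),
                    vty=lambda t: 0.0, rng=None):
    """Euler-Maruyama integration of the planar needle SDE
    dx = b(x, u, t) dt + sigma dW with state x = (x, y, theta)."""
    rng = np.random.default_rng(rng)
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for k in range(int(t_end / dt)):
        t = k * dt
        beta = policy(x)                     # feedback control, +1 or -1
        drift = np.array([v * np.cos(x[2]),
                          v * np.sin(x[2]) + vty(t),
                          beta * kappa * v])
        dW = np.sqrt(dt) * rng.standard_normal(3)
        x = x + drift * dt + np.asarray(sigma) * dW
        path.append(x.copy())
    return np.array(path)

# example: insertion from the origin under the constant policy beta = +1
path = simulate_needle([0.0, 0.0, 0.0], policy=lambda x: 1.0, rng=0)
```

Any state-feedback policy, such as one interpolated from the grid policies computed later, can be plugged in through the `policy` argument.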

Stochastic Optimal Control
3.1 DP. In a stochastic system, the optimization is performed in a statistical sense where the expectation of the cost functional is minimized rather than the cost itself. For the controlled diffusion process described by SDE (5), a cost functional may be defined as [24]

W(x_0, u) = E_{x_0} [ ∫_0^T k(x(t), u(x(t))) dt + g(x(T)) ]    (6)

where E_{x_0} denotes the expectation given the initial condition x_0, T is the time at which the process exits a predefined domain (e.g., reaches the target set or exits through an exiting boundary), and k(·,·) and g(·) are the running and boundary cost functions, respectively. The smooth cost functional W(x, u) satisfies

L^{u(x)} W(x, u) + k(x, u(x)) = 0    (7)

where L^{u(x)} is the differential operator associated with the diffusion process (5), defined as

L^{u(x)} = Σ_i b_i(x, u(x)) ∂/∂x_i + (1/2) Σ_{i,j} a_{ij}(x) ∂²/(∂x_i ∂x_j)    (8)

where a(x) = σ(x) σ(x)^T. From the set of all admissible feedback controls (denoted by U), the one that minimizes the cost functional described by Eq. (6) is the optimal control policy. Then, the optimal cost can be written as

V(x) = inf_{u∈U} W(x, u)    (9)

The optimal cost function V(x) satisfies the Bellman partial differential equation (PDE)

inf_{u(x)∈U} [ L^{u(x)} V(x) + k(x, u(x)) ] = 0 for x ∈ d_x,   V(x) = g(x) for x ∈ ∂d_x    (10)

where d_x is the domain on which the state x is defined and ∂d_x represents the domain boundary.
3.2 Approximating Markov Chain. Analytical solution of the Bellman PDE described by Eq. (10) is a difficult or intractable task. A well-adopted method for discretization of the continuous-space diffusion process described by SDE (5) is the approximating Markov chain approach, where the actual process is replaced with a discrete Markov process. The main criterion of the discretization is to maintain local consistency with the actual process. In the same sense as finite difference and finite element discretizations of PDEs, the local properties of the approximating Markov chain converge to those of the actual process as the discretization parameter (such as the finite difference interval or finite element grid size) tends to zero. The cost functional is then constructed for the discrete Markov process, and the solution of the discrete model closely resembles that of the actual controlled process. Refer to Ref. [24] for detailed discussions on local consistency and convergence analysis.
The solution to Eq. (7) can then be approximated by the solution of the discrete DP equation

W(x, u) = Σ_y p_{x,y|u(x)} W(y, u) + k(x, u(x)) Δt(x, u(x))    (11)

where p_{x,y|u(x)} is the transition probability from state x to state y given the control action u(x), and Δt(x, u(x)) is the discrete process time-step. Now, assume that the off-diagonal terms of the covariance matrix a(x) are equal to zero. Then, the second part of the right-hand side of Eq. (8) is zero when i ≠ j, so only the following finite difference approximations to the cost derivatives are required to construct the approximating Markov chain. The approximations to the first and second derivatives can be written as

W_{x_i}(x) → (W(x + e_i h_i) − W(x))/h_i  if b_i(x, u(x)) ≥ 0;   (W(x) − W(x − e_i h_i))/h_i  if b_i(x, u(x)) < 0    (12)

W_{x_i x_i}(x) → (W(x + e_i h_i) + W(x − e_i h_i) − 2 W(x))/h_i²    (13)

where h_i and e_i are the grid size and the unit vector in the ith dimension, respectively. The subscript i is used for h_i as we have assumed that the grid size may be different in each direction; the forward or backward difference in Eq. (12) is chosen according to the sign of the drift (an upwind scheme), which guarantees nonnegative transition probabilities. Substituting the approximations (12) and (13) into Eq. (7) and rearranging the terms to construct the solution given by Eq. (11) yield

p_{x, x ± e_i h_i | u(x)} = ( a_{ii}(x)/(2 h_i²) + b_i^±(x, u(x))/h_i ) / Q(x, u(x))

Δt(x, u(x)) = 1/Q(x, u(x)),   Q(x, u(x)) = Σ_i ( a_{ii}(x)/h_i² + |b_i(x, u(x))|/h_i )

where b_i^+ = max(b_i, 0) and b_i^− = max(−b_i, 0). The state transition probabilities and time-step of the (discrete) approximating Markov chain provide the ground for different computational techniques to solve for the optimal control policy.
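The transition probabilities and interpolation interval can be computed mechanically at each grid state. The sketch below (function name and test values are ours) also checks local consistency: the one-step mean and variance of the chain should match b Δt and a Δt up to O(h) terms.

```python
import numpy as np

def markov_chain_step(b, a_diag, h):
    """Transition probabilities p(x -> x +/- e_i h_i) and time-step dt of the
    locally consistent chain at one grid state.
    b: drift vector; a_diag: diagonal of a(x) = sigma sigma^T; h: grid sizes."""
    b, a, h = (np.asarray(v, dtype=float) for v in (b, a_diag, h))
    Q = np.sum(a / h**2 + np.abs(b) / h)                 # normalizer Q(x, u)
    p_plus = (a / (2 * h**2) + np.maximum(b, 0) / h) / Q
    p_minus = (a / (2 * h**2) + np.maximum(-b, 0) / h) / Q
    return p_plus, p_minus, 1.0 / Q

# local consistency check with illustrative drift/diffusion values
b, a, h = [0.3, -0.1], [0.04, 0.02], [0.01, 0.01]
p_plus, p_minus, dt = markov_chain_step(b, a, h)
h = np.asarray(h)
mean = (p_plus - p_minus) * h                # E[x_{n+1} - x_n] per dimension
var = (p_plus + p_minus) * h**2 - mean**2    # Var[x_{n+1} - x_n]
assert abs(p_plus.sum() + p_minus.sum() - 1.0) < 1e-12   # valid distribution
assert np.allclose(mean, np.asarray(b) * dt)             # mean matches b*dt
assert np.allclose(var, np.asarray(a) * dt, rtol=0.1)    # variance to O(h)
```

The upwind split of the drift into b_i^+ and b_i^− is what keeps every probability nonnegative regardless of the drift sign.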

3.3 Policy Space Approximation. Let U = {ũ_1, ũ_2, …, ũ_q} be a finite set of admissible controls and u_0(x) be an admissible control policy whose corresponding cost function is bounded. The iterative method

u_{n+1}(x) = argmin_{α∈U} [ Σ_y p_{x,y|α} W(y, u_n) + k(x, α) Δt(x, α) ]    (14)

provides the optimal control policy as n → ∞. In the policy approximation approach, the cost function in Eq. (14) is itself computed iteratively (policy evaluation) from the discrete DP equation

W^h_{n+1}(x, u) = Σ_y p_{x,y|u(x)} W^h_n(y, u) + k(x, u(x)) Δt(x, u(x))    (15)

where W^h_n(y, u) = g(y) if y ∈ ∂d^h, and ∂d^h is the finite set of boundary points.
3.4 LP. Let X_ik be the joint probability of the occurrence of state i and control k. Then, the probability of the occurrence of state i is X_i = Σ_{k=1}^q X_ik. On the other hand, from the Markov chain state transitions, X_i can be written as

X_i = q_i + Σ_j Σ_{k=1}^q p_{j,i|ũ_k} X_jk    (16)

where q_i is the probability that state i is the initial state (given by the initial condition). The cost corresponding to the co-occurrence of the ith state and kth control action, denoted by C_ik, is due to the running cost, plus the boundary cost if the state exits the domain (reaches the boundary) at the next time-step. So

C_ik = k(x_i, ũ_k) Δt(x_i, ũ_k) + Σ_{y∈∂d^h} p_{x_i,y|ũ_k} g(y)    (17)

Finally, the total cost associated with the control sequence u can be written as

W = Σ_i Σ_{k=1}^q C_ik X_ik    (18)

The cost function described by Eq. (18), minimized subject to the constraints represented by Eq. (16) and X_ik ≥ 0, constructs an LP problem that can be solved with a variety of algorithms developed for large-scale linear optimization. Here, we use CVX [26,27] to solve the problem described by Eqs. (18) and (16). For further details on the LP approach to DP problems, refer to Refs. [23] and [24].
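For intuition, the occupation-measure LP of Eqs. (16)-(18) can be written down for a toy chain and handed to any LP solver; here SciPy's `linprog` stands in for CVX, and the chain, costs, and initial distribution are illustrative, not the needle model.

```python
import numpy as np
from scipy.optimize import linprog

# Toy chain: interior states 0..4; stepping right from state 4 exits to the
# target boundary with g = 0, so C_ik reduces to the running cost.
n = 5
q = np.zeros(n); q[0] = 1.0                 # insertion at state 0
p = np.zeros((2, n, n))                     # p[k, i, j]: interior transitions
for i in range(n):
    p[0, i, max(i - 1, 0)] = 1.0            # control 0: step left, cost 1
    if i + 1 < n:
        p[1, i, i + 1] = 1.0                # control 1: step right, cost 1.5
C = np.stack([np.full(n, 1.0), np.full(n, 1.5)])     # C_ik

# variables X_ik flattened as (k, i); Eq. (16): sum_k X_ik - inflow_i = q_i
A = np.zeros((n, 2 * n))
for i in range(n):
    for k in range(2):
        A[i, k * n + i] += 1.0
        A[i, k * n:(k + 1) * n] -= p[k, :, i]
res = linprog(C.ravel(), A_eq=A, b_eq=q, bounds=(0, None))
print(res.fun)   # 7.5: five rightward steps at cost 1.5 each
```

The optimal occupation measure puts unit mass on the right-moving control at every interior state, reproducing the policy-iteration answer while solving a single LP.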

Numerical Experiments
In this section, we apply the computational techniques introduced in Secs. 3.3 and 3.4 to a 2D needle steering problem in which the needle is driven to a target region while avoiding the obstacles and exiting boundaries. The region is considered to be a 2D rectangular plane, and the obstacle regions are assumed to be known and fixed. The x-y domain is a square whose side length is 0.2 m, where x ∈ [−0.2, 0] m, y ∈ [−0.1, 0.1] m, and θ ∈ [−π/2, π/2] rad. The 3D state space is subdivided into 13,671 discrete states by choosing h_x = 0.01 m, h_y = 0.01 m, and h_θ = 6 deg. We assume the needle insertion speed to be constant, and the control policy gives only the control variable β ∈ {−1, +1} that specifies the direction of the drift at each time-step. The domain (dashed-line rectangle), obstacles (black circles), and the target region (solid green line) are shown in Fig. 2.
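The stated state count follows directly from the grid sizes; a quick arithmetic check (a sketch, assuming each interval is sampled inclusively at both endpoints):

```python
nx = round(0.2 / 0.01) + 1    # x in [-0.2, 0] m with h_x = 0.01 m   -> 21
ny = round(0.2 / 0.01) + 1    # y in [-0.1, 0.1] m with h_y = 0.01 m -> 21
nth = round(180 / 6) + 1      # theta in [-90, 90] deg with h_theta = 6 deg -> 31
print(nx * ny * nth)          # 13671 discrete states, matching the text
```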
Let ∂x be the boundary of the 3D state-space domain. We subdivide the boundary as ∂x = ∂x_PC ∪ ∂x_NC, where ∂x_PC is the positive-cost or exiting boundary and ∂x_NC is the negative-cost or target boundary. We assume g(x) to be

g(x) = M for x ∈ ∂x_PC,   g(x) = Q for x ∈ ∂x_NC    (19)

with M > 0 and Q < 0. Through this assignment of the boundary cost function, an optimal policy (that minimizes the expected cost function) will avoid the occurrence of the high-cost boundary states and will drive the state toward the boundary points with negative cost (the target). Now, in order to avoid entering the obstacle regions, we assume the running cost to be

k(x, u) = C_tf (1 + V_c) + C_obs f_obst(x)    (20)

in which the first term represents the so-called time-fuel cost, and the second term is the cost associated with states that are close to or inside a predefined obstacle region. C_tf and C_obs are tuning parameters (positive constants), and V_c ∈ [0, 1] is the control parameter such that v = V_c V_max, where V_max is the maximum insertion speed. f_obst can be chosen such that it returns higher values as the state x gets closer to an obstacle. Here, we set f_obst(x) to be a bivariate Gaussian function that peaks at the center of the obstacle. The main objective of this section is to validate the LP-based policies in successfully performing the task, stressing their computational efficiency through comparison with the well-known policy space approximation technique. Both techniques are made possible by approximating the continuous controlled process with a locally consistent discrete process through the approximating Markov chain approach. In both methods, we study the system with (σ(x) ≠ 0) and without (σ(x) = 0) motion uncertainty. In addition, the effect of the initial (insertion) condition on the probability of success and the robustness of the feedback control policy against needle parameter uncertainty are investigated.
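A minimal sketch of the obstacle term and running cost follows. The obstacle centers and Gaussian width are hypothetical stand-ins; the paper's actual layout (Fig. 2) is not reproduced here.

```python
import numpy as np

# hypothetical obstacle layout (centers in meters) and Gaussian width
CENTERS = np.array([[-0.12, 0.02], [-0.06, -0.03]])
WIDTH = 0.015

def f_obst(x, centers=CENTERS, w=WIDTH):
    """Bivariate Gaussian field peaking at 1 at each obstacle center."""
    d2 = np.sum((np.asarray(x)[:2] - centers) ** 2, axis=1)
    return float(np.max(np.exp(-d2 / (2.0 * w ** 2))))

def running_cost(x, V_c, C_tf=400.0, C_obs=75e6):
    """Running cost k(x, u): time-fuel term plus obstacle penalty."""
    return C_tf * (1.0 + V_c) + C_obs * f_obst(x)

print(f_obst([-0.12, 0.02, 0.0]))        # 1.0 at an obstacle center
print(f_obst([0.0, 0.1, 0.0]) < 1e-6)    # True far from both obstacles
```

With C_obs several orders of magnitude above C_tf, the penalty dominates the cost near an obstacle and is negligible elsewhere, which is what steers the optimal policy around the obstacle regions.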
4.1 DP: Policy Space Approximation and LP. After constructing the discrete approximating Markov chain by following the procedure described in Sec. 3.2, we solve the resulting DP equations by (i) the iterative policy space approximation technique reviewed in Sec. 3.3 and (ii) the LP approach described in Sec. 3.4. For both approaches, we consider κ = 10, V_c = 1, V_max = 0.5, M = 1 × 10^6, Q = −5 × 10^5, C_tf = 400, and C_obs = 75 × 10^6. The X_ik in Sec. 3.4 are the variables of the LP problem. We have only two admissible controls in U (β = +1 or β = −1), which, along with the 13,671 discrete states, results in 13,671 × 2 = 27,342 variables (joint state and control probabilities) in the LP formulation. Figure 2 shows the optimal control policy for the deterministic system that is, in fact, treated by the probabilistic approach (approximating Markov chain) by setting σ(x) = 0. The blue (darker) regions correspond to the control input β = +1 and the red (lighter) regions correspond to β = −1 (please refer to the online figure for color). The first row in Fig. 2 (Figs. 2(a)-2(c)) shows the policy obtained through iterative policy approximation, corresponding to different values of the needle tip angle. When approximating the cost function using Eq. (15), the stopping criterion is r_u = |(W_{n+1} − W_n)/W_n|(x_max) < 0.3, where x_max is the state that maximizes the update ratio r_u. Moreover, the stopping criterion for the iterative policy update (Eq. (14)) is ||u_{n+1} − u_n|| < 10, where u_n is an 11,191 × 1 vector. The second row is the policy obtained through the LP approach. Comparing the results indicates both similarity and difference between the policies generated by the iterative method and the LP approach. The solution provided by the iterative method is suboptimal compared to that of the LP approach. As an example, comparing the policies shown for θ = −3h_θ in the first column of Fig. 2 (Figs. 2(a) and 2(d)), the policy generated by the LP approach, shown in Fig. 2(d), constantly drives the needle upward (toward the target region) in the bottom-right areas of the grid, shown by the darker areas (i.e., β = +1). However, the policy generated by the iterative method, shown in Fig. 2(a), drives the needle further away from the target in the bottom-right areas of the grid, shown by the lighter areas (i.e., β = −1). Similar situations can be seen in the other columns of Fig. 2. The main reason for this is the low convergence rate of the iterative method, which results in a suboptimal solution for a limited computational effort. Using a desktop PC with an Intel(R) Core(TM) i7-4770 processor at 3.40 GHz and running the codes in MATLAB R2013a, the elapsed time for the iterative approach is 1030 s, while the time required for the LP-based approach is 46 s. The needle steering results corresponding to the policies shown in Fig. 2 are shown in Figs. 3(a) and 3(b) for iterative policy approximation and the LP approach, respectively. Three different initial conditions are examined to evaluate the performance of the policies in dealing with the obstacles in a deterministic system. The results shown in Fig. 3 verify the effectiveness of both feedback control policies in steering the needle to the target area without collision with the obstacles or exiting the boundaries. However, comparing the needle paths obtained using the different policies for each insertion (initial) condition supports the inference on the optimality of the policies. For example, when the insertion point is at the black triangle, the LP-based policy results in a shorter path-to-target compared with the one resulting from the iterative policy approximation technique. A similar argument holds for the initial condition shown by the square and its corresponding path shown by the solid line. Both generated policies result in approximately similar paths when the insertion point is at the point shown by the circle. The results presented hereafter are generated using the LP-based control policy.

4.1.2 Stochastic System. If σ(x) ≠ 0, the policy is generated to cope with the motion uncertainty. Here, it is assumed that the motion in the y direction is perturbed by setting σ_y = 0.02. We generated 50 realizations of the paths starting from the four initial conditions shown in Fig. 4. The kernel smoothing (KS) densities obtained from the points that belong to the needle path realizations are shown in Fig. 4.
In the presence of motion uncertainty, the insertion condition plays an important role in successfully reaching the target, as illustrated in Fig. 4. For example, in this numerical experiment, the insertion points shown by the square and triangle appear to be the safest starting points. However, a general conclusion can only be drawn with more in-depth analysis and by ensuring the convergence of the Monte Carlo simulations of the needle trajectories.

4.1.3 Insertion Condition.
To further analyze the effect of the initial condition, we calculated the probability of success corresponding to several insertion points lying on the vertical segment A-B shown in Fig. 5. For each initial condition, we simulate 100 realizations of the needle motion. A realization is successful if the needle tip reaches the target area (green line) without entering the obstacle regions shown by the black circles. The probability of success (the number of successful realizations divided by the total number of trials) corresponding to each insertion point on the A-B segment is shown by the horizontal bar graph located at the corresponding insertion point. When the initial needle angle is −45 deg (Fig. 5(a)), the probability of success is high only for the points that are vertically located under the two larger obstacles. When the initial angle is 0 deg, all initial positions except those to the left side of the larger obstacles provide almost equal probabilities of success. The needle provides limited maneuverability (represented by its curvature parameter κ), and it cannot successfully bypass an obstacle if the insertion point is horizontally close to the obstacle and its vertical position is within the obstacle's (vertical) range. Similarly, in Fig. 5(c), the points that are (vertically) located above the larger obstacles provide higher probabilities of success.
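The success probability described here is a plain Monte Carlo ratio. A generic sketch follows; the `step`, `reached`, and `collided` callbacks are problem-specific assumptions, and the toy check at the bottom uses an artificial drift rather than the needle dynamics.

```python
import numpy as np

def estimate_success(x0, step, reached, collided, n_trials=100,
                     n_steps=2000, seed=0):
    """Fraction of Monte Carlo realizations that reach the target region
    without entering an obstacle (the success criterion used in the text)."""
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(n_trials):
        x = np.array(x0, dtype=float)
        success = False
        for _ in range(n_steps):
            x = step(x, rng)
            if collided(x):          # entered an obstacle region: failure
                break
            if reached(x):           # entered the target region: success
                success = True
                break
        wins += success
    return wins / n_trials

# toy sanity check: noisy rightward drift, target at x >= 1, no obstacles
dt = 1e-3
step = lambda x, rng: (x + np.array([1.0, 0.0]) * dt
                       + 0.02 * np.sqrt(dt) * rng.standard_normal(2))
p = estimate_success([0.0, 0.0], step, reached=lambda x: x[0] >= 1.0,
                     collided=lambda x: False)
print(p)   # 1.0: the drift crosses x = 1 well within 2000 steps
```

Sweeping `x0` along a segment such as A-B and plotting the returned ratios reproduces the kind of bar graphs shown in Fig. 5.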

4.1.4 Parametric Uncertainty.
The exact value of the needle curvature parameter (κ) is typically unknown. Hence, the policy is generated for a nominal value of κ. Using the same policy for needles with different values of the curvature parameter further reduces the probability of success in reaching the target area. The same analysis as in Sec. 4.1.3 is performed while κ is considered a random variable with a Gaussian distribution, i.e., κ ~ N(κ̄, σ_κ). In each Monte Carlo realization of the needle trajectory, we used a sample of κ drawn from a Gaussian distribution with mean κ̄ = 10 and standard deviation σ_κ = 1. The corresponding results (for 100 Monte Carlo realizations) are shown in Fig. 6. Comparing the bar graphs shown in Fig. 6 with those depicted in Fig. 5 verifies that the probability of success reduces in the presence of parametric uncertainty, especially when the initial angles are ±45 deg. However, the change is fairly small for the given level of uncertainty in the curvature parameter κ.

Fig. 3 The needle motion for three different initial conditions generated by using (a) iterative policy approximation and (b) the LP approach. The needle tip angle is assumed to be zero in all three initial conditions.

Discussion
In this paper, the optimal feedback control of a flexible needle in the presence of motion uncertainty was considered. An analogy to the problem of a Dubins vehicle traveling in a stochastic wind was used. We employed the approximating Markov chain approach to construct a discrete chain that is locally consistent with the actual diffusion process. This provided the ground to construct the LP problem in order to solve the DP equations. The LP approach provided a significant reduction in computational demand compared to its counterpart, iterative policy approximation. Computational efficiency is crucial when intra-operative planning is required to cope with uncertainty. We investigated the performance of the LP-based policy in dealing with both deterministic and stochastic systems. The role of the insertion (initial) condition in the probability of success was also studied in the presence of motion and parametric uncertainties. In the future, we plan to extend our analysis to cover the spatial motion of the needle in three dimensions. This requires a reliable and computationally efficient numerical method, which is the main reason for our interest in the LP approach for solving the DP equations. We also plan to incorporate an accurate and efficient anatomical motion estimation methodology (taking into account the medical constraints) into the needle steering problem.