In this paper, reinforcement learning is applied to coordinate, in a decentralized fashion, the motions of a pair of hydraulic actuators whose task is to firmly hold and move an object along a specified trajectory under conventional position control. The learning goal is to reduce the interaction forces acting on the object, which arise from the inevitable positioning errors of the imperfect closed-loop actuator dynamics. Each actuator is therefore outfitted with a reinforcement learning neural network that modifies a centrally planned, formation-constrained position trajectory in response to the locally measured interaction force. It is shown that the actuators, which form a multiagent learning system, can learn decentralized control strategies that reduce the object interaction forces and thus greatly improve their coordination on the manipulation task. However, the problem of credit assignment, a common difficulty in multiagent learning systems, prevents the actuators from learning control strategies in which each actuator contributes equally to reducing the interaction force. This problem is resolved here through the periodic communication of limited local state information between the reinforcement learning actuators. Using both simulations and experiments, the paper examines issues pertaining to learning in dynamic multiagent environments and establishes reinforcement learning as a promising technique for coordinating multiple nonlinear hydraulic manipulators performing a common task.
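The decentralized scheme described above can be illustrated with a highly simplified sketch: two agents each add a learned correction to their planned position setpoint and adapt it from the locally measured interaction force, using a REINFORCE-style stochastic update. The agent class, the rigid-object force model, and all parameter values below are illustrative assumptions for exposition only, not the paper's actual neural networks or hydraulic actuator dynamics.

```python
import random

class ForceReducingAgent:
    """One actuator's learner (illustrative, not the paper's network):
    maintains a correction ("offset") added to the centrally planned
    position setpoint and adapts it from the locally measured force."""

    def __init__(self, lr=0.05, sigma=0.02):
        self.offset = 0.0      # learned trajectory correction
        self.lr = lr           # learning rate
        self.sigma = sigma     # exploration noise standard deviation
        self.trial = 0.0

    def act(self):
        # Explore with Gaussian noise around the current correction.
        self.trial = self.offset + random.gauss(0.0, self.sigma)
        return self.trial

    def learn(self, reward, baseline):
        # Shift the mean correction toward perturbations that beat the
        # running reward baseline (stochastic real-valued unit idea).
        self.offset += self.lr * (reward - baseline) * (self.trial - self.offset)

def interaction_force(x1, x2, stiffness=100.0):
    # Rigidly held object (assumed model): force grows with the
    # relative positioning error between the two actuators.
    return stiffness * (x1 - x2)

random.seed(0)
a1, a2 = ForceReducingAgent(), ForceReducingAgent()
bias = 0.1                      # persistent tracking error of actuator 1
initial_force = abs(interaction_force(bias, 0.0))

baseline = -initial_force       # running reward baseline
for _ in range(2000):
    x1 = bias + a1.act()        # actuator 1 overshoots its setpoint
    x2 = a2.act()
    reward = -abs(interaction_force(x1, x2))  # both rewarded for low force
    a1.learn(reward, baseline)  # each agent updates from the shared,
    a2.learn(reward, baseline)  # locally measurable force signal
    baseline = 0.9 * baseline + 0.1 * reward

final_force = abs(interaction_force(bias + a1.offset, a2.offset))
print(f"interaction force: {initial_force:.1f} -> {final_force:.2f}")
```

Because both agents receive the same force-based reward, this toy setup also exhibits the credit-assignment ambiguity noted above: any split of the corrective effort between the two offsets yields the same reward, so equal contribution is not guaranteed without additional information exchange.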
