Resource overview
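Reinforcement learning inverted pendulum program: a MATLAB program that uses the AHC (Adaptive Heuristic Critic) algorithm. Its structure is simple and easy to follow, which makes it good material for beginners.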
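The AHC (adaptive heuristic critic) scheme keeps two weight vectors over the same state partition. The actor uses action weights `w`, and the critic uses failure-prediction weights `v`. The following is a minimal C sketch of one learning step on the 162-box partition. It is not the program's own code. The file excerpt on this page stops before `get_box` and the learning loop, so several details here are assumptions:

- the bin cut points: x at ±0.8 m, x_dot at ±0.5 m/s, theta at -6°, -1°, 0°, 1° and 6°, theta_dot at ±50°/s;
- the failure limits: ±2.4 m and ±12°;
- `ahc_update`, which is a hypothetical helper.

Only the learning constants are copied from the listing's `#define` lines.

```c
#include <assert.h>

#define N_BOXES 162   /* 3 x 3 x 6 x 3 boxes */
#define ALPHA   1000  /* learning rate for action weights w (from the listing) */
#define BETA    0.5   /* learning rate for critic weights v */
#define GAMMA   0.95  /* discount factor for the critic */
#define LAMBDAw 0.9   /* decay rate for the w eligibility trace */
#define LAMBDAv 0.8   /* decay rate for the v eligibility trace */

/* ASSUMED thresholds, in radians and radians per second. */
#define ONE_DEGREE     0.0174532
#define SIX_DEGREES    0.1047192
#define TWELVE_DEGREES 0.2094384
#define FIFTY_DEGREES  0.87266

/* Return the box index (0..161) for a state, or -1 on failure
   (cart off the assumed +/-2.4 m track or pole past +/-12 degrees). */
int get_box(float x, float x_dot, float theta, float theta_dot)
{
    int box;

    if (x < -2.4 || x > 2.4 || theta < -TWELVE_DEGREES || theta > TWELVE_DEGREES)
        return -1;

    box = (x < -0.8) ? 0 : (x < 0.8) ? 1 : 2;           /* 3 position bins */
    box += (x_dot < -0.5) ? 0 : (x_dot < 0.5) ? 3 : 6;  /* 3 velocity bins */

    if (theta < -SIX_DEGREES)      box += 0;             /* 6 angle bins */
    else if (theta < -ONE_DEGREE)  box += 9;
    else if (theta < 0.0)          box += 18;
    else if (theta < ONE_DEGREE)   box += 27;
    else if (theta < SIX_DEGREES)  box += 36;
    else                           box += 45;

    /* 3 angular-velocity bins */
    box += (theta_dot < -FIFTY_DEGREES) ? 0 : (theta_dot < FIFTY_DEGREES) ? 54 : 108;
    return box;
}

/* Hypothetical helper: one AHC step. The action y (1 = push right,
   0 = push left) was chosen in old_box; new_box is the resulting box,
   or -1 if the step ended in failure. */
void ahc_update(float w[], float v[], float e[], float xbar[],
                int old_box, int y, int new_box)
{
    int failed = (new_box < 0);
    float oldp, p, r, rhat;
    int i;

    assert(old_box >= 0 && old_box < N_BOXES);
    assert(new_box < N_BOXES);

    /* Mark the visited box in both eligibility traces. */
    e[old_box] += (1.0 - LAMBDAw) * (y - 0.5);
    xbar[old_box] += (1.0 - LAMBDAv);

    oldp = v[old_box];                 /* critic's prediction before the step */
    r = failed ? -1.0f : 0.0f;         /* reinforcement is -1 only on failure */
    p = failed ? 0.0f : v[new_box];    /* prediction for the resulting state */
    rhat = r + GAMMA * p - oldp;       /* heuristic (internal) reinforcement */

    for (i = 0; i < N_BOXES; i++) {
        w[i] += ALPHA * rhat * e[i];   /* actor update */
        v[i] += BETA * rhat * xbar[i]; /* critic update */
        if (failed) {
            e[i] = 0.0f;               /* the trial ended: clear the traces */
            xbar[i] = 0.0f;
        } else {
            e[i] *= LAMBDAw;           /* otherwise decay them */
            xbar[i] *= LAMBDAv;
        }
    }
}
```

A driver built on this sketch would proceed as follows:

1. Look up the box for the start state.
2. Set `y = 1` with probability `prob_push_right(w[box])`.
3. Advance the cart-pole one step.
4. Look up the new box.
5. Pass both indices to `ahc_update`.

The critic signal `rhat = r + GAMMA * p - oldp` turns negative when a step makes failure look more likely. A negative signal pushes the actor away from the action just taken in that box. The asserts guard against the kind of out-of-range array addressing described in the 1/93 change note.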
Code snippet and file information
/*----------------------------------------------------------------------
This file contains a simulation of the cart and pole dynamic system and
a procedure for learning to balance the pole. Both are described in
Barto, Sutton, and Anderson, "Neuronlike Adaptive Elements That Can Solve
Difficult Learning Control Problems," IEEE Trans. Syst., Man, Cybern.,
Vol. SMC-13, pp. 834--846, Sept.--Oct. 1983, and in Sutton, "Temporal
Aspects of Credit Assignment in Reinforcement Learning," PhD
Dissertation, Department of Computer and Information Science, University
of Massachusetts, Amherst, 1984. The following routines are included:
   main:          controls simulation iterations and implements
                  the learning system.
   cart_and_pole: the cart and pole dynamics; given an action and the
                  current state, estimates the next state.
   get_box:       the cart-pole's state space is divided into 162
                  boxes; get_box returns the index of the box in
                  which the current state falls.
These routines were written by Rich Sutton and Chuck Anderson. Claude Sammut
translated parts from Fortran to C. Please address correspondence to
sutton@gte.com or anderson@cs.colostate.edu
---------------------------------------
Changes:
1/93: A bug was found and fixed in the state -> box mapping which resulted
in array addressing outside the range of the array. It's amazing this
program worked at all before this bug was fixed. -RSS
----------------------------------------------------------------------*/
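#include <math.h>    /* exp() */
#include <stdio.h>   /* printf(), scanf() */
#include <stdlib.h>  /* rand(), RAND_MAX */

#define min(x, y) (((x) <= (y)) ? (x) : (y))
#define max(x, y) (((x) >= (y)) ? (x) : (y))
/* Logistic probability of pushing right; the argument is clamped to
   [-50, 50] so that exp() cannot overflow. */
#define prob_push_right(s) (1.0 / (1.0 + exp(-max(-50.0, min(s, 50.0)))))
/* Uniform random number in [0, 1]. RAND_MAX is the largest value rand()
   returns; (1 << 31) - 1 is undefined behavior for a 32-bit int. */
#define random ((float) rand() / (float) RAND_MAX)

#define N_BOXES      162     /* Number of disjoint boxes of state space. */
#define ALPHA        1000    /* Learning rate for action weights w. */
#define BETA         0.5     /* Learning rate for critic weights v. */
#define GAMMA        0.95    /* Discount factor for critic. */
#define LAMBDAw      0.9     /* Decay rate for w eligibility trace. */
#define LAMBDAv      0.8     /* Decay rate for v eligibility trace. */
#define MAX_FAILURES 100     /* Termination criterion. */
#define MAX_STEPS    100000  /* Maximum number of steps. */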
typedef float vector[N_BOXES];

int main(void)
{
    float x,          /* cart position, meters */
          x_dot,      /* cart velocity */
          theta,      /* pole angle, radians */
          theta_dot;  /* pole angular velocity */
    vector w,         /* vector of action weights */
           v,         /* vector of critic weights */
           e,         /* vector of action weight eligibilities */
           xbar;      /* vector of critic weight eligibilities */
    float p, oldp, rhat, r;
    int box, i, y, steps = 0, failures = 0, failed;

    printf("Seed? ");
    scanf("

Related resources
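- sutton reinforcement learning book companion MATLAB code
- Reinforcement learning code, 2016 edition, MATLAB
- Reinforcement learning MATLAB source code 289697
- Matlab reinforcement learning_grid maze problem_SarsaLam
- MATLAB reinforcement learning_multi-armed bandit problem_time-varying eg
- MATLAB reinforcement learning_multi-armed bandit problem_softmax strategy
- MATLAB reinforcement learning pole-balancing code
- Reinforcement learning Qlearning algorithm, MATLAB
- Reinforcement learning MATLAB code
- Reinforcement learning_inverted pendulum_Matlab program
- Reinforcement learning MATLAB source code
- Path planning based on reinforcement learning
- Robot path planning implemented with the Q-learning algorithm
- suntton-RL-book-demo sutton reinforcement learning book's
- MobileRobotSimQ a reinforcement learning algorithm using Q-learning
- inverted-pendulum-control using reinforcement learning for self-
- Q-Learning MATLAB programs on Q-learning
- Q reinforcement learning MATLAB source code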