Implementing the Q-learning Algorithm in Python
2021/10/12 22:15:04
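For an explanation of the Q-learning algorithm, see the blog post below. I am only reproducing that author's algorithm. If you spot a mistake, please send me a private message so I can correct it.
A Painless Q-learning Tutorial (一个 Q-learning 算法的简明教程), peghoty, CSDN blog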
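The referenced tutorial uses a simplified form of the Q-learning update in which the learning rate is fixed at 1. In this room-navigation example an action is simply the room to move into, so the action a and the next state s' are the same number. The `learn` method in the code applies this rule with γ = 0.8:

```latex
Q(s, a) \leftarrow R(s, a) + \gamma \max_{a'} Q(s', a'), \qquad s' = a,\quad \gamma = 0.8

% Worked step: the first time the agent moves from room 1 into room 5
% while the table is still all zeros:
Q(1, 5) = 100 + 0.8 \cdot 0 = 100

% Moving from room 5 into room 5 also earns 100, and the max over Q(5, \cdot)
% includes Q(5, 5) itself, so repeated updates approach the fixed point
Q(5, 5) = 100 + 0.8\, Q(5, 5) \;\Longrightarrow\; Q(5, 5) = 500
```

This is why entries in the printed Q table can exceed the raw reward of 100.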
```python
import numpy as np
import pandas as pd


class QL:
    def __init__(self, actions, gamma=0.8, e_greedy=0.9):
        self.actions = actions    # available actions (here they double as the states)
        self.gamma = gamma        # discount factor
        self.e_greedy = e_greedy  # probability of taking the greedy action
        # Rows are states and columns are actions. This is a simplification:
        # an action is also the index of the next state.
        self.q_table = pd.DataFrame(columns=actions, dtype=np.float64)

    def choose_action(self, state):
        self.check_state(state)
        if np.random.uniform(0, 1) < self.e_greedy:
            action_list = self.q_table.loc[state, :]  # row of Q-values for the current state
            # Actions with the maximum value; there can be several,
            # e.g. at the start when every value is 0
            best = action_list[action_list == action_list.max()].index
            action = np.random.choice(best)  # break ties at random
        else:
            action = np.random.choice(self.actions)  # explore: pick any action
        return action

    def learn(self, state_now, state_next, reward_value):
        # Q(s, a) = R(s, a) + gamma * max Q(s', .), learning rate fixed at 1
        state_next_list = self.q_table.loc[state_next, :]
        self.q_table.loc[state_now, state_next] = reward_value + self.gamma * state_next_list.max()

    def check_state(self, state):
        # Add an all-zero row the first time a state is seen.
        # DataFrame.append was removed in pandas 2.0, so enlarge with .loc instead.
        if state not in self.q_table.index:
            self.q_table.loc[state] = np.zeros(len(self.actions))


terminal = 5   # the exit is room 5
times = 1000   # run 1000 episodes
actions = np.array([0, 1, 2, 3, 4, 5])  # 6 actions
agent = QL(actions)

# Reward matrix R: -1 = no door (invalid move), 0 = valid move, 100 = reaches the exit
reward_rect = np.array([
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
])

for episode in range(times):
    state_now = np.random.choice(agent.actions)  # each episode starts from a random state
    agent.check_state(state_now)                  # make sure this state has a row in the Q table
    # route records the visited states. To inspect it, lower `times` to about 50
    # and uncomment the print statement below.
    route = [state_now]
    # Explore
    while True:
        state_next = agent.choose_action(state_now)  # pick the next state
        agent.check_state(state_next)
        route.append(state_next)
        if reward_rect[state_now, state_next] == -1:  # invalid move: end the episode
            break
        agent.learn(state_now, state_next, reward_rect[state_now, state_next])
        if state_next == terminal:
            break
        state_now = state_next
    # print("Route:", route)

print(agent.q_table)
```
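For comparison, here is a minimal pure-NumPy sketch of the same update rule, assuming the variant described in the referenced tutorial, where each step picks among the possible moves for the current room (R ≥ 0) instead of ending the episode when an invalid move is drawn. The helper names `train` and `greedy_route`, the fixed seed, and the purely random exploration are my own illustrative choices, not the author's code. After training, `greedy_route` follows the highest Q-value from a start room to read off a route.

```python
import numpy as np

# Reward matrix R from the tutorial: -1 = no door, 0 = valid move, 100 = into the goal
R = np.array([
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
])
GOAL = 5
GAMMA = 0.8


def train(episodes=1000, seed=0):
    """Tabular Q-learning with learning rate 1, sampling only valid moves (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    q = np.zeros_like(R, dtype=float)
    for _ in range(episodes):
        state = rng.integers(len(R))               # random starting room
        while True:
            valid = np.flatnonzero(R[state] >= 0)  # doors out of this room
            nxt = rng.choice(valid)                # explore: random valid door
            # Q(s, a) = R(s, a) + gamma * max_a' Q(s', a'), where a = s'
            q[state, nxt] = R[state, nxt] + GAMMA * q[nxt].max()
            if nxt == GOAL:
                break
            state = nxt
    return q


def greedy_route(q, start):
    """Follow argmax Q from `start`; the length guard stops loops in an untrained table."""
    route = [start]
    while route[-1] != GOAL and len(route) <= len(q):
        route.append(int(q[route[-1]].argmax()))
    return route


q = train()
print(np.round(q / q.max() * 100).astype(int))  # Q table scaled so the largest entry is 100
print(greedy_route(q, 2))
```

Sampling only valid doors means no episode is spent on a move into a wall; the trade-off is that the agent must be told which moves are allowed. Room 3 has two equally good exits (through room 1 or room 4), so the greedy route from room 2 may pass through either one.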