【Resource Roundup】| Deep Reinforcement Learning

In machine learning we usually divide methods into supervised and unsupervised learning, and we often overlook another important branch: reinforcement learning. Supervised and unsupervised learning are easy to tell apart by criteria such as the learning objective and whether labels are available. If the goal of supervised learning is prediction, the goal of reinforcement learning is decision-making: an agent acts, the environment responds with an updated state and a reward or a penalty, and the agent keeps adjusting its policy based on that feedback. Put simply, it is like being a kid who sneaks snacks at the wrong time. When your mom finds out she punishes you, so you don't make the same mistake next time; if you follow the rules, she may well give you a reward. Either way, the end goal is that you eat meals at mealtime and snacks at snack time, not snacks at the wrong time. Likewise, when Flappy Bird was all the rage, many players reached high scores within a short time. How did they do it? Apart from the very skilled players who really did play by hand, quite a few of those high scores came from models trained with reinforcement learning, where the bird learns by itself to keep flying forward without hitting obstacles and so maximizes its score. The well-known AlphaGo is also a model trained with reinforcement learning, in that case deep reinforcement learning.
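To make the reward-and-penalty loop described above concrete, here is a minimal sketch using tabular Q-learning, one classic reinforcement learning algorithm. It is only an illustration and not something taken from the resources below: the corridor environment, the function names `train_q_table` and `greedy_policy`, the reward values, and all hyperparameters are assumptions chosen to keep the example small.

```python
import random


def train_q_table(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                  epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor (illustrative only).

    States are 0..n_states-1 and the agent starts at state 0.
    Action 0 moves left (staying put at the left wall), action 1 moves right.
    Reaching the last state yields reward +1 and ends the episode;
    every other step yields a small penalty of -0.01.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # q[state][action] is the current estimate of long-term return.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on steps per episode
            # Epsilon-greedy: mostly exploit, occasionally explore.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                best = max(q[state])
                action = rng.choice(
                    [a for a in (0, 1) if q[state][a] == best])

            # Environment response: new state plus reward or penalty.
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else -0.01

            # Q-learning update: move the estimate toward the observed
            # reward plus the discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Return the highest-valued action for each state (0=left, 1=right)."""
    return [max((0, 1), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    table = train_q_table()
    print(greedy_policy(table))
```

In this toy setting the penalty plays the role of being scolded for snacking and the +1 plays the role of the reward for following the rules: the agent is never told which action is correct, it only sees the consequences and adjusts its estimates. Deep reinforcement learning replaces the table `q` with a neural network so the same idea can scale to inputs such as game frames.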

Deep reinforcement learning gradually came into view after DeepMind published Playing Atari with Deep Reinforcement Learning in 2013. Then, in 2015, DeepMind published Human-level control through deep reinforcement learning, which brought the field widespread attention, and a great deal of academic work appeared that year. Most of us became familiar with deep reinforcement learning around 2016 to 2017, especially after AlphaGo appeared, when many companies and researchers began paying close attention to it and tried applying it in a wide range of scenarios.

I have put together some resources on deep reinforcement learning for anyone who is interested (some links may require a VPN to access):

【Papers】

"Deep Reinforcement Learning", in English (150 pages), an expanded version of the earlier 70-page edition: https://arxiv.org/abs/1810.06339
Classic book: Reinforcement Learning: An Introduction (2nd Edition)
Paper collection with broad coverage, requires some background: Reinforcement Learning: State-of-the-Art
Two very comprehensive paper collections:

GitHub - junhyukoh/deep-reinforcement-learning-papers: A list of recent papers regarding deep reinforcement learning
GitHub - muupan/deep-reinforcement-learning-papers: A list of papers and resources dedicated to deep reinforcement learning

【Blogs】

yuxili: https://medium.com/@yuxili
Guest Post (Part I): Demystifying Deep Reinforcement Learning
Guest Post (Part II): Deep Reinforcement Learning with Neon
Blog Post (Part III): Deep Reinforcement Learning with OpenAI Gym
Andrej Karpathy blog: Deep Reinforcement Learning: Pong from Pixels

【Articles】

Dr. Yang Yu of Nanjing University: Frontiers of Reinforcement Learning (a complete introduction to reinforcement learning): https://www.leiphone.com/news/201705/uO8nd09EnR77NBRP.html

【Courses】

For complete beginners: Morvan Python (莫烦Python): https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/
David Silver's reinforcement learning course (videos and slides), from 2015, requires some background: http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html
The best reinforcement learning textbook, which pairs well with David Silver's course: Sutton & Barto Book: Reinforcement Learning: An Introduction
Stanford CS234: http://web.stanford.edu/class/cs234/index.html
Berkeley CS294: http://rll.berkeley.edu/deeprlcourse/
Pieter Abbeel's AI course (covers reinforcement learning, with Pacman experiments): Artificial Intelligence
Pieter Abbeel's deep reinforcement learning course: CS 294 Deep Reinforcement Learning, Fall 2015
Nando de Freitas's deep learning course (videos, slides, and homework): Machine Learning
Michael Littman's reinforcement learning course: https://www.udacity.com/course/reinforcement-learning--ud600
Latest robotics course series from Penn (started in 2016): Specialization
Deep Learning Summer School：ppts videos

【Learning Sites】

OpenAI Gym reinforcement learning toolkit: https://gym.openai.com
Reinforcement learning demo examples: https://qqiang00.github.io/reinforce/javascript/demo_iteration.html
karpathy's collection of reinforcement learning demos: https://cs.stanford.edu/people/karpathy/reinforcejs/index.html
Missouri S&T's online reinforcement learning site: http://web.mst.edu/~gosavia/rl_website.html

【GitHub】

Awesome-RL: https://github.com/aikorea/awesome-rl
Flappybird: https://github.com/yenchenlin/DeepLearningFlappyBird
Deep Reinforcement Learning in Tensorflow: https://github.com/carpedm20/deep-rl-tensorflow
https://github.com/ShangtongZhang/reinforcement-learning-an-introduction
GitHub - songrotek/DeepTerrainRL: terrain-adaptive locomotion skills using deep reinforcement learning
GitHub - songrotek/async-rl: An attempt to reproduce the results of "Asynchronous Methods for Deep Reinforcement Learning" (http://arxiv.org/abs/1602.01783)
GitHub - songrotek/rllab: rllab is a framework for developing and evaluating reinforcement learning algorithms.
GitHub - songrotek/DRL-FlappyBird: Playing Flappy Bird Using Deep Reinforcement Learning (Based on Deep Q Learning DQN using Tensorflow)
GitHub - songrotek/DeepMind-Atari-Deep-Q-Learner: The original code from the DeepMind article + my tweaks

【Conferences】

NIPS 2015 Deep Reinforcement Learning Workshop
ICLR 2016
RSS 2016 Deep Learning for Robotics

Feel free to bookmark this list and suggest additions~

posted @ 2019-01-23 20:26 Charlotte77

Author: Charlotte77

Source: http://www.rzrgm.cn/charlotte77/

This post is intended for learning, research, and sharing. To repost it, please contact me and credit the author and source; non-commercial use only!