強化学習とは？わかりやすく解説

辞書 類語・対義語辞典 英和・和英辞典 日中中日辞典 日韓韓日辞典 古語辞典

その他の辞書▼
- 手話辞典
- インドネシア語辞典
- タイ語辞典
- ベトナム語辞典

Weblio 辞書ヘルプ

556の専門辞書や国語辞典百科事典から一度に検索!

無料の翻訳ならWeblio翻訳！

初めての方へ参加元一覧

Weblio 辞書 > 同じ種類の言葉 > 学問 > 教育 > 学習 > 強化学習の意味・解説

デジタル大辞泉

索引トップ用語の索引ランキング凡例

きょうか‐がくしゅう〔キヤウクワガクシフ〕【強化学習】

読み方：きょうかがくしゅう

人工知能における、コンピューターによる機械学習の一種。解決すべき課題に対し、より正しい結果を得るため、試行錯誤を通じて自ら得られる報酬が最大化するよう学習を進める。報酬は、確率的にある程度遅れてもたらされる。学習速度が遅く、適切なアルゴリズムの設計が難しいが、現実世界に近い不確実性のある環境や条件の下で、最適な方策を自ら獲得する特長をもつ。→教師あり学習

ウィキペディア

索引トップ用語の索引ランキングカテゴリー

強化学習

出典: フリー百科事典『ウィキペディア（Wikipedia）』 (2024/01/07 06:46 UTC 版)

Category:機械学習

^ Kaelbling, Leslie P.; Littman, Michael L.; Moore, Andrew W. (1996). “Reinforcement Learning: A Survey”. Journal of Artificial Intelligence Research 4: 237–285. arXiv:cs/9605103. doi:10.1613/jair.301. オリジナルの2001-11-20時点におけるアーカイブ。.
^ van Otterlo, M.; Wiering, M. (2012). Reinforcement learning and markov decision processes. Adaptation, Learning, and Optimization. 12. 3–42. doi:10.1007/978-3-642-27645-3_1. ISBN 978-3-642-27644-6
^ Russell, Stuart J.; Norvig, Peter (2010). Artificial intelligence : a modern approach (Third ed.). Upper Saddle River, New Jersey. pp. 830, 831. ISBN 978-0-13-604259-4
^ Lee, Daeyeol; Seo, Hyojung; Jung, Min Whan (21 July 2012). “Neural Basis of Reinforcement Learning and Decision Making”. Annual Review of Neuroscience 35 (1): 287–308. doi:10.1146/annurev-neuro-062111-150512. PMC 3490621. PMID 22462543.
^ Xie, Zhaoming, et al. "ALLSTEPS: Curriculum‐driven Learning of Stepping Stone Skills." Computer Graphics Forum. Vol. 39. No. 8. 2020.
^ Sutton & Barto 1998, Chapter 11.
^ Gosavi, Abhijit (2003). Simulation-based Optimization: Parametric Optimization Techniques and Reinforcement. Operations Research/Computer Science Interfaces Series. Springer. ISBN 978-1-4020-7454-7
^ ^a ^b Burnetas, Apostolos N.; Katehakis, Michael N. (1997), “Optimal adaptive policies for Markov Decision Processes”, Mathematics of Operations Research 22: 222–255, doi:10.1287/moor.22.1.222
^ Tokic, Michel; Palm, Günther (2011), “Value-Difference Based Exploration: Adaptive Control Between Epsilon-Greedy and Softmax”, KI 2011: Advances in Artificial Intelligence, Lecture Notes in Computer Science, 7006, Springer, pp. 335–346, ISBN 978-3-642-24455-1
^ ^a ^b “Reinforcement learning: An introduction”. 2023年5月12日閲覧。
^ Sutton, Richard S. (1984). Temporal Credit Assignment in Reinforcement Learning (PhD thesis). University of Massachusetts, Amherst, MA.
^ Sutton & Barto 1998, §6. Temporal-Difference Learning.
^ Bradtke, Steven J.; Barto, Andrew G. (1996). “Learning to predict by the method of temporal differences”. Machine Learning 22: 33–57. doi:10.1023/A:1018056104778.
^ Watkins, Christopher J.C.H. (1989). Learning from Delayed Rewards (PDF) (PhD thesis). King’s College, Cambridge, UK.
^ Matzliach, Barouch; Ben-Gal, Irad; Kagan, Evgeny (2022). “Detection of Static and Mobile Targets by an Autonomous Agent with Deep Q-Learning Abilities”. Entropy 24 (8): 1168. Bibcode: 2022Entrp..24.1168M. doi:10.3390/e24081168. PMC 9407070. PMID 36010832.
^ Williams, Ronald J. (1987). "A class of gradient-estimating algorithms for reinforcement learning in neural networks". Proceedings of the IEEE First International Conference on Neural Networks. CiteSeerX 10.1.1.129.8871。
^ Peters, Jan; Vijayakumar, Sethu; Schaal, Stefan (2003). "Reinforcement Learning for Humanoid Robotics" (PDF). IEEE-RAS International Conference on Humanoid Robots.
^ Juliani, Arthur (2016年12月17日). “Simple Reinforcement Learning with Tensorflow Part 8: Asynchronous Actor-Critic Agents (A3C)”. Medium. 2018年2月22日閲覧。
^ Deisenroth, Marc Peter; Neumann, Gerhard; Peters, Jan (2013). A Survey on Policy Search for Robotics. Foundations and Trends in Robotics. 2. NOW Publishers. pp. 1–142. doi:10.1561/2300000021. hdl:10044/1/12051
^ Sutton, Richard (1990). "Integrated Architectures for Learning, Planning and Reacting based on Dynamic Programming". Machine Learning: Proceedings of the Seventh International Workshop.
^ Lin, Long-Ji (1992). "Self-improving reactive agents based on reinforcement learning, planning and teaching" (PDF). Machine Learning volume 8. doi:10.1007/BF00992699。
^ van Hasselt, Hado; Hessel, Matteo; Aslanides, John (2019). "When to use parametric models in reinforcement learning?" (PDF). Advances in Neural Information Processing Systems 32.
^ “On the Use of Reinforcement Learning for Testing Game Mechanics : ACM - Computers in Entertainment” (英語). cie.acm.org. 2018年11月27日閲覧。
^ Riveret, Regis; Gao, Yang (2019). “A probabilistic argumentation framework for reinforcement learning agents” (英語). Autonomous Agents and Multi-Agent Systems 33 (1–2): 216–274. doi:10.1007/s10458-019-09404-2.
^ Yamagata, Taku; McConville, Ryan; Santos-Rodriguez, Raul (16 November 2021). "Reinforcement Learning with Feedback from Multiple Humans with Diverse Skills". arXiv:2111.08596 [cs.LG]。
^ Kulkarni, Tejas D.; Narasimhan, Karthik R.; Saeedi, Ardavan; Tenenbaum, Joshua B. (2016). “Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation”. Proceedings of the 30th International Conference on Neural Information Processing Systems. NIPS'16 (USA: Curran Associates Inc.): 3682–3690. arXiv:1604.06057. Bibcode: 2016arXiv160406057K. ISBN 978-1-5108-3881-9.
^ “Reinforcement Learning / Successes of Reinforcement Learning”. umichrl.pbworks.com. 2017年8月6日閲覧。
^ Quested, Tony. “Smartphones get smarter with Essex innovation”. Business Weekly. 2021年6月17日閲覧。
^ Dey, Somdip; Singh, Amit Kumar; Wang, Xiaohang; McDonald-Maier, Klaus (March 2020). “User Interaction Aware Reinforcement Learning for Power and Thermal Efficiency of CPU-GPU Mobile MPSoCs”. 2020 Design, Automation Test in Europe Conference Exhibition (DATE): 1728–1733. doi:10.23919/DATE48585.2020.9116294. ISBN 978-3-9819263-4-7.
^ Williams, Rhiannon (2020年7月21日). “Future smartphones 'will prolong their own battery life by monitoring owners' behaviour'” (英語). i. 2021年6月17日閲覧。
^ Kaplan, F.; Oudeyer, P. (2004). “Maximizing learning progress: an internal reward system for development”. In Iida, F.; Pfeifer, R.; Steels, L. et al.. Embodied Artificial Intelligence. Lecture Notes in Computer Science. 3139. Berlin; Heidelberg: Springer. pp. 259–270. doi:10.1007/978-3-540-27833-7_19. ISBN 978-3-540-22484-6
^ Klyubin, A.; Polani, D.; Nehaniv, C. (2008). “Keep your options open: an information-based driving principle for sensorimotor systems”. PLOS ONE 3 (12): e4018. Bibcode: 2008PLoSO...3.4018K. doi:10.1371/journal.pone.0004018. PMC 2607028. PMID 19107219.
^ Barto, A. G. (2013). “Intrinsic motivation and reinforcement learning”. Intrinsically Motivated Learning in Natural and Artificial Systems. Berlin; Heidelberg: Springer. pp. 17–47
^ Dabérius, Kevin; Granat, Elvin; Karlsson, Patrik (2020). “Deep Execution - Value and Policy Based Reinforcement Learning for Trading and Beating Market Benchmarks”. The Journal of Machine Learning in Finance 1. SSRN 3374766.
^ George Karimpanal, Thommen; Bouffanais, Roland (2019). “Self-organizing maps for storage and transfer of knowledge in reinforcement learning” (英語). Adaptive Behavior 27 (2): 111–126. arXiv:1811.08318. doi:10.1177/1059712318818568. ISSN 1059-7123.
^ Soucek, Branko (6 May 1992). Dynamic, Genetic and Chaotic Programming: The Sixth-Generation Computer Technology Series. John Wiley & Sons, Inc. p. 38. ISBN 0-471-55717-X
^ Francois-Lavet, Vincent (2018). “An Introduction to Deep Reinforcement Learning”. Foundations and Trends in Machine Learning 11 (3–4): 219–354. arXiv:1811.12560. Bibcode: 2018arXiv181112560F. doi:10.1561/2200000071.
^ Mnih, Volodymyr (2015). “Human-level control through deep reinforcement learning”. Nature 518 (7540): 529–533. Bibcode: 2015Natur.518..529M. doi:10.1038/nature14236. PMID 25719670.
^ Goodfellow, Ian; Shlens, Jonathan; Szegedy, Christian (2015). “Explaining and Harnessing Adversarial Examples”. International Conference on Learning Representations. arXiv:1412.6572.
^ Behzadan, Vahid; Munir, Arslan (2017). “Vulnerability of Deep Reinforcement Learning to Policy Induction Attacks”. International Conference on Machine Learning and Data Mining in Pattern Recognition. Lecture Notes in Computer Science 10358: 262–275. arXiv:1701.04143. doi:10.1007/978-3-319-62416-7_19. ISBN 978-3-319-62415-0.
^ Pieter, Huang, Sandy Papernot, Nicolas Goodfellow, Ian Duan, Yan Abbeel (2017-02-07). Adversarial Attacks on Neural Network Policies. OCLC 1106256905
^ Korkmaz, Ezgi (2022). “Deep Reinforcement Learning Policies Learn Shared Adversarial Features Across MDPs.”. Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22) 36 (7): 7229–7238. doi:10.1609/aaai.v36i7.20684.
^ Berenji, H.R. (1994). “Fuzzy Q-learning: a new approach for fuzzy dynamic programming”. Proc. IEEE 3rd International Fuzzy Systems Conference (Orlando, FL, USA: IEEE): 486–491. doi:10.1109/FUZZY.1994.343737. ISBN 0-7803-1896-X.
^ Vincze, David (2017). “Fuzzy rule interpolation and reinforcement learning”. 2017 IEEE 15th International Symposium on Applied Machine Intelligence and Informatics (SAMI). IEEE. pp. 173–178. doi:10.1109/SAMI.2017.7880298. ISBN 978-1-5090-5655-2
^ Ng, A. Y.; Russell, S. J. (2000). “Algorithms for Inverse Reinforcement Learning”. Proceeding ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning. pp. 663–670. ISBN 1-55860-707-2
^ García, Javier; Fernández, Fernando (1 January 2015). “A comprehensive survey on safe reinforcement learning”. The Journal of Machine Learning Research 16 (1): 1437–1480.

[続きの解説]

「強化学習」の続きの解説一覧

ウィキペディア小見出し辞書

索引トップ用語の索引ランキング

強化学習

出典: フリー百科事典『ウィキペディア（Wikipedia）』 (2022/05/05 04:50 UTC 版)

「マルコフ決定過程」の記事における「強化学習」の解説

「強化学習」および「Q学習」も参照状態遷移確率 T ( s , a , s ′ ) {\displaystyle T(s,a,s')} や報酬関数 R ( s , a , s ′ ) {\displaystyle R(s,a,s')} が未知の場合，環境との相互作用を通じてこれらの情報を得ながら行動を決定する必要がしばしば生じる．このような問題は強化学習の枠組みで議論される．強化学習における代表的な学習アルゴリズムはQ学習と呼ばれるものである。Q学習では、行動価値関数 (action-value function) と呼ばれる関数 Q π ( s , a ) {\displaystyle Q^{\pi }(s,a)} に着目する。ここで Q π ( s , a ) {\displaystyle Q^{\pi }(s,a)} は次のように定義される: Q π ( s , a ) = E π [ ∑ t = 0 ∞ γ t r t + 1 | s 0 = s , a 0 = a ] {\displaystyle Q^{\pi }(s,a)=\mathbb {E} _{\pi }[\sum _{t=0}^{\infty }\gamma ^{t}r_{t+1}|s_{0}=s,a_{0}=a]} いま，最適政策のもとでの行動価値関数 Q ∗ ( s , a ) = max π Q π ( s , a ) {\displaystyle Q^{*}(s,a)=\max _{\pi }Q^{\pi }(s,a)} は V ∗ ( s ) = max a Q ∗ ( s , a ) {\displaystyle V^{*}(s)=\max _{a}Q^{*}(s,a)} を満たす。すなわち、 Q ∗ {\displaystyle Q^{*}} を学習することができれば（モデルのパラメータを直接求めることなく）最適政策を獲得することができる。Q学習では、各試行における遷移前後の状態と入力、および試行で得られる即時報酬の実現値をもとに Q ( s , a ) {\displaystyle Q(s,a)} の値を逐次更新する。実際の学習プロセスでは、すべての状態を十分サンプリングするため確率的なゆらぎを含むよう学習時の行動が選択される。強化学習では最適化に必要なパラメータの学習を状態遷移確率・報酬関数を介することなくおこなうことが出来る（価値反復法や政策反復法ではそれらの明示的な仕様（各状態間の遷移可能性，報酬関数の関数形など）を与える必要がある）。状態数（および行動の選択肢）が膨大な場合、強化学習はしばしばニューラルネットワークなどの関数近似と組み合わせられる。

※この「強化学習」の解説は、「マルコフ決定過程」の解説の一部です。
「強化学習」を含む「マルコフ決定過程」の記事については、「マルコフ決定過程」の概要を参照ください。

強化学習

出典: フリー百科事典『ウィキペディア（Wikipedia）』 (2022/04/15 15:33 UTC 版)

「機械学習」の記事における「強化学習」の解説

周囲の環境を観測することでどう行動すべきかを学習する。行動によって必ず環境に影響を及ぼし、環境から報酬という形でフィードバックを得ることで学習アルゴリズムのガイドとする。例えばQ学習がある。

※この「強化学習」の解説は、「機械学習」の解説の一部です。
「強化学習」を含む「機械学習」の記事については、「機械学習」の概要を参照ください。

強化学習

出典: フリー百科事典『ウィキペディア（Wikipedia）』 (2020/12/31 03:06 UTC 版)

「モンテカルロ法」の記事における「強化学習」の解説

詳細は「強化学習」を参照機械学習の強化学習の文脈では、モンテカルロ法とは行動によって得られた報酬経験だけを頼りに状態価値、行動価値を推定する方法のことを指す。

※この「強化学習」の解説は、「モンテカルロ法」の解説の一部です。
「強化学習」を含む「モンテカルロ法」の記事については、「モンテカルロ法」の概要を参照ください。

強化学習

出典: フリー百科事典『ウィキペディア（Wikipedia）』 (2022/08/04 15:53 UTC 版)

「機械学習」の記事における「強化学習」の解説

強化学習（きょうかがくしゅう、英: reinforcement learning）とは、ある環境内におけるエージェントが、現在の状態を観測し、取るべき行動を決定する問題を扱う機械学習の一種。エージェントは行動を選択することで環境から報酬を得る。強化学習は一連の行動を通じて報酬が最も多く得られるような方策（policy）を学習する。環境はマルコフ決定過程として定式化される。代表的な手法としてTD学習やQ学習が知られている。強化学習とは、試行錯誤を通じて「価値を最大化するような行動」を学習する手法あらかじめ正しい答えが分かっていなくても(=教師データが存在しない) 学習が可能対戦ゲームやロボットなどでの応用例が多い深層学習を用いた強化学習のことを深層強化学習(deep reinforcement learning)という強化学習という名前は、Skinner 博士の提唱した脳の学習メカニズムであるオペラント学習に由来する Skinner 博士は、スキナー箱と呼ばれるラット実験によって、「特定の動作に対して報酬を与えると、その動作が強化される」ことを発見し、これをオペラント学習と呼んだ (1940年頃)

※この「強化学習」の解説は、「機械学習」の解説の一部です。
「強化学習」を含む「機械学習」の記事については、「機械学習」の概要を参照ください。

ウィキペディア小見出し辞書の「強化学習」の項目はプログラムで機械的に意味や本文を生成しているため、不適切な項目が含まれていることもあります。ご了承くださいませ。お問い合わせ。

強化学習と同じ種類の言葉

学習に関連する言葉

交流学習組織学習(そしきがくしゅう) 強化学習プログラム学習(プログラムがくしゅう) 教師あり学習

>>同じ種類の言葉 >>教育に関連する言葉

>> 「強化学習」を含む用語の索引
強化学習のページへのリンク

辞書ショートカット

1 デジタル大辞泉
2 ウィキペディア
3 ウィキペディア小見出し辞書

カテゴリ一覧

＋ビジネス

＋業界用語

＋コンピュータ

＋自動車・バイク

＋船

＋建築・不動産

＋ヘルスケア

＋スポーツ

＋辞書・百科事典

すべての辞書の索引

Weblioのサービス

「強化学習」の関連用語

1

ウィキペディア小見出し辞書

92% |||||

2

動的計画法

ウィキペディア小見出し辞書

92% |||||

3

ウィキペディア小見出し辞書

76% |||||

4

ウィキペディア小見出し辞書

74% |||||

5

ウィキペディア小見出し辞書

74% |||||

6

強化学習のための自己対局の局数

ウィキペディア小見出し辞書

58% |||||

7

機能の発展

ウィキペディア小見出し辞書

56% |||||

8

マルコフ決定過程

ウィキペディア小見出し辞書

52% |||||

9

ウィキペディア小見出し辞書

52% |||||

10

教師無し学習 デジタル大辞泉

50% |||||

強化学習のお隣キーワード

強化型マシンガンブレード

強化型ロケットパンチ

強化外骨格

強化外骨格 PLM-7

強化外骨格・雫

強化学習

強化学習のための自己対局の局数

強化専用効果カード

強化形態のウバウゾー

強化必殺技

強化忍者兵

検索ランキング

強化学習のページの著作権
Weblio 辞書情報提供元は参加元一覧にて確認できます。


	(C)Shogakukan Inc. 株式会社小学館
	All text is available under the terms of the GNU Free Documentation License. この記事は、ウィキペディアの強化学習 (改訂履歴)の記事を複製、再配布したものにあたり、GNU Free Documentation Licenseというライセンスの下で提供されています。 Weblio辞書に掲載されているウィキペディアの記事も、全てGNU Free Documentation Licenseの元に提供されております。
	Text is available under GNU Free Documentation License (GFDL). Weblio辞書に掲載されている「ウィキペディア小見出し辞書」の記事は、Wikipediaのマルコフ決定過程 (改訂履歴)、機械学習 (改訂履歴)、モンテカルロ法 (改訂履歴)の記事を複製、再配布したものにあたり、GNU Free Documentation Licenseというライセンスの下で提供されています。

ビジネス｜業界用語｜コンピュータ｜電車｜自動車・バイク｜船｜工学｜建築・不動産｜学問
 文化｜生活｜ヘルスケア｜趣味｜スポーツ｜生物｜食品｜人名｜方言｜辞書・百科事典

ご利用にあたって

・Weblio辞書とは

・検索の仕方

・利用規約

・プライバシーポリシー

・サイトマップ

便利な機能

・ウェブリオのアプリ

・画像から探す

お問合せ・ご要望

・お問い合わせ

会社概要

・公式企業ページ

・会社情報

・採用情報

ウェブリオのサービス

・Weblio 辞書

・類語・対義語辞典

・英和辞典・和英辞典

・Weblio翻訳

・日中中日辞典

・日韓韓日辞典

・フランス語辞典

・インドネシア語辞典

・タイ語辞典

・ベトナム語辞典

・古語辞典

・手話辞典

・IT用語辞典バイナリ

・おすすめのプログラミングスクール情報「Livifun」

©2024 GRAS Group, Inc.RSS