
2 posts tagged with "Reinforcement Learning"


Reinforcement Learning - Basic Components

· 8 min read
PuQing
AI, CVer, Pythoner, Half-stack Developer


The main characters of RL are the agent and the environment. The environment is the world that the agent lives in and interacts with. At every step of interaction, the agent sees a (possibly partial) observation of the state of the world, and then decides on an action to take. The environment changes when the agent acts on it, but may also change on its own.

The agent also perceives a reward signal from the environment, a number that tells it how good or bad the current world state is. The goal of the agent is to maximize its cumulative reward, called return. Reinforcement learning methods are ways that the agent can learn behaviors to achieve its goal.
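
The interaction loop described above can be sketched in a few lines of Python. This is only an illustrative toy, not code from any RL library: `CoinFlipEnv`, `random_policy`, and `run_episode` are hypothetical names, and the environment (guess a hidden coin flip, reward 1 for a correct guess) is an assumption chosen to keep the example small.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses a hidden coin flip."""

    def __init__(self, horizon=10, seed=0):
        self.horizon = horizon          # number of steps per episode
        self.rng = random.Random(seed)  # the environment's own randomness
        self.t = 0                      # state: the current step index

    def reset(self):
        self.t = 0
        return self.t  # the observation is just the step index

    def step(self, action):
        coin = self.rng.randint(0, 1)               # the world changes on its own
        reward = 1.0 if action == coin else 0.0     # reward signal
        self.t += 1
        done = self.t >= self.horizon
        return self.t, reward, done


def random_policy(observation, rng):
    """A policy maps an observation to an action; this one ignores it."""
    return rng.randint(0, 1)


def run_episode(env, policy, seed=0):
    """Run one episode and return the cumulative reward (the return)."""
    rng = random.Random(seed)
    observation = env.reset()
    episode_return = 0.0
    done = False
    while not done:
        action = policy(observation, rng)             # agent decides
        observation, reward, done = env.step(action)  # environment responds
        episode_return += reward                      # accumulate reward
    return episode_return


episode_return = run_episode(CoinFlipEnv(), random_policy)
print(episode_return)
```

A learning method would replace `random_policy` with something that improves from experience; the loop itself (observe, act, receive reward, accumulate return) stays the same.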

Reparameterization Trick

· 5 min read
PuQing
AI, CVer, Pythoner, Half-stack Developer

Motivation

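Suppose we have a normal distribution $q$ with parameters $\theta$. We want to solve the following problem:

$$\min_{\theta} E_{q}[f(x)]$$

Here $E_{q}[f(x)]$ denotes the expectation of the function $f(x)$ when the random variable $x$ is distributed according to $q$, and the outer $\min_{\theta}$ seeks the value of $\theta$ that minimizes this expectation.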
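
To make the objective concrete, here is a minimal sketch under stated assumptions: $\theta = (\mu, \sigma)$, $q = \mathcal{N}(\mu, \sigma^2)$, and a hypothetical test function $f(x) = x^2$ (none of these choices come from the text above). It estimates $E_{q}[f(x)]$ by Monte Carlo and, writing the sample as $x = \mu + \sigma\epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$, also estimates the gradient with respect to $\mu$ and $\sigma$. The helper name `estimate` is illustrative only.

```python
import random


def f(x):
    """Hypothetical objective; chosen so the exact answer is known."""
    return x * x


def df(x):
    """Derivative of f with respect to x."""
    return 2.0 * x


def estimate(mu, sigma, n=50_000, seed=0):
    """Monte Carlo estimates of E_q[f(x)] and its gradient in (mu, sigma).

    Samples are written as x = mu + sigma * eps with eps ~ N(0, 1), so the
    randomness no longer depends on the parameters and the chain rule gives
    d f / d mu = f'(x) and d f / d sigma = f'(x) * eps.
    """
    rng = random.Random(seed)
    value = grad_mu = grad_sigma = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        x = mu + sigma * eps
        value += f(x)
        grad_mu += df(x)
        grad_sigma += df(x) * eps
    return value / n, grad_mu / n, grad_sigma / n


value, grad_mu, grad_sigma = estimate(mu=1.0, sigma=0.5)
print(value, grad_mu, grad_sigma)
```

For this particular $f$, the exact values are $E_{q}[f(x)] = \mu^2 + \sigma^2 = 1.25$, with gradient $2\mu = 2$ in $\mu$ and $2\sigma = 1$ in $\sigma$, so the printed estimates should land close to those numbers.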