Offsiteteam

Reward

In reinforcement learning (RL), a reward is a scalar feedback signal that an agent receives from the environment after taking an action. It quantifies how good or bad the outcome of the action was in a particular state. Rewards are the primary mechanism by which the environment communicates success or failure to the agent. For example, in a game, winning might give a reward of +1, losing might give -1, and all other states might give 0.

The agent’s ultimate goal is to choose actions over time in a way that maximizes the cumulative reward, often referred to as the return. Depending on the task, this might be the total reward received over a fixed number of steps (episodic tasks), or an ongoing sum of rewards discounted over time (continuing tasks). The reward function is typically predefined by the designer and encodes the task objective. It is crucial that the reward function aligns well with the desired behavior; otherwise, the agent may learn unintended strategies. In essence, the reward is what drives learning and behavior in reinforcement learning systems.
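The notion of return described above can be sketched in a few lines of code. This is a minimal illustration, not a full RL implementation; the function name `discounted_return` and the sample reward sequence are hypothetical.

```python
def discounted_return(rewards, gamma=0.99):
    """Compute the discounted return G = r_0 + gamma*r_1 + gamma^2*r_2 + ...

    rewards: list of scalar rewards, one per time step.
    gamma: discount factor in [0, 1]; gamma=1 gives the undiscounted
           episodic total, gamma<1 suits continuing tasks.
    """
    g = 0.0
    # Accumulate from the last step backward so each earlier reward
    # picks up one more factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Example: a 3-step episode where only the final step gives +1
# (as in the win/lose game described above, with gamma = 0.9):
print(discounted_return([0.0, 0.0, 1.0], gamma=0.9))  # 0.81
```

With `gamma=1.0` the same function simply sums the rewards, which corresponds to the episodic (fixed-horizon) case.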
