Demote headers to avoid rendering first-level headings as <h1>
parent aa841f4ba2
commit 02f4a537bd
13 changed files with 222 additions and 220 deletions
@@ -49,11 +49,11 @@
 </section>
 <section>
-<h1 id="introduction">Introduction</h1>
+<h2 id="introduction">Introduction</h2>
 <p>In this series of blog posts, I intend to write my notes as I go through Richard S. Sutton’s excellent <em>Reinforcement Learning: An Introduction</em> <a href="#ref-1">(1)</a>.</p>
 <p>I will try to formalise the maths behind it a little bit, mainly because I would like to use it as a useful personal reference to the main concepts in RL. I will probably add a few remarks about a possible implementation as I go on.</p>
-<h1 id="relationship-between-agent-and-environment">Relationship between agent and environment</h1>
-<h2 id="context-and-assumptions">Context and assumptions</h2>
+<h2 id="relationship-between-agent-and-environment">Relationship between agent and environment</h2>
+<h3 id="context-and-assumptions">Context and assumptions</h3>
 <p>The goal of reinforcement learning is to select the best actions available to an agent as it goes through a series of states in an environment. In this post, we will only consider <em>discrete</em> time steps.</p>
 <p>The most important hypothesis we make is the <em>Markov property:</em></p>
 <blockquote>
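The blockquote at the end of the hunk above introduces the Markov property, whose statement falls outside the lines shown. For reference, the standard form (as in Sutton and Barto, not quoted from the changed file) is that the next state and reward depend only on the current state and action:

    % Standard statement of the Markov property, given here only as context:
    \Pr\{S_{t+1} = s',\, R_{t+1} = r \mid S_t, A_t, R_t, \ldots, R_1, S_0, A_0\}
      \;=\; \Pr\{S_{t+1} = s',\, R_{t+1} = r \mid S_t, A_t\}.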
@@ -76,15 +76,15 @@
 <p>The function <span class="math inline">\(p\)</span> represents the probability of transitioning to the state <span class="math inline">\(s'\)</span> and getting a reward <span class="math inline">\(r\)</span> when the agent is at state <span class="math inline">\(s\)</span> and chooses action <span class="math inline">\(a\)</span>.</p>
 <p>We will also occasionally use the <em>state-transition probabilities</em>:</p>
-<h2 id="rewarding-the-agent">Rewarding the agent</h2>
+<h3 id="rewarding-the-agent">Rewarding the agent</h3>
 <div class="definition">
 <p>The <em>expected reward</em> of a state-action pair is the function</p>
 </div>
 <div class="definition">
 <p>The <em>discounted return</em> is the sum of all future rewards, with a multiplicative factor to give more weight to more immediate rewards: <span class="math display">\[ G_t := \sum_{k=t+1}^T \gamma^{k-t-1} R_k, \]</span> where <span class="math inline">\(T\)</span> can be infinite or <span class="math inline">\(\gamma\)</span> can be 1, but not both.</p>
 </div>
-<h1 id="deciding-what-to-do-policies">Deciding what to do: policies</h1>
-<h2 id="defining-our-policy-and-its-value">Defining our policy and its value</h2>
+<h2 id="deciding-what-to-do-policies">Deciding what to do: policies</h2>
+<h3 id="defining-our-policy-and-its-value">Defining our policy and its value</h3>
 <p>A <em>policy</em> is a way for the agent to choose the next action to perform.</p>
 <div class="definition">
 <p>A <em>policy</em> is a function <span class="math inline">\(\pi\)</span> defined as</p>
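The hunk above also covers the definitions of the state-transition probabilities and the discounted return. In the book's notation the state-transition probabilities follow from p by summing over rewards, p(s' | s, a) = sum_r p(s', r | s, a). As a minimal sketch (assuming a finite episode and plain Python lists; this is not code from the post), the discounted return can be accumulated backwards:

    # Minimal sketch: computing G_t = sum_{k=t+1}^{T} gamma^{k-t-1} R_k
    # for a finite episode, where `rewards` holds R_{t+1}, ..., R_T and
    # `gamma` is the discount factor in [0, 1].
    def discounted_return(rewards, gamma):
        g = 0.0
        for r in reversed(rewards):
            g = r + gamma * g  # G_{k-1} = R_k + gamma * G_k
        return g

    # Example: 1.0 + 0.9 * 0.0 + 0.9**2 * 2.0 = 2.62
    print(discounted_return([1.0, 0.0, 2.0], gamma=0.9))

Working backwards keeps the computation linear in the episode length and mirrors the recursion G_t = R_{t+1} + gamma * G_{t+1} used throughout the book.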
@@ -97,8 +97,8 @@
 <div class="definition">
 <p>The <em>action-value function</em> of a policy <span class="math inline">\(\pi\)</span> is</p>
 </div>
-<h2 id="the-quest-for-the-optimal-policy">The quest for the optimal policy</h2>
-<h1 id="references">References</h1>
+<h3 id="the-quest-for-the-optimal-policy">The quest for the optimal policy</h3>
+<h2 id="references">References</h2>
 <ol>
 <li><span id="ref-1"></span>R. S. Sutton and A. G. Barto, Reinforcement learning: an introduction, Second edition. Cambridge, MA: The MIT Press, 2018.</li>
 </ol>
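The last hunk touches the definition of the action-value function, whose formula is not visible in this view. The standard definition (again from Sutton and Barto, not quoted from the diff), written with the same notation as the discounted return above, is:

    % Standard definition of the action-value function of a policy pi:
    q_\pi(s, a) := \mathbb{E}_\pi\!\left[\, G_t \,\middle|\, S_t = s,\ A_t = a \,\right]
                 = \mathbb{E}_\pi\!\left[\, \sum_{k=t+1}^{T} \gamma^{k-t-1} R_k \,\middle|\, S_t = s,\ A_t = a \,\right].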