diff --git a/_site/archive.html b/_site/archive.html
index 3401c4c..00f5cb6 100644
--- a/_site/archive.html
+++ b/_site/archive.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Archives
diff --git a/_site/atom.xml b/_site/atom.xml
index c5b281f..60270f1 100644
--- a/_site/atom.xml
+++ b/_site/atom.xml
@@ -135,26 +135,13 @@ then \(\varphi(n)\) is true for every natural n

First, we prove that every natural number commutes with \(0\).

By Axiom 5, every natural number commutes with \(0\).

We can now prove the main proposition:

We used the mirrored form of the second rule for \(+\), namely \(\forall a, \forall b,\quad s(a) + b = s(a+b)\). This can easily be proved by another induction.

@@ -230,31 +217,15 @@ then \(\varphi(n)\) is true for every natural n
\mathcal{S}\) to a set \(\mathcal{A}(s)\) of possible actions for this state. In this post, we will often simplify by using \(\mathcal{A}\) as a set, assuming that all actions are possible for each state,
  • \(\mathcal{R} \subset \mathbb{R}\) is a set of rewards,
  • and \(p\) is a function representing the dynamics of the MDP:

-\[\begin{align}
- p &: \mathcal{S} \times \mathcal{R} \times \mathcal{S} \times \mathcal{A} \mapsto [0,1] \\
- p(s', r \;|\; s, a) &:= \mathbb{P}(S_t=s', R_t=r \;|\; S_{t-1}=s, A_{t-1}=a),
-\end{align}
-\]

    such that \[ \forall s \in \mathcal{S}, \forall a \in \mathcal{A},\quad \sum_{s', r} p(s', r \;|\; s, a) = 1. \]

  • The function \(p\) represents the probability of transitioning to the state \(s'\) and getting a reward \(r\) when the agent is at state \(s\) and chooses action \(a\).

We will also occasionally use the state-transition probabilities:

-\[\begin{align}
- p &: \mathcal{S} \times \mathcal{S} \times \mathcal{A} \mapsto [0,1] \\
-p(s' \;|\; s, a) &:= \mathbb{P}(S_t=s' \;|\; S_{t-1}=s, A_{t-1}=a) \\
-&= \sum_r p(s', r \;|\; s, a).
-\end{align}
-\]

    Rewarding the agent

    The expected reward of a state-action pair is the function

-\[\begin{align}
-r &: \mathcal{S} \times \mathcal{A} \mapsto \mathbb{R} \\
-r(s,a) &:= \mathbb{E}[R_t \;|\; S_{t-1}=s, A_{t-1}=a] \\
-&= \sum_r r \sum_{s'} p(s', r \;|\; s, a).
-\end{align}
-\]

The discounted return is the sum of all future rewards, with a multiplicative factor giving more weight to more immediate rewards: \[ G_t := \sum_{k=t+1}^T \gamma^{k-t-1} R_k, \] where \(T\) can be infinite or \(\gamma\) can be 1, but not both.

    @@ -264,33 +235,14 @@ r(s,a) &:= \mathbb{E}[R_t \;|\; S_{t-1}=s, A_{t-1}=a] \\

    A policy is a way for the agent to choose the next action to perform.

    A policy is a function \(\pi\) defined as

-\[\begin{align}
-\pi &: \mathcal{A} \times \mathcal{S} \mapsto [0,1] \\
-\pi(a \;|\; s) &:= \mathbb{P}(A_t=a \;|\; S_t=s).
-\end{align}
-\]

    In order to compare policies, we need to associate values to them.

    The state-value function of a policy \(\pi\) is

-\[\begin{align}
-v_{\pi} &: \mathcal{S} \mapsto \mathbb{R} \\
-v_{\pi}(s) &:= \text{expected return when starting in $s$ and following $\pi$} \\
-v_{\pi}(s) &:= \mathbb{E}_{\pi}\left[ G_t \;|\; S_t=s\right] \\
-v_{\pi}(s) &= \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;|\; S_t=s\right]
-\end{align}
-\]

We can also compute the value of starting from a state \(s\) and taking a given action \(a\).

    The action-value function of a policy \(\pi\) is

-\[\begin{align}
-q_{\pi} &: \mathcal{S} \times \mathcal{A} \mapsto \mathbb{R} \\
-q_{\pi}(s,a) &:= \text{expected return when starting from $s$, taking action $a$, and following $\pi$} \\
-q_{\pi}(s,a) &:= \mathbb{E}_{\pi}\left[ G_t \;|\; S_t=s, A_t=a \right] \\
-q_{\pi}(s,a) &= \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;|\; S_t=s, A_t=a\right]
-\end{align}
-\]

    The quest for the optimal policy

    References

diff --git a/_site/contact.html b/_site/contact.html
index 0cb1b56..11b0162 100644
--- a/_site/contact.html
+++ b/_site/contact.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Contact
diff --git a/_site/cv.html b/_site/cv.html
index 3bb013d..58d6be2 100644
--- a/_site/cv.html
+++ b/_site/cv.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Curriculum Vitæ
diff --git a/_site/index.html b/_site/index.html
index 8514138..35dd274 100644
--- a/_site/index.html
+++ b/_site/index.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Home
diff --git a/_site/posts/ginibre-ensemble.html b/_site/posts/ginibre-ensemble.html
index 187b909..bc9f961 100644
--- a/_site/posts/ginibre-ensemble.html
+++ b/_site/posts/ginibre-ensemble.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Random matrices from the Ginibre ensemble
diff --git a/_site/posts/ising-apl.html b/_site/posts/ising-apl.html
index d9e7fb0..15bfee7 100644
--- a/_site/posts/ising-apl.html
+++ b/_site/posts/ising-apl.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Ising model simulation in APL
diff --git a/_site/posts/ising-model.html b/_site/posts/ising-model.html
index 5e7ea4b..d20559a 100644
--- a/_site/posts/ising-model.html
+++ b/_site/posts/ising-model.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Ising model simulation
diff --git a/_site/posts/lsystems.html b/_site/posts/lsystems.html
index 8becbed..6b60e75 100644
--- a/_site/posts/lsystems.html
+++ b/_site/posts/lsystems.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Generating and representing L-systems
diff --git a/_site/posts/peano.html b/_site/posts/peano.html
index 8a54a0b..bd7944e 100644
--- a/_site/posts/peano.html
+++ b/_site/posts/peano.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Peano Axioms
    @@ -81,26 +90,13 @@ then \(\varphi(n)\) is true for every natural n

    First, we prove that every natural number commutes with \(0\).

• \(0+0 = 0+0\).
• For every natural number \(a\) such that \(0+a = a+0\), we have:

  -\[\begin{align}
  - 0 + s(a) &= s(0+a)\\
  - &= s(a+0)\\
  - &= s(a)\\
  - &= s(a) + 0.
  -\end{align}
  -\]

    By Axiom 5, every natural number commutes with \(0\).

    We can now prove the main proposition:

• \(\forall a,\quad a+0=0+a\).
• For all \(a\) and \(b\) such that \(a+b=b+a\),

  -\[\begin{align}
  - a + s(b) &= s(a+b)\\
  - &= s(b+a)\\
  - &= s(b) + a.
  -\end{align}
  -\]

We used the mirrored form of the second rule for \(+\), namely \(\forall a, \forall b,\quad s(a) + b = s(a+b)\). This can easily be proved by another induction.
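As a quick sketch of that auxiliary induction on \(b\) (not taken from the diff above; it only uses the two defining rules \(a + 0 = a\) and \(a + s(b) = s(a+b)\)):

• Base case: \(s(a) + 0 = s(a) = s(a+0)\).
• Inductive step: if \(s(a) + b = s(a+b)\), then

  \[\begin{align}
  s(a) + s(b) &= s(s(a) + b)\\
  &= s(s(a+b))\\
  &= s(a + s(b)).
  \end{align}
  \]

By Axiom 5, the rule holds for every \(b\).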

diff --git a/_site/posts/reinforcement-learning-1.html b/_site/posts/reinforcement-learning-1.html
index de4414e..1472269 100644
--- a/_site/posts/reinforcement-learning-1.html
+++ b/_site/posts/reinforcement-learning-1.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Quick Notes on Reinforcement Learning
@@ -51,31 +60,15 @@
\mathcal{S}\) to a set \(\mathcal{A}(s)\) of possible actions for this state. In this post, we will often simplify by using \(\mathcal{A}\) as a set, assuming that all actions are possible for each state,
  • \(\mathcal{R} \subset \mathbb{R}\) is a set of rewards,
  • and \(p\) is a function representing the dynamics of the MDP:

-\[\begin{align}
- p &: \mathcal{S} \times \mathcal{R} \times \mathcal{S} \times \mathcal{A} \mapsto [0,1] \\
- p(s', r \;|\; s, a) &:= \mathbb{P}(S_t=s', R_t=r \;|\; S_{t-1}=s, A_{t-1}=a),
-\end{align}
-\]

    such that \[ \forall s \in \mathcal{S}, \forall a \in \mathcal{A},\quad \sum_{s', r} p(s', r \;|\; s, a) = 1. \]

  • The function \(p\) represents the probability of transitioning to the state \(s'\) and getting a reward \(r\) when the agent is at state \(s\) and chooses action \(a\).

We will also occasionally use the state-transition probabilities:

-\[\begin{align}
- p &: \mathcal{S} \times \mathcal{S} \times \mathcal{A} \mapsto [0,1] \\
-p(s' \;|\; s, a) &:= \mathbb{P}(S_t=s' \;|\; S_{t-1}=s, A_{t-1}=a) \\
-&= \sum_r p(s', r \;|\; s, a).
-\end{align}
-\]

    Rewarding the agent

    The expected reward of a state-action pair is the function

-\[\begin{align}
-r &: \mathcal{S} \times \mathcal{A} \mapsto \mathbb{R} \\
-r(s,a) &:= \mathbb{E}[R_t \;|\; S_{t-1}=s, A_{t-1}=a] \\
-&= \sum_r r \sum_{s'} p(s', r \;|\; s, a).
-\end{align}
-\]

The discounted return is the sum of all future rewards, with a multiplicative factor giving more weight to more immediate rewards: \[ G_t := \sum_{k=t+1}^T \gamma^{k-t-1} R_k, \] where \(T\) can be infinite or \(\gamma\) can be 1, but not both.
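A standard identity, not part of the post excerpt, is that the return satisfies a simple recursion; writing it out for the infinite-horizon case:

\[\begin{align}
G_t &= R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots\\
&= R_{t+1} + \gamma \left( R_{t+2} + \gamma R_{t+3} + \dots \right)\\
&= R_{t+1} + \gamma G_{t+1}.
\end{align}
\]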

    @@ -85,33 +78,14 @@ r(s,a) &:= \mathbb{E}[R_t \;|\; S_{t-1}=s, A_{t-1}=a] \\

    A policy is a way for the agent to choose the next action to perform.

    A policy is a function \(\pi\) defined as

-\[\begin{align}
-\pi &: \mathcal{A} \times \mathcal{S} \mapsto [0,1] \\
-\pi(a \;|\; s) &:= \mathbb{P}(A_t=a \;|\; S_t=s).
-\end{align}
-\]

    In order to compare policies, we need to associate values to them.

    The state-value function of a policy \(\pi\) is

-\[\begin{align}
-v_{\pi} &: \mathcal{S} \mapsto \mathbb{R} \\
-v_{\pi}(s) &:= \text{expected return when starting in $s$ and following $\pi$} \\
-v_{\pi}(s) &:= \mathbb{E}_{\pi}\left[ G_t \;|\; S_t=s\right] \\
-v_{\pi}(s) &= \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;|\; S_t=s\right]
-\end{align}
-\]

We can also compute the value of starting from a state \(s\) and taking a given action \(a\).

    The action-value function of a policy \(\pi\) is

-\[\begin{align}
-q_{\pi} &: \mathcal{S} \times \mathcal{A} \mapsto \mathbb{R} \\
-q_{\pi}(s,a) &:= \text{expected return when starting from $s$, taking action $a$, and following $\pi$} \\
-q_{\pi}(s,a) &:= \mathbb{E}_{\pi}\left[ G_t \;|\; S_t=s, A_t=a \right] \\
-q_{\pi}(s,a) &= \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;|\; S_t=s, A_t=a\right]
-\end{align}
-\]
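Two standard identities, not in the excerpt above, connect the two value functions; they follow from the definitions by conditioning on the first action and on the first transition, using the dynamics \(p\) defined earlier:

\[\begin{align}
v_{\pi}(s) &= \sum_a \pi(a \;|\; s)\, q_{\pi}(s,a), \\
q_{\pi}(s,a) &= \sum_{s', r} p(s', r \;|\; s, a) \left[ r + \gamma\, v_{\pi}(s') \right].
\end{align}
\]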

    The quest for the optimal policy

    References

diff --git a/_site/posts/self-learning-chatbots-destygo.html b/_site/posts/self-learning-chatbots-destygo.html
index 37cbb56..f998d31 100644
--- a/_site/posts/self-learning-chatbots-destygo.html
+++ b/_site/posts/self-learning-chatbots-destygo.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Mindsay: Towards Self-Learning Chatbots
diff --git a/_site/projects.html b/_site/projects.html
index 3bb3e17..3628f50 100644
--- a/_site/projects.html
+++ b/_site/projects.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Projects
diff --git a/_site/projects/adsb.html b/_site/projects/adsb.html
index 05f94db..23ab4fe 100644
--- a/_site/projects/adsb.html
+++ b/_site/projects/adsb.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - ADS-B data visualization
diff --git a/_site/projects/civilisation.html b/_site/projects/civilisation.html
index 7760a32..7f84710 100644
--- a/_site/projects/civilisation.html
+++ b/_site/projects/civilisation.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Civilisation
diff --git a/_site/projects/community-detection.html b/_site/projects/community-detection.html
index a614173..5de7ee1 100644
--- a/_site/projects/community-detection.html
+++ b/_site/projects/community-detection.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Community Detection
diff --git a/_site/projects/ising-model.html b/_site/projects/ising-model.html
index 007cbfe..e844dc6 100644
--- a/_site/projects/ising-model.html
+++ b/_site/projects/ising-model.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Ising model simulation
diff --git a/_site/projects/lsystems.html b/_site/projects/lsystems.html
index c7e4957..2c70943 100644
--- a/_site/projects/lsystems.html
+++ b/_site/projects/lsystems.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - L-systems
diff --git a/_site/projects/orbit.html b/_site/projects/orbit.html
index 2aa7b7c..6605790 100644
--- a/_site/projects/orbit.html
+++ b/_site/projects/orbit.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Orbit
diff --git a/_site/projects/satrap.html b/_site/projects/satrap.html
index e5ea917..3484936 100644
--- a/_site/projects/satrap.html
+++ b/_site/projects/satrap.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Satrap
diff --git a/_site/projects/tda.html b/_site/projects/tda.html
index b772d70..977154c 100644
--- a/_site/projects/tda.html
+++ b/_site/projects/tda.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Topological Data Analysis of time-dependent networks
diff --git a/_site/projects/ww2-bombings.html b/_site/projects/ww2-bombings.html
index 2de25b6..f27c9e3 100644
--- a/_site/projects/ww2-bombings.html
+++ b/_site/projects/ww2-bombings.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - WWII bombings visualization
diff --git a/_site/rss.xml b/_site/rss.xml
index 4feec4e..49a4ac5 100644
--- a/_site/rss.xml
+++ b/_site/rss.xml
@@ -131,26 +131,13 @@ then \(\varphi(n)\) is true for every natural n

    First, we prove that every natural number commutes with \(0\).

• \(0+0 = 0+0\).
• For every natural number \(a\) such that \(0+a = a+0\), we have:

  -\[\begin{align}
  - 0 + s(a) &= s(0+a)\\
  - &= s(a+0)\\
  - &= s(a)\\
  - &= s(a) + 0.
  -\end{align}
  -\]

    By Axiom 5, every natural number commutes with \(0\).

    We can now prove the main proposition:

• \(\forall a,\quad a+0=0+a\).
• For all \(a\) and \(b\) such that \(a+b=b+a\),

  -\[\begin{align}
  - a + s(b) &= s(a+b)\\
  - &= s(b+a)\\
  - &= s(b) + a.
  -\end{align}
  -\]

We used the mirrored form of the second rule for \(+\), namely \(\forall a, \forall b,\quad s(a) + b = s(a+b)\). This can easily be proved by another induction.

@@ -226,31 +213,15 @@ then \(\varphi(n)\) is true for every natural n
\mathcal{S}\) to a set \(\mathcal{A}(s)\) of possible actions for this state. In this post, we will often simplify by using \(\mathcal{A}\) as a set, assuming that all actions are possible for each state,
  • \(\mathcal{R} \subset \mathbb{R}\) is a set of rewards,
  • and \(p\) is a function representing the dynamics of the MDP:

-\[\begin{align}
- p &: \mathcal{S} \times \mathcal{R} \times \mathcal{S} \times \mathcal{A} \mapsto [0,1] \\
- p(s', r \;|\; s, a) &:= \mathbb{P}(S_t=s', R_t=r \;|\; S_{t-1}=s, A_{t-1}=a),
-\end{align}
-\]

    such that \[ \forall s \in \mathcal{S}, \forall a \in \mathcal{A},\quad \sum_{s', r} p(s', r \;|\; s, a) = 1. \]

  • The function \(p\) represents the probability of transitioning to the state \(s'\) and getting a reward \(r\) when the agent is at state \(s\) and chooses action \(a\).

We will also occasionally use the state-transition probabilities:

-\[\begin{align}
- p &: \mathcal{S} \times \mathcal{S} \times \mathcal{A} \mapsto [0,1] \\
-p(s' \;|\; s, a) &:= \mathbb{P}(S_t=s' \;|\; S_{t-1}=s, A_{t-1}=a) \\
-&= \sum_r p(s', r \;|\; s, a).
-\end{align}
-\]

    Rewarding the agent

    The expected reward of a state-action pair is the function

-\[\begin{align}
-r &: \mathcal{S} \times \mathcal{A} \mapsto \mathbb{R} \\
-r(s,a) &:= \mathbb{E}[R_t \;|\; S_{t-1}=s, A_{t-1}=a] \\
-&= \sum_r r \sum_{s'} p(s', r \;|\; s, a).
-\end{align}
-\]

The discounted return is the sum of all future rewards, with a multiplicative factor giving more weight to more immediate rewards: \[ G_t := \sum_{k=t+1}^T \gamma^{k-t-1} R_k, \] where \(T\) can be infinite or \(\gamma\) can be 1, but not both.

    @@ -260,33 +231,14 @@ r(s,a) &:= \mathbb{E}[R_t \;|\; S_{t-1}=s, A_{t-1}=a] \\

    A policy is a way for the agent to choose the next action to perform.

    A policy is a function \(\pi\) defined as

-\[\begin{align}
-\pi &: \mathcal{A} \times \mathcal{S} \mapsto [0,1] \\
-\pi(a \;|\; s) &:= \mathbb{P}(A_t=a \;|\; S_t=s).
-\end{align}
-\]

    In order to compare policies, we need to associate values to them.

    The state-value function of a policy \(\pi\) is

-\[\begin{align}
-v_{\pi} &: \mathcal{S} \mapsto \mathbb{R} \\
-v_{\pi}(s) &:= \text{expected return when starting in $s$ and following $\pi$} \\
-v_{\pi}(s) &:= \mathbb{E}_{\pi}\left[ G_t \;|\; S_t=s\right] \\
-v_{\pi}(s) &= \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;|\; S_t=s\right]
-\end{align}
-\]

We can also compute the value of starting from a state \(s\) and taking a given action \(a\).

    The action-value function of a policy \(\pi\) is

-\[\begin{align}
-q_{\pi} &: \mathcal{S} \times \mathcal{A} \mapsto \mathbb{R} \\
-q_{\pi}(s,a) &:= \text{expected return when starting from $s$, taking action $a$, and following $\pi$} \\
-q_{\pi}(s,a) &:= \mathbb{E}_{\pi}\left[ G_t \;|\; S_t=s, A_t=a \right] \\
-q_{\pi}(s,a) &= \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;|\; S_t=s, A_t=a\right]
-\end{align}
-\]

    The quest for the optimal policy

    References

diff --git a/_site/skills.html b/_site/skills.html
index 17eb61b..667befe 100644
--- a/_site/skills.html
+++ b/_site/skills.html
@@ -7,7 +7,16 @@ Dimitri Lozeve - Skills in Statistics, Data Science and Machine Learning
diff --git a/site.hs b/site.hs
index 61fa1db..17ecf1e 100644
--- a/site.hs
+++ b/site.hs
@@ -123,7 +123,7 @@ customPandocCompiler =
       newExtensions = defaultExtensions `mappend` customExtensions
       writerOptions = defaultHakyllWriterOptions
         { writerExtensions = newExtensions
-        , writerHTMLMathMethod = MathJax ""
+        , writerHTMLMathMethod = KaTeX ""
         }
       readerOptions = defaultHakyllReaderOptions
   in do
diff --git a/templates/default.html b/templates/default.html
index d614755..e827a72 100644
--- a/templates/default.html
+++ b/templates/default.html
@@ -7,7 +7,17 @@ Dimitri Lozeve - $title$
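For context, the site.hs hunk above is the only source change: switching Pandoc's writerHTMLMathMethod from MathJax to KaTeX is what alters the math markup of every generated page and feed in this diff, and the lines added to templates/default.html presumably load the KaTeX assets. A minimal, self-contained sketch of a Hakyll compiler configured this way is shown below; katexPandocCompiler is a made-up name, and the writerExtensions customisation of the real customPandocCompiler is omitted.

{-# LANGUAGE OverloadedStrings #-}

import           Hakyll
import           Text.Pandoc.Options (HTMLMathMethod (KaTeX), WriterOptions (..))

-- Compile Markdown with Pandoc, emitting KaTeX-compatible math markup
-- instead of MathJax calls (same option as in the site.hs hunk above).
katexPandocCompiler :: Compiler (Item String)
katexPandocCompiler =
  let writerOptions = defaultHakyllWriterOptions
        { writerHTMLMathMethod = KaTeX ""  -- same argument as in the diff; the string is a KaTeX URL prefix
        }
  in  pandocCompilerWith defaultHakyllReaderOptions writerOptions

This compiler can be used wherever pandocCompiler was used in the rules. The option alone only changes the markup Hakyll emits; the browser still needs the KaTeX stylesheet and scripts, which is presumably what the extra lines added to the template's head provide.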