Introduction

After Phase I, here are my solutions to Phase II problems. The full code is included in the post, but everything is also available on GitHub.

A PDF of the problem descriptions is available on the competition website, or directly from my GitHub repo.

The submission guidelines provided a template where everything is defined in a Contest2020.Problems namespace. I kept the default values for ⎕IO and ⎕ML because the problems were not particularly easier with ⎕IO←0.

:Namespace Contest2020

    :Namespace Problems
        (⎕IO ⎕ML ⎕WX)←1 1 3

Problem 1 – Take a Dive

∇ score←dd DiveScore scores
  :If 7=≢scores
      scores←scores[¯2↓2↓⍋scores]
  :ElseIf 5=≢scores
      scores←scores[¯1↓1↓⍋scores]
  :Else
      scores←scores
  :EndIf
  score←2(⍎⍕)dd×+/scores
∇
This is a very straightforward implementation of the algorithm described in the problem description. I decided to switch explicitly on the size of the input vector because I feel it is more natural. For the cases with 5 or 7 judges, we use Drop (↓) to remove the lowest and highest scores.

At the end, we sum up the scores with +/ and multiply them by dd. The last operation, 2(⍎⍕), is a train using Format (⍕, dyadic) to round to 2 decimal places, and Execute (⍎) to get actual numbers and not strings.
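For example, with a hypothetical dive of difficulty 3.0 scored by seven judges, the two highest and two lowest scores are dropped before summing:

      3.0 DiveScore 5.5 6 6.5 7 7.5 8 8.5
63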

Problem 2 – Another Step in the Proper Direction

∇ steps←{p}Steps fromTo;segments;width
  width←|-/fromTo
  :If 0=⎕NC'p' ⍝ No left argument: same as Problem 5 of Phase I
      segments←0,⍳width
  :ElseIf p<0 ⍝ -⌊p is the number of equally-sized steps to take
      segments←(-⌊p){0,⍵×⍺÷⍨⍳⍺}width
  :ElseIf p>0 ⍝ p is the step size
      segments←p{⍵⌊⍺×0,⍳⌈⍵÷⍺}width
  :ElseIf p=0 ⍝ As if we took zero steps
      segments←0
  :EndIf
  ⍝ Take into account the start point and the direction.
  steps←fromTo{(⊃⍺)+(-×-/⍺)×⍵}segments
∇
This is an extension of Problem 5 of Phase I. In each case, we compute the “segments”, i.e., the steps starting from 0. In a final step, common to all cases, we add the correct starting point and correct the direction if need be.

To compute equally-sized steps, we first divide the segment \([0, 1]\) into p equal segments with (⍳p)÷p. This subdivision can then be multiplied by the width to obtain the required segments.

When p is the step size, we just divide the width by the step size (rounding up to the nearest integer) to get the required number of segments. If the last segment is too large, we “crop” it to the width with Minimum (⌊).
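A quick check in the session (assuming Steps is defined as above): with a step size of 2, the last step is cropped so that we land exactly on the end point:

      2 Steps 3 10
3 5 7 9 10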

Problem 3 – Past Tasks Blast

∇ urls←PastTasks url;r;paths
  r←HttpCommand.Get url
  paths←('[a-zA-Z0-9_/]+\.pdf'⎕S'&')r.Data
  urls←('https://www.dyalog.com/'∘,)¨paths
∇

I decided to use HttpCommand for this task, since it is simply one ]load HttpCommand away and should be platform-independent.

Parsing XML is not something I consider “fun” in the best of cases, and I feel like APL is not the best language for this kind of thing. Given how simple the task is, I just decided to find the relevant bits with a regular expression, using Search (⎕S).

After finding all the strings vaguely resembling a PDF file name (only alphanumeric characters and underscores, with a .pdf extension), I just concatenate them to the base URL of the Dyalog domain.
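As a small illustration of the regex step, on a hypothetical HTML fragment (not the actual page contents):

      ('[a-zA-Z0-9_/]+\.pdf'⎕S'&')'<a href="media/Files/2020.pdf">' ⍝ using ]Boxing on
┌────────────────────┐
│media/Files/2020.pdf│
└────────────────────┘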

Problem 4 – Bioinformatics

The first task can be solved by decomposing it into several functions.

⍝ Test if a DNA string is a reverse palindrome.
isrevp←{⍵≡⌽'TAGC'['ATCG'⍳⍵]}

First, we compute the complement of a DNA string (using simple indexing) and test if its Reverse (⌽) is equal to the original string.
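For instance, GCATGC is its own reverse complement, while GCATGG is not:

      isrevp¨ 'GCATGC' 'GCATGG'
1 0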

⍝ Generate all subarrays (position, length) pairs, for 4 ≤ length ≤ 12.
subarrays←{⊃,/(⍳⍵),¨¨3↓¨⍳¨12⌊1+⍵-⍳⍵}

We first compute all the possible lengths for each starting point. For instance, the last element cannot have any (position, length) pair associated with it, because there are no three elements following it. So we crop the possible lengths to \([4, 12]\). For instance, for an array of size 10:

        {3↓¨⍳¨12⌊1+⍵-⍳⍵}10
┌──────────────┬───────────┬─────────┬───────┬─────┬───┬─┬┬┬┐
│4 5 6 7 8 9 10│4 5 6 7 8 9│4 5 6 7 8│4 5 6 7│4 5 6│4 5│4││││
└──────────────┴───────────┴─────────┴───────┴─────┴───┴─┴┴┴┘

Then, we just add the corresponding starting position to each length (1 for the first block, 2 for the second, and so on). Finally, we flatten everything.
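For instance, for an array of size 5, only three (position, length) pairs are possible (output shown with ]Boxing on):

      subarrays 5
┌───┬───┬───┐
│1 4│1 5│2 4│
└───┴───┴───┘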

∇ r←revp dna;positions
  positions←subarrays⍴dna
  ⍝ Filter subarrays which are reverse palindromes.
  r←↑({isrevp dna[¯1+⍵[1]+⍳⍵[2]]}¨positions)/positions
∇

For each possible (position, length) pair, we get the corresponding DNA substring with dna[¯1+⍵[1]+⍳⍵[2]] (adding ¯1 is necessary because ⎕IO←1). We test if this substring is a reverse palindrome using isrevp above. Replicate (/) then selects only the (position, length) pairs for which the substring is a reverse palindrome.
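On a small hypothetical example, this correctly finds the two reverse palindromes GCATGC (position 1, length 6) and CATG (position 2, length 4):

      revp 'GCATGC'
1 6
2 4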

The second task is just about counting the number of subsets modulo 1,000,000. So we just need to compute \(2^n \mod 1000000\) for any positive integer \(n\leq1000\).

sset←{((1E6|2∘×)⍣⍵)1}

Since we cannot just compute \(2^n\) directly and take the remainder, we use modular arithmetic to stay mod 1,000,000 during the whole computation. The dfn (1E6|2∘×) doubles its argument mod 1,000,000. So we just apply this function \(n\) times using the Power operator (⍣), with an initial value of 1.
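A quick sanity check on small values, where \(2^3 = 8\), \(2^{10} = 1024\), and \(2^{20} = 1048576\):

      sset¨ 3 10 20
8 1024 48576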

Problem 5 – Future and Present Value

First solution: ((1+⊢)⊥⊣) computes the total return for a vector of amounts (⊣) and a vector of rates (⊢). It is applied to every prefix subarray of amounts and rates to get all intermediate values. However, this has quadratic complexity.

rr←(,\⊣)((1+⊢)⊥⊣)¨(,\⊢)

Second solution: we want to use the recurrence relation (recur) and scan through the vectors of amounts and rates, accumulating the total value at every time step. However, APL evaluation is right-associative, so a simple Scan (recur\amounts,¨values) would not give the correct result, since recur is not associative and we need to evaluate it left-to-right. (In any case, Scan would have quadratic complexity here, so it would not bring any benefit over the previous solution.) What we need is something akin to Haskell’s scanl function, which evaluates left to right in \(O(n)\) time. (There is an interesting StackOverflow answer explaining the behaviour of Scan and comparing it to Haskell’s scanl.) This is what we do here, accumulating values from left to right. (This is inspired by dfns.ascan, although heavily simplified.)

rr←{recur←{⍵[1]+⍺×1+⍵[2]} ⋄ 1↓⌽⊃{(⊂(⊃⍵)recur⍺),⍵}/⌽⍺,¨⍵}

For the second task, there is an explicit formula for cashflow calculations, so we can just apply it.

pv←{+/⍺÷×\1+⍵}
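For instance, with hypothetical cashflows of 100 at the end of each of two years, discounted at a constant 5% rate, we get \(100/1.05 + 100/1.05^2 \approx 185.94\):

      100 100 pv 0.05 0.05
185.9410431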

Problem 6 – Merge

∇ text←templateFile Merge jsonFile;template;ns
  template←⊃⎕NGET templateFile 1
  ns←⎕JSON⊃⎕NGET jsonFile
  ⍝ We use a simple regex search and replace on the
  ⍝ template.
  text←↑('@[a-zA-Z]*@'⎕R{ns getval ¯1↓1↓⍵.Match})template
∇

We first read the template and the JSON values from their files. The ⎕NGET function reads simple text files, and ⎕JSON extracts the key-value pairs as a namespace.

Assuming all variable names contain only letters, the regex @[a-zA-Z]*@ matches variable names enclosed between @ symbols. The function getval then returns the appropriate value, and we can replace the variable name in the template.

∇ val←ns getval var
  :If ''≡var ⍝ literal '@'
      val←'@'
  :ElseIf (⊂var)∊ns.⎕NL ¯2
      val←⍕ns⍎var
  :Else
      val←'???'
  :EndIf
∇

This function takes the namespace mapping variable names to their respective values, and the name of a variable. If the name is empty (the template contained a literal @@), we return @; if the name is defined in the namespace, we return its formatted value; otherwise, we return ???.
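As a small illustration (with a hypothetical JSON value, assuming getval is in scope):

      ns←⎕JSON'{"name": "world"}'
      ns getval 'name'
world
      ns getval 'foo'
???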

Problem 7 – UPC

CheckDigit←{10|-⍵+.×11⍴3 1}

The check digit satisfies the equation \[ 3 x_{1}+x_{2}+3 x_{3}+x_{4}+3 x_{5}+x_{6}+3 x_{7}+x_{8}+3 x_{9}+x_{10}+3 x_{11}+x_{12} \equiv 0 \bmod 10, \] therefore, \[ x_{12} \equiv -(3 x_{1}+x_{2}+3 x_{3}+x_{4}+3 x_{5}+x_{6}+3 x_{7}+x_{8}+3 x_{9}+x_{10}+3 x_{11}) \bmod 10. \]

Translated to APL, we just take the dot product of the first 11 digits of the barcode with 11⍴3 1, negate it, and take the remainder modulo 10.
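For instance, for the classic example barcode 036000291452, the first 11 digits give the expected check digit:

      CheckDigit 0 3 6 0 0 0 2 9 1 4 5
2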

⍝ Left and right representations of digits. Decoding
⍝ the binary representation from decimal is more
⍝ compact than writing everything explicitly.
lrepr←⍉(7⍴2)⊤13 25 19 61 35 49 47 59 55 11
rrepr←~¨lrepr

For the second task, the first thing we need to do is save the representation of digits. To save space, I did not encode the binary representation explicitly, instead using a decimal representation that I then decode in base 2. The right representation is just the bitwise negation.
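For instance, the first decimal value, 13, decodes to 0001101, the left pattern of the digit 0:

      lrepr[1;]
0 0 0 1 1 0 1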

∇ bits←WriteUPC digits;left;right
  :If (11=≢digits)∧∧/digits∊0,⍳9
      left←,lrepr[1+6↑digits;]
      right←,rrepr[1+6↓digits,CheckDigit digits;]
      bits←1 0 1,left,0 1 0 1 0,right,1 0 1
  :Else
      bits←¯1
  :EndIf
∇

First of all, if the vector digits does not have exactly 11 elements, all between 0 and 9, it is an error and we return ¯1.

Then, we take the first 6 digits and encode them with lrepr, and encode the last 5 digits, plus the check digit, with rrepr. In each case, adding 1 is necessary because ⎕IO←1. We return the final bit array with the required beginning, middle, and end guard patterns.
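We can at least check the shape of the result: the three guard patterns account for \(3+5+3\) bits, and the 12 digits take 7 bits each, for a total of 95. Invalid input yields ¯1:

      ≢WriteUPC 0 3 6 0 0 0 2 9 1 4 5
95
      WriteUPC 1 2 3 ⍝ wrong number of digits
¯1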

∇ digits←ReadUPC bits
  :If 95≠⍴bits ⍝ incorrect number of bits
      digits←¯1
  :Else
      ⍝ Test if the barcode was scanned right-to-left.
      :If 0=2|+/bits[3+⍳7]
          bits←⌽bits
      :EndIf
      digits←({¯1+lrepr⍳⍵}¨(7/⍳6)⊆42↑3↓bits),{¯1+rrepr⍳⍵}¨(7/⍳6)⊆¯42↑¯3↓bits
      :If ~∧/digits∊0,⍳9 ⍝ incorrect parity
          digits←¯1
      :ElseIf (⊃⌽digits)≠CheckDigit ¯1↓digits ⍝ incorrect check digit
          digits←¯1
      :EndIf
  :EndIf
∇

After a shape check, we detect the scanning direction using parity: each left representation has an odd number of 1 bits, so if the first pattern after the guard has even parity, the barcode was scanned right-to-left and we reverse it. We then split the bits into groups of 7, look each group up in lrepr or rrepr, and validate the parity and the check digit.
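Assuming the functions above, a round trip through WriteUPC and ReadUPC recovers the digits, with the check digit appended, even when the barcode is scanned right-to-left:

      ReadUPC WriteUPC 0 3 6 0 0 0 2 9 1 4 5
0 3 6 0 0 0 2 9 1 4 5 2
      ReadUPC ⌽WriteUPC 0 3 6 0 0 0 2 9 1 4 5
0 3 6 0 0 0 2 9 1 4 5 2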

Problem 8 – Balancing the Scales

∇ parts←Balance nums;subsets;partitions
  ⍝ This is a brute force solution, running in
  ⍝ exponential time. We generate all the possible
  ⍝ partitions, filter out those which are not
  ⍝ balanced, and return the first matching one. There
  ⍝ are more advanced approaches running in
  ⍝ pseudo-polynomial time (based on dynamic
  ⍝ programming, see the "Partition problem" Wikipedia
  ⍝ page), but they are not warranted here, as the
  ⍝ input size remains fairly small.

  ⍝ Generate all partitions of a vector of a given
  ⍝ size, as binary mask vectors.
  subsets←{1↓2⊥⍣¯1⍳2*⍵}
  ⍝ Keep only the subsets whose sum is exactly
  ⍝ (+/nums)÷2.
  partitions←nums{((2÷⍨+/⍺)=⍺+.×⍵)/⍵}subsets⍴nums
  :If 0=≢,partitions
      ⍝ If no partition satisfies the above
      ⍝ criterion, we return ⍬.
      parts←⍬
  :Else
      ⍝ Otherwise, we return the first possible
      ⍝ partition.
      parts←nums{((⊂,(⊂~))⊃↓⍉⍵)/¨2⍴⊂⍺}partitions
  :EndIf
∇
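For example, on a small hypothetical instance (output shown with ]Boxing on):

      Balance 10 10 20
┌──┬─────┐
│20│10 10│
└──┴─────┘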

Problem 9 – Upwardly Mobile

This is the only problem that I didn’t complete. It required parsing the files containing the graphical representations of the trees, which was needlessly complex and, quite frankly, hard and boring with a language like APL.

However, the next part is interesting: once we have a matrix of coefficients representing the relationships between the weights, we can solve the system of equations. Matrix Divide (⌹) will find one solution to the system. Since the system is overdetermined, we fix A=1 to find one possible solution. Since we want integer weights, the solution we find is smaller than the one we want, and may contain fractional weights. So we multiply everything by the Lowest Common Multiple (∧) to get the smallest integer weights.
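The final train can be checked on its own: if the solver returns, say, weights 0.5 and 0.25 relative to the first weight fixed at 1 (a hypothetical solution vector), we recover the smallest integer weights:

      ((1∘,)×(∧/÷)) 0.5 0.25
4 2 1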

∇ weights←Weights filename;mobile;branches;mat
  ⍝ Put your code and comments below here

  ⍝ Parse the mobile input file.
  mobile←↑⊃⎕NGET filename 1
  branches←⍸mobile∊'┌┴┐'
  ⍝ TODO: Build the matrix of coefficients mat.

  ⍝ Solve the system of equations (arbitrarily setting
  ⍝ the first variable at 1 because the system is
  ⍝ overdetermined), then multiply the coefficients by
  ⍝ their least common multiple to get the smallest
  ⍝ integer weights.
  weights←((1∘,)×(∧/÷))mat[;1]⌹1↓[2]mat
∇

    :EndNamespace
:EndNamespace
Introduction

I’ve always been quite fond of APL and its “array-oriented” approach to programming. (See my previous post on simulating the Ising model with APL; it also contains more background on APL.) Every year, Dyalog (the company behind probably the most popular APL implementation) organises a competition with various challenges in APL.

The Dyalog APL Problem Solving Competition consists of two phases:

• Phase I consists of 10 short puzzles (similar to what one can find on Project Euler) that can be solved by a one-line APL function.
• Phase II is a collection of larger problems that may require longer solutions and a larger context (e.g. reading and writing to files), often in a more applied setting. Problems are often inspired by existing domains, such as AI, bioinformatics, and so on.

In 2018, I participated in the competition, entering only Phase I (my solutions are on GitHub). (Since I was a student at the time, I was eligible for a prize, and I won $100 for a 10-line submission, which is quite good!) This year, I entered both phases. I explain my solutions to Phase I in this post. Another post will contain annotated solutions for Phase II problems.

    The full code for my submission is on GitHub at dlozeve/apl-competition-2020, but everything is reproduced in this post.


    If X<0, the second vector contains the last |X elements of Y and the first vector contains the remaining elements.

    Solution: (0>⊣)⌽((⊂↑),(⊂↓))

There are three nested trains here. (Trains are nice to read, even if they are easy to abuse, and generally make for shorter dfns, which is better for Phase I.) The first one, ((⊂↑),(⊂↓)), uses the two functions Take (↑) and Drop (↓) to build a nested array consisting of the two outputs we need. (Take and Drop already have the behaviour needed regarding negative arguments.) However, if the left argument is negative, the two arrays will not be in the correct order. So we need a way to reverse them if X<0.

The second train (0>⊣) will return 1 if its left argument is negative. From this, we can use Rotate (⌽) to correctly order the nested array, in the last train.
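For example, both a positive and a negative left argument produce the two parts in the right order (output shown with ]Boxing on):

      3 ((0>⊣)⌽((⊂↑),(⊂↓))) 'DYALOG'
┌───┬───┐
│DYA│LOG│
└───┴───┘
      ¯3 ((0>⊣)⌽((⊂↑),(⊂↓))) 'DYALOG'
┌───┬───┐
│DYA│LOG│
└───┴───┘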


    UTF-8 encodes Unicode characters using 1-4 integers for each character. Dyalog APL includes a system function, ⎕UCS, that can convert characters into integers and integers into characters. The expression 'UTF-8'∘⎕UCS converts between characters and UTF-8.

    Consider the following:

      'UTF-8'∘⎕UCS 'D¥⍺⌊○9'
68 194 165 226 141 186 226 140 138 226 151 139 57
      'UTF-8'∘⎕UCS 68 194 165 226 141 186 226 140 138 226 151 139 57
D¥⍺⌊○9

    How many integers does each character use?

      'UTF-8'∘⎕UCS¨ 'D¥⍺⌊○9' ⍝ using ]Boxing on
┌──┬───────┬───────────┬───────────┬───────────┬──┐
│68│194 165│226 141 186│226 140 138│226 151 139│57│
└──┴───────┴───────────┴───────────┴───────────┴──┘

The rule is that an integer in the range 128 to 191 (inclusive) continues the character of the previous integer (which may itself be a continuation). With that in mind, write a function that, given a right argument which is a simple integer vector representing valid UTF-8 text, encloses each sequence of integers that represent a single character, like the result of 'UTF-8'∘⎕UCS¨'UTF-8'∘⎕UCS but does not use any system functions (names beginning with ⎕).

Solution: {(~⍵∊127+⍳64)⊂⍵}

The expression 127+⍳64 generates the range 128–191 of continuation values, so the boolean mask (~⍵∊127+⍳64) is 1 exactly at the start of each character, and Partitioned Enclose (⊂) groups the integers accordingly.
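Applied to the example vector above, this reproduces the grouping shown earlier:

      {(~⍵∊127+⍳64)⊂⍵} 68 194 165 226 141 186 226 140 138 226 151 139 57 ⍝ using ]Boxing on
┌──┬───────┬───────────┬───────────┬───────────┬──┐
│68│194 165│226 141 186│226 140 138│226 151 139│57│
└──┴───────┴───────────┴───────────┴───────────┴──┘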


    Write a function that, given a right argument of 2 integers, returns a vector of the integers from the first element of the right argument to the second, inclusively.

    Solution: {(⊃⍵)+(-×-/⍵)×0,⍳|-/⍵}

First, we have to compute the range of the output, which is the absolute value of the difference between the two integers, |-/⍵. From this, we compute the actual sequence, including zero: 0,⍳|-/⍵. (If we had ⎕IO←0, we could have written ⍳|1+-/⍵, but this is the same number of characters.)

This sequence will always be nondecreasing, but we have to make it decreasing if needed, so we multiply it by the opposite of the sign of -/⍵. Finally, we just have to start the sequence at the first element of ⍵.
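For example, in both directions:

      {(⊃⍵)+(-×-/⍵)×0,⍳|-/⍵} 2 5
2 3 4 5
      {(⊃⍵)+(-×-/⍵)×0,⍳|-/⍵} 5 2
5 4 3 2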


    Solution: {∧/(⍳∘≢≡⍋)¨(⊂((⊢⍳⌈/)↑⊢),⍵),⊂⌽((⊢⍳⌈/)↓⊢),⍵}

How do we approach this? First we have to split the vector at the “apex”. The train (⊢⍳⌈/) will return the index (⍳) of the maximum element (⌈/).

      (⊢⍳⌈/)1 3 3 4 5 2 1
5

Combined with Take (↑) and Drop (↓), we build a two-element vector containing both parts, in ascending order (we Reverse (⌽) one of them). Note that we have to Ravel (,) the argument to avoid rank errors in Index Of.

      {(⊂((⊢⍳⌈/)↑⊢),⍵),⊂⌽((⊢⍳⌈/)↓⊢),⍵}1 3 3 4 5 2 1
┌─────────┬───┐
│1 3 3 4 5│1 2│
└─────────┴───┘

    Next, (⍳∘≢≡⍋) on each of the two vectors will test if they are non-decreasing (i.e. if the ranks of all the elements correspond to a simple range from 1 to the size of the vector).
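Putting everything together (the second example fails because 4 2, the reversed part after the apex, is not ascending):

      {∧/(⍳∘≢≡⍋)¨(⊂((⊢⍳⌈/)↑⊢),⍵),⊂⌽((⊢⍳⌈/)↓⊢),⍵} 1 3 3 4 5 2 1
1
      {∧/(⍳∘≢≡⍋)¨(⊂((⊢⍳⌈/)↑⊢),⍵),⊂⌽((⊢⍳⌈/)↓⊢),⍵} 1 5 2 4
0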

    10. Stacking It Up

    Write a function that takes as its right argument a vector of simple arrays of rank 2 or less (scalar, vector, or matrix). Each simple array will consist of either non-negative integers or printable ASCII characters. The function must return a simple character array that displays identically to what {⎕←⍵}¨ displays when applied to the right argument.

    Solution: {↑⊃,/↓¨⍕¨⍵}

The first step is to Format (⍕) everything to get strings. A lot of trial-and-error is always necessary when dealing with nested arrays, and this being about formatting exacerbates the problem.

The next step would be to “stack everything vertically”, so we will need Mix (↑) at some point. However, if we do it immediately we don’t get the correct result:

      {↑⍕¨⍵}(3 3⍴⍳9)(↑'Adam' 'Michael')
1 2 3  
4 5 6  
7 8 9  

Adam   
Michael

Mix is padding with spaces both horizontally (necessary, as we want the output to be a simple array of characters) and vertically (not what we want). We will have to decompose everything line by line, and then mix all the lines together. This is exactly what Split (↓), the dual of Mix, does:

      {↓¨⍕¨⍵}(3 3⍴⍳9)(↑'Adam' 'Michael')(⍳10) '*'(5 5⍴⍳25)
┌───────────────────┬─────────────────┬──────────────────────┬─┬───────────────
│┌─────┬─────┬─────┐│┌───────┬───────┐│┌────────────────────┐│*│┌──────────────
││1 2 3│4 5 6│7 8 9│││Adam   │Michael│││1 2 3 4 5 6 7 8 9 10││ ││ 1  2  3  4  5
│└─────┴─────┴─────┘│└───────┴───────┘│└────────────────────┘│ │└──────────────
└───────────────────┴─────────────────┴──────────────────────┴─┴───────────────

      ─────────────────────────────────────────────────────────────┐
      ┬──────────────┬──────────────┬──────────────┬──────────────┐│
      │ 6  7  8  9 10│11 12 13 14 15│16 17 18 19 20│21 22 23 24 25││
      ┴──────────────┴──────────────┴──────────────┴──────────────┘│
      ─────────────────────────────────────────────────────────────┘

    Next, we clean this up with Ravel (,) and we can Mix to obtain the final result.
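Putting it all together on the same example as above, the final solution stacks everything without spurious blank rows:

      {↑⊃,/↓¨⍕¨⍵}(3 3⍴⍳9)(↑'Adam' 'Michael')
1 2 3  
4 5 6  
7 8 9  
Adam   
Michael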

Another interesting course I found online is Deep Learning in Discrete Optimization, at Johns Hopkins. (It is taught by William Cook, who is the author of In Pursuit of the Traveling Salesman, a nice introduction to the TSP in a readable form.) It contains an interesting overview of deep learning and integer programming, with a focus on connections, and applications to recent research areas in ML (reinforcement learning, attention, etc.).

    Solvers and computational resources

    When you start reading about modelling and algorithms, I recommend you try solving a few problems yourself, either by hand for small instances, or using an existing solver. It will allow you to follow the examples in books, while also practising your modelling skills. You will also get an intuition of what is difficult to model and to solve.

There are many solvers available, both free and commercial, with various capabilities. I recommend you use the fantastic JuMP library for Julia.

Another awesome resource is the NEOS Server. It offers free computing resources for numerical optimization, including all major free and commercial solvers! You can submit jobs on it in a standard format, or interface your favourite programming language with it. The fact that such an amazing resource exists for free, for everyone, is extraordinary. They also have an accompanying book, the NEOS Guide, containing many case studies and descriptions of problem types. The taxonomy may be particularly useful.

    Conclusion

Operations research is a fascinating topic, and it has an abundant literature that makes it very easy to dive into the subject. If you are interested in algorithms, modelling for practical applications, or just wish to understand more, I hope to have given you the first steps: start reading and experimenting.

References

Bertsimas, Dimitris, and John N. Tsitsiklis. 1997. Introduction to Linear Optimization. Belmont, Massachusetts: Athena Scientific. http://www.athenasc.com/linoptbook.html.

Boyd, Stephen. 2004. Convex Optimization. Cambridge, UK; New York: Cambridge University Press.

Chvátal, Vašek. 1983. Linear Programming. New York: W.H. Freeman.

Dantzig, George. 1997. Linear Programming 1: Introduction. New York: Springer. https://www.springer.com/gp/book/9780387948331.

———. 2003. Linear Programming 2: Theory and Extensions. New York: Springer. https://www.springer.com/gp/book/9780387986135.

Kochenderfer, Mykel. 2019. Algorithms for Optimization. Cambridge, Massachusetts: The MIT Press.

Maros, István. 2003. Computational Techniques of the Simplex Method. Boston: Kluwer Academic Publishers.

Nocedal, Jorge. 2006. Numerical Optimization. New York: Springer. https://www.springer.com/gp/book/9780387303031.

Peyré, Gabriel, and Marco Cuturi. 2019. “Computational Optimal Transport.” Foundations and Trends in Machine Learning 11 (5-6): 355–607. https://doi.org/10.1561/2200000073.

Vanderbei, Robert. 2014. Linear Programming: Foundations and Extensions. New York: Springer.

Wentzel, Elena S. 1988. Operations Research: A Methodological Approach. Moscow: Mir Publishers.

Williams, H. Paul. 2013. Model Building in Mathematical Programming. Chichester, West Sussex: Wiley. https://www.wiley.com/en-fr/Model+Building+in+Mathematical+Programming,+5th+Edition-p-9781118443330.

    Two weeks ago, I did a presentation for my colleagues of the paper from Yurochkin et al. (2019), from NeurIPS 2019. It contains an interesting approach to document classification leading to strong performance, and, most importantly, excellent interpretability.

    This paper seems interesting to me because of it uses two methods with strong theoretical guarantees: optimal transport and topic modelling. Optimal transport looks very promising to me in NLP, and has seen a lot of interest in recent years due to advances in approximation algorithms, such as entropy regularisation. It is also quite refreshing to see approaches using solid results in optimisation, compared to purely experimental deep learning methods.

    Introduction and motivation

    The problem of the paper is to measure similarity (i.e. a distance) between pairs of documents, by incorporating semantic similarities (and not only syntactic artefacts), without encountering scalability issues.

To do so, the paper proposes to combine two ingredients:

• word embeddings (e.g. GloVe or word2vec), to capture semantic similarities between words;
• topic modelling methods (e.g. Latent Dirichlet Allocation), to represent semantically-meaningful groups of words.

Background: optimal transport

The essential backbone of the method is the Wasserstein distance, derived from optimal transport theory. Optimal transport is a fascinating and deep subject, so I won’t enter into the details here. For an introduction to the theory and its applications, check out the excellent book from Peyré and Cuturi (2019) (also available on arXiv). There are also very nice posts (in French) by Gabriel Peyré on the CNRS maths blog. Many more resources (including slides for presentations) are available at https://optimaltransport.github.io. For a more complete theoretical treatment of the subject, check out Santambrogio (2015), or, if you’re feeling particularly adventurous, Villani (2009).

For this paper, only a superficial understanding of how the Wasserstein distance works is necessary. Optimal transport is an optimisation technique to lift a distance between points in a given metric space to a distance between probability distributions over this metric space. The historical example is to move piles of dirt around: you know the distance between any two points, and you have piles of dirt lying around. (Optimal transport originated with Monge, and then Kantorovich, both of whom had very clear military applications in mind, either in Revolutionary France or during WWII. A lot of historical examples move cannon balls, or other military equipment, along a front line.) Now, if you want to move these piles to another configuration (fewer piles, say, or a different repartition of dirt a few metres away), you need to find the most efficient way to move them. The total cost you obtain will define a distance between the two configurations of dirt, and is usually called the earth mover’s distance, which is just an instance of the general Wasserstein metric.

More formally, we start with two sets of points \(x = (x_1, x_2, \ldots, x_n)\) and \(y = (y_1, y_2, \ldots, y_m)\), with associated probability distributions \(p\) and \(q\). The 1-Wasserstein distance is then defined as the solution of the linear program \[ W_1(p, q) = \min_{P \in \mathbb{R}_+^{n\times m}} \sum_{i,j} C_{i,j} P_{i,j}, \] where \(C_{i,j}\) is the cost of moving a unit of mass from \(x_i\) to \(y_j\), and the transport plan \(P\) is constrained to have marginals \(p\) and \(q\).

The first one can be precomputed once for all subsequent distances, so its cost does not grow with the number of documents we have to process. The second one only operates on \(\lvert T \rvert\) topics instead of the full vocabulary: the resulting optimisation problem is much smaller! This is great for performance, as it should now be easy to compute all pairwise distances in a large set of documents.

    Another interesting insight is that topics are represented as collections of words (we can keep the top 20 as a visual representations), and documents as collections of topics with weights. Both of these representations are highly interpretable for a human being who wants to understand what’s going on. I think this is one of the strongest aspects of these approaches: both the various representations and the algorithms are fully interpretable. Compared to a deep learning approach, we can make sense of every intermediate step, from the representations of topics to the weights in the optimisation algorithm to compute higher-level distances.

Representation of two documents in topic space, along with how the distance was computed between them. Everything is interpretable: from the documents as collections of topics, to the matchings between topics determining the overall distance between the books (Yurochkin et al. 2019).

    Experiments

The paper is very complete regarding experiments, providing a full evaluation of the method on one particular application: document clustering. They use Latent Dirichlet Allocation to compute topics and GloVe for pretrained word embeddings (Pennington, Socher, and Manning 2014), and Gurobi to solve the optimisation problems. Their code is available on GitHub.

If you want the details, I encourage you to read the full paper: they tested the method on a wide variety of datasets, from very short documents (like Twitter) to long documents with a large vocabulary (books). With a simple \(k\)-NN classification, they establish that HOTT performs best on average, especially on large vocabularies (books, the “gutenberg” dataset). It also has a much better computational performance than alternative methods based on regularisation of the optimal transport problem directly on words. So the hierarchical nature of the approach allows for considerable gains in performance, along with improvements in interpretability.

What’s really interesting in the paper is the sensitivity analysis: they ran experiments with different word embedding methods (word2vec, Mikolov et al. (2013)), and with different parameters for the topic modelling (topic truncation, number of topics, etc.). All of these reveal that changes in hyperparameters do not impact the performance of HOTT significantly. This is extremely important in a field like NLP, where most of the time small variations in approach lead to drastically different results.

    Conclusion

All in all, this paper presents a very interesting approach to computing distances between natural-language documents. It is no secret that I like methods with strong theoretical background (in this case optimisation and optimal transport), guaranteeing stability and benefiting from decades of research in a well-established domain.

    Most importantly, this paper allows for future exploration in document representation with interpretability in mind. This is often added as an afterthought in academic research but is one of the most important topics for the industry, as a system must be understood by end users, often not trained in ML, before being deployed. The notion of topic, and distances as weights, can be understood easily by anyone without significant background in ML or in maths.

Finally, I feel like they did not stop at a simple theoretical argument, but carefully checked the method on real-world datasets, measuring sensitivity to all the arbitrary choices they had to make. Again, from an industry perspective, this makes it possible to implement the new approach quickly and easily, confident that it won’t break unexpectedly, and without requiring extensive testing.

References
Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. “Distributed Representations of Words and Phrases and Their Compositionality.” In Advances in Neural Information Processing Systems 26, 3111–9. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf.

Pennington, Jeffrey, Richard Socher, and Christopher Manning. 2014. “GloVe: Global Vectors for Word Representation.” In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–43. Doha, Qatar: Association for Computational Linguistics. https://doi.org/10.3115/v1/D14-1162.

Peyré, Gabriel, and Marco Cuturi. 2019. “Computational Optimal Transport.” Foundations and Trends in Machine Learning 11 (5-6): 355–607. https://doi.org/10.1561/2200000073.

Santambrogio, Filippo. 2015. Optimal Transport for Applied Mathematicians. Vol. 87. Progress in Nonlinear Differential Equations and Their Applications. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-20828-2.

Villani, Cédric. 2009. Optimal Transport: Old and New. Grundlehren der mathematischen Wissenschaften 338. Berlin: Springer.

Yurochkin, Mikhail, Sebastian Claici, Edward Chien, Farzaneh Mirzazadeh, and Justin M. Solomon. 2019. “Hierarchical Optimal Transport for Document Representation.” In Advances in Neural Information Processing Systems 32, 1599–1609. http://papers.nips.cc/paper/8438-hierarchical-optimal-transport-for-document-representation.pdf.


I find it mildly fascinating that such a straightforward definition of a random matrix can exhibit such non-random properties in its spectrum.

    Simulation

    I ran a quick simulation, thanks to Julia’s great ecosystem for linear algebra and statistical distributions:

using LinearAlgebra
using UnicodePlots

# Sample an n×n matrix from the complex Ginibre ensemble,
# normalised so that the spectrum concentrates on the unit disk.
function ginibre(n)
    return randn((n, n)) * sqrt(1/2n) + im * randn((n, n)) * sqrt(1/2n)
end

v = eigvals(ginibre(2000))

scatterplot(real(v), imag(v), xlim=[-1.5,1.5], ylim=[-1.5,1.5])

I like using UnicodePlots for this kind of quick-and-dirty plot, directly in the terminal. The output is a scatter plot of the eigenvalues, filling the unit disk as predicted by the circular law.

    References

1. Hall, Brian C. 2019. “Eigenvalues of Random Matrices in the General Linear Group in the Large-\(N\) Limit.” Notices of the American Mathematical Society 66, no. 4 (Spring): 568–569. https://www.ams.org/journals/notices/201904/201904FullIssue.pdf
2. Ginibre, Jean. 1965. “Statistical Ensembles of Complex, Quaternion, and Real Matrices.” Journal of Mathematical Physics 6 (3): 440–449. https://doi.org/10.1063/1.1704292

A Markov Decision Process is a tuple \((\mathcal{S}, \mathcal{A}, \mathcal{R}, p)\) where:

• \(\mathcal{S}\) is a set of states,
• \(\mathcal{A}\) is a set of actions,
• \(\mathcal{R} \subset \mathbb{R}\) is a set of rewards, and
• \(p\) is the dynamics of the MDP.

    The function \(p\) represents the probability of transitioning to the state \(s'\) and getting a reward \(r\) when the agent is at state \(s\) and chooses action \(a\).

We will also occasionally use the state-transition probabilities:

\[\begin{align}
p &: \mathcal{S} \times \mathcal{S} \times \mathcal{A} \mapsto [0,1] \\
p(s' \;|\; s, a) &:= \mathbb{P}(S_t=s' \;|\; S_{t-1}=s, A_{t-1}=a) \\
&= \sum_r p(s', r \;|\; s, a).
\end{align}\]

    Rewarding the agent

    The expected reward of a state-action pair is the function

\[\begin{align}
r &: \mathcal{S} \times \mathcal{A} \mapsto \mathbb{R} \\
r(s,a) &:= \mathbb{E}[R_t \;|\; S_{t-1}=s, A_{t-1}=a] \\
&= \sum_r r \sum_{s'} p(s', r \;|\; s, a).
\end{align}\]

    The discounted return is the sum of all future rewards, with a multiplicative factor to give more weights to more immediate rewards: \[ G_t := \sum_{k=t+1}^T \gamma^{k-t-1} R_k, \] where \(T\) can be infinite or \(\gamma\) can be 1, but not both.
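For instance, with a constant reward \(R_k = 1\) and \(\gamma = 0.9\), the discounted return from any time step is \(G_t = \sum_{k=0}^{\infty} 0.9^k = \frac{1}{1-0.9} = 10\).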


    A policy is a way for the agent to choose the next action to perform.

    A policy is a function \(\pi\) defined as

\[\begin{align}
\pi &: \mathcal{A} \times \mathcal{S} \mapsto [0,1] \\
\pi(a \;|\; s) &:= \mathbb{P}(A_t=a \;|\; S_t=s).
\end{align}\]

    In order to compare policies, we need to associate values to them.

    The state-value function of a policy \(\pi\) is

\[\begin{align}
v_{\pi} &: \mathcal{S} \mapsto \mathbb{R} \\
v_{\pi}(s) &:= \text{expected return when starting in $s$ and following $\pi$} \\
&= \mathbb{E}_{\pi}\left[ G_t \;|\; S_t=s\right] \\
&= \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;|\; S_t=s\right]
\end{align}\]

We can also compute the value starting from a state \(s\), taking into account the first action \(a\).

    The action-value function of a policy \(\pi\) is

\[\begin{align}
q_{\pi} &: \mathcal{S} \times \mathcal{A} \mapsto \mathbb{R} \\
q_{\pi}(s,a) &:= \text{expected return when starting from $s$, taking action $a$, and following $\pi$} \\
&= \mathbb{E}_{\pi}\left[ G_t \;|\; S_t=s, A_t=a \right] \\
&= \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;|\; S_t=s, A_t=a\right]
\end{align}\]

    The quest for the optimal policy


    The APL family of languages

    Why APL?

I recently got interested in APL, an array-based programming language. In APL (and derivatives), we try to reason about programs as series of transformations of multi-dimensional arrays. This is exactly the kind of style I like in Haskell and other functional languages, where I also try to use higher-order functions (map, fold, etc.) on lists or arrays. A developer only needs to understand these abstractions once, instead of deconstructing each loop or each recursive function encountered in a program.


Final output, with an \(80\times 80\) random lattice, after 50,000 update steps.

    diff --git a/_site/contact.html b/_site/contact.html index e091178..0a552a2 100644 --- a/_site/contact.html +++ b/_site/contact.html @@ -16,14 +16,18 @@ - + - + - - + + + + + +
    @@ -44,7 +48,6 @@ -
    diff --git a/_site/cv.html b/_site/cv.html index 0c5e356..225857c 100644 --- a/_site/cv.html +++ b/_site/cv.html @@ -16,14 +16,18 @@ - + - + - - + + + + + +
    @@ -44,7 +48,6 @@ -

    (PDF version)

    diff --git a/_site/index.html b/_site/index.html index aed4ec1..8e585fd 100644 --- a/_site/index.html +++ b/_site/index.html @@ -16,14 +16,18 @@ - + - + - - + + + + + +
    @@ -44,7 +48,6 @@ -
    Dimitri Lozeve portrait diff --git a/_site/posts/dyalog-apl-competition-2020-phase-1.html b/_site/posts/dyalog-apl-competition-2020-phase-1.html index a7634da..2d93343 100644 --- a/_site/posts/dyalog-apl-competition-2020-phase-1.html +++ b/_site/posts/dyalog-apl-competition-2020-phase-1.html @@ -16,14 +16,18 @@ - + - + - - + + + + + +
    @@ -46,7 +50,6 @@ -
    @@ -54,7 +57,7 @@
    -

    Table of Contents

    Introduction

    -

    I’ve always been quite fond of APL and its “array-oriented” approach of programmingSee my previous post on simulating the Ising model with APL. It also contains more background on APL.
    +

    I’ve always been quite fond of APL and its “array-oriented” approach of programmingSee my previous post on simulating the Ising model with APL. It also contains more background on APL.

    . Every year, Dyalog (the company behind probably the most popular APL implementation) organises a competition with various challenges in APL.

    The Dyalog APL Problem Solving Competition consists of two phases:

    @@ -76,7 +79,7 @@
  • Phase I consists of 10 short puzzles (similar to what one can find on Project Euler or similar), that can be solved by a one-line APL function.
  • Phase II is a collection of larger problems, that may require longer solutions and a larger context (e.g. reading and writing to files), often in a more applied setting. Problems are often inspired by existing domains, such as AI, bioinformatics, and so on.
  • -

    In 2018, I participated in the competition, entering only Phase ISince I was a student at the time, I was eligible for a prize, and I won $100 for a 10-line submission, which is quite good!
    +

    In 2018, I participated in the competition, entering only Phase ISince I was a student at the time, I was eligible for a prize, and I won $100 for a 10-line submission, which is quite good!

    (my solutions are on GitHub). This year, I entered in both phases. I explain my solutions to Phase I in this post. Another post will contain annotated solutions for Phase II problems.

    The full code for my submission is on GitHub at dlozeve/apl-competition-2020, but everything is reproduced in this post.

    @@ -87,7 +90,7 @@

    If X<0, the second vector contains the last |X elements of Y and the first vector contains the remaining elements.

    Solution: (0>⊣)⌽((⊂↑),(⊂↓))

    -

    There are three nested trains hereTrains are nice to read (even if they are easy to abuse), and generally make for shorter dfns, which is better for Phase I.
    +

    There are three nested trains hereTrains are nice to read (even if they are easy to abuse), and generally make for shorter dfns, which is better for Phase I.

    . The first one, ((⊂↑),(⊂↓)), uses the two functions Take () and Drop () to build a nested array consisting of the two outputs we need. (Take and Drop already have the behaviour needed regarding negative arguments.) However, if the left argument is positive, the two arrays will not be in the correct order. So we need a way to reverse them if X<0.

    The second train (0>⊣) will return 1 if its left argument is positive. From this, we can use Rotate () to correctly order the nested array, in the last train.

    @@ -95,15 +98,15 @@

    UTF-8 encodes Unicode characters using 1-4 integers for each character. Dyalog APL includes a system function, ⎕UCS, that can convert characters into integers and integers into characters. The expression 'UTF-8'∘⎕UCS converts between characters and UTF-8.

    Consider the following:

    - +
          'UTF-8'∘⎕UCS 'D¥⍺⌊○9'
    +68 194 165 226 141 186 226 140 138 226 151 139 57
    +      'UTF-8'∘⎕UCS 68 194 165 226 141 186 226 140 138 226 151 139 57
    +D¥⍺⌊○9

    How many integers does each character use?

    - +
          'UTF-8'∘⎕UCS¨ 'D¥⍺⌊○9' ⍝ using ]Boxing on
    +┌──┬───────┬───────────┬───────────┬───────────┬──┐
    +│68│194 165│226 141 186│226 140 138│226 151 139│57│
    +└──┴───────┴───────────┴───────────┴───────────┴──┘      

    The rule is that an integer in the range 128 to 191 (inclusive) continues the character of the previous integer (which may itself be a continuation). With that in mind, write a function that, given a right argument which is a simple integer vector representing valid UTF-8 text, encloses each sequence of integers that represent a single character, like the result of 'UTF-8'∘⎕UCS¨'UTF-8'∘⎕UCS but does not use any system functions (names beginning with )

    Solution: {(~⍵∊127+⍳64)⊂⍵}

    @@ -132,7 +135,7 @@

    Write a function that, given a right argument of 2 integers, returns a vector of the integers from the first element of the right argument to the second, inclusively.

    Solution: {(⊃⍵)+(-×-/⍵)×0,⍳|-/⍵}

    -

    First, we have to compute the range of the output, which is the absolute value of the difference between the two integers |-/⍵. From this, we compute the actual sequence, including zeroIf we had ⎕IO←0, we could have written ⍳|1+-/⍵, but this is the same number of characters.
    +

    First, we have to compute the range of the output, which is the absolute value of the difference between the two integers |-/⍵. From this, we compute the actual sequence, including zeroIf we had ⎕IO←0, we could have written ⍳|1+-/⍵, but this is the same number of characters.

    : 0,⍳|-/⍵.

    This sequence will always be nondecreasing, but we have to make it decreasing if needed, so we multiply it by the opposite of the sign of -/⍵. Finally, we just have to start the sequence at the first element of .


    Solution: {∧/(⍳∘≢≡⍋)¨(⊂((⊢⍳⌈/)↑⊢),⍵),⊂⌽((⊢⍳⌈/)↓⊢),⍵}

How do we approach this? First we have to split the vector at the “apex”. The train (⊢⍳⌈/) will return the index of (⍳) the maximum element (⌈/).

      (⊢⍳⌈/)1 3 3 4 5 2 1
5

Combined with Take (↑) and Drop (↓), we build a two-element vector containing both parts, in ascending order (we Reverse (⌽) one of them). Note that we have to Ravel (,) the argument to avoid rank errors in Index Of.

      {(⊂((⊢⍳⌈/)↑⊢),⍵),⊂⌽((⊢⍳⌈/)↓⊢),⍵}1 3 3 4 5 2 1
┌─────────┬───┐
│1 3 3 4 5│1 2│
└─────────┴───┘

    Next, (⍳∘≢≡⍋) on each of the two vectors will test if they are non-decreasing (i.e. if the ranks of all the elements correspond to a simple range from 1 to the size of the vector).
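For example, on the first part from above, plus a decreasing counter-example:

      (⍳∘≢≡⍋)1 3 3 4 5
1
      (⍳∘≢≡⍋)2 1
0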

    10. Stacking It Up

    Write a function that takes as its right argument a vector of simple arrays of rank 2 or less (scalar, vector, or matrix). Each simple array will consist of either non-negative integers or printable ASCII characters. The function must return a simple character array that displays identically to what {⎕←⍵}¨ displays when applied to the right argument.

    Solution: {↑⊃,/↓¨⍕¨⍵}

The first step is to Format (⍕) everything to get strings. A lot of trial-and-error is always necessary when dealing with nested arrays, and this being about formatting exacerbates the problem.

The next step would be to “stack everything vertically”, so we will need Mix (↑) at some point. However, if we do it immediately we don’t get the correct result:

      {↑⍕¨⍵}(3 3⍴⍳9)(↑'Adam' 'Michael')
1 2 3  
4 5 6  
7 8 9  

Adam   
Michael

Mix is padding with spaces both horizontally (necessary as we want the output to be a simple array of characters) and vertically (not what we want). We will have to decompose everything line by line, and then mix all the lines together. This is exactly what Split (↓), the dual of Mix, does:

      {↓¨⍕¨⍵}(3 3⍴⍳9)(↑'Adam' 'Michael')(⍳10) '*'(5 5⍴⍳25)
┌───────────────────┬─────────────────┬──────────────────────┬─┬───────────────
│┌─────┬─────┬─────┐│┌───────┬───────┐│┌────────────────────┐│*│┌──────────────
││1 2 3│4 5 6│7 8 9│││Adam   │Michael│││1 2 3 4 5 6 7 8 9 10││ ││ 1  2  3  4  5
│└─────┴─────┴─────┘│└───────┴───────┘│└────────────────────┘│ │└──────────────
└───────────────────┴─────────────────┴──────────────────────┴─┴───────────────

      ─────────────────────────────────────────────────────────────┐
      ┬──────────────┬──────────────┬──────────────┬──────────────┐│
      │ 6  7  8  9 10│11 12 13 14 15│16 17 18 19 20│21 22 23 24 25││
      ┴──────────────┴──────────────┴──────────────┴──────────────┘│
      ─────────────────────────────────────────────────────────────┘

    Next, we clean this up with Ravel (,) and we can Mix to obtain the final result.
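Putting everything together on the earlier two-array example:

      {↑⊃,/↓¨⍕¨⍵}(3 3⍴⍳9)(↑'Adam' 'Michael')
1 2 3  
4 5 6  
7 8 9  
Adam   
Michael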

diff --git a/_site/posts/dyalog-apl-competition-2020-phase-2.html b/_site/posts/dyalog-apl-competition-2020-phase-2.html index 4f5b75a..2f5bfc7 100644

    Table of Contents

    Introduction

    After Phase I, here are my solutions to Phase II problems. The full code is included in the post, but everything is also available on GitHub.

A PDF of the problem descriptions is available on the competition website, or directly from my GitHub repo.

    The submission guidelines gave a template where everything is defined in a Contest2020.Problems Namespace. I kept the default values for ⎕IO and ⎕ML because the problems were not particularly easier with ⎕IO←0.

:Namespace Contest2020

 :Namespace Problems
  (⎕IO ⎕ML ⎕WX)←1 1 3

    Problem 1 – Take a Dive

∇ score←dd DiveScore scores
  :If 7=≢scores
   scores←scores[¯2↓2↓⍋scores]
  :ElseIf 5=≢scores
   scores←scores[¯1↓1↓⍋scores]
  :Else
   scores←scores
  :EndIf
  score←2(⍎⍕)dd×+/scores
∇

This is a very straightforward implementation of the algorithm described in the problem description. I decided to switch explicitly on the size of the input vector because I feel it is more natural. For the cases with 5 or 7 judges, we use Drop (↓) to remove the lowest and highest scores.

    At the end, we sum up the scores with +/ and multiply them by dd. The last operation, 2(⍎⍕), is a train using Format (Dyadic) to round to 2 decimal places, and Execute to get actual numbers and not strings.
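For example, with a degree of difficulty of 3 and seven judges (scores made up for the illustration; the middle three scores 7 7.5 7.5 sum to 22):

      3 DiveScore 7 7.5 8 8 7.5 7 7
66
      2.5 DiveScore 6.5 7 7 7.5 7
52.5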

    Problem 2 – Another Step in the Proper Direction

∇ steps←{p}Steps fromTo;segments;width
  width←|-/fromTo
  :If 0=⎕NC'p' ⍝ No left argument: same as Problem 5 of Phase I
   segments←0,⍳width
  :ElseIf p<0 ⍝ -⌊p is the number of equally-sized steps to take
   segments←(-⌊p){0,⍵×⍺÷⍨⍳⍺}width
  :ElseIf p>0 ⍝ p is the step size
   segments←p{⍵⌊⍺×0,⍳⌈⍵÷⍺}width
  :ElseIf p=0 ⍝ As if we took zero step
   segments←0
  :EndIf
  ⍝ Take into account the start point and the direction.
  steps←fromTo{(⊃⍺)+(-×-/⍺)×⍵}segments
∇

    This is an extension to Problem 5 of Phase I. In each case, we compute the “segments”, i.e., the steps starting from 0. In a last step, common to all cases, we add the correct starting point and correct the direction if need be.

To compute equally-sized steps, we first divide the segment \([0, 1]\) into p equal segments with (⍳p)÷p. This subdivision can then be multiplied by the width to obtain the required segments.

When p is the step size, we just divide the width by the step size (rounded to the next largest integer) to get the required number of segments. If the last segment is too large, we “crop” it to the width with Minimum (⌊).
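A few illustrative calls (inputs made up), covering the default, negative, and positive cases:

      Steps 2 5
2 3 4 5
      ¯2 Steps 2 5
2 3.5 5
      1.5 Steps 5 2
5 3.5 2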

    Problem 3 – Past Tasks Blast

∇ urls←PastTasks url;r;paths
  r←HttpCommand.Get url
  paths←('[a-zA-Z0-9_/]+\.pdf'⎕S'&')r.Data
  urls←('https://www.dyalog.com/'∘,)¨paths
∇

    I decided to use HttpCommand for this task, since it is simply one ]load HttpCommand away and should be platform-independent.

Parsing XML is not something I consider “fun” in the best of cases, and I feel like APL is not the best language to do this kind of thing. Given how simple the task is, I just decided to find the relevant bits with a regular expression, using Search (⎕S).

    After finding all the strings vaguely resembling a PDF file name (only alphanumeric characters and underscores, with a .pdf extension), I just concatenate them to the base URL of the Dyalog domain.
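For instance, applying just the regex to a made-up HTML fragment (the file name is hypothetical; output shown with ]Boxing on):

      ('[a-zA-Z0-9_/]+\.pdf'⎕S'&')'<a href="media/problems2020.pdf">'
┌──────────────────────┐
│media/problems2020.pdf│
└──────────────────────┘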

    Problem 4 – Bioinformatics

    The first task can be solved by decomposing it into several functions.

⍝ Test if a DNA string is a reverse palindrome.
isrevp←{⍵≡⌽'TAGC'['ATCG'⍳⍵]}

First, we compute the complement of a DNA string (using simple indexing) and test if its Reverse (⌽) is equal to the original string.
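For example, GCATGC is a reverse palindrome, while AAAA is not:

      isrevp 'GCATGC'
1
      isrevp 'AAAA'
0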

⍝ Generate all subarrays (position, length) pairs, for 4 ≤ length ≤ 12.
subarrays←{⊃,/(⍳⍵),¨¨3↓¨⍳¨12⌊1+⍵-⍳⍵}

We first compute all the possible lengths for each starting point. For instance, the last element cannot have any (position, length) pair associated with it, because there are no three elements following it. So we crop the possible lengths to \([4, 12]\). For instance, for an array of size 10:

      {3↓¨⍳¨12⌊1+⍵-⍳⍵}10
┌──────────────┬───────────┬─────────┬───────┬─────┬───┬─┬┬┬┐
│4 5 6 7 8 9 10│4 5 6 7 8 9│4 5 6 7 8│4 5 6 7│4 5 6│4 5│4││││
└──────────────┴───────────┴─────────┴───────┴─────┴───┴─┴┴┴┘

    Then, we just add the corresponding starting position to each length (1 for the first block, 2 for the second, and so on). Finally, we flatten everything.
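The first few (position, length) pairs for an array of size 10 look like this (with ]Boxing on):

      5↑subarrays 10
┌───┬───┬───┬───┬───┐
│1 4│1 5│1 6│1 7│1 8│
└───┴───┴───┴───┴───┘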

∇ r←revp dna;positions
  positions←subarrays⍴dna
  ⍝ Filter subarrays which are reverse palindromes.
  r←↑({isrevp dna[¯1+⍵[1]+⍳⍵[2]]}¨positions)/positions
∇

    For each possible (position, length) pair, we get the corresponding DNA substring with dna[¯1+⍵[1]+⍳⍵[2]] (adding ¯1 is necessary because ⎕IO←1). We test if this substring is a reverse palindrome using isrevp above. Replicate (/) then selects only the (position, length) pairs for which the substring is a reverse palindrome.

    The second task is just about counting the number of subsets modulo 1,000,000. So we just need to compute \(2^n \mod 1000000\) for any positive integer \(n\leq1000\).

    sset←{((1E6|2∘×)⍣⍵)1}

Since we cannot just compute \(2^n\) directly and take the remainder, we use modular arithmetic to stay mod 1,000,000 during the whole computation. The dfn (1E6|2∘×) doubles its argument mod 1,000,000. So we just apply this function \(n\) times using the Power operator (⍣), with an initial value of 1.
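For example (since \(2^{20} = 1048576\), only the last six digits remain):

      sset¨ 3 10 20
8 1024 48576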

    Problem 5 – Future and Present Value

First solution: ((1+⊢)⊥⊣) computes the total return for a vector of amounts ⍺ and a vector of rates ⍵. It is applied to every prefix subarray of amounts and rates to get all intermediate values. However, this has quadratic complexity.

rr←(,\⊣)((1+⊢)⊥⊣)¨(,\⊢)

Second solution: We want to be able to use the recurrence relation (recur) and scan through the vectors of amounts and rates, accumulating the total value at every time step. However, APL evaluation is right-associative, so a simple Scan (recur\amounts,¨values) would not give the correct result, since recur is not associative and we need to evaluate it left-to-right. (Besides, Scan would have quadratic complexity here anyway, so it would not bring any benefit over the previous solution.) What we need is something akin to Haskell’s scanl function, which evaluates left to right in \(O(n)\) time. (There is an interesting StackOverflow answer explaining the behaviour of Scan and comparing it to Haskell’s scanl.) This is what we do here, accumulating values from left to right. (This is inspired by dfns.ascan, although heavily simplified.)

    rr←{recur←{⍵[1]+⍺×1+⍵[2]} ⋄ 1↓⌽⊃{(⊂(⊃⍵)recur⍺),⍵}/⌽⍺,¨⍵}
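For instance, with three deposits of 100 at a constant 10% rate (numbers made up; both versions of rr agree):

      100 100 100 rr 0.1 0.1 0.1
100 210 331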

    For the second task, there is an explicit formula for cashflow calculations, so we can just apply it.

    pv←{+/⍺÷×\1+⍵}
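And the present value of the same cashflows, discounted at 10% (the exact display depends on ⎕PP):

      100 100 100 pv 0.1 0.1 0.1
248.6851991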

    Problem 6 – Merge

∇ text←templateFile Merge jsonFile;template;ns
  template←⊃⎕NGET templateFile 1
  ns←⎕JSON⊃⎕NGET jsonFile
  ⍝ We use a simple regex search and replace on the
  ⍝ template.
  text←↑('@[a-zA-Z]*@'⎕R{ns getval ¯1↓1↓⍵.Match})template
∇

We first read the template and the JSON values from their files. The ⎕NGET function reads simple text files, and ⎕JSON extracts the key-value pairs as a namespace.

Assuming all variable names contain only letters, we use the regex @[a-zA-Z]*@ to match variable names enclosed between @ symbols. The function getval then returns the appropriate value, and we can replace the variable name in the template.

∇ val←ns getval var
  :If ''≡var ⍝ literal '@'
   val←'@'
  :ElseIf (⊂var)∊ns.⎕NL ¯2
   val←⍕ns⍎var
  :Else
   val←'???'
  :EndIf
∇

    This function takes the namespace matching the variable names to their respective values, and the name of the variable.

• If the variable name is empty, we matched the string @@, which corresponds to a literal @.
• If the variable name is defined in the namespace, we look up its value there and format it as text (⍕).
• Otherwise, we have an unknown variable, so we replace it with ???.
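As a sketch of how this behaves (the file names and contents here are made up): if template.txt contains Dear @title@ @lastname@, and values.json contains {"title": "Mr", "lastname": "Maxwell"}, then:

      'template.txt' Merge 'values.json'
Dear Mr Maxwell,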

    Problem 7 – UPC

    CheckDigit←{10|-⍵+.×11⍴3 1}

    The check digit satisfies the equation \[ 3 x_{1}+x_{2}+3 x_{3}+x_{4}+3 x_{5}+x_{6}+3 x_{7}+x_{8}+3 x_{9}+x_{10}+3 x_{11}+x_{12} \equiv 0 \bmod 10, \] therefore, \[ x_{12} \equiv -(3 x_{1}+x_{2}+3 x_{3}+x_{4}+3 x_{5}+x_{6}+3 x_{7}+x_{8}+3 x_{9}+x_{10}+3 x_{11}) \bmod 10. \]

    Translated to APL, we just take the dot product between the first 11 digits of the barcode with 11⍴3 1, negate it, and take the remainder by 10.
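For example, with the well-known UPC-A code 036000291452, the first eleven digits give back the final 2:

      CheckDigit 0 3 6 0 0 0 2 9 1 4 5
2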

⍝ Left and right representations of digits. Decoding
⍝ the binary representation from decimal is more
⍝ compact than writing everything explicitly.
lrepr←⍉(7⍴2)⊤13 25 19 61 35 49 47 59 55 11
rrepr←~¨lrepr

    For the second task, the first thing we need to do is save the representation of digits. To save space, I did not encode the binary representation explicitly, instead using a decimal representation that I then decode in base 2. The right representation is just the bitwise negation.

∇ bits←WriteUPC digits;left;right
  :If (11=≢digits)∧∧/digits∊0,⍳9
   left←,lrepr[1+6↑digits;]
   right←,rrepr[1+6↓digits,CheckDigit digits;]
   bits←1 0 1,left,0 1 0 1 0,right,1 0 1
  :Else
   bits←¯1
  :EndIf
∇

    First of all, if the vector digits does not have exactly 11 elements, all between 0 and 9, it is an error and we return ¯1.

    Then, we take the first 6 digits and encode them with lrepr, and the last 5 digits plus the check digit encoded with rrepr. In each case, adding 1 is necessary because ⎕IO←1. We return the final bit array with the required beginning, middle, and end guard patterns.

∇ digits←ReadUPC bits
  :If 95≠⍴bits ⍝ incorrect number of bits
   digits←¯1
  :Else
   ⍝ Test if the barcode was scanned right-to-left.
   :If 0=2|+/bits[3+⍳7]
    bits←⌽bits
   :EndIf
   digits←({¯1+lrepr⍳⍵}¨(7/⍳6)⊆42↑3↓bits),{¯1+rrepr⍳⍵}¨(7/⍳6)⊆¯42↑¯3↓bits
   :If ~∧/digits∊0,⍳9 ⍝ incorrect parity
    digits←¯1
   :ElseIf (⊃⌽digits)≠CheckDigit ¯1↓digits ⍝ incorrect check digit
    digits←¯1
   :EndIf
  :EndIf
∇
    • If we don’t have the correct number of bits, we return ¯1.
• We test the first digit for its parity, to determine if it’s actually a left representation. If it’s not, we reverse the bit array.
• We then drop the guard patterns, split the remaining bits into groups of 7, and look up each group in lrepr (for the left half) and rrepr (for the right half) to decode the digits.
    • Final checks for the range of the digits (i.e., if the representations could not be found in the lrepr and rrepr vectors), and for the check digit.
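A round trip on the example code from above checks out, returning the eleven digits plus the computed check digit:

      ReadUPC WriteUPC 0 3 6 0 0 0 2 9 1 4 5
0 3 6 0 0 0 2 9 1 4 5 2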

    Problem 8 – Balancing the Scales

∇ parts←Balance nums;subsets;partitions
  ⍝ This is a brute force solution, running in
  ⍝ exponential time. We generate all the possible
  ⍝ partitions, filter out those which are not
  ⍝ balanced, and return the first matching one. There
  ⍝ are more advanced approaches running in
  ⍝ pseudo-polynomial time (based on dynamic
  ⍝ programming, see the "Partition problem" Wikipedia
  ⍝ page), but they are not warranted here, as the
  ⍝ input size remains fairly small.

  ⍝ Generate all partitions of a vector of a given
  ⍝ size, as binary mask vectors.
  subsets←{1↓2⊥⍣¯1⍳2*⍵}
  ⍝ Keep only the subsets whose sum is exactly
  ⍝ (+/nums)÷2.
  partitions←nums{((2÷⍨+/⍺)=⍺+.×⍵)/⍵}subsets⍴nums
  :If 0=≢,partitions
   ⍝ If no partition satisfies the above
   ⍝ criterion, we return ⍬.
   parts←⍬
  :Else
   ⍝ Otherwise, we return the first possible
   ⍝ partition.
   parts←nums{((⊂,(⊂~))⊃↓⍉⍵)/¨2⍴⊂⍺}partitions
  :EndIf
∇
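A quick illustration on a made-up input (with ]Boxing on): 1 2 3 4 splits into two halves that both sum to 5.

      Balance 1 2 3 4
┌───┬───┐
│2 3│1 4│
└───┴───┘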

    Problem 9 – Upwardly Mobile

    This is the only problem that I didn’t complete. It required parsing the files containing the graphical representations of the trees, which was needlessly complex and, quite frankly, hard and boring with a language like APL.

However, the next part is interesting: once we have a matrix of coefficients representing the relationships between the weights, we can solve the system of equations. Matrix Divide (⌹) will find one solution to the system. Since the system is overdetermined, we fix A=1 to find one possible solution. Since we want integer weights, the solution we find is smaller than the one we want, and may contain fractional weights. So we multiply everything by the Lowest Common Multiple (∧) to get the smallest integer weights.

∇ weights←Weights filename;mobile;branches;mat
  ⍝ Put your code and comments below here

  ⍝ Parse the mobile input file.
  mobile←↑⊃⎕NGET filename 1
  branches←⍸mobile∊'┌┴┐'
  ⍝ TODO: Build the matrix of coefficients mat.

  ⍝ Solve the system of equations (arbitrarily setting
  ⍝ the first variable at 1 because the system is
  ⍝ overdetermined), then multiply the coefficients by
  ⍝ their least common multiple to get the smallest
  ⍝ integer weights.
  weights←((1∘,)×(∧/÷))mat[;1]⌹1↓[2]mat
∇

 :EndNamespace
:EndNamespace
diff --git a/_site/posts/ginibre-ensemble.html b/_site/posts/ginibre-ensemble.html index b771923..ba9870c 100644

I find it mildly fascinating that such a straightforward definition of a random matrix can exhibit such non-random properties in its spectrum.

    Simulation

    I ran a quick simulation, thanks to Julia’s great ecosystem for linear algebra and statistical distributions:

using LinearAlgebra
using UnicodePlots

function ginibre(n)
    return randn((n, n)) * sqrt(1/2n) + im * randn((n, n)) * sqrt(1/2n)
end

v = eigvals(ginibre(2000))

scatterplot(real(v), imag(v), xlim=[-1.5,1.5], ylim=[-1.5,1.5])

I like using UnicodePlots for this kind of quick-and-dirty plot, directly in the terminal. Here is the output:

    References

1. Hall, Brian C. 2019. “Eigenvalues of Random Matrices in the General Linear Group in the Large-\(N\) Limit.” Notices of the American Mathematical Society 66, no. 4 (Spring): 568-569. https://www.ams.org/journals/notices/201904/201904FullIssue.pdf
2. Ginibre, Jean. 1965. “Statistical Ensembles of Complex, Quaternion, and Real Matrices.” Journal of Mathematical Physics 6 (3): 440-449. https://doi.org/10.1063/1.1704292
diff --git a/_site/posts/hierarchical-optimal-transport-for-document-classification.html b/_site/posts/hierarchical-optimal-transport-for-document-classification.html index 607dce8..66af7ff 100644

    Table of Contents


    Two weeks ago, I did a presentation for my colleagues of the paper from Yurochkin et al. (2019), from NeurIPS 2019. It contains an interesting approach to document classification leading to strong performance, and, most importantly, excellent interpretability.

This paper seems interesting to me because it uses two methods with strong theoretical guarantees: optimal transport and topic modelling. Optimal transport looks very promising to me in NLP, and has seen a lot of interest in recent years due to advances in approximation algorithms, such as entropy regularisation. It is also quite refreshing to see approaches using solid results in optimisation, compared to purely experimental deep learning methods.

    Introduction and motivation

    The problem of the paper is to measure similarity (i.e. a distance) between pairs of documents, by incorporating semantic similarities (and not only syntactic artefacts), without encountering scalability issues.

To do this, it combines two kinds of representations:
  • word embeddings (e.g. GloVe or word2vec), to capture semantic similarities between individual words;
  • topic modelling methods (e.g. Latent Dirichlet Allocation), to represent semantically-meaningful groups of words.

Background: optimal transport

The essential backbone of the method is the Wasserstein distance, derived from optimal transport theory. Optimal transport is a fascinating and deep subject, so I won’t enter into the details here. For an introduction to the theory and its applications, check out the excellent book by Peyré and Cuturi (2019) (also available on arXiv). There are also very nice posts (in French) by Gabriel Peyré on the CNRS maths blog. Many more resources (including slides for presentations) are available at https://optimaltransport.github.io. For a more complete theoretical treatment of the subject, check out Santambrogio (2015), or, if you’re feeling particularly adventurous, Villani (2009).

For this paper, only a superficial understanding of how the Wasserstein distance works is necessary. Optimal transport is an optimisation technique to lift a distance between points in a given metric space, to a distance between probability distributions over this metric space. The historical example is to move piles of dirt around: you know the distance between any two points, and you have piles of dirt lying around. (Optimal transport originated with Monge, and then Kantorovich, both of whom had very clear military applications in mind, either in Revolutionary France or during WWII. A lot of historical examples move cannon balls, or other military equipment, along a front line.) Now, if you want to move these piles to another configuration (fewer piles, say, or a different repartition of dirt a few metres away), you need to find the most efficient way to move them. The total cost you obtain will define a distance between the two configurations of dirt, and is usually called the earth mover’s distance, which is just an instance of the general Wasserstein metric.

More formally, we start with two sets of points \(x = (x_1, x_2, \ldots, x_n)\) and \(y = (y_1, y_2, \ldots, y_m)\), along with probability distributions \(p\) and \(q\) over these points. Given a cost matrix \(C\), where \(C_{i,j}\) is the cost of moving a unit of mass from \(x_i\) to \(y_j\) (for instance, the distance between the two points), the 1-Wasserstein distance is the solution of the linear programme \[ W_1(p, q) = \min_{P \in \mathbb{R}_+^{n\times m}} \sum_{i,j} C_{i,j} P_{i,j}, \] subject to the constraints that the transport plan \(P\) has marginals \(p\) and \(q\): \(\sum_j P_{i,j} = p_i\) and \(\sum_i P_{i,j} = q_j\).

    The first one can be precomputed once for all subsequent distances, so it is invariable in the number of documents we have to process. The second one only operates on \(\lvert T \rvert\) topics instead of the full vocabulary: the resulting optimisation problem is much smaller! This is great for performance, as it should be easy now to compute all pairwise distances in a large set of documents.
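To make the hierarchy explicit, here is the resulting distance written out (roughly, as I remember it from the paper; \(\bar d^1\) and \(\bar d^2\) are the documents’ distributions over the \(\lvert T \rvert\) topics, and \(\delta_{t_k}\) is a Dirac mass on topic \(t_k\)): \[ \mathrm{HOTT}(d^1, d^2) = W_1\left( \sum_{k=1}^{\lvert T \rvert} \bar d^1_k\, \delta_{t_k},\; \sum_{k=1}^{\lvert T \rvert} \bar d^2_k\, \delta_{t_k} \right), \] where the ground distance between two topics \(t_i\) and \(t_j\) is itself the word-level Wasserstein distance \(W_1(t_i, t_j)\), computed from distances between word embeddings.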

    Another interesting insight is that topics are represented as collections of words (we can keep the top 20 as a visual representations), and documents as collections of topics with weights. Both of these representations are highly interpretable for a human being who wants to understand what’s going on. I think this is one of the strongest aspects of these approaches: both the various representations and the algorithms are fully interpretable. Compared to a deep learning approach, we can make sense of every intermediate step, from the representations of topics to the weights in the optimisation algorithm to compute higher-level distances.

Representation of two documents in topic space, along with how the distance was computed between them. Everything is interpretable: from the documents as collections of topics, to the matchings between topics determining the overall distance between the books (Yurochkin et al. 2019).

    Experiments

The paper is very complete regarding experiments, providing a full evaluation of the method on one particular application: document clustering. They use Latent Dirichlet Allocation to compute topics and GloVe for pretrained word embeddings (Pennington, Socher, and Manning 2014), and Gurobi to solve the optimisation problems. Their code is available on GitHub.

If you want the details, I encourage you to read the full paper: they tested the methods on a wide variety of datasets, including datasets with very short documents (like Twitter), and long documents with a large vocabulary (books). With a simple \(k\)-NN classification, they establish that HOTT performs best on average, especially on large vocabularies (books, the “gutenberg” dataset). It also has a much better computational performance than alternative methods based on regularisation of the optimal transport problem directly on words. So the hierarchical nature of the approach allows for considerable gains in performance, along with improvements in interpretability.

What’s really interesting in the paper is the sensitivity analysis: they ran experiments with different word embedding methods (word2vec, (Mikolov et al. 2013)), and with different parameters for the topic modelling (topic truncation, number of topics, etc.). All of these reveal that changes in hyperparameters do not impact the performance of HOTT significantly. This is extremely important in a field like NLP, where most of the time small variations in approach lead to drastically different results.

    Conclusion

All in all, this paper presents a very interesting approach to computing distances between natural-language documents. It is no secret that I like methods with a strong theoretical background (in this case optimisation and optimal transport), guaranteeing stability and benefiting from decades of research in a well-established domain.

    Most importantly, this paper allows for future exploration in document representation with interpretability in mind. This is often added as an afterthought in academic research but is one of the most important topics for the industry, as a system must be understood by end users, often not trained in ML, before being deployed. The notion of topic, and distances as weights, can be understood easily by anyone without significant background in ML or in maths.

    Finally, I feel like they did not stop at a simple theoretical argument, but carefully checked on real-world datasets, measuring sensitivity to all the arbitrary choices they had to take. Again, from an industry perspective, this allows to implement the new approach quickly and easily, being confident that it won’t break unexpectedly without extensive testing.

References

Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. “Distributed Representations of Words and Phrases and Their Compositionality.” In Advances in Neural Information Processing Systems 26, 3111–9. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf.

Pennington, Jeffrey, Richard Socher, and Christopher Manning. 2014. “Glove: Global Vectors for Word Representation.” In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–43. Doha, Qatar: Association for Computational Linguistics. https://doi.org/10.3115/v1/D14-1162.

Peyré, Gabriel, and Marco Cuturi. 2019. “Computational Optimal Transport.” Foundations and Trends in Machine Learning 11 (5-6): 355–607. https://doi.org/10.1561/2200000073.

Santambrogio, Filippo. 2015. Optimal Transport for Applied Mathematicians. Vol. 87. Progress in Nonlinear Differential Equations and Their Applications. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-20828-2.

Villani, Cédric. 2009. Optimal Transport: Old and New. Grundlehren Der Mathematischen Wissenschaften 338. Berlin: Springer.

Yurochkin, Mikhail, Sebastian Claici, Edward Chien, Farzaneh Mirzazadeh, and Justin M Solomon. 2019. “Hierarchical Optimal Transport for Document Representation.” In Advances in Neural Information Processing Systems 32, 1599–1609. http://papers.nips.cc/paper/8438-hierarchical-optimal-transport-for-document-representation.pdf.

diff --git a/_site/posts/iclr-2020-notes.html b/_site/posts/iclr-2020-notes.html index 00fc8df..8d89179 100644

    Table of Contents


      ICLR is one of the most important conferences in machine learning, and as such, I was very excited to have the opportunity to volunteer and attend the first fully-virtual edition of the event. The whole content of the conference has been made publicly available, only a few days after the end of the event!

I would like to thank the organizing committee for this incredible event, and the possibility to volunteer to help other participants. (To better organize the event, and help people navigate the various online tools, they brought in 500(!) volunteers, waived our registration fees, and asked us to do simple load-testing and tech support. This was a very generous offer, and felt very rewarding for us, as we could attend the conference, and give back to the organization a little bit.)

      The many volunteers, the online-only nature of the event, and the low registration fees also allowed for what felt like a very diverse, inclusive event. Many graduate students and researchers from industry (like me), who do not generally have the time or the resources to travel to conferences like this, were able to attend, and make the exchanges richer.

      In this post, I will try to give my impressions on the event, the speakers, and the workshops that I could attend. I will do a quick recap of the most interesting papers I saw in a future post.

      The Format of the Virtual Conference

      As a result of global travel restrictions, the conference was made fully-virtual. It was supposed to take place in Addis Ababa, Ethiopia, which is great for people who are often the target of restrictive visa policies in Northern American countries.

The thing I appreciated most about the conference format was its emphasis on asynchronous communication. Given how little time they had to plan the conference, they could have made all poster presentations via video-conference and called it a day. Instead, each poster had to record a 5-minute video summarising their research. (The videos are streamed using SlidesLive, which is a great solution for synchronising videos and slides. It is very comfortable to navigate through the slides, synchronising the video to the slides and vice-versa. As a result, SlidesLive also has a very nice library of talks, including major conferences. This is much better than browsing YouTube randomly.) Alongside each presentation, there was a dedicated Rocket.Chat channel where anyone could ask a question to the authors, or just show their appreciation for the work. (Rocket.Chat seems to be an open-source alternative to Slack. Overall, the experience was great, and I appreciate the efforts of the organizers to use open source software instead of proprietary applications. I hope other conferences will do the same, and perhaps even avoid Zoom, because of recent privacy concerns; maybe try Jitsi?) This was a fantastic idea as it allowed any participant to interact with papers and authors at any time they please, which is especially important in a setting where people were spread all over the globe.

      There were also Zoom session where authors were available for direct, face-to-face discussions, allowing for more traditional conversations. But asking questions on the channel had also the advantage of keeping a track of all questions that were asked by other people. As such, I quickly acquired the habit of watching the video, looking at the chat to see the previous discussions (even if they happened in the middle of the night in my timezone!), and then skimming the paper or asking questions myself.


      Beyond ‘tabula rasa’ in reinforcement learning: agents that remember, adapt, and generalize

A lot of pretty advanced talks about RL. The general theme was meta-learning, aka “learning to learn”. This is a very active area of research, which goes way beyond classical RL theory and offers many interesting avenues to adjacent fields (both inside ML and outside, especially cognitive science). The first talk, by Martha White, about inductive biases, was a very interesting and approachable introduction to the problems and challenges of the field. There was also a panel with Jürgen Schmidhuber. We hear a lot about him from the various controversies, but it’s nice to see him talking about research and future developments in RL.

      Causal Learning For Decision Making

Ever since I read Judea Pearl’s The Book of Why on causality, I have been interested in how we can incorporate causal reasoning in machine learning. This is a complex topic, and I’m not sure yet that it is a complete revolution as Judea Pearl likes to portray it, but it nevertheless introduces a lot of new fascinating ideas. Yoshua Bengio gave an interesting talk on causal priors for deep learning, even though it was very similar to his keynote talk. (You can find it at 4:45:20 in the livestream of the workshop.)

      Bridging AI and Cognitive Science

diff --git a/_site/posts/ising-apl.html b/_site/posts/ising-apl.html index 7d2cdf0..847187c 100644

    Table of Contents

    • The APL family of languages

      The APL family of languages

      Why APL?

      I recently got interested in APL, an array-based programming language. In APL (and derivatives), we try to reason about programs as series of transformations of multi-dimensional arrays. This is exactly the kind of style I like in Haskell and other functional languages, where I also try to use higher-order functions (map, fold, etc) on lists or arrays. A developer only needs to understand these abstractions once, instead of deconstructing each loop or each recursive function encountered in a program.

• We draw a random lattice of size ⍺ with L ⍺.
• We apply to it our update function, with \(\beta=10\), ⍵ times (using the Power operator ⍣, which applies a function \(n\) times).
• Finally, we display -1 as a space and 1 as a domino ⌹.

Final output, with an \(80\times 80\) random lattice, after 50000 update steps:

diff --git a/_site/posts/ising-model.html b/_site/posts/ising-model.html index 7a1e21d..6accc36 100644

    Table of Contents

    The Ising model is a model used to represent magnetic dipole moments in statistical physics. Physical details are on the Wikipedia page, but what is interesting is that it follows a complex probability distribution on a lattice, where each site can take the value +1 or -1.

    Mathematical definition

The system is described by its Hamiltonian \[ H(\sigma) = -\sum_{i\sim j} J_{ij}\, \sigma_i\, \sigma_j, \] where the sum runs over pairs of neighbouring sites \(i\sim j\), and \(J_{ij}\) is the interaction strength between sites \(i\) and \(j\).

    Implementation

    The simulation is in Clojure, using the Quil library (a Processing library for Clojure) to display the state of the system.

    This post is “literate Clojure”, and contains core.clj. The complete project can be found on GitHub.

(ns ising-model.core
  (:require [quil.core :as q]
            [quil.middleware :as m]))

    The application works with Quil’s functional mode, with each function taking a state and returning an updated state at each time step.

    The setup function generates the initial state, with random initial spins. It also sets the frame rate. The matrix is a single vector in row-major mode. The state also holds relevant parameters for the simulation: \(\beta\), \(J\), and the iteration step.

(defn setup [size]
  "Setup the display parameters and the initial state"
  (q/frame-rate 300)
  (q/color-mode :hsb)
  (let [matrix (vec (repeatedly (* size size) #(- (* 2 (rand-int 2)) 1)))]
    {:grid-size size
     :matrix matrix
     :beta 10
     :intensity 10
     :iteration 0}))

    Given a site \(i\), we reverse its spin to generate a new configuration state.

(defn toggle-state [state i]
  "Compute the new state when we toggle a cell's value"
  (let [matrix (:matrix state)]
    (assoc state :matrix (assoc matrix i (* -1 (matrix i))))))

    In order to decide whether to accept this new state, we compute the difference in energy introduced by reversing site \(i\): \[ \Delta E = J\sigma_i \sum_{j\sim i} \sigma_j. \]

    The filter some? is required to eliminate sites outside of the boundaries of the lattice.

(defn get-neighbours [state idx]
  "Return the values of a cell's neighbours"
  [(get (:matrix state) (- idx (:grid-size state)))
   (get (:matrix state) (dec idx))
   (get (:matrix state) (inc idx))
   (get (:matrix state) (+ (:grid-size state) idx))])

(defn delta-e [state i]
  "Compute the energy difference introduced by a particular cell"
  (* (:intensity state) ((:matrix state) i)
     (reduce + (filter some? (get-neighbours state i)))))

We also add a function to directly compute the Hamiltonian for the entire configuration state. We can use it later to log its values across iterations.

(defn hamiltonian [state]
  "Compute the Hamiltonian of a configuration state"
  (- (reduce + (for [i (range (count (:matrix state)))
                     j (filter some? (get-neighbours state i))]
                 (* (:intensity state) ((:matrix state) i) j)))))

    Finally, we put everything together in the update-state function, which will decide whether to accept or reject the new configuration.

(defn update-state [state]
  "Accept or reject a new state based on energy
  difference (Metropolis-Hastings)"
  (let [i (rand-int (count (:matrix state)))
        new-state (toggle-state state i)
        alpha (q/exp (- (* (:beta state) (delta-e state i))))]
    ;;(println (hamiltonian new-state))
    (update (if (< (rand) alpha) new-state state)
            :iteration inc)))

    The last thing to do is to draw the new configuration:

(defn draw-state [state]
  "Draw a configuration state as a grid"
  (q/background 255)
  (let [cell-size (quot (q/width) (:grid-size state))]
    (doseq [[i v] (map-indexed vector (:matrix state))]
      (let [x (* cell-size (rem i (:grid-size state)))
            y (* cell-size (quot i (:grid-size state)))]
        (q/no-stroke)
        (q/fill
         (if (= 1 v) 0 255))
        (q/rect x y cell-size cell-size))))
  ;;(when (zero? (mod (:iteration state) 50)) (q/save-frame "img/ising-######.jpg"))
  )

    And to reset the simulation when the user clicks anywhere on the screen:

(defn mouse-clicked [state event]
  "When the mouse is clicked, reset the configuration to a random one"
  (setup 100))

(q/defsketch ising-model
  :title "Ising model"
  :size [300 300]
  :setup #(setup 100)
  :update update-state
  :draw draw-state
  :mouse-clicked mouse-clicked
  :features [:keep-on-top :no-bind-output]
  :middleware [m/fun-mode])

    Conclusion

The Ising model is a really easy (and common) example use of MCMC and Metropolis-Hastings. It makes it easy to understand intuitively how the algorithm works, and to make nice visualizations!

diff --git a/_site/posts/lsystems.html b/_site/posts/lsystems.html index 05ba0e0..c12f5d0 100644

    Table of Contents

diff --git a/_site/posts/operations-research-references.html b/_site/posts/operations-research-references.html index c8cf007..4843b25 100644

    Table of Contents


      Introduction

      Operations research (OR) is a vast area comprising a lot of theory, different branches of mathematics, and too many applications to count. In this post, I will try to explain why it can be a little disconcerting to explore at first, and how to start investigating the topic with a few references to get started.

      Keep in mind that although I studied it during my graduate studies, this is not my primary area of expertise (I’m a data scientist by trade), and I definitely don’t pretend to know everything in OR. This is a field too vast for any single person to understand in its entirety, and I talk mostly from an “amateur mathematician and computer scientist” standpoint.

      Why is it hard to approach?

Operations research can be difficult to approach, since there are many references and subfields. Compared to machine learning for instance, OR has a slightly longer history (going back to the 17th century, for example with Monge and the optimal transport problem). (For a very nice introduction (in French) to optimal transport, see these blog posts by Gabriel Peyré on the CNRS maths blog: Part 1 and Part 2. See also the resources on optimaltransport.github.io, in English.) This means that good textbooks and such have existed for a long time, but also that there will be plenty of material to choose from.

Moreover, OR is very close to applications. Sometimes methods may vary a lot in their presentation depending on whether they’re applied to train tracks, sudoku, or travelling salesmen. In practice, the terminology and notations are not the same everywhere. This is disconcerting if you are used to “pure” mathematics, where notation has evolved over a long time and is pretty much standardised for many areas. In contrast, if you’re used to the statistics literature with its strange notations, you will find that OR is actually very well formalized.

      There are many subfields of operations research, including all kinds of optimization (constrained and unconstrained), game theory, dynamic programming, stochastic processes, etc.

      Where to start

      Introduction and modelling

For an overall introduction, I recommend Wentzel (1988). It is an old book, published by Mir Publications, a Soviet publisher which put out many excellent scientific textbooks. (Mir also published Physics for Everyone by Lev Landau and Alexander Kitaigorodsky, a three-volume introduction to physics that is really accessible. Together with Feynman’s famous lectures, I read them (in French) when I was a kid, and it was the best introduction I could possibly have to the subject.) It is out of print, but it is available on Archive.org. The book is quite old, but everything presented is still extremely relevant today. It requires absolutely no background, and covers everything: a general introduction to the field, linear programming, dynamic programming, Markov processes and queues, Monte Carlo methods, and game theory. Even if you already know some of these topics, the presentation is so clear that it is a pleasure to read! (In particular, it is one of the best presentations of dynamic programming that I have ever read. The explanation of the simplex algorithm is also excellent.)

      If you are interested in optimization, the first thing you have to learn is modelling, i.e. transforming your problem (described in natural language, often from a particular industrial application) into a mathematical programme. The mathematical programme is the structure on which you will be able to apply an algorithm to find an optimal solution. Even if (like me) you are initially more interested in the algorithmic side of things, learning to create models will shed a lot of light on the overall process, and will give you more insight in general on the reasoning behind algorithms.

The best book I have read on the subject is Williams (2013). It contains a lot of step-by-step examples of concrete applications, in a multitude of domains, and remains very easy to read and to follow. It covers nearly every type of problem, so it is very useful as a reference. When you encounter a concrete problem in real life afterwards, you will know how to construct an appropriate model, and in the process you will often identify a common type of problem. The book then gives plenty of advice on how to approach each type of problem. Finally, it is also a great resource to build a “mental map” of the field, avoiding getting lost in the jungle of linear, stochastic, mixed integer, quadratic, and other network problems.

Another interesting resource is the freely available MOSEK Modeling Cookbook, covering many types of problems, with more mathematical details than in Williams (2013). It is built for people wanting to use the commercial MOSEK solver, so it could be useful if you plan to use a solver package like this one (more details on solvers below).

      Theory and algorithms

The basic algorithm for optimization is the simplex algorithm, developed by Dantzig in the 1940s to solve linear programming problems. It is one of the main building blocks of mathematical optimization, and is used and referenced extensively in all kinds of approaches. As such, it is really important to understand it in detail. There are many books on the subject, but I especially liked Chvátal (1983) (out of print, but you can find cheap used versions on Amazon). It covers everything there is to know about the simplex algorithm (step-by-step explanations with simple examples, correctness and complexity analysis, computational and implementation considerations), as well as many applications. I think it is overall the best introduction. Vanderbei (2014) follows a very similar outline, but contains more recent computational considerations. (The author also has lecture slides.) For all the details about practical implementations of the simplex algorithm, Maros (2003) is dedicated to the computational aspects and contains everything you will need.
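To make “understanding the simplex algorithm” concrete: it operates on linear programmes in a standard form like the following (a generic formulation of my own, not tied to any particular book’s notation),

\[\begin{align}
\text{maximise} \quad & c^\top x \\
\text{subject to} \quad & Ax \leq b \\
& x \geq 0,
\end{align}\]

where \(x\) is the vector of decision variables, \(c\) the objective coefficients, and \(A\) and \(b\) the constraint data. The algorithm then moves from vertex to vertex of the feasible polyhedron, improving the objective value at each step.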

For more books on linear programming, the two books Dantzig (1997) and Dantzig (2003) are very complete, if somewhat more mathematically advanced. Bertsimas and Tsitsiklis (1997) is also a great reference, if you can find it.

For all the other subfields, this great StackExchange answer contains a lot of useful references, including most of the above. Of particular note are Peyré and Cuturi (2019) for optimal transport, Boyd (2004) for convex optimization (freely available online), and Nocedal (2006) for numerical optimization. Kochenderfer (2019) is not in the list (because it is very recent) but is also excellent, with examples in Julia covering nearly every kind of optimization algorithm.

      Online courses

If you would like to watch video lectures, there are a few good opportunities freely available online, in particular on MIT OpenCourseWare. The list of courses at MIT is available on their webpage. I haven’t actually looked in detail at the courses’ content, so I cannot vouch for them directly, but MIT courses are generally of excellent quality. Most courses are also taught by Bertsimas and Bertsekas, who are very famous and wrote many excellent books. (I am more comfortable reading books than watching lecture videos online. Although I liked attending classes during my studies, I do not have the same feeling in front of a video. When I read, I can re-read the same sentence three times, pause to look up something, or skim a few paragraphs. I find that the inability to do that with a video greatly diminishes my ability to concentrate.)

Of particular note are:

    • Algebraic Techniques and Semidefinite Optimization,
    • Integer Programming and Combinatorial Optimization.
Another interesting course I found online is Deep Learning in Discrete Optimization, at Johns Hopkins. (It is taught by William Cook, who is the author of In Pursuit of the Traveling Salesman, a nice introduction to the TSP problem in a readable form.) It contains an interesting overview of deep learning and integer programming, with a focus on connections, and applications to recent research areas in ML (reinforcement learning, attention, etc.).

    Solvers and computational resources

    When you start reading about modelling and algorithms, I recommend you try solving a few problems yourself, either by hand for small instances, or using an existing solver. It will allow you to follow the examples in books, while also practising your modelling skills. You will also get an intuition of what is difficult to model and to solve.

There are many solvers available, both free and commercial, with various capabilities. I recommend you use the fantastic JuMP, a modelling library for Julia that offers a common interface to many free and commercial solvers.

Another awesome resource is the NEOS Server. It offers free computing resources for numerical optimization, including all major free and commercial solvers! You can submit jobs on it in a standard format, or interface your favourite programming language with it. The fact that such an amazing resource exists for free, for everyone, is extraordinary. They also have an accompanying book, the NEOS Guide, containing many case studies and descriptions of problem types. The taxonomy may be particularly useful.

    Conclusion

Operations research is a fascinating topic, and it has an abundant literature that makes it very easy to dive into the subject. If you are interested in algorithms, modelling for practical applications, or just wish to understand more, I hope I have given you the first steps: start reading and experimenting!

References

    Bertsimas, Dimitris, and John N. Tsitsiklis. 1997. Introduction to Linear Optimization. Belmont, Massachusetts: Athena Scientific. http://www.athenasc.com/linoptbook.html.


Boyd, Stephen. 2004. Convex Optimization. Cambridge, UK; New York: Cambridge University Press.


    Chvátal, Vašek. 1983. Linear Programming. New York: W.H. Freeman.

Dantzig, George. 1997. Linear Programming 1: Introduction. New York: Springer. https://www.springer.com/gp/book/9780387948331.

———. 2003. Linear Programming 2: Theory and Extensions. New York: Springer. https://www.springer.com/gp/book/9780387986135.

    Kochenderfer, Mykel. 2019. Algorithms for Optimization. Cambridge, Massachusetts: The MIT Press.


    Maros, István. 2003. Computational Techniques of the Simplex Method. Boston: Kluwer Academic Publishers.

Nocedal, Jorge. 2006. Numerical Optimization. New York: Springer. https://www.springer.com/gp/book/9780387303031.

Peyré, Gabriel, and Marco Cuturi. 2019. “Computational Optimal Transport.” Foundations and Trends in Machine Learning 11 (5-6): 355–607. https://doi.org/10.1561/2200000073.


Vanderbei, Robert. 2014. Linear Programming: Foundations and Extensions. New York: Springer.


    Wentzel, Elena S. 1988. Operations Research: A Methodological Approach. Moscow: Mir publishers.

Williams, H. Paul. 2013. Model Building in Mathematical Programming. Chichester, West Sussex: Wiley. https://www.wiley.com/en-fr/Model+Building+in+Mathematical+Programming,+5th+Edition-p-9781118443330.

Table of Contents

Introduction

I have recently bought the book Category Theory from Steve Awodey (Awodey 2010) (the book itself is awesome, but probably the topic for another post), and a particular passage excited my curiosity:

      Let us begin by distinguishing between the following things: i. categorical foundations for mathematics, ii. mathematical foundations for category theory.

      As for the first point, one sometimes hears it said that category theory can be used to provide “foundations for mathematics,” as an alternative to set theory. That is in fact the case, but it is not what we are doing here. In set theory, one often begins with existential axioms such as “there is an infinite set” and derives further sets by axioms like “every set has a powerset,” thus building up a universe of mathematical objects (namely sets), which in principle suffice for “all of mathematics.”


      This statement is interesting because one often considers category theory as pretty “fundamental”, in the sense that it has no issue with considering what I call “dangerous” notions, such as the category \(\mathbf{Set}\) of all sets, and even the category \(\mathbf{Cat}\) of all categories. Surely a theory this general, that can afford to study such objects, should provide suitable foundations for mathematics? Awodey addresses these issues very explicitly in the section following the quote above, and finds a good way of avoiding circular definitions.

      Now, I remember some basics from my undergrad studies about foundations of mathematics. I was told that if you could define arithmetic, you basically had everything else “for free” (as Kronecker famously said, “natural numbers were created by God, everything else is the work of men”). I was also told that two sets of axioms existed, the Peano axioms and the Zermelo-Fraenkel axioms. Also, I should steer clear of the axiom of choice if I could, because one can do strange things with it, and it is equivalent to many different statements. Finally (and this I knew mainly from Logicomix, I must admit), it is impossible for a set of axioms to be both complete and consistent.

      Given all this, I realised that my knowledge of foundational mathematics was pretty deficient. I do not believe that it is a very important topic that everyone should know about, even though Gödel’s incompleteness theorem is very interesting from a logical and philosophical standpoint. However, I wanted to go deeper on this subject.

In this post, I will try to share my path through Peano’s axioms (Gowers, Barrow-Green, and Leader 2010), because they are very simple, and it is easy to uncover basic algebraic structure from them.

      The Axioms

      The purpose of the axioms is to define a collection of objects that we will call the natural numbers. Here, we place ourselves in the context of first-order logic. Logic is not the main topic here, so I will just assume that I have access to some quantifiers, to some predicates, to some variables, and, most importantly, to a relation \(=\) which is reflexive, symmetric, transitive, and closed over the natural numbers.

Without further digressions, let us define two symbols \(0\) and \(s\) (called successor) such that:

1. \(0\) is a natural number.
2. For every natural number \(n\), \(s(n)\) is a natural number.
3. For all natural numbers \(m\) and \(n\), if \(s(m) = s(n)\), then \(m = n\).
4. For every natural number \(n\), \(s(n) = 0\) is false.
5. (Induction) If \(\varphi\) is a predicate such that \(\varphi(0)\) is true, and such that for every natural number \(n\), \(\varphi(n)\) implies \(\varphi(s(n))\), then \(\varphi(n)\) is true for every natural number \(n\).

We can then define addition recursively by two rules: \(\forall a,\quad a+0 = a\), and \(\forall a, \forall b,\quad a + s(b) = s(a+b)\).

      First, we prove that every natural number commutes with \(0\).

• \(0+0 = 0+0\).
• For every natural number \(a\) such that \(0+a = a+0\), we have:

\[\begin{align}
0 + s(a) &= s(0+a)\\
&= s(a+0)\\
&= s(a)\\
&= s(a) + 0.
\end{align}\]

      By Axiom 5, every natural number commutes with \(0\).

      We can now prove the main proposition:

• \(\forall a,\quad a+0=0+a\).
• For all \(a\) and \(b\) such that \(a+b=b+a\),

\[\begin{align}
a + s(b) &= s(a+b)\\
&= s(b+a)\\
&= s(b) + a.
\end{align}\]

      We used the opposite of the second rule for \(+\), namely \(\forall a, \forall b,\quad s(a) + b = s(a+b)\). This can easily be proved by another induction.
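For completeness, here is how that auxiliary induction (on \(b\), using only the two defining rules of \(+\)) could go; this is my sketch, not the book’s:

\[\begin{align}
s(a) + 0 &= s(a) = s(a+0),\\
s(a) + s(b) &= s(s(a)+b) = s(s(a+b)) = s(a+s(b)),
\end{align}\]

where the middle equality in the second line uses the induction hypothesis \(s(a)+b = s(a+b)\).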


      Going further

      We have imbued our newly created set of natural numbers with a significant algebraic structure. From there, similar arguments will create more structure, notably by introducing another operation \(\times\), and an order \(\leq\).

      It is now a matter of conventional mathematics to construct the integers \(\mathbb{Z}\) and the rationals \(\mathbb{Q}\) (using equivalence classes), and eventually the real numbers \(\mathbb{R}\).

It is remarkable how very few (and very simple, as far as you would consider the induction axiom “simple”) axioms are enough to build an entire theory of mathematics. This sort of thing makes me agree with Eugene Wigner (Wigner 1990) when he says that “mathematics is the science of skillful operations with concepts and rules invented just for this purpose”. We drew some arbitrary rules out of thin air, and derived countless properties and theorems from them, basically for our own enjoyment. (As Wigner would say, it is incredible that any of these fanciful inventions coming out of nowhere turned out to be even remotely useful.) Mathematics is done mainly for the mathematician’s own pleasure!

Mathematics cannot be defined without acknowledging its most obvious feature: namely, that it is interesting — M. Polanyi (Wigner 1990)

References

      Awodey, Steve. 2010. Category Theory. 2nd ed. Oxford Logic Guides 52. Oxford ; New York: Oxford University Press.


      Gowers, Timothy, June Barrow-Green, and Imre Leader. 2010. The Princeton Companion to Mathematics. Princeton University Press.

Wigner, Eugene P. 1990. “The Unreasonable Effectiveness of Mathematics in the Natural Sciences.” In Mathematics and Science, by Ronald E Mickens, 291–306. World Scientific. https://doi.org/10.1142/9789814503488_0018.


    A Markov Decision Process is a tuple \((\mathcal{S}, \mathcal{A}, \mathcal{R}, p)\) where:

• \(\mathcal{S}\) is a set of states,
• \(\mathcal{A}\) is an application mapping each state \(s \in \mathcal{S}\) to a set \(\mathcal{A}(s)\) of possible actions for this state. In this post, we will often simplify by using \(\mathcal{A}\) as a set, assuming that all actions are possible for each state,
• \(\mathcal{R} \subset \mathbb{R}\) is a set of rewards,
• and \(p\) is a function representing the dynamics of the MDP:

\[\begin{align}
p &: \mathcal{S} \times \mathcal{R} \times \mathcal{S} \times \mathcal{A} \mapsto [0,1] \\
p(s', r \;|\; s, a) &:= \mathbb{P}(S_t=s', R_t=r \;|\; S_{t-1}=s, A_{t-1}=a),
\end{align}\]

      such that \[ \forall s \in \mathcal{S}, \forall a \in \mathcal{A},\quad \sum_{s', r} p(s', r \;|\; s, a) = 1. \]

    The function \(p\) represents the probability of transitioning to the state \(s'\) and getting a reward \(r\) when the agent is at state \(s\) and chooses action \(a\).

    We will also use occasionally the state-transition probabilities:

\[\begin{align}
p &: \mathcal{S} \times \mathcal{S} \times \mathcal{A} \mapsto [0,1] \\
p(s' \;|\; s, a) &:= \mathbb{P}(S_t=s' \;|\; S_{t-1}=s, A_{t-1}=a) \\
&= \sum_r p(s', r \;|\; s, a).
\end{align}\]

    Rewarding the agent

    The expected reward of a state-action pair is the function

\[\begin{align}
r &: \mathcal{S} \times \mathcal{A} \mapsto \mathbb{R} \\
r(s,a) &:= \mathbb{E}[R_t \;|\; S_{t-1}=s, A_{t-1}=a] \\
&= \sum_r r \sum_{s'} p(s', r \;|\; s, a).
\end{align}\]

    The discounted return is the sum of all future rewards, with a multiplicative factor to give more weights to more immediate rewards: \[ G_t := \sum_{k=t+1}^T \gamma^{k-t-1} R_k, \] where \(T\) can be infinite or \(\gamma\) can be 1, but not both.
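For instance (my worked example), with an infinite horizon, \(\gamma = 0.9\), and a constant reward \(R_k = 1\) at every step, the discounted return is a simple geometric series: \[ G_t = \sum_{k=0}^{\infty} 0.9^k = \frac{1}{1-0.9} = 10. \]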


    A policy is a way for the agent to choose the next action to perform.

    A policy is a function \(\pi\) defined as

\[\begin{align}
\pi &: \mathcal{A} \times \mathcal{S} \mapsto [0,1] \\
\pi(a \;|\; s) &:= \mathbb{P}(A_t=a \;|\; S_t=s).
\end{align}\]

    In order to compare policies, we need to associate values to them.

    The state-value function of a policy \(\pi\) is

\[\begin{align}
v_{\pi} &: \mathcal{S} \mapsto \mathbb{R} \\
v_{\pi}(s) &:= \text{expected return when starting in $s$ and following $\pi$} \\
v_{\pi}(s) &:= \mathbb{E}_{\pi}\left[ G_t \;|\; S_t=s\right] \\
v_{\pi}(s) &= \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;|\; S_t=s\right]
\end{align}\]

    We can also compute the value starting from a state \(s\) by also taking into account the action taken \(a\).

    The action-value function of a policy \(\pi\) is

\[\begin{align}
q_{\pi} &: \mathcal{S} \times \mathcal{A} \mapsto \mathbb{R} \\
q_{\pi}(s,a) &:= \text{expected return when starting from $s$, taking action $a$, and following $\pi$} \\
q_{\pi}(s,a) &:= \mathbb{E}_{\pi}\left[ G_t \;|\; S_t=s, A_t=a \right] \\
q_{\pi}(s,a) &= \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;|\; S_t=s, A_t=a\right]
\end{align}\]

    The quest for the optimal policy

    References


    Introduction

    After Phase I, here are my solutions to Phase II problems. The full code is included in the post, but everything is also available on GitHub.

A PDF of the problem descriptions is available on the competition website, or directly from my GitHub repo.

    The submission guidelines gave a template where everything is defined in a Contest2020.Problems Namespace. I kept the default values for ⎕IO and ⎕ML because the problems were not particularly easier with ⎕IO←0.

:Namespace Contest2020

 :Namespace Problems
  (⎕IO ⎕ML ⎕WX)←1 1 3

    Problem 1 – Take a Dive

∇ score←dd DiveScore scores
  :If 7=≢scores
   scores←scores[¯2↓2↓⍋scores]
  :ElseIf 5=≢scores
   scores←scores[¯1↓1↓⍋scores]
  :Else
   scores←scores
  :EndIf
  score←2(⍎⍕)dd×+/scores
∇

This is a very straightforward implementation of the algorithm described in the problem description. I decided to switch explicitly on the size of the input vector because I feel it is more natural. For the cases with 5 or 7 judges, we use Drop (↓) to remove the lowest and highest scores.

    At the end, we sum up the scores with +/ and multiply them by dd. The last operation, 2(⍎⍕), is a train using Format (Dyadic) to round to 2 decimal places, and Execute to get actual numbers and not strings.
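As a sanity check (a hand-worked example of mine, not from the problem set): with five judges, difficulty 2.5, and scores 7 7.5 8 8.5 8, we drop the lowest (7) and highest (8.5) scores, leaving a sum of 23.5:

      2.5 DiveScore 7 7.5 8 8.5 8
58.75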

    Problem 2 – Another Step in the Proper Direction

∇ steps←{p}Steps fromTo;segments;width
  width←|-/fromTo
  :If 0=⎕NC'p' ⍝ No left argument: same as Problem 5 of Phase I
   segments←0,⍳width
  :ElseIf p<0 ⍝ -⌊p is the number of equally-sized steps to take
   segments←(-⌊p){0,⍵×⍺÷⍨⍳⍺}width
  :ElseIf p>0 ⍝ p is the step size
   segments←p{⍵⌊⍺×0,⍳⌈⍵÷⍺}width
  :ElseIf p=0 ⍝ As if we took zero step
   segments←0
  :EndIf
  ⍝ Take into account the start point and the direction.
  steps←fromTo{(⊃⍺)+(-×-/⍺)×⍵}segments
∇

    This is an extension to Problem 5 of Phase I. In each case, we compute the “segments”, i.e., the steps starting from 0. In a last step, common to all cases, we add the correct starting point and correct the direction if need be.

    To compute equally-sized steps, we first divide the segment \([0, 1]\) in p equal segments with (⍳p)÷p. This subdivision can then be multiplied by the width to obtain the required segments.

When p is the step size, we just divide the width by the step size (rounded to the next largest integer) to get the required number of segments. If the last segment is too large, we “crop” it to the width with Minimum (⌊).
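A few hand-evaluated calls illustrate the three cases (my own examples; outputs derived from the definition, not from a live session):

      Steps 2 5
2 3 4 5
      ¯2 Steps 2 5
2 3.5 5
      2 Steps 2 5
2 4 5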

    Problem 3 – Past Tasks Blast

∇ urls←PastTasks url;r;paths
  r←HttpCommand.Get url
  paths←('[a-zA-Z0-9_/]+\.pdf'⎕S'&')r.Data
  urls←('https://www.dyalog.com/'∘,)¨paths
∇

    I decided to use HttpCommand for this task, since it is simply one ]load HttpCommand away and should be platform-independent.

    Parsing XML is not something I consider “fun” in the best of cases, and I feel like APL is not the best language to do this kind of thing. Given how simple the task is, I just decided to find the relevant bits with a regular expression using Replace and Search (⎕S).

    After finding all the strings vaguely resembling a PDF file name (only alphanumeric characters and underscores, with a .pdf extension), I just concatenate them to the base URL of the Dyalog domain.

    Problem 4 – Bioinformatics

    The first task can be solved by decomposing it into several functions.

⍝ Test if a DNA string is a reverse palindrome.
isrevp←{⍵≡⌽'TAGC'['ATCG'⍳⍵]}

First, we compute the complement of a DNA string (using simple indexing) and test if its Reverse (⌽) is equal to the original string.
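For example (hand-checked): the complement of 'GCATGC' is 'CGTACG', whose reverse is the original string, so

      isrevp 'GCATGC'
1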

⍝ Generate all subarrays (position, length) pairs, for 4 ≤ length ≤ 12.
subarrays←{⊃,/(⍳⍵),¨¨3↓¨⍳¨12⌊1+⍵-⍳⍵}

We first compute all the possible lengths for each starting point. For instance, the last element cannot have any (position, length) pair associated to it, because there are no three elements following it. So we crop the possible lengths to \([4, 12]\). For instance for an array of size 10:

      {3↓¨⍳¨12⌊1+⍵-⍳⍵}10
┌──────────────┬───────────┬─────────┬───────┬─────┬───┬─┬┬┬┐
│4 5 6 7 8 9 10│4 5 6 7 8 9│4 5 6 7 8│4 5 6 7│4 5 6│4 5│4││││
└──────────────┴───────────┴─────────┴───────┴─────┴───┴─┴┴┴┘

    Then, we just add the corresponding starting position to each length (1 for the first block, 2 for the second, and so on). Finally, we flatten everything.

∇ r←revp dna;positions
  positions←subarrays⍴dna
  ⍝ Filter subarrays which are reverse palindromes.
  r←↑({isrevp dna[¯1+⍵[1]+⍳⍵[2]]}¨positions)/positions
∇

    For each possible (position, length) pair, we get the corresponding DNA substring with dna[¯1+⍵[1]+⍳⍵[2]] (adding ¯1 is necessary because ⎕IO←1). We test if this substring is a reverse palindrome using isrevp above. Replicate (/) then selects only the (position, length) pairs for which the substring is a reverse palindrome.

    The second task is just about counting the number of subsets modulo 1,000,000. So we just need to compute \(2^n \mod 1000000\) for any positive integer \(n\leq1000\).

sset←{((1E6|2∘×)⍣⍵)1}

Since we cannot just compute \(2^n\) directly and take the remainder, we use modular arithmetic to stay mod 1,000,000 during the whole computation. The dfn (1E6|2∘×) doubles its argument mod 1,000,000. So we just apply this function \(n\) times using the Power operator (⍣), with an initial value of 1.
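For example (hand-computed: \(2^{10} = 1024\) and \(2^{25} = 33554432\)):

      sset 10
1024
      sset 25
554432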

    Problem 5 – Future and Present Value

First solution: ((1+⊢)⊥⊣) computes the total return for a vector of amounts ⍺ and a vector of rates ⍵. It is applied to every prefix subarray of amounts and rates to get all intermediate values. However, this has quadratic complexity.

rr←(,\⊣)((1+⊢)⊥⊣)¨(,\⊢)
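For example (my own hand calculation from the definition): with three successive deposits of 100 and a constant 5% rate, the running totals should come out as

      100 100 100 rr 0.05 0.05 0.05
100 205 315.25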

Second solution: We want to be able to use the recurrence relation (recur) and scan through the vectors of amounts and rates, accumulating the total value at every time step. However, APL evaluation is right-associative, so a simple Scan (recur\amounts,¨values) would not give the correct result, since recur is not associative and we need to evaluate it left-to-right. (In any case, Scan would have quadratic complexity here, so it would not bring any benefit over the previous solution.) What we need is something akin to Haskell’s scanl function, which would evaluate left to right in \(O(n)\) time. (There is an interesting StackOverflow answer explaining the behaviour of Scan, and comparing it to Haskell’s scanl function.) This is what we do here, accumulating values from left to right. (This is inspired from dfns.ascan, although heavily simplified.)

rr←{recur←{⍵[1]+⍺×1+⍵[2]} ⋄ 1↓⌽⊃{(⊂(⊃⍵)recur⍺),⍵}/⌽⍺,¨⍵}

    For the second task, there is an explicit formula for cashflow calculations, so we can just apply it.

pv←{+/⍺÷×\1+⍵}
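For instance, discounting two cashflows of 100 at a constant 5% rate, i.e. \(100/1.05 + 100/1.05^2\) (hand-computed, shown to default print precision):

      100 100 pv 0.05 0.05
185.9410431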

    Problem 6 – Merge

∇ text←templateFile Merge jsonFile;template;ns
  template←⊃⎕NGET templateFile 1
  ns←⎕JSON⊃⎕NGET jsonFile
  ⍝ We use a simple regex search and replace on the
  ⍝ template.
  text←↑('@[a-zA-Z]*@'⎕R{ns getval ¯1↓1↓⍵.Match})template
∇

We first read the template and the JSON values from their files. The ⎕NGET function reads simple text files, and ⎕JSON extracts the key-value pairs as a namespace.

Assuming all variable names contain only letters, we use the regex @[a-zA-Z]*@ to match variable names enclosed between @ symbols. The function getval then returns the appropriate value, and we can replace the variable name in the template.

∇ val←ns getval var
  :If ''≡var ⍝ literal '@'
   val←'@'
  :ElseIf (⊂var)∊ns.⎕NL ¯2
   val←⍕ns⍎var
  :Else
   val←'???'
  :EndIf
∇

    This function takes the namespace matching the variable names to their respective values, and the name of the variable.

    • If the variable name is empty, we matched the string @@, which corresponds to a literal @.
• If the variable name is one of the names defined in the namespace (ns.⎕NL ¯2), we retrieve its value with Execute (⍎) and convert it to text with Format (⍕).
    • Otherwise, we have an unknown variable, so we replace it with ???.

    Problem 7 – UPC

CheckDigit←{10|-⍵+.×11⍴3 1}

    The check digit satisfies the equation \[ 3 x_{1}+x_{2}+3 x_{3}+x_{4}+3 x_{5}+x_{6}+3 x_{7}+x_{8}+3 x_{9}+x_{10}+3 x_{11}+x_{12} \equiv 0 \bmod 10, \] therefore, \[ x_{12} \equiv -(3 x_{1}+x_{2}+3 x_{3}+x_{4}+3 x_{5}+x_{6}+3 x_{7}+x_{8}+3 x_{9}+x_{10}+3 x_{11}) \bmod 10. \]

Translated to APL, we just take the dot product of the first 11 digits of the barcode with 11⍴3 1, negate it, and take the remainder by 10.
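We can check this on a well-known barcode, 036000291452, whose twelfth digit is 2:

      CheckDigit 0 3 6 0 0 0 2 9 1 4 5
2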

⍝ Left and right representations of digits. Decoding
⍝ the binary representation from decimal is more
⍝ compact than writing everything explicitly.
lrepr←⍉(7⍴2)⊤13 25 19 61 35 49 47 59 55 11
rrepr←~¨lrepr

    For the second task, the first thing we need to do is save the representation of digits. To save space, I did not encode the binary representation explicitly, instead using a decimal representation that I then decode in base 2. The right representation is just the bitwise negation.

∇ bits←WriteUPC digits;left;right
  :If (11=≢digits)∧∧/digits∊0,⍳9
   left←,lrepr[1+6↑digits;]
   right←,rrepr[1+6↓digits,CheckDigit digits;]
   bits←1 0 1,left,0 1 0 1 0,right,1 0 1
  :Else
   bits←¯1
  :EndIf
∇

    First of all, if the vector digits does not have exactly 11 elements, all between 0 and 9, it is an error and we return ¯1.

    Then, we take the first 6 digits and encode them with lrepr, and the last 5 digits plus the check digit encoded with rrepr. In each case, adding 1 is necessary because ⎕IO←1. We return the final bit array with the required beginning, middle, and end guard patterns.

∇ digits←ReadUPC bits
  :If 95≠⍴bits ⍝ incorrect number of bits
   digits←¯1
  :Else
   ⍝ Test if the barcode was scanned right-to-left.
   :If 0=2|+/bits[3+⍳7]
    bits←⌽bits
   :EndIf
   digits←({¯1+lrepr⍳⍵}¨(7/⍳6)⊆42↑3↓bits),{¯1+rrepr⍳⍵}¨(7/⍳6)⊆¯42↑¯3↓bits
   :If ~∧/digits∊0,⍳9 ⍝ incorrect parity
    digits←¯1
   :ElseIf (⊃⌽digits)≠CheckDigit ¯1↓digits ⍝ incorrect check digit
    digits←¯1
   :EndIf
  :EndIf
∇
    • If we don’t have the correct number of bits, we return ¯1.
• We test the first digit for its parity, to determine if it’s actually a left representation. If it’s not, we reverse the bit array.
• We then split the left and right halves (42 bits each, after dropping the guard patterns) into groups of 7 bits, and look each group up in lrepr and rrepr to recover the digits.
• Final checks for the range of the digits (i.e., if the representations could not be found in the lrepr and rrepr vectors), and for the check digit.
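If the two functions are correct, they should round-trip, with ReadUPC returning the original 11 digits plus the check digit (my expectation from the definitions above, not a captured session):

      ReadUPC WriteUPC 0 3 6 0 0 0 2 9 1 4 5
0 3 6 0 0 0 2 9 1 4 5 2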

    Problem 8 – Balancing the Scales

∇ parts←Balance nums;subsets;partitions
  ⍝ This is a brute force solution, running in
  ⍝ exponential time. We generate all the possible
  ⍝ partitions, filter out those which are not
  ⍝ balanced, and return the first matching one. There
  ⍝ are more advanced approaches running in
  ⍝ pseudo-polynomial time (based on dynamic
  ⍝ programming, see the "Partition problem" Wikipedia
  ⍝ page), but they are not warranted here, as the
  ⍝ input size remains fairly small.

  ⍝ Generate all partitions of a vector of a given
  ⍝ size, as binary mask vectors.
  subsets←{1↓2⊥⍣¯1⍳2*⍵}
  ⍝ Keep only the subsets whose sum is exactly
  ⍝ (+/nums)÷2.
  partitions←nums{((2÷⍨+/⍺)=⍺+.×⍵)/⍵}subsets⍴nums
  :If 0=≢,partitions
   ⍝ If no partition satisfies the above
   ⍝ criterion, we return ⍬.
   parts←⍬
  :Else
   ⍝ Otherwise, we return the first possible
   ⍝ partition.
   parts←nums{((⊂,(⊂~))⊃↓⍉⍵)/¨2⍴⊂⍺}partitions
  :EndIf
∇
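For example (hand-traced from the definition: the total is 40, so we look for subsets summing to 20):

      Balance 10 10 20
┌──┬─────┐
│20│10 10│
└──┴─────┘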

    Problem 9 – Upwardly Mobile

    This is the only problem that I didn’t complete. It required parsing the files containing the graphical representations of the trees, which was needlessly complex and, quite frankly, hard and boring with a language like APL.

However, the next part is interesting: once we have a matrix of coefficients representing the relationships between the weights, we can solve the system of equations. Matrix Divide (⌹) will find one solution to the system. Since the system is overdetermined, we fix A=1 to find one possible solution. Since we want integer weights, the solution we find is smaller than the one we want, and may contain fractional weights. So we multiply everything by the Lowest Common Multiple (∧) to get the smallest integer weights.

∇ weights←Weights filename;mobile;branches;mat
  ⍝ Put your code and comments below here

  ⍝ Parse the mobile input file.
  mobile←↑⊃⎕NGET filename 1
  branches←⍸mobile∊'┌┴┐'
  ⍝ TODO: Build the matrix of coefficients mat.

  ⍝ Solve the system of equations (arbitrarily setting
  ⍝ the first variable at 1 because the system is
  ⍝ overdetermined), then multiply the coefficients by
  ⍝ their least common multiple to get the smallest
  ⍝ integer weights.
  weights←((1∘,)×(∧/÷))mat[;1]⌹1↓[2]mat
∇

 :EndNamespace
:EndNamespace

    Table of Contents

    Introduction

I’ve always been quite fond of APL and its “array-oriented” approach to programming. (See my previous post on simulating the Ising model with APL; it also contains more background on APL.) Every year, Dyalog (the company behind probably the most popular APL implementation) organises a competition with various challenges in APL.

    The Dyalog APL Problem Solving Competition consists of two phases:

• Phase I consists of 10 short puzzles (similar to what one can find on Project Euler) that can be solved by a one-line APL function.
  • Phase II is a collection of larger problems, that may require longer solutions and a larger context (e.g. reading and writing to files), often in a more applied setting. Problems are often inspired by existing domains, such as AI, bioinformatics, and so on.
In 2018, I participated in the competition, entering only Phase I (my solutions are on GitHub). Since I was a student at the time, I was eligible for a prize, and I won $100 for a 10-line submission, which is quite good! This year, I entered in both phases. I explain my solutions to Phase I in this post. Another post will contain annotated solutions for Phase II problems.

    The full code for my submission is on GitHub at dlozeve/apl-competition-2020, but everything is reproduced in this post.

1. Let’s Split!

Write a function that, given a right argument Y which is a scalar or a non-empty vector, and a left argument X which is a single non-zero integer whose absolute value is less than or equal to the length of Y, splits Y into a pair of vectors. The first vector contains the first X elements of Y, and the second vector contains the remaining elements.

    If X<0, the second vector contains the last |X elements of Y and the first vector contains the remaining elements.

    Solution: (0>⊣)⌽((⊂↑),(⊂↓))

There are three nested trains here. (Trains are nice to read, even if they are easy to abuse, and generally make for shorter dfns, which is better for Phase I.) The first one, ((⊂↑),(⊂↓)), uses the two functions Take (↑) and Drop (↓) to build a nested array consisting of the two outputs we need. (Take and Drop already have the behaviour needed regarding negative arguments.) However, if the left argument is negative, the two arrays will not be in the correct order. So we need a way to reverse them if X<0.

The second train (0>⊣) will return 1 if its left argument is negative. From this, we can use Rotate (⌽) to correctly order the nested array, in the last train.
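Checking both signs of the left argument on a small vector (outputs written out by hand from the definition):

      3 ((0>⊣)⌽((⊂↑),(⊂↓))) 1 2 3 4 5
┌─────┬───┐
│1 2 3│4 5│
└─────┴───┘
      ¯3 ((0>⊣)⌽((⊂↑),(⊂↓))) 1 2 3 4 5
┌───┬─────┐
│1 2│3 4 5│
└───┴─────┘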

2. Character Building

    UTF-8 encodes Unicode characters using 1-4 integers for each character. Dyalog APL includes a system function, ⎕UCS, that can convert characters into integers and integers into characters. The expression 'UTF-8'∘⎕UCS converts between characters and UTF-8.

    Consider the following:

      'UTF-8'∘⎕UCS 'D¥⍺⌊○9'
68 194 165 226 141 186 226 140 138 226 151 139 57
      'UTF-8'∘⎕UCS 68 194 165 226 141 186 226 140 138 226 151 139 57
D¥⍺⌊○9

    How many integers does each character use?

      'UTF-8'∘⎕UCS¨ 'D¥⍺⌊○9' ⍝ using ]Boxing on
┌──┬───────┬───────────┬───────────┬───────────┬──┐
│68│194 165│226 141 186│226 140 138│226 151 139│57│
└──┴───────┴───────────┴───────────┴───────────┴──┘

The rule is that an integer in the range 128 to 191 (inclusive) continues the character of the previous integer (which may itself be a continuation). With that in mind, write a function that, given a right argument which is a simple integer vector representing valid UTF-8 text, encloses each sequence of integers that represent a single character, like the result of 'UTF-8'∘⎕UCS¨'UTF-8'∘⎕UCS but does not use any system functions (names beginning with ⎕).

    Solution: {(~⍵∊127+⍳64)⊂⍵}
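The dfn marks every integer outside the continuation range (128 to 191) as the start of a new character, and Partitioned Enclose (⊂) then groups each character’s integers together. Applied to the vector above, it should reproduce the boxed display of 'UTF-8'∘⎕UCS¨ (hand-checked):

      {(~⍵∊127+⍳64)⊂⍵} 68 194 165 226 141 186 226 140 138 226 151 139 57
┌──┬───────┬───────────┬───────────┬───────────┬──┐
│68│194 165│226 141 186│226 140 138│226 151 139│57│
└──┴───────┴───────────┴───────────┴───────────┴──┘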

5. Stepping in the Proper Direction

    Write a function that, given a right argument of 2 integers, returns a vector of the integers from the first element of the right argument to the second, inclusively.

    Solution: {(⊃⍵)+(-×-/⍵)×0,⍳|-/⍵}

First, we have to compute the range of the output, which is the absolute value of the difference between the two integers |-/⍵. From this, we compute the actual sequence, including zero: 0,⍳|-/⍵. (If we had ⎕IO←0, we could have written ⍳|1+-/⍵, but this is the same number of characters.)

This sequence will always be nondecreasing, but we have to make it decreasing if needed, so we multiply it by the opposite of the sign of -/⍵. Finally, we just have to start the sequence at the first element of ⍵.
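Both directions, hand-evaluated from the definition:

      {(⊃⍵)+(-×-/⍵)×0,⍳|-/⍵} 2 5
2 3 4 5
      {(⊃⍵)+(-×-/⍵)×0,⍳|-/⍵} 5 2
5 4 3 2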


    Solution: {∧/(⍳∘≢≡⍋)¨(⊂((⊢⍳⌈/)↑⊢),⍵),⊂⌽((⊢⍳⌈/)↓⊢),⍵}

How do we approach this? First we have to split the vector at the “apex”. The train (⊢⍳⌈/) will return the index (⍳) of the maximum element (⌈/).

      (⊢⍳⌈/)1 3 3 4 5 2 1
5

Combined with Take (↑) and Drop (↓), we build a two-element vector containing both parts, in ascending order (we Reverse (⌽) one of them). Note that we have to Ravel (,) the argument to avoid rank errors in Index Of.

      {(⊂((⊢⍳⌈/)↑⊢),⍵),⊂⌽((⊢⍳⌈/)↓⊢),⍵}1 3 3 4 5 2 1
┌─────────┬───┐
│1 3 3 4 5│1 2│
└─────────┴───┘

    Next, (⍳∘≢≡⍋) on each of the two vectors will test if they are non-decreasing (i.e. if the ranks of all the elements correspond to a simple range from 1 to the size of the vector).

    10. Stacking It Up

    Write a function that takes as its right argument a vector of simple arrays of rank 2 or less (scalar, vector, or matrix). Each simple array will consist of either non-negative integers or printable ASCII characters. The function must return a simple character array that displays identically to what {⎕←⍵}¨ displays when applied to the right argument.

    Solution: {↑⊃,/↓¨⍕¨⍵}

The first step is to Format (⍕) everything to get strings. A lot of trial-and-error is always necessary when dealing with nested arrays, and this being about formatting exacerbates the problem.

The next step would be to “stack everything vertically”, so we will need Mix (↑) at some point. However, if we do it immediately we don’t get the correct result:

      {↑⍕¨⍵}(3 3⍴⍳9)(↑'Adam' 'Michael')
1 2 3  
4 5 6  
7 8 9  
       
Adam   
Michael

Mix is padding with spaces both horizontally (necessary as we want the output to be a simple array of characters) and vertically (not what we want). We will have to decompose everything line by line, and then mix all the lines together. This is exactly what Split (↓) does. (Split is the dual of Mix.)

      {↓¨⍕¨⍵}(3 3⍴⍳9)(↑'Adam' 'Michael')(⍳10) '*'(5 5⍴⍳25)
┌───────────────────┬─────────────────┬──────────────────────┬─┬───────────────
│┌─────┬─────┬─────┐│┌───────┬───────┐│┌────────────────────┐│*│┌──────────────
││1 2 3│4 5 6│7 8 9│││Adam   │Michael│││1 2 3 4 5 6 7 8 9 10││ ││ 1  2  3  4  5
│└─────┴─────┴─────┘│└───────┴───────┘│└────────────────────┘│ │└──────────────
└───────────────────┴─────────────────┴──────────────────────┴─┴───────────────

      ─────────────────────────────────────────────────────────────┐
      ┬──────────────┬──────────────┬──────────────┬──────────────┐│
      │ 6  7  8  9 10│11 12 13 14 15│16 17 18 19 20│21 22 23 24 25││
      ┴──────────────┴──────────────┴──────────────┴──────────────┘│
      ─────────────────────────────────────────────────────────────┘

    Next, we clean this up with Ravel (,) and we can Mix to obtain the final result.


    Table of Contents


      Introduction

      Operations research (OR) is a vast area comprising a lot of theory, different branches of mathematics, and too many applications to count. In this post, I will try to explain why it can be a little disconcerting to explore at first, and how to start investigating the topic with a few references to get started.

      Keep in mind that although I studied it during my graduate studies, this is not my primary area of expertise (I’m a data scientist by trade), and I definitely don’t pretend to know everything in OR. This is a field too vast for any single person to understand in its entirety, and I talk mostly from an “amateur mathematician and computer scientist” standpoint.

      Why is it hard to approach?

Operations research can be difficult to approach, since there are many references and subfields. Compared to machine learning for instance, OR has a slightly longer history (going back to the 17th century, for example with Monge and the optimal transport problem). (For a very nice introduction (in French) to optimal transport, see these blog posts by Gabriel Peyré, on the CNRS maths blog: Part 1 and Part 2. See also the resources on optimaltransport.github.io (in English).) This means that good textbooks and such have existed for a long time, but also that there will be plenty of material to choose from.

Moreover, OR is very close to applications. Sometimes methods may vary a lot in their presentation depending on whether they’re applied to train tracks, sudoku, or travelling salesmen. In practice, the terminology and notations are not the same everywhere. This is disconcerting if you are used to “pure” mathematics, where notation has evolved over a long time and is pretty much standardised for many areas. In contrast, if you’re used to the statistics literature with its strange notations, you will find that OR is actually very well formalized.

      There are many subfields of operations research, including all kinds of optimization (constrained and unconstrained), game theory, dynamic programming, stochastic processes, etc.

      Where to start

      Introduction and modelling

For an overall introduction, I recommend Wentzel (1988). It is an old book, published by Mir Publications, a Soviet publisher which published many excellent scientific textbooks. (Mir also published Physics for Everyone by Lev Landau and Alexander Kitaigorodsky, a three-volume introduction to physics that is really accessible. Together with Feynman’s famous lectures, I read them (in French) when I was a kid, and it was the best introduction I could possibly have to the subject.) It is out of print, but it is available on Archive.org. The book is quite old, but everything presented is still extremely relevant today. It requires absolutely no background, and covers everything: a general introduction to the field, linear programming, dynamic programming, Markov processes and queues, Monte Carlo methods, and game theory. Even if you already know some of these topics, the presentation is so clear that it is a pleasure to read! (In particular, it is one of the best presentations of dynamic programming that I have ever read. The explanation of the simplex algorithm is also excellent.)

      If you are interested in optimization, the first thing you have to learn is modelling, i.e. transforming your problem (described in natural language, often from a particular industrial application) into a mathematical programme. The mathematical programme is the structure on which you will be able to apply an algorithm to find an optimal solution. Even if (like me) you are initially more interested in the algorithmic side of things, learning to create models will shed a lot of light on the overall process, and will give you more insight in general on the reasoning behind algorithms.
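As a toy illustration of what modelling means here (my own example, much simpler than anything in the book): “a workshop has 8 hours of labour and 10 hours of machine time; product 1 needs 1 hour of each and brings 40, product 2 needs 1 hour of labour, 2 hours of machine time, and brings 30; maximise profit” becomes the mathematical programme

\[\begin{align}
\text{maximise} \quad & 40 x_1 + 30 x_2 \\
\text{subject to} \quad & x_1 + x_2 \leq 8 \\
& x_1 + 2 x_2 \leq 10 \\
& x_1, x_2 \geq 0,
\end{align}\]

on which an algorithm like the simplex can then be applied.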

The best book I have read on the subject is Williams (2013). It contains a lot of concrete, step-by-step examples of real applications, in a multitude of domains, and remains very easy to read and to follow. It covers nearly every type of problem, so it is very useful as a reference. When you encounter a concrete problem in real life afterwards, you will know how to construct an appropriate model, and in the process you will often identify a common type of problem. The book then gives plenty of advice on how to approach each type of problem. Finally, it is also a great resource to build a “mental map” of the field, avoiding getting lost in the jungle of linear, stochastic, mixed integer, quadratic, and other network problems.

Another interesting resource is the freely available MOSEK Modeling Cookbook, covering many types of problems, with more mathematical details than in Williams (2013). It is built for people wanting to use the commercial MOSEK solver, so it could be useful if you plan to use a solver package like this one (more details on solvers below).

      Theory and algorithms

The basic algorithm for optimization is the simplex algorithm, developed by Dantzig in the 1940s to solve linear programming problems. It is one of the main building blocks of mathematical optimization, and is used and referenced extensively in all kinds of approaches. As such, it is really important to understand it in detail. There are many books on the subject, but I especially liked Chvátal (1983) (out of print, but you can find cheap used versions on Amazon). It covers everything there is to know about the simplex algorithm (step-by-step explanations with simple examples, correctness and complexity analysis, computational and implementation considerations), as well as many applications. I think it is overall the best introduction. Vanderbei (2014) follows a very similar outline, but contains more recent computational considerations. (The author also has lecture slides.) For all the details about practical implementations of the simplex algorithm, Maros (2003) is dedicated to the computational aspects and contains everything you will need.

For more books on linear programming, the two books Dantzig (1997) and Dantzig (2003) are very complete, if somewhat more mathematically advanced. Bertsimas and Tsitsiklis (1997) is also a great reference, if you can find it.

For all the other subfields, this great StackExchange answer contains a lot of useful references, including most of the above. Of particular note are Peyré and Cuturi (2019) for optimal transport, Boyd (2004) for convex optimization (freely available online), and Nocedal (2006) for numerical optimization. Kochenderfer (2019) is not in the list (because it is very recent) but is also excellent, with examples in Julia covering nearly every kind of optimization algorithm.

      Online courses

If you would like to watch video lectures, there are a few good opportunities freely available online, in particular on MIT OpenCourseWare. The list of courses at MIT is available on their webpage. I haven’t actually looked in detail at the courses’ content, so I cannot vouch for them directly, but MIT courses are generally of excellent quality. Most courses are also taught by Bertsimas and Bertsekas, who are very famous and wrote many excellent books. (I am more comfortable reading books than watching lecture videos online. Although I liked attending classes during my studies, I do not have the same feeling in front of a video. When I read, I can re-read the same sentence three times, pause to look up something, or skim a few paragraphs. I find that the inability to do that with a video greatly diminishes my ability to concentrate.)

Of particular note are:

    • Algebraic Techniques and Semidefinite Optimization,
    • Integer Programming and Combinatorial Optimization.
Another interesting course I found online is Deep Learning in Discrete Optimization, at Johns Hopkins. (It is taught by William Cook, who is the author of In Pursuit of the Traveling Salesman, a nice introduction to the TSP problem in a readable form.) It contains an interesting overview of deep learning and integer programming, with a focus on connections, and applications to recent research areas in ML (reinforcement learning, attention, etc.).

    Solvers and computational resources

    When you start reading about modelling and algorithms, I recommend you try solving a few problems yourself, either by hand for small instances, or using an existing solver. It will allow you to follow the examples in books, while also practising your modelling skills. You will also get an intuition of what is difficult to model and to solve.

There are many solvers available, both free and commercial, with various capabilities. I recommend you use the fantastic JuMP library for Julia, which provides a common modelling interface to most existing solvers.
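If you want to see what modelling with JuMP looks like, here is a minimal sketch (my own toy example: it assumes the GLPK solver package is installed, but any solver supported by JuMP works the same way):

using JuMP, GLPK

# A tiny linear program: maximise 3x + 4y under two linear constraints.
model = Model(GLPK.Optimizer)
@variable(model, x >= 0)
@variable(model, y >= 0)
@constraint(model, x + 2y <= 14)
@constraint(model, 3x + y <= 9)
@objective(model, Max, 3x + 4y)
optimize!(model)
value(x), value(y), objective_value(model)   # (0.8, 6.6, 28.8)

The nice part is that the model is written once, in near-mathematical notation, and can be handed to any other solver just by changing the optimizer.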

Another awesome resource is the NEOS Server. It offers free computing resources for numerical optimization, including all major free and commercial solvers! You can submit jobs on it in a standard format, or interface your favourite programming language with it. The fact that such an amazing resource exists for free, for everyone, is extraordinary. They also have an accompanying book, the NEOS Guide, containing many case studies and descriptions of problem types. The taxonomy may be particularly useful.

    Conclusion

Operations research is a fascinating topic, and it has an abundant literature that makes it very easy to dive into the subject. If you are interested in algorithms, modelling for practical applications, or just wish to understand more, I hope I have given you the first steps to follow: start reading and experimenting!

References

Bertsimas, Dimitris, and John N. Tsitsiklis. 1997. Introduction to Linear Optimization. Belmont, Massachusetts: Athena Scientific. http://www.athenasc.com/linoptbook.html.

Boyd, Stephen. 2004. Convex Optimization. Cambridge, UK; New York: Cambridge University Press.

Chvátal, Vašek. 1983. Linear Programming. New York: W.H. Freeman.

Dantzig, George. 1997. Linear Programming 1: Introduction. New York: Springer. https://www.springer.com/gp/book/9780387948331.

———. 2003. Linear Programming 2: Theory and Extensions. New York: Springer. https://www.springer.com/gp/book/9780387986135.

Kochenderfer, Mykel. 2019. Algorithms for Optimization. Cambridge, Massachusetts: The MIT Press.

Maros, István. 2003. Computational Techniques of the Simplex Method. Boston: Kluwer Academic Publishers.

Nocedal, Jorge. 2006. Numerical Optimization. New York: Springer. https://www.springer.com/gp/book/9780387303031.

Peyré, Gabriel, and Marco Cuturi. 2019. “Computational Optimal Transport.” Foundations and Trends in Machine Learning 11 (5-6): 355–607. https://doi.org/10.1561/2200000073.

Vanderbei, Robert. 2014. Linear Programming: Foundations and Extensions. New York: Springer.

Wentzel, Elena S. 1988. Operations Research: A Methodological Approach. Moscow: Mir Publishers.

Williams, H. Paul. 2013. Model Building in Mathematical Programming. Chichester, West Sussex: Wiley. https://www.wiley.com/en-fr/Model+Building+in+Mathematical+Programming,+5th+Edition-p-9781118443330.


    Table of Contents


      ICLR is one of the most important conferences in machine learning, and as such, I was very excited to have the opportunity to volunteer and attend the first fully-virtual edition of the event. The whole content of the conference has been made publicly available, only a few days after the end of the event!

I would like to thank the organizing committee for this incredible event, and for the opportunity to volunteer to help other participants. (To better organize the event, and help people navigate the various online tools, they brought in 500(!) volunteers, waived our registration fees, and asked us to do simple load-testing and tech support. This was a very generous offer, and felt very rewarding for us, as we could attend the conference and give back to the organization a little bit.)

The many volunteers, the online-only nature of the event, and the low registration fees also allowed for what felt like a very diverse, inclusive event. Many graduate students and researchers from industry (like me), who do not generally have the time or the resources to travel to conferences like this, were able to attend, making the exchanges richer.

      In this post, I will try to give my impressions on the event, the speakers, and the workshops that I could attend. I will do a quick recap of the most interesting papers I saw in a future post.

      The Format of the Virtual Conference

As a result of global travel restrictions, the conference was made fully virtual. It was supposed to take place in Addis Ababa, Ethiopia, which would have been great for people who are often the target of restrictive visa policies in North American countries.

The thing I appreciated most about the conference format was its emphasis on asynchronous communication. Given how little time they had to plan the conference, they could have made all poster presentations via video-conference and called it a day. Instead, each poster had to record a 5-minute video summarising their research. (The videos are streamed using SlidesLive, which is a great solution for synchronising videos and slides. It is very comfortable to navigate through the slides and synchronise the video to the slides and vice versa. As a result, SlidesLive also has a very nice library of talks, including major conferences. This is much better than browsing YouTube randomly.) Alongside each presentation, there was a dedicated Rocket.Chat channel (Rocket.Chat seems to be an open-source alternative to Slack; overall, the experience was great, and I appreciate the efforts of the organizers to use open-source software instead of proprietary applications. I hope other conferences will do the same, and perhaps even avoid Zoom, because of recent privacy concerns (maybe try Jitsi?)) where anyone could ask a question to the authors, or just show their appreciation for the work. This was a fantastic idea, as it allowed any participant to interact with papers and authors at any time they please, which is especially important in a setting where people were spread all over the globe.

There were also Zoom sessions where authors were available for direct, face-to-face discussions, allowing for more traditional conversations. But asking questions on the channel also had the advantage of keeping track of all the questions that were asked by other people. As such, I quickly acquired the habit of watching the video, looking at the chat to see the previous discussions (even if they happened in the middle of the night in my timezone!), and then skimming the paper or asking questions myself.


      Beyond ‘tabula rasa’ in reinforcement learning: agents that remember, adapt, and generalize

A lot of pretty advanced talks about RL. The general theme was meta-learning, aka “learning to learn”. This is a very active area of research, which goes way beyond classical RL theory, and offers many interesting avenues to adjacent fields (both inside ML and outside, especially cognitive science). The first talk, by Martha White, about inductive biases, was a very interesting and approachable introduction to the problems and challenges of the field. There was also a panel with Jürgen Schmidhuber. We hear a lot about him from the various controversies, but it’s nice to see him talking about research and future developments in RL.

      Causal Learning For Decision Making

Ever since I read Judea Pearl’s The Book of Why on causality, I have been interested in how we can incorporate causal reasoning in machine learning. This is a complex topic, and I’m not sure yet that it is a complete revolution as Judea Pearl likes to portray it, but it nevertheless introduces a lot of fascinating new ideas. Yoshua Bengio gave an interesting talk (you can find it at 4:45:20 in the livestream of the workshop), even though very similar to his keynote talk, on causal priors for deep learning.

      Bridging AI and Cognitive Science


    Table of Contents


Two weeks ago, I presented to my colleagues the paper from Yurochkin et al. (2019), from NeurIPS 2019. It contains an interesting approach to document classification leading to strong performance and, most importantly, excellent interpretability.

This paper seems interesting to me because it uses two methods with strong theoretical guarantees: optimal transport and topic modelling. Optimal transport looks very promising to me in NLP, and has seen a lot of interest in recent years due to advances in approximation algorithms, such as entropy regularisation. It is also quite refreshing to see approaches using solid results in optimisation, compared to purely experimental deep learning methods.

    Introduction and motivation

    The problem of the paper is to measure similarity (i.e. a distance) between pairs of documents, by incorporating semantic similarities (and not only syntactic artefacts), without encountering scalability issues.

• topic modelling methods (e.g. Latent Dirichlet Allocation), to represent semantically-meaningful groups of words.

Background: optimal transport

The essential backbone of the method is the Wasserstein distance, derived from optimal transport theory. Optimal transport is a fascinating and deep subject, so I won’t enter into the details here. For an introduction to the theory and its applications, check out the excellent book from Peyré and Cuturi (2019) (available on arXiv as well). There are also very nice posts (in French) by Gabriel Peyré on the CNRS maths blog. Many more resources (including slides for presentations) are available at https://optimaltransport.github.io. For a more complete theoretical treatment of the subject, check out Santambrogio (2015), or, if you’re feeling particularly adventurous, Villani (2009).

For this paper, only a superficial understanding of how the Wasserstein distance works is necessary. Optimal transport is an optimisation technique to lift a distance between points in a given metric space, to a distance between probability distributions over this metric space. The historical example is to move piles of dirt around: you know the distance between any two points, and you have piles of dirt lying around. (Optimal transport originated with Monge, and then Kantorovich, both of whom had very clear military applications in mind (either in Revolutionary France, or during WWII). A lot of historical examples move cannon balls, or other military equipment, along a front line.) Now, if you want to move these piles to another configuration (fewer piles, say, or a different repartition of dirt a few metres away), you need to find the most efficient way to move them. The total cost you obtain will define a distance between the two configurations of dirt, and is usually called the earth mover’s distance, which is just an instance of the general Wasserstein metric.

More formally, we start with two sets of points \(x = (x_1, x_2, \ldots, x_n)\) and \(y = (y_1, y_2, \ldots, y_m)\), endowed with probability distributions \(p\) and \(q\), and a cost matrix \(C\) whose entry \(C_{i,j}\) is the distance between \(x_i\) and \(y_j\). The Wasserstein distance is the cost of the optimal transport plan \(P\): \[ W_1(p, q) = \min_{P \in \mathbb{R}_+^{n\times m}} \sum_{i,j} C_{i,j} P_{i,j}, \] where \(P\) is constrained to have marginals \(p\) and \(q\).
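Since this is just a linear program, it can be written down directly. Here is a sketch in Julia (using the JuMP library with the GLPK solver; the function name and toy inputs are my own, and in practice one would rather use an entropy-regularised solver such as Sinkhorn iterations for speed):

using JuMP, GLPK

# Earth mover's distance between histograms p and q, given a cost matrix C.
function emd(C, p, q)
    n, m = size(C)
    model = Model(GLPK.Optimizer)
    @variable(model, P[1:n, 1:m] >= 0)                   # transport plan
    @constraint(model, [i = 1:n], sum(P[i, :]) == p[i])  # marginal on x
    @constraint(model, [j = 1:m], sum(P[:, j]) == q[j])  # marginal on y
    @objective(model, Min, sum(C .* P))
    optimize!(model)
    return objective_value(model)
end

emd([0.0 1.0; 1.0 0.0], [0.3, 0.7], [0.6, 0.4])   # 0.3: move 0.3 of mass at unit cost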

The first one can be precomputed once for all subsequent distances, so it is invariant in the number of documents we have to process. The second one only operates on \(\lvert T \rvert\) topics instead of the full vocabulary: the resulting optimisation problem is much smaller! This is great for performance, as it should be easy now to compute all pairwise distances in a large set of documents.

Another interesting insight is that topics are represented as collections of words (we can keep the top 20 as a visual representation), and documents as collections of topics with weights. Both of these representations are highly interpretable for a human being who wants to understand what’s going on. I think this is one of the strongest aspects of these approaches: both the various representations and the algorithms are fully interpretable. Compared to a deep learning approach, we can make sense of every intermediate step, from the representations of topics to the weights in the optimisation algorithm to compute higher-level distances.

Representation of two documents in topic space, along with how the distance was computed between them. Everything is interpretable: from the documents as collections of topics, to the matchings between topics determining the overall distance between the books (Yurochkin et al. 2019).

    Experiments

The paper is very complete regarding experiments, providing a full evaluation of the method on one particular application: document clustering. They use Latent Dirichlet Allocation to compute topics and GloVe for pretrained word embeddings (Pennington, Socher, and Manning 2014), and Gurobi to solve the optimisation problems. Their code is available on GitHub.

If you want the details, I encourage you to read the full paper. They tested the methods on a wide variety of datasets, some containing very short documents (like Twitter), and some with long documents and a large vocabulary (books). With a simple \(k\)-NN classification, they establish that HOTT performs best on average, especially on large vocabularies (books, the “gutenberg” dataset). It also has a much better computational performance than alternative methods based on regularisation of the optimal transport problem directly on words. So the hierarchical nature of the approach allows considerable gains in performance, along with improvements in interpretability.

What’s really interesting in the paper is the sensitivity analysis: they ran experiments with different word embedding methods (word2vec, (Mikolov et al. 2013)), and with different parameters for the topic modelling (topic truncation, number of topics, etc.). All of these reveal that changes in hyperparameters do not impact the performance of HOTT significantly. This is extremely important in a field like NLP, where most of the time small variations in approach lead to drastically different results.

    Conclusion

All in all, this paper presents a very interesting approach to computing distances between natural-language documents. It is no secret that I like methods with a strong theoretical background (in this case optimisation and optimal transport), guaranteeing stability and benefiting from decades of research in a well-established domain.

    Most importantly, this paper allows for future exploration in document representation with interpretability in mind. This is often added as an afterthought in academic research but is one of the most important topics for the industry, as a system must be understood by end users, often not trained in ML, before being deployed. The notion of topic, and distances as weights, can be understood easily by anyone without significant background in ML or in maths.

Finally, I feel like they did not stop at a simple theoretical argument, but carefully checked on real-world datasets, measuring sensitivity to all the arbitrary choices they had to make. Again, from an industry perspective, this allows the new approach to be implemented quickly and easily, with confidence that it won’t break unexpectedly without extensive testing.

References
Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. “Distributed Representations of Words and Phrases and Their Compositionality.” In Advances in Neural Information Processing Systems 26, 3111–9. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf.

Pennington, Jeffrey, Richard Socher, and Christopher Manning. 2014. “Glove: Global Vectors for Word Representation.” In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–43. Doha, Qatar: Association for Computational Linguistics. https://doi.org/10.3115/v1/D14-1162.

Peyré, Gabriel, and Marco Cuturi. 2019. “Computational Optimal Transport.” Foundations and Trends in Machine Learning 11 (5-6): 355–607. https://doi.org/10.1561/2200000073.

Santambrogio, Filippo. 2015. Optimal Transport for Applied Mathematicians. Vol. 87. Progress in Nonlinear Differential Equations and Their Applications. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-20828-2.

Villani, Cédric. 2009. Optimal Transport: Old and New. Grundlehren Der Mathematischen Wissenschaften 338. Berlin: Springer.

Yurochkin, Mikhail, Sebastian Claici, Edward Chien, Farzaneh Mirzazadeh, and Justin M Solomon. 2019. “Hierarchical Optimal Transport for Document Representation.” In Advances in Neural Information Processing Systems 32, 1599–1609. http://papers.nips.cc/paper/8438-hierarchical-optimal-transport-for-document-representation.pdf.


I find it mildly fascinating that such a straightforward definition of a random matrix can exhibit such non-random properties in its spectrum.

    Simulation

    I ran a quick simulation, thanks to Julia’s great ecosystem for linear algebra and statistical distributions:

using LinearAlgebra
using UnicodePlots

# Sample an n×n matrix from the complex Ginibre ensemble,
# normalised so that the spectrum concentrates in the unit disk.
function ginibre(n)
    return randn((n, n)) * sqrt(1/2n) + im * randn((n, n)) * sqrt(1/2n)
end

# Eigenvalues of a large sample, plotted in the complex plane.
v = eigvals(ginibre(2000))

scatterplot(real(v), imag(v), xlim=[-1.5,1.5], ylim=[-1.5,1.5])

I like using UnicodePlots for this kind of quick-and-dirty plot, directly in the terminal. Here is the output:

    References

1. Hall, Brian C. 2019. “Eigenvalues of Random Matrices in the General Linear Group in the Large-\(N\) Limit.” Notices of the American Mathematical Society 66, no. 4 (Spring): 568-569. https://www.ams.org/journals/notices/201904/201904FullIssue.pdf
2. Ginibre, Jean. “Statistical ensembles of complex, quaternion, and real matrices.” Journal of Mathematical Physics 6.3 (1965): 440-449. https://doi.org/10.1063/1.1704292

    Table of Contents


      Introduction

I have recently bought the book Category Theory from Steve Awodey (Awodey 2010) (the book is awesome, but probably the topic for another post), and a particular passage excited my curiosity:

      Let us begin by distinguishing between the following things: i. categorical foundations for mathematics, ii. mathematical foundations for category theory.

      As for the first point, one sometimes hears it said that category theory can be used to provide “foundations for mathematics,” as an alternative to set theory. That is in fact the case, but it is not what we are doing here. In set theory, one often begins with existential axioms such as “there is an infinite set” and derives further sets by axioms like “every set has a powerset,” thus building up a universe of mathematical objects (namely sets), which in principle suffice for “all of mathematics.”


      This statement is interesting because one often considers category theory as pretty “fundamental”, in the sense that it has no issue with considering what I call “dangerous” notions, such as the category \(\mathbf{Set}\) of all sets, and even the category \(\mathbf{Cat}\) of all categories. Surely a theory this general, that can afford to study such objects, should provide suitable foundations for mathematics? Awodey addresses these issues very explicitly in the section following the quote above, and finds a good way of avoiding circular definitions.

      Now, I remember some basics from my undergrad studies about foundations of mathematics. I was told that if you could define arithmetic, you basically had everything else “for free” (as Kronecker famously said, “natural numbers were created by God, everything else is the work of men”). I was also told that two sets of axioms existed, the Peano axioms and the Zermelo-Fraenkel axioms. Also, I should steer clear of the axiom of choice if I could, because one can do strange things with it, and it is equivalent to many different statements. Finally (and this I knew mainly from Logicomix, I must admit), it is impossible for a set of axioms to be both complete and consistent.

      Given all this, I realised that my knowledge of foundational mathematics was pretty deficient. I do not believe that it is a very important topic that everyone should know about, even though Gödel’s incompleteness theorem is very interesting from a logical and philosophical standpoint. However, I wanted to go deeper on this subject.

In this post, I will try to share my path through Peano’s axioms (Gowers, Barrow-Green, and Leader 2010), because they are very simple, and it is easy to uncover basic algebraic structure from them.

      The Axioms

      The purpose of the axioms is to define a collection of objects that we will call the natural numbers. Here, we place ourselves in the context of first-order logic. Logic is not the main topic here, so I will just assume that I have access to some quantifiers, to some predicates, to some variables, and, most importantly, to a relation \(=\) which is reflexive, symmetric, transitive, and closed over the natural numbers.

      Without further digressions, let us define two symbols \(0\) and \(s\) (called successor) such that:


      First, we prove that every natural number commutes with \(0\).

• \(0+0 = 0+0\).
• For every natural number \(a\) such that \(0+a = a+0\), we have:

\[\begin{align}
0 + s(a) &= s(0+a)\\
&= s(a+0)\\
&= s(a)\\
&= s(a) + 0.
\end{align}\]

      By Axiom 5, every natural number commutes with \(0\).

      We can now prove the main proposition:

• \(\forall a,\quad a+0=0+a\).
• For all \(a\) and \(b\) such that \(a+b=b+a\),

\[\begin{align}
a + s(b) &= s(a+b)\\
&= s(b+a)\\
&= s(b) + a.
\end{align}\]

We used the symmetric counterpart of the second rule for \(+\), namely \(\forall a, \forall b,\quad s(a) + b = s(a+b)\). This can easily be proved by another induction.
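For completeness, here is a sketch of that induction on \(b\) (with induction hypothesis \(s(a) + b = s(a+b)\)):

\[\begin{align}
s(a) + 0 &= s(a) = s(a+0),\\
s(a) + s(b) &= s(s(a) + b) = s(s(a+b)) = s(a + s(b)).
\end{align}\]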


      Going further

      We have imbued our newly created set of natural numbers with a significant algebraic structure. From there, similar arguments will create more structure, notably by introducing another operation \(\times\), and an order \(\leq\).

      It is now a matter of conventional mathematics to construct the integers \(\mathbb{Z}\) and the rationals \(\mathbb{Q}\) (using equivalence classes), and eventually the real numbers \(\mathbb{R}\).

It is remarkable how very few (and very simple, as far as you would consider the induction axiom “simple”) axioms are enough to build an entire theory of mathematics. This sort of thing makes me agree with Eugene Wigner (Wigner 1990) when he says that “mathematics is the science of skillful operations with concepts and rules invented just for this purpose”. We drew some arbitrary rules out of thin air, and derived countless properties and theorems from them, basically for our own enjoyment. (As Wigner would say, it is incredible that any of these fanciful inventions coming out of nowhere turned out to be even remotely useful.) Mathematics is done mainly for the mathematician’s own pleasure!

Mathematics cannot be defined without acknowledging its most obvious feature: namely, that it is interesting — M. Polanyi (Wigner 1990)

References

      Awodey, Steve. 2010. Category Theory. 2nd ed. Oxford Logic Guides 52. Oxford ; New York: Oxford University Press.


      Gowers, Timothy, June Barrow-Green, and Imre Leader. 2010. The Princeton Companion to Mathematics. Princeton University Press.

Wigner, Eugene P. 1990. “The Unreasonable Effectiveness of Mathematics in the Natural Sciences.” In Mathematics and Science, by Ronald E Mickens, 291–306. WORLD SCIENTIFIC. https://doi.org/10.1142/9789814503488_0018.


    A Markov Decision Process is a tuple \((\mathcal{S}, \mathcal{A}, \mathcal{R}, p)\) where:

• \(\mathcal{S}\) is a set of states,
• \(\mathcal{A}\) is a function mapping each state \(s \in \mathcal{S}\) to a set \(\mathcal{A}(s)\) of possible actions for this state. In this post, we will often simplify by using \(\mathcal{A}\) as a set, assuming that all actions are possible for each state,
• \(\mathcal{R} \subset \mathbb{R}\) is a set of rewards,
• and \(p\) is a function representing the dynamics of the MDP:

\[\begin{align}
p &: \mathcal{S} \times \mathcal{R} \times \mathcal{S} \times \mathcal{A} \mapsto [0,1] \\
p(s', r \;|\; s, a) &:= \mathbb{P}(S_t=s', R_t=r \;|\; S_{t-1}=s, A_{t-1}=a),
\end{align}\]

      such that \[ \forall s \in \mathcal{S}, \forall a \in \mathcal{A},\quad \sum_{s', r} p(s', r \;|\; s, a) = 1. \]

    The function \(p\) represents the probability of transitioning to the state \(s'\) and getting a reward \(r\) when the agent is at state \(s\) and chooses action \(a\).

    We will also use occasionally the state-transition probabilities:

\[\begin{align}
p &: \mathcal{S} \times \mathcal{S} \times \mathcal{A} \mapsto [0,1] \\
p(s' \;|\; s, a) &:= \mathbb{P}(S_t=s' \;|\; S_{t-1}=s, A_{t-1}=a) \\
&= \sum_r p(s', r \;|\; s, a).
\end{align}\]

    Rewarding the agent

    The expected reward of a state-action pair is the function

\[\begin{align}
r &: \mathcal{S} \times \mathcal{A} \mapsto \mathbb{R} \\
r(s,a) &:= \mathbb{E}[R_t \;|\; S_{t-1}=s, A_{t-1}=a] \\
&= \sum_r r \sum_{s'} p(s', r \;|\; s, a).
\end{align}\]

    The discounted return is the sum of all future rewards, with a multiplicative factor to give more weights to more immediate rewards: \[ G_t := \sum_{k=t+1}^T \gamma^{k-t-1} R_k, \] where \(T\) can be infinite or \(\gamma\) can be 1, but not both.
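As a quick numerical illustration (a toy example of my own, with a finite horizon and \(\gamma = 0.9\)):

# Discounted return of a finite reward sequence [R_{t+1}, …, R_T].
discounted_return(rewards, γ) = sum(γ^(k-1) * r for (k, r) in enumerate(rewards))

discounted_return([1.0, 0.0, 2.0, 5.0], 0.9)   # 1 + 0.9^2 * 2 + 0.9^3 * 5 ≈ 6.265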


    A policy is a way for the agent to choose the next action to perform.

    A policy is a function \(\pi\) defined as

\[\begin{align}
\pi &: \mathcal{A} \times \mathcal{S} \mapsto [0,1] \\
\pi(a \;|\; s) &:= \mathbb{P}(A_t=a \;|\; S_t=s).
\end{align}\]

    In order to compare policies, we need to associate values to them.

    The state-value function of a policy \(\pi\) is

\[\begin{align}
v_{\pi} &: \mathcal{S} \mapsto \mathbb{R} \\
v_{\pi}(s) &:= \text{expected return when starting in $s$ and following $\pi$} \\
v_{\pi}(s) &:= \mathbb{E}_{\pi}\left[ G_t \;|\; S_t=s\right] \\
v_{\pi}(s) &= \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;|\; S_t=s\right]
\end{align}\]

    We can also compute the value starting from a state \(s\) by also taking into account the action taken \(a\).

    The action-value function of a policy \(\pi\) is

\[\begin{align}
q_{\pi} &: \mathcal{S} \times \mathcal{A} \mapsto \mathbb{R} \\
q_{\pi}(s,a) &:= \text{expected return when starting from $s$, taking action $a$, and following $\pi$} \\
q_{\pi}(s,a) &:= \mathbb{E}_{\pi}\left[ G_t \;|\; S_t=s, A_t=a \right] \\
q_{\pi}(s,a) &= \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;|\; S_t=s, A_t=a\right]
\end{align}\]
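To make these definitions concrete, here is a sketch of iterative policy evaluation for a finite MDP (my own formulation, not from the source: tabular arrays p[s′, s, a] for the state-transition probabilities, r[s, a] for the expected rewards, and π[a, s] for the policy, all as defined above, with γ < 1 so the iteration converges):

# Repeatedly apply the Bellman expectation backup
#   v(s) ← Σ_a π(a|s) (r(s,a) + γ Σ_{s′} p(s′|s,a) v(s′))
# until the value function stops changing.
function policy_evaluation(p, r, π, γ; tol = 1e-8)
    nS, nA = size(r)
    v = zeros(nS)
    while true
        v_new = [sum(π[a, s] * (r[s, a] + γ * sum(p[s′, s, a] * v[s′] for s′ in 1:nS))
                     for a in 1:nA)
                 for s in 1:nS]
        maximum(abs.(v_new .- v)) < tol && return v_new
        v = v_new
    end
end

The same backup with a maximum over actions instead of the expectation over \(\pi\) is the basis of value iteration, which is where the quest for the optimal policy begins.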

    The quest for the optimal policy

    References


    Table of Contents

    The APL family of languages

    Why APL?

    I recently got interested in APL, an array-based programming language. In APL (and derivatives), we try to reason about programs as series of transformations of multi-dimensional arrays. This is exactly the kind of style I like in Haskell and other functional languages, where I also try to use higher-order functions (map, fold, etc) on lists or arrays. A developer only needs to understand these abstractions once, instead of deconstructing each loop or each recursive function encountered in a program.

• We draw a random lattice of size ⍺ with L ⍺.
• We apply to it our update function, with \(β=10\), ⍵ times (using the ⍣ operator, which applies a function \(n\) times).
• Finally, we display ¯1 as a space and 1 as a domino ⌹.

Final output, with an \(80\times 80\) random lattice, after 50000 update steps:

diff --git a/site.hs b/site.hs
index 401f8e8..71b8d27 100644
--- a/site.hs
+++ b/site.hs
@@ -172,12 +172,19 @@ customPandocCompiler withTOC =
       newExtensions = defaultExtensions `mappend` customExtensions
       writerOptions = defaultHakyllWriterOptions
         { writerExtensions = newExtensions
-        , writerHTMLMathMethod = KaTeX ""
+        , writerHTMLMathMethod = MathJax ""
         }
+      -- below copied from https://www.gwern.net/hakyll.hs
+      -- below copied from https://github.com/jaspervdj/hakyll/blob/e8ed369edaae1808dffcc22d1c8fb1df7880e065/web/site.hs#L73 because god knows I don't know what this type bullshit is either:
+      -- "When did it get so hard to compile a string to a Pandoc template?"
+      tocTemplate =
+        either error id $ either (error . show) id $
+        runPure $ runWithDefaultPartials $
+        compileTemplate "" "Table of Contents\n$toc$\n$body$"
       writerOptionsWithTOC = writerOptions
         { writerTableOfContents = True
         , writerTOCDepth = 2
-        , writerTemplate = Just "Table of Contents\n$toc$\n$body$"
+        , writerTemplate = Just tocTemplate
         }
       readerOptions = defaultHakyllReaderOptions
   in do
diff --git a/stack.yaml b/stack.yaml
index ae10b64..0af0c5e 100644
--- a/stack.yaml
+++ b/stack.yaml
@@ -17,7 +17,7 @@
 # # resolver: ./custom-snapshot.yaml
 # resolver: https://example.com/snapshots/2018-01-01.yaml
-resolver: lts-12.18
+resolver: lts-16.11
 
 # User packages to be built.
 # Various formats can be used as shown in the example below.
@@ -38,8 +38,11 @@ packages:
 # using the same syntax as the packages field.
 # (e.g., acme-missiles-0.3)
 extra-deps:
-  - pandoc-sidenote-0.19.0.0@sha256:f5a5fdab1900da7b26ca673d14302b0a27b56024ca811babcacab79ba9614e90,1196
+  - github: jez/pandoc-sidenote
+    commit: 7aacfa4b20a44562de48725dea812c5de2c5aeac
+  #- pandoc-sidenote-0.19.0.0@sha256:f5a5fdab1900da7b26ca673d14302b0a27b56024ca811babcacab79ba9614e90,1196
   - monad-gen-0.3.0.1@sha256:a2569465cfbd468d3350ef25de56b3362580e77537224313aab1210f40804a3b,821
+  - hakyll-4.13.4.0@sha256:d97cb1b1cf7b901ce049e6e0aa5b1ac702cf52d2e475f5d3fe2de3694bb7dc7b,8867
 
 # Override default flag values for local packages and extra-deps
 # flags: {}
diff --git a/stack.yaml.lock b/stack.yaml.lock
index 261e0cb..50d4931 100644
--- a/stack.yaml.lock
+++ b/stack.yaml.lock
@@ -5,12 +5,19 @@
 packages:
 - completed:
-    hackage: pandoc-sidenote-0.19.0.0@sha256:f5a5fdab1900da7b26ca673d14302b0a27b56024ca811babcacab79ba9614e90,1196
+    size: 4871
+    url: https://github.com/jez/pandoc-sidenote/archive/7aacfa4b20a44562de48725dea812c5de2c5aeac.tar.gz
+    cabal-file:
+      size: 1578
+      sha256: 27753af099fd14469ec326713c5437ff8dcf9307c9711ab8d319a7ae9678a862
+    name: pandoc-sidenote
+    version: 0.20.0.0
+    sha256: 280c0ef67602128f0e8f9f184868f14a71c3938bebf8644c55e6988413db7e9f
     pantry-tree:
-      size: 273
-      sha256: 0d23026f86da13ca57aaa0041bfe719cf7acbeddbadbbe1bb8e265316dcfce49
+      size: 575
+      sha256: ace183ff53f1fe2987fc3ae77f38fa8c5ee1e656fe3bed0941691150bc1771ab
   original:
-    hackage: pandoc-sidenote-0.19.0.0@sha256:f5a5fdab1900da7b26ca673d14302b0a27b56024ca811babcacab79ba9614e90,1196
+    url: https://github.com/jez/pandoc-sidenote/archive/7aacfa4b20a44562de48725dea812c5de2c5aeac.tar.gz
 - completed:
     hackage: monad-gen-0.3.0.1@sha256:a2569465cfbd468d3350ef25de56b3362580e77537224313aab1210f40804a3b,821
     pantry-tree:
@@ -18,9 +25,16 @@
       sha256: a825dc8dcc3ee27dbbc1b71f6853703790ed320249b8f8510bb36d726664e795
   original:
     hackage: monad-gen-0.3.0.1@sha256:a2569465cfbd468d3350ef25de56b3362580e77537224313aab1210f40804a3b,821
+- completed:
+    hackage: hakyll-4.13.4.0@sha256:d97cb1b1cf7b901ce049e6e0aa5b1ac702cf52d2e475f5d3fe2de3694bb7dc7b,8867
+    pantry-tree:
+      size: 7841
+      sha256: e31b612480429149bd2c042dd49cdb1a5eb9c807d4acce4a971a6da3d6f5db81
+  original:
+    hackage: hakyll-4.13.4.0@sha256:d97cb1b1cf7b901ce049e6e0aa5b1ac702cf52d2e475f5d3fe2de3694bb7dc7b,8867
 snapshots:
 - completed:
-    size: 506303
-    url: https://raw.githubusercontent.com/commercialhaskell/stackage-snapshots/master/lts/12/18.yaml
-    sha256: 44c58c3e2bfc57455cb54ce3860847aaf69d9ee11baad238d7d0a82b7034a474
-  original: lts-12.18
+    size: 532381
+    url: https://raw.githubusercontent.com/commercialhaskell/stackage-snapshots/master/lts/16/11.yaml
+    sha256: 1f43c4ad661a114a4f9dd4580988f30da1208d844c097714f5867c52a02e0aa1
+  original: lts-16.11
diff --git a/templates/default.html b/templates/default.html
index 3ed8ef5..b53a018 100644
--- a/templates/default.html
+++ b/templates/default.html
@@ -53,11 +57,6 @@
 $endif$
-$if(toc)$
-$endif$
 $body$