After Phase I, here are my solutions to Phase II problems. The full code is included in the post, but everything is also available on GitHub.
A PDF of the problem descriptions is available on the competition website, or directly from my GitHub repo.
The submission guidelines gave a template where everything is defined in a Contest2020.Problems namespace. I kept the default values for ⎕IO and ⎕ML because the problems were not particularly easier with ⎕IO←0.
:Namespace Contest2020
:Namespace Problems
(⎕IO ⎕ML ⎕WX)←1 1 3
∇ score←dd DiveScore scores
:If 7=≢scores
scores←scores[¯2↓2↓⍋scores]
:ElseIf 5=≢scores
scores←scores[¯1↓1↓⍋scores]
:Else
scores←scores
:EndIf
score←2(⍎⍕)dd×+/scores
∇
This is a very straightforward implementation of the algorithm described in the problem description. I decided to switch explicitly on the size of the input vector because I feel it is more natural. For the cases with 5 or 7 judges, we use Drop (↓) to remove the lowest and highest scores. At the end, we sum up the scores with +/ and multiply them by dd. The last operation, 2(⍎⍕), is a train using Format (Dyadic) to round to 2 decimal places, and Execute to get actual numbers and not strings.
∇ steps←{p}Steps fromTo;segments;width
width←|-/fromTo
:If 0=⎕NC'p' ⍝ No left argument: same as Problem 5 of Phase I
segments←0,⍳width
:ElseIf p<0 ⍝ -⌊p is the number of equally-sized steps to take
segments←(-⌊p){0,⍵×⍺÷⍨⍳⍺}width
:ElseIf p>0 ⍝ p is the step size
segments←p{⍵⌊⍺×0,⍳⌈⍵÷⍺}width
:ElseIf p=0 ⍝ As if we took zero step
segments←0
:EndIf
⍝ Take into account the start point and the direction.
steps←fromTo{(⊃⍺)+(-×-/⍺)×⍵}segments
∇
This is an extension to Problem 5 of Phase I. In each case, we compute the “segments”, i.e., the steps starting from 0. In a last step, common to all cases, we add the correct starting point and correct the direction if need be.
To compute equally-sized steps, we first divide the segment \([0, 1]\) in p equal segments with (⍳p)÷p. This subdivision can then be multiplied by the width to obtain the required segments.
When p is the step size, we just divide the width by the step size (rounded to the next largest integer) to get the required number of segments. If the last segment is too large, we “crop” it to the width with Minimum (⌊).
∇ urls←PastTasks url;r;paths
r←HttpCommand.Get url
paths←('[a-zA-Z0-9_/]+\.pdf'⎕S'&')r.Data
urls←('https://www.dyalog.com/'∘,)¨paths
∇
I decided to use HttpCommand for this task, since it is simply one ]load HttpCommand away and should be platform-independent.
Parsing XML is not something I consider “fun” in the best of cases, and I feel like APL is not the best language to do this kind of thing. Given how simple the task is, I just decided to find the relevant bits with a regular expression, using Search (⎕S).
After finding all the strings vaguely resembling a PDF file name (only alphanumeric characters and underscores, with a .pdf extension), I just concatenate them to the base URL of the Dyalog domain.
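For comparison, the same scraping logic in Python might look like this (using the third-party requests library instead of HttpCommand):

import re
import requests

def past_tasks(url):
    html = requests.get(url).text
    # Paths made of alphanumerics, underscores and slashes, ending in .pdf.
    paths = re.findall(r'[a-zA-Z0-9_/]+\.pdf', html)
    return ['https://www.dyalog.com/' + p for p in paths]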
The first task can be solved by decomposing it into several functions.
⍝ Test if a DNA string is a reverse palindrome.
isrevp←{⍵≡⌽'TAGC'['ATCG'⍳⍵]}
First, we compute the complement of a DNA string (using simple indexing) and test if its Reverse (⌽) is equal to the original string.
⍝ Generate all subarrays (position, length) pairs, for 4 ≤ length ≤ 12.
subarrays←{⊃,/(⍳⍵),¨¨3↓¨⍳¨12⌊1+⍵-⍳⍵}
We first compute all the possible lengths for each starting point. For instance, the last element cannot have any (position, length) pair associated to it, because there are not three elements following it. So we crop the possible lengths to \([4, 12]\). For instance, for an array of size 10:
{3↓¨⍳¨12⌊1+⍵-⍳⍵}10
┌──────────────┬───────────┬─────────┬───────┬─────┬───┬─┬┬┬┐
│4 5 6 7 8 9 10│4 5 6 7 8 9│4 5 6 7 8│4 5 6 7│4 5 6│4 5│4││││
└──────────────┴───────────┴─────────┴───────┴─────┴───┴─┴┴┴┘
Then, we just add the corresponding starting position to each length (1 for the first block, 2 for the second, and so on). Finally, we flatten everything.
∇ r←revp dna;positions
positions←subarrays⍴dna
⍝ Filter subarrays which are reverse palindromes.
r←↑({isrevp dna[¯1+⍵[1]+⍳⍵[2]]}¨positions)/positions
∇
For each possible (position, length) pair, we get the corresponding DNA substring with dna[¯1+⍵[1]+⍳⍵[2]] (adding ¯1 is necessary because ⎕IO←1). We test if this substring is a reverse palindrome using isrevp above. Replicate (/) then selects only the (position, length) pairs for which the substring is a reverse palindrome.
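The whole pipeline is perhaps easier to follow in a Python sketch (1-based positions, to match the APL output):

COMPLEMENT = str.maketrans('ATCG', 'TAGC')

def is_reverse_palindrome(s):
    # A reverse palindrome equals the reverse of its complement.
    return s == s.translate(COMPLEMENT)[::-1]

def revp(dna):
    return [(i + 1, l)
            for i in range(len(dna))
            for l in range(4, min(12, len(dna) - i) + 1)
            if is_reverse_palindrome(dna[i:i + l])]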
The second task is just about counting the number of subsets modulo 1,000,000. So we just need to compute \(2^n \mod 1000000\) for any positive integer \(n\leq1000\).
sset←{((1E6|2∘×)⍣⍵)1}
Since we cannot just compute \(2^n\) directly and take the remainder, we use modular arithmetic to stay mod 1,000,000 during the whole computation. The dfn (1E6|2∘×) doubles its argument mod 1,000,000. So we just apply this function \(n\) times using the Power operator (⍣), with an initial value of 1.
First solution: ((1+⊢)⊥⊣) computes the total return for a vector of amounts ⍺ and a vector of rates ⍵. It is applied to every prefix subarray of amounts and rates to get all intermediate values. However, this has quadratic complexity.
rr←(,\⊣)((1+⊢)⊥⊣)¨(,\⊢)
Second solution: We want to be able to use the recurrence relation (recur) and scan through the vectors of amounts and rates, accumulating the total value at every time step. However, APL evaluation is right-associative, so a simple Scan (recur\amounts,¨rates) would not give the correct result, since recur is not associative and we need to evaluate it left-to-right. (In any case, Scan would have quadratic complexity here, so it would not bring any benefit over the previous solution.) What we need is something akin to Haskell’s scanl function, which would evaluate left to right in \(O(n)\) time. (There is an interesting StackOverflow answer explaining the behaviour of Scan and comparing it to Haskell’s scanl.) This is what we do here, accumulating values from left to right. (This is inspired by dfns.ascan, although heavily simplified.)
rr←{recur←{⍵[1]+⍺×1+⍵[2]} ⋄ 1↓⌽⊃{(⊂(⊃⍵)recur⍺),⍵}/⌽⍺,¨⍵}
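The same left-to-right accumulation is perhaps clearer as an explicit loop, here in Python (the loop plays the role of scanl):

def rr(amounts, rates):
    values, total = [], 0
    for a, r in zip(amounts, rates):
        # Recurrence: new total = amount + previous total × (1 + rate).
        total = a + total * (1 + r)
        values.append(total)
    return values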
For the second task, there is an explicit formula for cashflow calculations, so we can just apply it.
pv←{+/⍺÷×\1+⍵}
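A Python sketch of the same formula, discounting each amount by the running product of (1 + rate):

from itertools import accumulate
from operator import mul

def pv(amounts, rates):
    factors = accumulate((1 + r for r in rates), mul)
    return sum(a / f for a, f in zip(amounts, factors))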
∇ text←templateFile Merge jsonFile;template;ns
template←⊃⎕NGET templateFile 1
ns←⎕JSON⊃⎕NGET jsonFile
⍝ We use a simple regex search and replace on the
⍝ template.
text←↑('@[a-zA-Z]*@'⎕R{ns getval ¯1↓1↓⍵.Match})template
∇
We first read the template and the JSON values from their files. The ⎕NGET function reads simple text files, and ⎕JSON extracts the key-value pairs as a namespace.
Assuming all variable names contain only letters, we use the regex @[a-zA-Z]*@ to match variable names enclosed between @ symbols. The function getval then returns the appropriate value, and we can replace the variable name in the template.
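The same search-and-replace logic, sketched in Python with re.sub and a replacement function (file handling simplified; the behaviour for literal @ and unknown names mirrors getval, explained below):

import json
import re

def merge(template_file, json_file):
    with open(template_file) as f:
        template = f.read()
    with open(json_file) as f:
        values = json.load(f)

    def getval(match):
        var = match.group(0)[1:-1]     # strip the enclosing '@'
        if var == '':                   # '@@' is a literal '@'
            return '@'
        return str(values[var]) if var in values else '???'

    return re.sub(r'@[a-zA-Z]*@', getval, template)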
∇ val←ns getval var
:If ''≡var ⍝ literal '@'
val←'@'
:ElseIf (⊂var)∊ns.⎕NL ¯2
val←⍕ns⍎var
:Else
val←'???'
:EndIf
∇
This function takes the namespace mapping the variable names to their respective values, and the name of a variable. There are three cases: an empty variable name comes from matching @@ in the template, which corresponds to a literal @; a name defined in the namespace is replaced by its value, formatted as text; and an unknown variable name is replaced by ???.
CheckDigit←{10|-⍵+.×11⍴3 1}
The check digit satisfies the equation \[ 3 x_{1}+x_{2}+3 x_{3}+x_{4}+3 x_{5}+x_{6}+3 x_{7}+x_{8}+3 x_{9}+x_{10}+3 x_{11}+x_{12} \equiv 0 \bmod 10, \] therefore, \[ x_{12} \equiv -(3 x_{1}+x_{2}+3 x_{3}+x_{4}+3 x_{5}+x_{6}+3 x_{7}+x_{8}+3 x_{9}+x_{10}+3 x_{11}) \bmod 10. \]
Translated to APL, we just take the dot product between the first 11 digits of the barcode and 11⍴3 1, negate it, and take the remainder by 10.
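In Python, with the weights written out explicitly:

def check_digit(digits):
    # digits: the first 11 digits of the barcode.
    weights = [3, 1] * 5 + [3]
    # Python's % always returns a non-negative remainder.
    return -sum(w * d for w, d in zip(weights, digits)) % 10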
⍝ Left and right representations of digits. Decoding
⍝ the binary representation from decimal is more
⍝ compact than writing everything explicitly.
lrepr←⍉(7⍴2)⊤13 25 19 61 35 49 47 59 55 11
rrepr←~¨lrepr
For the second task, the first thing we need to do is save the representation of digits. To save space, I did not encode the binary representation explicitly, instead using a decimal representation that I then decode in base 2. The right representation is just the bitwise negation.
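The same decoding, sketched in Python: each decimal value expands to a 7-bit pattern, and the right-hand representations are the bitwise negations:

LREPR = [[int(b) for b in f'{n:07b}']
         for n in [13, 25, 19, 61, 35, 49, 47, 59, 55, 11]]
RREPR = [[1 - b for b in row] for row in LREPR]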
∇ bits←WriteUPC digits;left;right
:If (11=≢digits)∧∧/digits∊0,⍳9
left←,lrepr[1+6↑digits;]
right←,rrepr[1+6↓digits,CheckDigit digits;]
bits←1 0 1,left,0 1 0 1 0,right,1 0 1
:Else
bits←¯1
:EndIf
∇
First of all, if the vector digits does not have exactly 11 elements, all between 0 and 9, it is an error and we return ¯1.
Then, we take the first 6 digits and encode them with lrepr, and the last 5 digits plus the check digit encoded with rrepr. In each case, adding 1 is necessary because ⎕IO←1. We return the final bit array with the required beginning, middle, and end guard patterns.
∇ digits←ReadUPC bits
:If 95≠⍴bits ⍝ incorrect number of bits
digits←¯1
:Else
⍝ Test if the barcode was scanned right-to-left.
:If 0=2|+/bits[3+⍳7]
bits←⌽bits
:EndIf
digits←({¯1+lrepr⍳⍵}¨(7/⍳6)⊆42↑3↓bits),{¯1+rrepr⍳⍵}¨(7/⍳6)⊆¯42↑¯3↓bits
:If ~∧/digits∊0,⍳9 ⍝ incorrect parity
digits←¯1
:ElseIf (⊃⌽digits)≠CheckDigit ¯1↓digits ⍝ incorrect check digit
digits←¯1
:EndIf
:EndIf
∇
If the input does not have exactly 95 bits, we return ¯1. Then, we test whether the barcode was scanned right-to-left by checking the parity of the first digit: left-hand representations always have an odd number of 1 bits, so if the first 7-bit group has even parity, we Reverse (⌽) the whole input. We then take the right digits (¯42↑¯3↓bits, i.e. the last 42 bits once the end guard pattern is dropped), separate the different digits using Partition (⊆), and look up each of them in the rrepr vector using Index Of (⍳). We do the same for the left digits with the lrepr vector. Finally, we check for scanning errors: a 7-bit pattern that is not found in the lrepr and rrepr vectors yields an out-of-range digit, and we also verify the check digit. In both cases we return ¯1.
∇ parts←Balance nums;subsets;partitions
⍝ This is a brute force solution, running in
⍝ exponential time. We generate all the possible
⍝ partitions, filter out those which are not
⍝ balanced, and return the first matching one. There
⍝ are more advanced approach running in
⍝ pseudo-polynomial time (based on dynamic
⍝ programming, see the "Partition problem" Wikipedia
⍝ page), but they are not warranted here, as the
⍝ input size remains fairly small.
⍝ Generate all partitions of a vector of a given
⍝ size, as binary mask vectors.
subsets←{1↓2⊥⍣¯1⍳2*⍵}
⍝ Keep only the subsets whose sum is exactly
⍝ (+/nums)÷2.
partitions←nums{((2÷⍨+/⍺)=⍺+.×⍵)/⍵}subsets⍴nums
:If 0=≢,partitions
⍝ If no partition satisfies the above
⍝ criterion, we return ⍬.
parts←⍬
:Else
⍝ Otherwise, we return the first possible
⍝ partition.
parts←nums{((⊂,(⊂~))⊃↓⍉⍵)/¨2⍴⊂⍺}partitions
:EndIf
∇
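The brute-force idea translates directly to Python, enumerating binary masks (this version returns None instead of ⍬ when no balanced partition exists):

from itertools import product

def balance(nums):
    total = sum(nums)
    if total % 2 == 0:
        for mask in product([0, 1], repeat=len(nums)):
            # Keep the first mask whose selected subset sums to half the total.
            if 2 * sum(n for n, m in zip(nums, mask) if m) == total:
                return ([n for n, m in zip(nums, mask) if m],
                        [n for n, m in zip(nums, mask) if not m])
    return None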
This is the only problem that I didn’t complete. It required parsing the files containing the graphical representations of the trees, which was needlessly complex and, quite frankly, hard and boring with a language like APL.
However, the next part is interesting: once we have a matrix of coefficients representing the relationships between the weights, we can solve the system of equations. Matrix Divide (⌹) will find one solution to the system. Since the system is overdetermined, we fix A=1 to find one possible solution. Since we want integer weights, the solution we find is smaller than the one we want, and may contain fractional weights. So we multiply everything by the Lowest Common Multiple (∧) to get the smallest integer weights.
∇ weights←Weights filename;mobile;branches;mat
⍝ Put your code and comments below here
⍝ Parse the mobile input file.
mobile←↑⊃⎕NGET filename 1
branches←⍸mobile∊'┌┴┐'
⍝ TODO: Build the matrix of coefficients mat.
⍝ Solve the system of equations (arbitrarily setting
⍝ the first variable at 1 because the system is
⍝ overdetermined), then multiply the coefficients by
⍝ their least common multiple to get the smallest
⍝ integer weights.
weights←((1∘,)×(∧/÷))mat[;1]⌹1↓[2]mat
∇
:EndNamespace
:EndNamespace
I’ve always been quite fond of APL and its “array-oriented” approach to programming. (See my previous post on simulating the Ising model with APL; it also contains more background on APL.) Every year, Dyalog (the company behind probably the most popular APL implementation) organises a competition with various challenges in APL.
The Dyalog APL Problem Solving Competition consists of two phases: Phase I, a set of ten short puzzles, and Phase II, a set of larger problems.
In 2018, I participated in the competition, entering only Phase I (my solutions are on GitHub). Since I was a student at the time, I was eligible for a prize, and I won $100 for a 10-line submission, which is quite good! This year, I entered both phases. I explain my solutions to Phase I in this post. Another post will contain annotated solutions for Phase II problems.
The full code for my submission is on GitHub at dlozeve/apl-competition-2020, but everything is reproduced in this post.
Write a function that, given a right argument Y which is a scalar or a non-empty vector and a left argument X which is a single non-zero integer so that its absolute value is less or equal to ≢Y, splits Y into a vector of two vectors according to X, as follows:

- If X>0, the first vector contains the first X elements of Y and the second vector contains the remaining elements.
- If X<0, the second vector contains the last |X elements of Y and the first vector contains the remaining elements.
Solution: (0>⊣)⌽((⊂↑),(⊂↓))
There are three nested trains here. (Trains are nice to read, even if they are easy to abuse, and generally make for shorter dfns, which is better for Phase I.) The first one, ((⊂↑),(⊂↓)), uses the two functions Take (↑) and Drop (↓) to build a nested array consisting of the two outputs we need. (Take and Drop already have the behaviour needed regarding negative arguments.) However, if the left argument is negative, the two arrays will not be in the correct order. So we need a way to reverse them if X<0.
The second train (0>⊣) will return 1 if its left argument is negative. From this, we can use Rotate (⌽) to correctly order the nested array, in the last train.
UTF-8 encodes Unicode characters using 1-4 integers for each character. Dyalog APL includes a system function, ⎕UCS, that can convert characters into integers and integers into characters. The expression 'UTF-8'∘⎕UCS converts between characters and UTF-8. Consider the following:
'UTF-8'∘⎕UCS 'D¥⍺⌊○9'
68 194 165 226 141 186 226 140 138 226 151 139 57
'UTF-8'∘⎕UCS 68 194 165 226 141 186 226 140 138 226 151 139 57
D¥⍺⌊○9
How many integers does each character use?
'UTF-8'∘⎕UCS¨ 'D¥⍺⌊○9' ⍝ using ]Boxing on
┌──┬───────┬───────────┬───────────┬───────────┬──┐
│68│194 165│226 141 186│226 140 138│226 151 139│57│
└──┴───────┴───────────┴───────────┴───────────┴──┘
The rule is that an integer in the range 128 to 191 (inclusive) continues the character of the previous integer (which may itself be a continuation). With that in mind, write a function that, given a right argument which is a simple integer vector representing valid UTF-8 text, encloses each sequence of integers that represent a single character, like the result of 'UTF-8'∘⎕UCS¨'UTF-8'∘⎕UCS, but does not use any system functions (names beginning with ⎕).
Solution: {(~⍵∊127+⍳64)⊂⍵}
First, we build a binary array from the string, encoding each continuation character as 0, and all the others as 1. Next, we can use this binary array with Partitioned Enclose (⊂) to return the correct output.
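The same partitioning, written out in Python:

def utf8_chars(ints):
    groups = []
    for b in ints:
        # Bytes in 128-191 continue the previous character.
        if 128 <= b <= 191 and groups:
            groups[-1].append(b)
        else:
            groups.append([b])
    return groups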
A Microsoft Excel spreadsheet numbers its rows counting up from 1. However, Excel’s columns are labelled alphabetically — beginning with A–Z, then AA–AZ, BA–BZ, up to ZA–ZZ, then AAA–AAZ and so on.
Write a function that, given a right argument which is a character scalar or non-empty vector representing a valid character Excel column identifier between A and XFD, returns the corresponding column number.
Solution: 26⊥⎕A∘⍳
We use the alphabet ⎕A and Index Of (⍳) to compute the index in the alphabet of every character. As a train, this can be done with (⎕A∘⍳). We then obtain an array of numbers, each representing a letter from 1 to 26. The Decode (⊥) function can then turn this base-26 number into the expected result.
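The same base-26 decode with digit values 1-26 ('A' is 1, 'Z' is 26), sketched in Python:

from functools import reduce

def column_number(name):
    return reduce(lambda acc, c: 26 * acc + ord(c) - ord('A') + 1, name, 0)

For example, column_number('AA') is 27, as expected.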
Write a function that, given a right argument which is an integer array of year numbers greater than or equal to 1752 and less than 4000, returns a result of the same shape as the right argument where 1 indicates that the corresponding year is a leap year (0 otherwise).
A leap year algorithm can be found here.
Solution: 1 3∊⍨(0+.=400 100 4∘.|⊢)
According to the algorithm, a year is a leap year in two situations: either it is divisible by 4 but not by 100, or it is divisible by 400.
The train (400 100 4∘.|⊢) will test if each year in the right argument is divisible by 400, 100, and 4, using an Outer Product. We then use an Inner Product to count how many times each year is divisible by one of these numbers. If the count is 1 or 3, it is a leap year. Note that we use Commute (⍨) to keep the dfn as a train, and to preserve the natural right-to-left reading of the algorithm.
Write a function that, given a right argument of 2 integers, returns a vector of the integers from the first element of the right argument to the second, inclusively.
Solution: {(⊃⍵)+(-×-/⍵)×0,⍳|-/⍵}
First, we have to compute the range of the output, which is the absolute value of the difference between the two integers, |-/⍵. From this, we compute the actual sequence, including zero: 0,⍳|-/⍵. (If we had ⎕IO←0, we could have written ⍳|1+-/⍵, but this is the same number of characters.)
This sequence will always be nondecreasing, but we have to make it decreasing if needed, so we multiply it by the opposite of the sign of -/⍵. Finally, we just have to start the sequence at the first element of ⍵.
Write a function that, given a right argument which is an integer vector and a left argument which is an integer scalar, reorders the right argument so any elements equal to the left argument come first while all other elements keep their order.
Solution: {⍵[⍋⍺≠⍵]}
⍺≠⍵ will return a binary vector marking as 0 all elements equal to the left argument. Using this vector as the sort key with Grade Up (⍋) returns the expected result: Grade Up is stable, so elements with equal keys keep their original order.
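The same idea in Python, relying on the fact that sorted is stable:

def to_front(a, v):
    # False (0) sorts before True (1), so elements equal to a come first.
    return sorted(v, key=lambda x: x != a)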
A common technique for encoding a set of on/off states is to use a value of \(2^n\) for the state in position \(n\) (origin 0), 1 if the state is “on” or 0 for “off” and then add the values. Dyalog APL’s component file permission codes are an example of this. For example, if you wanted to grant permissions for read (access code 1), append (access code 8) and rename (access code 128) then the resulting code would be 137 because that’s 1 + 8 + 128.
Write a function that, given a non-negative right argument which is an integer scalar representing the encoded state and a left argument which is an integer scalar representing the encoded state settings that you want to query, returns 1 if all of the codes in the left argument are found in the right argument (0 otherwise).
Solution: {f←⍸∘⌽(2∘⊥⍣¯1)⋄∧/(f⍺)∊f⍵}
The difficult part is to find the set of states for an integer. We need a function that will return 1 8 128 (or an equivalent representation) for an input of 137. To do this, we need the base-2 representation of \(137 = 1 + 8 + 128 = 2^0 + 2^3 + 2^7 = 10010001_2\). The function (2∘⊥⍣¯1) will return the base-2 representation of its argument, and by reversing it and finding where the non-zero elements are, we find the correct exponents (1 3 7 in this case). That is what the function f does.
Next, we just need to check that all elements of f⍺ are also in f⍵.
A zigzag number is an integer in which the difference in magnitude of each pair of consecutive digits alternates from positive to negative or negative to positive.
Write a function that takes a single integer greater than or equal to 100 and less than \(10^{15}\) as its right argument and returns a 1 if the integer is a zigzag number, 0 otherwise.
Solution: ∧/2=∘|2-/∘×2-/(10∘⊥⍣¯1)
First, we decompose a number into an array of digits, using (10∘⊥⍣¯1) (Decode (⊥) in base 10). Then, we Reduce N Wise to compute the difference between each pair of digits, take the sign, and ensure that the signs are indeed alternating.
Write a function that, given a right argument which is an integer scalar or vector, returns a 1 if the values of the right argument conform to the following pattern (0 otherwise):
- The elements increase or stay the same until the “apex” (the highest value) is reached
- After the apex, any remaining values decrease or remain the same
Solution: {∧/(⍳∘≢≡⍋)¨(⊂((⊢⍳⌈/)↑⊢),⍵),⊂⌽((⊢⍳⌈/)↓⊢),⍵}
How do we approach this? First we have to split the vector at the “apex”. The train (⊢⍳⌈/) will return the index of (⍳) the maximum element.
(⊢⍳⌈/)1 3 3 4 5 2 1
5
Combined with Take (↑) and Drop (↓), we build a two-element vector containing both parts, in ascending order (we Reverse (⌽) one of them). Note that we have to Ravel (,) the argument to avoid rank errors in Index Of.
{(⊂((⊢⍳⌈/)↑⊢),⍵),⊂⌽((⊢⍳⌈/)↓⊢),⍵}1 3 3 4 5 2 1
┌─────────┬───┐
│1 3 3 4 5│1 2│
└─────────┴───┘
Next, (⍳∘≢≡⍋) on each of the two vectors will test if they are non-decreasing (i.e. if the ranks of all the elements correspond to a simple range from 1 to the size of the vector).
Write a function that takes as its right argument a vector of simple arrays of rank 2 or less (scalar, vector, or matrix). Each simple array will consist of either non-negative integers or printable ASCII characters. The function must return a simple character array that displays identically to what {⎕←⍵}¨ displays when applied to the right argument.
Solution: {↑⊃,/↓¨⍕¨⍵}
The first step is to Format (⍕) everything to get strings. A lot of trial and error is always necessary when dealing with nested arrays, and this being about formatting exacerbates the problem.
The next step would be to “stack everything vertically”, so we will need Mix (↑) at some point. However, if we do it immediately we don’t get the correct result:
{↑⍕¨⍵}(3 3⍴⍳9)(↑'Adam' 'Michael')
1 2 3
4 5 6
7 8 9
Adam Michael
Mix is padding with spaces both horizontally (necessary, as we want the output to be a simple array of characters) and vertically (not what we want). We will have to decompose everything line by line, and then mix all the lines together. This is exactly what Split (↓), the dual of Mix, does:
{↓¨⍕¨⍵}(3 3⍴⍳9)(↑'Adam' 'Michael')(⍳10) '*'(5 5⍴⍳25)
┌───────────────────┬─────────────────┬──────────────────────┬─┬───────────────
│┌─────┬─────┬─────┐│┌───────┬───────┐│┌────────────────────┐│*│┌──────────────
││1 2 3│4 5 6│7 8 9│││Adam │Michael│││1 2 3 4 5 6 7 8 9 10││ ││ 1 2 3 4 5
│└─────┴─────┴─────┘│└───────┴───────┘│└────────────────────┘│ │└──────────────
└───────────────────┴─────────────────┴──────────────────────┴─┴───────────────
─────────────────────────────────────────────────────────────┐
┬──────────────┬──────────────┬──────────────┬──────────────┐│
│ 6 7 8 9 10│11 12 13 14 15│16 17 18 19 20│21 22 23 24 25││
┴──────────────┴──────────────┴──────────────┴──────────────┘│
─────────────────────────────────────────────────────────────┘
Next, we clean this up with Ravel (,) and we can Mix to obtain the final result.
Operations research (OR) is a vast area comprising a lot of theory, different branches of mathematics, and too many applications to count. In this post, I will try to explain why it can be a little disconcerting to explore at first, and how to start investigating the topic with a few references to get started.
Keep in mind that although I studied it during my graduate studies, this is not my primary area of expertise (I’m a data scientist by trade), and I definitely don’t pretend to know everything in OR. This is a field too vast for any single person to understand in its entirety, and I talk mostly from an “amateur mathematician and computer scientist” standpoint.
Operations research can be difficult to approach, since there are many references and subfields. Compared to machine learning, for instance, OR has a slightly longer history (going back to the 18th century, for example with Monge and the optimal transport problem). (For a very nice introduction, in French, to optimal transport, see these blog posts by Gabriel Peyré, on the CNRS maths blog: Part 1 and Part 2. See also the resources on optimaltransport.github.io, in English.) This means that good textbooks and such have existed for a long time, but also that there will be plenty of material to choose from.
Moreover, OR is very close to applications. Sometimes methods may vary a lot in their presentation depending on whether they’re applied to train tracks, sudoku, or travelling salesmen. In practice, the terminology and notations are not the same everywhere. This is disconcerting if you are used to “pure” mathematics, where notation has evolved over a long time and is pretty much standardised for many areas. In contrast, if you’re used to the statistics literature with its strange notations, you will find that OR is actually very well formalized.
There are many subfields of operations research, including all kinds of optimization (constrained and unconstrained), game theory, dynamic programming, stochastic processes, etc.
For an overall introduction, I recommend Wentzel (1988). It is an old book, published by Mir Publications, a Soviet publisher which published many excellent scientific textbooks. (Mir also published Physics for Everyone by Lev Landau and Alexander Kitaigorodsky, a three-volume introduction to physics that is really accessible. Together with Feynman’s famous lectures, I read them (in French) when I was a kid, and it was the best introduction I could possibly have to the subject.) It is out of print, but it is available on Archive.org. The book is quite old, but everything presented is still extremely relevant today. It requires absolutely no background, and covers everything: a general introduction to the field, linear programming, dynamic programming, Markov processes and queues, Monte Carlo methods, and game theory. Even if you already know some of these topics, the presentation is so clear that it is a pleasure to read! (In particular, it is one of the best presentations of dynamic programming that I have ever read. The explanation of the simplex algorithm is also excellent.)
If you are interested in optimization, the first thing you have to learn is modelling, i.e. transforming your problem (described in natural language, often from a particular industrial application) into a mathematical programme. The mathematical programme is the structure on which you will be able to apply an algorithm to find an optimal solution. Even if (like me) you are initially more interested in the algorithmic side of things, learning to create models will shed a lot of light on the overall process, and will give you more insight in general on the reasoning behind algorithms.
The best book I have read on the subject is Williams (2013). It contains a lot of concrete, step-by-step examples on concrete applications, in a multitude of domains, and remains very easy to read and to follow. It covers nearly every type of problem, so it is very useful as a reference. When you encounter a concrete problem in real life afterwards, you will know how to construct an appropriate model, and in the process you will often identify a common type of problem. The book then gives plenty of advice on how to approach each type of problem. Finally, it is also a great resource to build a “mental map” of the field, avoiding getting lost in the jungle of linear, stochastic, mixed integer, quadratic, and other network problems.
Another interesting resource is the freely available MOSEK Modeling Cookbook, covering many types of problems, with more mathematical details than in Williams (2013). It is built for people wanting to use the commercial MOSEK solver, so it could be useful if you plan to use a solver package like this one (more details on solvers below).
The basic algorithm for optimization is the simplex algorithm, developed by Dantzig in the 1940s to solve linear programming problems. It is one of the main building blocks for mathematical optimization, and is used and referenced extensively in all kinds of approaches. As such, it is really important to understand it in detail. There are many books on the subject, but I especially liked Chvátal (1983) (out of print, but you can find cheap used versions on Amazon). It covers everything there is to know on the simplex algorithm (step-by-step explanations with simple examples, correctness and complexity analysis, computational and implementation considerations) and many applications. I think it is overall the best introduction. Vanderbei (2014) follows a very similar outline, but contains more recent computational considerations; the author also has lecture slides. (For all the details about practical implementations of the simplex algorithm, Maros (2003) is dedicated to the computational aspects and contains everything you will need.)
For more books on linear programming, the two books Dantzig (1997), Dantzig (2003) are very complete, if somewhat more mathematically advanced. Bertsimas and Tsitsiklis (1997) is also a great reference, if you can find it.
For all the other subfields, this great StackExchange answer contains a lot of useful references, including most of the above. Of particular note are Peyré and Cuturi (2019) for optimal transport, Boyd (2004) for convex optimization (freely available online), and Nocedal (2006) for numerical optimization. Kochenderfer (2019) is not in the list (because it is very recent) but is also excellent, with examples in Julia covering nearly every kind of optimization algorithm.
If you would like to watch video lectures, there are a few good opportunities freely available online, in particular on MIT OpenCourseWare. The list of courses at MIT is available on their webpage. I haven’t actually looked in detail at the courses’ content, so I cannot vouch for them directly, but MIT courses are generally of excellent quality. (I am more comfortable reading books than watching lecture videos online. Although I liked attending classes during my studies, I do not have the same feeling in front of a video. When I read, I can re-read the same sentence three times, pause to look something up, or skim a few paragraphs. I find that the inability to do that with a video greatly diminishes my ability to concentrate.) Most courses are also taught by Bertsimas and Bertsekas, who are very famous and wrote many excellent books. Of particular note are:
Another interesting course I found online is Deep Learning in Discrete Optimization, at Johns Hopkins. (It is taught by William Cook, who is the author of In Pursuit of the Traveling Salesman, a nice introduction to the TSP in a readable form.) It contains an interesting overview of deep learning and integer programming, with a focus on connections, and applications to recent research areas in ML (reinforcement learning, attention, etc.).
When you start reading about modelling and algorithms, I recommend you try solving a few problems yourself, either by hand for small instances, or using an existing solver. It will allow you to follow the examples in books, while also practising your modelling skills. You will also get an intuition of what is difficult to model and to solve.
There are many solvers available, both free and commercial, with various capabilities. I recommend you use the fantastic JuMP library for Julia, which exposes a domain-specific language for modelling, along with interfaces to nearly all major solver packages. (Even if you don’t know Julia, this is a great and easy way to start!) If you’d rather use Python, you can use Google’s OR-Tools or PuLP for linear programming.
Regarding solvers, there is a list of solvers on JuMP’s documentation, with their capabilities and their license. Free solvers include GLPK (linear programming), Ipopt (non-linear programming), and SCIP (mixed-integer linear programming).
Commercial solvers often have better performance, and some of them propose a free academic license: MOSEK, Gurobi, and IBM CPLEX in particular all offer free academic licenses and work very well with JuMP.
Another awesome resource is the NEOS Server. It offers free computing resources for numerical optimization, including all major free and commercial solvers! You can submit jobs on it in a standard format, or interface your favourite programming language with it. The fact that such an amazing resource exists for free, for everyone is extraordinary. They also have an accompanying book, the NEOS Guide, containing many case studies and description of problem types. The taxonomy may be particularly useful.
Operations research is a fascinating topic, and it has an abundant literature that makes it very easy to dive into the subject. If you are interested in algorithms, modelling for practical applications, or just wish to understand more, I hope to have given you the first steps to follow: start reading and experimenting!
Bertsimas, Dimitris, and John N. Tsitsiklis. 1997. Introduction to Linear Optimization. Belmont, Massachusetts: Athena Scientific. http://www.athenasc.com/linoptbook.html.
Boyd, Stephen. 2004. Convex Optimization. Cambridge, UK New York: Cambridge University Press.
Chvátal, Vašek. 1983. Linear Programming. New York: W.H. Freeman.
Dantzig, George. 1997. Linear Programming 1: Introduction. New York: Springer. https://www.springer.com/gp/book/9780387948331.
———. 2003. Linear Programming 2: Theory and Extensions. New York: Springer. https://www.springer.com/gp/book/9780387986135.
Kochenderfer, Mykel. 2019. Algorithms for Optimization. Cambridge, Massachusetts: The MIT Press.
Maros, István. 2003. Computational Techniques of the Simplex Method. Boston: Kluwer Academic Publishers.
Nocedal, Jorge. 2006. Numerical Optimization. New York: Springer. https://www.springer.com/gp/book/9780387303031.
Peyré, Gabriel, and Marco Cuturi. 2019. “Computational Optimal Transport.” Foundations and Trends in Machine Learning 11 (5-6): 355–607. https://doi.org/10.1561/2200000073.
Vanderbei, Robert. 2014. Linear Programming: Foundations and Extensions. New York: Springer.
Wentzel, Elena S. 1988. Operations Research: A Methodological Approach. Moscow: Mir publishers.
Williams, H. Paul. 2013. Model Building in Mathematical Programming. Chichester, West Sussex: Wiley. https://www.wiley.com/en-fr/Model+Building+in+Mathematical+Programming,+5th+Edition-p-9781118443330.
ICLR is one of the most important conferences in machine learning, and as such, I was very excited to have the opportunity to volunteer and attend the first fully-virtual edition of the event. The whole content of the conference has been made publicly available, only a few days after the end of the event!
I would like to thank the organizing committee for this incredible event, and for the possibility to volunteer to help other participants. (To better organize the event, and help people navigate the various online tools, they brought in 500(!) volunteers, waived our registration fees, and asked us to do simple load-testing and tech support. This was a very generous offer, and felt very rewarding for us, as we could attend the conference and give back to the organization a little bit.)
The many volunteers, the online-only nature of the event, and the low registration fees also allowed for what felt like a very diverse, inclusive event. Many graduate students and researchers from industry (like me), who do not generally have the time or the resources to travel to conferences like this, were able to attend, and make the exchanges richer.
In this post, I will try to give my impressions on the event, the speakers, and the workshops that I could attend. I will do a quick recap of the most interesting papers I saw in a future post.
As a result of global travel restrictions, the conference was made fully virtual. It was supposed to take place in Addis Ababa, Ethiopia, which would have been great for people who are often the target of restrictive visa policies in North American countries.
The thing I appreciated most about the conference format was its emphasis on asynchronous communication. Given how little time they had to plan the conference, they could have made all poster presentations via video-conference and called it a day. Instead, each poster had to record a 5-minute video summarising their research. (The videos are streamed using SlidesLive, which is a great solution for synchronising videos and slides. It is very comfortable to navigate through the slides and synchronise the video to the slides, and vice versa. As a result, SlidesLive also has a very nice library of talks, including major conferences. This is much better than browsing YouTube randomly.) Alongside each presentation, there was a dedicated Rocket.Chat channel where anyone could ask a question to the authors, or just show their appreciation for the work. (Rocket.Chat seems to be an open-source alternative to Slack. Overall, the experience was great, and I appreciate the efforts of the organizers to use open-source software instead of proprietary applications. I hope other conferences will do the same, and perhaps even avoid Zoom, because of recent privacy concerns; maybe try Jitsi?) This was a fantastic idea, as it allowed any participant to interact with papers and authors at any time they please, which is especially important in a setting where people were spread all over the globe.
There were also Zoom sessions where authors were available for direct, face-to-face discussions, allowing for more traditional conversations. But asking questions on the channel also had the advantage of keeping a record of all the questions asked by other people. As such, I quickly acquired the habit of watching the video, looking at the chat to see the previous discussions (even if they happened in the middle of the night in my timezone!), and then skimming the paper or asking questions myself.
All of these excellent ideas were implemented by an amazing website, collecting all papers in a searchable, easy-to-use interface, and even including a nice visualisation of papers as a point cloud!
Overall, there were 8 speakers (two for each day of the main conference). Each gave a 40-minute presentation, followed by a Q&A both via the chat and via Zoom. I only saw a few of them, but I expect I will be watching the others in the near future.
This talk was fascinating. It is about robotics, and especially how to design the “software” of our robots. We want to program a robot in a way that it could work the best it can over all possible domains it can encounter. I loved the discussion on how to describe the space of distributions over domains, from the point of view of the robot factory.
There are many ways to describe a policy (i.e. the software running in the robot’s head), and many ways to obtain them. If you are familiar with recent advances in reinforcement learning, this talk is a great occasion to take a step back, and review the relevant background ideas from engineering and control theory.
Finally, the most important take-away from this talk is the importance of abstractions. Whatever the methods we use to program our robots, we still need a lot of human insights to give them good structural biases. There are many more insights, on the cost of experience, (hierarchical) planning, learning constraints, etc, so I strongly encourage you to watch the talk!
This is a very clear presentation of an area of ML research I do not know very well. I really like the approach of teaching a set of methods from a “historical”, personal point of view. Laurent Dinh shows us how he arrived at this topic and what he finds interesting, in a very personal and relatable manner. This has the double advantage of introducing us to a topic that he is passionate about, while also giving us a glimpse of a researcher’s process, without hiding the momentary disillusions and disappointments, but emphasising the great achievements. Normalizing flows are also very interesting because they are grounded in strong theoretical results that bring together a lot of different methods.
This talk was very interesting, and yet felt very familiar, as if I had already seen a very similar one elsewhere; this is especially true of Yann LeCun, who clearly reuses the same slides for many presentations at various events. They both came back to their favourite subjects: self-supervised learning for Yann LeCun, and system 1/system 2 for Yoshua Bengio. All in all, they are very good speakers, and their presentations are always insightful. Yann LeCun gives a lot of references on recent technical advances, which is great if you want to go deeper into the approaches he recommends. Yoshua Bengio is also very good at broadening the debate around deep learning, and introducing very important concepts from cognitive science.
On Sunday, there were 15 different workshops. All of them were recorded, and are available on the website. As always, unfortunately, there are too many interesting things to watch everything, but I saw bits and pieces of different workshops.
A lot of pretty advanced talks about RL. The general theme was meta-learning, aka “learning to learn”. This is a very active area of research, which goes way beyond classical RL theory, and offers many interesting avenues to adjacent fields (both inside ML and outside, especially cognitive science). The first talk, by Martha White, about inductive biases, was a very interesting and approachable introduction to the problems and challenges of the field. There was also a panel with Jürgen Schmidhuber. We hear a lot about him from the various controversies, but it’s nice to see him talking about research and future developments in RL.
Ever since I read Judea Pearl’s The Book of Why on causality, I have been interested in how we can incorporate causal reasoning in machine learning. This is a complex topic, and I’m not sure yet that it is a complete revolution as Judea Pearl likes to portray it, but it nevertheless introduces a lot of new fascinating ideas. Yoshua Bengio gave an interesting talk on causal priors for deep learning, even though it was very similar to his keynote talk. (You can find it at 4:45:20 in the livestream of the workshop.)
Cognitive science is fascinating, and I believe that collaboration between ML practitioners and cognitive scientists will greatly help advance both fields. I only watched Leslie Kaelbling’s presentation, which echoes a lot of things from her talk at the main conference. It complements it nicely, with more focus on intelligence, especially embodied intelligence. I think she has the right approach to relationships between AI and natural science, explicitly listing the things from her work that would be helpful to natural scientists, and the things she wishes she knew about natural intelligences. It raises many fascinating questions about ourselves, what we build, and what we understand. I felt it was very motivational!
I didn’t attend this workshop, but I think I will watch the presentations if I can find the time. I have found the intersection of differential equations and ML very interesting, ever since the famous NeurIPS best paper on Neural ODEs. I think that such improvements to ML theory from other fields in mathematics would be extremely beneficial to a better understanding of the systems we build.
Two weeks ago, I gave a presentation to my colleagues on the paper by Yurochkin et al. (2019), from NeurIPS 2019. It contains an interesting approach to document classification leading to strong performance and, most importantly, excellent interpretability.
This paper seems interesting to me because it uses two methods with strong theoretical guarantees: optimal transport and topic modelling. Optimal transport looks very promising to me in NLP, and has seen a lot of interest in recent years due to advances in approximation algorithms, such as entropy regularisation. It is also quite refreshing to see approaches using solid results in optimisation, compared to purely experimental deep learning methods.
The problem of the paper is to measure similarity (i.e. a distance) between pairs of documents, by incorporating semantic similarities (and not only syntactic artefacts), without encountering scalability issues.
They propose a “meta-distance” between documents, called the hierarchical optimal topic transport (HOTT), providing a scalable metric incorporating topic information between documents. As such, they try to combine two different levels of analysis: the word level (topics as distributions over words) and the topic level (documents as distributions over topics).
The essential backbone of the method is the Wasserstein distance, derived from optimal transport theory. Optimal transport is a fascinating and deep subject, so I won’t enter into the details here. For an introduction to the theory and its applications, check out the excellent book from Peyré and Cuturi (2019), (available on ArXiv as well). There are also very nice posts (in French) by Gabriel Peyré on the CNRS maths blog. Many more resources (including slides for presentations) are available at https://optimaltransport.github.io. For a more complete theoretical treatment of the subject, check out Santambrogio (2015), or, if you’re feeling particularly adventurous, Villani (2009).
For this paper, only a superficial understanding of how the Wasserstein distance works is necessary. Optimal transport is an optimisation technique to lift a distance between points in a given metric space, to a distance between probability distributions over this metric space. The historical example is to move piles of dirt around: you know the distance between any two points, and you have piles of dirt lying around. (Optimal transport originated with Monge, and then Kantorovich, both of whom had very clear military applications in mind, either in Revolutionary France or during WWII. A lot of historical examples move cannon balls, or other military equipment, along a front line.) Now, if you want to move these piles to another configuration (fewer piles, say, or a different repartition of dirt a few metres away), you need to find the most efficient way to move them. The total cost you obtain will define a distance between the two configurations of dirt, and is usually called the earth mover’s distance, which is just an instance of the general Wasserstein metric.
More formally, we start with two sets of points \(x = (x_1, x_2, \ldots, x_n)\) and \(y = (y_1, y_2, \ldots, y_m)\), along with probability distributions \(p \in \Delta^n\), \(q \in \Delta^m\) over \(x\) and \(y\) (\(\Delta^n\) is the probability simplex of dimension \(n\), i.e. the set of vectors of size \(n\) summing to 1). We can then define the Wasserstein distance as \[ W_1(p, q) = \min_{P \in \mathbb{R}_+^{n\times m}} \sum_{i,j} C_{i,j} P_{i,j} \] \[ \text{subject to } \sum_j P_{i,j} = p_i \text{ and } \sum_i P_{i,j} = q_j, \] where \(C_{i,j} = d(x_i, y_j)\) are the costs computed from the original distance between points, and \(P_{i,j}\) represent the amount we are moving from pile \(i\) to pile \(j\).
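For the curious, this linear programme can be solved directly with an off-the-shelf LP solver. Here is a minimal Python sketch using scipy.optimize.linprog (fine for small instances; dedicated optimal transport solvers are much faster in practice):

import numpy as np
from scipy.optimize import linprog

def wasserstein(p, q, C):
    # Minimise <C, P> subject to the marginal constraints on P.
    n, m = len(p), len(q)
    A_eq, b_eq = [], []
    for i in range(n):                # row sums of P must equal p
        row = np.zeros((n, m))
        row[i, :] = 1
        A_eq.append(row.ravel())
        b_eq.append(p[i])
    for j in range(m):                # column sums of P must equal q
        col = np.zeros((n, m))
        col[:, j] = 1
        A_eq.append(col.ravel())
        b_eq.append(q[j])
    # Variables are the entries of P, flattened; they default to >= 0.
    res = linprog(np.asarray(C).ravel(), A_eq=A_eq, b_eq=b_eq)
    return res.fun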
Now, how can this be applied to a natural language setting? Once we have word embeddings, we can consider that the vocabulary forms a metric space (we can compute a distance, for instance the euclidean or the cosine distance, between two word embeddings). The key is to define documents as distributions over words.
Given a vocabulary \(V \subset \mathbb{R}^n\) and a corpus \(D = (d^1, d^2, \ldots, d^{\lvert D \rvert})\), we represent a document as \(d^i \in \Delta^{l_i}\) where \(l_i\) is the number of unique words in \(d^i\), and \(d^i_j\) is the proportion of word \(v_j\) in the document \(d^i\). The word mover’s distance (WMD) is then defined simply as \[ \operatorname{WMD}(d^1, d^2) = W_1(d^1, d^2). \]
If you didn’t follow all of this, don’t worry! The gist is: if you have a distance between points, you can solve an optimisation problem to obtain a distance between distributions over these points! This is especially useful when you consider that each word embedding is a point, and a document is just a set of words, along with the number of times they appear.
Using optimal transport, we can use the word mover’s distance to define a metric between documents. However, this suffers from two drawbacks: the optimisation problem has to be solved over all the unique words of the documents, which quickly becomes expensive, and the resulting distance operates purely at the word level, without incorporating any topic information.
To escape these issues, we will add an intermediary step using topic modelling. Once we have topics \(T = (t_1, t_2, \ldots, t_{\lvert T \rvert}) \subset \Delta^{\lvert V \rvert}\), we get two kinds of representations: each topic is a distribution over words, and each document \(\bar{d^i}\) is a distribution over topics.
Since they are distributions over words, the word mover’s distance defines a metric over topics. As such, the topics, equipped with the WMD, become a metric space.
We can now define the hierarchical optimal topic transport (HOTT), as the optimal transport distance between documents, represented as distributions over topics. For two documents \(d^1\), \(d^2\), \[ \operatorname{HOTT}(d^1, d^2) = W_1\left( \sum_{k=1}^{\lvert T \rvert} \bar{d^1_k} \delta_{t_k}, \sum_{k=1}^{\lvert T \rvert} \bar{d^2_k} \delta_{t_k} \right). \] where \(\delta_{t_k}\) is a distribution supported on topic \(t_k\).
Note that in this case, we used optimal transport twice: once to compute the word mover’s distance between topics (which serves as the ground metric), and once to compute the distance between the documents themselves, represented as distributions over topics.
The first one can be precomputed once for all subsequent distances, so it is invariable in the number of documents we have to process. The second one only operates on \(\lvert T \rvert\) topics instead of the full vocabulary: the resulting optimisation problem is much smaller! This is great for performance, as it should be easy now to compute all pairwise distances in a large set of documents.
Another interesting insight is that topics are represented as collections of words (we can keep the top 20 as a visual representations), and documents as collections of topics with weights. Both of these representations are highly interpretable for a human being who wants to understand what’s going on. I think this is one of the strongest aspects of these approaches: both the various representations and the algorithms are fully interpretable. Compared to a deep learning approach, we can make sense of every intermediate step, from the representations of topics to the weights in the optimisation algorithm to compute higher-level distances.
Representation of two documents in topic space, along with how the distance was computed between them. Everything is interpretable: from the documents as collections of topics, to the matchings between topics determining the overall distance between the books (Yurochkin et al. 2019).
The paper is very complete regarding experiments, providing a full evaluation of the method on one particular application: document clustering. They use Latent Dirichlet Allocation to compute topics and GloVe for pretrained word embeddings (Pennington, Socher, and Manning 2014), and Gurobi to solve the optimisation problems. Their code is available on GitHub.
If you want the details, I encourage you to read the full paper: they tested the methods on a wide variety of datasets, including datasets with very short documents (like Twitter) and long documents with a large vocabulary (books). With a simple \(k\)-NN classification, they establish that HOTT performs best on average, especially on large vocabularies (books, the “gutenberg” dataset). It also has a much better computational performance than alternative methods based on regularisation of the optimal transport problem directly on words. So the hierarchical nature of the approach brings considerable performance gains, along with improvements in interpretability.
What’s really interesting in the paper is the sensitivity analysis: they ran experiments with different word embedding methods (word2vec, (Mikolov et al. 2013)) and with different parameters for the topic modelling (topic truncation, number of topics, etc.). All of these reveal that changes in hyperparameters do not impact the performance of HOTT significantly. This is extremely important in a field like NLP, where most of the time small variations in approach lead to drastically different results.
All in all, this paper presents a very interesting approach to computing distances between natural-language documents. It is no secret that I like methods with a strong theoretical background (in this case optimisation and optimal transport), guaranteeing stability and benefiting from decades of research in a well-established domain.
Most importantly, this paper allows for future exploration in document representation with interpretability in mind. This is often added as an afterthought in academic research but is one of the most important topics for the industry, as a system must be understood by end users, often not trained in ML, before being deployed. The notion of topic, and distances as weights, can be understood easily by anyone without significant background in ML or in maths.
Finally, I feel like they did not stop at a simple theoretical argument, but carefully checked on real-world datasets, measuring sensitivity to all the arbitrary choices they had to make. Again, from an industry perspective, this allows the new approach to be implemented quickly and easily, with confidence that it won’t break unexpectedly without extensive testing.
Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. “Distributed Representations of Words and Phrases and Their Compositionality.” In Advances in Neural Information Processing Systems 26, 3111–9. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf.
Pennington, Jeffrey, Richard Socher, and Christopher Manning. 2014. “Glove: Global Vectors for Word Representation.” In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–43. Doha, Qatar: Association for Computational Linguistics. https://doi.org/10.3115/v1/D14-1162.
Peyré, Gabriel, and Marco Cuturi. 2019. “Computational Optimal Transport.” Foundations and Trends in Machine Learning 11 (5-6): 355–206. https://doi.org/10.1561/2200000073.
Santambrogio, Filippo. 2015. Optimal Transport for Applied Mathematicians. Vol. 87. Progress in Nonlinear Differential Equations and Their Applications. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-20828-2.
Villani, Cédric. 2009. Optimal Transport: Old and New. Grundlehren Der Mathematischen Wissenschaften 338. Berlin: Springer.
Yurochkin, Mikhail, Sebastian Claici, Edward Chien, Farzaneh Mirzazadeh, and Justin M Solomon. 2019. “Hierarchical Optimal Transport for Document Representation.” In Advances in Neural Information Processing Systems 32, 1599–1609. http://papers.nips.cc/paper/8438-hierarchical-optimal-transport-for-document-representation.pdf.
Last week I made a presentation at the Paris NLP Meetup, on how we implemented self-learning chatbots in production at Mindsay.
It was fascinating to talk to other people interested in NLP about the technologies and models we deploy at work! It’s always nice to have some feedback about our work, and preparing this talk forced me to take a step back from what we do and rethink it in new interesting ways.
Also check out the other presentations, one about diachronic (i.e. time-dependent) word embeddings and the other about the different models and the use of Knowledge Bases for Information Retrieval. (This might even give us new ideas to explore…)
If you’re interested in exciting applications at the confluence of Reinforcement Learning and NLP, the slides are available here. They include basic RL theory, how we transposed it to the specific case of conversational agents, the technical and mathematical challenges in implementing self-learning chatbots, and of course plenty of references for further reading if we piqued your interest!
Update: The videos are now available on the NLP Meetup website.
Update 2: Destygo changed its name to Mindsay!
The Ginibre ensemble is a set of random matrices with the entries chosen independently. Each entry of an \(n \times n\) matrix is a complex number, with both the real and imaginary part sampled from a normal distribution of mean zero and variance \(1/2n\).
Random matrix distributions are very complex, and they are a very active subject of research. I stumbled on this example while reading an article in Notices of the AMS by Brian C. Hall (1).
Now what is interesting about these random matrices is the distribution of their \(n\) eigenvalues in the complex plane.
The circular law (first established by Jean Ginibre in 1965 (2)) states that when \(n\) is large, with high probability, almost all the eigenvalues lie in the unit disk. Moreover, they tend to be nearly uniformly distributed there.
I find it mildly fascinating that such a straightforward definition of a random matrix can exhibit such non-random properties in its spectrum.
I ran a quick simulation, thanks to Julia’s great ecosystem for linear algebra and statistical distributions:
using LinearAlgebra
using UnicodePlots
function ginibre(n)
return randn((n, n)) * sqrt(1/2n) + im * randn((n, n)) * sqrt(1/2n)
end
v = eigvals(ginibre(2000))
scatterplot(real(v), imag(v), xlim=[-1.5,1.5], ylim=[-1.5,1.5])
I like using UnicodePlots for this kind of quick-and-dirty plot, directly in the terminal. Here is the output:
I have recently bought the book Category Theory from Steve Awodey (Awodey 2010) (which is awesome, but probably the topic for another post), and a particular passage excited my curiosity:
Let us begin by distinguishing between the following things: i. categorical foundations for mathematics, ii. mathematical foundations for category theory.
As for the first point, one sometimes hears it said that category theory can be used to provide “foundations for mathematics,” as an alternative to set theory. That is in fact the case, but it is not what we are doing here. In set theory, one often begins with existential axioms such as “there is an infinite set” and derives further sets by axioms like “every set has a powerset,” thus building up a universe of mathematical objects (namely sets), which in principle suffice for “all of mathematics.”
This statement is interesting because one often considers category theory as pretty “fundamental”, in the sense that it has no issue with considering what I call “dangerous” notions, such as the category \(\mathbf{Set}\) of all sets, and even the category \(\mathbf{Cat}\) of all categories. Surely a theory this general, that can afford to study such objects, should provide suitable foundations for mathematics? Awodey addresses these issues very explicitly in the section following the quote above, and finds a good way of avoiding circular definitions.
Now, I remember some basics from my undergrad studies about foundations of mathematics. I was told that if you could define arithmetic, you basically had everything else “for free” (as Kronecker famously said, “natural numbers were created by God, everything else is the work of men”). I was also told that two sets of axioms existed, the Peano axioms and the Zermelo-Fraenkel axioms. Also, I should steer clear of the axiom of choice if I could, because one can do strange things with it, and it is equivalent to many different statements. Finally (and this I knew mainly from Logicomix, I must admit), it is impossible for a sufficiently powerful set of axioms to be both complete and consistent.
Given all this, I realised that my knowledge of foundational mathematics was pretty deficient. I do not believe that it is a very important topic that everyone should know about, even though Gödel’s incompleteness theorem is very interesting from a logical and philosophical standpoint. However, I wanted to go deeper on this subject.
In this post, I will try to share my path through Peano’s axioms (Gowers, Barrow-Green, and Leader 2010), because they are very simple, and it is easy to uncover basic algebraic structure from them.
The purpose of the axioms is to define a collection of objects that we will call the natural numbers. Here, we place ourselves in the context of first-order logic. Logic is not the main topic here, so I will just assume that I have access to some quantifiers, to some predicates, to some variables, and, most importantly, to a relation \(=\) which is reflexive, symmetric, transitive, and closed over the natural numbers.
Without further digressions, let us define two symbols \(0\) and \(s\) (called successor) such that:
1. \(0\) is a natural number.
2. For every natural number \(n\), \(s(n)\) is a natural number.
3. For every natural number \(n\), \(s(n) \neq 0\).
4. For all natural numbers \(n\) and \(m\), if \(s(n) = s(m)\), then \(n = m\).
5. If a set \(S\) contains \(0\), and if for every natural number \(n\) in \(S\), \(s(n)\) is also in \(S\), then every natural number is in \(S\).
Let’s break this down. Axioms 1–4 define a collection of objects, written \(0\), \(s(0)\), \(s(s(0))\), and so on, and ensure their basic properties. All of these are natural numbers by the first four axioms, but how can we be sure that all natural numbers are of the form \(s( \cdots s(0))\)? This is where the induction axiom (Axiom 5) intervenes. It ensures that every natural number is “well-formed” according to the previous axioms.
But Axiom 5 is slightly disturbing, because it mentions a “set” and a relation “is in”. This seems pretty straightforward at first sight, but these notions were never defined anywhere before that! Isn’t our goal to define all these notions in order to derive a foundation of mathematics? (I still don’t know the answer to that question.) I prefer the following alternative version of the induction axiom:
If \(\varphi\) is a unary predicate such that \(\varphi(0)\) is true, and \(\varphi(n)\) implies \(\varphi(s(n))\) for every natural number \(n\), then \(\varphi(n)\) is true for every natural number \(n\).
The alternative formulation is much better in my opinion, as it obviously implies the first one (just choose \(\varphi(n)\) as “\(n\) is a natural number”), and it only references predicates. It will also be much more useful afterwards, as we will see.
What is needed afterwards? The most basic notion after the natural numbers themselves is the addition operator. We define an operator \(+\) by the following (recursive) rules:
\(\forall a,\quad a + 0 = a\),
\(\forall a, \forall b,\quad a + s(b) = s(a + b)\).
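To make the recursion concrete, here is a small Julia sketch of these naturals and of the two rules above (my own illustration; the types and the add function are made up for the occasion):
# Peano naturals: a number is either Zero or the Successor of a number.
abstract type Nat end
struct Zero <: Nat end
struct Succ <: Nat
    pred::Nat
end

add(a::Nat, ::Zero) = a                       # a + 0 = a
add(a::Nat, b::Succ) = Succ(add(a, b.pred))   # a + s(b) = s(a + b)

add(Succ(Succ(Zero())), Succ(Zero()))   # 2 + 1, evaluates to Succ(Succ(Succ(Zero())))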
Let us use these rules to prove the basic properties of \(+\).
\(\forall a, \forall b,\quad a+b = b+a\).
First, we prove that every natural number commutes with \(0\).
\(0+0 = 0+0\).
For every natural number \(a\) such that \(0+a = a+0\), we have:
\[\begin{align} 0 + s(a) &= s(0+a)\\ &= s(a+0)\\ &= s(a)\\ &= s(a) + 0. \end{align} \]
By Axiom 5, every natural number commutes with \(0\).
We can now prove the main proposition by induction on \(b\). The base case,
\(\forall a,\quad a+0=0+a\),
is exactly the property we just proved.
For all \(a\) and \(b\) such that \(a+b=b+a\),
\[\begin{align} a + s(b) &= s(a+b)\\ &= s(b+a)\\ &= s(b) + a. \end{align} \]
We used the mirror image of the second rule for \(+\), namely \(\forall a, \forall b,\quad s(a) + b = s(a+b)\). This can easily be proved by another induction.
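For completeness, here is a sketch of that auxiliary induction on \(b\), using only the two defining rules of \(+\). The base case is \(s(a) + 0 = s(a) = s(a+0)\). For the inductive step, assume \(s(a) + b = s(a+b)\); then \[\begin{align} s(a) + s(b) &= s(s(a) + b)\\ &= s(s(a+b))\\ &= s(a + s(b)). \end{align} \]
By Axiom 5, the rule holds for every \(b\).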
\(\forall a, \forall b, \forall c,\quad a+(b+c) = (a+b)+c\).
Todo, left as an exercise to the reader 😉
\(\forall a,\quad a+0 = 0+a = a\).
This follows directly from the definition of \(+\) and commutativity.
From all these properties, it follows that the set of natural numbers with \(+\) is a commutative monoid.
We have imbued our newly created set of natural numbers with a significant algebraic structure. From there, similar arguments will create more structure, notably by introducing another operation \(\times\), and an order \(\leq\).
It is now a matter of conventional mathematics to construct the integers \(\mathbb{Z}\) and the rationals \(\mathbb{Q}\) (using equivalence classes), and eventually the real numbers \(\mathbb{R}\).
It is remarkable how very few (and very simple, as far as you would consider the induction axiom “simple”) axioms are enough to build an entire theory of mathematics. This sort of thing makes me agree with Eugene Wigner (Wigner 1990) when he says that “mathematics is the science of skillful operations with concepts and rules invented just for this purpose”. We drew some arbitrary rules out of thin air, and derived countless properties and theorems from them, basically for our own enjoyment. (As Wigner would say, it is incredible that any of these fanciful inventions coming out of nowhere turned out to be even remotely useful.) Mathematics is done mainly for the mathematician’s own pleasure!
Mathematics cannot be defined without acknowledging its most obvious feature: namely, that it is interesting — M. Polanyi (Wigner 1990)
Awodey, Steve. 2010. Category Theory. 2nd ed. Oxford Logic Guides 52. Oxford ; New York: Oxford University Press.
Gowers, Timothy, June Barrow-Green, and Imre Leader. 2010. The Princeton Companion to Mathematics. Princeton University Press.
Wigner, Eugene P. 1990. “The Unreasonable Effectiveness of Mathematics in the Natural Sciences.” In Mathematics and Science, by Ronald E. Mickens, 291–306. World Scientific. https://doi.org/10.1142/9789814503488_0018.
In this series of blog posts, I intend to write my notes as I go through Richard S. Sutton’s excellent Reinforcement Learning: An Introduction (1).
I will try to formalise the maths behind it a little bit, mainly because I would like to use it as a useful personal reference to the main concepts in RL. I will probably add a few remarks about a possible implementation as I go on.
The goal of reinforcement learning is to select the best actions available to an agent as it goes through a series of states in an environment. In this post, we will only consider discrete time steps.
The most important hypothesis we make is the Markov property:
At each time step, the next state of the agent depends only on the current state and the current action taken. It cannot depend on the history of the states visited by the agent.
This property is essential to make our problems tractable, and often holds true in practice (to a reasonable approximation).
With this assumption, we can define the relationship between agent and environment as a Markov Decision Process (MDP).
A Markov Decision Process is a tuple \((\mathcal{S}, \mathcal{A}, \mathcal{R}, p)\) where:
\(\mathcal{S}\) is a set of states,
\(\mathcal{A}\) is a function mapping each state \(s \in \mathcal{S}\) to a set \(\mathcal{A}(s)\) of possible actions for this state. In this post, we will often simplify by using \(\mathcal{A}\) as a set, assuming that all actions are possible for each state,
\(\mathcal{R} \subset \mathbb{R}\) is a set of rewards,
and \(p\) is a function representing the dynamics of the MDP:
\[\begin{align} p &: \mathcal{S} \times \mathcal{R} \times \mathcal{S} \times \mathcal{A} \mapsto [0,1] \\ p(s', r \;|\; s, a) &:= \mathbb{P}(S_t=s', R_t=r \;|\; S_{t-1}=s, A_{t-1}=a), \end{align} \]such that \[ \forall s \in \mathcal{S}, \forall a \in \mathcal{A},\quad \sum_{s', r} p(s', r \;|\; s, a) = 1. \]
The function \(p\) represents the probability of transitioning to the state \(s'\) and getting a reward \(r\) when the agent is at state \(s\) and chooses action \(a\).
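To make this definition concrete, here is a toy two-state MDP in Julia, with the dynamics \(p\) stored explicitly (an illustrative encoding of my own, not from the book):
# Toy MDP: (s, a) => vector of (s′, r, probability) triples
p = Dict(
    (:idle, :wait) => [(:idle, 0.0, 1.0)],
    (:idle, :work) => [(:done, 1.0, 0.8), (:idle, 0.0, 0.2)],
    (:done, :wait) => [(:done, 0.0, 1.0)],
    (:done, :work) => [(:idle, -1.0, 1.0)],
)

# Check the normalization condition: for each (s, a), probabilities sum to 1.
all(sum(t[3] for t in ts) ≈ 1 for ts in values(p))   # true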
We will also occasionally use the state-transition probabilities:
\[\begin{align} p &: \mathcal{S} \times \mathcal{S} \times \mathcal{A} \mapsto [0,1] \\ p(s' \;|\; s, a) &:= \mathbb{P}(S_t=s' \;|\; S_{t-1}=s, A_{t-1}=a) \\ &= \sum_r p(s', r \;|\; s, a). \end{align} \]
The expected reward of a state-action pair is the function
\[\begin{align} r &: \mathcal{S} \times \mathcal{A} \mapsto \mathbb{R} \\ r(s,a) &:= \mathbb{E}[R_t \;|\; S_{t-1}=s, A_{t-1}=a] \\ &= \sum_r r \sum_{s'} p(s', r \;|\; s, a). \end{align} \]
The discounted return is the sum of all future rewards, with a multiplicative factor giving more weight to more immediate rewards: \[ G_t := \sum_{k=t+1}^T \gamma^{k-t-1} R_k, \] where \(T\) can be infinite or \(\gamma\) can be 1, but not both.
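Computing the discounted return of a finite sequence of rewards is a one-liner; a quick sketch (the helper name is mine):
# G_t = Σ_k γ^(k-t-1) R_k, with rewards = [R_{t+1}, …, R_T]
discounted_return(rewards, γ) = sum(γ^(k-1) * r for (k, r) in enumerate(rewards))

discounted_return([1.0, 0.0, 2.0], 0.9)   # 1.0 + 0.9^2 * 2.0 = 2.62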
A policy is a way for the agent to choose the next action to perform.
A policy is a function \(\pi\) defined as
\[\begin{align} \pi &: \mathcal{A} \times \mathcal{S} \mapsto [0,1] \\ \pi(a \;|\; s) &:= \mathbb{P}(A_t=a \;|\; S_t=s). \end{align} \]
In order to compare policies, we need to associate values with them.
The state-value function of a policy \(\pi\) is
\[\begin{align} v_{\pi} &: \mathcal{S} \mapsto \mathbb{R} \\ v_{\pi}(s) &:= \text{expected return when starting in $s$ and following $\pi$} \\ v_{\pi}(s) &:= \mathbb{E}_{\pi}\left[ G_t \;|\; S_t=s\right] \\ v_{\pi}(s) &= \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;|\; S_t=s\right] \end{align} \]
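The definition suggests a direct way to estimate \(v_\pi\): sample many episodes following \(\pi\) and average the discounted returns. Here is a rough Monte Carlo sketch on the toy MDP from above (it reuses the p dictionary, and the uniform random policy is made up):
policy(s) = rand([:wait, :work])   # uniform random policy: π(a|s) = 1/2

function sample_return(s; γ=0.9, steps=100)
    G, w = 0.0, 1.0
    for _ in 1:steps
        u, acc = rand(), 0.0
        for (s2, r, prob) in p[(s, policy(s))]   # p: the dynamics Dict above
            acc += prob
            u ≤ acc || continue   # sample an outcome according to its probability
            G += w * r
            w *= γ
            s = s2
            break
        end
    end
    return G
end

v_est(s; n=10_000) = sum(sample_return(s) for _ in 1:n) / n
v_est(:idle)   # ≈ v_π(:idle) under the random policy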
We can also compute the value starting from a state \(s\) by also taking into account the first action \(a\) taken. The action-value function of a policy \(\pi\) is
\[\begin{align} q_{\pi} &: \mathcal{S} \times \mathcal{A} \mapsto \mathbb{R} \\ q_{\pi}(s,a) &:= \text{expected return when starting from $s$, taking action $a$, and following $\pi$} \\ q_{\pi}(s,a) &:= \mathbb{E}_{\pi}\left[ G_t \;|\; S_t=s, A_t=a \right] \\ q_{\pi}(s,a) &= \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;|\; S_t=s, A_t=a\right] \end{align} \]

I recently got interested in APL, an array-based programming language. In APL (and derivatives), we try to reason about programs as series of transformations of multi-dimensional arrays. This is exactly the kind of style I like in Haskell and other functional languages, where I also try to use higher-order functions (map, fold, etc.) on lists or arrays. A developer only needs to understand these abstractions once, instead of deconstructing each loop or each recursive function encountered in a program.
APL also tries to be a really simple and terse language. This, combined with the strange Unicode characters it uses for primitive functions and operators, gives it a reputation for unreadability. However, there is only a small number of functions to learn, and you quickly get used to reading them and understanding what they do. Some combinations also occur so frequently that you can recognize them instantly (APL programmers call them idioms).
APL is actually a family of languages. The classic APL, as created by Ken Iverson, with its strange symbols, has many implementations. I initially tried GNU APL, but due to the lack of documentation and proper tooling, I moved to Dyalog APL (which is proprietary, but free for personal use). There are also APL derivatives that often use ASCII symbols: J (free) and Q/kdb+ (proprietary, but free for personal use).
The advantage of Dyalog is that it comes with good tooling (which is necessary for inserting all the symbols!), a large ecosystem, and pretty good documentation. If you want to start, look at Mastering Dyalog APL by Bernard Legrand, freely available online.
I needed a small project to try APL while I was learning. Something array-based, obviously. Since I already implemented a Metropolis-Hastings simulation of the Ising model, which is based on a regular lattice, I decided to reimplement it in Dyalog APL.
It is only a few lines long, but I will try to explain what it does step by step.
The first function simply generates a random lattice filled by elements of \(\{-1,+1\}\).
L←{(2×?⍵ ⍵⍴2)-3}
Let’s deconstruct what is done here:
We create a ⍵×⍵ matrix filled with 2, using the ⍴ function: ⍵ ⍵⍴2.
? draws a random number between 1 and its argument. We give it our matrix to generate a random matrix of 1 and 2.
We then multiply by 2 and subtract 3 to get a matrix of ¯1 and 1: this is our lattice L.
Sample output:
ising.L 5
1 ¯1 1 ¯1 1
1 1 1 ¯1 ¯1
1 ¯1 ¯1 ¯1 ¯1
1 1 1 ¯1 ¯1
¯1 ¯1 1 1 1
Next, we compute the energy variation (for details on the Ising model, see my previous post).
∆E←{
⎕IO←0
(x y)←⍺
N←⊃⍴⍵
xn←N|((x-1)y)((x+1)y)
yn←N|(x(y-1))(x(y+1))
⍵[x;y]×+/⍵[xn,yn]
}
N is the size of the lattice.
xn and yn are respectively the vertical and lateral neighbours of the site. N| takes the coordinates modulo N (so the lattice is actually a torus). (Note: we used ⎕IO←0 to use 0-based array indexing.)
+/ sums over all neighbours of the site, and then we multiply by the value of the site itself to get \(\Delta E\).
Sample output, for site \((3, 3)\) in a random \(5\times 5\) lattice:
3 3 ising.∆E ising.L 5
¯4
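If it helps to compare with a more conventional language, here is a rough Julia transcription of the same computation (my own, not part of the APL program; note Julia’s 1-based indexing with mod1 instead of ⎕IO←0 and N|):
function ΔE(lattice, x, y)
    N = size(lattice, 1)
    # mod1 wraps the indices around, making the lattice a torus
    neighbours = lattice[mod1(x-1, N), y] + lattice[mod1(x+1, N), y] +
                 lattice[x, mod1(y-1, N)] + lattice[x, mod1(y+1, N)]
    return lattice[x, y] * neighbours
end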
Then comes the actual Metropolis-Hastings part:
U←{
⎕IO←0
N←⊃⍴⍵
(x y)←?N N
new←⍵
new[x;y]×←(2×(?0)>*-⍺×x y ∆E ⍵)-1
new
}
We draw a random site \((x,y)\) with the ? function.
new is the lattice but with the \((x,y)\) site flipped.
We compute the acceptance probability \(e^{-\beta \Delta E}\) (where \(\beta\) is the left argument ⍺) with the * function (exponential) and our previous ∆E function.
?0 returns a uniform random number in \([0,1)\). Based on this value, we decide whether to update the lattice, and we return it.
We can now bring everything together for display:
Ising←{' ⌹'[1+1=({10 U ⍵}⍣⍵)L ⍺]}
We generate a random lattice of size ⍺ with L ⍺.
We update it ⍵ times (with \(\beta = 10\)), using the ⍣ function, which applies a function \(n\) times.
Finally, we map the values ¯1 and 1 to the characters ' ' and '⌹' by indexing, to display the lattice.
Final output, with an 80×80 random lattice, after 50000 update steps:
80 ising.Ising 50000
⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹ ⌹⌹⌹⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹ ⌹⌹⌹⌹ ⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹
⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹
⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹ ⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹ ⌹⌹⌹
⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹ ⌹⌹⌹ ⌹⌹⌹
⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹⌹
⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹ ⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹ ⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹ ⌹ ⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹ ⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹
⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹
⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹ ⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹
⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹ ⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹ ⌹⌹⌹⌹ ⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
Complete code, with the namespace:
:Namespace ising
⍝ Random ⍵×⍵ lattice of ¯1 and 1
L←{(2×?⍵ ⍵⍴2)-3}
⍝ Energy variation from flipping site ⍺≡(x y) of lattice ⍵
∆E←{
⎕IO←0
(x y)←⍺
N←⊃⍴⍵
xn←N|((x-1)y)((x+1)y)
yn←N|(x(y-1))(x(y+1))
⍵[x;y]×+/⍵[xn,yn]
}
⍝ One Metropolis-Hastings update of lattice ⍵, with ⍺ as β
U←{
⎕IO←0
N←⊃⍴⍵
(x y)←?N N
new←⍵
new[x;y]×←(2×(?0)>*-⍺×x y ∆E ⍵)-1
new
}
⍝ Display an ⍺×⍺ lattice after ⍵ updates
Ising←{' ⌹'[1+1=({10 U ⍵}⍣⍵)L ⍺]}
:EndNamespace
The algorithm is very fast (I think it can be optimized by the interpreter because there is no branching), and is easy to reason about. The whole program fits in a few lines, and you clearly see what each function and each line does. It could probably be optimized further (I don’t know every APL function yet…), and also could probably be golfed to a few lines (at the cost of readability?).
It took me some time to write this, but Dyalog’s tools make it really easy to insert symbols and to look up what they do. Next time, I will look into some ASCII-based APL descendants. J seems to have good documentation and a tradition of tacit definitions, similar to the point-free style in Haskell. Overall, J seems well-suited to modern functional programming, while APL is still under the influence of its early days, when it was more procedural. Another interesting area is K, Q, and their database engine kdb+, which seems to be extremely performant and is actually used in production.
Still, Unicode symbols make the code much more readable, mainly because there is a one-to-one link between symbols and functions, which cannot be maintained with only a few ASCII characters.