summaryrefslogtreecommitdiff
path: root/Data/Libraries/Penlight/docs_topics/07-functional.md
diff options
context:
space:
mode:
Diffstat (limited to 'Data/Libraries/Penlight/docs_topics/07-functional.md')
-rw-r--r--Data/Libraries/Penlight/docs_topics/07-functional.md547
1 files changed, 547 insertions, 0 deletions
diff --git a/Data/Libraries/Penlight/docs_topics/07-functional.md b/Data/Libraries/Penlight/docs_topics/07-functional.md
new file mode 100644
index 0000000..5921a3d
--- /dev/null
+++ b/Data/Libraries/Penlight/docs_topics/07-functional.md
@@ -0,0 +1,547 @@
+## Functional Programming
+
+### Sequences
+
+@lookup pl.seq
+
+A Lua iterator (in its simplest form) is a function which can be repeatedly
+called to return a set of one or more values. The `for in` statement understands
+these iterators, and loops until the function returns `nil`. There are standard
+sequence adapters for tables in Lua (`ipairs` and `pairs`), and `io.lines`
+returns an iterator over all the lines in a file. In the Penlight libraries, such
+iterators are also called _sequences_. A sequence of single values (say from
+`io.lines`) is called _single-valued_, whereas the sequence defined by `pairs` is
+_double-valued_.
+
+`pl.seq` provides a number of useful iterators, and some functions which operate
+on sequences. At first sight this example looks like an attempt to write Python
+in Lua, (with the sequence being inclusive):
+
+ > for i in seq.range(1,4) do print(i) end
+ 1
+ 2
+ 3
+ 4
+
+But `range` is actually equivalent to Python's `xrange`, since it generates a
+sequence, not a list. To get a list, use `seq.copy(seq.range(1,10))`, which
+takes any single-value sequence and makes a table from the result. `seq.list` is
+like `ipairs` except that it does not give you the index, just the value.
+
+ > for x in seq.list {1,2,3} do print(x) end
+ 1
+ 2
+ 3
+
+`enum` takes a sequence and turns it into a double-valued sequence consisting of
+a sequence number and the value, so `enum(list(ls))` is actually equivalent to
+`ipairs`. A more interesting example prints out a file with line numbers:
+
+ for i,v in seq.enum(io.lines(fname)) do print(i..' '..v) end
+
+Sequences can be _combined_, either by 'zipping' them or by concatenating them.
+
+ > for x,y in seq.zip(l1,l2) do print(x,y) end
+ 10 1
+ 20 2
+ 30 3
+ > for x in seq.splice(l1,l2) do print(x) end
+ 10
+ 20
+ 30
+ 1
+ 2
+ 3
+
+`seq.printall` is useful for printing out single-valued sequences, and provides
+some finer control over formating, such as a delimiter, the number of fields per
+line, and a format string to use (@see string.format)
+
+ > seq.printall(seq.random(10))
+ 0.0012512588885159 0.56358531449324 0.19330423902097 ....
+ > seq.printall(seq.random(10), ',', 4, '%4.2f')
+ 0.17,0.86,0.71,0.51
+ 0.30,0.01,0.09,0.36
+ 0.15,0.17,
+
+`map` will apply a function to a sequence.
+
+ > seq.printall(seq.map(string.upper, {'one','two'}))
+ ONE TWO
+ > seq.printall(seq.map('+', {10,20,30}, 1))
+ 11 21 31
+
+`filter` will filter a sequence using a boolean function (often called a
+_predicate_). For instance, this code only prints lines in a file which are
+composed of digits:
+
+ for l in seq.filter(io.lines(file), stringx.isdigit) do print(l) end
+
+The following returns a table consisting of all the positive values in the
+original table (equivalent to `tablex.filter(ls, '>', 0)`)
+
+ ls = seq.copy(seq.filter(ls, '>', 0))
+
+We're already encounted `seq.sum` when discussing `input.numbers`. This can also
+be expressed with `seq.reduce`:
+
+ > seq.reduce(function(x,y) return x + y end, seq.list{1,2,3,4})
+ 10
+
+`seq.reduce` applies a binary function in a recursive fashion, so that:
+
+ reduce(op,{1,2,3}) => op(1,reduce(op,{2,3}) => op(1,op(2,3))
+
+it's now possible to easily generate other cumulative operations; the standard
+operations declared in `pl.operator` are useful here:
+
+ > ops = require 'pl.operator'
+ > -- can also say '*' instead of ops.mul
+ > = seq.reduce(ops.mul,input.numbers '1 2 3 4')
+ 24
+
+There are functions to extract statistics from a sequence of numbers:
+
+ > l1 = List {10,20,30}
+ > l2 = List {1,2,3}
+ > = seq.minmax(l1)
+ 10 30
+ > = seq.sum(l1)
+ 60 3
+
+It is common to get sequences where values are repeated, say the words in a file.
+`count_map` will take such a sequence and count the values, returning a table
+where the _keys_ are the unique values, and the value associated with each key is
+the number of times they occurred:
+
+ > t = seq.count_map {'one','fred','two','one','two','two'}
+ > = t
+ {one=2,fred=1,two=3}
+
+This will also work on numerical sequences, but you cannot expect the result to
+be a proper list, i.e. having no 'holes'. Instead, you always need to use `pairs`
+to iterate over the result - note that there is a hole at index 5:
+
+ > t = seq.count_map {1,2,4,2,2,3,4,2,6}
+ > for k,v in pairs(t) do print(k,v) end
+ 1 1
+ 2 4
+ 3 1
+ 4 2
+ 6 1
+
+`unique` uses `count_map` to return a list of the unique values, that is, just
+the keys of the resulting table.
+
+`last` turns a single-valued sequence into a double-valued sequence with the
+current value and the last value:
+
+ > for current,last in seq.last {10,20,30,40} do print (current,last) end
+ 20 10
+ 30 20
+ 40 30
+
+This makes it easy to do things like identify repeated lines in a file, or
+construct differences between values. `filter` can handle double-valued sequences
+as well, so one could filter such a sequence to only return cases where the
+current value is less than the last value by using `operator.lt` or just '<'.
+This code then copies the resulting code into a table.
+
+ > ls = {10,9,10,3}
+ > = seq.copy(seq.filter(seq.last(s),'<'))
+ {9,3}
+
+
+### Sequence Wrappers
+
+The functions in `pl.seq` cover the common patterns when dealing with sequences,
+but chaining these functions together can lead to ugly code. Consider the last
+example of the previous section; `seq` is repeated three times and the resulting
+expression has to be read right-to-left. The first issue can be helped by local
+aliases, so that the expression becomes `copy(filter(last(s),'<'))` but the
+second issue refers to the somewhat unnatural order of functional application.
+We tend to prefer reading operations from left to right, which is one reason why
+object-oriented notation has become popular. Sequence adapters allow this
+expression to be written like so:
+
+ seq(s):last():filter('<'):copy()
+
+With this notation, the operation becomes a chain of method calls running from
+left to right.
+
+'Sequence' is not a basic Lua type, they are generally functions or callable
+objects. The expression `seq(s)` wraps a sequence in a _sequence wrapper_, which
+is an object which understands all the functions in `pl.seq` as methods. This
+object then explicitly represents sequences.
+
+As a special case, the constructor (which is when you call the table `seq`) will
+make a wrapper for a plain list-like table. Here we apply the length operator to
+a sequence of strings, and print them out.
+
+ > seq{'one','tw','t'} :map '#' :printall()
+ 3 2 1
+
+As a convenience, there is a function `seq.lines` which behaves just like
+`io.lines` except it wraps the result as an explicit sequence type. This takes
+the first 10 lines from standard input, makes it uppercase, turns it into a
+sequence with a count and the value, glues these together with the concatenation
+operator, and finally prints out the sequence delimited by a newline.
+
+ seq.lines():take(10):upper():enum():map('..'):printall '\n'
+
+Note the method `upper`, which is not a `seq` function. if an unknown method is
+called, sequence wrappers apply that method to all the values in the sequence
+(this is implicit use of `mapmethod`)
+
+It is straightforward to create custom sequences that can be used in this way. On
+Unix, `/dev/random` gives you an _endless_ sequence of random bytes, so we use
+`take` to limit the sequence, and then `map` to scale the result into the desired
+range. The key step is to use `seq` to wrap the iterator function:
+
+ -- random.lua
+ local seq = require 'pl.seq'
+
+ function dev_random()
+ local f = io.open('/dev/random')
+ local byte = string.byte
+ return seq(function()
+ -- read two bytes into a string and convert into a 16-bit number
+ local s = f:read(2)
+ return byte(s,1) + 256*byte(s,2)
+ end)
+ end
+
+ -- print 10 random numbers from 0 to 1 !
+ dev_random():take(10):map('%',100):map('/',100):printall ','
+
+
+Another Linux one-liner depends on the `/proc` filesystem and makes a list of all
+the currently running processes:
+
+ pids = seq(lfs.dir '/proc'):filter(stringx.isdigit):map(tonumber):copy()
+
+This version of Penlight has an experimental feature which relies on the fact
+that _all_ Lua types can have metatables, including functions. This makes
+_implicit sequence wrapping_ possible:
+
+ > seq.import()
+ > seq.random(5):printall(',',5,'%4.1f')
+ 0.0, 0.1, 0.4, 0.1, 0.2
+
+This avoids the awkward `seq(seq.random(5))` construction. Or the iterator can
+come from somewhere else completely:
+
+ > ('one two three'):gfind('%a+'):printall(',')
+ one,two,three,
+
+After `seq.import`, it is no longer necessary to explicitly wrap sequence
+functions.
+
+But there is a price to pay for this convenience. _Every_ function is affected,
+so that any function can be used, appropriate or not:
+
+ > math.sin:printall()
+ ..seq.lua:287: bad argument #1 to '(for generator)' (number expected, got nil)
+ > a = tostring
+ > = a:find(' ')
+ function: 0042C920
+
+What function is returned? It's almost certain to be something that makes no
+sense in the current context. So implicit sequences may make certain kinds of
+programming mistakes harder to catch - they are best used for interactive
+exploration and small scripts.
+
+<a id="comprehensions"/>
+
+### List Comprehensions
+
+List comprehensions are a compact way to create tables by specifying their
+elements. In Python, you can say this:
+
+ ls = [x for x in range(5)] # == [0,1,2,3,4]
+
+In Lua, using `pl.comprehension`:
+
+ > C = require('pl.comprehension').new()
+ > = C ('x for x=1,10') ()
+ {1,2,3,4,5,6,7,8,9,10}
+
+`C` is a function which compiles a list comprehension _string_ into a _function_.
+In this case, the function has no arguments. The parentheses are redundant for a
+function taking a string argument, so this works as well:
+
+ > = C 'x^2 for x=1,4' ()
+ {1,4,9,16}
+ > = C '{x,x^2} for x=1,4' ()
+ {{1,1},{2,4},{3,9},{4,16}}
+
+Note that the expression can be _any_ function of the variable `x`!
+
+The basic syntax so far is `<expr> for <set>`, where `<set>` can be anything that
+the Lua `for` statement understands. `<set>` can also just be the variable, in
+which case the values will come from the _argument_ of the comprehension. Here
+I'm emphasizing that a comprehension is a function which can take a list argument:
+
+ > = C '2*x for x' {1,2,3}
+ {2,4,6}
+ > dbl = C '2*x for x'
+ > = dbl {10,20,30}
+ {20,40,60}
+
+Here is a somewhat more explicit way of saying the same thing; `_1` is a
+_placeholder_ refering to the _first_ argument passed to the comprehension.
+
+ > = C '2*x for _,x in pairs(_1)' {10,20,30}
+ {20,40,60}
+ > = C '_1(x) for x'(tostring,{1,2,3,4})
+ {'1','2','3','4'}
+
+This extended syntax is useful when you wish to collect the result of some
+iterator, such as `io.lines`. This comprehension creates a function which creates
+a table of all the lines in a file:
+
+ > f = io.open('array.lua')
+ > lines = C 'line for line in _1:lines()' (f)
+ > = #lines
+ 118
+
+There are a number of functions that may be applied to the result of a
+comprehension:
+
+ > = C 'min(x for x)' {1,44,0}
+ 0
+ > = C 'max(x for x)' {1,44,0}
+ 44
+ > = C 'sum(x for x)' {1,44,0}
+ 45
+
+(These are equivalent to a reduce operation on a list.)
+
+After the `for` part, there may be a condition, which filters the output. This
+comprehension collects the even numbers from a list:
+
+ > = C 'x for x if x % 2 == 0' {1,2,3,4,5}
+ {2,4}
+
+There may be a number of `for` parts:
+
+ > = C '{x,y} for x = 1,2 for y = 1,2' ()
+ {{1,1},{1,2},{2,1},{2,2}}
+ > = C '{x,y} for x for y' ({1,2},{10,20})
+ {{1,10},{1,20},{2,10},{2,20}}
+
+These comprehensions are useful when dealing with functions of more than one
+variable, and are not so easily achieved with the other Penlight functional forms.
+
+<a id="func"/>
+
+### Creating Functions from Functions
+
+@lookup pl.func
+
+Lua functions may be treated like any other value, although of course you cannot
+multiply or add them. One operation that makes sense is _function composition_,
+which chains function calls (so `(f * g)(x)` is `f(g(x))`.)
+
+ > func = require 'pl.func'
+ > printf = func.compose(io.write,string.format)
+ > printf("hello %s\n",'world')
+ hello world
+ true
+
+Many functions require you to pass a function as an argument, say to apply to all
+values of a sequence or as a callback. Often useful functions have the wrong
+number of arguments. So there is a need to construct a function of one argument
+from one of two arguments, _binding_ the extra argument to a given value.
+
+_partial application_ takes a function of n arguments and returns a function of n-1
+arguments where the first argument is bound to some value:
+
+ > p2 = func.bind1(print,'start>')
+ > p2('hello',2)
+ start> hello 2
+ > ops = require 'pl.operator'
+ > = tablex.filter({1,-2,10,-1,2},bind1(ops.gt,0))
+ {-2,-1}
+ > tablex.filter({1,-2,10,-1,2},bind1(ops.le,0))
+ {1,10,2}
+
+The last example unfortunately reads backwards, because `bind1` alway binds the
+first argument! Also unfortunately, in my youth I confused 'currying' with
+'partial application', so the old name for `bind1` is `curry` - this alias still exists.
+
+This is a specialized form of function argument binding. Here is another way
+to say the `print` example:
+
+ > p2 = func.bind(print,'start>',func._1,func._2)
+ > p2('hello',2)
+ start> hello 2
+
+where `_1` and `_2` are _placeholder variables_, corresponding to the first and
+second argument respectively.
+
+Having `func` all over the place is distracting, so it's useful to pull all of
+`pl.func` into the local context. Here is the filter example, this time the right
+way around:
+
+ > utils.import 'pl.func'
+ > tablex.filter({1,-2,10,-1,2},bind(ops.gt, _1, 0))
+ {1,10,2}
+
+`tablex.merge` does a general merge of two tables. This example shows the
+usefulness of binding the last argument of a function.
+
+ > S1 = {john=27, jane=31, mary=24}
+ > S2 = {jane=31, jones=50}
+ > intersection = bind(tablex.merge, _1, _2, false)
+ > union = bind(tablex.merge, _1, _2, true)
+ > = intersection(S1,S2)
+ {jane=31}
+ > = union(S1,S2)
+ {mary=24,jane=31,john=27,jones=50}
+
+When using `bind` with `print`, we got a function of precisely two arguments,
+whereas we really want our function to use varargs like `print`. This is the role
+of `_0`:
+
+ > _DEBUG = true
+ > p = bind(print,'start>', _0)
+ return function (fn,_v1)
+ return function(...) return fn(_v1,...) end
+ end
+
+ > p(1,2,3,4,5)
+ start> 1 2 3 4 5
+
+I've turned on the global `_DEBUG` flag, so that the function generated is
+printed out. It is actually a function which _generates_ the required function;
+the first call _binds the value_ of `_v1` to 'start>'.
+
+### Placeholder Expressions
+
+A common pattern in Penlight is a function which applies another function to all
+elements in a table or a sequence, such as `tablex.map` or `seq.filter`. Lua does
+anonymous functions well, although they can be a bit tedious to type:
+
+ > = tablex.map(function(x) return x*x end, {1,2,3,4})
+ {1,4,9,16}
+
+`pl.func` allows you to define _placeholder expressions_, which can cut down on
+the typing required, and also make your intent clearer. First, we bring contents
+of `pl.func` into our context, and then supply an expression using placeholder
+variables, such as `_1`,`_2`,etc. (C++ programmers will recognize this from the
+Boost libraries.)
+
+ > utils.import 'pl.func'
+ > = tablex.map(_1*_1, {1,2,3,4})
+ {1,4,9,16}
+
+Functions of up to 5 arguments can be generated.
+
+ > = tablex.map2(_1+_2,{1,2,3}, {10,20,30})
+ {11,22,33}
+
+These expressions can use arbitrary functions, altho they must first be
+registered with the functional library. `func.register` brings in a single
+function, and `func.import` brings in a whole table of functions, such as `math`.
+
+ > sin = register(math.sin)
+ > = tablex.map(sin(_1), {1,2,3,4})
+ {0.8414709848079,0.90929742682568,0.14112000805987,-0.75680249530793}
+ > import 'math'
+ > = tablex.map(cos(2*_1),{1,2,3,4})
+ {-0.41614683654714,-0.65364362086361,0.96017028665037,-0.14550003380861}
+
+A common operation is calling a method of a set of objects:
+
+ > = tablex.map(_1:sub(1,1), {'one','four','x'})
+ {'o','f','x'}
+
+There are some restrictions on what operators can be used in PEs. For instance,
+because the `__len` metamethod cannot be overriden by plain Lua tables, we need
+to define a special function to express `#_1':
+
+ > = tablex.map(Len(_1), {'one','four','x'})
+ {3,4,1}
+
+Likewise for comparison operators, which cannot be overloaded for _different_
+types, and thus also have to be expressed as a special function:
+
+ > = tablex.filter(Gt(_1,0), {1,-1,2,4,-3})
+ {1,2,4}
+
+It is useful to express the fact that a function returns multiple values. For
+instance, `tablex.pairmap` expects a function that will be called with the key
+and the value, and returns the new value and the key, in that order.
+
+ > = pairmap(Args(_2,_1:upper()),{fred=1,alice=2})
+ {ALICE=2,FRED=1}
+
+PEs cannot contain `nil` values, since PE function arguments are represented as
+an array. Instead, a special value called `Nil` is provided. So say
+`_1:f(Nil,1)` instead of `_1:f(nil,1)`.
+
+A placeholder expression cannot be automatically used as a Lua function. The
+technical reason is that the call operator must be overloaded to construct
+function calls like `_1(1)`. If you want to force a PE to return a function, use
+`func.I`.
+
+ > = tablex.map(_1(10),{I(2*_1),I(_1*_1),I(_1+2)})
+ {20,100,12}
+
+Here we make a table of functions taking a single argument, and then call them
+all with a value of 10.
+
+The essential idea with PEs is to 'quote' an expression so that it is not
+immediately evaluated, but instead turned into a function that can be applied
+later to some arguments. The basic mechanism is to wrap values and placeholders
+so that the usual Lua operators have the effect of building up an _expression
+tree_. (It turns out that you can do _symbolic algebra_ using PEs, see
+`symbols.lua` in the examples directory, and its test runner `testsym.lua`, which
+demonstrates symbolic differentiation.)
+
+The rule is that if any operator has a PE operand, the result will be quoted.
+Sometimes we need to quote things explicitly. For instance, say we want to pass a
+function to a filter that must return true if the element value is in a set.
+`set[_1]` is the obvious expression, but it does not give the desired result,
+since it evaluates directly, giving `nil`. Indexing works differently than a
+binary operation like addition (set+_1 _is_ properly quoted) so there is a need
+for an explicit quoting or wrapping operation. This is the job of the `_`
+function; the PE in this case should be `_(set)[_1]`. This works for functions
+as well, as a convenient alternative to registering functions: `_(math.sin)(_1)`.
+This is equivalent to using the `lines' method:
+
+ for line in I(_(f):read()) do print(line) end
+
+Now this will work for _any_ 'file-like' object which which has a `read` method
+returning the next line. If you had a LuaSocket client which was being 'pushed'
+by lines sent from a server, then `_(s):receive '*l'` would create an iterator
+for accepting input. These forms can be convenient for adapting your data flow so
+that it can be passed to the sequence functions in `pl.seq'.
+
+Placeholder expressions can be mixed with sequence wrapper expressions.
+`lexer.lua` will give us a double-valued sequence of tokens, where the first
+value is a type, and the second is a value. We filter out only the values where
+the type is 'iden', extract the actual value using `map`, get the unique values
+and finally copy to a list.
+
+ > str = 'for i=1,10 do for j = 1,10 do print(i,j) end end'
+ > = seq(lexer.lua(str)):filter('==','iden'):map(_2):unique():copy()
+ {i,print,j}
+
+This is a particularly intense line (and I don't always suggest making everything
+a one-liner!); the key is the behaviour of `map`, which will take both values of
+the sequence, so `_2` returns the value part. (Since `filter` here takes extra
+arguments, it only operates on the type values.)
+
+There are some performance considerations to using placeholder expressions.
+Instantiating a PE requires constructing and compiling a function, which is not
+such a fast operation. So to get best performance, factor out PEs from loops like
+this;
+
+ local fn = I(_1:f() + _2:g())
+ for i = 1,n do
+ res[i] = tablex.map2(fn,first[i],second[i])
+ end
+
+