From 42ec7286b2d36a9ba22925f816a17cb1cc2aa5ce Mon Sep 17 00:00:00 2001 From: chai Date: Sat, 30 Oct 2021 11:32:16 +0800 Subject: + Penlight --- .../Penlight/docs/manual/07-functional.md.html | 834 +++++++++++++++++++++ 1 file changed, 834 insertions(+) create mode 100644 Data/Libraries/Penlight/docs/manual/07-functional.md.html (limited to 'Data/Libraries/Penlight/docs/manual/07-functional.md.html') diff --git a/Data/Libraries/Penlight/docs/manual/07-functional.md.html b/Data/Libraries/Penlight/docs/manual/07-functional.md.html new file mode 100644 index 0000000..d4ca655 --- /dev/null +++ b/Data/Libraries/Penlight/docs/manual/07-functional.md.html @@ -0,0 +1,834 @@ + + + + + Penlight Documentation + + + + +
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Functional Programming

+ +

+

Sequences

+ + +

A Lua iterator (in its simplest form) is a function which can be repeatedly +called to return a set of one or more values. The for in statement understands +these iterators, and loops until the function returns nil. There are standard +sequence adapters for tables in Lua (ipairs and pairs), and io.lines +returns an iterator over all the lines in a file. In the Penlight libraries, such +iterators are also called sequences. A sequence of single values (say from +io.lines) is called single-valued, whereas the sequence defined by pairs is +double-valued.

+ +

pl.seq provides a number of useful iterators, and some functions which operate +on sequences. At first sight this example looks like an attempt to write Python +in Lua, (with the sequence being inclusive):

+ + +
+> for i in seq.range(1,4) do print(i) end
+1
+2
+3
+4
+
+ +

But range is actually equivalent to Python's xrange, since it generates a +sequence, not a list. To get a list, use seq.copy(seq.range(1,10)), which +takes any single-value sequence and makes a table from the result. seq.list is +like ipairs except that it does not give you the index, just the value.

+ + +
+> for x in seq.list {1,2,3} do print(x) end
+1
+2
+3
+
+ +

enum takes a sequence and turns it into a double-valued sequence consisting of +a sequence number and the value, so enum(list(ls)) is actually equivalent to +ipairs. A more interesting example prints out a file with line numbers:

+ + +
+for i,v in seq.enum(io.lines(fname)) do print(i..' '..v) end
+
+ +

Sequences can be combined, either by 'zipping' them or by concatenating them.

+ + +
+> for x,y in seq.zip(l1,l2) do print(x,y) end
+10      1
+20      2
+30      3
+> for x in seq.splice(l1,l2) do print(x) end
+10
+20
+30
+1
+2
+3
+
+ +

seq.printall is useful for printing out single-valued sequences, and provides +some finer control over formating, such as a delimiter, the number of fields per +line, and a format string to use (@see string.format)

+ + +
+> seq.printall(seq.random(10))
+0.0012512588885159 0.56358531449324 0.19330423902097 ....
+> seq.printall(seq.random(10), ',', 4, '%4.2f')
+0.17,0.86,0.71,0.51
+0.30,0.01,0.09,0.36
+0.15,0.17,
+
+ +

map will apply a function to a sequence.

+ + +
+> seq.printall(seq.map(string.upper, {'one','two'}))
+ONE TWO
+> seq.printall(seq.map('+', {10,20,30}, 1))
+11 21 31
+
+ +

filter will filter a sequence using a boolean function (often called a +predicate). For instance, this code only prints lines in a file which are +composed of digits:

+ + +
+for l in seq.filter(io.lines(file), stringx.isdigit) do print(l) end
+
+ +

The following returns a table consisting of all the positive values in the +original table (equivalent to tablex.filter(ls, '>', 0))

+ + +
+ls = seq.copy(seq.filter(ls, '>', 0))
+
+ +

We're already encounted seq.sum when discussing input.numbers. This can also +be expressed with seq.reduce:

+ + +
+> seq.reduce(function(x,y) return x + y end, seq.list{1,2,3,4})
+10
+
+ +

seq.reduce applies a binary function in a recursive fashion, so that:

+ + +
+reduce(op,{1,2,3}) => op(1,reduce(op,{2,3}) => op(1,op(2,3))
+
+ +

it's now possible to easily generate other cumulative operations; the standard +operations declared in pl.operator are useful here:

+ + +
+> ops = require 'pl.operator'
+> -- can also say '*' instead of ops.mul
+> = seq.reduce(ops.mul,input.numbers '1 2 3 4')
+24
+
+ +

There are functions to extract statistics from a sequence of numbers:

+ + +
+> l1 = List {10,20,30}
+> l2 = List {1,2,3}
+> = seq.minmax(l1)
+10      30
+> = seq.sum(l1)
+60      3
+
+ +

It is common to get sequences where values are repeated, say the words in a file. +count_map will take such a sequence and count the values, returning a table +where the keys are the unique values, and the value associated with each key is +the number of times they occurred:

+ + +
+> t = seq.count_map {'one','fred','two','one','two','two'}
+> = t
+{one=2,fred=1,two=3}
+
+ +

This will also work on numerical sequences, but you cannot expect the result to +be a proper list, i.e. having no 'holes'. Instead, you always need to use pairs +to iterate over the result - note that there is a hole at index 5:

+ + +
+> t = seq.count_map {1,2,4,2,2,3,4,2,6}
+> for k,v in pairs(t) do print(k,v) end
+1       1
+2       4
+3       1
+4       2
+6       1
+
+ +

unique uses count_map to return a list of the unique values, that is, just +the keys of the resulting table.

+ +

last turns a single-valued sequence into a double-valued sequence with the +current value and the last value:

+ + +
+> for current,last in seq.last {10,20,30,40} do print (current,last) end
+20      10
+30      20
+40      30
+
+ +

This makes it easy to do things like identify repeated lines in a file, or +construct differences between values. filter can handle double-valued sequences +as well, so one could filter such a sequence to only return cases where the +current value is less than the last value by using operator.lt or just '<'. +This code then copies the resulting code into a table.

+ + +
+> ls = {10,9,10,3}
+> = seq.copy(seq.filter(seq.last(s),'<'))
+{9,3}
+
+ +

+

Sequence Wrappers

+ +

The functions in pl.seq cover the common patterns when dealing with sequences, +but chaining these functions together can lead to ugly code. Consider the last +example of the previous section; seq is repeated three times and the resulting +expression has to be read right-to-left. The first issue can be helped by local +aliases, so that the expression becomes copy(filter(last(s),'<')) but the +second issue refers to the somewhat unnatural order of functional application. +We tend to prefer reading operations from left to right, which is one reason why +object-oriented notation has become popular. Sequence adapters allow this +expression to be written like so:

+ + +
+seq(s):last():filter('<'):copy()
+
+ +

With this notation, the operation becomes a chain of method calls running from +left to right.

+ +

'Sequence' is not a basic Lua type, they are generally functions or callable +objects. The expression seq(s) wraps a sequence in a sequence wrapper, which +is an object which understands all the functions in pl.seq as methods. This +object then explicitly represents sequences.

+ +

As a special case, the constructor (which is when you call the table seq) will +make a wrapper for a plain list-like table. Here we apply the length operator to +a sequence of strings, and print them out.

+ + +
+> seq{'one','tw','t'} :map '#' :printall()
+3 2 1
+
+ +

As a convenience, there is a function seq.lines which behaves just like +io.lines except it wraps the result as an explicit sequence type. This takes +the first 10 lines from standard input, makes it uppercase, turns it into a +sequence with a count and the value, glues these together with the concatenation +operator, and finally prints out the sequence delimited by a newline.

+ + +
+seq.lines():take(10):upper():enum():map('..'):printall '\n'
+
+ +

Note the method upper, which is not a seq function. if an unknown method is +called, sequence wrappers apply that method to all the values in the sequence +(this is implicit use of mapmethod)

+ +

It is straightforward to create custom sequences that can be used in this way. On +Unix, /dev/random gives you an endless sequence of random bytes, so we use +take to limit the sequence, and then map to scale the result into the desired +range. The key step is to use seq to wrap the iterator function:

+ + +
+-- random.lua
+local seq = require 'pl.seq'
+
+function dev_random()
+    local f = io.open('/dev/random')
+    local byte = string.byte
+    return seq(function()
+        -- read two bytes into a string and convert into a 16-bit number
+        local s = f:read(2)
+        return byte(s,1) + 256*byte(s,2)
+    end)
+end
+
+-- print 10 random numbers from 0 to 1 !
+dev_random():take(10):map('%',100):map('/',100):printall ','
+
+ +

Another Linux one-liner depends on the /proc filesystem and makes a list of all +the currently running processes:

+ + +
+pids = seq(lfs.dir '/proc'):filter(stringx.isdigit):map(tonumber):copy()
+
+ +

This version of Penlight has an experimental feature which relies on the fact +that all Lua types can have metatables, including functions. This makes +implicit sequence wrapping possible:

+ + +
+> seq.import()
+> seq.random(5):printall(',',5,'%4.1f')
+ 0.0, 0.1, 0.4, 0.1, 0.2
+
+ +

This avoids the awkward seq(seq.random(5)) construction. Or the iterator can +come from somewhere else completely:

+ + +
+> ('one two three'):gfind('%a+'):printall(',')
+one,two,three,
+
+ +

After seq.import, it is no longer necessary to explicitly wrap sequence +functions.

+ +

But there is a price to pay for this convenience. Every function is affected, +so that any function can be used, appropriate or not:

+ + +
+> math.sin:printall()
+..seq.lua:287: bad argument #1 to '(for generator)' (number expected, got nil)
+> a = tostring
+> = a:find(' ')
+function: 0042C920
+
+ +

What function is returned? It's almost certain to be something that makes no +sense in the current context. So implicit sequences may make certain kinds of +programming mistakes harder to catch - they are best used for interactive +exploration and small scripts.

+ +

+ +

+

List Comprehensions

+ +

List comprehensions are a compact way to create tables by specifying their +elements. In Python, you can say this:

+ + +
+ls = [x for x in range(5)]  # == [0,1,2,3,4]
+
+ +

In Lua, using pl.comprehension:

+ + +
+> C = require('pl.comprehension').new()
+> = C ('x for x=1,10') ()
+{1,2,3,4,5,6,7,8,9,10}
+
+ +

C is a function which compiles a list comprehension string into a function. +In this case, the function has no arguments. The parentheses are redundant for a +function taking a string argument, so this works as well:

+ + +
+> = C 'x^2 for x=1,4' ()
+{1,4,9,16}
+> = C '{x,x^2} for x=1,4' ()
+{{1,1},{2,4},{3,9},{4,16}}
+
+ +

Note that the expression can be any function of the variable x!

+ +

The basic syntax so far is <expr> for <set>, where <set> can be anything that +the Lua for statement understands. <set> can also just be the variable, in +which case the values will come from the argument of the comprehension. Here +I'm emphasizing that a comprehension is a function which can take a list argument:

+ + +
+> = C '2*x for x' {1,2,3}
+{2,4,6}
+> dbl = C '2*x for x'
+> = dbl {10,20,30}
+{20,40,60}
+
+ +

Here is a somewhat more explicit way of saying the same thing; _1 is a +placeholder refering to the first argument passed to the comprehension.

+ + +
+> = C '2*x for _,x in pairs(_1)' {10,20,30}
+{20,40,60}
+> = C '_1(x) for x'(tostring,{1,2,3,4})
+{'1','2','3','4'}
+
+ +

This extended syntax is useful when you wish to collect the result of some +iterator, such as io.lines. This comprehension creates a function which creates +a table of all the lines in a file:

+ + +
+> f = io.open('array.lua')
+> lines = C 'line for line in _1:lines()' (f)
+> = #lines
+118
+
+ +

There are a number of functions that may be applied to the result of a +comprehension:

+ + +
+> = C 'min(x for x)' {1,44,0}
+0
+> = C 'max(x for x)' {1,44,0}
+44
+> = C 'sum(x for x)' {1,44,0}
+45
+
+ +

(These are equivalent to a reduce operation on a list.)

+ +

After the for part, there may be a condition, which filters the output. This +comprehension collects the even numbers from a list:

+ + +
+> = C 'x for x if x % 2 == 0' {1,2,3,4,5}
+{2,4}
+
+ +

There may be a number of for parts:

+ + +
+> = C '{x,y} for x = 1,2 for y = 1,2' ()
+{{1,1},{1,2},{2,1},{2,2}}
+> = C '{x,y} for x for y' ({1,2},{10,20})
+{{1,10},{1,20},{2,10},{2,20}}
+
+ +

These comprehensions are useful when dealing with functions of more than one +variable, and are not so easily achieved with the other Penlight functional forms.

+ +

+ +

+

Creating Functions from Functions

+ + +

Lua functions may be treated like any other value, although of course you cannot +multiply or add them. One operation that makes sense is function composition, +which chains function calls (so (f * g)(x) is f(g(x)).)

+ + +
+> func = require 'pl.func'
+> printf = func.compose(io.write,string.format)
+> printf("hello %s\n",'world')
+hello world
+true
+
+ +

Many functions require you to pass a function as an argument, say to apply to all +values of a sequence or as a callback. Often useful functions have the wrong +number of arguments. So there is a need to construct a function of one argument +from one of two arguments, binding the extra argument to a given value.

+ +

partial application takes a function of n arguments and returns a function of n-1 +arguments where the first argument is bound to some value:

+ + +
+> p2 = func.bind1(print,'start>')
+> p2('hello',2)
+start>  hello   2
+> ops = require 'pl.operator'
+> = tablex.filter({1,-2,10,-1,2},bind1(ops.gt,0))
+{-2,-1}
+> tablex.filter({1,-2,10,-1,2},bind1(ops.le,0))
+{1,10,2}
+
+ +

The last example unfortunately reads backwards, because bind1 alway binds the +first argument! Also unfortunately, in my youth I confused 'currying' with +'partial application', so the old name for bind1 is curry - this alias still exists.

+ +

This is a specialized form of function argument binding. Here is another way +to say the print example:

+ + +
+> p2 = func.bind(print,'start>',func._1,func._2)
+> p2('hello',2)
+start>  hello   2
+
+ +

where _1 and _2 are placeholder variables, corresponding to the first and +second argument respectively.

+ +

Having func all over the place is distracting, so it's useful to pull all of +pl.func into the local context. Here is the filter example, this time the right +way around:

+ + +
+> utils.import 'pl.func'
+> tablex.filter({1,-2,10,-1,2},bind(ops.gt, _1, 0))
+{1,10,2}
+
+ +

tablex.merge does a general merge of two tables. This example shows the +usefulness of binding the last argument of a function.

+ + +
+> S1 = {john=27, jane=31, mary=24}
+> S2 = {jane=31, jones=50}
+> intersection = bind(tablex.merge, _1, _2, false)
+> union = bind(tablex.merge, _1, _2, true)
+> = intersection(S1,S2)
+{jane=31}
+> = union(S1,S2)
+{mary=24,jane=31,john=27,jones=50}
+
+ +

When using bind with print, we got a function of precisely two arguments, +whereas we really want our function to use varargs like print. This is the role +of _0:

+ + +
+> _DEBUG = true
+> p = bind(print,'start>', _0)
+return function (fn,_v1)
+    return function(...) return fn(_v1,...) end
+end
+
+> p(1,2,3,4,5)
+start>  1       2       3       4       5
+
+ +

I've turned on the global _DEBUG flag, so that the function generated is +printed out. It is actually a function which generates the required function; +the first call binds the value of _v1 to 'start>'.

+ +

+

Placeholder Expressions

+ +

A common pattern in Penlight is a function which applies another function to all +elements in a table or a sequence, such as tablex.map or seq.filter. Lua does +anonymous functions well, although they can be a bit tedious to type:

+ + +
+> = tablex.map(function(x) return x*x end, {1,2,3,4})
+{1,4,9,16}
+
+ +

pl.func allows you to define placeholder expressions, which can cut down on +the typing required, and also make your intent clearer. First, we bring contents +of pl.func into our context, and then supply an expression using placeholder +variables, such as _1,_2,etc. (C++ programmers will recognize this from the +Boost libraries.)

+ + +
+> utils.import 'pl.func'
+> = tablex.map(_1*_1, {1,2,3,4})
+{1,4,9,16}
+
+ +

Functions of up to 5 arguments can be generated.

+ + +
+> = tablex.map2(_1+_2,{1,2,3}, {10,20,30})
+{11,22,33}
+
+ +

These expressions can use arbitrary functions, altho they must first be +registered with the functional library. func.register brings in a single +function, and func.import brings in a whole table of functions, such as math.

+ + +
+> sin = register(math.sin)
+> = tablex.map(sin(_1), {1,2,3,4})
+{0.8414709848079,0.90929742682568,0.14112000805987,-0.75680249530793}
+> import 'math'
+> = tablex.map(cos(2*_1),{1,2,3,4})
+{-0.41614683654714,-0.65364362086361,0.96017028665037,-0.14550003380861}
+
+ +

A common operation is calling a method of a set of objects:

+ + +
+> = tablex.map(_1:sub(1,1), {'one','four','x'})
+{'o','f','x'}
+
+ +

There are some restrictions on what operators can be used in PEs. For instance, +because the __len metamethod cannot be overriden by plain Lua tables, we need +to define a special function to express `#_1':

+ + +
+> = tablex.map(Len(_1), {'one','four','x'})
+{3,4,1}
+
+ +

Likewise for comparison operators, which cannot be overloaded for different +types, and thus also have to be expressed as a special function:

+ + +
+> = tablex.filter(Gt(_1,0), {1,-1,2,4,-3})
+{1,2,4}
+
+ +

It is useful to express the fact that a function returns multiple values. For +instance, tablex.pairmap expects a function that will be called with the key +and the value, and returns the new value and the key, in that order.

+ + +
+> = pairmap(Args(_2,_1:upper()),{fred=1,alice=2})
+{ALICE=2,FRED=1}
+
+ +

PEs cannot contain nil values, since PE function arguments are represented as +an array. Instead, a special value called Nil is provided. So say +_1:f(Nil,1) instead of _1:f(nil,1).

+ +

A placeholder expression cannot be automatically used as a Lua function. The +technical reason is that the call operator must be overloaded to construct +function calls like _1(1). If you want to force a PE to return a function, use +func.I.

+ + +
+> = tablex.map(_1(10),{I(2*_1),I(_1*_1),I(_1+2)})
+{20,100,12}
+
+ +

Here we make a table of functions taking a single argument, and then call them +all with a value of 10.

+ +

The essential idea with PEs is to 'quote' an expression so that it is not +immediately evaluated, but instead turned into a function that can be applied +later to some arguments. The basic mechanism is to wrap values and placeholders +so that the usual Lua operators have the effect of building up an expression +tree. (It turns out that you can do symbolic algebra using PEs, see +symbols.lua in the examples directory, and its test runner testsym.lua, which +demonstrates symbolic differentiation.)

+ +

The rule is that if any operator has a PE operand, the result will be quoted. +Sometimes we need to quote things explicitly. For instance, say we want to pass a +function to a filter that must return true if the element value is in a set. +set[_1] is the obvious expression, but it does not give the desired result, +since it evaluates directly, giving nil. Indexing works differently than a +binary operation like addition (set+_1 is properly quoted) so there is a need +for an explicit quoting or wrapping operation. This is the job of the _ +function; the PE in this case should be _(set)[_1]. This works for functions +as well, as a convenient alternative to registering functions: _(math.sin)(_1). +This is equivalent to using the `lines' method:

+ + +
+for line in I(_(f):read()) do print(line) end
+
+ +

Now this will work for any 'file-like' object which which has a read method +returning the next line. If you had a LuaSocket client which was being 'pushed' +by lines sent from a server, then _(s):receive '*l' would create an iterator +for accepting input. These forms can be convenient for adapting your data flow so +that it can be passed to the sequence functions in `pl.seq'.

+ +

Placeholder expressions can be mixed with sequence wrapper expressions. +lexer.lua will give us a double-valued sequence of tokens, where the first +value is a type, and the second is a value. We filter out only the values where +the type is 'iden', extract the actual value using map, get the unique values +and finally copy to a list.

+ + +
+> str = 'for i=1,10 do for j = 1,10 do print(i,j) end end'
+> = seq(lexer.lua(str)):filter('==','iden'):map(_2):unique():copy()
+{i,print,j}
+
+ +

This is a particularly intense line (and I don't always suggest making everything +a one-liner!); the key is the behaviour of map, which will take both values of +the sequence, so _2 returns the value part. (Since filter here takes extra +arguments, it only operates on the type values.)

+ +

There are some performance considerations to using placeholder expressions. +Instantiating a PE requires constructing and compiling a function, which is not +such a fast operation. So to get best performance, factor out PEs from loops like +this;

+ + +
+local fn = I(_1:f() + _2:g())
+for i = 1,n do
+    res[i] = tablex.map2(fn,first[i],second[i])
+end
+
+ + + +
+
+
+generated by LDoc 1.4.6 +
+
+ + -- cgit v1.1-26-g67d0