diff options
author | chai <chaifix@163.com> | 2021-10-30 11:32:16 +0800 |
---|---|---|
committer | chai <chaifix@163.com> | 2021-10-30 11:32:16 +0800 |
commit | 42ec7286b2d36a9ba22925f816a17cb1cc2aa5ce (patch) | |
tree | 24bc7009457a8d7500f264e89946dc20d069294f /Data/Libraries/Penlight/docs_topics/02-arrays.md | |
parent | 164885fd98d48703bd771f802d79557b7db97431 (diff) |
+ Penlight
Diffstat (limited to 'Data/Libraries/Penlight/docs_topics/02-arrays.md')
-rw-r--r-- | Data/Libraries/Penlight/docs_topics/02-arrays.md | 649 |
1 files changed, 649 insertions, 0 deletions
diff --git a/Data/Libraries/Penlight/docs_topics/02-arrays.md b/Data/Libraries/Penlight/docs_topics/02-arrays.md new file mode 100644 index 0000000..9ee292f --- /dev/null +++ b/Data/Libraries/Penlight/docs_topics/02-arrays.md @@ -0,0 +1,649 @@ +## Tables and Arrays + +<a id="list"/> + +### Python-style Lists + +One of the elegant things about Lua is that tables do the job of both lists and +dicts (as called in Python) or vectors and maps, (as called in C++), and they do +it efficiently. However, if we are dealing with 'tables with numerical indices' +we may as well call them lists and look for operations which particularly make +sense for lists. The Penlight `List` class was originally written by Nick Trout +for Lua 5.0, and translated to 5.1 and extended by myself. It seemed that +borrowing from Python was a good idea, and this eventually grew into Penlight. + +Here is an example showing `List` in action; it redefines `__tostring`, so that +it can print itself out more sensibly: + + > List = require 'pl.List' --> automatic with require 'pl' <--- + > l = List() + > l:append(10) + > l:append(20) + > = l + {10,20} + > l:extend {30,40} + {10,20,30,40} + > l:insert(1,5) + {5,10,20,30,40} + > = l:pop() + 40 + > = l + {5,10,20,30} + > = l:index(30) + 4 + > = l:contains(30) + true + > = l:reverse() ---> note: doesn't make a copy! + {30,20,10,5} + +Although methods like `sort` and `reverse` operate in-place and change the list, +they do return the original list. This makes it possible to do _method chaining_, +like `ls = ls:append(10):append(20):reverse():append(1)`. But (and this is an +important but) no extra copy is made, so `ls` does not change identity. `List` +objects (like tables) are _mutable_, unlike strings. If you want a copy of a +list, then `List(ls)` will do the job, i.e. it acts like a copy constructor. +However, if passed any other table, `List` will just set the metatable of the +table and _not_ make a copy. + +A particular feature of Python lists is _slicing_. This is fully supported in +this version of `List`, except we use 1-based indexing. So `List.slice` works +rather like `string.sub`: + + > l = List {10,20,30,40} + > = l:slice(1,1) ---> note: creates a new list! + {10} + > = l:slice(2,2) + {20} + > = l:slice(2,3) + {20,30} + > = l:slice(2,-2) + {20,30} + > = l:slice_assign(2,2,{21,22,23}) + {10,21,22,23,30,40} + > = l:chop(1,1) + {21,22,23,30,40} + +Functions like `slice_assign` and `chop` modify the list; the first is equivalent +to Python`l[i1:i2] = seq` and the second to `del l[i1:i2]`. + +List objects are ultimately just Lua 'list-like' tables, but they have extra +operations defined on them, such as equality and concatention. For regular +tables, equality is only true if the two tables are _identical objects_, whereas +two lists are equal if they have the same contents, i.e. that `l1[i]==l2[i]` for +all elements. + + > l1 = List {1,2,3} + > l2 = List {1,2,3} + > = l1 == l2 + true + > = l1..l2 + {1,2,3,1,2,3} + +The `List` constructor can be passed a function. If so, it's assumed that this is +an iterator function that can be repeatedly called to generate a sequence. One +such function is `io.lines`; the following short, intense little script counts +the number of lines in standard input: + + -- linecount.lua + require 'pl' + ls = List(io.lines()) + print(#ls) + +`List.iterate` captures what `List` considers a sequence. In particular, it can +also iterate over all 'characters' in a string: + + > for ch in List.iterate 'help' do io.write(ch,' ') end + h e l p > + +Since the function `iterate` is used internally by the `List` constructor, +strings can be made into lists of character strings very easily. + +There are a number of operations that go beyond the standard Python methods. For +instance, you can _partition_ a list into a table of sublists using a function. +In the simplest form, you use a predicate (a function returning a boolean value) +to partition the list into two lists, one of elements matching and another of +elements not matching. But you can use any function; if we use `type` then the +keys will be the standard Lua type names. + + > ls = List{1,2,3,4} + > ops = require 'pl.operator' + > ls:partition(function(x) return x > 2 end) + {false={1,2},true={3,4}} + > ls = List{'one',math.sin,List{1},10,20,List{1,2}} + > ls:partition(type) + {function={function: 00369110},string={one},number={10,20},table={{1},{1,2}}} + +This is one `List` method which returns a table which is not a `List`. Bear in +mind that you can always call a `List` method on a plain table argument, so +`List.partition(t,type)` works as expected. But these functions will only operate +on the array part of the table. + +The 'nominal' type of the returned table is `pl.Multimap`, which describes a mapping +between keys and multiple values. This does not mean that `pl.Multimap` is automatically +loaded whenever you use `partition` (or `List` for that matter); this is one of the +standard metatables which are only filled out when the appropriate module is loaded. +This allows tables to be tagged appropriately without causing excessive coupling. + +Stacks occur everywhere in computing. `List` supports stack-like operations; +there is already `pop` (remove and return last value) and `append` acts like +`push` (add a value to the end). `push` is provided as an alias for `append`, and +the other stack operation (size) is simply the size operator `#`. Queues can +also be implemented; you use `pop` to take values out of the queue, and `put` to +insert a value at the begining. + +You may derive classes from `List`, and since the list-returning methods +are covariant, the result of `slice` etc will return lists of the derived type, +not `List`. For instance, consider the specialization of a `List` type that contains +numbers in `tests/test-list.lua`: + + n1 = NA{10,20,30} + n2 = NA{1,2,3} + ns = n1 + 2*n2 + asserteq(ns,{12,24,36}) + min,max = ns:slice(1,2):minmax() + asserteq(T(min,max),T(12,24)) + asserteq(n1:normalize():sum(),1,1e-8) + + +### Map and Set classes + +The `Map` class exposes what Python would call a 'dict' interface, and accesses +the hash part of the table. The name 'Map' is used to emphasize the interface, +not the implementation; it is an object which maps keys onto values; `m['alice']` +or the equivalent `m.alice` is the access operation. This class also provides +explicit `set` and `get` methods, which are trivial for regular maps but get +interesting when `Map` is subclassed. The other operation is `update`, which +extends a map by copying the keys and values from another table, perhaps +overwriting existing keys: + + > Map = require 'pl.Map' + > m = Map{one=1,two=2} + > m:update {three=3,four=4,two=20} + > = m == M{one=1,two=20,three=3,four=4} + true + +The method `values` returns a list of the values, and `keys` returns a list of +the keys; there is no guarantee of order. `getvalues` is given a list of keys and +returns a list of values associated with these keys: + + > m = Map{one=1,two=2,three=3} + > = m:getvalues {'one','three'} + {1,3} + > = m:getvalues(m:keys()) == m:values() + true + +When querying the value of a `Map`, it is best to use the `get` method: + + > print(m:get 'one', m:get 'two') + 1 2 + +The reason is that `m[key]` can be ambiguous; due to the current implementation, +`m["get"]` will always succeed, because if a value is not present in the map, it +will be looked up in the `Map` metatable, which contains a method `get`. There is +currently no simple solution to this annoying restriction. + +There are some useful classes which inherit from `Map`. An `OrderedMap` behaves +like a `Map` but keeps its keys in order if you use its `set` method to add keys +and values. Like all the 'container' classes in Penlight, it defines an `iter` +method for iterating over its values; this will return the keys and values in the +order of insertion; the `keys` and `values` methods likewise. + +A `MultiMap` allows multiple values to be associated with a given key. So `set` +(as before) takes a key and a value, but calling it with the same key and a +different value does not overwrite but adds a new value. `get` (or using `[]`) +will return a list of values. + +A `Set` can be seen as a special kind of `Map`, where all the values are `true`, +the keys are the values, and the order is not important. So in this case +`Set.values` is defined to return a list of the keys. Sets can display +themselves, and the basic operations like `union` (`+`) and `intersection` (`*`) +are defined. + + > Set = require 'pl.Set' + > = Set{'one','two'} == Set{'two','one'} + true + > fruit = Set{'apple','banana','orange'} + > = fruit['banana'] + true + > = fruit['hazelnut'] + nil + > = fruit:values() + {apple,orange,banana} + > colours = Set{'red','orange','green','blue'} + > = fruit,colours + [apple,orange,banana] [blue,green,orange,red] + > = fruit+colours + [blue,green,apple,red,orange,banana] + > = fruit*colours + [orange] + +There are also the functions `Set.difference` and `Set.symmetric_difference`. The +first answers the question 'what fruits are not colours?' and the second 'what +are fruits and colours but not both?' + + > = fruit - colours + [apple,banana] + > = fruit ^ colours + [blue,green,apple,red,banana] + +Adding elements to a set is simply `fruit['peach'] = true` and removing is +`fruit['apple'] = nil` . To make this simplicity work properly, the `Set` class has no +methods - either you use the operator forms or explicitly use `Set.intersect` +etc. In this way we avoid the ambiguity that plagues `Map`. + + +(See `pl.Map` and `pl.Set`) + +### Useful Operations on Tables + +@lookup pl.tablex + +Some notes on terminology: Lua tables are usually _list-like_ (like an array) or +_map-like_ (like an associative array or dict); they can of course have a +list-like and a map-like part. Some of the table operations only make sense for +list-like tables, and some only for map-like tables. (The usual Lua terminology +is the array part and the hash part of the table, which reflects the actual +implementation used; it is more accurate to say that a Lua table is an +associative map which happens to be particularly efficient at acting like an +array.) + +The functions provided in `table` provide all the basic manipulations on Lua +tables, but as we saw with the `List` class, it is useful to build higher-level +operations on top of those functions. For instance, to copy a table involves this +kind of loop: + + local res = {} + for k,v in pairs(T) do + res[k] = v + end + return res + +The `tablex` module provides this as `copy`, which does a _shallow_ copy of a +table. There is also `deepcopy` which goes further than a simple loop in two +ways; first, it also gives the copy the same metatable as the original (so it can +copy objects like `List` above) and any nested tables will also be copied, to +arbitrary depth. There is also `icopy` which operates on list-like tables, where +you can set optionally set the start index of the source and destination as well. +It ensures that any left-over elements will be deleted: + + asserteq(icopy({1,2,3,4,5,6},{20,30}),{20,30}) -- start at 1 + asserteq(icopy({1,2,3,4,5,6},{20,30},2),{1,20,30}) -- start at 2 + asserteq(icopy({1,2,3,4,5,6},{20,30},2,2),{1,30}) -- start at 2, copy from 2 + +(This code from the `tablex` test module shows the use of `pl.test.asserteq`) + +Whereas, `move` overwrites but does not delete the rest of the destination: + + asserteq(move({1,2,3,4,5,6},{20,30}),{20,30,3,4,5,6}) + asserteq(move({1,2,3,4,5,6},{20,30},2),{1,20,30,4,5,6}) + asserteq(move({1,2,3,4,5,6},{20,30},2,2),{1,30,3,4,5,6}) + +(The difference is somewhat like that between C's `strcpy` and `memmove`.) + +To summarize, use `copy` or `deepcopy` to make a copy of an arbitrary table. To +copy into a map-like table, use `update`; to copy into a list-like table use +`icopy`, and `move` if you are updating a range in the destination. + +To complete this set of operations, there is `insertvalues` which works like +`table.insert` except that one provides a table of values to be inserted, and +`removevalues` which removes a range of values. + + asserteq(insertvalues({1,2,3,4},2,{20,30}),{1,20,30,2,3,4}) + asserteq(insertvalues({1,2},{3,4}),{1,2,3,4}) + +Another example: + + > T = require 'pl.tablex' + > t = {10,20,30,40} + > = T.removevalues(t,2,3) + {10,40} + > = T.insertvalues(t,2,{20,30}) + {10,20,30,40} + + +In a similar spirit to `deepcopy`, `deepcompare` will take two tables and return +true only if they have exactly the same values and structure. + + > t1 = {1,{2,3},4} + > t2 = deepcopy(t1) + > = t1 == t2 + false + > = deepcompare(t1,t2) + true + +`find` will return the index of a given value in a list-like table. Note that +like `string.find` you can specify an index to start searching, so that all +instances can be found. There is an optional fourth argument, which makes the +search start at the end and go backwards, so we could define `rfind` like so: + + function rfind(t,val,istart) + return tablex.find(t,val,istart,true) + end + +`find` does a linear search, so it can slow down code that depends on it. If +efficiency is required for large tables, consider using an _index map_. +`index_map` will return a table where the keys are the original values of the +list, and the associated values are the indices. (It is almost exactly the +representation needed for a _set_.) + + > t = {'one','two','three'} + > = tablex.find(t,'two') + 2 + > = tablex.find(t,'four') + nil + > il = tablex.index_map(t) + > = il['two'] + 2 + > = il.two + 2 + +A version of `index_map` called `makeset` is also provided, where the values are +just `true`. This is useful because two such sets can be compared for equality +using `deepcompare`: + + > = deepcompare(makeset {1,2,3},makeset {2,1,3}) + true + +Consider the problem of determining the new employees that have joined in a +period. Assume we have two files of employee names: + + (last-month.txt) + smith,john + brady,maureen + mongale,thabo + + (this-month.txt) + smith,john + smit,johan + brady,maureen + mogale,thabo + van der Merwe,Piet + +To find out differences, just make the employee lists into sets, like so: + + require 'pl' + + function read_employees(file) + local ls = List(io.lines(file)) -- a list of employees + return tablex.makeset(ls) + end + + last = read_employees 'last-month.txt' + this = read_employees 'this-month.txt' + + -- who is in this but not in last? + diff = tablex.difference(this,last) + + -- in a set, the keys are the values... + for e in pairs(diff) do print(e) end + + -- *output* + -- van der Merwe,Piet + -- smit,johan + +The `difference` operation is easy to write and read: + + for e in pairs(this) do + if not last[e] then + print(e) + end + end + +Using `difference` here is not that it is a tricky thing to code, it is that you +are stating your intentions clearly to other readers of your code. (And naturally +to your future self, in six months time.) + +`find_if` will search a table using a function. The optional third argument is a +value which will be passed as a second argument to the function. `pl.operator` +provides the Lua operators conveniently wrapped as functions, so the basic +comparison functions are available: + + > ops = require 'pl.operator' + > = tablex.find_if({10,20,30,40},ops.gt,20) + 3 true + +Note that `find_if` will also return the _actual value_ returned by the function, +which of course is usually just `true` for a boolean function, but any value +which is not `nil` and not `false` can be usefully passed back. + +`deepcompare` does a thorough recursive comparison, but otherwise using the +default equality operator. `compare` allows you to specify exactly what function +to use when comparing two list-like tables, and `compare_no_order` is true if +they contain exactly the same elements. Do note that the latter does not need an +explicit comparison function - in this case the implementation is actually to +compare the two sets, as above: + + > = compare_no_order({1,2,3},{2,1,3}) + true + > = compare_no_order({1,2,3},{2,1,3},'==') + true + +(Note the special string '==' above; instead of saying `ops.gt` or `ops.eq` we +can use the strings '>' or '==' respectively.) + +`sort` and `sortv` return iterators that will iterate through the +sorted elements of a table. `sort` iterates by sorted key order, and +`sortv` iterates by sorted value order. For example, given a table +with names and ages, it is trivial to iterate over the elements: + + > t = {john=27,jane=31,mary=24} + > for name,age in tablex.sort(t) do print(name,age) end + jane 31 + john 27 + mary 24 + > for name,age in tablex.sortv(t) do print(name,age) end + mary 24 + john 27 + jane 31 + +There are several ways to merge tables in PL. If they are list-like, then see the +operations defined by `pl.List`, like concatenation. If they are map-like, then +`merge` provides two basic operations. If the third arg is false, then the result +only contains the keys that are in common between the two tables, and if true, +then the result contains all the keys of both tables. These are in fact +generalized set union and intersection operations: + + > S1 = {john=27,jane=31,mary=24} + > S2 = {jane=31,jones=50} + > = tablex.merge(S1, S2, false) + {jane=31} + > = tablex.merge(S1, S2, true) + {mary=24,jane=31,john=27,jones=50} + +When working with tables, you will often find yourself writing loops like in the +first example. Loops are second nature to programmers, but they are often not the +most elegant and self-describing way of expressing an operation. Consider the +`map` function, which creates a new table by applying a function to each element +of the original: + + > = map(math.sin, {1,2,3,4}) + { 0.84, 0.91, 0.14, -0.76} + > = map(function(x) return x*x end, {1,2,3,4}) + {1,4,9,16} + +`map` saves you from writing a loop, and the resulting code is often clearer, as +well as being shorter. This is not to say that 'loops are bad' (although you will +hear that from some extremists), just that it's good to capture standard +patterns. Then the loops you do write will stand out and acquire more significance. + +`pairmap` is interesting, because the function works with both the key and the +value. + + > t = {fred=10,bonzo=20,alice=4} + > = pairmap(function(k,v) return v end, t) + {4,10,20} + > = pairmap(function(k,v) return k end, t) + {'alice','fred','bonzo'} + +(These are common enough operations that the first is defined as `values` and the +second as `keys`.) If the function returns two values, then the _second_ value is +considered to be the new key: + + > = pairmap(t,function(k,v) return v+10, k:upper() end) + {BONZO=30,FRED=20,ALICE=14} + +`map2` applies a function to two tables: + + > map2(ops.add,{1,2},{10,20}) + {11,22} + > map2('*',{1,2},{10,20}) + {10,40} + +The various map operations generate tables; `reduce` applies a function of two +arguments over a table and returns the result as a scalar: + + > reduce ('+', {1,2,3}) + 6 + > reduce ('..', {'one','two','three'}) + 'onetwothree' + +Finally, `zip` sews different tables together: + + > = zip({1,2,3},{10,20,30}) + {{1,10},{2,20},{3,30}} + +Browsing through the documentation, you will find that `tablex` and `List` share +methods. For instance, `tablex.imap` and `List.map` are basically the same +function; they both operate over the array-part of the table and generate another +table. This can also be expressed as a _list comprehension_ `C 'f(x) for x' (t)` +which makes the operation more explicit. So why are there different ways to do +the same thing? The main reason is that not all tables are Lists: the expression +`ls:map('#')` will return a _list_ of the lengths of any elements of `ls`. A list +is a thin wrapper around a table, provided by the metatable `List`. Sometimes you +may wish to work with ordinary Lua tables; the `List` interface is not a +compulsory way to use Penlight table operations. + +### Operations on two-dimensional tables + +@lookup pl.array2d + +Two-dimensional tables are of course easy to represent in Lua, for instance +`{{1,2},{3,4}}` where we store rows as subtables and index like so `A[col][row]`. +This is the common representation used by matrix libraries like +[LuaMatrix](http://lua-users.org/wiki/LuaMatrix). `pl.array2d` does not provide +matrix operations, since that is the job for a specialized library, but rather +provides generalizations of the higher-level operations provided by `pl.tablex` +for one-dimensional arrays. + +`iter` is a useful generalization of `ipairs`. (The extra parameter determines +whether you want the indices as well.) + + > a = {{1,2},{3,4}} + > for i,j,v in array2d.iter(a,true) do print(i,j,v) end + 1 1 1 + 1 2 2 + 2 1 3 + 2 2 4 + +Note that you can always convert an arbitrary 2D array into a 'list of lists' +with `List(tablex.map(List,a))` + +`map` will apply a function over all elements (notice that extra arguments can be +provided, so this operation is in effect `function(x) return x-1 end`) + + > array2d.map('-',a,1) + {{0,1},{2,3}} + +2D arrays are stored as an array of rows, but columns can be extracted: + + > array2d.column(a,1) + {1,3} + +There are three equivalents to `tablex.reduce`. You can either reduce along the +rows (which is the most efficient) or reduce along the columns. Either one will +give you a 1D array. And `reduce2` will apply two operations: the first one +reduces the rows, and the second reduces the result. + + > array2d.reduce_rows('+',a) + {3,7} + > array2d.reduce_cols('+',a) + {4,6} + > -- same as tablex.reduce('*',array.reduce_rows('+',a)) + > array2d.reduce2('*','+',a) + 21 ` + +`tablex.map2` applies an operation to two tables, giving another table. +`array2d.map2` does this for 2D arrays. Note that you have to provide the _rank_ +of the arrays involved, since it's hard to always correctly deduce this from the +data: + + > b = {{10,20},{30,40}} + > a = {{1,2},{3,4}} + > = array2d.map2('+',2,2,a,b) -- two 2D arrays + {{11,22},{33,44}} + > = array2d.map2('+',1,2,{10,100},a) -- 1D, 2D + {{11,102},{13,104}} + > = array2d.map2('*',2,1,a,{1,-1}) -- 2D, 1D + {{1,-2},{3,-4}} + +Of course, you are not limited to simple arithmetic. Say we have a 2D array of +strings, and wish to print it out with proper right justification. The first step +is to create all the string lengths by mapping `string.len` over the array, the +second is to reduce this along the columns using `math.max` to get maximum column +widths, and last, apply `stringx.rjust` with these widths. + + maxlens = reduce_cols(math.max,map('#',lines)) + lines = map2(stringx.rjust,2,1,lines,maxlens) + +There is `product` which returns the _Cartesian product_ of two 1D arrays. The +result is a 2D array formed from applying the function to all possible pairs from +the two arrays. + + > array2d.product('{}',{1,2},{'a','b'}) + {{{1,'b'},{2,'a'}},{{1,'a'},{2,'b'}}} + +There is a set of operations which work in-place on 2D arrays. You can +`swap_rows` and `swap_cols`; the first really is a simple one-liner, but the idea +here is to give the operation a name. `remove_row` and `remove_col` are +generalizations of `table.remove`. Likewise, `extract_rows` and `extract_cols` +are given arrays of indices and discard anything else. So, for instance, +`extract_cols(A,{2,4})` will leave just columns 2 and 4 in the array. + +`List.slice` is often useful on 1D arrays; `slice` does the same thing, but is +generally given a start (row,column) and a end (row,column). + + > A = {{1,2,3},{4,5,6},{7,8,9}} + > B = slice(A,1,1,2,2) + > write(B) + 1 2 + 4 5 + > B = slice(A,2,2) + > write(B,nil,'%4.1f') + 5.0 6.0 + 8.0 9.0 + +Here `write` is used to print out an array nicely; the second parameter is `nil`, +which is the default (stdout) but can be any file object and the third parameter +is an optional format (as used in `string.format`). + +`parse_range` will take a spreadsheet range like 'A1:B2' or 'R1C1:R2C2' and +return the range as four numbers, which can be passed to `slice`. The rule is +that `slice` will return an array of the appropriate shape depending on the +range; if a range represents a row or a column, the result is 1D, otherwise 2D. + +This applies to `iter` as well, which can also optionally be given a range: + + + > for i,j,v in iter(A,true,2,2) do print(i,j,v) end + 2 2 5 + 2 3 6 + 3 2 8 + 3 3 9 + +`new` will construct a new 2D array with the given dimensions. You provide an +initial value for the elements, which is interpreted as a function if it's +callable. With `L` being `utils.string_lambda` we then have the following way to +make an _identity matrix_: + + asserteq( + array.new(3,3,L'|i,j| i==j and 1 or 0'), + {{1,0,0},{0,1,0},{0,0,1}} + ) + +Please note that most functions in `array2d` are _covariant_, that is, they +return an array of the same type as they receive. In particular, any objects +created with `data.new` or `matrix.new` will remain data or matrix objects when +reshaped or sliced, etc. Data objects have the `array2d` functions available as +methods. + + |