## Tables and Arrays

<a id="list"/>

### Python-style Lists

One of the elegant things about Lua is that tables do the job of both lists and
dicts (as called in Python) or vectors and maps, (as called in C++), and they do
it efficiently.  However, if we are dealing with 'tables with numerical indices'
we may as well call them lists and look for operations which particularly make
sense for lists. The Penlight `List` class was originally written by Nick Trout
for Lua 5.0, and translated to 5.1 and extended by myself.  It seemed that
borrowing from Python was a good idea, and this eventually grew into Penlight.

Here is an example showing `List` in action; it redefines `__tostring`, so that
it can print itself out more sensibly:

    > List = require 'pl.List'  --> automatic with require 'pl' <---
    > l = List()
    > l:append(10)
    > l:append(20)
    > = l
    {10,20}
    > l:extend {30,40}
    {10,20,30,40}
    > l:insert(1,5)
    {5,10,20,30,40}
    > = l:pop()
    40
    > = l
    {5,10,20,30}
    > = l:index(30)
    4
    > = l:contains(30)
    true
    > = l:reverse()  ---> note: doesn't make a copy!
    {30,20,10,5}

Although methods like `sort` and `reverse` operate in-place and change the list,
they do return the original list. This makes it possible to do _method chaining_,
like `ls = ls:append(10):append(20):reverse():append(1)`. But (and this is an
important but) no extra copy is made, so `ls` does not change identity. `List`
objects (like tables) are _mutable_, unlike strings. If you want a copy of a
list, then `List(ls)` will do the job, i.e. it acts like a copy constructor.
However, if passed any other table, `List` will just set the metatable of the
table and _not_ make a copy.

A particular feature of Python lists is _slicing_. This is fully supported in
this version of `List`, except we use 1-based indexing. So `List.slice` works
rather like `string.sub`:

    > l = List {10,20,30,40}
    > = l:slice(1,1)  ---> note: creates a new list!
    {10}
    > = l:slice(2,2)
    {20}
    > = l:slice(2,3)
    {20,30}
    > = l:slice(2,-2)
    {20,30}
    > = l:slice_assign(2,2,{21,22,23})
    {10,21,22,23,30,40}
    > = l:chop(1,1)
    {21,22,23,30,40}

Functions like `slice_assign` and `chop` modify the list; the first is equivalent
to Python`l[i1:i2] = seq` and the second to `del l[i1:i2]`.

List objects are ultimately just Lua 'list-like' tables, but they have extra
operations defined on them, such as equality and concatention.  For regular
tables, equality is only true if the two tables are _identical objects_, whereas
two lists are equal if they have the same contents, i.e. that `l1[i]==l2[i]` for
all elements.

    > l1 = List {1,2,3}
    > l2 = List {1,2,3}
    > = l1 == l2
    true
    > = l1..l2
    {1,2,3,1,2,3}

The `List` constructor can be passed a function. If so, it's assumed that this is
an iterator function that can be repeatedly called to generate a sequence.  One
such function is `io.lines`; the following short, intense little script counts
the number of lines in standard input:

    -- linecount.lua
    require 'pl'
    ls = List(io.lines())
    print(#ls)

`List.iterate` captures what `List` considers a sequence. In particular, it can
also iterate over all 'characters' in a string:

    > for ch in List.iterate 'help' do io.write(ch,' ') end
    h e l p >

Since the function `iterate` is used internally by the `List` constructor,
strings can be made into lists of character strings very easily.

There are a number of operations that go beyond the standard Python methods. For
instance, you can _partition_ a list into a table of sublists using a function.
In the simplest form, you use a predicate (a function returning a boolean value)
to partition the list into two lists, one of elements matching and another of
elements not matching. But you can use any function; if we use `type` then the
keys will be the standard Lua type names.

    > ls = List{1,2,3,4}
    > ops = require 'pl.operator'
    > ls:partition(function(x) return x > 2 end)
    {false={1,2},true={3,4}}
    > ls = List{'one',math.sin,List{1},10,20,List{1,2}}
    > ls:partition(type)
    {function={function: 00369110},string={one},number={10,20},table={{1},{1,2}}}

This is one `List` method which returns a table which is not a `List`. Bear in
mind that you can always call a `List` method on a plain table argument, so
`List.partition(t,type)` works as expected. But these functions will only operate
on the array part of the table.

The 'nominal' type of the returned table is `pl.Multimap`, which describes a mapping
between keys and multiple values. This does not mean that `pl.Multimap` is automatically
loaded whenever you use `partition` (or `List` for that matter); this is one of the
standard metatables which are only filled out when the appropriate module is loaded.
This allows tables to be tagged appropriately without causing excessive coupling.

Stacks occur everywhere in computing. `List` supports stack-like operations;
there is already `pop` (remove and return last value) and `append` acts like
`push` (add a value to the end). `push` is provided as an alias for `append`, and
the other stack operation (size) is simply the size operator `#`.  Queues can
also be implemented; you use `pop` to take values out of the queue, and `put` to
insert a value at the begining.

You may derive classes from `List`, and since the list-returning methods
are covariant, the result of `slice` etc will return lists of the derived type,
not `List`. For instance, consider the specialization of a `List` type that contains
numbers in `tests/test-list.lua`:

    n1 = NA{10,20,30}
    n2 = NA{1,2,3}
    ns = n1 + 2*n2
    asserteq(ns,{12,24,36})
    min,max = ns:slice(1,2):minmax()
    asserteq(T(min,max),T(12,24))
    asserteq(n1:normalize():sum(),1,1e-8)


### Map and Set classes

The `Map` class exposes what Python would call a 'dict' interface, and accesses
the hash part of the table. The name 'Map' is used to emphasize the interface,
not the implementation; it is an object which maps keys onto values; `m['alice']`
or the equivalent `m.alice` is the access operation.  This class also provides
explicit `set` and `get` methods, which are trivial for regular maps but get
interesting when `Map` is subclassed. The other operation is `update`, which
extends a map by copying the keys and values from another table, perhaps
overwriting existing keys:

    > Map = require 'pl.Map'
    > m = Map{one=1,two=2}
    > m:update {three=3,four=4,two=20}
    > = m == M{one=1,two=20,three=3,four=4}
    true

The method `values` returns a list of the values, and `keys` returns a list of
the keys; there is no guarantee of order. `getvalues` is given a list of keys and
returns a list of values associated with these keys:

    > m = Map{one=1,two=2,three=3}
    > = m:getvalues {'one','three'}
    {1,3}
    > = m:getvalues(m:keys()) == m:values()
    true

When querying the value of a `Map`, it is best to use the `get` method:

    > print(m:get 'one', m:get 'two')
    1     2

The reason is that `m[key]` can be ambiguous; due to the current implementation,
`m["get"]` will always succeed, because if a value is not present in the map, it
will be looked up in the `Map` metatable, which contains a method `get`. There is
currently no simple solution to this annoying restriction.

There are some useful classes which inherit from `Map`. An `OrderedMap` behaves
like a `Map` but keeps its keys in order if you use its `set` method to add keys
and values.  Like all the 'container' classes in Penlight, it defines an `iter`
method for iterating over its values; this will return the keys and values in the
order of insertion; the `keys` and `values` methods likewise.

A `MultiMap` allows multiple values to be associated with a given key. So `set`
(as before) takes a key and a value, but calling it with the same key and a
different value does not overwrite but adds a new value. `get` (or using `[]`)
will return a list of values.

A `Set` can be seen as a special kind of `Map`, where all the values are `true`,
the keys are the values, and the order is not important. So in this case
`Set.values` is defined to return a list of the keys.  Sets can display
themselves, and the basic operations like `union` (`+`) and `intersection` (`*`)
are defined.

    > Set = require 'pl.Set'
    > = Set{'one','two'} == Set{'two','one'}
    true
    > fruit = Set{'apple','banana','orange'}
    > = fruit['banana']
    true
    > = fruit['hazelnut']
    nil
    > = fruit:values()
    {apple,orange,banana}
    > colours = Set{'red','orange','green','blue'}
    > = fruit,colours
    [apple,orange,banana]   [blue,green,orange,red]
    > = fruit+colours
    [blue,green,apple,red,orange,banana]
    > = fruit*colours
    [orange]

There are also the functions `Set.difference` and `Set.symmetric_difference`. The
first answers the question 'what fruits are not colours?' and the second 'what
are fruits and colours but not both?'

    > = fruit - colours
    [apple,banana]
    > = fruit ^ colours
    [blue,green,apple,red,banana]

Adding elements to a set is simply `fruit['peach'] = true` and removing is
`fruit['apple'] = nil` . To make this simplicity work properly, the `Set` class has no
methods - either you use the operator forms or explicitly use `Set.intersect`
etc. In this way we avoid the ambiguity that plagues `Map`.


(See `pl.Map` and `pl.Set`)

### Useful Operations on Tables

@lookup pl.tablex

Some notes on terminology: Lua tables are usually _list-like_ (like an array) or
_map-like_ (like an associative array or dict); they can of course have a
list-like and a map-like part. Some of the table operations only make sense for
list-like tables, and some only for map-like tables. (The usual Lua terminology
is the array part and the hash part of the table, which reflects the actual
implementation used; it is more accurate to say that a Lua table is an
associative map which happens to be particularly efficient at acting like an
array.)

The functions provided in `table` provide all the basic manipulations on Lua
tables, but as we saw with the `List` class, it is useful to build higher-level
operations on top of those functions. For instance, to copy a table involves this
kind of loop:

    local res = {}
    for k,v in pairs(T) do
        res[k] = v
    end
    return res

The `tablex` module provides this as `copy`, which does a _shallow_ copy of a
table. There is also `deepcopy` which goes further than a simple loop in two
ways; first, it also gives the copy the same metatable as the original (so it can
copy objects like `List` above) and any nested tables will also be copied, to
arbitrary depth. There is also `icopy` which operates on list-like tables, where
you can set optionally set the start index of the source and destination as well.
It ensures that any left-over elements will be deleted:

    asserteq(icopy({1,2,3,4,5,6},{20,30}),{20,30})   -- start at 1
    asserteq(icopy({1,2,3,4,5,6},{20,30},2),{1,20,30}) -- start at 2
    asserteq(icopy({1,2,3,4,5,6},{20,30},2,2),{1,30}) -- start at 2, copy from 2

(This code from the `tablex` test module shows the use of `pl.test.asserteq`)

Whereas, `move` overwrites but does not delete the rest of the destination:

    asserteq(move({1,2,3,4,5,6},{20,30}),{20,30,3,4,5,6})
    asserteq(move({1,2,3,4,5,6},{20,30},2),{1,20,30,4,5,6})
    asserteq(move({1,2,3,4,5,6},{20,30},2,2),{1,30,3,4,5,6})

(The difference is somewhat like that between C's `strcpy` and `memmove`.)

To summarize, use `copy` or `deepcopy` to make a copy of an arbitrary table. To
copy into a map-like table, use `update`; to copy into a list-like table use
`icopy`, and `move` if you are updating a range in the destination.

To complete this set of operations, there is `insertvalues` which works like
`table.insert` except that one provides a table of values to be inserted, and
`removevalues` which removes a range of values.

    asserteq(insertvalues({1,2,3,4},2,{20,30}),{1,20,30,2,3,4})
    asserteq(insertvalues({1,2},{3,4}),{1,2,3,4})

Another example:

    > T = require 'pl.tablex'
    > t = {10,20,30,40}
    > = T.removevalues(t,2,3)
    {10,40}
    > = T.insertvalues(t,2,{20,30})
    {10,20,30,40}


In a similar spirit to `deepcopy`, `deepcompare` will take two tables and return
true only if they have exactly the same values and structure.

    > t1 = {1,{2,3},4}
    > t2 = deepcopy(t1)
    > = t1 == t2
    false
    > = deepcompare(t1,t2)
    true

`find` will return the index of a given value in a list-like table. Note that
like `string.find` you can specify an index to start searching, so that all
instances can be found. There is an optional fourth argument, which makes the
search start at the end and go backwards, so we could define `rfind` like so:

    function rfind(t,val,istart)
        return tablex.find(t,val,istart,true)
    end

`find` does a linear search, so it can slow down code that depends on it.  If
efficiency is required for large tables, consider using an _index map_.
`index_map` will return a table where the keys are the original values of the
list, and the associated values are the indices. (It is almost exactly the
representation needed for a _set_.)

    > t = {'one','two','three'}
    > = tablex.find(t,'two')
    2
    > = tablex.find(t,'four')
    nil
    > il = tablex.index_map(t)
    > = il['two']
    2
    > = il.two
    2

A version of `index_map` called `makeset` is also provided, where the values are
just `true`. This is useful because two such sets can be compared for equality
using `deepcompare`:

    > = deepcompare(makeset {1,2,3},makeset {2,1,3})
    true

Consider the problem of determining the new employees that have joined in a
period. Assume we have two files of employee names:

    (last-month.txt)
    smith,john
    brady,maureen
    mongale,thabo

    (this-month.txt)
    smith,john
    smit,johan
    brady,maureen
    mogale,thabo
    van der Merwe,Piet

To find out differences, just make the employee lists into sets, like so:

    require 'pl'

    function read_employees(file)
      local ls = List(io.lines(file)) -- a list of employees
      return tablex.makeset(ls)
    end

    last = read_employees 'last-month.txt'
    this = read_employees 'this-month.txt'

    -- who is in this but not in last?
    diff = tablex.difference(this,last)

    -- in a set, the keys are the values...
    for e in pairs(diff) do print(e) end

    --  *output*
    -- van der Merwe,Piet
    -- smit,johan

The `difference` operation is easy to write and read:

    for e in pairs(this) do
      if not last[e] then
        print(e)
      end
    end

Using `difference` here is not that it is a tricky thing to code, it is that you
are stating your intentions clearly to other readers of your code. (And naturally
to your future self, in six months time.)

`find_if` will search a table using a function. The optional third argument is a
value which will be passed as a second argument to the function. `pl.operator`
provides the Lua operators conveniently wrapped as functions, so the basic
comparison functions are available:

    > ops = require 'pl.operator'
    > = tablex.find_if({10,20,30,40},ops.gt,20)
    3       true

Note that `find_if` will also return the _actual value_ returned by the function,
which of course is usually just  `true` for a boolean function, but any value
which is not `nil` and not `false` can be usefully passed back.

`deepcompare` does a thorough recursive comparison, but otherwise using the
default equality operator. `compare` allows you to specify exactly what function
to use when comparing two list-like tables, and `compare_no_order` is true if
they contain exactly the same elements. Do note that the latter does not need an
explicit comparison function - in this case the implementation is actually to
compare the two sets, as above:

    > = compare_no_order({1,2,3},{2,1,3})
    true
    > = compare_no_order({1,2,3},{2,1,3},'==')
    true

(Note the special string '==' above; instead of saying `ops.gt` or `ops.eq` we
can use the strings '>' or '==' respectively.)

`sort` and `sortv` return iterators that will iterate through the
sorted elements of a table. `sort` iterates by sorted key order, and
`sortv` iterates by sorted value order. For example, given a table
with names and ages, it is trivial to iterate over the elements:

    > t = {john=27,jane=31,mary=24}
    > for name,age in tablex.sort(t) do print(name,age) end
    jane    31
    john    27
    mary    24
    > for name,age in tablex.sortv(t) do print(name,age) end
    mary    24
    john    27
    jane    31

There are several ways to merge tables in PL. If they are list-like, then see the
operations defined by `pl.List`, like concatenation. If they are map-like, then
`merge` provides two basic operations. If the third arg is false, then the result
only contains the keys that are in common between the two tables, and if true,
then the result contains all the keys of both tables. These are in fact
generalized set union and intersection operations:

    > S1 = {john=27,jane=31,mary=24}
    > S2 = {jane=31,jones=50}
    > = tablex.merge(S1, S2, false)
    {jane=31}
    > = tablex.merge(S1, S2, true)
    {mary=24,jane=31,john=27,jones=50}

When working with tables, you will often find yourself writing loops like in the
first example. Loops are second nature to programmers, but they are often not the
most elegant and self-describing way of expressing an operation. Consider the
`map` function, which creates a new table by applying a function to each element
of the original:

    > = map(math.sin, {1,2,3,4})
    {  0.84,  0.91,  0.14, -0.76}
    > = map(function(x) return x*x end, {1,2,3,4})
    {1,4,9,16}

`map` saves you from writing a loop, and the resulting code is often clearer, as
well as being shorter. This is not to say that 'loops are bad' (although you will
hear that from some extremists), just that it's good to capture standard
patterns. Then the loops you do write will stand out and acquire more significance.

`pairmap` is interesting, because the function works with both the key and the
value.

    > t = {fred=10,bonzo=20,alice=4}
    > = pairmap(function(k,v) return v end, t)
    {4,10,20}
    > = pairmap(function(k,v) return k end, t)
    {'alice','fred','bonzo'}

(These are common enough operations that the first is defined as `values` and the
second as `keys`.) If the function returns two values, then the _second_ value is
considered to be the new key:

    > = pairmap(t,function(k,v) return v+10, k:upper() end)
    {BONZO=30,FRED=20,ALICE=14}

`map2` applies a function to two tables:

    > map2(ops.add,{1,2},{10,20})
    {11,22}
    > map2('*',{1,2},{10,20})
    {10,40}

The various map operations generate tables; `reduce` applies a function of two
arguments over a table and returns the result as a scalar:

    > reduce ('+', {1,2,3})
    6
    > reduce ('..', {'one','two','three'})
    'onetwothree'

Finally, `zip` sews different tables together:

    > = zip({1,2,3},{10,20,30})
    {{1,10},{2,20},{3,30}}

Browsing through the documentation, you will find that `tablex` and `List` share
methods.  For instance, `tablex.imap` and `List.map` are basically the same
function; they both operate over the array-part of the table and generate another
table. This can also be expressed as a _list comprehension_ `C 'f(x) for x' (t)`
which makes the operation more explicit. So why are there different ways to do
the same thing? The main reason is that not all tables are Lists: the expression
`ls:map('#')` will return a _list_ of the lengths of any elements of `ls`. A list
is a thin wrapper around a table, provided by the metatable `List`. Sometimes you
may wish to work with ordinary Lua tables; the `List` interface is not a
compulsory way to use Penlight table operations.

### Operations on two-dimensional tables

@lookup pl.array2d

Two-dimensional tables are of course easy to represent in Lua, for instance
`{{1,2},{3,4}}` where we store rows as subtables and index like so `A[col][row]`.
This is the common representation used by matrix libraries like
[LuaMatrix](http://lua-users.org/wiki/LuaMatrix). `pl.array2d` does not provide
matrix operations, since that is the job for a specialized library, but rather
provides generalizations of the higher-level operations provided by `pl.tablex`
for one-dimensional arrays.

`iter` is a useful generalization of `ipairs`. (The extra parameter determines
whether you want the indices as well.)

    > a = {{1,2},{3,4}}
    > for i,j,v in array2d.iter(a,true) do print(i,j,v) end
    1       1       1
    1       2       2
    2       1       3
    2       2       4

Note that you can always convert an arbitrary 2D array into a 'list of lists'
with `List(tablex.map(List,a))`

`map` will apply a function over all elements (notice that extra arguments can be
provided, so this operation is in effect `function(x) return x-1 end`)

    > array2d.map('-',a,1)
    {{0,1},{2,3}}

2D arrays are stored as an array of rows, but columns can be extracted:

    > array2d.column(a,1)
    {1,3}

There are three equivalents to `tablex.reduce`. You can either reduce along the
rows (which is the most efficient) or reduce along the columns. Either one will
give you a 1D array. And `reduce2` will apply two operations: the first one
reduces the rows, and the second reduces the result.

    > array2d.reduce_rows('+',a)
    {3,7}
    > array2d.reduce_cols('+',a)
    {4,6}
    > -- same as tablex.reduce('*',array.reduce_rows('+',a))
    > array2d.reduce2('*','+',a)
    21    `

`tablex.map2` applies an operation to two tables, giving another table.
`array2d.map2` does this for 2D arrays. Note that you have to provide the _rank_
of the arrays involved, since it's hard to always correctly deduce this from the
data:

    > b = {{10,20},{30,40}}
    > a = {{1,2},{3,4}}
    > = array2d.map2('+',2,2,a,b)  -- two 2D arrays
    {{11,22},{33,44}}
    > = array2d.map2('+',1,2,{10,100},a)  -- 1D, 2D
    {{11,102},{13,104}}
    > = array2d.map2('*',2,1,a,{1,-1})  -- 2D, 1D
    {{1,-2},{3,-4}}

Of course, you are not limited to simple arithmetic. Say we have a 2D array of
strings, and wish to print it out with proper right justification. The first step
is to create all the string lengths by mapping `string.len` over the array, the
second is to reduce this along the columns using `math.max` to get maximum column
widths, and last, apply `stringx.rjust` with these widths.

    maxlens = reduce_cols(math.max,map('#',lines))
    lines = map2(stringx.rjust,2,1,lines,maxlens)

There is `product` which returns  the _Cartesian product_ of two 1D arrays. The
result is a 2D array formed from applying the function to all possible pairs from
the two arrays.

    > array2d.product('{}',{1,2},{'a','b'})
    {{{1,'b'},{2,'a'}},{{1,'a'},{2,'b'}}}

There is a set of operations which work in-place on 2D arrays. You can
`swap_rows` and `swap_cols`; the first really is a simple one-liner, but the idea
here is to give the operation a name. `remove_row` and `remove_col` are
generalizations of `table.remove`. Likewise, `extract_rows` and `extract_cols`
are given arrays of indices and discard anything else. So, for instance,
`extract_cols(A,{2,4})` will leave just columns 2 and 4 in the array.

`List.slice` is often useful on 1D arrays; `slice` does the same thing, but is
generally given a start (row,column) and a end (row,column).

    > A = {{1,2,3},{4,5,6},{7,8,9}}
    > B = slice(A,1,1,2,2)
    > write(B)
     1 2
     4 5
    > B = slice(A,2,2)
    > write(B,nil,'%4.1f')
     5.0 6.0
     8.0 9.0

Here `write` is used to print out an array nicely; the second parameter is `nil`,
which is the default (stdout) but can be any file object and the third parameter
is an optional format (as used in `string.format`).

`parse_range` will take a spreadsheet range like 'A1:B2' or 'R1C1:R2C2' and
return the range as four numbers, which can be passed to `slice`. The rule is
that `slice` will return an array of the appropriate shape depending on the
range; if a range represents a row or a column, the result is 1D, otherwise 2D.

This applies to `iter` as well, which can also optionally be given a range:


    > for i,j,v in iter(A,true,2,2) do print(i,j,v) end
    2       2       5
    2       3       6
    3       2       8
    3       3       9

`new` will construct a new 2D array with the given dimensions. You provide an
initial value for the elements, which is interpreted as a function if it's
callable. With `L` being `utils.string_lambda` we then have the following way to
make an _identity matrix_:

    asserteq(
        array.new(3,3,L'|i,j| i==j and 1 or 0'),
        {{1,0,0},{0,1,0},{0,0,1}}
    )

Please note that most functions in `array2d` are _covariant_, that is, they
return an array of the same type as they receive.  In particular, any objects
created with `data.new` or `matrix.new` will remain data or matrix objects when
reshaped or sliced, etc.  Data objects have the `array2d` functions available as
methods.