Penlight Documentation

+ + + + + + +

+ + +

Tables and Arrays

+ +

Python-style Lists

+ +

One of the elegant things about Lua is that tables do the job of both lists and +dicts (as called in Python) or vectors and maps, (as called in C++), and they do +it efficiently. However, if we are dealing with 'tables with numerical indices' +we may as well call them lists and look for operations which particularly make +sense for lists. The Penlight List class was originally written by Nick Trout +for Lua 5.0, and translated to 5.1 and extended by myself. It seemed that +borrowing from Python was a good idea, and this eventually grew into Penlight.

+ +

Here is an example showing List in action; it redefines __tostring, so that +it can print itself out more sensibly:

+ + +

+> List = require 'pl.List'  --> automatic with require 'pl' <---
+> l = List()
+> l:append(10)
+> l:append(20)
+> = l
+{10,20}
+> l:extend {30,40}
+{10,20,30,40}
+> l:insert(1,5)
+{5,10,20,30,40}
+> = l:pop()
+40
+> = l
+{5,10,20,30}
+> = l:index(30)
+4
+> = l:contains(30)
+true
+> = l:reverse()  ---> note: doesn't make a copy!
+{30,20,10,5}
+

+ +

Although methods like sort and reverse operate in-place and change the list, +they do return the original list. This makes it possible to do method chaining, +like ls = ls:append(10):append(20):reverse():append(1). But (and this is an +important but) no extra copy is made, so ls does not change identity. List +objects (like tables) are mutable, unlike strings. If you want a copy of a +list, then List(ls) will do the job, i.e. it acts like a copy constructor. +However, if passed any other table, List will just set the metatable of the +table and not make a copy.

+ +

A particular feature of Python lists is slicing. This is fully supported in +this version of List, except we use 1-based indexing. So List.slice works +rather like string.sub:

+ + +

+> l = List {10,20,30,40}
+> = l:slice(1,1)  ---> note: creates a new list!
+{10}
+> = l:slice(2,2)
+{20}
+> = l:slice(2,3)
+{20,30}
+> = l:slice(2,-2)
+{20,30}
+> = l:slice_assign(2,2,{21,22,23})
+{10,21,22,23,30,40}
+> = l:chop(1,1)
+{21,22,23,30,40}
+

+ +

Functions like slice_assign and chop modify the list; the first is equivalent +to Pythonl[i1:i2] = seq and the second to del l[i1:i2].

+ +

List objects are ultimately just Lua 'list-like' tables, but they have extra +operations defined on them, such as equality and concatention. For regular +tables, equality is only true if the two tables are identical objects, whereas +two lists are equal if they have the same contents, i.e. that l1[i]==l2[i] for +all elements.

+ + +

+> l1 = List {1,2,3}
+> l2 = List {1,2,3}
+> = l1 == l2
+true
+> = l1..l2
+{1,2,3,1,2,3}
+

+ +

The List constructor can be passed a function. If so, it's assumed that this is +an iterator function that can be repeatedly called to generate a sequence. One +such function is io.lines; the following short, intense little script counts +the number of lines in standard input:

+ + +

+-- linecount.lua
+require 'pl'
+ls = List(io.lines())
+print(#ls)
+

+ +

List.iterate captures what List considers a sequence. In particular, it can +also iterate over all 'characters' in a string:

+ + +

+> for ch in List.iterate 'help' do io.write(ch,' ') end
+h e l p >
+

+ +

Since the function iterate is used internally by the List constructor, +strings can be made into lists of character strings very easily.

+ +

There are a number of operations that go beyond the standard Python methods. For +instance, you can partition a list into a table of sublists using a function. +In the simplest form, you use a predicate (a function returning a boolean value) +to partition the list into two lists, one of elements matching and another of +elements not matching. But you can use any function; if we use type then the +keys will be the standard Lua type names.

+ + +

+> ls = List{1,2,3,4}
+> ops = require 'pl.operator'
+> ls:partition(function(x) return x > 2 end)
+{false={1,2},true={3,4}}
+> ls = List{'one',math.sin,List{1},10,20,List{1,2}}
+> ls:partition(type)
+{function={function: 00369110},string={one},number={10,20},table={{1},{1,2}}}
+

+ +

This is one List method which returns a table which is not a List. Bear in +mind that you can always call a List method on a plain table argument, so +List.partition(t,type) works as expected. But these functions will only operate +on the array part of the table.

+ +

The 'nominal' type of the returned table is pl.Multimap, which describes a mapping +between keys and multiple values. This does not mean that pl.Multimap is automatically +loaded whenever you use partition (or List for that matter); this is one of the +standard metatables which are only filled out when the appropriate module is loaded. +This allows tables to be tagged appropriately without causing excessive coupling.

+ +

Stacks occur everywhere in computing. List supports stack-like operations; +there is already pop (remove and return last value) and append acts like +push (add a value to the end). push is provided as an alias for append, and +the other stack operation (size) is simply the size operator #. Queues can +also be implemented; you use pop to take values out of the queue, and put to +insert a value at the begining.

+ +

You may derive classes from List, and since the list-returning methods +are covariant, the result of slice etc will return lists of the derived type, +not List. For instance, consider the specialization of a List type that contains +numbers in tests/test-list.lua:

+ + +

+n1 = NA{10,20,30}
+n2 = NA{1,2,3}
+ns = n1 + 2*n2
+asserteq(ns,{12,24,36})
+min,max = ns:slice(1,2):minmax()
+asserteq(T(min,max),T(12,24))
+asserteq(n1:normalize():sum(),1,1e-8)
+

+ +

Map and Set classes

+ +

The Map class exposes what Python would call a 'dict' interface, and accesses +the hash part of the table. The name 'Map' is used to emphasize the interface, +not the implementation; it is an object which maps keys onto values; m['alice'] +or the equivalent m.alice is the access operation. This class also provides +explicit set and get methods, which are trivial for regular maps but get +interesting when Map is subclassed. The other operation is update, which +extends a map by copying the keys and values from another table, perhaps +overwriting existing keys:

+ + +

+> Map = require 'pl.Map'
+> m = Map{one=1,two=2}
+> m:update {three=3,four=4,two=20}
+> = m == M{one=1,two=20,three=3,four=4}
+true
+

+ +

The method values returns a list of the values, and keys returns a list of +the keys; there is no guarantee of order. getvalues is given a list of keys and +returns a list of values associated with these keys:

+ + +

+> m = Map{one=1,two=2,three=3}
+> = m:getvalues {'one','three'}
+{1,3}
+> = m:getvalues(m:keys()) == m:values()
+true
+

+ +

When querying the value of a Map, it is best to use the get method:

+ + +

+> print(m:get 'one', m:get 'two')
+1     2
+

+ +

The reason is that m[key] can be ambiguous; due to the current implementation, +m["get"] will always succeed, because if a value is not present in the map, it +will be looked up in the Map metatable, which contains a method get. There is +currently no simple solution to this annoying restriction.

+ +

There are some useful classes which inherit from Map. An OrderedMap behaves +like a Map but keeps its keys in order if you use its set method to add keys +and values. Like all the 'container' classes in Penlight, it defines an iter +method for iterating over its values; this will return the keys and values in the +order of insertion; the keys and values methods likewise.

+ +

A MultiMap allows multiple values to be associated with a given key. So set +(as before) takes a key and a value, but calling it with the same key and a +different value does not overwrite but adds a new value. get (or using []) +will return a list of values.

+ +

A Set can be seen as a special kind of Map, where all the values are true, +the keys are the values, and the order is not important. So in this case +Set.values is defined to return a list of the keys. Sets can display +themselves, and the basic operations like union (+) and intersection (*) +are defined.

+ + +

+> Set = require 'pl.Set'
+> = Set{'one','two'} == Set{'two','one'}
+true
+> fruit = Set{'apple','banana','orange'}
+> = fruit['banana']
+true
+> = fruit['hazelnut']
+nil
+> = fruit:values()
+{apple,orange,banana}
+> colours = Set{'red','orange','green','blue'}
+> = fruit,colours
+[apple,orange,banana]   [blue,green,orange,red]
+> = fruit+colours
+[blue,green,apple,red,orange,banana]
+> = fruit*colours
+[orange]
+

+ +

There are also the functions Set.difference and Set.symmetric_difference. The +first answers the question 'what fruits are not colours?' and the second 'what +are fruits and colours but not both?'

+ + +

+> = fruit - colours
+[apple,banana]
+> = fruit ^ colours
+[blue,green,apple,red,banana]
+

+ +

Adding elements to a set is simply fruit['peach'] = true and removing is +fruit['apple'] = nil . To make this simplicity work properly, the Set class has no +methods - either you use the operator forms or explicitly use Set.intersect +etc. In this way we avoid the ambiguity that plagues Map.

+ + +

(See pl.Map and pl.Set)

+ +

Useful Operations on Tables

+ + +

Some notes on terminology: Lua tables are usually list-like (like an array) or +map-like (like an associative array or dict); they can of course have a +list-like and a map-like part. Some of the table operations only make sense for +list-like tables, and some only for map-like tables. (The usual Lua terminology +is the array part and the hash part of the table, which reflects the actual +implementation used; it is more accurate to say that a Lua table is an +associative map which happens to be particularly efficient at acting like an +array.)

+ +

The functions provided in table provide all the basic manipulations on Lua +tables, but as we saw with the List class, it is useful to build higher-level +operations on top of those functions. For instance, to copy a table involves this +kind of loop:

+ + +

+local res = {}
+for k,v in pairs(T) do
+    res[k] = v
+end
+return res
+

+ +

The tablex module provides this as copy, which does a shallow copy of a +table. There is also deepcopy which goes further than a simple loop in two +ways; first, it also gives the copy the same metatable as the original (so it can +copy objects like List above) and any nested tables will also be copied, to +arbitrary depth. There is also icopy which operates on list-like tables, where +you can set optionally set the start index of the source and destination as well. +It ensures that any left-over elements will be deleted:

+ + +

+asserteq(icopy({1,2,3,4,5,6},{20,30}),{20,30})   -- start at 1
+asserteq(icopy({1,2,3,4,5,6},{20,30},2),{1,20,30}) -- start at 2
+asserteq(icopy({1,2,3,4,5,6},{20,30},2,2),{1,30}) -- start at 2, copy from 2
+

+ +

(This code from the tablex test module shows the use of pl.test.asserteq)

+ +

Whereas, move overwrites but does not delete the rest of the destination:

+ + +

+asserteq(move({1,2,3,4,5,6},{20,30}),{20,30,3,4,5,6})
+asserteq(move({1,2,3,4,5,6},{20,30},2),{1,20,30,4,5,6})
+asserteq(move({1,2,3,4,5,6},{20,30},2,2),{1,30,3,4,5,6})
+

+ +

(The difference is somewhat like that between C's strcpy and memmove.)

+ +

To summarize, use copy or deepcopy to make a copy of an arbitrary table. To +copy into a map-like table, use update; to copy into a list-like table use +icopy, and move if you are updating a range in the destination.

+ +

To complete this set of operations, there is insertvalues which works like +table.insert except that one provides a table of values to be inserted, and +removevalues which removes a range of values.

+ + +

+asserteq(insertvalues({1,2,3,4},2,{20,30}),{1,20,30,2,3,4})
+asserteq(insertvalues({1,2},{3,4}),{1,2,3,4})
+

+ +

Another example:

+ + +

+> T = require 'pl.tablex'
+> t = {10,20,30,40}
+> = T.removevalues(t,2,3)
+{10,40}
+> = T.insertvalues(t,2,{20,30})
+{10,20,30,40}
+

+ +

In a similar spirit to deepcopy, deepcompare will take two tables and return +true only if they have exactly the same values and structure.

+ + +

+> t1 = {1,{2,3},4}
+> t2 = deepcopy(t1)
+> = t1 == t2
+false
+> = deepcompare(t1,t2)
+true
+

+ +

find will return the index of a given value in a list-like table. Note that +like string.find you can specify an index to start searching, so that all +instances can be found. There is an optional fourth argument, which makes the +search start at the end and go backwards, so we could define rfind like so:

+ + +

+function rfind(t,val,istart)
+    return tablex.find(t,val,istart,true)
+end
+

+ +

find does a linear search, so it can slow down code that depends on it. If +efficiency is required for large tables, consider using an index map. +index_map will return a table where the keys are the original values of the +list, and the associated values are the indices. (It is almost exactly the +representation needed for a set.)

+ + +

+> t = {'one','two','three'}
+> = tablex.find(t,'two')
+2
+> = tablex.find(t,'four')
+nil
+> il = tablex.index_map(t)
+> = il['two']
+2
+> = il.two
+2
+

+ +

A version of index_map called makeset is also provided, where the values are +just true. This is useful because two such sets can be compared for equality +using deepcompare:

+ + +

+> = deepcompare(makeset {1,2,3},makeset {2,1,3})
+true
+

+ +

Consider the problem of determining the new employees that have joined in a +period. Assume we have two files of employee names:

+ + +

+(last-month.txt)
+smith,john
+brady,maureen
+mongale,thabo
+
+(this-month.txt)
+smith,john
+smit,johan
+brady,maureen
+mogale,thabo
+van der Merwe,Piet
+

+ +

To find out differences, just make the employee lists into sets, like so:

+ + +

+require 'pl'
+
+function read_employees(file)
+  local ls = List(io.lines(file)) -- a list of employees
+  return tablex.makeset(ls)
+end
+
+last = read_employees 'last-month.txt'
+this = read_employees 'this-month.txt'
+
+-- who is in this but not in last?
+diff = tablex.difference(this,last)
+
+-- in a set, the keys are the values...
+for e in pairs(diff) do print(e) end
+
+--  *output*
+-- van der Merwe,Piet
+-- smit,johan
+

+ +

The difference operation is easy to write and read:

+ + +

+for e in pairs(this) do
+  if not last[e] then
+    print(e)
+  end
+end
+

+ +

Using difference here is not that it is a tricky thing to code, it is that you +are stating your intentions clearly to other readers of your code. (And naturally +to your future self, in six months time.)

+ +

find_if will search a table using a function. The optional third argument is a +value which will be passed as a second argument to the function. pl.operator +provides the Lua operators conveniently wrapped as functions, so the basic +comparison functions are available:

+ + +

+> ops = require 'pl.operator'
+> = tablex.find_if({10,20,30,40},ops.gt,20)
+3       true
+

+ +

Note that find_if will also return the actual value returned by the function, +which of course is usually just true for a boolean function, but any value +which is not nil and not false can be usefully passed back.

+ +

deepcompare does a thorough recursive comparison, but otherwise using the +default equality operator. compare allows you to specify exactly what function +to use when comparing two list-like tables, and compare_no_order is true if +they contain exactly the same elements. Do note that the latter does not need an +explicit comparison function - in this case the implementation is actually to +compare the two sets, as above:

+ + +

+> = compare_no_order({1,2,3},{2,1,3})
+true
+> = compare_no_order({1,2,3},{2,1,3},'==')
+true
+

+ +

(Note the special string '==' above; instead of saying ops.gt or ops.eq we +can use the strings '>' or '==' respectively.)

+ +

sort and sortv return iterators that will iterate through the +sorted elements of a table. sort iterates by sorted key order, and +sortv iterates by sorted value order. For example, given a table +with names and ages, it is trivial to iterate over the elements:

+ + +

+> t = {john=27,jane=31,mary=24}
+> for name,age in tablex.sort(t) do print(name,age) end
+jane    31
+john    27
+mary    24
+> for name,age in tablex.sortv(t) do print(name,age) end
+mary    24
+john    27
+jane    31
+

+ +

There are several ways to merge tables in PL. If they are list-like, then see the +operations defined by pl.List, like concatenation. If they are map-like, then +merge provides two basic operations. If the third arg is false, then the result +only contains the keys that are in common between the two tables, and if true, +then the result contains all the keys of both tables. These are in fact +generalized set union and intersection operations:

+ + +

+> S1 = {john=27,jane=31,mary=24}
+> S2 = {jane=31,jones=50}
+> = tablex.merge(S1, S2, false)
+{jane=31}
+> = tablex.merge(S1, S2, true)
+{mary=24,jane=31,john=27,jones=50}
+

+ +

When working with tables, you will often find yourself writing loops like in the +first example. Loops are second nature to programmers, but they are often not the +most elegant and self-describing way of expressing an operation. Consider the +map function, which creates a new table by applying a function to each element +of the original:

+ + +

+> = map(math.sin, {1,2,3,4})
+{  0.84,  0.91,  0.14, -0.76}
+> = map(function(x) return x*x end, {1,2,3,4})
+{1,4,9,16}
+

+ +

map saves you from writing a loop, and the resulting code is often clearer, as +well as being shorter. This is not to say that 'loops are bad' (although you will +hear that from some extremists), just that it's good to capture standard +patterns. Then the loops you do write will stand out and acquire more significance.

+ +

pairmap is interesting, because the function works with both the key and the +value.

+ + +

+> t = {fred=10,bonzo=20,alice=4}
+> = pairmap(function(k,v) return v end, t)
+{4,10,20}
+> = pairmap(function(k,v) return k end, t)
+{'alice','fred','bonzo'}
+

+ +

(These are common enough operations that the first is defined as values and the +second as keys.) If the function returns two values, then the second value is +considered to be the new key:

+ + +

+> = pairmap(t,function(k,v) return v+10, k:upper() end)
+{BONZO=30,FRED=20,ALICE=14}
+

+ +

map2 applies a function to two tables:

+ + +

+> map2(ops.add,{1,2},{10,20})
+{11,22}
+> map2('*',{1,2},{10,20})
+{10,40}
+

+ +

The various map operations generate tables; reduce applies a function of two +arguments over a table and returns the result as a scalar:

+ + +

+> reduce ('+', {1,2,3})
+6
+> reduce ('..', {'one','two','three'})
+'onetwothree'
+

+ +

Finally, zip sews different tables together:

+ + +

+> = zip({1,2,3},{10,20,30})
+{{1,10},{2,20},{3,30}}
+

+ +

Browsing through the documentation, you will find that tablex and List share +methods. For instance, tablex.imap and List.map are basically the same +function; they both operate over the array-part of the table and generate another +table. This can also be expressed as a list comprehension C 'f(x) for x' (t) +which makes the operation more explicit. So why are there different ways to do +the same thing? The main reason is that not all tables are Lists: the expression +ls:map('#') will return a list of the lengths of any elements of ls. A list +is a thin wrapper around a table, provided by the metatable List. Sometimes you +may wish to work with ordinary Lua tables; the List interface is not a +compulsory way to use Penlight table operations.

+ +

Operations on two-dimensional tables

+ + +

Two-dimensional tables are of course easy to represent in Lua, for instance +{{1,2},{3,4}} where we store rows as subtables and index like so A[col][row]. +This is the common representation used by matrix libraries like +LuaMatrix. pl.array2d does not provide +matrix operations, since that is the job for a specialized library, but rather +provides generalizations of the higher-level operations provided by pl.tablex +for one-dimensional arrays.

+ +

iter is a useful generalization of ipairs. (The extra parameter determines +whether you want the indices as well.)

+ + +

+> a = {{1,2},{3,4}}
+> for i,j,v in array2d.iter(a,true) do print(i,j,v) end
+1       1       1
+1       2       2
+2       1       3
+2       2       4
+

+ +

Note that you can always convert an arbitrary 2D array into a 'list of lists' +with List(tablex.map(List,a))

+ +

map will apply a function over all elements (notice that extra arguments can be +provided, so this operation is in effect function(x) return x-1 end)

+ + +

+> array2d.map('-',a,1)
+{{0,1},{2,3}}
+

+ +

2D arrays are stored as an array of rows, but columns can be extracted:

+ + +

+> array2d.column(a,1)
+{1,3}
+

+ +

There are three equivalents to tablex.reduce. You can either reduce along the +rows (which is the most efficient) or reduce along the columns. Either one will +give you a 1D array. And reduce2 will apply two operations: the first one +reduces the rows, and the second reduces the result.

+ + +

+> array2d.reduce_rows('+',a)
+{3,7}
+> array2d.reduce_cols('+',a)
+{4,6}
+> -- same as tablex.reduce('*',array.reduce_rows('+',a))
+> array2d.reduce2('*','+',a)
+21    `
+

+ +

tablex.map2 applies an operation to two tables, giving another table. +array2d.map2 does this for 2D arrays. Note that you have to provide the rank +of the arrays involved, since it's hard to always correctly deduce this from the +data:

+ + +

+> b = {{10,20},{30,40}}
+> a = {{1,2},{3,4}}
+> = array2d.map2('+',2,2,a,b)  -- two 2D arrays
+{{11,22},{33,44}}
+> = array2d.map2('+',1,2,{10,100},a)  -- 1D, 2D
+{{11,102},{13,104}}
+> = array2d.map2('*',2,1,a,{1,-1})  -- 2D, 1D
+{{1,-2},{3,-4}}
+

+ +

Of course, you are not limited to simple arithmetic. Say we have a 2D array of +strings, and wish to print it out with proper right justification. The first step +is to create all the string lengths by mapping string.len over the array, the +second is to reduce this along the columns using math.max to get maximum column +widths, and last, apply stringx.rjust with these widths.

+ + +

+maxlens = reduce_cols(math.max,map('#',lines))
+lines = map2(stringx.rjust,2,1,lines,maxlens)
+

+ +

There is product which returns the Cartesian product of two 1D arrays. The +result is a 2D array formed from applying the function to all possible pairs from +the two arrays.

+ + +

+> array2d.product('{}',{1,2},{'a','b'})
+{{{1,'b'},{2,'a'}},{{1,'a'},{2,'b'}}}
+

+ +

There is a set of operations which work in-place on 2D arrays. You can +swap_rows and swap_cols; the first really is a simple one-liner, but the idea +here is to give the operation a name. remove_row and remove_col are +generalizations of table.remove. Likewise, extract_rows and extract_cols +are given arrays of indices and discard anything else. So, for instance, +extract_cols(A,{2,4}) will leave just columns 2 and 4 in the array.

+ +

List.slice is often useful on 1D arrays; slice does the same thing, but is +generally given a start (row,column) and a end (row,column).

+ + +

+> A = {{1,2,3},{4,5,6},{7,8,9}}
+> B = slice(A,1,1,2,2)
+> write(B)
+ 1 2
+ 4 5
+> B = slice(A,2,2)
+> write(B,nil,'%4.1f')
+ 5.0 6.0
+ 8.0 9.0
+

+ +

Here write is used to print out an array nicely; the second parameter is nil, +which is the default (stdout) but can be any file object and the third parameter +is an optional format (as used in string.format).

+ +

parse_range will take a spreadsheet range like 'A1:B2' or 'R1C1:R2C2' and +return the range as four numbers, which can be passed to slice. The rule is +that slice will return an array of the appropriate shape depending on the +range; if a range represents a row or a column, the result is 1D, otherwise 2D.

+ +

This applies to iter as well, which can also optionally be given a range:

+ + + +

+> for i,j,v in iter(A,true,2,2) do print(i,j,v) end
+2       2       5
+2       3       6
+3       2       8
+3       3       9
+

+ +

new will construct a new 2D array with the given dimensions. You provide an +initial value for the elements, which is interpreted as a function if it's +callable. With L being utils.string_lambda we then have the following way to +make an identity matrix:

+ + +

+asserteq(
+    array.new(3,3,L'|i,j| i==j and 1 or 0'),
+    {{1,0,0},{0,1,0},{0,0,1}}
+)
+

+ +

Please note that most functions in array2d are covariant, that is, they +return an array of the same type as they receive. In particular, any objects +created with data.new or matrix.new will remain data or matrix objects when +reshaped or sliced, etc. Data objects have the array2d functions available as +methods.

+ + + + +

Penlight

Contents

Manual

Libraries

Classes

Examples

Tables and Arrays

Python-style Lists

Map and Set classes

Useful Operations on Tables

Operations on two-dimensional tables