From 42ec7286b2d36a9ba22925f816a17cb1cc2aa5ce Mon Sep 17 00:00:00 2001 From: chai Date: Sat, 30 Oct 2021 11:32:16 +0800 Subject: + Penlight --- .../Penlight/docs/manual/01-introduction.md.html | 843 ++++++++++ .../Penlight/docs/manual/02-arrays.md.html | 914 +++++++++++ .../Penlight/docs/manual/03-strings.md.html | 397 +++++ .../Penlight/docs/manual/04-paths.md.html | 329 ++++ .../Penlight/docs/manual/05-dates.md.html | 269 ++++ .../Libraries/Penlight/docs/manual/06-data.md.html | 1633 ++++++++++++++++++++ .../Penlight/docs/manual/07-functional.md.html | 834 ++++++++++ .../Penlight/docs/manual/08-additional.md.html | 815 ++++++++++ .../Penlight/docs/manual/09-discussion.md.html | 233 +++ 9 files changed, 6267 insertions(+) create mode 100644 Data/Libraries/Penlight/docs/manual/01-introduction.md.html create mode 100644 Data/Libraries/Penlight/docs/manual/02-arrays.md.html create mode 100644 Data/Libraries/Penlight/docs/manual/03-strings.md.html create mode 100644 Data/Libraries/Penlight/docs/manual/04-paths.md.html create mode 100644 Data/Libraries/Penlight/docs/manual/05-dates.md.html create mode 100644 Data/Libraries/Penlight/docs/manual/06-data.md.html create mode 100644 Data/Libraries/Penlight/docs/manual/07-functional.md.html create mode 100644 Data/Libraries/Penlight/docs/manual/08-additional.md.html create mode 100644 Data/Libraries/Penlight/docs/manual/09-discussion.md.html (limited to 'Data/Libraries/Penlight/docs/manual') diff --git a/Data/Libraries/Penlight/docs/manual/01-introduction.md.html b/Data/Libraries/Penlight/docs/manual/01-introduction.md.html new file mode 100644 index 0000000..fe42256 --- /dev/null +++ b/Data/Libraries/Penlight/docs/manual/01-introduction.md.html @@ -0,0 +1,843 @@ + + + + + Penlight Documentation + + + + +
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Introduction

+ +

+

Purpose

+ +

It is often said of Lua that it does not include batteries. That is because the +goal of Lua is to produce a lean expressive language that will be used on all +sorts of machines, (some of which don't even have hierarchical filesystems). The +Lua language is the equivalent of an operating system kernel; the creators of Lua +do not see it as their responsibility to create a full software ecosystem around +the language. That is the role of the community.

+ +

A principle of software design is to recognize common patterns and reuse them. If you find yourself writing things like io.write(string.format('the answer is %d\n',42)) more than a few times, then it becomes useful to define a function printf. This is good, not just because repeated code is harder to maintain, but because such code is easier to read, once people understand your libraries.

+ +

Penlight captures many such code patterns, so that the intent of your code becomes clearer. For instance, a Lua idiom to copy a table is {unpack(t)}, but this will only work for 'small' tables (for a given value of 'small') so it is not very robust. Also, the intent is not clear. So tablex.deepcopy is provided, which will also copy nested tables and associated metatables, so it can be used to clone complex objects.

+ +

The default error handling policy follows that of the Lua standard libraries: if an argument is the wrong type, then an error will be thrown, but otherwise we return nil,message if there is a problem. There are some exceptions; functions like input.fields default to shutting down the program immediately with a useful message. This is more appropriate behaviour for a script than providing a stack trace. (However, this default can be changed.) The lexer functions always throw errors, to simplify coding, and so should be wrapped in pcall.

+ +

If you are used to Python conventions, please note that all indices consistently +start at 1.

+ +

The Lua function table.foreach has been deprecated in favour of the for in +statement, but such an operation becomes particularly useful with the +higher-order function support in Penlight. Note that tablex.foreach reverses +the order, so that the function is passed the value and then the key. Although +perverse, this matches the intended use better.

+ +

The only important external dependency of Penlight is
LuaFileSystem
(lfs), and if you want dir.copyfile to work cleanly on Windows, you will need
either alien or
LuaJIT as well. (The fallback is to call the equivalent
shell commands.)

+ +

+

To Inject or not to Inject?

+ +

It was realized a long time ago that large programs needed a way to keep names +distinct by putting them into tables (Lua), namespaces (C++) or modules +(Python). It is obviously impossible to run a company where everyone is called +'Bruce', except in Monty Python skits. These 'namespace clashes' are more of a +problem in a simple language like Lua than in C++, because C++ does more +complicated lookup over 'injected namespaces'. However, in a small group of +friends, 'Bruce' is usually unique, so in particular situations it's useful to +drop the formality and not use last names. It depends entirely on what kind of +program you are writing, whether it is a ten line script or a ten thousand line +program.

+ +

So the Penlight library provides the formal way and the informal way, without +imposing any preference. You can do it formally like:

+ + +
+local utils = require 'pl.utils'
+utils.printf("%s\n","hello, world!")
+
+ +

or informally like:

+ + +
+require 'pl'
+utils.printf("%s\n","That feels better")
+
+ +

require 'pl' makes all the separate Penlight modules available, without needing +to require them each individually.

+ +

Generally, the formal way is better when writing modules, since then there are no +global side-effects and the dependencies of your module are made explicit.

+ +

Andrew Starks has contributed another way, which balances nicely between the +formal need to keep the global table uncluttered and the informal need for +convenience. require'pl.import_into' returns a function, which accepts a table +for injecting Penlight into, or if no table is given, it passes back a new one.

+ + +
+local pl = require'pl.import_into'()
+
+ +

The table pl is a 'lazy table' which loads modules as needed, so we can then use pl.utils.printf and so forth, without an explicit require or harming any globals.

+ +

If you are using _ENV with Lua 5.2 to define modules, then here is a way to +make Penlight available within a module:

+ + +
+local _ENV,M = require 'pl.import_into' ()
+
+function answer ()
+    -- all the Penlight modules are available!
+    return pretty.write(utils.split '10 20  30', '')
+end
+
+return M
+
+ +

The default is to put Penlight into _ENV, which has the unintended effect of making it available from the module (much as module(...,package.seeall) does). To satisfy both convenience and safety, you may pass true to this function, and then the module M is not the same as _ENV, but only contains the exported functions.

+ +

Otherwise, Penlight will not bring functions into the global table, or clobber standard tables like 'io'. require('pl') will bring tables like 'utils', 'tablex', etc. into the global table if they are used. This 'load-on-demand' strategy ensures that the whole kitchen sink is not loaded up front, so this method is as efficient as explicitly loading required modules.

+ +

You have an option to bring the pl.stringx methods into the standard string table. All strings have a metatable that allows for automatic lookup in string, so we can say s:upper(). Importing stringx allows for its functions to also be called as methods: s:strip(), etc.:

+ + +
+require 'pl'
+stringx.import()
+
+ +

or, more explicitly:

+ + +
+require('pl.stringx').import()
+
+ +

A more delicate operation is importing tables into the local environment. This is +convenient when the context makes the meaning of a name very clear:

+ + +
+> require 'pl'
+> utils.import(math)
+> = sin(1.2)
+0.93203908596723
+
+ +

utils.import can also be passed a module name as a string, which is first +required and then imported. If used in a module, import will bring the symbols +into the module context.

+ +

Keeping the global scope simple is very necessary with dynamic languages. Using +global variables in a big program is always asking for trouble, especially since +you do not have the spell-checking provided by a compiler. The pl.strict +module enforces a simple rule: globals must be 'declared'. This means that they +must be assigned before use; assigning to nil is sufficient.

+ + +
+> require 'pl.strict'
+> print(x)
+stdin:1: variable 'x' is not declared
+> x = nil
+> print(x)
+nil
+
+ +

The strict module provided by Penlight is compatible with the 'load-on-demand' scheme used by require 'pl'.

+ +

strict also disallows assignment to global variables, except in the main +program. Generally, modules have no business messing with global scope; if you +must do it, then use a call to rawset. Similarly, if you have to check for the +existence of a global, use rawget.

+ +

If you wish to enforce strictness globally, then just add require 'pl.strict' +at the end of pl/init.lua, otherwise call it from your main program.

+ +

As from 1.1.0, this module provides a strict.module function which creates (or +modifies) modules so that accessing an unknown function or field causes an error.

+ +

For example,

+ + +
+-- mymod.lua
+local strict = require 'pl.strict'
+local M = strict.module (...)
+
+function M.answer ()
+    return 42
+end
+
+return M
+
+ +

If you were to accidentally type mymod.Answer(), then you would get a runtime error: "variable 'Answer' is not declared in 'mymod'".

+ +

This can be applied to existing modules. You may desire to have the same level +of checking for the Lua standard libraries:

+ + +
+strict.make_all_strict(_G)
+
+ +

Thereafter a typo such as math.cosine will give you an explicit error, rather +than merely returning a nil that will cause problems later.

+ +

+

What are function arguments in Penlight?

+ +

Many functions in Penlight themselves take function arguments, like map which applies a function to a list, element by element. You can use existing functions, like math.max, anonymous functions (like function(x,y) return x > y end), or operations by name (e.g. '*' or '..'). The module pl.operator exports all the standard Lua operations, like the Python module of the same name. Penlight allows these to be referred to by name, so operator.gt can be more concisely expressed as '>'.

+ +

Note that the map functions pass any extra arguments to the function, so we can +have ls:filter('>',0), which is a shortcut for +ls:filter(function(x) return x > 0 end).

+ +

Finally, pl.func supports placeholder expressions in the Boost lambda style, so that an anonymous function to multiply the two arguments can be expressed as _1*_2.

+ +

To use them directly, note that all function arguments in Penlight go through +utils.function_arg. pl.func registers itself with this function, so that you +can directly use placeholder expressions with standard methods:

+ + +
+> _1 = func._1
+> = List{10,20,30}:map(_1+1)
+{11,21,31}
+
+ +

Another option for short anonymous functions is provided by +utils.string_lambda; this is invoked automatically:

+ + +
+> = List{10,20,30}:map '|x| x + 1'
+{11,21,31}
+
+ +

+

Pros and Cons of Loopless Programming

+ +

The standard loops-and-ifs 'imperative' style of programming is dominant, and +often seems to be the 'natural' way of telling a machine what to do. It is in +fact very much how the machine does things, but we need to take a step back and +find ways of expressing solutions in a higher-level way. For instance, applying +a function to all elements of a list is a common operation:

+ + +
+local res = {}
+for i = 1,#ls do
+    res[i] = fun(ls[i])
+end
+
+ +

This can be efficiently and succinctly expressed as ls:map(fun). Not only is there less typing but the intention of the code is clearer. If readers of your code spend too much time trying to guess your intention by analyzing your loops, then you have failed to express yourself clearly. Similarly, ls:filter('>',0) will give you all the values in a list greater than zero. (Of course, if you don't feel like using List, or have non-list-like tables, then pl.tablex offers the same facilities. In fact, the List methods are implemented using tablex functions.)

+ +

A common observation is that loopless programming is less efficient, particularly in the way it uses memory. ls1:map2('*',ls2):reduce '+' will give you the dot product of two lists, but an unnecessary temporary list is created. But efficiency is relative to the actual situation; it may turn out to be fast enough, or the code may not appear in any crucial inner loops, and so on.

+ +

Writing loops is 'error-prone and tedious', as Stroustrup says. But any half-decent editor can be taught to do much of that typing for you. The question should actually be: is it tedious to read loops? As with natural language, programmers tend to read chunks at a time. A for-loop causes no surprise, and probably little brain activity. One argument for loopless programming is that the loops you do write stand out more, and signal 'something different happening here'. It should not be an all-or-nothing thing, since most programs require a mixture of idioms that suit the problem. Some languages (like APL) do nearly everything with map and reduce operations on arrays, and so solutions can sometimes seem forced. Wisdom is knowing when a particular idiom makes a particular problem easy to solve and the solution easy to explain afterwards.

+ +

+

Generally useful functions.

+ +

The function printf discussed earlier is included in pl.utils because it +makes properly formatted output easier. (There is an equivalent fprintf which +also takes a file object parameter, just like the C function.)

+ +

Splitting a string using a delimiter is a fairly common operation, hence split.

+ +

Utility functions like is_type help with identifying what +kind of animal you are dealing with. +The Lua type function handles the basic types, but can't distinguish between +different kinds of objects, which are all tables. So is_type handles both +cases, like is_type(s,"string") and is_type(ls,List).

+ +

A common pattern when working with Lua varargs is capturing all the arguments in +a table:

+ + +
+function t(...)
+    local args = {...}
+    ...
+end
+
+ +

But this will bite you someday when nil is one of the arguments, since this will put a 'hole' in your table. In particular, #args will only give you the size up to the nil value. Hence the need for table.pack - this is a new Lua 5.2 function which Penlight also defines for Lua 5.1.

+ + +
+function t(...)
+    local args = table.pack(...)
+    for i = 1,args.n do
+      ...
+    end
+end
+
+ +

The 'memoize' pattern occurs when you have a function which is expensive to call, +but will always return the same value subsequently. utils.memoize is given a +function, and returns another function. This calls the function the first time, +saves the value for that argument, and thereafter for that argument returns the +saved value. This is a more flexible alternative to building a table of values +upfront, since in general you won't know what values are needed.

+ + +
+sum = utils.memoize(function(n)
+    local sum = 0
+    for i = 1,n do sum = sum + i end
+    return sum
+end)
+...
+s = sum(1e8) --takes time!
+...
+s = sum(1e8) --returned saved value!
+
+ +

Penlight is fully compatible with Lua 5.1, 5.2 and LuaJIT 2. To ensure this, utils also defines the global Lua 5.2 load function as utils.load, which takes the following arguments:

+ +
    +
  • the input (either a string or a function)
  • the source name used in debug information
  • the mode, a string that can have either or both of 'b' or 't', depending on whether the source is a binary chunk or text code (default is 'bt')
  • the environment for the compiled chunk
+ +

Using utils.load should reduce the need to call the deprecated function setfenv, +and make your Lua 5.1 code 5.2-friendly.

+ +

The utils module exports getfenv and setfenv for +Lua 5.2 as well, based on code by Sergey Rozhenko. Note that these functions can fail +for functions which don't access any globals.

+ +

+

Application Support

+ +

app.parse_args is a simple command-line argument parser. If called without any arguments, it tries to use the global arg array. It returns the flags (options beginning with '-') as a table of name/value pairs, and the arguments as an array. It knows about long GNU-style flag names, e.g. --value, and groups of short flags are understood, so that -ab is short for -a -b. The flags result would then look like {value=true,a=true,b=true}.

+ +

Flags may take values. The command-line --value=open -n10 would result in +{value='open',n='10'}; generally you can use '=' or ':' to separate the flag +from its value, except in the special case where a short flag is followed by an +integer. Or you may specify upfront that some flags have associated values, and +then the values will follow the flag.

+ + +
+> require 'pl'
+> flags,args = app.parse_args({'-o','fred','-n10','fred.txt'},{o=true})
+> pretty.dump(flags)
+{o='fred',n='10'}
+
+ +

parse_args is not intelligent or psychic; it will not convert any flag values +or arguments for you, or raise errors. For that, have a look at +Lapp.

+ +

An application which consists of several files usually cannot use require to +load files in the same directory as the main script. app.require_here() +ensures that the Lua module path is modified so that files found locally are +found first. In the examples directory, test-symbols.lua uses this function +to ensure that it can find symbols.lua even if it is not run from this directory.

+ +

app.appfile will create a filename that your application can use to store its +private data, based on the script name. For example, app.appfile "test.txt" +from a script called testapp.lua produces the following file on my Windows +machine:

+ +
C:\Documents and Settings\SJDonova\.testapp\test.txt
+
+ + +

and the equivalent on my Linux machine:

+ +
/home/sdonovan/.testapp/test.txt
+
+ + +

If .testapp does not exist, it will be created.

+ +

Penlight makes it convenient to save application data in Lua format. You can use +pretty.dump(t,file) to write a Lua table in a human-readable form to a file, +and pretty.read(file.read(file)) to generate the table again, using the +pretty module.

+ + +

+

Simplifying Object-Oriented Programming in Lua

+ +

Lua is similar to JavaScript in that the concept of class is not directly supported by the language. In fact, Lua has a very general mechanism for extending the behaviour of tables which makes it straightforward to implement classes. A table's behaviour is controlled by its metatable. If that metatable has an __index function or table, this will handle looking up anything which is not found in the original table. A class is just a table with an __index key pointing to itself. Creating an object involves making a table and setting its metatable to the class; then when handling obj.fun, Lua first looks up fun in the table obj, and if not found it looks it up in the class. obj:fun(a) is just short for obj.fun(obj,a). So with the metatable mechanism and this bit of syntactic sugar, it is straightforward to implement classic object orientation.

+ + +
+-- animal.lua
+
+class = require 'pl.class'
+
+class.Animal()
+
+function Animal:_init(name)
+    self.name = name
+end
+
+function Animal:__tostring()
+  return self.name..': '..self:speak()
+end
+
+class.Dog(Animal)
+
+function Dog:speak()
+  return 'bark'
+end
+
+class.Cat(Animal)
+
+function Cat:_init(name,breed)
+    self:super(name)  -- must init base!
+    self.breed = breed
+end
+
+function Cat:speak()
+  return 'meow'
+end
+
+class.Lion(Cat)
+
+function Lion:speak()
+  return 'roar'
+end
+
+fido = Dog('Fido')
+felix = Cat('Felix','Tabby')
+leo = Lion('Leo','African')
+
+$ lua -i animal.lua
+> = fido,felix,leo
+Fido: bark      Felix: meow     Leo: roar
+> = leo:is_a(Animal)
+true
+> = leo:is_a(Dog)
+false
+> = leo:is_a(Cat)
+true
+
+ +

All Animal does is define __tostring, which Lua will use whenever a string representation of the object is needed. In turn, this relies on speak, which is not defined. So it's what C++ people would call an abstract base class; the specific derived classes like Dog define speak. Please note that if derived classes have their own constructors, they must explicitly call the constructor of their base class; this is conveniently available as the super method.

+ +

Note that (as always) there are multiple ways to implement OOP in Lua; this method +uses the classic 'a class is the __index of its objects' but does 'fat inheritance'; +methods from the base class are copied into the new class. The advantage of this is +that you are not penalized for long inheritance chains, for the price of larger classes, +but generally objects outnumber classes! (If not, something odd is going on with your design.)

+ +

All such objects will have a is_a method, which looks up the inheritance chain +to find a match. Another form is class_of, which can be safely called on all +objects, so instead of leo:is_a(Animal) one can say Animal:class_of(leo).

+ +

There are two ways to define a class, either class.Name() or Name = class(); +both work identically, except that the first form will always put the class in +the current environment (whether global or module); the second form provides more +flexibility about where to store the class. The first form does name the class +by setting the _name field, which can be useful in identifying the objects of +this type later. This session illustrates the usefulness of having named classes, +if no __tostring method is explicitly defined.

+ + +
+> class.Fred()
+> a = Fred()
+> = a
+Fred: 00459330
+> Alice = class()
+> b = Alice()
+> = b
+table: 00459AE8
+> Alice._name = 'Alice'
+> = b
+Alice: 00459AE8
+
+ +

So Alice = class(); Alice._name = 'Alice' is exactly the same as class.Alice().

+ +

This useful notation is borrowed from Hugo Etchegoyen's +classlib which further +extends this concept to allow for multiple inheritance. Notice that the +more convenient form puts the class name in the current environment! That is, +you may use it safely within modules using the old-fashioned module() +or the new _ENV mechanism.

+ +

There is always more than one way of doing things in Lua; some may prefer this +style for creating classes:

+ + +
+local class = require 'pl.class'
+
+class.Named {
+    _init = function(self,name)
+        self.name = name
+    end;
+
+    __tostring = function(self)
+        return 'boo '..self.name
+    end;
+}
+
+b = Named 'dog'
+print(b)
+--> boo dog
+
+ +

Note that you have to explicitly declare self and end each function definition +with a semi-colon or comma, since this is a Lua table. To inherit from a base class, +set the special field _base to the class in this table.

+ +

Penlight provides a number of useful classes; there is List, which is a Lua clone of the standard Python list object, and Set which represents sets. There are three kinds of map defined: Map, MultiMap (where a key may have multiple values) and OrderedMap (where the order of insertion is remembered). There is nothing special about these classes and you may inherit from them.

+ +

A powerful thing about dynamic languages is that you can redefine existing classes and functions, which is often called 'monkey patching'. It's entertaining and convenient, but ultimately anti-social; you may modify List, but then any other modules using this shared resource can no longer be sure about its behaviour. (This is why you must say stringx.import() explicitly if you want the extended string methods - it would be a bad default.) Lua is particularly open to modification, but the community is not as tolerant of monkey-patching as, say, the Ruby community. Want to add some new methods to List? Cool, but that's what subclassing is for.

+ + +
+class.Strings(List)
+
+function Strings:my_method()
+...
+end
+
+ +

It's definitely more useful to define exactly how your objects behave +in unknown conditions. All classes have a catch method you can use to set +a handler for unknown lookups; the function you pass looks exactly like the +__index metamethod.

+ + +
+Strings:catch(function(self,name)
+    return function() error("no such method "..name,2) end
+end)
+
+ +

In this case we're just customizing the error message, but +creative things can be done. Consider this code from test-vector.lua:

+ + +
+Strings:catch(List.default_map_with(string))
+
+ls = Strings{'one','two','three'}
+asserteq(ls:upper(),{'ONE','TWO','THREE'})
+asserteq(ls:sub(1,2),{'on','tw','th'})
+
+ +

So we've converted an unknown method invocation into a map using the function of that name found in string. So for a Vector (which is a specialization of List for numbers) it makes sense to make math the default map, so that v:sin() makes sense.

+ +

Note that map operations return an object of the same type - this is often called covariance. So ls:upper() itself returns a Strings object.

+ +

This is not always what you want, but objects can always be cast to the desired type. +(cast doesn't create a new object, but returns the object passed.)

+ + +
+local sizes = ls:map '#'
+asserteq(sizes, {3,3,5})
+asserteq(utils.type(sizes),'Strings')
+asserteq(sizes:is_a(Strings),true)
+sizes = Vector:cast(sizes)
+asserteq(utils.type(sizes),'Vector')
+asserteq(sizes+1,{4,4,6})
+
+ +

About utils.type: it can only return a string for a class type if that class does +in fact have a _name field.

+ + +

Properties are a useful object-oriented pattern. We wish to control access to a +field, but don't wish to force the user of the class to say obj:get_field() +etc. This excerpt from tests/test-class.lua shows how it is done:

+ + + +
+local MyProps = class(class.properties)
+local setted_a, got_b
+
+function MyProps:_init ()
+    self._a = 1
+    self._b = 2
+end
+
+function MyProps:set_a (v)
+    setted_a = true
+    self._a = v
+end
+
+function MyProps:get_b ()
+    got_b = true
+    return self._b
+end
+
+local mp = MyProps()
+
+mp.a = 10
+
+asserteq(mp.a,10)
+asserteq(mp.b,2)
+asserteq(setted_a and got_b, true)
+
+ +

The convention is that the internal field name is prefixed with an underscore; when reading mp.a, a check is first made for an explicit getter get_a, and only then for _a. Similarly, writing mp.a causes the setter set_a to be used.

+ +

This is cool behaviour, but like much Lua metaprogramming, it is not free. Method lookup on such objects goes through __index as before, but now __index is a function which has to explicitly look up methods in the class before doing any property indexing, which is not going to be as fast as field lookup. If, however, your accessors actually do non-trivial things, then the extra overhead could be worth it.

+ +

This is not really intended for access control because external code can write +to mp._a directly. It is possible to have this kind of control in Lua, but it +again comes with run-time costs.

+ + +
+
+
+generated by LDoc 1.4.6 +
+
+ + diff --git a/Data/Libraries/Penlight/docs/manual/02-arrays.md.html b/Data/Libraries/Penlight/docs/manual/02-arrays.md.html new file mode 100644 index 0000000..28dc6a2 --- /dev/null +++ b/Data/Libraries/Penlight/docs/manual/02-arrays.md.html @@ -0,0 +1,914 @@ + + + + + Penlight Documentation + + + + +
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Tables and Arrays

+ +

+ +

+

Python-style Lists

+ +

One of the elegant things about Lua is that tables do the job of both lists and +dicts (as called in Python) or vectors and maps, (as called in C++), and they do +it efficiently. However, if we are dealing with 'tables with numerical indices' +we may as well call them lists and look for operations which particularly make +sense for lists. The Penlight List class was originally written by Nick Trout +for Lua 5.0, and translated to 5.1 and extended by myself. It seemed that +borrowing from Python was a good idea, and this eventually grew into Penlight.

+ +

Here is an example showing List in action; it redefines __tostring, so that +it can print itself out more sensibly:

+ + +
+> List = require 'pl.List'  --> automatic with require 'pl' <---
+> l = List()
+> l:append(10)
+> l:append(20)
+> = l
+{10,20}
+> l:extend {30,40}
+{10,20,30,40}
+> l:insert(1,5)
+{5,10,20,30,40}
+> = l:pop()
+40
+> = l
+{5,10,20,30}
+> = l:index(30)
+4
+> = l:contains(30)
+true
+> = l:reverse()  ---> note: doesn't make a copy!
+{30,20,10,5}
+
+ +

Although methods like sort and reverse operate in-place and change the list, +they do return the original list. This makes it possible to do method chaining, +like ls = ls:append(10):append(20):reverse():append(1). But (and this is an +important but) no extra copy is made, so ls does not change identity. List +objects (like tables) are mutable, unlike strings. If you want a copy of a +list, then List(ls) will do the job, i.e. it acts like a copy constructor. +However, if passed any other table, List will just set the metatable of the +table and not make a copy.

+ +

A particular feature of Python lists is slicing. This is fully supported in +this version of List, except we use 1-based indexing. So List.slice works +rather like string.sub:

+ + +
+> l = List {10,20,30,40}
+> = l:slice(1,1)  ---> note: creates a new list!
+{10}
+> = l:slice(2,2)
+{20}
+> = l:slice(2,3)
+{20,30}
+> = l:slice(2,-2)
+{20,30}
+> = l:slice_assign(2,2,{21,22,23})
+{10,21,22,23,30,40}
+> = l:chop(1,1)
+{21,22,23,30,40}
+
+ +

Functions like slice_assign and chop modify the list; the first is equivalent to Python's l[i1:i2] = seq and the second to del l[i1:i2].

+ +

List objects are ultimately just Lua 'list-like' tables, but they have extra operations defined on them, such as equality and concatenation. For regular tables, equality is only true if the two tables are identical objects, whereas two lists are equal if they have the same contents, i.e. that l1[i]==l2[i] for all elements.

+ + +
+> l1 = List {1,2,3}
+> l2 = List {1,2,3}
+> = l1 == l2
+true
+> = l1..l2
+{1,2,3,1,2,3}
+
+ +

The List constructor can be passed a function. If so, it's assumed that this is +an iterator function that can be repeatedly called to generate a sequence. One +such function is io.lines; the following short, intense little script counts +the number of lines in standard input:

+ + +
+-- linecount.lua
+require 'pl'
+ls = List(io.lines())
+print(#ls)
+
+ +

List.iterate captures what List considers a sequence. In particular, it can +also iterate over all 'characters' in a string:

+ + +
+> for ch in List.iterate 'help' do io.write(ch,' ') end
+h e l p >
+
+ +

Since the function iterate is used internally by the List constructor, +strings can be made into lists of character strings very easily.

+ +

There are a number of operations that go beyond the standard Python methods. For +instance, you can partition a list into a table of sublists using a function. +In the simplest form, you use a predicate (a function returning a boolean value) +to partition the list into two lists, one of elements matching and another of +elements not matching. But you can use any function; if we use type then the +keys will be the standard Lua type names.

+ + +
+> ls = List{1,2,3,4}
+> ops = require 'pl.operator'
+> ls:partition(function(x) return x > 2 end)
+{false={1,2},true={3,4}}
+> ls = List{'one',math.sin,List{1},10,20,List{1,2}}
+> ls:partition(type)
+{function={function: 00369110},string={one},number={10,20},table={{1},{1,2}}}
+
+ +

This is one List method which returns a table which is not a List. Bear in +mind that you can always call a List method on a plain table argument, so +List.partition(t,type) works as expected. But these functions will only operate +on the array part of the table.

+ +

The 'nominal' type of the returned table is pl.Multimap, which describes a mapping +between keys and multiple values. This does not mean that pl.Multimap is automatically +loaded whenever you use partition (or List for that matter); this is one of the +standard metatables which are only filled out when the appropriate module is loaded. +This allows tables to be tagged appropriately without causing excessive coupling.

+ +

Stacks occur everywhere in computing. List supports stack-like operations; +there is already pop (remove and return last value) and append acts like +push (add a value to the end). push is provided as an alias for append, and +the other stack operation (size) is simply the size operator #. Queues can +also be implemented; you use pop to take values out of the queue, and put to +insert a value at the beginning.
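The stack and queue idioms above can be sketched like this (assumes pl.List is installed):

```lua
local List = require 'pl.List'

-- stack: push (alias for append) and pop both work on the end
local st = List()
st:push(10)
st:push(20)
assert(st:pop() == 20)   -- LIFO order
assert(#st == 1)         -- size is simply the # operator

-- queue: put inserts at the beginning, pop removes from the end
local q = List()
q:put 'first'
q:put 'second'
assert(q:pop() == 'first')  -- FIFO order
```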

+ +

You may derive classes from List, and since the list-returning methods +are covariant, the result of slice etc will return lists of the derived type, +not List. For instance, consider the specialization of a List type that contains +numbers in tests/test-list.lua:

+ + +
+n1 = NA{10,20,30}
+n2 = NA{1,2,3}
+ns = n1 + 2*n2
+asserteq(ns,{12,24,36})
+min,max = ns:slice(1,2):minmax()
+asserteq(T(min,max),T(12,24))
+asserteq(n1:normalize():sum(),1,1e-8)
+
+ +

+

Map and Set classes

+ +

The Map class exposes what Python would call a 'dict' interface, and accesses +the hash part of the table. The name 'Map' is used to emphasize the interface, +not the implementation; it is an object which maps keys onto values; m['alice'] +or the equivalent m.alice is the access operation. This class also provides +explicit set and get methods, which are trivial for regular maps but get +interesting when Map is subclassed. The other operation is update, which +extends a map by copying the keys and values from another table, perhaps +overwriting existing keys:

+ + +
+> Map = require 'pl.Map'
+> m = Map{one=1,two=2}
+> m:update {three=3,four=4,two=20}
+> = m == M{one=1,two=20,three=3,four=4}
+true
+
+ +

The method values returns a list of the values, and keys returns a list of +the keys; there is no guarantee of order. getvalues is given a list of keys and +returns a list of values associated with these keys:

+ + +
+> m = Map{one=1,two=2,three=3}
+> = m:getvalues {'one','three'}
+{1,3}
+> = m:getvalues(m:keys()) == m:values()
+true
+
+ +

When querying the value of a Map, it is best to use the get method:

+ + +
+> print(m:get 'one', m:get 'two')
+1     2
+
+ +

The reason is that m[key] can be ambiguous; due to the current implementation, +m["get"] will always succeed, because if a value is not present in the map, it +will be looked up in the Map metatable, which contains a method get. There is +currently no simple solution to this annoying restriction.

+ +

There are some useful classes which inherit from Map. An OrderedMap behaves +like a Map but keeps its keys in order if you use its set method to add keys +and values. Like all the 'container' classes in Penlight, it defines an iter +method for iterating over its values; this will return the keys and values in the +order of insertion; the keys and values methods likewise.

+ +

A MultiMap allows multiple values to be associated with a given key. So set +(as before) takes a key and a value, but calling it with the same key and a +different value does not overwrite but adds a new value. get (or using []) +will return a list of values.
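A minimal sketch of that behaviour (assuming pl.MultiMap is installed):

```lua
local MultiMap = require 'pl.MultiMap'

local mm = MultiMap()
mm:set('dog', 'bark')
mm:set('dog', 'growl')  -- same key again: adds a value, does not overwrite
-- mm:get 'dog' (or mm['dog']) now yields the list {'bark','growl'}
```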

+ +

A Set can be seen as a special kind of Map, where all the values are true, +the keys are the values, and the order is not important. So in this case +Set.values is defined to return a list of the keys. Sets can display +themselves, and the basic operations like union (+) and intersection (*) +are defined.

+ + +
+> Set = require 'pl.Set'
+> = Set{'one','two'} == Set{'two','one'}
+true
+> fruit = Set{'apple','banana','orange'}
+> = fruit['banana']
+true
+> = fruit['hazelnut']
+nil
+> = fruit:values()
+{apple,orange,banana}
+> colours = Set{'red','orange','green','blue'}
+> = fruit,colours
+[apple,orange,banana]   [blue,green,orange,red]
+> = fruit+colours
+[blue,green,apple,red,orange,banana]
+> = fruit*colours
+[orange]
+
+ +

There are also the functions Set.difference and Set.symmetric_difference. The +first answers the question 'what fruits are not colours?' and the second 'what +are fruits and colours but not both?'

+ + +
+> = fruit - colours
+[apple,banana]
+> = fruit ^ colours
+[blue,green,apple,red,banana]
+
+ +

Adding elements to a set is simply fruit['peach'] = true and removing is +fruit['apple'] = nil . To make this simplicity work properly, the Set class has no +methods - either you use the operator forms or explicitly use Set.intersect +etc. In this way we avoid the ambiguity that plagues Map.
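So set manipulation stays plain-table simple; a sketch (assuming pl.Set):

```lua
local Set = require 'pl.Set'

local fruit = Set{'apple','banana'}
fruit['peach'] = true    -- add an element
fruit['apple'] = nil     -- remove an element
-- operations are operators or module functions, never methods:
local tropical = fruit * Set{'banana','mango'}  -- intersection
assert(tropical == Set{'banana'})
```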

+ + +

(See pl.Map and pl.Set)

+ +

+

Useful Operations on Tables

+ + +

Some notes on terminology: Lua tables are usually list-like (like an array) or +map-like (like an associative array or dict); they can of course have a +list-like and a map-like part. Some of the table operations only make sense for +list-like tables, and some only for map-like tables. (The usual Lua terminology +is the array part and the hash part of the table, which reflects the actual +implementation used; it is more accurate to say that a Lua table is an +associative map which happens to be particularly efficient at acting like an +array.)

+ +

The functions provided in table provide all the basic manipulations on Lua +tables, but as we saw with the List class, it is useful to build higher-level +operations on top of those functions. For instance, to copy a table involves this +kind of loop:

+ + +
+local res = {}
+for k,v in pairs(T) do
+    res[k] = v
+end
+return res
+
+ +

The tablex module provides this as copy, which does a shallow copy of a +table. There is also deepcopy which goes further than a simple loop in two +ways; first, it also gives the copy the same metatable as the original (so it can +copy objects like List above) and any nested tables will also be copied, to +arbitrary depth. There is also icopy which operates on list-like tables, where +you can optionally set the start index of the source and destination as well. +It ensures that any left-over elements will be deleted:

+ + +
+asserteq(icopy({1,2,3,4,5,6},{20,30}),{20,30})   -- start at 1
+asserteq(icopy({1,2,3,4,5,6},{20,30},2),{1,20,30}) -- start at 2
+asserteq(icopy({1,2,3,4,5,6},{20,30},2,2),{1,30}) -- start at 2, copy from 2
+
+ +

(This code from the tablex test module shows the use of pl.test.asserteq)

+ +

Whereas, move overwrites but does not delete the rest of the destination:

+ + +
+asserteq(move({1,2,3,4,5,6},{20,30}),{20,30,3,4,5,6})
+asserteq(move({1,2,3,4,5,6},{20,30},2),{1,20,30,4,5,6})
+asserteq(move({1,2,3,4,5,6},{20,30},2,2),{1,30,3,4,5,6})
+
+ +

(The difference is somewhat like that between C's strcpy and memmove.)

+ +

To summarize, use copy or deepcopy to make a copy of an arbitrary table. To +copy into a map-like table, use update; to copy into a list-like table use +icopy, and move if you are updating a range in the destination.

+ +

To complete this set of operations, there is insertvalues which works like +table.insert except that one provides a table of values to be inserted, and +removevalues which removes a range of values.

+ + +
+asserteq(insertvalues({1,2,3,4},2,{20,30}),{1,20,30,2,3,4})
+asserteq(insertvalues({1,2},{3,4}),{1,2,3,4})
+
+ +

Another example:

+ + +
+> T = require 'pl.tablex'
+> t = {10,20,30,40}
+> = T.removevalues(t,2,3)
+{10,40}
+> = T.insertvalues(t,2,{20,30})
+{10,20,30,40}
+
+ +

In a similar spirit to deepcopy, deepcompare will take two tables and return +true only if they have exactly the same values and structure.

+ + +
+> t1 = {1,{2,3},4}
+> t2 = deepcopy(t1)
+> = t1 == t2
+false
+> = deepcompare(t1,t2)
+true
+
+ +

find will return the index of a given value in a list-like table. Note that +like string.find you can specify an index to start searching, so that all +instances can be found. There is an optional fourth argument, which makes the +search start at the end and go backwards, so we could define rfind like so:

+ + +
+function rfind(t,val,istart)
+    return tablex.find(t,val,istart,true)
+end
+
+ +

find does a linear search, so it can slow down code that depends on it. If +efficiency is required for large tables, consider using an index map. +index_map will return a table where the keys are the original values of the +list, and the associated values are the indices. (It is almost exactly the +representation needed for a set.)

+ + +
+> t = {'one','two','three'}
+> = tablex.find(t,'two')
+2
+> = tablex.find(t,'four')
+nil
+> il = tablex.index_map(t)
+> = il['two']
+2
+> = il.two
+2
+
+ +

A version of index_map called makeset is also provided, where the values are +just true. This is useful because two such sets can be compared for equality +using deepcompare:

+ + +
+> = deepcompare(makeset {1,2,3},makeset {2,1,3})
+true
+
+ +

Consider the problem of determining the new employees that have joined in a +period. Assume we have two files of employee names:

+ + +
+(last-month.txt)
+smith,john
+brady,maureen
+mongale,thabo
+
+(this-month.txt)
+smith,john
+smit,johan
+brady,maureen
+mogale,thabo
+van der Merwe,Piet
+
+ +

To find out differences, just make the employee lists into sets, like so:

+ + +
+require 'pl'
+
+function read_employees(file)
+  local ls = List(io.lines(file)) -- a list of employees
+  return tablex.makeset(ls)
+end
+
+last = read_employees 'last-month.txt'
+this = read_employees 'this-month.txt'
+
+-- who is in this but not in last?
+diff = tablex.difference(this,last)
+
+-- in a set, the keys are the values...
+for e in pairs(diff) do print(e) end
+
+--  *output*
+-- van der Merwe,Piet
+-- smit,johan
+
+ +

The difference operation is easy to write and read:

+ + +
+for e in pairs(this) do
+  if not last[e] then
+    print(e)
+  end
+end
+
+ +

The point of using difference here is not that it is a tricky thing to code; it is that you +are stating your intentions clearly to other readers of your code. (And naturally +to your future self, in six months time.)

+ +

find_if will search a table using a function. The optional third argument is a +value which will be passed as a second argument to the function. pl.operator +provides the Lua operators conveniently wrapped as functions, so the basic +comparison functions are available:

+ + +
+> ops = require 'pl.operator'
+> = tablex.find_if({10,20,30,40},ops.gt,20)
+3       true
+
+ +

Note that find_if will also return the actual value returned by the function, +which of course is usually just true for a boolean function, but any value +which is not nil and not false can be usefully passed back.

+ +

deepcompare does a thorough recursive comparison, but otherwise using the +default equality operator. compare allows you to specify exactly what function +to use when comparing two list-like tables, and compare_no_order is true if +they contain exactly the same elements. Do note that the latter does not need an +explicit comparison function - in this case the implementation is actually to +compare the two sets, as above:

+ + +
+> = compare_no_order({1,2,3},{2,1,3})
+true
+> = compare_no_order({1,2,3},{2,1,3},'==')
+true
+
+ +

(Note the special string '==' above; instead of saying ops.gt or ops.eq we +can use the strings '>' or '==' respectively.)

+ +

sort and sortv return iterators that will iterate through the +sorted elements of a table. sort iterates by sorted key order, and +sortv iterates by sorted value order. For example, given a table +with names and ages, it is trivial to iterate over the elements:

+ + +
+> t = {john=27,jane=31,mary=24}
+> for name,age in tablex.sort(t) do print(name,age) end
+jane    31
+john    27
+mary    24
+> for name,age in tablex.sortv(t) do print(name,age) end
+mary    24
+john    27
+jane    31
+
+ +

There are several ways to merge tables in PL. If they are list-like, then see the +operations defined by pl.List, like concatenation. If they are map-like, then +merge provides two basic operations. If the third arg is false, then the result +only contains the keys that are in common between the two tables, and if true, +then the result contains all the keys of both tables. These are in fact +generalized set union and intersection operations:

+ + +
+> S1 = {john=27,jane=31,mary=24}
+> S2 = {jane=31,jones=50}
+> = tablex.merge(S1, S2, false)
+{jane=31}
+> = tablex.merge(S1, S2, true)
+{mary=24,jane=31,john=27,jones=50}
+
+ +

When working with tables, you will often find yourself writing loops like in the +first example. Loops are second nature to programmers, but they are often not the +most elegant and self-describing way of expressing an operation. Consider the +map function, which creates a new table by applying a function to each element +of the original:

+ + +
+> = map(math.sin, {1,2,3,4})
+{  0.84,  0.91,  0.14, -0.76}
+> = map(function(x) return x*x end, {1,2,3,4})
+{1,4,9,16}
+
+ +

map saves you from writing a loop, and the resulting code is often clearer, as +well as being shorter. This is not to say that 'loops are bad' (although you will +hear that from some extremists), just that it's good to capture standard +patterns. Then the loops you do write will stand out and acquire more significance.

+ +

pairmap is interesting, because the function works with both the key and the +value.

+ + +
+> t = {fred=10,bonzo=20,alice=4}
+> = pairmap(function(k,v) return v end, t)
+{4,10,20}
+> = pairmap(function(k,v) return k end, t)
+{'alice','fred','bonzo'}
+
+ +

(These are common enough operations that the first is defined as values and the +second as keys.) If the function returns two values, then the second value is +considered to be the new key:

+ + +
+> = pairmap(function(k,v) return v+10, k:upper() end, t)
+{BONZO=30,FRED=20,ALICE=14}
+
+ +

map2 applies a function to two tables:

+ + +
+> map2(ops.add,{1,2},{10,20})
+{11,22}
+> map2('*',{1,2},{10,20})
+{10,40}
+
+ +

The various map operations generate tables; reduce applies a function of two +arguments over a table and returns the result as a scalar:

+ + +
+> reduce ('+', {1,2,3})
+6
+> reduce ('..', {'one','two','three'})
+'onetwothree'
+
+ +

Finally, zip sews different tables together:

+ + +
+> = zip({1,2,3},{10,20,30})
+{{1,10},{2,20},{3,30}}
+
+ +

Browsing through the documentation, you will find that tablex and List share +methods. For instance, tablex.imap and List.map are basically the same +function; they both operate over the array-part of the table and generate another +table. This can also be expressed as a list comprehension C 'f(x) for x' (t) +which makes the operation more explicit. So why are there different ways to do +the same thing? The main reason is that not all tables are Lists: the expression +ls:map('#') will return a list of the lengths of any elements of ls. A list +is a thin wrapper around a table, provided by the metatable List. Sometimes you +may wish to work with ordinary Lua tables; the List interface is not a +compulsory way to use Penlight table operations.
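The equivalence can be seen side by side; a sketch (assuming pl.tablex and pl.List):

```lua
local tablex = require 'pl.tablex'
local List = require 'pl.List'

local t = {'one','two','three'}

-- plain table in, plain table out
local u1 = tablex.imap(string.upper, t)

-- List method: same operation, but returns a List
local u2 = List(t):map(string.upper)

assert(u2 == List(u1))  -- equal contents, compared element-wise
```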

+ +

+

Operations on two-dimensional tables

+ + +

Two-dimensional tables are of course easy to represent in Lua, for instance +{{1,2},{3,4}} where we store rows as subtables and index like so A[col][row]. +This is the common representation used by matrix libraries like +LuaMatrix. pl.array2d does not provide +matrix operations, since that is the job for a specialized library, but rather +provides generalizations of the higher-level operations provided by pl.tablex +for one-dimensional arrays.

+ +

iter is a useful generalization of ipairs. (The extra parameter determines +whether you want the indices as well.)

+ + +
+> a = {{1,2},{3,4}}
+> for i,j,v in array2d.iter(a,true) do print(i,j,v) end
+1       1       1
+1       2       2
+2       1       3
+2       2       4
+
+ +

Note that you can always convert an arbitrary 2D array into a 'list of lists' +with List(tablex.map(List,a))
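For example (a sketch, assuming pl.tablex and pl.List):

```lua
local tablex = require 'pl.tablex'
local List = require 'pl.List'

local a = {{1,2},{3,4}}
-- apply the List constructor to each row, then wrap the result
local ll = List(tablex.map(List, a))
-- each row is now a List, so content equality applies:
assert(ll[1] == List{1,2})
```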

+ +

map will apply a function over all elements (notice that extra arguments can be +provided, so this operation is in effect function(x) return x-1 end)

+ + +
+> array2d.map('-',a,1)
+{{0,1},{2,3}}
+
+ +

2D arrays are stored as an array of rows, but columns can be extracted:

+ + +
+> array2d.column(a,1)
+{1,3}
+
+ +

There are three equivalents to tablex.reduce. You can either reduce along the +rows (which is the most efficient) or reduce along the columns. Either one will +give you a 1D array. And reduce2 will apply two operations: the first one +reduces the rows, and the second reduces the result.

+ + +
+> array2d.reduce_rows('+',a)
+{3,7}
+> array2d.reduce_cols('+',a)
+{4,6}
+> -- same as tablex.reduce('*',array.reduce_rows('+',a))
+> array2d.reduce2('*','+',a)
+21
+
+ +

tablex.map2 applies an operation to two tables, giving another table. +array2d.map2 does this for 2D arrays. Note that you have to provide the rank +of the arrays involved, since it's hard to always correctly deduce this from the +data:

+ + +
+> b = {{10,20},{30,40}}
+> a = {{1,2},{3,4}}
+> = array2d.map2('+',2,2,a,b)  -- two 2D arrays
+{{11,22},{33,44}}
+> = array2d.map2('+',1,2,{10,100},a)  -- 1D, 2D
+{{11,102},{13,104}}
+> = array2d.map2('*',2,1,a,{1,-1})  -- 2D, 1D
+{{1,-2},{3,-4}}
+
+ +

Of course, you are not limited to simple arithmetic. Say we have a 2D array of +strings, and wish to print it out with proper right justification. The first step +is to create all the string lengths by mapping string.len over the array, the +second is to reduce this along the columns using math.max to get maximum column +widths, and last, apply stringx.rjust with these widths.

+ + +
+maxlens = reduce_cols(math.max,map('#',lines))
+lines = map2(stringx.rjust,2,1,lines,maxlens)
+
+ +

There is product which returns the Cartesian product of two 1D arrays. The +result is a 2D array formed from applying the function to all possible pairs from +the two arrays.

+ + +
+> array2d.product('{}',{1,2},{'a','b'})
+{{{1,'b'},{2,'a'}},{{1,'a'},{2,'b'}}}
+
+ +

There is a set of operations which work in-place on 2D arrays. You can +swap_rows and swap_cols; the first really is a simple one-liner, but the idea +here is to give the operation a name. remove_row and remove_col are +generalizations of table.remove. Likewise, extract_rows and extract_cols +are given arrays of indices and discard anything else. So, for instance, +extract_cols(A,{2,4}) will leave just columns 2 and 4 in the array.

+ +

List.slice is often useful on 1D arrays; slice does the same thing, but is +generally given a start (row,column) and an end (row,column).

+ + +
+> A = {{1,2,3},{4,5,6},{7,8,9}}
+> B = slice(A,1,1,2,2)
+> write(B)
+ 1 2
+ 4 5
+> B = slice(A,2,2)
+> write(B,nil,'%4.1f')
+ 5.0 6.0
+ 8.0 9.0
+
+ +

Here write is used to print out an array nicely; the second parameter is nil, +which is the default (stdout) but can be any file object and the third parameter +is an optional format (as used in string.format).

+ +

parse_range will take a spreadsheet range like 'A1:B2' or 'R1C1:R2C2' and +return the range as four numbers, which can be passed to slice. The rule is +that slice will return an array of the appropriate shape depending on the +range; if a range represents a row or a column, the result is 1D, otherwise 2D.
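A sketch of feeding a parsed range straight into slice (names as described above; assumes pl.array2d):

```lua
local array2d = require 'pl.array2d'

local A = {{1,2,3},{4,5,6},{7,8,9}}
-- 'A1:B2' describes rows 1..2, columns 1..2
local r1,c1,r2,c2 = array2d.parse_range 'A1:B2'
local B = array2d.slice(A, r1, c1, r2, c2)
-- B should be the 2D sub-array {{1,2},{4,5}}
```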

+ +

This applies to iter as well, which can also optionally be given a range:

+ + + +
+> for i,j,v in iter(A,true,2,2) do print(i,j,v) end
+2       2       5
+2       3       6
+3       2       8
+3       3       9
+
+ +

new will construct a new 2D array with the given dimensions. You provide an +initial value for the elements, which is interpreted as a function if it's +callable. With L being utils.string_lambda we then have the following way to +make an identity matrix:

+ + +
+asserteq(
+    array.new(3,3,L'|i,j| i==j and 1 or 0'),
+    {{1,0,0},{0,1,0},{0,0,1}}
+)
+
+ +

Please note that most functions in array2d are covariant, that is, they +return an array of the same type as they receive. In particular, any objects +created with data.new or matrix.new will remain data or matrix objects when +reshaped or sliced, etc. Data objects have the array2d functions available as +methods.

+ + + + +
+
+
+generated by LDoc 1.4.6 +
+
+ + diff --git a/Data/Libraries/Penlight/docs/manual/03-strings.md.html b/Data/Libraries/Penlight/docs/manual/03-strings.md.html new file mode 100644 index 0000000..a629192 --- /dev/null +++ b/Data/Libraries/Penlight/docs/manual/03-strings.md.html @@ -0,0 +1,397 @@ + + + + + Penlight Documentation + + + + +
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Strings. Higher-level operations on strings.

+ +

+

Extra String Methods

+ + +

These are convenient borrowings from Python, as described in 3.6.1 of the Python +reference, but note that indices in Lua always begin at one. stringx defines +functions like isalpha and isdigit, which return true if s is only composed +of letters or digits respectively. startswith and endswith are convenient +ways to find substrings. (endswith works as in Python 2.5, so that `f:endswith +{'.bat','.exe','.cmd'}` will be true for any filename which ends with these +extensions.) There are justify methods and whitespace trimming functions like +strip.

+ + +
+> stringx.import()
+> ('bonzo.dog'):endswith {'.dog','.cat'}
+true
+> ('bonzo.txt'):endswith {'.dog','.cat'}
+false
+> ('bonzo.cat'):endswith {'.dog','.cat'}
+true
+> (' stuff'):ljust(20,'+')
+'++++++++++++++ stuff'
+> ('  stuff '):lstrip()
+'stuff '
+> ('  stuff '):rstrip()
+'  stuff'
+> ('  stuff '):strip()
+'stuff'
+> for s in ('one\ntwo\nthree\n'):lines() do print(s) end
+one
+two
+three
+
+ +

Most of these can be fairly easily implemented using the Lua string library, +which is more general and powerful. But they are convenient operations to have +easily at hand. Note that these methods can be injected into the string table if you use +stringx.import, but a simple alias like local stringx = require 'pl.stringx' +is preferable. This is the recommended practice when writing modules for +consumption by other people, since it is bad manners to change the global state +of the rest of the system. Magic may be used for convenience, but there is always +a price.

+ + +

+

String Templates

+ + +

Another borrowing from Python, string templates allow you to substitute values +looked up in a table:

+ + +
+local Template = require ('pl.text').Template
+t = Template('${here} is the $answer')
+print(t:substitute {here = 'Lua', answer = 'best'})
+==>
+Lua is the best
+
+ +

'$ variables' can optionally have curly braces; this form is useful if you are +gluing text together to make variables, e.g. ${prefix}_name_${postfix}. The +substitute method will throw an error if a $ variable is not found in the +table, and the safe_substitute method will not.

+ +

The Lua implementation has an extra method, indent_substitute which is very +useful for inserting blocks of text, because it adjusts indentation. Consider +this example:

+ + +
+-- testtemplate.lua
+local Template = require ('pl.text').Template
+
+t = Template [[
+    for i = 1,#$t do
+        $body
+    end
+]]
+
+body = Template [[
+local row = $t[i]
+for j = 1,#row do
+    fun(row[j])
+end
+]]
+
+print(t:indent_substitute {body=body,t='tbl'})
+
+ +

And the output is:

+ + +
+for i = 1,#tbl do
+    local row = tbl[i]
+    for j = 1,#row do
+        fun(row[j])
+    end
+end
+
+ +

indent_substitute can substitute templates, in which case they themselves +will be substituted using the given table. So in this case, $t was substituted +twice.

+ +

pl.text also has a number of useful functions like dedent, which strips all +the initial indentation from a multiline string. As in Python, this is useful for +preprocessing multiline strings if you like indenting them with your code. The +function wrap is passed a long string (a paragraph) and returns a list of +lines that fit into a desired line width. As an extension, there is also indent +for indenting multiline strings.

+ +

New in Penlight with the 0.9 series is text.format_operator. Calling this +enables Python-style string formatting using the modulo operator %:

+ + +
+> text.format_operator()
+> = '%s[%d]' % {'dog',1}
+dog[1]
+
+ +

So in its simplest form it saves the typing involved with string.format; it +will also expand $ variables using named fields:

+ + +
+> = '$animal[$num]' % {animal='dog',num=1}
+dog[1]
+
+ +

As with stringx.import you have to do this explicitly, since all strings share the same +metatable. But in your own scripts you can feel free to do this.

+ +

+

Another Style of Template

+ +

A new module is template, which is a version of Rici Lake's Lua +Preprocessor. This +allows you to mix Lua code with your templates in a straightforward way. There +are only two rules:

+ +
    +
  • Lines beginning with # are Lua
  • +
  • Otherwise, anything inside $() is a Lua expression.
  • +
+ +

So a template generating an HTML list would look like this:

+ + +
+<ul>
+# for i,val in ipairs(T) do
+<li>$(i) = $(val:upper())</li>
+# end
+</ul>
+
+ +

Assume the text is inside tmpl, then the template can be expanded using:

+ + +
+local template = require 'pl.template'
+local my_env = {
+  ipairs = ipairs,
+  T = {'one','two','three'}
+}
+res = template.substitute(tmpl, my_env)
+
+ +

and we get

+ + +
+<ul>
+<li>1 = ONE</li>
+<li>2 = TWO</li>
+<li>3 = THREE</li>
+</ul>
+
+ +

There is a single function, template.substitute which is passed a template +string and an environment table. This table may contain some special fields, +like \_parent which can be set to a table representing a 'fallback' environment +in case a symbol was not found. \_brackets is usually '()' and \_escape is +usually '#' but it's sometimes necessary to redefine these if the defaults +interfere with the target language - for instance, $(V) has another meaning in +Make, and # means a preprocessor line in C/C++.

+ +

Finally, passing _debug will cause the intermediate +Lua code to be dumped if there's a problem.

+ +

Here is a C code generation example; something that could easily be extended to +be a minimal Lua extension skeleton generator.

+ + +
+local subst = require 'pl.template'.substitute
+
+local templ = [[
+#include <lua.h>
+#include <lauxlib.h>
+#include <lualib.h>
+
+> for _,f in ipairs(mod) do
+static int l_$(f.name) (lua_State *L) {
+
+}
+> end
+
+static const luaL_reg $(mod.name)[] = {
+> for _,f in ipairs(mod) do
+    {"$(f.name)",l_$(f.name)},
+> end
+    {NULL,NULL}
+};
+
+int luaopen_$(mod.name) {
+   luaL_register (L, "$(mod.name)", $(mod.name));
+    return 1;
+}
+]]
+
+print(subst(templ,{
+    _escape = '>',
+    ipairs = ipairs,
+    mod = {
+        name = 'baggins';
+        {name='frodo'},
+        {name='bilbo'}
+    }
+}))
+
+ +

+

File-style I/O on Strings

+ +

pl.stringio provides just three functions; stringio.open is passed a string, +and returns a file-like object for reading. It supports a read method, which +takes the same arguments as standard file objects:

+ + +
+> f = stringio.open 'first line\n10 20 30\n'
+> = f:read()
+first line
+> = f:read('*n','*n','*n')
+10    20    30
+
+ +

lines and seek are also supported.

+ +

stringio.lines is a useful short-cut for iterating over all the lines in a +string.

+ +

stringio.create creates a writeable file-like object. You then use write on +this stream, and finally extract the built string using value. This 'string +builder' pattern is useful for efficiently creating large strings.
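The pattern looks like this (assuming pl.stringio):

```lua
local stringio = require 'pl.stringio'

local f = stringio.create()  -- writeable string stream
f:write('hello ')
f:write('world')
assert(f:value() == 'hello world')  -- extract the built string
```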

+ + + +
+
+
+generated by LDoc 1.4.6 +
+
+ + diff --git a/Data/Libraries/Penlight/docs/manual/04-paths.md.html b/Data/Libraries/Penlight/docs/manual/04-paths.md.html new file mode 100644 index 0000000..070a3ea --- /dev/null +++ b/Data/Libraries/Penlight/docs/manual/04-paths.md.html @@ -0,0 +1,329 @@ + + + + + Penlight Documentation + + + + +
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Paths and Directories

+ +

+

Working with Paths

+ +

Programs should not depend on quirks of your operating system. They will be +harder to read, and need to be ported for other systems. The worst of course is +hardcoding paths like 'c:\' in programs, and wondering why Vista complains so +much. But even something like dir..'\'..file is a problem, since Unix can't +understand backslashes in this way. dir..'/'..file is usually portable, but +it's best to put this all into a simple function, path.join. If you +consistently use path.join, then it's much easier to write cross-platform code, +since it handles the directory separator for you.

+ +

pl.path provides the same functionality as Python's os.path module (11.1).

+ + +
+> p = 'c:\\bonzo\\DOG.txt'
+> = path.normcase (p)  ---> only makes sense on Windows
+c:\bonzo\dog.txt
+> = path.splitext (p)
+c:\bonzo\DOG    .txt
+> = path.extension (p)
+.txt
+> = path.basename (p)
+DOG.txt
+> = path.exists(p)
+false
+> = path.join ('fred','alice.txt')
+fred\alice.txt
+> = path.exists 'pretty.lua'
+true
+> = path.getsize 'pretty.lua'
+2125
+> = path.isfile 'pretty.lua'
+true
+> = path.isdir 'pretty.lua'
+false
+
+ +

It is very important for all programmers, not just on Unix, to only write to +where they are allowed to write. path.expanduser will expand '~' (tilde) into +the home directory. Depending on your OS, this will be a guaranteed place where +you can create files:

+ + +
+> = path.expanduser '~/mydata.txt'
+'C:\Documents and Settings\SJDonova/mydata.txt'
+
+> = path.expanduser '~/mydata.txt'
+/home/sdonovan/mydata.txt
+
+ +

Under Windows, os.tmpname returns a path which leads to your drive root full of +temporary files. (And increasingly, you do not have access to this root folder.) +This is corrected by path.tmpname, which uses the environment variable TMP:

+ + +
+> os.tmpname()  -- not a good place to put temporary files!
+'\s25g.'
+> path.tmpname()
+'C:\DOCUME~1\SJDonova\LOCALS~1\Temp\s25g.1'
+
+ +

A useful extra function is pl.path.package_path, which will tell you the path +of a particular Lua module. So on my system, package_path('pl.path') returns +'C:\Program Files\Lua\5.1\lualibs\pl\path.lua', and package_path('lfs') returns +'C:\Program Files\Lua\5.1\clibs\lfs.dll'. It is implemented in terms of +package.searchpath, which is a new function in Lua 5.2 which has been +implemented for Lua 5.1 in Penlight.

+ +

+

File Operations

+ +

pl.file is a new module that provides more sensible names for common file +operations. For instance, file.read and file.write are aliases for +utils.readfile and utils.writefile.

+ +

Smaller files can be efficiently read and written in one operation. file.read +is passed a filename and returns the contents as a string, if successful; if not, +then it returns nil and the actual error message. There is an optional boolean +parameter if you want the file to be read in binary mode (this makes no +difference on Unix but remains important with Windows.)

+ +

In previous versions of Penlight, utils.readfile would read standard input if +the file was not specified, but this can lead to nasty bugs; use io.read '*a' +to grab all of standard input.

+ +

Similarly, file.write takes a filename and a string which will be written to +that file.

+ +

For example, this little script converts a file into upper case:

+ + +
+require 'pl'
+assert(#arg == 2, 'supply two filenames')
+text = assert(file.read(arg[1]))
+assert(file.write(arg[2],text:upper()))
+
+ +

Copying files is surprisingly tricky. file.copy and file.move attempt to use +the best implementation possible. On Windows, they link to the API functions +CopyFile and MoveFile, but only if the alien package is installed (this is +true for Lua for Windows.) Otherwise, the system copy command is used. This can +be ugly when writing Windows GUI applications, because of the dreaded flashing +black-box problem with launching processes.

+ +

+

Directory Operations

+ +

pl.dir provides some useful functions for working with directories. fnmatch +will match a filename against a shell pattern, and filter will return any files +in the supplied list which match the given pattern, which correspond to the +functions in the Python fnmatch module. getdirectories will return all +directories contained in a directory, and getfiles will return all files in a +directory which match a shell pattern. These functions return the files as a +table, unlike lfs.dir which returns an iterator.
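For instance (a sketch using the functions just described):

```lua
local dir = require 'pl.dir'

-- all Lua files in the current directory, returned as a table
local files = dir.getfiles('.', '*.lua')

-- filter an explicit list of names against a shell pattern
local tests = dir.filter(files, 'test*')

-- match a single filename against a pattern
assert(dir.fnmatch('readme.txt', '*.txt'))
```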

+ +
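A short sketch of these functions (the directory and file names are illustrative):

```lua
local dir = require 'pl.dir'

-- shell-style matching against a single filename
print(dir.fnmatch('readme.txt', '*.txt'))   -- true

-- all Lua files in the current directory, as a table
local files = dir.getfiles('.', '*.lua')

-- subdirectories of the current directory
local dirs = dir.getdirectories('.')

-- narrow an existing list down with a shell pattern
local tests = dir.filter(files, 'test*')
```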

dir.makepath can create a full path, creating subdirectories as necessary; rmtree is the Nuclear Option of file deleting functions, since it will recursively clear out and delete all directories found beginning at a path (there is a similar function with this name in the Python shutil module.)

+ + +
+> = dir.makepath 't\\temp\\bonzo'
+> = path.isdir 't\\temp\\bonzo'
+true
+> = dir.rmtree 't'
+
+ +

dir.rmtree depends on dir.walk, which is a powerful tool for scanning a whole +directory tree. Here is the implementation of dir.rmtree:

+ + +
+--- remove a whole directory tree.
+-- @param path A directory path
+function dir.rmtree(fullpath)
+    for root,dirs,files in dir.walk(fullpath) do
+        for i,f in ipairs(files) do
+            os.remove(path.join(root,f))
+        end
+        lfs.rmdir(root)
+    end
+end
+
+ +

dir.clonetree clones directory trees. The first argument is a path that must +exist, and the second path is the path to be cloned. (Note that this path cannot +be inside the first path, since this leads to madness.) By default, it will +then just recreate the directory structure. You can in addition provide a +function, which will be applied for all files found.

+ + +
+-- make a copy of my libs folder
+require 'pl'
+p1 = [[d:\dev\lua\libs]]
+p2 = [[D:\dev\lua\libs\..\tests]]
+dir.clonetree(p1,p2,dir.copyfile)
+
+ +

A more sophisticated version, which only copies files which have been modified:

+ + +
+-- p1 and p2 as before, or from arg[1] and arg[2]
+dir.clonetree(p1,p2,function(f1,f2)
+  local res
+  local t1,t2 = path.getmtime(f1),path.getmtime(f2)
+  -- f2 might not exist, so be careful about t2
+  if not t2 or t1 > t2 then
+    res = dir.copyfile(f1,f2)
+  end
+  return res -- indicates successful operation
+end)
+
+ +

dir.clonetree uses path.common_prefix. With p1 and p2 defined above, the common path is 'd:\dev\lua'. So 'd:\dev\lua\libs\testfunc.lua' is copied to 'd:\dev\lua\tests\testfunc.lua', etc.

+ +

If you need to find the common path of a list of files, then tablex.reduce will do the job:

+ + +
+> p3 = [[d:\dev]]
+> = tablex.reduce(path.common_prefix,{p1,p2,p3})
+'d:\dev'
+
+ + + +
+
+
+generated by LDoc 1.4.6 +
+
+ + diff --git a/Data/Libraries/Penlight/docs/manual/05-dates.md.html b/Data/Libraries/Penlight/docs/manual/05-dates.md.html new file mode 100644 index 0000000..c04b036 --- /dev/null +++ b/Data/Libraries/Penlight/docs/manual/05-dates.md.html @@ -0,0 +1,269 @@ + + + + + Penlight Documentation + + + + +
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Date and Time

+ +

+ +

NOTE: the Date module is deprecated

+ +

+

Creating and Displaying Dates

+ +

The Date class provides a simplified way to work with date and +time in Lua; it leans heavily on the functions +os.date and os.time.

+ +

A Date object can be constructed from a table, just like with os.time. +Methods are provided to get and set the various parts of the date.

+ + +
+> d = Date {year = 2011, month = 3, day = 2 }
+> = d
+2011-03-02 12:00:00
+> = d:month(),d:year(),d:day()
+3    2011    2
+> d:month(4)
+> = d
+2011-04-02 12:00:00
+> d:add {day=1}
+> = d
+2011-04-03 12:00:00
+
+ +

add takes a table containing one of the date table fields.

+ + +
+> = d:weekday_name()
+Sun
+> = d:last_day()
+2011-04-30 12:00:00
+> = d:month_name(true)
+April
+
+ +

There is a default conversion to text for date objects, but Date.Format gives +you full control of the format for both parsing and displaying dates:

+ + +
+> iso = Date.Format 'yyyy-mm-dd'
+> d = iso:parse '2010-04-10'
+> amer = Date.Format 'mm/dd/yyyy'
+> = amer:tostring(d)
+04/10/2010
+
+ +

With the 0.9.7 release, the Date constructor has become more flexible. You may omit any of the 'year', 'month' or 'day' fields:

+ + +
+> = Date { year = 2008 }
+2008-01-01 12:00:00
+> = Date { month = 3 }
+2011-03-01 12:00:00
+> = Date { day = 20 }
+2011-10-20 12:00:00
+> = Date { hour = 14, min = 30 }
+2011-10-13 14:30:00
+
+ +

If 'year' is omitted, then the current year is assumed, and likewise for 'month'.

+ +

To set the time on such a partial date, you can use the fact that the 'setter' +methods return the date object and so you can 'chain' these methods.

+ + +
+> d = Date { day = 03 }
+> = d:hour(18):min(30)
+2011-10-03 18:30:00
+
+ +

Finally, Date also now accepts positional arguments:

+ + +
+> = Date(2011,10,3)
+2011-10-03 12:00:00
+> = Date(2011,10,3,18,30,23)
+2011-10-03 18:30:23
+
+ +

Date.Format has been extended. If you construct an instance without a pattern, then it will try to match against a set of known formats. This is useful for human-input dates, since keeping to a strict format is not one of the strong points of users. It assumes that there will be a date, and then a time.

+ + +
+> df = Date.Format()
+> = df:parse '5.30pm'
+2011-10-13 17:30:00
+> = df:parse '1730'
+nil     day out of range: 1730 is not between 1 and 31
+> = df:parse '17.30'
+2011-10-13 17:30:00
+> = df:parse 'mar'
+2011-03-01 12:00:00
+> = df:parse '3 March'
+2011-03-03 12:00:00
+> = df:parse '15 March'
+2011-03-15 12:00:00
+> = df:parse '15 March 2008'
+2008-03-15 12:00:00
+> = df:parse '15 March 2008 1.30pm'
+2008-03-15 13:30:00
+> = df:parse '2008-10-03 15:30:23'
+2008-10-03 15:30:23
+
+ +

ISO date format is of course a good idea if you need to deal with users from +different countries. Here is the default behaviour for 'short' dates:

+ + +
+> = df:parse '24/02/12'
+2012-02-24 12:00:00
+
+ +

That's not what Americans expect! It's tricky to work out in a cross-platform way +exactly what the expected format is, so there is an explicit flag:

+ + +
+> df:US_order(true)
+> = df:parse '9/11/01'
+2001-11-09 12:00:00
+
+ + + +
+
+
+generated by LDoc 1.4.6 +
+
+ + diff --git a/Data/Libraries/Penlight/docs/manual/06-data.md.html b/Data/Libraries/Penlight/docs/manual/06-data.md.html new file mode 100644 index 0000000..585e23e --- /dev/null +++ b/Data/Libraries/Penlight/docs/manual/06-data.md.html @@ -0,0 +1,1633 @@ + + + + + Penlight Documentation + + + + +
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Data

+ +

+

Reading Data Files

+ +

The first thing to consider is this: do you actually need to write a custom file +reader? And if the answer is yes, the next question is: can you write the reader +in as clear a way as possible? Correctness, Robustness, and Speed; pick the first +two and the third can be sorted out later, if necessary.

+ +

A common sort of data file is the configuration file format commonly used on Unix +systems. This format is often called a property file in the Java world.

+ + +
+# Read timeout in seconds
+read.timeout=10
+
+# Write timeout in seconds
+write.timeout=10
+
+ +

Here is a simple Lua implementation:

+ + +
+-- property file parsing with Lua string patterns
+props = {}
+for line in io.lines() do
+    if line:find('#',1,true) ~= 1 and not line:find('^%s*$') then
+        local var,value = line:match('([^=]+)=(.*)')
+        props[var] = value
+    end
+end
+
+ +

Very compact, but it suffers from a similar disease in equivalent Perl programs; +it uses odd string patterns which are 'lexically noisy'. Noisy code like this +slows the casual reader down. (For an even more direct way of doing this, see the +next section, 'Reading Configuration Files')

+ +

Another implementation, using the Penlight libraries:

+ + +
+-- property file parsing with extended string functions
+require 'pl'
+stringx.import()
+props = {}
+for line in io.lines() do
+    if not line:startswith('#') and not line:isspace() then
+        local var,value = line:splitv('=')
+        props[var] = value
+    end
+end
+
+ +

This is more self-documenting; it is generally better to make the code express +the intention, rather than having to scatter comments everywhere - comments are +necessary, of course, but mostly to give the higher view of your intention that +cannot be expressed in code. It is slightly slower, true, but in practice the +speed of this script is determined by I/O, so further optimization is unnecessary.

+ +

+

Reading Unstructured Text Data

+ +

Text data is sometimes unstructured, for example a file containing words. The pl.input module has a number of functions which make processing such files easier. For example, a script to count the number of words in standard input using input.words:

+ + +
+-- countwords.lua
+require 'pl'
+local k = 0
+for w in input.words(io.stdin) do
+    k = k + 1
+end
+print('count',k)
+
+ +

Or this script to calculate the average of a set of numbers using input.numbers:

+ + +
+-- average.lua
+require 'pl'
+local k = 0
+local sum = 0
+for n in input.numbers(io.stdin) do
+    sum = sum + n
+    k = k + 1
+end
+print('average',sum/k)
+
+ +

These scripts can be improved further by eliminating explicit loops. In the last case, there is a perfectly good function seq.sum which can take a sequence of numbers and calculate the sum and count for us:

+ + +
+-- average2.lua
+require 'pl'
+local total,n = seq.sum(input.numbers())
+print('average',total/n)
+
+ +

A further simplification here is that if numbers or words are not passed an +argument, they will grab their input from standard input. The first script can +be rewritten:

+ + +
+-- countwords2.lua
+require 'pl'
+print('count',seq.count(input.words()))
+
+ +

A useful feature of a sequence generator like numbers is that it can read from +a string source. Here is a script to calculate the sums of the numbers on each +line in a file:

+ + +
+-- sums.lua
+for line in io.lines() do
+    print(seq.sum(input.numbers(line)))
+end
+
+ +

+

Reading Columnar Data

+ +

It is very common to find data in columnar form, either space or comma-separated, +perhaps with an initial set of column headers. Here is a typical example:

+ + +
+EventID    Magnitude    LocationX    LocationY    LocationZ
+981124001    2.0    18988.4    10047.1    4149.7
+981125001    0.8    19104.0    9970.4    5088.7
+981127003    0.5    19012.5    9946.9    3831.2
+...
+
+ +

input.fields is designed to extract several columns, given some delimiter (defaulting to whitespace). Here is a script to calculate the average X location of all the events:

+ + +
+-- avg-x.lua
+require 'pl'
+io.read() -- skip the header line
+local sum,count = seq.sum(input.fields {3})
+print(sum/count)
+
+ +

input.fields is passed either a field count, or a list of column indices, +starting at one as usual. So in this case we're only interested in column 3. If +you pass it a field count, then you get every field up to that count:

+ + +
+for id,mag,locX,locY,locZ in input.fields (5) do
+....
+end
+
+ +

input.fields by default tries to convert each field to a number. It will skip +lines which clearly don't match the pattern, but will abort the script if there +are any fields which cannot be converted to numbers.

+ +

The second parameter is a delimiter, by default spaces. ' ' is understood to mean +'any number of spaces', i.e. '%s+'. Any Lua string pattern can be used.

+ +
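For instance, a comma-delimited source (here a string, as a sketch) can be handled by passing ',' as the delimiter:

```lua
local input = require 'pl.input'

-- two comma-separated fields per line, converted to numbers
for x, y in input.fields(2, ',', '10,20\n30,40\n') do
    print(x + y)   -- 30, then 70
end
```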

The third parameter is a data source, by default standard input (defined by +input.create_getter.) It assumes that the data source has a read method which +brings in the next line, i.e. it is a 'file-like' object. As a special case, a +string will be split into its lines:

+ + +
+> for x,y in input.fields(2,' ','10 20\n30 40\n') do print(x,y) end
+10      20
+30      40
+
+ +

Note the default behaviour for bad fields, which is to show the offending line +number:

+ + +
+> for x,y in input.fields(2,' ','10 20\n30 40x\n') do print(x,y) end
+10      20
+line 2: cannot convert '40x' to number
+
+ +

This behaviour of input.fields is appropriate for a script which you want to +fail immediately with an appropriate user error message if conversion fails. +The fourth optional parameter is an options table: {no_fail=true} means that +conversion is attempted but if it fails it just returns the string, rather as AWK +would operate. You are then responsible for checking the type of the returned +field. {no_convert=true} switches off conversion altogether and all fields are +returned as strings.

+ + +

Sometimes it is useful to bring a whole dataset into memory, for operations such +as extracting columns. Penlight provides a flexible reader specifically for +reading this kind of data, using the data module. Given a file looking like this:

+ + +
+x,y
+10,20
+2,5
+40,50
+
+ +

Then data.read will create a table like this, with each row represented by a +sublist:

+ + +
+> t = data.read 'test.txt'
+> pretty.dump(t)
+{{10,20},{2,5},{40,50},fieldnames={'x','y'},delim=','}
+
+ +

You can now analyze this returned table using the supplied methods. For instance, +the method column_by_name returns a table of all the values of that column.

+ + +
+-- testdata.lua
+require 'pl'
+d = data.read('fev.txt')
+for _,name in ipairs(d.fieldnames) do
+    local col = d:column_by_name(name)
+    if type(col[1]) == 'number' then
+        local total,n = seq.sum(col)
+        utils.printf("Average for %s is %f\n",name,total/n)
+    end
+end
+
+ +

data.read tries to be clever when given data; by default it expects a first line of column names, unless any of them are numbers. It tries to deduce the column delimiter by looking at the first line. Sometimes it guesses wrong, so these things can be specified explicitly. The second optional parameter is an options table: you can override delim (a string pattern), fieldnames (a list or comma-separated string), specify no_convert (default is to convert), numfields (indices of columns known to be numbers, as a list) and thousands_dot (when the thousands separator in Excel CSV is '.')

+ +

A very powerful feature is a way to execute SQL-like queries on such data:

+ + +
+-- queries on tabular data
+require 'pl'
+local d = data.read('xyz.txt')
+local q = d:select('x,y,z where x > 3 and z < 2 sort by y')
+for x,y,z in q do
+    print(x,y,z)
+end
+
+ +

Please note that the format of queries is restricted to the following syntax:

+ + +
+FIELDLIST [ 'where' CONDITION ] [ 'sort by' FIELD [asc|desc]]
+
+ +

Any valid Lua code can appear in CONDITION; remember it is not SQL and you +have to use == (this warning comes from experience.)

+ +

For this to work, field names must be Lua identifiers. So read will massage +fieldnames so that all non-alphanumeric chars are replaced with underscores. +However, the original_fieldnames field always contains the original un-massaged +fieldnames.

+ +
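A small sketch of this massaging, reading from a string source via pl.stringio:

```lua
local data = require 'pl.data'
local stringio = require 'pl.stringio'

local d = data.read(stringio.open 'First Name,Last-Name\nalice,smith\n')
-- massaged names are valid Lua identifiers:
print(d.fieldnames[1])            -- e.g. 'First_Name'
-- the un-massaged originals survive:
print(d.original_fieldnames[1])   -- 'First Name'
```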

read can handle standard CSV files fine, although it doesn't try to be a full-blown CSV parser. With the csv=true option, it's possible to have double-quoted fields, which may contain commas; then trailing commas become significant as well.

+ +

Spreadsheet programs are not always the best tool to +process such data, strange as this might seem to some people. This is a toy CSV +file; to appreciate the problem, imagine thousands of rows and dozens of columns +like this:

+ + +
+Department Name,Employee ID,Project,Hours Booked
+sales,1231,overhead,4
+sales,1255,overhead,3
+engineering,1501,development,5
+engineering,1501,maintenance,3
+engineering,1433,maintenance,10
+
+ +

The task is to reduce the dataset to a relevant set of rows and columns, perhaps +do some processing on row data, and write the result out to a new CSV file. The +write_row method uses the delimiter to write the row to a file; +Data.select_row is like Data.select, except it iterates over rows, not +fields; this is necessary if we are dealing with a lot of columns!

+ + +
+names = {[1501]='don',[1433]='dilbert'}
+keepcols = {'Employee_ID','Hours_Booked'}
+t:write_row (outf,{'Employee','Hours_Booked'})
+q = t:select_row {
+    fields=keepcols,
+    where=function(row) return row[1]=='engineering' end
+}
+for row in q do
+    row[1] = names[row[1]]
+    t:write_row(outf,row)
+end
+
+ +

Data.select_row and Data.select can be passed a table specifying the query; a +list of field names, a function defining the condition and an optional parameter +sort_by. It isn't really necessary here, but if we had a more complicated row +condition (such as belonging to a specified set) then it is not generally +possible to express such a condition as a query string, without resorting to +hackery such as global variables.

+ +

With 1.0.3, you can specify explicit conversion functions for selected columns. +For instance, this is a log file with a Unix date stamp:

+ + +
+Time Message
+1266840760 +# EE7C0600006F0D00C00F06010302054000000308010A00002B00407B00
+1266840760 closure data 0.000000 1972 1972 0
+1266840760 ++ 1266840760 EE 1
+1266840760 +# EE7C0600006F0D00C00F06010302054000000408020A00002B00407B00
+1266840764 closure data 0.000000 1972 1972 0
+
+ +

We would like the first column as an actual date object, so the convert +field sets an explicit conversion for column 1. (Note that we have to explicitly +convert the string to a number first.)

+ + +
+Date = require 'pl.Date'
+
+function date_convert (ds)
+    return Date(tonumber(ds))
+end
+
+d = data.read(f,{convert={[1]=date_convert},last_field_collect=true})
+
+ +

This gives us a two-column dataset, where the first column contains Date objects +and the second column contains the rest of the line. Queries can then easily +pick out events on a day of the week:

+ + +
+q = d:select "Time,Message where Time:weekday_name()=='Sun'"
+
+ +

Data does not have to come from files, nor does it necessarily come from the lab +or the accounts department. On Linux, ps aux gives you a full listing of all +processes running on your machine. It is straightforward to feed the output of +this command into data.read and perform useful queries on it. Notice that +non-identifier characters like '%' get converted into underscores:

+ + +
+require 'pl'
+f = io.popen 'ps aux'
+s = data.read (f,{last_field_collect=true})
+f:close()
+print(s.fieldnames)
+print(s:column_by_name 'USER')
+qs = 'COMMAND,_MEM where _MEM > 5 and USER=="steve"'
+for name,mem in s:select(qs) do
+    print(mem,name)
+end
+
+ +

I've always been an admirer of the AWK programming language; with filter you +can get Lua programs which are just as compact:

+ + +
+-- printxy.lua
+require 'pl'
+data.filter 'x,y where x > 3'
+
+ +

It is common enough to have data files without headers of field names. +data.read makes a special exception for such files if all fields are numeric. +Since there are no column names to use in query expressions, you can use AWK-like +column indexes, e.g. '$1,$2 where $1 > 3'. I have a little executable script on +my system called lf which looks like this:

+ + +
+#!/usr/bin/env lua
+require 'pl.data'.filter(arg[1])
+
+ +

And it can be used generally as a filter command to extract columns from data. +(The column specifications may be expressions or even constants.)

+ + +
+$ lf '$1,$5/10' < test.dat
+
+ +

(As with AWK, please note the single-quotes used in this command; this prevents +the shell trying to expand the column indexes. If you are on Windows, then you +must quote the expression in double-quotes so +it is passed as one argument to your batch file.)

+ +

As a tutorial resource, have a look at test-data.lua in the PL tests directory +for other examples of use, plus comments.

+ +

The data returned by read or constructed by Data.copy_select from a query is basically just an array of rows: {{1,2},{3,4}}. So you may use read to pull in any array-like dataset, and process it with any function that expects such a structure. In particular, the functions in array2d will work fine with this data. In fact, these functions are available as methods; e.g. array2d.flatten can be called directly like so to give us a one-dimensional list:

+ + +
+v = data.read('dat.txt'):flatten()
+
+ +

The data is also in exactly the right shape to be treated as matrices by +LuaMatrix:

+ + +
+> matrix = require 'matrix'
+> m = matrix(data.read 'mat.txt')
+> = m
+1       0.2     0.3
+0.2     1       0.1
+0.1     0.2     1
+> = m^2  -- same as m*m
+1.07    0.46    0.62
+0.41    1.06    0.26
+0.24    0.42    1.05
+
+ +

write will write matrices back to files for you.

+ +

Finally, for the curious, the global variable _DEBUG can be used to print out +the actual iterator function which a query generates and dynamically compiles. By +using code generation, we can get pretty much optimal performance out of +arbitrary queries.

+ + +
+> lua -lpl -e "_DEBUG=true" -e "data.filter 'x,y where x > 4 sort by x'" < test.txt
+return function (t)
+        local i = 0
+        local v
+        local ls = {}
+        for i,v in ipairs(t) do
+            if v[1] > 4  then
+                    ls[#ls+1] = v
+            end
+        end
+        table.sort(ls,function(v1,v2)
+            return v1[1] < v2[1]
+        end)
+        local n = #ls
+        return function()
+            i = i + 1
+            v = ls[i]
+            if i > n then return end
+            return v[1],v[2]
+        end
+end
+
+10,20
+40,50
+
+ +

+

Reading Configuration Files

+ +

The config module provides a simple way to convert several kinds of +configuration files into a Lua table. Consider the simple example:

+ + +
+# test.config
+# Read timeout in seconds
+read.timeout=10
+
+# Write timeout in seconds
+write.timeout=5
+
+#acceptable ports
+ports = 1002,1003,1004
+
+ +

This can be easily brought in using config.read and the result shown using +pretty.write:

+ + +
+-- readconfig.lua
+local config = require 'pl.config'
+local pretty= require 'pl.pretty'
+
+local t = config.read(arg[1])
+print(pretty.write(t))
+
+ +

and the output of lua readconfig.lua test.config is:

+ + +
+{
+  ports = {
+    1002,
+    1003,
+    1004
+  },
+  write_timeout = 5,
+  read_timeout = 10
+}
+
+ +

That is, config.read will bring in all key/value pairs, ignore # comments, and +ensure that the key names are proper Lua identifiers by replacing non-identifier +characters with '_'. If the values are numbers, then they will be converted. (So +the value of t.write_timeout is the number 5). In addition, any values which +are separated by commas will be converted likewise into an array.

+ +

Any line can be continued with a backslash. So this will all be considered one +line:

+ + +
+names=one,two,three, \
+four,five,six,seven, \
+eight,nine,ten
+
+ +

Windows-style INI files are also supported. The section structure of INI files +translates naturally to nested tables in Lua:

+ + +
+; test.ini
+[timeouts]
+read=10 ; Read timeout in seconds
+write=5 ; Write timeout in seconds
+[portinfo]
+ports = 1002,1003,1004
+
+ +

The output is:

+ + +
+{
+  portinfo = {
+    ports = {
+      1002,
+      1003,
+      1004
+    }
+  },
+  timeouts = {
+    write = 5,
+    read = 10
+  }
+}
+
+ +

You can now refer to the write timeout as t.timeouts.write.

+ +

As a final example of the flexibility of config.read, if passed this simple +comma-delimited file

+ + +
+one,two,three
+10,20,30
+40,50,60
+1,2,3
+
+ +

it will produce the following table:

+ + +
+{
+  { "one", "two", "three" },
+  { 10, 20, 30 },
+  { 40, 50, 60  },
+  { 1, 2, 3 }
+}
+
+ +

config.read isn't designed to read all CSV files in general, but intended to +support some Unix configuration files not structured as key-value pairs, such as +'/etc/passwd'.

+ +

This function is intended to be a Swiss Army Knife of configuration readers, but it does have to make assumptions, and you may not like them. So there is an optional extra parameter allowing some control: a table that may have the following fields:

+ + +
+{
+   variablilize = true,
+   convert_numbers = tonumber,
+   trim_space = true,
+   list_delim = ',',
+   trim_quotes = true,
+   ignore_assign = false,
+   keysep = '=',
+   smart = false,
+}
+
+ +

variablilize is the option that converted write.timeout in the first example +to the valid Lua identifier write_timeout. If convert_numbers is true, then +an attempt is made to convert any string that starts like a number. You can +specify your own function (say one that will convert a string like '5224 kb' into +a number.)

+ +

trim_space ensures that there is no starting or trailing whitespace with +values, and list_delim is the character that will be used to decide whether to +split a value up into a list (it may be a Lua string pattern such as '%s+'.)

+ +

For instance, the password file in Unix is colon-delimited:

+ + +
+t = config.read('/etc/passwd',{list_delim=':'})
+
+ +

This produces the following output on my system (only last two lines shown):

+ + +
+{
+  ...
+  {
+    "user",
+    "x",
+    "1000",
+    "1000",
+    "user,,,",
+    "/home/user",
+    "/bin/bash"
+  },
+  {
+    "sdonovan",
+    "x",
+    "1001",
+    "1001",
+    "steve donovan,28,,",
+    "/home/sdonovan",
+    "/bin/bash"
+  }
+}
+
+ +

You can get this into a more sensible format, where the usernames are the keys, +with this (the tablex.pairmap function must return value, key!)

+ + +
+t = tablex.pairmap(function(k,v) return v,v[1] end,t)
+
+ +

and you get:

+ + +
+{ ...
+  sdonovan = {
+    "sdonovan",
+    "x",
+    "1001",
+    "1001",
+    "steve donovan,28,,",
+    "/home/sdonovan",
+    "/bin/bash"
+  }
+...
+}
+
+ +

Many common Unix configuration files can be read by tweaking these parameters. +For /etc/fstab, the options {list_delim='%s+',ignore_assign=true} will +correctly separate the columns. It's common to find 'KEY VALUE' assignments in +files such as /etc/ssh/ssh_config; the options {keysep=' '} make +config.read return a table where each KEY has a value VALUE.

+ +
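These two cases, directly as sketched above (the paths assume a typical Linux system):

```lua
local config = require 'pl.config'

-- columnar file: split on runs of whitespace, don't treat '=' specially
local fstab = config.read('/etc/fstab', {list_delim='%s+', ignore_assign=true})

-- 'KEY VALUE' style: a space separates key from value
local ssh = config.read('/etc/ssh/ssh_config', {keysep=' '})
```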

Files in the Linux procfs usually use ':' as the field delimiter:

+ + +
+> t = config.read('/proc/meminfo',{keysep=':'})
+> = t.MemFree
+220140 kB
+
+ +

That result is a string, since tonumber balks at the 'kB' suffix, but defining the convert_numbers option as function(s) return tonumber((s:gsub(' kB$',''))) end will get the memory figures as actual numbers in the result. (The extra parentheses are necessary so that tonumber only gets the first result from gsub.) From tests/test-config.lua:

+ + +
+testconfig([[
+MemTotal:        1024748 kB
+MemFree:          220292 kB
+]],
+{ MemTotal = 1024748, MemFree = 220292 },
+{
+ keysep = ':',
+ convert_numbers = function(s)
+    s = s:gsub(' kB$','')
+    return tonumber(s)
+  end
+ }
+)
+
+ +

The smart option lets config.read make a reasonable guess for you; there are examples in tests/test-config.lua, but basically these common file formats (and those following the same pattern) can be processed directly in smart mode: '/etc/fstab', '/proc/XXXX/status', 'ssh_config' and 'updatedb.conf'.

+ +

Please note that config.read can be passed a file-like object; if it's not a +string and supports the read method, then that will be used. For instance, to +read a configuration from a string, use stringio.open.

+ + +

+ +
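A minimal sketch of reading a configuration held in a string this way:

```lua
local config = require 'pl.config'
local stringio = require 'pl.stringio'

-- stringio.open wraps the string in a file-like object with a read method
local t = config.read(stringio.open 'read.timeout=10\nwrite.timeout=5\n')
print(t.read_timeout, t.write_timeout)   -- 10    5
```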

+

Lexical Scanning

+ +

Although Lua's string pattern matching is very powerful, there are times when +something more powerful is needed. pl.lexer.scan provides lexical scanners +which tokenize a string, classifying tokens into numbers, strings, etc.

+ + +
+> lua -lpl
+Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
+> tok = lexer.scan 'alpha = sin(1.5)'
+> = tok()
+iden    alpha
+> = tok()
+=       =
+> = tok()
+iden    sin
+> = tok()
+(       (
+> = tok()
+number  1.5
+> = tok()
+)       )
+> = tok()
+(nil)
+
+ +

The scanner is a function, which is repeatedly called and returns the type and value of the token. Recognized basic types are 'iden', 'string', 'number' and 'space', and everything else is represented by itself. Note that by default the scanner will skip any 'space' tokens.

+ +

'comment' and 'keyword' aren't applicable to the plain scanner, which is not +language-specific, but a scanner which understands Lua is available. It +recognizes the Lua keywords, and understands both short and long comments and +strings.

+ + +
+> for t,v in lexer.lua 'for i=1,n do' do print(t,v) end
+keyword for
+iden    i
+=       =
+number  1
+,       ,
+iden    n
+keyword do
+
+ +

A lexical scanner is useful where you have highly-structured data which is not nicely delimited by newlines. For example, here is a snippet of an in-house file format which it was my task to maintain:

+ + +
+points
+    (818344.1,-20389.7,-0.1),(818337.9,-20389.3,-0.1),(818332.5,-20387.8,-0.1)
+    ,(818327.4,-20388,-0.1),(818322,-20387.7,-0.1),(818316.3,-20388.6,-0.1)
+    ,(818309.7,-20389.4,-0.1),(818303.5,-20390.6,-0.1),(818295.8,-20388.3,-0.1)
+    ,(818290.5,-20386.9,-0.1),(818285.2,-20386.1,-0.1),(818279.3,-20383.6,-0.1)
+    ,(818274,-20381.2,-0.1),(818274,-20380.7,-0.1);
+
+ +

Here is code to extract the points using pl.lexer:

+ + +
+-- assume 's' contains the text above...
+local lexer = require 'pl.lexer'
+local expecting = lexer.expecting
+local append = table.insert
+
+local tok = lexer.scan(s)
+
+local points = {}
+local t,v = tok() -- should be 'iden','points'
+
+while t ~= ';' do
+    c = {}
+    expecting(tok,'(')
+    c.x = expecting(tok,'number')
+    expecting(tok,',')
+    c.y = expecting(tok,'number')
+    expecting(tok,',')
+    c.z = expecting(tok,'number')
+    expecting(tok,')')
+    t,v = tok()  -- either ',' or ';'
+    append(points,c)
+end
+
+ +

The expecting function grabs the next token and if the type doesn't match, it +throws an error. (pl.lexer, unlike other PL libraries, raises errors if +something goes wrong, so you should wrap your code in pcall to catch the error +gracefully.)

+ +

The scanners all have a second optional argument, which is a table controlling whether you want to exclude spaces and/or comments. The default for lexer.lua is {space=true,comments=true}. There is a third optional argument which determines how string and number tokens are to be processed.

+ +
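For instance, passing an empty filter table means nothing is excluded, so the space and comment tokens come through as well (a sketch; with the defaults they would be dropped):

```lua
local lexer = require 'pl.lexer'

-- empty filter: 'space' and 'comment' tokens are emitted
-- alongside iden, '=' and number
for t, v in lexer.lua('x = 1 -- set x', {}) do
    print(t, v)
end
```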

The ultimate highly-structured data is of course, program source. Here is a +snippet from 'text-lexer.lua':

+ + +
+require 'pl'
+
+lines = [[
+for k,v in pairs(t) do
+    if type(k) == 'number' then
+        print(v) -- array-like case
+    else
+        print(k,v)
+    end
+end
+]]
+
+ls = List()
+for tp,val in lexer.lua(lines,{space=true,comments=true}) do
+    assert(tp ~= 'space' and tp ~= 'comment')
+    if tp == 'keyword' then ls:append(val) end
+end
+test.asserteq(ls,List{'for','in','do','if','then','else','end','end'})
+
+ +

Here is a useful little utility that identifies all common global variables found in a Lua module (ignoring those declared locally, for the moment):

+ + +
+-- testglobal.lua
+require 'pl'
+
+local txt,err = utils.readfile(arg[1])
+if not txt then return print(err) end
+
+local globals = List()
+for t,v in lexer.lua(txt) do
+    if t == 'iden' and _G[v] then
+        globals:append(v)
+    end
+end
+pretty.dump(seq.count_map(globals))
+
+ +

Rather than dumping the whole list, with its duplicates, we pass it through seq.count_map which turns the list into a table where the keys are the values, and the associated values are the number of times those values occur in the sequence. Typical output looks like this:

+ + +
+{
+  type = 2,
+  pairs = 2,
+  table = 2,
+  print = 3,
+  tostring = 2,
+  require = 1,
+  ipairs = 4
+}
+
+ +

You could further pass this through tablex.keys to get a unique list of +symbols. This can be useful when writing 'strict' Lua modules, where all global +symbols must be defined as locals at the top of the file.

+ +
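Continuing the example, a sketch of extracting that unique list from the counts (a small hand-made list stands in for the globals gathered above):

```lua
local tablex = require 'pl.tablex'
local seq = require 'pl.seq'

local counts = seq.count_map {'print', 'print', 'pairs', 'type'}
local unique = tablex.keys(counts)   -- keys come back in arbitrary order
table.sort(unique)
print(table.concat(unique, ','))     -- pairs,print,type
```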

For a more detailed use of lexer.scan, please look at testxml.lua in the +examples directory.

+ +

+

XML

+ +

New in the 0.9.7 release is some support for XML. This is a large topic, and +Penlight does not provide a full XML stack, which is properly the task of a more +specialized library.

+ +

Parsing and Pretty-Printing

+ +

The semi-standard XML parser in the Lua universe is lua-expat. In particular, it has a function called lxp.lom.parse which will parse XML into the Lua Object Model (LOM) format. However, it does not provide a way to convert this data back into XML text. xml.parse will use this function if lua-expat is available, and otherwise falls back to a pure Lua parser originally written by Roberto Ierusalimschy.

+ +

The resulting document object knows how to render itself as a string, which is useful for debugging:

+ + +
+> d = xml.parse "<nodes><node id='1'>alice</node></nodes>"
+> = d
+<nodes><node id='1'>alice</node></nodes>
+> pretty.dump (d)
+{
+  {
+    "alice",
+    attr = {
+      "id",
+      id = "1"
+    },
+    tag = "node"
+  },
+  attr = {
+  },
+  tag = "nodes"
+}
+
+ +

Looking at the actual shape of the data reveals the structure of LOM:

+ +
    +
  • every element has a tag field with its name
  • +
  • plus an attr field, which is a table containing the attributes as fields, and also as an array. It is always present.
  • +
  • the children of the element are the array part of the element, so d[1] is the first child of d, etc.
  • +
+ +

It could be argued that having attributes also as the array part of attr is not essential (you cannot depend on attribute order in XML) but that's how it goes with this standard.

+ +

lua-expat is another soft dependency of Penlight; generally, the fallback parser is good enough for straightforward XML as is commonly found in configuration files, etc. doc.basic_parse is not intended to be a proper conforming parser (it's only sixty lines) but it handles simple kinds of documents that do not have comments or DTD directives. It is intelligent enough to ignore the <?xml directive and that is about it.

+ +

You can get pretty-printing by explicitly calling xml.tostring and passing it the initial indent and the per-element indent:

+ + +
+> = xml.tostring(d,'','  ')
+
+<nodes>
+  <node id='1'>alice</node>
+</nodes>
+
+ +

There is a fourth argument which is the attribute indent:

+ + +
+> a = xml.parse "<frodo name='baggins' age='50' type='hobbit'/>"
+> = xml.tostring(a,'','  ','  ')
+
+<frodo
+  type='hobbit'
+  name='baggins'
+  age='50'
+/>
+
+ +

Parsing and Working with Configuration Files

+ +

It's common to find configurations expressed with XML these days. It's straightforward to 'walk' the LOM data and extract the data in the form you want:

+ + +
+require 'pl'
+
+local config = [[
+<config>
+    <alpha>1.3</alpha>
+    <beta>10</beta>
+    <name>bozo</name>
+</config>
+]]
+local d,err = xml.parse(config)
+
+local t = {}
+for item in d:childtags() do
+    t[item.tag] = item[1]
+end
+
+pretty.dump(t)
+--->
+{
+  beta = "10",
+  alpha = "1.3",
+  name = "bozo"
+}
+
+ +

The only gotcha is that here we must use the Doc:childtags method, which will skip over any text elements.

+ +

A more involved example is this excerpt from serviceproviders.xml, which is usually found at /usr/share/mobile-broadband-provider-info/serviceproviders.xml on Debian/Ubuntu Linux systems.

+ + +
+d = xml.parse [[
+<serviceproviders format="2.0">
+...
+<country code="za">
+    <provider>
+        <name>Cell-c</name>
+        <gsm>
+            <network-id mcc="655" mnc="07"/>
+            <apn value="internet">
+                <username>Cellcis</username>
+                <dns>196.7.0.138</dns>
+                <dns>196.7.142.132</dns>
+            </apn>
+        </gsm>
+    </provider>
+    <provider>
+        <name>MTN</name>
+        <gsm>
+            <network-id mcc="655" mnc="10"/>
+            <apn value="internet">
+                <dns>196.11.240.241</dns>
+                <dns>209.212.97.1</dns>
+            </apn>
+        </gsm>
+    </provider>
+    <provider>
+        <name>Vodacom</name>
+        <gsm>
+            <network-id mcc="655" mnc="01"/>
+            <apn value="internet">
+                <dns>196.207.40.165</dns>
+                <dns>196.43.46.190</dns>
+            </apn>
+            <apn value="unrestricted">
+                <name>Unrestricted</name>
+                <dns>196.207.32.69</dns>
+                <dns>196.43.45.190</dns>
+            </apn>
+        </gsm>
+    </provider>
+    <provider>
+        <name>Virgin Mobile</name>
+        <gsm>
+            <apn value="vdata">
+                <dns>196.7.0.138</dns>
+                <dns>196.7.142.132</dns>
+            </apn>
+        </gsm>
+    </provider>
+</country>
+....
+</serviceproviders>
+]]
+
+ +

Getting the names of the providers per-country is straightforward:

+ + +
+local t = {}
+for country in d:childtags() do
+    local providers = {}
+    t[country.attr.code] = providers
+    for provider in country:childtags() do
+        table.insert(providers,provider:child_with_name('name'):get_text())
+    end
+end
+
+pretty.dump(t)
+-->
+{
+  za = {
+    "Cell-c",
+    "MTN",
+    "Vodacom",
+    "Virgin Mobile"
+  }
+  ....
+}
+
+ +

Generating XML with 'xmlification'

+ +

This feature is inspired by the htmlify function used by Orbit to simplify HTML generation, except that no function environment magic is used; the tags function returns a set of constructors for elements of the given tag names.

+ + +
+> nodes, node = xml.tags 'nodes, node'
+> = node 'alice'
+<node>alice</node>
+> = nodes { node {id='1','alice'}}
+<nodes><node id='1'>alice</node></nodes>
+
+ +

The flexibility of Lua tables is very useful here, since both the attributes and the children of an element can be encoded naturally. The argument to these tag constructors is either a single value (like a string) or a table where the attributes are the named keys and the children are the array values.

+ +
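For instance, a sketch combining attributes and several children in one constructor call (the compact single-line rendering follows the earlier examples):

```lua
local xml = require 'pl.xml'
local nodes, node = xml.tags 'nodes, node'
-- attributes as named keys, children as array values
local d = nodes { node {id='1','alice'}, node {id='2','bob'} }
print(d)
-- should render roughly as:
-- <nodes><node id='1'>alice</node><node id='2'>bob</node></nodes>
```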

Generating XML using Templates

+ +

A template is a little XML document which contains dollar-variables. The subst method on a document is fed an array of tables containing values for these variables. Note how the parent tag name is specified:

+ + +
+> templ = xml.parse "<node id='$id'>$name</node>"
+> = templ:subst {tag='nodes', {id=1,name='alice'},{id=2,name='john'}}
+<nodes><node id='1'>alice</node><node id='2'>john</node></nodes>
+
+ +

Substitution is closely related to filtering documents. One of the annoying things about XML is that it is a document markup language first, and a data language second. Standard parsers will assume you really care about all those extra text elements. Consider this fragment, which has been changed by a five-year-old:

+ + +
+T = [[
+  <weather>
+    boops!
+    <current_conditions>
+      <condition data='$condition'/>
+      <temp_c data='$temp'/>
+      <bo>whoops!</bo>
+    </current_conditions>
+  </weather>
+]]
+
+ +

Conformant parsers will give you text elements containing the line feed after <current_conditions>, even though this makes handling the data more irritating.

+ + +
+local function parse (str)
+    return xml.parse(str,false,true)
+end
+
+ +

The second argument means 'string, not file', and the third argument means 'use the built-in Lua parser' (instead of LuaExpat, if available), which by default is not interested in keeping such strings.

+ +

How do we remove the string 'boops!'? clone (also called filter when called as a method) copies a LOM document. It can be passed a filter function, which is applied to each string found. The powerful thing about this is that this function receives structural information - the parent node, and whether this was a tag name, a text element or an attribute name:

+ + +
+d = parse (T)
+c = d:filter(function(s,kind,parent)
+    print(stringx.strip(s),kind,parent and parent.tag or '?')
+    if kind == '*TEXT' and #parent > 1 then return nil end
+    return s
+end)
+--->
+weather    *TAG    ?
+boops!    *TEXT    weather
+current_conditions    *TAG    weather
+condition    *TAG    current_conditions
+$condition    data    condition
+temp_c    *TAG    current_conditions
+$temp    data    temp_c
+bo    *TAG    current_conditions
+whoops!    *TEXT    bo
+
+ +

We can pull out 'boops' and not 'whoops' by discarding text elements which are not the single child of an element.

+ + + +
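To check the result, we can render the filtered copy; this is just a sanity-check sketch:

```lua
-- 'boops!' should be gone from the output, while 'whoops!'
-- survives because it is the single child of <bo>
print(xml.tostring(c, '', '  '))
```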

Extracting Data using Templates

+ +

Matching goes in the opposite direction. We have a document, and would like to extract values from it using a pattern.

+ +

A common use of this is parsing the XML result of API queries. The (undocumented and subsequently discontinued) Google Weather API is a good example. Grabbing the result of http://www.google.com/ig/api?weather=Johannesburg,ZA we get something like this, after pretty-printing:

+ + +
+<xml_api_reply version='1'>
+  <weather module_id='0' tab_id='0' mobile_zipped='1' section='0' row='0'
+           mobile_row='0'>
+<forecast_information>
+  <city data='Johannesburg, Gauteng'/>
+  <postal_code data='Johannesburg,ZA'/>
+  <latitude_e6 data=''/>
+  <longitude_e6 data=''/>
+  <forecast_date data='2010-10-02'/>
+  <current_date_time data='2010-10-02 18:30:00 +0000'/>
+  <unit_system data='US'/>
+</forecast_information>
+<current_conditions>
+  <condition data='Clear'/>
+  <temp_f data='75'/>
+  <temp_c data='24'/>
+  <humidity data='Humidity: 19%'/>
+  <icon data='/ig/images/weather/sunny.gif'/>
+  <wind_condition data='Wind: NW at 7 mph'/>
+</current_conditions>
+<forecast_conditions>
+  <day_of_week data='Sat'/>
+  <low data='60'/>
+  <high data='89'/>
+  <icon data='/ig/images/weather/sunny.gif'/>
+  <condition data='Clear'/>
+</forecast_conditions>
+....
+</weather>
+</xml_api_reply>
+
+ +

Assume that the above XML has been read into google. The idea is to write a pattern looking like a template, and use it to extract some values of interest:

+ + +
+t = [[
+  <weather>
+    <current_conditions>
+      <condition data='$condition'/>
+      <temp_c data='$temp'/>
+    </current_conditions>
+  </weather>
+]]
+
+local res, ret = google:match(t)
+pretty.dump(res)
+
+ +

And the output is:

+ + +
+{
+  condition = "Clear",
+  temp = "24"
+}
+
+ +

The match method can be passed a LOM document or some text, which will be parsed first.

+ +

But what if we need to extract values from repeated elements? Match templates may contain 'array matches' which are enclosed in '{{..}}':

+ + +
+<weather>
+  {{<forecast_conditions>
+    <day_of_week data='$day'/>
+    <low data='$low'/>
+    <high data='$high'/>
+    <condition data='$condition'/>
+  </forecast_conditions>}}
+</weather>
+
+ +

And the match result is:

+ + +
+{
+  {
+    low = "60",
+    high = "89",
+    day = "Sat",
+    condition = "Clear",
+  },
+  {
+    low = "53",
+    high = "86",
+    day = "Sun",
+    condition = "Clear",
+  },
+  {
+    low = "57",
+    high = "87",
+    day = "Mon",
+    condition = "Clear",
+  },
+  {
+    low = "60",
+    high = "84",
+    day = "Tue",
+    condition = "Clear",
+  }
+}
+
+ +

With this array of tables, you can use tablex or List to reshape it into the desired form, if you choose. Just as with reading a Unix password file with config, you can make the array into a map of days to conditions using:

+ + +
+tablex.pairmap('|k,v| v,v.day',conditions)
+
+ +

(Here we are using the alternative string lambda option.)

+ +

However, xml matches can shape the structure of the output. By replacing the day_of_week line of the template with <day_of_week data='$_'/> we get the same effect; $_ is a special symbol that means that this captured value (or simply capture) becomes the key.

+ +

Note that $NUMBER means a numerical index, so that $1 is the first element of the resulting array, and so forth. You can mix numbered and named captures, but it's strongly advised to make the numbered captures form a proper array sequence (everything from 1 to n inclusive). $0 has a special meaning; if it is the only capture ({[0]='foo'}) then the table is collapsed into 'foo'.

+ + +
+<weather>
+  {{<forecast_conditions>
+    <day_of_week data='$_'/>
+    <low data='$1'/>
+    <high data='$2'/>
+    <condition data='$3'/>
+  </forecast_conditions>}}
+</weather>
+
+ +

Now the result is:

+ + +
+{
+  Tue = {
+    "60",
+    "84",
+    "Clear"
+  },
+  Sun = {
+    "53",
+    "86",
+    "Clear"
+  },
+  Sat = {
+    "60",
+    "89",
+    "Clear"
+  },
+  Mon = {
+    "57",
+    "87",
+    "Clear"
+  }
+}
+
+ +

Applying matches to this config file poses another problem, because the actual tags matched are themselves meaningful.

+ + +
+<config>
+    <alpha>1.3</alpha>
+    <beta>10</beta>
+    <name>bozo</name>
+</config>
+
+ +

So there are tag 'wildcards' which are element names ending with a hyphen.

+ + +
+<config>
+    {{<key->$value</key->}}
+</config>
+
+ +

You will then get {{alpha='1.3'},...}. The most convenient format would be returned by this (note that _- behaves just like $_):

+ + +
+<config>
+    {{<_->$0</_->}}
+</config>
+
+ +

which would return {alpha='1.3',beta='10',name='bozo'}.

+ +

We could play this game endlessly, and encode ways of converting captures, but the scheme is complex enough, and it's easy to do the conversion later:

+ + +
+local numbers = {alpha=true,beta=true}
+for k,v in pairs(res) do
+    if numbers[k] then res[k] = tonumber(v) end
+end
+
+ +

HTML Parsing

+ +

HTML is an unusually degenerate form of XML, and Dennis Schridde has contributed a feature which makes parsing it easier. For instance, from the tests:

+ + +
+doc = xml.parsehtml [[
+<BODY>
+Hello dolly<br>
+HTML is <b>slack</b><br>
+</BODY>
+]]
+
+asserteq(xml.tostring(doc),[[
+<body>
+Hello dolly<br/>
+HTML is <b>slack</b><br/></body>]])
+
+ +

That is, all tags are converted to lowercase, and empty HTML elements like br are properly closed; attributes do not need to be quoted.

+ +

Also, DOCTYPE directives and comments are skipped. For truly badly formed HTML, this is not the tool for you!

+ + + + + +
+
+
+generated by LDoc 1.4.6 +
+
+ + diff --git a/Data/Libraries/Penlight/docs/manual/07-functional.md.html b/Data/Libraries/Penlight/docs/manual/07-functional.md.html new file mode 100644 index 0000000..d4ca655 --- /dev/null +++ b/Data/Libraries/Penlight/docs/manual/07-functional.md.html @@ -0,0 +1,834 @@ + + + + + Penlight Documentation + + + + +
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Functional Programming

+ +

+

Sequences

+ + +

A Lua iterator (in its simplest form) is a function which can be repeatedly called to return a set of one or more values. The for in statement understands these iterators, and loops until the function returns nil. There are standard sequence adapters for tables in Lua (ipairs and pairs), and io.lines returns an iterator over all the lines in a file. In the Penlight libraries, such iterators are also called sequences. A sequence of single values (say from io.lines) is called single-valued, whereas the sequence defined by pairs is double-valued.

+ +

pl.seq provides a number of useful iterators, and some functions which operate on sequences. At first sight this example looks like an attempt to write Python in Lua (with the sequence being inclusive):

+ + +
+> for i in seq.range(1,4) do print(i) end
+1
+2
+3
+4
+
+ +

But range is actually equivalent to Python's xrange, since it generates a sequence, not a list. To get a list, use seq.copy(seq.range(1,10)), which takes any single-valued sequence and makes a table from the result. seq.list is like ipairs except that it does not give you the index, just the value.

+ + +
+> for x in seq.list {1,2,3} do print(x) end
+1
+2
+3
+
+ +
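A short sketch of the range/copy combination described above:

```lua
-- materialize an inclusive range as a list-like table
local seq = require 'pl.seq'
local t = seq.copy(seq.range(1,4))
assert(#t == 4 and t[1] == 1 and t[4] == 4)  -- t is {1,2,3,4}
```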

enum takes a sequence and turns it into a double-valued sequence consisting of a sequence number and the value, so enum(list(ls)) is actually equivalent to ipairs. A more interesting example prints out a file with line numbers:

+ + +
+for i,v in seq.enum(io.lines(fname)) do print(i..' '..v) end
+
+ +

Sequences can be combined, either by 'zipping' them or by concatenating them.

+ + +
+> for x,y in seq.zip(l1,l2) do print(x,y) end
+10      1
+20      2
+30      3
+> for x in seq.splice(l1,l2) do print(x) end
+10
+20
+30
+1
+2
+3
+
+ +

seq.printall is useful for printing out single-valued sequences, and provides some finer control over formatting, such as a delimiter, the number of fields per line, and a format string to use (see string.format):

+ + +
+> seq.printall(seq.random(10))
+0.0012512588885159 0.56358531449324 0.19330423902097 ....
+> seq.printall(seq.random(10), ',', 4, '%4.2f')
+0.17,0.86,0.71,0.51
+0.30,0.01,0.09,0.36
+0.15,0.17,
+
+ +

map will apply a function to a sequence.

+ + +
+> seq.printall(seq.map(string.upper, {'one','two'}))
+ONE TWO
+> seq.printall(seq.map('+', {10,20,30}, 1))
+11 21 31
+
+ +

filter will filter a sequence using a boolean function (often called a predicate). For instance, this code only prints lines in a file which are composed of digits:

+ + +
+for l in seq.filter(io.lines(file), stringx.isdigit) do print(l) end
+
+ +

The following returns a table consisting of all the positive values in the original table (equivalent to tablex.filter(ls, '>', 0)):

+ + +
+ls = seq.copy(seq.filter(ls, '>', 0))
+
+ +

We've already encountered seq.sum when discussing input.numbers. This can also be expressed with seq.reduce:

+ + +
+> seq.reduce(function(x,y) return x + y end, seq.list{1,2,3,4})
+10
+
+ +

seq.reduce applies a binary function in a recursive fashion, so that:

+ + +
+reduce(op,{1,2,3}) => op(1,reduce(op,{2,3})) => op(1,op(2,3))
+
+ +

It's now possible to easily generate other cumulative operations; the standard operations declared in pl.operator are useful here:

+ + +
+> ops = require 'pl.operator'
+> -- can also say '*' instead of ops.mul
+> = seq.reduce(ops.mul,input.numbers '1 2 3 4')
+24
+
+ +

There are functions to extract statistics from a sequence of numbers:

+ + +
+> l1 = List {10,20,30}
+> l2 = List {1,2,3}
+> = seq.minmax(l1)
+10      30
+> = seq.sum(l1)
+60      3
+
+ +

It is common to get sequences where values are repeated, say the words in a file. count_map will take such a sequence and count the values, returning a table where the keys are the unique values, and the value associated with each key is the number of times they occurred:

+ + +
+> t = seq.count_map {'one','fred','two','one','two','two'}
+> = t
+{one=2,fred=1,two=3}
+
+ +

This will also work on numerical sequences, but you cannot expect the result to be a proper list, i.e. having no 'holes'. Instead, you always need to use pairs to iterate over the result - note that there is a hole at index 5:

+ + +
+> t = seq.count_map {1,2,4,2,2,3,4,2,6}
+> for k,v in pairs(t) do print(k,v) end
+1       1
+2       4
+3       1
+4       2
+6       1
+
+ +

unique uses count_map to return a list of the unique values, that is, just the keys of the resulting table.

+ +
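A minimal sketch (the ordering of the result should not be relied upon):

```lua
local seq = require 'pl.seq'
local words = seq.unique(seq.list {'one','fred','one','two'})
-- 'words' contains each of 'one', 'fred' and 'two' exactly once
assert(#words == 3)
```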

last turns a single-valued sequence into a double-valued sequence with the current value and the last value:

+ + +
+> for current,last in seq.last {10,20,30,40} do print (current,last) end
+20      10
+30      20
+40      30
+
+ +

This makes it easy to do things like identify repeated lines in a file, or construct differences between values. filter can handle double-valued sequences as well, so one could filter such a sequence to only return cases where the current value is less than the last value, by using operator.lt or just '<'. This code then copies the resulting values into a table.

+ + +
+> ls = {10,9,10,3}
+> = seq.copy(seq.filter(seq.last(ls),'<'))
+{9,3}
+
+ +

+

Sequence Wrappers

+ +

The functions in pl.seq cover the common patterns when dealing with sequences, but chaining these functions together can lead to ugly code. Consider the last example of the previous section; seq is repeated three times and the resulting expression has to be read right-to-left. The first issue can be helped by local aliases, so that the expression becomes copy(filter(last(s),'<')), but the second issue refers to the somewhat unnatural order of function application. We tend to prefer reading operations from left to right, which is one reason why object-oriented notation has become popular. Sequence adapters allow this expression to be written like so:

+ + +
+seq(s):last():filter('<'):copy()
+
+ +

With this notation, the operation becomes a chain of method calls running from left to right.

+ +

'Sequence' is not a basic Lua type; sequences are generally functions or callable objects. The expression seq(s) wraps a sequence in a sequence wrapper, an object which understands all the functions in pl.seq as methods. This object then explicitly represents sequences.

+ +

As a special case, the constructor (which is when you call the table seq) will make a wrapper for a plain list-like table. Here we apply the length operator to a sequence of strings, and print them out.

+ + +
+> seq{'one','tw','t'} :map '#' :printall()
+3 2 1
+
+ +

As a convenience, there is a function seq.lines which behaves just like io.lines except it wraps the result as an explicit sequence type. This takes the first 10 lines from standard input, makes them uppercase, turns the result into a sequence with a count and the value, glues these together with the concatenation operator, and finally prints out the sequence delimited by a newline.

+ + +
+seq.lines():take(10):upper():enum():map('..'):printall '\n'
+
+ +

Note the method upper, which is not a seq function. If an unknown method is called, sequence wrappers apply that method to all the values in the sequence (this is an implicit use of mapmethod).

+ +
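So, for example, this sketch maps string.upper over the wrapped values:

```lua
local seq = require 'pl.seq'
-- 'upper' is unknown to seq, so it is applied to each string value
seq{'one','two'}:upper():printall()
-- prints something like: ONE TWO
```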

It is straightforward to create custom sequences that can be used in this way. On Unix, /dev/random gives you an endless sequence of random bytes, so we use take to limit the sequence, and then map to scale the result into the desired range. The key step is to use seq to wrap the iterator function:

+ + +
+-- random.lua
+local seq = require 'pl.seq'
+
+function dev_random()
+    local f = io.open('/dev/random')
+    local byte = string.byte
+    return seq(function()
+        -- read two bytes into a string and convert into a 16-bit number
+        local s = f:read(2)
+        return byte(s,1) + 256*byte(s,2)
+    end)
+end
+
+-- print 10 random numbers from 0 to 1 !
+dev_random():take(10):map('%',100):map('/',100):printall ','
+
+ +

Another Linux one-liner depends on the /proc filesystem and makes a list of all the currently running processes:

+ + +
+pids = seq(lfs.dir '/proc'):filter(stringx.isdigit):map(tonumber):copy()
+
+ +

This version of Penlight has an experimental feature which relies on the fact that all Lua types can have metatables, including functions. This makes implicit sequence wrapping possible:

+ + +
+> seq.import()
+> seq.random(5):printall(',',5,'%4.1f')
+ 0.0, 0.1, 0.4, 0.1, 0.2
+
+ +

This avoids the awkward seq(seq.random(5)) construction. Or the iterator can come from somewhere else completely:

+ + +
+> ('one two three'):gfind('%a+'):printall(',')
+one,two,three,
+
+ +

After seq.import, it is no longer necessary to explicitly wrap sequence functions.

+ +

But there is a price to pay for this convenience. Every function is affected, so that any function can be used, appropriate or not:

+ + +
+> math.sin:printall()
+..seq.lua:287: bad argument #1 to '(for generator)' (number expected, got nil)
+> a = tostring
+> = a:find(' ')
+function: 0042C920
+
+ +

What function is returned? It's almost certain to be something that makes no sense in the current context. So implicit sequences may make certain kinds of programming mistakes harder to catch - they are best used for interactive exploration and small scripts.

+ +

+ +

+

List Comprehensions

+ +

List comprehensions are a compact way to create tables by specifying their elements. In Python, you can say this:

+ + +
+ls = [x for x in range(5)]  # == [0,1,2,3,4]
+
+ +

In Lua, using pl.comprehension:

+ + +
+> C = require('pl.comprehension').new()
+> = C ('x for x=1,10') ()
+{1,2,3,4,5,6,7,8,9,10}
+
+ +

C is a function which compiles a list comprehension string into a function. In this case, the function has no arguments. The parentheses are redundant for a function taking a string argument, so this works as well:

+ + +
+> = C 'x^2 for x=1,4' ()
+{1,4,9,16}
+> = C '{x,x^2} for x=1,4' ()
+{{1,1},{2,4},{3,9},{4,16}}
+
+ +

Note that the expression can be any function of the variable x!

+ +

The basic syntax so far is <expr> for <set>, where <set> can be anything that the Lua for statement understands. <set> can also just be the variable, in which case the values will come from the argument of the comprehension. Here I'm emphasizing that a comprehension is a function which can take a list argument:

+ + +
+> = C '2*x for x' {1,2,3}
+{2,4,6}
+> dbl = C '2*x for x'
+> = dbl {10,20,30}
+{20,40,60}
+
+ +

Here is a somewhat more explicit way of saying the same thing; _1 is a placeholder referring to the first argument passed to the comprehension.

+ + +
+> = C '2*x for _,x in pairs(_1)' {10,20,30}
+{20,40,60}
+> = C '_1(x) for x'(tostring,{1,2,3,4})
+{'1','2','3','4'}
+
+ +

This extended syntax is useful when you wish to collect the result of some iterator, such as io.lines. This comprehension creates a function which builds a table of all the lines in a file:

+ + +
+> f = io.open('array.lua')
+> lines = C 'line for line in _1:lines()' (f)
+> = #lines
+118
+
+ +

There are a number of functions that may be applied to the result of a comprehension:

+ + +
+> = C 'min(x for x)' {1,44,0}
+0
+> = C 'max(x for x)' {1,44,0}
+44
+> = C 'sum(x for x)' {1,44,0}
+45
+
+ +

(These are equivalent to a reduce operation on a list.)

+ +

After the for part, there may be a condition, which filters the output. This comprehension collects the even numbers from a list:

+ + +
+> = C 'x for x if x % 2 == 0' {1,2,3,4,5}
+{2,4}
+
+ +

There may be a number of for parts:

+ + +
+> = C '{x,y} for x = 1,2 for y = 1,2' ()
+{{1,1},{1,2},{2,1},{2,2}}
+> = C '{x,y} for x for y' ({1,2},{10,20})
+{{1,10},{1,20},{2,10},{2,20}}
+
+ +

These comprehensions are useful when dealing with functions of more than one variable, and are not so easily achieved with the other Penlight functional forms.

+ +

+ +

+

Creating Functions from Functions

+ + +

Lua functions may be treated like any other value, although of course you cannot multiply or add them. One operation that makes sense is function composition, which chains function calls (so (f * g)(x) is f(g(x))).

+ + +
+> func = require 'pl.func'
+> printf = func.compose(io.write,string.format)
+> printf("hello %s\n",'world')
+hello world
+true
+
+ +

Many functions require you to pass a function as an argument, say to apply to all values of a sequence or as a callback. Often useful functions have the wrong number of arguments. So there is a need to construct a function of one argument from one of two arguments, binding the extra argument to a given value.

+ +

Partial application takes a function of n arguments and returns a function of n-1 arguments where the first argument is bound to some value:

+ + +
+> p2 = func.bind1(print,'start>')
+> p2('hello',2)
+start>  hello   2
+> ops = require 'pl.operator'
+> = tablex.filter({1,-2,10,-1,2},bind1(ops.gt,0))
+{-2,-1}
+> tablex.filter({1,-2,10,-1,2},bind1(ops.le,0))
+{1,10,2}
+
+ +

The last example unfortunately reads backwards, because bind1 always binds the first argument! Also unfortunately, in my youth I confused 'currying' with 'partial application', so the old name for bind1 is curry - this alias still exists.

+ +

This is a specialized form of function argument binding. Here is another way to say the print example:

+ + +
+> p2 = func.bind(print,'start>',func._1,func._2)
+> p2('hello',2)
+start>  hello   2
+
+ +

where _1 and _2 are placeholder variables, corresponding to the first and second argument respectively.

+ +

Having func all over the place is distracting, so it's useful to pull all of pl.func into the local context. Here is the filter example, this time the right way around:

+ + +
+> utils.import 'pl.func'
+> tablex.filter({1,-2,10,-1,2},bind(ops.gt, _1, 0))
+{1,10,2}
+
+ +

tablex.merge does a general merge of two tables. This example shows the usefulness of binding the last argument of a function.

+ + +
+> S1 = {john=27, jane=31, mary=24}
+> S2 = {jane=31, jones=50}
+> intersection = bind(tablex.merge, _1, _2, false)
+> union = bind(tablex.merge, _1, _2, true)
+> = intersection(S1,S2)
+{jane=31}
+> = union(S1,S2)
+{mary=24,jane=31,john=27,jones=50}
+
+ +

When using bind with print, we got a function of precisely two arguments, whereas we really want our function to use varargs like print. This is the role of _0:

+ + +
+> _DEBUG = true
+> p = bind(print,'start>', _0)
+return function (fn,_v1)
+    return function(...) return fn(_v1,...) end
+end
+
+> p(1,2,3,4,5)
+start>  1       2       3       4       5
+
+ +

I've turned on the global _DEBUG flag, so that the function generated is printed out. It is actually a function which generates the required function; the first call binds the value of _v1 to 'start>'.

+ +

+

Placeholder Expressions

+ +

A common pattern in Penlight is a function which applies another function to all elements in a table or a sequence, such as tablex.map or seq.filter. Lua does anonymous functions well, although they can be a bit tedious to type:

+ + +
+> = tablex.map(function(x) return x*x end, {1,2,3,4})
+{1,4,9,16}
+
+ +

pl.func allows you to define placeholder expressions, which can cut down on the typing required, and also make your intent clearer. First, we bring the contents of pl.func into our context, and then supply an expression using placeholder variables, such as _1, _2, etc. (C++ programmers will recognize this from the Boost libraries.)

+ + +
+> utils.import 'pl.func'
+> = tablex.map(_1*_1, {1,2,3,4})
+{1,4,9,16}
+
+ +

Functions of up to 5 arguments can be generated.

+ + +
+> = tablex.map2(_1+_2,{1,2,3}, {10,20,30})
+{11,22,33}
+
+ +

These expressions can use arbitrary functions, although they must first be registered with the functional library. func.register brings in a single function, and func.import brings in a whole table of functions, such as math.

+ + +
+> sin = register(math.sin)
+> = tablex.map(sin(_1), {1,2,3,4})
+{0.8414709848079,0.90929742682568,0.14112000805987,-0.75680249530793}
+> import 'math'
+> = tablex.map(cos(2*_1),{1,2,3,4})
+{-0.41614683654714,-0.65364362086361,0.96017028665037,-0.14550003380861}
+
+ +

A common operation is calling a method of a set of objects:

+ + +
+> = tablex.map(_1:sub(1,1), {'one','four','x'})
+{'o','f','x'}
+
+ +

There are some restrictions on what operators can be used in PEs. For instance, because the __len metamethod cannot be overridden by plain Lua tables, we need to define a special function to express #_1:

+ + +
+> = tablex.map(Len(_1), {'one','four','x'})
+{3,4,1}
+
+ +

Likewise for comparison operators, which cannot be overloaded for different types, and thus also have to be expressed as a special function:

+ + +
+> = tablex.filter(Gt(_1,0), {1,-1,2,4,-3})
+{1,2,4}
+
+ +

It is useful to express the fact that a function returns multiple values. For instance, tablex.pairmap expects a function that will be called with the key and the value, and returns the new value and the key, in that order.

+ + +
+> = pairmap(Args(_2,_1:upper()),{fred=1,alice=2})
+{ALICE=2,FRED=1}
+
+ +

PEs cannot contain nil values, since PE function arguments are represented as an array. Instead, a special value called Nil is provided. So say _1:f(Nil,1) instead of _1:f(nil,1).

+ +

A placeholder expression cannot be automatically used as a Lua function. The technical reason is that the call operator must be overloaded to construct function calls like _1(1). If you want to force a PE to return a function, use func.I.

+ + +
+> = tablex.map(_1(10),{I(2*_1),I(_1*_1),I(_1+2)})
+{20,100,12}
+
+ +

Here we make a table of functions taking a single argument, and then call them all with a value of 10.

+ +

The essential idea with PEs is to 'quote' an expression so that it is not immediately evaluated, but instead turned into a function that can be applied later to some arguments. The basic mechanism is to wrap values and placeholders so that the usual Lua operators have the effect of building up an expression tree. (It turns out that you can do symbolic algebra using PEs, see symbols.lua in the examples directory, and its test runner testsym.lua, which demonstrates symbolic differentiation.)

+ +

The rule is that if any operator has a PE operand, the result will be quoted. Sometimes we need to quote things explicitly. For instance, say we want to pass a function to a filter that must return true if the element value is in a set. set[_1] is the obvious expression, but it does not give the desired result, since it evaluates directly, giving nil. Indexing works differently than a binary operation like addition (set+_1 is properly quoted) so there is a need for an explicit quoting or wrapping operation. This is the job of the _ function; the PE in this case should be _(set)[_1]. This works for functions as well, as a convenient alternative to registering functions: _(math.sin)(_1). This is equivalent to using the 'lines' method:

+ + +
+for line in I(_(f):read()) do print(line) end
+
+ +

Now this will work for any 'file-like' object which has a read method returning the next line. If you had a LuaSocket client which was being 'pushed' by lines sent from a server, then _(s):receive '*l' would create an iterator for accepting input. These forms can be convenient for adapting your data flow so that it can be passed to the sequence functions in pl.seq.

+ +

Placeholder expressions can be mixed with sequence wrapper expressions. +lexer.lua will give us a double-valued sequence of tokens, where the first +value is a type, and the second is a value. We filter out only the values where +the type is 'iden', extract the actual value using map, get the unique values +and finally copy to a list.

+ + +
+> str = 'for i=1,10 do for j = 1,10 do print(i,j) end end'
+> = seq(lexer.lua(str)):filter('==','iden'):map(_2):unique():copy()
+{i,print,j}
+
+ +

This is a particularly intense line (and I don't always suggest making everything +a one-liner!); the key is the behaviour of map, which will take both values of +the sequence, so _2 returns the value part. (Since filter here takes extra +arguments, it only operates on the type values.)

+ +

There are some performance considerations to using placeholder expressions. Instantiating a PE requires constructing and compiling a function, which is not such a fast operation. So to get the best performance, factor out PEs from loops like this:

+ + +
+local fn = I(_1:f() + _2:g())
+for i = 1,n do
+    res[i] = tablex.map2(fn,first[i],second[i])
+end
+
+ + + +
+
+
+generated by LDoc 1.4.6 +
+
diff --git a/Data/Libraries/Penlight/docs/manual/08-additional.md.html b/Data/Libraries/Penlight/docs/manual/08-additional.md.html
new file mode 100644
index 0000000..d13ac6e
--- /dev/null
+++ b/Data/Libraries/Penlight/docs/manual/08-additional.md.html
@@ -0,0 +1,815 @@
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Additional Libraries

+ +

Libraries in this section are no longer considered to be part of the Penlight +core, but still provide specialized functionality when needed.

+ +

+ +

+

Simple Input Patterns

+ +

Lua string pattern matching is very powerful, and usually you will not need a +traditional regular expression library. Even so, sometimes Lua code ends up +looking like Perl, which happens because string patterns are not always the +easiest things to read, especially for the casual reader. Here is a program +which needs to understand three distinct date formats:

+ + +
+-- parsing dates using Lua string patterns
+months={Jan=1,Feb=2,Mar=3,Apr=4,May=5,Jun=6,
+Jul=7,Aug=8,Sep=9,Oct=10,Nov=11,Dec=12}
+
+function check_and_process(d,m,y)
+    d = tonumber(d)
+    m = tonumber(m)
+    y = tonumber(y)
+    ....
+end
+
+for line in f:lines() do
+    -- ordinary (English) date format
+    local d,m,y = line:match('(%d+)/(%d+)/(%d+)')
+    if d then
+        check_and_process(d,m,y)
+    else -- ISO date??
+        y,m,d = line:match('(%d+)%-(%d+)%-(%d+)')
+        if y then
+            check_and_process(d,m,y)
+        else -- <day> <month-name> <year>?
+            d,mm,y = line:match('(%d+)%s+(%a+)%s+(%d+)')
+            m = months[mm]
+            check_and_process(d,m,y)
+        end
+    end
+end
+
+ +

These aren't particularly difficult patterns, but already typical issues are +appearing, such as having to escape '-'. Also, string.match returns its +captures, so that we're forced to use a slightly awkward nested if-statement.

+ +

Verification issues will further cloud the picture, since regular expression +people try to enforce constraints (like year cannot be more than four digits) +using regular expressions, on the usual grounds that you shouldn't stop using a +hammer when you are enjoying yourself.

+ +

pl.sip provides a simple, intuitive way to detect patterns in strings and +extract relevant parts.

+ + +
+> sip = require 'pl.sip'
+> dump = require('pl.pretty').dump
+> res = {}
+> c = sip.compile 'ref=$S{file}:$d{line}'
+> = c('ref=hello.c:10',res)
+true
+> dump(res)
+{
+  line = 10,
+  file = "hello.c"
+}
+> = c('ref=long name, no line',res)
+false
+
+ +

sip.compile creates a pattern matcher function, which takes a string and a +table as arguments. If the string matches the pattern, then true is returned +and the table is populated according to the captures within the pattern.

+ +

Here is another version of the date parser:

+ + +
+-- using SIP patterns
+function check(t)
+    check_and_process(t.day,t.month,t.year)
+end
+
+shortdate = sip.compile('$d{day}/$d{month}/$d{year}')
+longdate = sip.compile('$d{day} $v{mon} $d{year}')
+isodate = sip.compile('$d{year}-$d{month}-$d{day}')
+
+for line in f:lines() do
+    local res = {}
+    if shortdate(line,res) then
+        check(res)
+    elseif isodate(line,res) then
+        check(res)
+    elseif longdate(line,res) then
+        res.month = months[res.mon]
+        check(res)
+    end
+end
+
+ +

SIP captures start with '$', then a one-character type, and then an +optional variable name in curly braces.

+ + +
+Type      Meaning
+v         identifier
+i         possibly signed integer
+f         floating-point number
+r         rest of line
+q         quoted string (quoted using either ' or ")
+p         a path name
+(         anything inside balanced parentheses
+[         anything inside balanced brackets
+{         anything inside balanced curly brackets
+<         anything inside balanced angle brackets
+
+ +

If a type is not one of the above, then it's assumed to be one of the standard +Lua character classes, and will match one or more repetitions of that class. +Any spaces you leave in your pattern will match any number of spaces, including +zero, unless the spaces are between two identifier characters or patterns +matching them; in that case, at least one space will be matched.

+ +
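To make the translation concrete, here is a rough sketch of how capture types can map onto Lua string patterns; the to_pattern helper and the abbreviated type table are illustrative only, not sip's actual code:

```lua
-- Hypothetical sketch: translate a few SIP capture types into
-- Lua pattern captures (sip.lua's real translation is richer).
local types = {
  v = '[%a_][%w_]*',   -- identifier
  d = '%d+',           -- digits
  i = '[-+]?%d+',      -- possibly signed integer
}

-- turn e.g. 'ref=$v:$d' into a Lua pattern with captures
local function to_pattern(spec)
  return (spec:gsub('%$(%a)', function(t)
    return '(' .. types[t] .. ')'
  end))
end

local pat = to_pattern('ref=$v:$d')
print(('ref=hello:10'):match(pat))
```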

SIP captures (like $v{mon}) do not have to be named. You can use just $v, but +you have to be consistent; if a pattern contains unnamed captures, then all +captures must be unnamed. In this case, the result table is a simple list of +values.

+ +

sip.match is a useful shortcut if you want to compile and match in one call, +without saving the compiled pattern. It caches the result, so it is not much +slower than explicitly using sip.compile.

+ + +
+> sip.match('($q{first},$q{second})','("john","smith")',res)
+true
+> res
+{second='smith',first='john'}
+> res = {}
+> sip.match('($q,$q)','("jan","smit")',res)  -- unnamed captures
+true
+> res
+{'jan','smit'}
+> sip.match('($q,$q)','("jan", "smit")',res)
+false   ---> oops! Can't handle extra space!
+> sip.match('( $q , $q )','("jan", "smit")',res)
+true
+
+ +

As a general rule, allow for whitespace in your patterns.

+ +

Finally, putting a '$' at the end of a pattern means 'capture the rest of the +line, starting at the first non-space'. It is a shortcut for '$r{rest}', +or just '$r' if no named captures are used.

+ + +
+> sip.match('( $q , $q ) $','("jan", "smit") and a string',res)
+true
+> res
+{'jan','smit','and a string'}
+> res = {}
+> sip.match('( $q{first} , $q{last} ) $','("jan", "smit") and a string',res)
+true
+> res
+{first='jan',rest='and a string',last='smit'}
+
+ +

+ +

+

Command-line Programs with Lapp

+ +

pl.lapp is a small and focused Lua module which aims to make standard command-line parsing easier and more intuitive. It implements the standard GNU style, i.e. short flags with one letter start with '-', and there may be an additional long flag which starts with '--'. Generally, options which take an argument expect to find it as the next parameter (e.g. 'gcc test.c -o test'), but single short options taking a value can dispense with the space (e.g. 'head -n4 test.c' or 'gcc -I/usr/include/lua/5.1 ...').

+ +

As far as possible, Lapp will convert parameters into their equivalent Lua types, +i.e. convert numbers and convert filenames into file objects. If any conversion +fails, or a required parameter is missing, an error will be issued and the usage +text will be written out. So there are two necessary tasks, supplying the flag +and option names and associating them with a type.

+ +

For any non-trivial script, even for personal consumption, it's necessary to +supply usage text. The novelty of Lapp is that it starts from that point and +defines a loose format for usage strings which can specify the names and types of +the parameters.

+ +

An example will make this clearer:

+ + +
+-- scale.lua
+  lapp = require 'pl.lapp'
+  local args = lapp [[
+  Does some calculations
+    -o,--offset (default 0.0)  Offset to add to scaled number
+    -s,--scale  (number)  Scaling factor
+    <number> (number)  Number to be scaled
+  ]]
+
+  print(args.offset + args.scale * args.number)
+
+ +

Here is a command-line session using this script:

+ + +
+$ lua scale.lua
+scale.lua:missing required parameter: scale
+
+Does some calculations
+ -o,--offset (default 0.0)  Offset to add to scaled number
+ -s,--scale  (number)  Scaling factor
+  <number> (number )  Number to be scaled
+
+$ lua scale.lua -s 2.2 10
+22
+
+$ lua scale.lua -s 2.2 x10
+scale.lua:unable to convert to number: x10
+
+....(usage as before)
+
+ +

There are two kinds of lines in Lapp usage strings which are meaningful: option and parameter lines. An option line gives the short option, optionally followed by the corresponding long option. A type specifier in parentheses may follow. Similarly, a parameter line starts with '<NAME>', followed by a type specifier.

+ +

Type specifiers usually start with a type name: one of 'boolean', 'string', 'number', 'file-in' or 'file-out'. You may leave this out, but must then say 'default' followed by a value. If a flag or parameter has a default, it is not required and is set to the default. The actual type is deduced from this value (number, string, file or boolean) if not provided directly. 'Deduce' is a fancy word for 'guess' and it can be wrong, e.g. '(default 1)' will always be a number. You can say '(string default 1)' to override the guess. There are file values for the predefined console streams: stdin, stdout and stderr.

+ +
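The deduction rule can be sketched in plain Lua; deduce_type is a hypothetical helper for illustration, not part of Lapp's code:

```lua
-- Illustrative sketch of deducing a parameter type from its
-- default value, following the rule described above.
local function deduce_type(default)
  if default == 'stdin' or default == 'stdout' or default == 'stderr' then
    return 'file'                 -- predefined console streams
  elseif tonumber(default) then
    return 'number'
  elseif default == 'true' or default == 'false' then
    return 'boolean'
  else
    return 'string'
  end
end

print(deduce_type('1'))      --> number
print(deduce_type('save'))   --> string
print(deduce_type('stdin'))  --> file
```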

The boolean type is the default for flags. Not providing the type specifier is equivalent to '(boolean default false)'. If the flag is meant to be 'turned off' then either the full '(boolean default true)' or the shortcut '(default true)' will work.

+ +

An alternative to default is optional:

+ + +
+local lapp = require 'pl.lapp'
+local args = lapp [[
+   --cmd (optional string) Command to run.
+]]
+
+if args.cmd then
+  os.execute(args.cmd)
+end
+
+ +

Here we're implying that cmd need not be specified (just as with default) but if not +present, then args.cmd is nil, which will always test false.

+ +

The rest of the line is ignored and can be used for explanatory text.

+ +

This script shows the relation between the specified parameter names and the +fields in the output table.

+ + +
+-- simple.lua
+local args = require ('pl.lapp') [[
+Various flags and option types
+  -p          A simple optional flag, defaults to false
+  -q,--quiet  A simple flag with long name
+  -o  (string)  A required option with argument
+  -s  (default 'save') Optional string with default 'save' (single quotes ignored)
+  -n  (default 1) Optional numerical flag with default 1
+  -b  (string default 1)  Optional string flag with default '1' (type explicit)
+  <input> (default stdin)  Optional input file parameter, reads from stdin
+]]
+
+for k,v in pairs(args) do
+    print(k,v)
+end
+
+ +

I've just dumped out all values of the args table; note that args.quiet has +become true, because it's specified; args.p defaults to false. If there is a long +name for an option, that will be used in preference as a field name. A type or +default specifier is not necessary for simple flags, since the default type is +boolean.

+ + +
+$ simple -o test -q simple.lua
+p       false
+input   file (781C1BD8)
+quiet   true
+o       test
+input_name      simple.lua
+D:\dev\lua\lapp>simple -o test simple.lua one two three
+1       one
+2       two
+3       three
+p       false
+quiet   false
+input   file (781C1BD8)
+o       test
+input_name      simple.lua
+
+ +

The parameter input has been set to an open read-only file object - we know it +must be a read-only file since that is the type of the default value. The field +input_name is automatically generated, since it's often useful to have access to +the original filename.

+ +

Notice that any extra parameters supplied will be put in the result table with +integer indices, i.e. args[i] where i goes from 1 to #args.

+ +

Files don't really have to be closed explicitly for short scripts with a quick +well-defined mission, since the result of garbage-collecting file objects is to +close them.

+ +

Enforcing a Range and Enumerations

+ +

The type specifier can also be of the form '(' MIN '..' MAX ')' or a set of strings +separated by '|'.

+ + +
+local lapp = require 'pl.lapp'
+local args = lapp [[
+    Setting ranges
+    <x> (1..10)  A number from 1 to 10
+    <y> (-5..1e6) Bigger range
+    <z> (slow|medium|fast)
+]]
+
+print(args.x,args.y)
+
+ +

Here the meaning of a range is that the value must be greater than or equal to MIN and less than or equal to MAX. An 'enum' is a string that can only have values from a specified set.

+ +

Custom Types

+ +

There is no built-in way to force a parameter to be a whole number, but you may define a custom type that does this:

+ + +
+lapp = require ('pl.lapp')
+
+lapp.add_type('integer','number',
+    function(x)
+        lapp.assert(math.ceil(x) == x, 'not an integer!')
+    end
+)
+
+local args =  lapp [[
+    <ival> (integer) Process PID
+]]
+
+print(args.ival)
+
+ +

lapp.add_type takes three parameters, a type name, a converter and a constraint +function. The constraint function is expected to throw an assertion if some +condition is not true; we use lapp.assert because it fails in the standard way +for a command-line script. The converter argument can either be a type name known +to Lapp, or a function which takes a string and generates a value.

+ +

Here's a useful custom type that allows dates to be input as pl.Date values:

+ + +
+local Date = require 'pl.Date'
+local df = Date.Format()
+
+lapp.add_type('date',
+    function(s)
+        local d,e = df:parse(s)
+        lapp.assert(d,e)
+        return d
+    end
+)
+
+ +

'varargs' Parameter Arrays

+ + +
+lapp = require 'pl.lapp'
+local args = lapp [[
+Summing numbers
+    <numbers...> (number) A list of numbers to be summed
+]]
+
+local sum = 0
+for i,x in ipairs(args.numbers) do
+    sum = sum + x
+end
+print ('sum is '..sum)
+
+ +

The parameter numbers has a trailing '...', which indicates that it is a 'varargs' parameter. It must be the last parameter, and args.numbers will be an array.

+ +

Consider this implementation of the head utility from Mac OS X:

+ + +
+-- implements a BSD-style head
+-- (see http://www.manpagez.com/man/1/head/osx-10.3.php)
+
+lapp = require ('pl.lapp')
+
+local args = lapp [[
+Print the first few lines of specified files
+   -n         (default 10)    Number of lines to print
+   <files...> (default stdin) Files to print
+]]
+
+-- by default, lapp converts file arguments to an actual Lua file object.
+-- But the actual filename is always available as <file>_name.
+-- In this case, 'files' is a varargs array, so that 'files_name' is
+-- also an array.
+local nline = args.n
+local nfile = #args.files
+for i = 1,nfile do
+    local file = args.files[i]
+    if nfile > 1 then
+        print('==> '..args.files_name[i]..' <==')
+    end
+    local n = 0
+    for line in file:lines() do
+        print(line)
+        n = n + 1
+        if n == nline then break end
+    end
+end
+
+ +

Note how we have access to all the filenames, because the auto-generated field +files_name is also an array!

+ +

(This is probably not a very considerate script, since Lapp will open all the +files provided, and only close them at the end of the script. See the xhead.lua +example for another implementation.)

+ +

Flags and options may also be declared as vararg arrays, and can occur anywhere. +If there is both a short and long form, then the trailing "..." must happen after the long form, +for example "-x,--network... (string)...",

+ +

Bear in mind that short options can be combined (like 'tar -xzf'), so it's +perfectly legal to have '-vvv'. But normally the value of args.v is just a simple +true value.

+ + +
+local args = require ('pl.lapp') [[
+   -v...  Verbosity level; can be -v, -vv or -vvv
+]]
+vlevel = not args.v[1] and 0 or #args.v
+print(vlevel)
+
+ +

The vlevel assignment is a bit of Lua voodoo, so consider the cases:

+ + +
+* No -v flag, v is just { false }
+* One -v flag, v is { true }
+* Two -v flags, v is { true, true }
+* Three -v flags, v is { true, true, true }
+
+ +
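The cases above can be checked directly; vlevel here is a hypothetical helper wrapping the same expression:

```lua
-- Verifying the listed cases for the vlevel expression.
local function vlevel(v)
  return not v[1] and 0 or #v
end

print(vlevel({false}))            --> 0
print(vlevel({true}))             --> 1
print(vlevel({true, true, true})) --> 3
```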

Defining a Parameter Callback

+ +

If a script implements lapp.callback, then Lapp will call it after each +argument is parsed. The callback is passed the parameter name, the raw unparsed +value, and the result table. It is called immediately after assignment of the +value, so the corresponding field is available.

+ + +
+lapp = require ('pl.lapp')
+
+function lapp.callback(parm,arg,args)
+    print('+',parm,arg)
+end
+
+local args = lapp [[
+Testing parameter handling
+    -p               Plain flag (defaults to false)
+    -q,--quiet       Plain flag with GNU-style optional long name
+    -o  (string)     Required string option
+    -n  (number)     Required number option
+    -s (default 1.0) Option that takes a number, but will default
+    <start> (number) Required number argument
+    <input> (default stdin)  A parameter which is an input file
+    <output> (default stdout) One that is an output file
+]]
+print 'args'
+for k,v in pairs(args) do
+    print(k,v)
+end
+
+ +

This produces the following output:

+ + +
+$ args -o name -n 2 10 args.lua
++       o       name
++       n       2
++       start   10
++       input   args.lua
+args
+p       false
+s       1
+input_name      args.lua
+quiet   false
+output  file (781C1B98)
+start   10
+input   file (781C1BD8)
+o       name
+n       2
+
+ +

Callbacks are needed when you want to take action immediately on parsing an +argument.

+ +

Slack Mode

+ +

If you'd like to use a multi-letter 'short' parameter you need to set +the lapp.slack variable to true.

+ +

In the following example we also see how default false and default true flags can be used +and how to overwrite the default -h help flag (--help still works fine) - this applies +to non-slack mode as well.

+ + +
+-- Parsing the command line ----------------------------------------------------
+-- test.lua
+local lapp = require 'pl.lapp'
+local pretty = require 'pl.pretty'
+lapp.slack = true
+local args = lapp [[
+Does some calculations
+   -v, --video              (string)             Specify input video
+   -w, --width              (default 256)        Width of the video
+   -h, --height             (default 144)        Height of the video
+   -t, --time               (default 10)         Seconds of video to process
+   -sk,--seek               (default 0)          Seek number of seconds
+   -f1,--flag1                                   A false flag
+   -f2,--flag2                                   A false flag
+   -f3,--flag3              (default true)       A true flag
+   -f4,--flag4              (default true)       A true flag
+]]
+
+pretty.dump(args)
+
+ +

And here we can see the output of test.lua:

+ + +
+$> lua test.lua -v abc --time 40 -h 20 -sk 15 --flag1 -f3
+---->
+{
+  width = 256,
+  flag1 = true,
+  flag3 = false,
+  seek = 15,
+  flag2 = false,
+  video = "abc",
+  time = 40,
+  height = 20,
+  flag4 = true
+}
+
+ +

+

Simple Test Framework

+ +

pl.test was originally developed for the sole purpose of testing Penlight itself, +but you may find it useful for your own applications. (There are many other options.)

+ +

Most of the goodness is in test.asserteq. It uses tablex.deepcompare on its two arguments, +and by default quits the test application with a non-zero exit code, and an informative +message printed to stderr:

+ + +
+local test = require 'pl.test'
+
+test.asserteq({10,20,30},{10,20,30.1})
+
+--~ test-test.lua:3: assertion failed
+--~ got:    {
+--~  [1] = 10,
+--~  [2] = 20,
+--~  [3] = 30
+--~ }
+--~ needed:    {
+--~  [1] = 10,
+--~  [2] = 20,
+--~  [3] = 30.1
+--~ }
+--~ these values were not equal
+
+ +

This covers most cases, but it's also useful to compare strings using string.match:

+ + +
+-- must start with bonzo the dog
+test.assertmatch ('bonzo the dog is here','^bonzo the dog')
+-- must end with an integer
+test.assertmatch ('hello 42','%d+$')
+
+ +

Since Lua errors are usually strings, this matching strategy is used to test 'exceptions':

+ + +
+test.assertraise(function()
+    local t = nil
+    print(t.bonzo)
+end,'nil value')
+
+ +

(Some care is needed to match the essential part of the thrown error if you care +for portability, since in Lua 5.2 +the exact error is "attempt to index local 't' (a nil value)" and in Lua 5.3 the error +is "attempt to index a nil value (local 't')")

+ +

There is an extra optional argument to these test functions, which is helpful when writing test helper functions. There you want to highlight the failed line, not the actual call to asserteq or assertmatch; line 33 here is the call to is_iden:

+ + +
+function is_iden(str)
+    test.assertmatch(str,'^[%a_][%w_]*$',1)
+end
+
+is_iden 'alpha_dog'
+is_iden '$dollars'
+
+--~ test-test.lua:33: assertion failed
+--~ got:    "$dollars"
+--~ needed:    "^[%a_][%w_]*$"
+--~ these strings did not match
+
+ +

Useful Lua functions often return multiple values, and test.tuple is a convenient way to +capture these values, whether they contain nils or not.

+ + +
+T = test.tuple
+
+--- common error pattern
+function failing()
+    return nil,'failed'
+end
+
+test.asserteq(T(failing()),T(nil,'failed'))
+
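The trick that makes this work is counting arguments with select('#',...), so that trailing nils are not lost; here is a sketch of the idea (not pl.test's actual code):

```lua
-- Sketch of a nil-safe 'tuple' capture: record the true argument
-- count so {nil,'failed'} and {} compare differently.
local function tuple(...)
  return {n = select('#', ...), ...}
end

local function tuple_eq(a, b)
  if a.n ~= b.n then return false end
  for i = 1, a.n do
    if a[i] ~= b[i] then return false end
  end
  return true
end

local function failing() return nil, 'failed' end

print(tuple_eq(tuple(failing()), tuple(nil, 'failed')))  --> true
print(tuple_eq(tuple(nil), tuple()))                     --> false
```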
+ + + +
+
+
+
diff --git a/Data/Libraries/Penlight/docs/manual/09-discussion.md.html b/Data/Libraries/Penlight/docs/manual/09-discussion.md.html
new file mode 100644
index 0000000..4e7dd69
--- /dev/null
+++ b/Data/Libraries/Penlight/docs/manual/09-discussion.md.html
@@ -0,0 +1,233 @@
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Technical Choices

+ +

+

Modularity and Granularity

+ +

In an ideal world, a program should only load the libraries it needs. Penlight is +intended to work in situations where an extra 100Kb of bytecode could be a +problem. It is straightforward but tedious to load exactly what you need:

+ + +
+local data = require 'pl.data'
+local List = require 'pl.List'
+local array2d = require 'pl.array2d'
+local seq = require 'pl.seq'
+local utils = require 'pl.utils'
+
+ +

This is the style that I follow in Penlight itself, so that modules don't mess +with the global environment; also, stringx.import() is not used because it will +update the global string table.

+ +

But require 'pl' is more convenient in scripts; the question is how to ensure that one doesn't load the whole kitchen sink as the price of convenience. The strategy is to only load modules when they are referenced. In 'init.lua' (which is loaded by require 'pl') a metatable is attached to the global table with an __index metamethod. Any unknown name is looked up in the list of modules, and if found, we require it and make that module globally available. So when tablex.deepcompare is encountered, looking up tablex causes 'pl.tablex' to be required.

+ +
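The mechanism can be sketched in a few lines; this is a simplification (the stub module registered in package.preload and the known table are stand-ins for the real Penlight module list and loader):

```lua
-- Self-contained sketch of load-on-demand globals via __index.
-- A stub module stands in for pl.tablex for illustration.
package.preload['pl.tablex'] = function()
  return { deepcompare = function(a, b) return a == b end }
end

local known = { tablex = 'pl.tablex' }   -- names we know how to load

setmetatable(_G, {
  __index = function(g, name)
    local mod = known[name]
    if mod then
      local m = require(mod)
      rawset(g, name, m)   -- cache, so __index fires only once per module
      return m
    end
  end,
})

-- 'tablex' is not a global yet; the metamethod loads it on first use
print(tablex.deepcompare(1, 1))  --> true
```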

Modifying the behaviour of the global table has consequences. For instance, there is the famous module strict which comes with Lua itself (perhaps the only standard Lua module written in Lua itself) which also does this modification, so that global variables must be defined before use. So the implementation in 'init.lua' allows for a 'not found' hook, which 'pl.strict.lua' uses. Other libraries may install their own metatables for _G, but Penlight will now forward any unknown name to the __index defined by the original metatable.

+ +

But the strategy is worth the effort: the old 'kitchen sink' 'init.lua' would pull in about 260K of bytecode, whereas now typical programs use about 100K less, and short scripts do even better - for instance, if they only need the functionality in utils.

+ +

There are some functions which mark their output table with a special metatable, +when it seems particularly appropriate. For instance, tablex.makeset creates a +Set, and seq.copy creates a List. But this does not automatically result in +the loading of pl.Set and pl.List; only if you try to access any of these +methods. In 'utils.lua', there is an exported table called stdmt:

+ + +
+stdmt = { List = {}, Map = {}, Set = {}, MultiMap = {} }
+
+ +

If you go through 'init.lua', then these plain little 'identity' tables get an +__index metamethod which forces the loading of the full functionality. Here is +the code from 'list.lua' which starts the ball rolling for lists:

+ + +
+List = utils.stdmt.List
+List.__index = List
+List._name = "List"
+List._class = List
+
+ +

The 'load-on-demand' strategy helps to modularize the library. Especially for +more casual use, require 'pl' is a good compromise between convenience and +modularity.

+ +

In this current version, I have generally reduced the amount of trickery involved. Previously, Map was defined in pl.class; now it is sensibly defined in pl.Map; pl.class only contains the basic class mechanism (and returns that function). For consistency, List is returned directly by require 'pl.List' (note the uppercase 'L'). Also, the number of module dependencies in the non-core libraries like pl.config has been reduced.

+ +

+

Defining what is Callable

+ +

'utils.lua' exports function_arg which is used extensively throughout Penlight. +It defines what is meant by 'callable'. Obviously true functions are immediately +passed back. But what about strings? The first option is that it represents an +operator in 'operator.lua', so that '<' is just an alias for operator.lt.

+ +
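A sketch of this resolution order (a simplification; the real utils.function_arg also consults function factories registered for the value's metatable, and its ops table covers all of operator.lua):

```lua
-- Illustrative sketch: resolve a 'callable' argument as described
-- above - functions pass through, operator strings are looked up.
local ops = {
  ['<'] = function(a, b) return a < b end,
  ['+'] = function(a, b) return a + b end,
}

local function function_arg(f)
  if type(f) == 'function' then return f end
  if type(f) == 'string' and ops[f] then return ops[f] end
  error('argument is not callable')
end

print(function_arg('<')(1, 2))  --> true
print(function_arg('+')(2, 3))  --> 5
```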

We then check whether there is a function factory defined for the metatable of +the value.

+ +

(It is true that strings can be made callable, but in practice this turns out to +be a cute but dubious idea, since all strings share the same metatable. A +common programming error is to pass the wrong kind of object to a function, and +it's better to get a nice clean 'attempting to call a string' message rather than +some obscure trace from the bowels of your library.)

+ +

The other module that registers a function factory is pl.func. Placeholder expressions cannot be called directly, and so need to be instantiated and cached as efficiently as possible.

+ +

(An inconsistency is that utils.is_callable does not do this thorough check.)

+ + + + +
+
+
+