From 42ec7286b2d36a9ba22925f816a17cb1cc2aa5ce Mon Sep 17 00:00:00 2001 From: chai Date: Sat, 30 Oct 2021 11:32:16 +0800 Subject: + Penlight --- .../Penlight/docs/manual/01-introduction.md.html | 843 ++++++++++ .../Penlight/docs/manual/02-arrays.md.html | 914 +++++++++++ .../Penlight/docs/manual/03-strings.md.html | 397 +++++ .../Penlight/docs/manual/04-paths.md.html | 329 ++++ .../Penlight/docs/manual/05-dates.md.html | 269 ++++ .../Libraries/Penlight/docs/manual/06-data.md.html | 1633 ++++++++++++++++++++ .../Penlight/docs/manual/07-functional.md.html | 834 ++++++++++ .../Penlight/docs/manual/08-additional.md.html | 815 ++++++++++ .../Penlight/docs/manual/09-discussion.md.html | 233 +++ 9 files changed, 6267 insertions(+) create mode 100644 Data/Libraries/Penlight/docs/manual/01-introduction.md.html create mode 100644 Data/Libraries/Penlight/docs/manual/02-arrays.md.html create mode 100644 Data/Libraries/Penlight/docs/manual/03-strings.md.html create mode 100644 Data/Libraries/Penlight/docs/manual/04-paths.md.html create mode 100644 Data/Libraries/Penlight/docs/manual/05-dates.md.html create mode 100644 Data/Libraries/Penlight/docs/manual/06-data.md.html create mode 100644 Data/Libraries/Penlight/docs/manual/07-functional.md.html create mode 100644 Data/Libraries/Penlight/docs/manual/08-additional.md.html create mode 100644 Data/Libraries/Penlight/docs/manual/09-discussion.md.html (limited to 'Data/Libraries/Penlight/docs/manual') diff --git a/Data/Libraries/Penlight/docs/manual/01-introduction.md.html b/Data/Libraries/Penlight/docs/manual/01-introduction.md.html new file mode 100644 index 0000000..fe42256 --- /dev/null +++ b/Data/Libraries/Penlight/docs/manual/01-introduction.md.html @@ -0,0 +1,843 @@ + + + + + Penlight Documentation + + + + +
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Introduction

+ +

+

Purpose

+ +

It is often said of Lua that it does not include batteries. That is because the +goal of Lua is to produce a lean expressive language that will be used on all +sorts of machines, (some of which don't even have hierarchical filesystems). The +Lua language is the equivalent of an operating system kernel; the creators of Lua +do not see it as their responsibility to create a full software ecosystem around +the language. That is the role of the community.

+ +

A principle of software design is to recognize common patterns and reuse them. If you find yourself writing things like io.write(string.format('the answer is %d\n',42)) more than a few times, then it becomes useful to define a function printf. This is good, not just because repeated code is harder to maintain, but because such code is easier to read, once people understand your libraries.

+ +

Penlight captures many such code patterns, so that the intent of your code becomes clearer. For instance, a Lua idiom to copy a table is {unpack(t)}, but this will only work for 'small' tables (for a given value of 'small') so it is not very robust. Also, the intent is not clear. So tablex.deepcopy is provided, which will also copy nested tables and associated metatables, so it can be used to clone complex objects.

+ +

The default error handling policy follows that of the Lua standard libraries: if an argument is the wrong type, then an error will be thrown, but otherwise we return nil,message if there is a problem. There are some exceptions; functions like input.fields default to shutting down the program immediately with a useful message. This is more appropriate behaviour for a script than providing a stack trace. (However, this default can be changed.) The lexer functions always throw errors, to simplify coding, and so should be wrapped in pcall.

+ +

If you are used to Python conventions, please note that all indices consistently +start at 1.

+ +

The Lua function table.foreach has been deprecated in favour of the for in +statement, but such an operation becomes particularly useful with the +higher-order function support in Penlight. Note that tablex.foreach reverses +the order, so that the function is passed the value and then the key. Although +perverse, this matches the intended use better.

+ +

The only important external dependency of Penlight is
LuaFileSystem
(lfs), and if you want dir.copyfile to work cleanly on Windows, you will need
either alien or
LuaJIT as well. (The fallback is to call the equivalent
shell commands.)

+ +

+

To Inject or not to Inject?

+ +

It was realized a long time ago that large programs needed a way to keep names +distinct by putting them into tables (Lua), namespaces (C++) or modules +(Python). It is obviously impossible to run a company where everyone is called +'Bruce', except in Monty Python skits. These 'namespace clashes' are more of a +problem in a simple language like Lua than in C++, because C++ does more +complicated lookup over 'injected namespaces'. However, in a small group of +friends, 'Bruce' is usually unique, so in particular situations it's useful to +drop the formality and not use last names. It depends entirely on what kind of +program you are writing, whether it is a ten line script or a ten thousand line +program.

+ +

So the Penlight library provides the formal way and the informal way, without +imposing any preference. You can do it formally like:

+ + +
+local utils = require 'pl.utils'
+utils.printf("%s\n","hello, world!")
+
+ +

or informally like:

+ + +
+require 'pl'
+utils.printf("%s\n","That feels better")
+
+ +

require 'pl' makes all the separate Penlight modules available, without needing +to require them each individually.

+ +

Generally, the formal way is better when writing modules, since then there are no +global side-effects and the dependencies of your module are made explicit.

+ +

Andrew Starks has contributed another way, which balances nicely between the +formal need to keep the global table uncluttered and the informal need for +convenience. require'pl.import_into' returns a function, which accepts a table +for injecting Penlight into, or if no table is given, it passes back a new one.

+ + +
+local pl = require'pl.import_into'()
+
+ +

The table pl is a 'lazy table' which loads modules as needed, so we can then use pl.utils.printf and so forth, without an explicit require or harming any globals.

+ +

If you are using _ENV with Lua 5.2 to define modules, then here is a way to +make Penlight available within a module:

+ + +
+local _ENV,M = require 'pl.import_into' ()
+
+function answer ()
+    -- all the Penlight modules are available!
+    return pretty.write(utils.split '10 20  30', '')
+end
+
+return M
+
+ +

The default is to put Penlight into _ENV, which has the unintended effect of making it available from the module (much as module(...,package.seeall) does). To satisfy both convenience and safety, you may pass true to this function, and then the module M is not the same as _ENV, but only contains the exported functions.

+ +

Otherwise, Penlight will not bring functions into the global table, or clobber standard tables like 'io'. require('pl') will bring tables like 'utils', 'tablex', etc. into the global table if they are used. This 'load-on-demand' strategy ensures that the whole kitchen sink is not loaded up front, so this method is as efficient as explicitly loading required modules.

+ +

You have an option to bring the pl.stringx methods into the standard string table. All strings have a metatable that allows for automatic lookup in string, so we can say s:upper(). Importing stringx allows for its functions to also be called as methods: s:strip(), etc.:

+ + +
+require 'pl'
+stringx.import()
+
+ +

or, more explicitly:

+ + +
+require('pl.stringx').import()
+
+ +

A more delicate operation is importing tables into the local environment. This is +convenient when the context makes the meaning of a name very clear:

+ + +
+> require 'pl'
+> utils.import(math)
+> = sin(1.2)
+0.93203908596723
+
+ +

utils.import can also be passed a module name as a string, which is first +required and then imported. If used in a module, import will bring the symbols +into the module context.

+ +

Keeping the global scope simple is very necessary with dynamic languages. Using +global variables in a big program is always asking for trouble, especially since +you do not have the spell-checking provided by a compiler. The pl.strict +module enforces a simple rule: globals must be 'declared'. This means that they +must be assigned before use; assigning to nil is sufficient.

+ + +
+> require 'pl.strict'
+> print(x)
+stdin:1: variable 'x' is not declared
+> x = nil
+> print(x)
+nil
+
+ +

The strict module provided by Penlight is compatible with the 'load-on-demand' scheme used by require 'pl'.

+ +

strict also disallows assignment to global variables, except in the main +program. Generally, modules have no business messing with global scope; if you +must do it, then use a call to rawset. Similarly, if you have to check for the +existence of a global, use rawget.

+ +

If you wish to enforce strictness globally, then just add require 'pl.strict' +at the end of pl/init.lua, otherwise call it from your main program.

+ +

As from 1.1.0, this module provides a strict.module function which creates (or +modifies) modules so that accessing an unknown function or field causes an error.

+ +

For example,

+ + +
+-- mymod.lua
+local strict = require 'pl.strict'
+local M = strict.module (...)
+
+function M.answer ()
+    return 42
+end
+
+return M
+
+ +

If you were to accidentally type mymod.Answer(), then you would get a runtime error: "variable 'Answer' is not declared in 'mymod'".

+ +

This can be applied to existing modules. You may desire to have the same level +of checking for the Lua standard libraries:

+ + +
+strict.make_all_strict(_G)
+
+ +

Thereafter a typo such as math.cosine will give you an explicit error, rather +than merely returning a nil that will cause problems later.

+ +

+

What are function arguments in Penlight?

+ +

Many functions in Penlight themselves take function arguments, like map which applies a function to a list, element by element. You can use existing functions, like math.max, anonymous functions (like function(x,y) return x > y end), or operations by name (e.g. '*' or '..'). The module pl.operator exports all the standard Lua operations, like the Python module of the same name. Penlight allows these to be referred to by name, so operator.gt can be more concisely expressed as '>'.

+ +

Note that the map functions pass any extra arguments to the function, so we can +have ls:filter('>',0), which is a shortcut for +ls:filter(function(x) return x > 0 end).

+ +

Finally, pl.func supports placeholder expressions in the Boost lambda style, so that an anonymous function to multiply the two arguments can be expressed as _1*_2.

+ +

To use them directly, note that all function arguments in Penlight go through +utils.function_arg. pl.func registers itself with this function, so that you +can directly use placeholder expressions with standard methods:

+ + +
+> _1 = func._1
+> = List{10,20,30}:map(_1+1)
+{11,21,31}
+
+ +

Another option for short anonymous functions is provided by +utils.string_lambda; this is invoked automatically:

+ + +
+> = List{10,20,30}:map '|x| x + 1'
+{11,21,31}
+
+ +

+

Pros and Cons of Loopless Programming

+ +

The standard loops-and-ifs 'imperative' style of programming is dominant, and +often seems to be the 'natural' way of telling a machine what to do. It is in +fact very much how the machine does things, but we need to take a step back and +find ways of expressing solutions in a higher-level way. For instance, applying +a function to all elements of a list is a common operation:

+ + +
+local res = {}
+for i = 1,#ls do
+    res[i] = fun(ls[i])
+end
+
+ +

This can be efficiently and succinctly expressed as ls:map(fun). Not only is there less typing but the intention of the code is clearer. If readers of your code spend too much time trying to guess your intention by analyzing your loops, then you have failed to express yourself clearly. Similarly, ls:filter('>',0) will give you all the values in a list greater than zero. (Of course, if you don't feel like using List, or have non-list-like tables, then pl.tablex offers the same facilities. In fact, the List methods are implemented using tablex functions.)

+ +

A common observation is that loopless programming is less efficient, particularly in the way it uses memory. ls1:map2('*',ls2):reduce '+' will give you the dot product of two lists, but an unnecessary temporary list is created. But efficiency is relative to the actual situation; it may turn out to be fast enough, or the code may not appear in any crucial inner loops, and so on.

+ +

Writing loops is 'error-prone and tedious', as Stroustrup says. But any half-decent editor can be taught to do much of that typing for you. The question should actually be: is it tedious to read loops? As with natural language, programmers tend to read chunks at a time. A for-loop causes no surprise, and probably little brain activity. One argument for loopless programming is that the loops you do write stand out more, and signal 'something different happening here'. It should not be an all-or-nothing thing, since most programs require a mixture of idioms that suit the problem. Some languages (like APL) do nearly everything with map and reduce operations on arrays, and so solutions can sometimes seem forced. Wisdom is knowing when a particular idiom makes a particular problem easy to solve and the solution easy to explain afterwards.

+ +

+

Generally useful functions.

+ +

The function printf discussed earlier is included in pl.utils because it +makes properly formatted output easier. (There is an equivalent fprintf which +also takes a file object parameter, just like the C function.)

+ +

Splitting a string using a delimiter is a fairly common operation, hence split.

+ +

Utility functions like is_type help with identifying what +kind of animal you are dealing with. +The Lua type function handles the basic types, but can't distinguish between +different kinds of objects, which are all tables. So is_type handles both +cases, like is_type(s,"string") and is_type(ls,List).

+ +

A common pattern when working with Lua varargs is capturing all the arguments in +a table:

+ + +
+function t(...)
+    local args = {...}
+    ...
+end
+
+ +

But this will bite you someday when nil is one of the arguments, since this will put a 'hole' in your table. In particular, #args will only give you the size up to the nil value. Hence the need for table.pack - this is a new Lua 5.2 function which Penlight also defines for Lua 5.1.

+ + +
+function t(...)
+    local args = table.pack(...)
+    for i = 1,args.n do
+      ...
+    end
+end
+
+ +

The 'memoize' pattern occurs when you have a function which is expensive to call, +but will always return the same value subsequently. utils.memoize is given a +function, and returns another function. This calls the function the first time, +saves the value for that argument, and thereafter for that argument returns the +saved value. This is a more flexible alternative to building a table of values +upfront, since in general you won't know what values are needed.

+ + +
+sum = utils.memoize(function(n)
+    local sum = 0
+    for i = 1,n do sum = sum + i end
+    return sum
+end)
+...
+s = sum(1e8) --takes time!
+...
+s = sum(1e8) --returned saved value!
+
+ +

Penlight is fully compatible with Lua 5.1, 5.2 and LuaJIT 2. To ensure this, utils also defines the global Lua 5.2 load function as utils.load, which takes the following arguments:

+ +
    +
  • the input (either a string or a function)
  • the source name used in debug information
  • the mode, a string that can have either or both of 'b' or 't', depending on whether the source is a binary chunk or text code (default is 'bt')
  • the environment for the compiled chunk
+ +

Using utils.load should reduce the need to call the deprecated function setfenv, +and make your Lua 5.1 code 5.2-friendly.

+ +

The utils module exports getfenv and setfenv for +Lua 5.2 as well, based on code by Sergey Rozhenko. Note that these functions can fail +for functions which don't access any globals.

+ +

+

Application Support

+ +

app.parse_args is a simple command-line argument parser. If called without any arguments, it tries to use the global arg array. It returns the flags (options beginning with '-') as a table of name/value pairs, and the arguments as an array. It knows about long GNU-style flag names, e.g. --value, and groups of short flags are understood, so that -ab is short for -a -b. The flags result would then look like {value=true,a=true,b=true}.

+ +

Flags may take values. The command-line --value=open -n10 would result in +{value='open',n='10'}; generally you can use '=' or ':' to separate the flag +from its value, except in the special case where a short flag is followed by an +integer. Or you may specify upfront that some flags have associated values, and +then the values will follow the flag.

+ + +
+> require 'pl'
+> flags,args = app.parse_args({'-o','fred','-n10','fred.txt'},{o=true})
+> pretty.dump(flags)
+{o='fred',n='10'}
+
+ +

parse_args is not intelligent or psychic; it will not convert any flag values +or arguments for you, or raise errors. For that, have a look at +Lapp.

+ +

An application which consists of several files usually cannot use require to +load files in the same directory as the main script. app.require_here() +ensures that the Lua module path is modified so that files found locally are +found first. In the examples directory, test-symbols.lua uses this function +to ensure that it can find symbols.lua even if it is not run from this directory.

+ +

app.appfile will create a filename that your application can use to store its +private data, based on the script name. For example, app.appfile "test.txt" +from a script called testapp.lua produces the following file on my Windows +machine:

+ +
C:\Documents and Settings\SJDonova\.testapp\test.txt
+
+ + +

and the equivalent on my Linux machine:

+ +
/home/sdonovan/.testapp/test.txt
+
+ + +

If .testapp does not exist, it will be created.

+ +

Penlight makes it convenient to save application data in Lua format. You can use +pretty.dump(t,file) to write a Lua table in a human-readable form to a file, +and pretty.read(file.read(file)) to generate the table again, using the +pretty module.

+ + +

+

Simplifying Object-Oriented Programming in Lua

+ +

Lua is similar to JavaScript in that the concept of class is not directly supported by the language. In fact, Lua has a very general mechanism for extending the behaviour of tables which makes it straightforward to implement classes. A table's behaviour is controlled by its metatable. If that metatable has an __index function or table, this will handle looking up anything which is not found in the original table. A class is just a table with an __index key pointing to itself. Creating an object involves making a table and setting its metatable to the class; then when handling obj.fun, Lua first looks up fun in the table obj, and if not found it looks it up in the class. obj:fun(a) is just short for obj.fun(obj,a). So with the metatable mechanism and this bit of syntactic sugar, it is straightforward to implement classic object orientation.

+ + +
+-- animal.lua
+
+class = require 'pl.class'
+
+class.Animal()
+
+function Animal:_init(name)
+    self.name = name
+end
+
+function Animal:__tostring()
+  return self.name..': '..self:speak()
+end
+
+class.Dog(Animal)
+
+function Dog:speak()
+  return 'bark'
+end
+
+class.Cat(Animal)
+
+function Cat:_init(name,breed)
+    self:super(name)  -- must init base!
+    self.breed = breed
+end
+
+function Cat:speak()
+  return 'meow'
+end
+
+class.Lion(Cat)
+
+function Lion:speak()
+  return 'roar'
+end
+
+fido = Dog('Fido')
+felix = Cat('Felix','Tabby')
+leo = Lion('Leo','African')
+
+$ lua -i animal.lua
+> = fido,felix,leo
+Fido: bark      Felix: meow     Leo: roar
+> = leo:is_a(Animal)
+true
+> = leo:is_a(Dog)
+false
+> = leo:is_a(Cat)
+true
+
+ +

All Animal does is define __tostring, which Lua will use whenever a string representation of the object is needed. In turn, this relies on speak, which is not defined. So it's what C++ people would call an abstract base class; the specific derived classes like Dog define speak. Please note that if derived classes have their own constructors, they must explicitly call the constructor of their base class; this is conveniently available as the super method.

+ +

Note that (as always) there are multiple ways to implement OOP in Lua; this method +uses the classic 'a class is the __index of its objects' but does 'fat inheritance'; +methods from the base class are copied into the new class. The advantage of this is +that you are not penalized for long inheritance chains, for the price of larger classes, +but generally objects outnumber classes! (If not, something odd is going on with your design.)

+ +

All such objects will have a is_a method, which looks up the inheritance chain +to find a match. Another form is class_of, which can be safely called on all +objects, so instead of leo:is_a(Animal) one can say Animal:class_of(leo).

+ +

There are two ways to define a class, either class.Name() or Name = class(); +both work identically, except that the first form will always put the class in +the current environment (whether global or module); the second form provides more +flexibility about where to store the class. The first form does name the class +by setting the _name field, which can be useful in identifying the objects of +this type later. This session illustrates the usefulness of having named classes, +if no __tostring method is explicitly defined.

+ + +
+> class.Fred()
+> a = Fred()
+> = a
+Fred: 00459330
+> Alice = class()
+> b = Alice()
+> = b
+table: 00459AE8
+> Alice._name = 'Alice'
+> = b
+Alice: 00459AE8
+
+ +

So Alice = class(); Alice._name = 'Alice' is exactly the same as class.Alice().

+ +

This useful notation is borrowed from Hugo Etchegoyen's +classlib which further +extends this concept to allow for multiple inheritance. Notice that the +more convenient form puts the class name in the current environment! That is, +you may use it safely within modules using the old-fashioned module() +or the new _ENV mechanism.

+ +

There is always more than one way of doing things in Lua; some may prefer this +style for creating classes:

+ + +
+local class = require 'pl.class'
+
+class.Named {
+    _init = function(self,name)
+        self.name = name
+    end;
+
+    __tostring = function(self)
+        return 'boo '..self.name
+    end;
+}
+
+b = Named 'dog'
+print(b)
+--> boo dog
+
+ +

Note that you have to explicitly declare self and end each function definition +with a semi-colon or comma, since this is a Lua table. To inherit from a base class, +set the special field _base to the class in this table.

+ +

Penlight provides a number of useful classes; there is List, which is a Lua clone of the standard Python list object, and Set which represents sets. There are three kinds of map defined: Map, MultiMap (where a key may have multiple values) and OrderedMap (where the order of insertion is remembered). There is nothing special about these classes and you may inherit from them.

+ +

A powerful thing about dynamic languages is that you can redefine existing classes and functions, which is often called 'monkey patching'. It's entertaining and convenient, but ultimately anti-social; you may modify List, but then any other modules using this shared resource can no longer be sure about its behaviour. (This is why you must say stringx.import() explicitly if you want the extended string methods - it would be a bad default.) Lua is particularly open to modification, but the community is not as tolerant of monkey-patching as, say, the Ruby community. Want to add some new methods to List? Cool, but that's what subclassing is for.

+ + +
+class.Strings(List)
+
+function Strings:my_method()
+...
+end
+
+ +

It's definitely more useful to define exactly how your objects behave +in unknown conditions. All classes have a catch method you can use to set +a handler for unknown lookups; the function you pass looks exactly like the +__index metamethod.

+ + +
+Strings:catch(function(self,name)
+    return function() error("no such method "..name,2) end
+end)
+
+ +

In this case we're just customizing the error message, but +creative things can be done. Consider this code from test-vector.lua:

+ + +
+Strings:catch(List.default_map_with(string))
+
+ls = Strings{'one','two','three'}
+asserteq(ls:upper(),{'ONE','TWO','THREE'})
+asserteq(ls:sub(1,2),{'on','tw','th'})
+
+ +

So we've converted an unknown method invocation into a map using the function of that name found in string. So for a Vector (which is a specialization of List for numbers) it makes sense to make math the default map, so that v:sin() makes sense.

+ +

Note that map operations return an object of the same type - this is often called covariance. So ls:upper() itself returns a Strings object.

+ +

This is not always what you want, but objects can always be cast to the desired type. +(cast doesn't create a new object, but returns the object passed.)

+ + +
+local sizes = ls:map '#'
+asserteq(sizes, {3,3,5})
+asserteq(utils.type(sizes),'Strings')
+asserteq(sizes:is_a(Strings),true)
+sizes = Vector:cast(sizes)
+asserteq(utils.type(sizes),'Vector')
+asserteq(sizes+1,{4,4,6})
+
+ +

About utils.type: it can only return a string for a class type if that class does +in fact have a _name field.

+ + +

Properties are a useful object-oriented pattern. We wish to control access to a +field, but don't wish to force the user of the class to say obj:get_field() +etc. This excerpt from tests/test-class.lua shows how it is done:

+ + + +
+local MyProps = class(class.properties)
+local setted_a, got_b
+
+function MyProps:_init ()
+    self._a = 1
+    self._b = 2
+end
+
+function MyProps:set_a (v)
+    setted_a = true
+    self._a = v
+end
+
+function MyProps:get_b ()
+    got_b = true
+    return self._b
+end
+
+local mp = MyProps()
+
+mp.a = 10
+
+asserteq(mp.a,10)
+asserteq(mp.b,2)
+asserteq(setted_a and got_b, true)
+
+ +

The convention is that the internal field name is prefixed with an underscore; when reading mp.a, a check is first made for an explicit getter get_a, and only then for _a. Similarly, writing mp.a causes the setter set_a to be used.

+ +

This is cool behaviour, but like much Lua metaprogramming, it is not free. Method lookup on such objects goes through __index as before, but now __index is a function which has to explicitly look up methods in the class before doing any property indexing, which is not going to be as fast as field lookup. If, however, your accessors actually do non-trivial things, then the extra overhead could be worth it.

+ +

This is not really intended for access control because external code can write +to mp._a directly. It is possible to have this kind of control in Lua, but it +again comes with run-time costs.

+ + +
+
+
+generated by LDoc 1.4.6 +
+
+ + diff --git a/Data/Libraries/Penlight/docs/manual/02-arrays.md.html b/Data/Libraries/Penlight/docs/manual/02-arrays.md.html new file mode 100644 index 0000000..28dc6a2 --- /dev/null +++ b/Data/Libraries/Penlight/docs/manual/02-arrays.md.html @@ -0,0 +1,914 @@ + + + + + Penlight Documentation + + + + +
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Tables and Arrays

+ +

+ +

+

Python-style Lists

+ +

One of the elegant things about Lua is that tables do the job of both lists and +dicts (as called in Python) or vectors and maps, (as called in C++), and they do +it efficiently. However, if we are dealing with 'tables with numerical indices' +we may as well call them lists and look for operations which particularly make +sense for lists. The Penlight List class was originally written by Nick Trout +for Lua 5.0, and translated to 5.1 and extended by myself. It seemed that +borrowing from Python was a good idea, and this eventually grew into Penlight.

+ +

Here is an example showing List in action; it redefines __tostring, so that +it can print itself out more sensibly:

+ + +
+> List = require 'pl.List'  --> automatic with require 'pl' <---
+> l = List()
+> l:append(10)
+> l:append(20)
+> = l
+{10,20}
+> l:extend {30,40}
+{10,20,30,40}
+> l:insert(1,5)
+{5,10,20,30,40}
+> = l:pop()
+40
+> = l
+{5,10,20,30}
+> = l:index(30)
+4
+> = l:contains(30)
+true
+> = l:reverse()  ---> note: doesn't make a copy!
+{30,20,10,5}
+
+ +

Although methods like sort and reverse operate in-place and change the list, +they do return the original list. This makes it possible to do method chaining, +like ls = ls:append(10):append(20):reverse():append(1). But (and this is an +important but) no extra copy is made, so ls does not change identity. List +objects (like tables) are mutable, unlike strings. If you want a copy of a +list, then List(ls) will do the job, i.e. it acts like a copy constructor. +However, if passed any other table, List will just set the metatable of the +table and not make a copy.

+ +

A particular feature of Python lists is slicing. This is fully supported in +this version of List, except we use 1-based indexing. So List.slice works +rather like string.sub:

+ + +
+> l = List {10,20,30,40}
+> = l:slice(1,1)  ---> note: creates a new list!
+{10}
+> = l:slice(2,2)
+{20}
+> = l:slice(2,3)
+{20,30}
+> = l:slice(2,-2)
+{20,30}
+> = l:slice_assign(2,2,{21,22,23})
+{10,21,22,23,30,40}
+> = l:chop(1,1)
+{21,22,23,30,40}
+
+ +

Functions like slice_assign and chop modify the list; the first is equivalent to Python's l[i1:i2] = seq and the second to del l[i1:i2].

+ +

List objects are ultimately just Lua 'list-like' tables, but they have extra operations defined on them, such as equality and concatenation. For regular tables, equality is only true if the two tables are identical objects, whereas two lists are equal if they have the same contents, i.e. that l1[i]==l2[i] for all elements.

+ + +
+> l1 = List {1,2,3}
+> l2 = List {1,2,3}
+> = l1 == l2
+true
+> = l1..l2
+{1,2,3,1,2,3}
+
+ +

The List constructor can be passed a function. If so, it's assumed that this is +an iterator function that can be repeatedly called to generate a sequence. One +such function is io.lines; the following short, intense little script counts +the number of lines in standard input:

+ + +
+-- linecount.lua
+require 'pl'
+ls = List(io.lines())
+print(#ls)
+
+ +

List.iterate captures what List considers a sequence. In particular, it can +also iterate over all 'characters' in a string:

+ + +
+> for ch in List.iterate 'help' do io.write(ch,' ') end
+h e l p >
+
+ +

Since the function iterate is used internally by the List constructor, +strings can be made into lists of character strings very easily.

+ +

There are a number of operations that go beyond the standard Python methods. For +instance, you can partition a list into a table of sublists using a function. +In the simplest form, you use a predicate (a function returning a boolean value) +to partition the list into two lists, one of elements matching and another of +elements not matching. But you can use any function; if we use type then the +keys will be the standard Lua type names.

+ + +
+> ls = List{1,2,3,4}
+> ops = require 'pl.operator'
+> ls:partition(function(x) return x > 2 end)
+{false={1,2},true={3,4}}
+> ls = List{'one',math.sin,List{1},10,20,List{1,2}}
+> ls:partition(type)
+{function={function: 00369110},string={one},number={10,20},table={{1},{1,2}}}
+
+ +

This is one List method which returns a table which is not a List. Bear in +mind that you can always call a List method on a plain table argument, so +List.partition(t,type) works as expected. But these functions will only operate +on the array part of the table.

+ +

The 'nominal' type of the returned table is pl.Multimap, which describes a mapping +between keys and multiple values. This does not mean that pl.Multimap is automatically +loaded whenever you use partition (or List for that matter); this is one of the +standard metatables which are only filled out when the appropriate module is loaded. +This allows tables to be tagged appropriately without causing excessive coupling.

+ +

Stacks occur everywhere in computing. List supports stack-like operations; +there is already pop (remove and return last value) and append acts like +push (add a value to the end). push is provided as an alias for append, and +the other stack operation (size) is simply the size operator #. Queues can +also be implemented; you use pop to take values out of the queue, and put to +insert a value at the beginning.
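The stack and queue idioms above can be sketched like this (assumes pl.List is installed):

```lua
local List = require 'pl.List'

-- stack: push (alias for append) and pop both work on the end
local st = List()
st:push(10)
st:push(20)
assert(st:pop() == 20)   -- LIFO order
assert(#st == 1)         -- size is simply the # operator

-- queue: put inserts at the beginning, pop removes from the end
local q = List()
q:put 'first'
q:put 'second'
assert(q:pop() == 'first')  -- FIFO order
```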

+ +

You may derive classes from List, and since the list-returning methods +are covariant, the result of slice etc will return lists of the derived type, +not List. For instance, consider the specialization of a List type that contains +numbers in tests/test-list.lua:

+ + +
+n1 = NA{10,20,30}
+n2 = NA{1,2,3}
+ns = n1 + 2*n2
+asserteq(ns,{12,24,36})
+min,max = ns:slice(1,2):minmax()
+asserteq(T(min,max),T(12,24))
+asserteq(n1:normalize():sum(),1,1e-8)
+
+ +

+

Map and Set classes

+ +

The Map class exposes what Python would call a 'dict' interface, and accesses +the hash part of the table. The name 'Map' is used to emphasize the interface, +not the implementation; it is an object which maps keys onto values; m['alice'] +or the equivalent m.alice is the access operation. This class also provides +explicit set and get methods, which are trivial for regular maps but get +interesting when Map is subclassed. The other operation is update, which +extends a map by copying the keys and values from another table, perhaps +overwriting existing keys:

+ + +
+> Map = require 'pl.Map'
+> m = Map{one=1,two=2}
+> m:update {three=3,four=4,two=20}
+> = m == M{one=1,two=20,three=3,four=4}
+true
+
+ +

The method values returns a list of the values, and keys returns a list of +the keys; there is no guarantee of order. getvalues is given a list of keys and +returns a list of values associated with these keys:

+ + +
+> m = Map{one=1,two=2,three=3}
+> = m:getvalues {'one','three'}
+{1,3}
+> = m:getvalues(m:keys()) == m:values()
+true
+
+ +

When querying the value of a Map, it is best to use the get method:

+ + +
+> print(m:get 'one', m:get 'two')
+1     2
+
+ +

The reason is that m[key] can be ambiguous; due to the current implementation, +m["get"] will always succeed, because if a value is not present in the map, it +will be looked up in the Map metatable, which contains a method get. There is +currently no simple solution to this annoying restriction.

+ +

There are some useful classes which inherit from Map. An OrderedMap behaves +like a Map but keeps its keys in order if you use its set method to add keys +and values. Like all the 'container' classes in Penlight, it defines an iter +method for iterating over its values; this will return the keys and values in the +order of insertion; the keys and values methods likewise.

+ +

A MultiMap allows multiple values to be associated with a given key. So set +(as before) takes a key and a value, but calling it with the same key and a +different value does not overwrite but adds a new value. get (or using []) +will return a list of values.
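A minimal sketch of that behaviour (assuming pl.MultiMap is installed):

```lua
local MultiMap = require 'pl.MultiMap'

local mm = MultiMap()
mm:set('dog', 'bark')
mm:set('dog', 'growl')  -- same key again: adds a value, does not overwrite
-- mm:get 'dog' (or mm['dog']) now yields the list {'bark','growl'}
```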

+ +

A Set can be seen as a special kind of Map, where all the values are true, +the keys are the values, and the order is not important. So in this case +Set.values is defined to return a list of the keys. Sets can display +themselves, and the basic operations like union (+) and intersection (*) +are defined.

+ + +
+> Set = require 'pl.Set'
+> = Set{'one','two'} == Set{'two','one'}
+true
+> fruit = Set{'apple','banana','orange'}
+> = fruit['banana']
+true
+> = fruit['hazelnut']
+nil
+> = fruit:values()
+{apple,orange,banana}
+> colours = Set{'red','orange','green','blue'}
+> = fruit,colours
+[apple,orange,banana]   [blue,green,orange,red]
+> = fruit+colours
+[blue,green,apple,red,orange,banana]
+> = fruit*colours
+[orange]
+
+ +

There are also the functions Set.difference and Set.symmetric_difference. The +first answers the question 'what fruits are not colours?' and the second 'what +are fruits and colours but not both?'

+ + +
+> = fruit - colours
+[apple,banana]
+> = fruit ^ colours
+[blue,green,apple,red,banana]
+
+ +

Adding elements to a set is simply fruit['peach'] = true and removing is +fruit['apple'] = nil . To make this simplicity work properly, the Set class has no +methods - either you use the operator forms or explicitly use Set.intersect +etc. In this way we avoid the ambiguity that plagues Map.
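So set manipulation stays plain-table simple; a sketch (assuming pl.Set):

```lua
local Set = require 'pl.Set'

local fruit = Set{'apple','banana'}
fruit['peach'] = true    -- add an element
fruit['apple'] = nil     -- remove an element
-- operations are operators or module functions, never methods:
local tropical = fruit * Set{'banana','mango'}  -- intersection
assert(tropical == Set{'banana'})
```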

+ + +

(See pl.Map and pl.Set)

+ +

+

Useful Operations on Tables

+ + +

Some notes on terminology: Lua tables are usually list-like (like an array) or +map-like (like an associative array or dict); they can of course have a +list-like and a map-like part. Some of the table operations only make sense for +list-like tables, and some only for map-like tables. (The usual Lua terminology +is the array part and the hash part of the table, which reflects the actual +implementation used; it is more accurate to say that a Lua table is an +associative map which happens to be particularly efficient at acting like an +array.)

+ +

The functions provided in table provide all the basic manipulations on Lua +tables, but as we saw with the List class, it is useful to build higher-level +operations on top of those functions. For instance, to copy a table involves this +kind of loop:

+ + +
+local res = {}
+for k,v in pairs(T) do
+    res[k] = v
+end
+return res
+
+ +

The tablex module provides this as copy, which does a shallow copy of a +table. There is also deepcopy which goes further than a simple loop in two +ways; first, it also gives the copy the same metatable as the original (so it can +copy objects like List above) and any nested tables will also be copied, to +arbitrary depth. There is also icopy which operates on list-like tables, where +you can optionally set the start index of the source and destination as well. +It ensures that any left-over elements will be deleted:

+ + +
+asserteq(icopy({1,2,3,4,5,6},{20,30}),{20,30})   -- start at 1
+asserteq(icopy({1,2,3,4,5,6},{20,30},2),{1,20,30}) -- start at 2
+asserteq(icopy({1,2,3,4,5,6},{20,30},2,2),{1,30}) -- start at 2, copy from 2
+
+ +

(This code from the tablex test module shows the use of pl.test.asserteq)

+ +

Whereas, move overwrites but does not delete the rest of the destination:

+ + +
+asserteq(move({1,2,3,4,5,6},{20,30}),{20,30,3,4,5,6})
+asserteq(move({1,2,3,4,5,6},{20,30},2),{1,20,30,4,5,6})
+asserteq(move({1,2,3,4,5,6},{20,30},2,2),{1,30,3,4,5,6})
+
+ +

(The difference is somewhat like that between C's strcpy and memmove.)

+ +

To summarize, use copy or deepcopy to make a copy of an arbitrary table. To +copy into a map-like table, use update; to copy into a list-like table use +icopy, and move if you are updating a range in the destination.

+ +

To complete this set of operations, there is insertvalues which works like +table.insert except that one provides a table of values to be inserted, and +removevalues which removes a range of values.

+ + +
+asserteq(insertvalues({1,2,3,4},2,{20,30}),{1,20,30,2,3,4})
+asserteq(insertvalues({1,2},{3,4}),{1,2,3,4})
+
+ +

Another example:

+ + +
+> T = require 'pl.tablex'
+> t = {10,20,30,40}
+> = T.removevalues(t,2,3)
+{10,40}
+> = T.insertvalues(t,2,{20,30})
+{10,20,30,40}
+
+ +

In a similar spirit to deepcopy, deepcompare will take two tables and return +true only if they have exactly the same values and structure.

+ + +
+> t1 = {1,{2,3},4}
+> t2 = deepcopy(t1)
+> = t1 == t2
+false
+> = deepcompare(t1,t2)
+true
+
+ +

find will return the index of a given value in a list-like table. Note that +like string.find you can specify an index to start searching, so that all +instances can be found. There is an optional fourth argument, which makes the +search start at the end and go backwards, so we could define rfind like so:

+ + +
+function rfind(t,val,istart)
+    return tablex.find(t,val,istart,true)
+end
+
+ +

find does a linear search, so it can slow down code that depends on it. If +efficiency is required for large tables, consider using an index map. +index_map will return a table where the keys are the original values of the +list, and the associated values are the indices. (It is almost exactly the +representation needed for a set.)

+ + +
+> t = {'one','two','three'}
+> = tablex.find(t,'two')
+2
+> = tablex.find(t,'four')
+nil
+> il = tablex.index_map(t)
+> = il['two']
+2
+> = il.two
+2
+
+ +

A version of index_map called makeset is also provided, where the values are +just true. This is useful because two such sets can be compared for equality +using deepcompare:

+ + +
+> = deepcompare(makeset {1,2,3},makeset {2,1,3})
+true
+
+ +

Consider the problem of determining the new employees that have joined in a +period. Assume we have two files of employee names:

+ + +
+(last-month.txt)
+smith,john
+brady,maureen
+mongale,thabo
+
+(this-month.txt)
+smith,john
+smit,johan
+brady,maureen
+mogale,thabo
+van der Merwe,Piet
+
+ +

To find out differences, just make the employee lists into sets, like so:

+ + +
+require 'pl'
+
+function read_employees(file)
+  local ls = List(io.lines(file)) -- a list of employees
+  return tablex.makeset(ls)
+end
+
+last = read_employees 'last-month.txt'
+this = read_employees 'this-month.txt'
+
+-- who is in this but not in last?
+diff = tablex.difference(this,last)
+
+-- in a set, the keys are the values...
+for e in pairs(diff) do print(e) end
+
+--  *output*
+-- van der Merwe,Piet
+-- smit,johan
+
+ +

The difference operation is easy to write and read:

+ + +
+for e in pairs(this) do
+  if not last[e] then
+    print(e)
+  end
+end
+
+ +

The point of using difference here is not that it is a tricky thing to code; it is that you +are stating your intentions clearly to other readers of your code. (And naturally +to your future self, in six months time.)

+ +

find_if will search a table using a function. The optional third argument is a +value which will be passed as a second argument to the function. pl.operator +provides the Lua operators conveniently wrapped as functions, so the basic +comparison functions are available:

+ + +
+> ops = require 'pl.operator'
+> = tablex.find_if({10,20,30,40},ops.gt,20)
+3       true
+
+ +

Note that find_if will also return the actual value returned by the function, +which of course is usually just true for a boolean function, but any value +which is not nil and not false can be usefully passed back.

+ +

deepcompare does a thorough recursive comparison, but otherwise using the +default equality operator. compare allows you to specify exactly what function +to use when comparing two list-like tables, and compare_no_order is true if +they contain exactly the same elements. Do note that the latter does not need an +explicit comparison function - in this case the implementation is actually to +compare the two sets, as above:

+ + +
+> = compare_no_order({1,2,3},{2,1,3})
+true
+> = compare_no_order({1,2,3},{2,1,3},'==')
+true
+
+ +

(Note the special string '==' above; instead of saying ops.gt or ops.eq we +can use the strings '>' or '==' respectively.)

+ +

sort and sortv return iterators that will iterate through the +sorted elements of a table. sort iterates by sorted key order, and +sortv iterates by sorted value order. For example, given a table +with names and ages, it is trivial to iterate over the elements:

+ + +
+> t = {john=27,jane=31,mary=24}
+> for name,age in tablex.sort(t) do print(name,age) end
+jane    31
+john    27
+mary    24
+> for name,age in tablex.sortv(t) do print(name,age) end
+mary    24
+john    27
+jane    31
+
+ +

There are several ways to merge tables in PL. If they are list-like, then see the +operations defined by pl.List, like concatenation. If they are map-like, then +merge provides two basic operations. If the third arg is false, then the result +only contains the keys that are in common between the two tables, and if true, +then the result contains all the keys of both tables. These are in fact +generalized set union and intersection operations:

+ + +
+> S1 = {john=27,jane=31,mary=24}
+> S2 = {jane=31,jones=50}
+> = tablex.merge(S1, S2, false)
+{jane=31}
+> = tablex.merge(S1, S2, true)
+{mary=24,jane=31,john=27,jones=50}
+
+ +

When working with tables, you will often find yourself writing loops like in the +first example. Loops are second nature to programmers, but they are often not the +most elegant and self-describing way of expressing an operation. Consider the +map function, which creates a new table by applying a function to each element +of the original:

+ + +
+> = map(math.sin, {1,2,3,4})
+{  0.84,  0.91,  0.14, -0.76}
+> = map(function(x) return x*x end, {1,2,3,4})
+{1,4,9,16}
+
+ +

map saves you from writing a loop, and the resulting code is often clearer, as +well as being shorter. This is not to say that 'loops are bad' (although you will +hear that from some extremists), just that it's good to capture standard +patterns. Then the loops you do write will stand out and acquire more significance.

+ +

pairmap is interesting, because the function works with both the key and the +value.

+ + +
+> t = {fred=10,bonzo=20,alice=4}
+> = pairmap(function(k,v) return v end, t)
+{4,10,20}
+> = pairmap(function(k,v) return k end, t)
+{'alice','fred','bonzo'}
+
+ +

(These are common enough operations that the first is defined as values and the +second as keys.) If the function returns two values, then the second value is +considered to be the new key:

+ + +
+> = pairmap(function(k,v) return v+10, k:upper() end, t)
+{BONZO=30,FRED=20,ALICE=14}
+
+ +

map2 applies a function to two tables:

+ + +
+> map2(ops.add,{1,2},{10,20})
+{11,22}
+> map2('*',{1,2},{10,20})
+{10,40}
+
+ +

The various map operations generate tables; reduce applies a function of two +arguments over a table and returns the result as a scalar:

+ + +
+> reduce ('+', {1,2,3})
+6
+> reduce ('..', {'one','two','three'})
+'onetwothree'
+
+ +

Finally, zip sews different tables together:

+ + +
+> = zip({1,2,3},{10,20,30})
+{{1,10},{2,20},{3,30}}
+
+ +

Browsing through the documentation, you will find that tablex and List share +methods. For instance, tablex.imap and List.map are basically the same +function; they both operate over the array-part of the table and generate another +table. This can also be expressed as a list comprehension C 'f(x) for x' (t) +which makes the operation more explicit. So why are there different ways to do +the same thing? The main reason is that not all tables are Lists: the expression +ls:map('#') will return a list of the lengths of any elements of ls. A list +is a thin wrapper around a table, provided by the metatable List. Sometimes you +may wish to work with ordinary Lua tables; the List interface is not a +compulsory way to use Penlight table operations.
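The equivalence can be seen side by side; a sketch (assuming pl.tablex and pl.List):

```lua
local tablex = require 'pl.tablex'
local List = require 'pl.List'

local t = {'one','two','three'}

-- plain table in, plain table out
local u1 = tablex.imap(string.upper, t)

-- List method: same operation, but returns a List
local u2 = List(t):map(string.upper)

assert(u2 == List(u1))  -- equal contents, compared element-wise
```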

+ +

+

Operations on two-dimensional tables

+ + +

Two-dimensional tables are of course easy to represent in Lua, for instance +{{1,2},{3,4}} where we store rows as subtables and index like so A[col][row]. +This is the common representation used by matrix libraries like +LuaMatrix. pl.array2d does not provide +matrix operations, since that is the job for a specialized library, but rather +provides generalizations of the higher-level operations provided by pl.tablex +for one-dimensional arrays.

+ +

iter is a useful generalization of ipairs. (The extra parameter determines +whether you want the indices as well.)

+ + +
+> a = {{1,2},{3,4}}
+> for i,j,v in array2d.iter(a,true) do print(i,j,v) end
+1       1       1
+1       2       2
+2       1       3
+2       2       4
+
+ +

Note that you can always convert an arbitrary 2D array into a 'list of lists' +with List(tablex.map(List,a))
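For example (a sketch, assuming pl.tablex and pl.List):

```lua
local tablex = require 'pl.tablex'
local List = require 'pl.List'

local a = {{1,2},{3,4}}
-- apply the List constructor to each row, then wrap the result
local ll = List(tablex.map(List, a))
-- each row is now a List, so content equality applies:
assert(ll[1] == List{1,2})
```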

+ +

map will apply a function over all elements (notice that extra arguments can be +provided, so this operation is in effect function(x) return x-1 end)

+ + +
+> array2d.map('-',a,1)
+{{0,1},{2,3}}
+
+ +

2D arrays are stored as an array of rows, but columns can be extracted:

+ + +
+> array2d.column(a,1)
+{1,3}
+
+ +

There are three equivalents to tablex.reduce. You can either reduce along the +rows (which is the most efficient) or reduce along the columns. Either one will +give you a 1D array. And reduce2 will apply two operations: the first one +reduces the rows, and the second reduces the result.

+ + +
+> array2d.reduce_rows('+',a)
+{3,7}
+> array2d.reduce_cols('+',a)
+{4,6}
+> -- same as tablex.reduce('*',array.reduce_rows('+',a))
+> array2d.reduce2('*','+',a)
+21
+
+ +

tablex.map2 applies an operation to two tables, giving another table. +array2d.map2 does this for 2D arrays. Note that you have to provide the rank +of the arrays involved, since it's hard to always correctly deduce this from the +data:

+ + +
+> b = {{10,20},{30,40}}
+> a = {{1,2},{3,4}}
+> = array2d.map2('+',2,2,a,b)  -- two 2D arrays
+{{11,22},{33,44}}
+> = array2d.map2('+',1,2,{10,100},a)  -- 1D, 2D
+{{11,102},{13,104}}
+> = array2d.map2('*',2,1,a,{1,-1})  -- 2D, 1D
+{{1,-2},{3,-4}}
+
+ +

Of course, you are not limited to simple arithmetic. Say we have a 2D array of +strings, and wish to print it out with proper right justification. The first step +is to create all the string lengths by mapping string.len over the array, the +second is to reduce this along the columns using math.max to get maximum column +widths, and last, apply stringx.rjust with these widths.

+ + +
+maxlens = reduce_cols(math.max,map('#',lines))
+lines = map2(stringx.rjust,2,1,lines,maxlens)
+
+ +

There is product which returns the Cartesian product of two 1D arrays. The +result is a 2D array formed from applying the function to all possible pairs from +the two arrays.

+ + +
+> array2d.product('{}',{1,2},{'a','b'})
+{{{1,'b'},{2,'a'}},{{1,'a'},{2,'b'}}}
+
+ +

There is a set of operations which work in-place on 2D arrays. You can +swap_rows and swap_cols; the first really is a simple one-liner, but the idea +here is to give the operation a name. remove_row and remove_col are +generalizations of table.remove. Likewise, extract_rows and extract_cols +are given arrays of indices and discard anything else. So, for instance, +extract_cols(A,{2,4}) will leave just columns 2 and 4 in the array.

+ +

List.slice is often useful on 1D arrays; slice does the same thing, but is +generally given a start (row,column) and an end (row,column).

+ + +
+> A = {{1,2,3},{4,5,6},{7,8,9}}
+> B = slice(A,1,1,2,2)
+> write(B)
+ 1 2
+ 4 5
+> B = slice(A,2,2)
+> write(B,nil,'%4.1f')
+ 5.0 6.0
+ 8.0 9.0
+
+ +

Here write is used to print out an array nicely; the second parameter is nil, +which is the default (stdout) but can be any file object and the third parameter +is an optional format (as used in string.format).

+ +

parse_range will take a spreadsheet range like 'A1:B2' or 'R1C1:R2C2' and +return the range as four numbers, which can be passed to slice. The rule is +that slice will return an array of the appropriate shape depending on the +range; if a range represents a row or a column, the result is 1D, otherwise 2D.
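A sketch of feeding a parsed range straight into slice (names as described above; assumes pl.array2d):

```lua
local array2d = require 'pl.array2d'

local A = {{1,2,3},{4,5,6},{7,8,9}}
-- 'A1:B2' describes rows 1..2, columns 1..2
local r1,c1,r2,c2 = array2d.parse_range 'A1:B2'
local B = array2d.slice(A, r1, c1, r2, c2)
-- B should be the 2D sub-array {{1,2},{4,5}}
```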

+ +

This applies to iter as well, which can also optionally be given a range:

+ + + +
+> for i,j,v in iter(A,true,2,2) do print(i,j,v) end
+2       2       5
+2       3       6
+3       2       8
+3       3       9
+
+ +

new will construct a new 2D array with the given dimensions. You provide an +initial value for the elements, which is interpreted as a function if it's +callable. With L being utils.string_lambda we then have the following way to +make an identity matrix:

+ + +
+asserteq(
+    array.new(3,3,L'|i,j| i==j and 1 or 0'),
+    {{1,0,0},{0,1,0},{0,0,1}}
+)
+
+ +

Please note that most functions in array2d are covariant, that is, they +return an array of the same type as they receive. In particular, any objects +created with data.new or matrix.new will remain data or matrix objects when +reshaped or sliced, etc. Data objects have the array2d functions available as +methods.

+ + + + +
+
+
+generated by LDoc 1.4.6 +
+
+ + diff --git a/Data/Libraries/Penlight/docs/manual/03-strings.md.html b/Data/Libraries/Penlight/docs/manual/03-strings.md.html new file mode 100644 index 0000000..a629192 --- /dev/null +++ b/Data/Libraries/Penlight/docs/manual/03-strings.md.html @@ -0,0 +1,397 @@ + + + + + Penlight Documentation + + + + +
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Strings. Higher-level operations on strings.

+ +

+

Extra String Methods

+ + +

These are convenient borrowings from Python, as described in 3.6.1 of the Python +reference, but note that indices in Lua always begin at one. stringx defines +functions like isalpha and isdigit, which return true if s is only composed +of letters or digits respectively. startswith and endswith are convenient +ways to find substrings. (endswith works as in Python 2.5, so that `f:endswith +{'.bat','.exe','.cmd'}` will be true for any filename which ends with these +extensions.) There are justify methods and whitespace trimming functions like +strip.

+ + +
+> stringx.import()
+> ('bonzo.dog'):endswith {'.dog','.cat'}
+true
+> ('bonzo.txt'):endswith {'.dog','.cat'}
+false
+> ('bonzo.cat'):endswith {'.dog','.cat'}
+true
+> (' stuff'):ljust(20,'+')
+'++++++++++++++ stuff'
+> ('  stuff '):lstrip()
+'stuff '
+> ('  stuff '):rstrip()
+'  stuff'
+> ('  stuff '):strip()
+'stuff'
+> for s in ('one\ntwo\nthree\n'):lines() do print(s) end
+one
+two
+three
+
+ +

Most of these can be fairly easily implemented using the Lua string library, +which is more general and powerful. But they are convenient operations to have +easily at hand. Note that these methods can be injected into the string table if you use +stringx.import, but a simple alias like local stringx = require 'pl.stringx' +is preferable. This is the recommended practice when writing modules for +consumption by other people, since it is bad manners to change the global state +of the rest of the system. Magic may be used for convenience, but there is always +a price.

+ + +

+

String Templates

+ + +

Another borrowing from Python, string templates allow you to substitute values +looked up in a table:

+ + +
+local Template = require ('pl.text').Template
+t = Template('${here} is the $answer')
+print(t:substitute {here = 'Lua', answer = 'best'})
+==>
+Lua is the best
+
+ +

'$ variables' can optionally have curly braces; this form is useful if you are +gluing text together to make variables, e.g. ${prefix}_name_${postfix}. The +substitute method will throw an error if a $ variable is not found in the +table, and the safe_substitute method will not.

+ +

The Lua implementation has an extra method, indent_substitute which is very +useful for inserting blocks of text, because it adjusts indentation. Consider +this example:

+ + +
+-- testtemplate.lua
+local Template = require ('pl.text').Template
+
+t = Template [[
+    for i = 1,#$t do
+        $body
+    end
+]]
+
+body = Template [[
+local row = $t[i]
+for j = 1,#row do
+    fun(row[j])
+end
+]]
+
+print(t:indent_substitute {body=body,t='tbl'})
+
+ +

And the output is:

+ + +
+for i = 1,#tbl do
+    local row = tbl[i]
+    for j = 1,#row do
+        fun(row[j])
+    end
+end
+
+ +

indent_substitute can substitute templates, in which case they themselves +will be substituted using the given table. So in this case, $t was substituted +twice.

+ +

pl.text also has a number of useful functions like dedent, which strips all +the initial indentation from a multiline string. As in Python, this is useful for +preprocessing multiline strings if you like indenting them with your code. The +function wrap is passed a long string (a paragraph) and returns a list of +lines that fit into a desired line width. As an extension, there is also indent +for indenting multiline strings.

+ +

New in Penlight with the 0.9 series is text.format_operator. Calling this +enables Python-style string formatting using the modulo operator %:

+ + +
+> text.format_operator()
+> = '%s[%d]' % {'dog',1}
+dog[1]
+
+ +

So in its simplest form it saves the typing involved with string.format; it +will also expand $ variables using named fields:

+ + +
+> = '$animal[$num]' % {animal='dog',num=1}
+dog[1]
+
+ +

As with stringx.import you have to do this explicitly, since all strings share the same +metatable. But in your own scripts you can feel free to do this.

+ +

+

Another Style of Template

+ +

A new module is template, which is a version of Rici Lake's Lua +Preprocessor. This +allows you to mix Lua code with your templates in a straightforward way. There +are only two rules:

+ +
    +
  • Lines beginning with # are Lua
  • +
  • Otherwise, anything inside $() is a Lua expression.
  • +
+ +

So a template generating an HTML list would look like this:

+ + +
+<ul>
+# for i,val in ipairs(T) do
+<li>$(i) = $(val:upper())</li>
+# end
+</ul>
+
+ +

Assume the text is inside tmpl, then the template can be expanded using:

+ + +
+local template = require 'pl.template'
+local my_env = {
+  ipairs = ipairs,
+  T = {'one','two','three'}
+}
+res = template.substitute(tmpl, my_env)
+
+ +

and we get

+ + +
+<ul>
+<li>1 = ONE</li>
+<li>2 = TWO</li>
+<li>3 = THREE</li>
+</ul>
+
+ +

There is a single function, template.substitute which is passed a template +string and an environment table. This table may contain some special fields, +like \_parent which can be set to a table representing a 'fallback' environment +in case a symbol was not found. \_brackets is usually '()' and \_escape is +usually '#' but it's sometimes necessary to redefine these if the defaults +interfere with the target language - for instance, $(V) has another meaning in +Make, and # means a preprocessor line in C/C++.

+ +

Finally, passing _debug will cause the intermediate +Lua code to be dumped if there's a problem.

+ +

Here is a C code generation example; something that could easily be extended to +be a minimal Lua extension skeleton generator.

+ + +
+local subst = require 'pl.template'.substitute
+
+local templ = [[
+#include <lua.h>
+#include <lauxlib.h>
+#include <lualib.h>
+
+> for _,f in ipairs(mod) do
+static int l_$(f.name) (lua_State *L) {
+
+}
+> end
+
+static const luaL_reg $(mod.name)[] = {
+> for _,f in ipairs(mod) do
+    {"$(f.name)",l_$(f.name)},
+> end
+    {NULL,NULL}
+};
+
+int luaopen_$(mod.name) {
+   luaL_register (L, "$(mod.name)", $(mod.name));
+    return 1;
+}
+]]
+
+print(subst(templ,{
+    _escape = '>',
+    ipairs = ipairs,
+    mod = {
+        name = 'baggins';
+        {name='frodo'},
+        {name='bilbo'}
+    }
+}))
+
+ +

+

File-style I/O on Strings

+ +

pl.stringio provides just three functions; stringio.open is passed a string, +and returns a file-like object for reading. It supports a read method, which +takes the same arguments as standard file objects:

+ + +
+> f = stringio.open 'first line\n10 20 30\n'
+> = f:read()
+first line
+> = f:read('*n','*n','*n')
+10    20    30
+
+ +

lines and seek are also supported.

+ +

stringio.lines is a useful short-cut for iterating over all the lines in a +string.

+ +

stringio.create creates a writeable file-like object. You then use write on +this stream, and finally extract the built string using value. This 'string +builder' pattern is useful for efficiently creating large strings.
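The pattern looks like this (assuming pl.stringio):

```lua
local stringio = require 'pl.stringio'

local f = stringio.create()  -- writeable string stream
f:write('hello ')
f:write('world')
assert(f:value() == 'hello world')  -- extract the built string
```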

+ + + +
+
+
+generated by LDoc 1.4.6 +
+
+ + diff --git a/Data/Libraries/Penlight/docs/manual/04-paths.md.html b/Data/Libraries/Penlight/docs/manual/04-paths.md.html new file mode 100644 index 0000000..070a3ea --- /dev/null +++ b/Data/Libraries/Penlight/docs/manual/04-paths.md.html @@ -0,0 +1,329 @@ + + + + + Penlight Documentation + + + + +
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Paths and Directories

+ +

+

Working with Paths

+ +

Programs should not depend on quirks of your operating system. They will be +harder to read, and need to be ported for other systems. The worst of course is +hardcoding paths like 'c:\' in programs, and wondering why Vista complains so +much. But even something like dir..'\'..file is a problem, since Unix can't +understand backslashes in this way. dir..'/'..file is usually portable, but +it's best to put this all into a simple function, path.join. If you +consistently use path.join, then it's much easier to write cross-platform code, +since it handles the directory separator for you.

+ +

pl.path provides the same functionality as Python's os.path module (11.1).

+ + +
+> p = 'c:\\bonzo\\DOG.txt'
+> = path.normcase (p)  ---> only makes sense on Windows
+c:\bonzo\dog.txt
+> = path.splitext (p)
+c:\bonzo\DOG    .txt
+> = path.extension (p)
+.txt
+> = path.basename (p)
+DOG.txt
+> = path.exists(p)
+false
+> = path.join ('fred','alice.txt')
+fred\alice.txt
+> = path.exists 'pretty.lua'
+true
+> = path.getsize 'pretty.lua'
+2125
+> = path.isfile 'pretty.lua'
+true
+> = path.isdir 'pretty.lua'
+false
+
+ +

It is very important for all programmers, not just on Unix, to only write to +where they are allowed to write. path.expanduser will expand '~' (tilde) into +the home directory. Depending on your OS, this will be a guaranteed place where +you can create files:

+ + +
+> = path.expanduser '~/mydata.txt'
+'C:\Documents and Settings\SJDonova/mydata.txt'
+
+> = path.expanduser '~/mydata.txt'
+/home/sdonovan/mydata.txt
+
+ +

Under Windows, os.tmpname returns a path which leads to your drive root full of +temporary files. (And increasingly, you do not have access to this root folder.) +This is corrected by path.tmpname, which uses the environment variable TMP:

+ + +
+> os.tmpname()  -- not a good place to put temporary files!
+'\s25g.'
+> path.tmpname()
+'C:\DOCUME~1\SJDonova\LOCALS~1\Temp\s25g.1'
+
+ +

A useful extra function is pl.path.package_path, which will tell you the path +of a particular Lua module. So on my system, package_path('pl.path') returns +'C:\Program Files\Lua\5.1\lualibs\pl\path.lua', and package_path('lfs') returns +'C:\Program Files\Lua\5.1\clibs\lfs.dll'. It is implemented in terms of +package.searchpath, which is a new function in Lua 5.2 which has been +implemented for Lua 5.1 in Penlight.

+ +

+

File Operations

+ +

pl.file is a new module that provides more sensible names for common file +operations. For instance, file.read and file.write are aliases for +utils.readfile and utils.writefile.

+ +

Smaller files can be efficiently read and written in one operation. file.read +is passed a filename and returns the contents as a string, if successful; if not, +then it returns nil and the actual error message. There is an optional boolean +parameter if you want the file to be read in binary mode (this makes no +difference on Unix but remains important with Windows.)

+ +

In previous versions of Penlight, utils.readfile would read standard input if +the file was not specified, but this can lead to nasty bugs; use io.read '*a' +to grab all of standard input.

+ +

Similarly, file.write takes a filename and a string which will be written to +that file.

+ +

For example, this little script converts a file into upper case:

+ + +
+require 'pl'
+assert(#arg == 2, 'supply two filenames')
+text = assert(file.read(arg[1]))
+assert(file.write(arg[2],text:upper()))
+
+ +

Copying files is surprisingly tricky. file.copy and file.move attempt to use +the best implementation possible. On Windows, they link to the API functions +CopyFile and MoveFile, but only if the alien package is installed (this is +true for Lua for Windows.) Otherwise, the system copy command is used. This can +be ugly when writing Windows GUI applications, because of the dreaded flashing +black-box problem with launching processes.

+ +

+

Directory Operations

+ +

pl.dir provides some useful functions for working with directories. fnmatch +will match a filename against a shell pattern, and filter will return any files +in the supplied list which match the given pattern, which correspond to the +functions in the Python fnmatch module. getdirectories will return all +directories contained in a directory, and getfiles will return all files in a +directory which match a shell pattern. These functions return the files as a +table, unlike lfs.dir which returns an iterator.
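For instance (a sketch using the functions just described):

```lua
local dir = require 'pl.dir'

-- all Lua files in the current directory, returned as a table
local files = dir.getfiles('.', '*.lua')

-- filter an explicit list of names against a shell pattern
local tests = dir.filter(files, 'test*')

-- match a single filename against a pattern
assert(dir.fnmatch('readme.txt', '*.txt'))
```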

+ +
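A short sketch of these functions (the directory and file names are illustrative):

```lua
local dir = require 'pl.dir'

-- shell-style matching against a single filename
print(dir.fnmatch('readme.txt', '*.txt'))   -- true

-- all Lua files in the current directory, as a table
local files = dir.getfiles('.', '*.lua')

-- subdirectories of the current directory
local dirs = dir.getdirectories('.')

-- narrow an existing list down with a shell pattern
local tests = dir.filter(files, 'test*')
```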

dir.makepath can create a full path, creating subdirectories as necessary; rmtree is the Nuclear Option of file deleting functions, since it will recursively clear out and delete all directories found beginning at a path (there is a similar function with this name in the Python shutil module.)

+ + +
+> = dir.makepath 't\\temp\\bonzo'
+> = path.isdir 't\\temp\\bonzo'
+true
+> = dir.rmtree 't'
+
+ +

dir.rmtree depends on dir.walk, which is a powerful tool for scanning a whole +directory tree. Here is the implementation of dir.rmtree:

+ + +
+--- remove a whole directory tree.
+-- @param path A directory path
+function dir.rmtree(fullpath)
+    for root,dirs,files in dir.walk(fullpath) do
+        for i,f in ipairs(files) do
+            os.remove(path.join(root,f))
+        end
+        lfs.rmdir(root)
+    end
+end
+
+ +

dir.clonetree clones directory trees. The first argument is a path that must +exist, and the second path is the path to be cloned. (Note that this path cannot +be inside the first path, since this leads to madness.) By default, it will +then just recreate the directory structure. You can in addition provide a +function, which will be applied for all files found.

+ + +
+-- make a copy of my libs folder
+require 'pl'
+p1 = [[d:\dev\lua\libs]]
+p2 = [[D:\dev\lua\libs\..\tests]]
+dir.clonetree(p1,p2,dir.copyfile)
+
+ +

A more sophisticated version, which only copies files which have been modified:

+ + +
+-- p1 and p2 as before, or from arg[1] and arg[2]
+dir.clonetree(p1,p2,function(f1,f2)
+  local res
+  local t1,t2 = path.getmtime(f1),path.getmtime(f2)
+  -- f2 might not exist, so be careful about t2
+  if not t2 or t1 > t2 then
+    res = dir.copyfile(f1,f2)
+  end
+  return res -- indicates successful operation
+end)
+
+ +

dir.clonetree uses path.common_prefix. With p1 and p2 defined above, the common path is 'd:\dev\lua'. So 'd:\dev\lua\libs\testfunc.lua' is copied to 'd:\dev\lua\tests\testfunc.lua', etc.

+ +

If you need to find the common path of a list of files, then tablex.reduce will do the job:

+ + +
+> p3 = [[d:\dev]]
+> = tablex.reduce(path.common_prefix,{p1,p2,p3})
+'d:\dev'
+
+ + + +
+
+
+generated by LDoc 1.4.6 +
+
+ + diff --git a/Data/Libraries/Penlight/docs/manual/05-dates.md.html b/Data/Libraries/Penlight/docs/manual/05-dates.md.html new file mode 100644 index 0000000..c04b036 --- /dev/null +++ b/Data/Libraries/Penlight/docs/manual/05-dates.md.html @@ -0,0 +1,269 @@ + + + + + Penlight Documentation + + + + +
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Date and Time

+ +

+ +

NOTE: the Date module is deprecated

+ +

+

Creating and Displaying Dates

+ +

The Date class provides a simplified way to work with date and +time in Lua; it leans heavily on the functions +os.date and os.time.

+ +

A Date object can be constructed from a table, just like with os.time. +Methods are provided to get and set the various parts of the date.

+ + +
+> d = Date {year = 2011, month = 3, day = 2 }
+> = d
+2011-03-02 12:00:00
+> = d:month(),d:year(),d:day()
+3    2011    2
+> d:month(4)
+> = d
+2011-04-02 12:00:00
+> d:add {day=1}
+> = d
+2011-04-03 12:00:00
+
+ +

add takes a table containing one of the date table fields.

+ + +
+> = d:weekday_name()
+Sun
+> = d:last_day()
+2011-04-30 12:00:00
+> = d:month_name(true)
+April
+
+ +

There is a default conversion to text for date objects, but Date.Format gives +you full control of the format for both parsing and displaying dates:

+ + +
+> iso = Date.Format 'yyyy-mm-dd'
+> d = iso:parse '2010-04-10'
+> amer = Date.Format 'mm/dd/yyyy'
+> = amer:tostring(d)
+04/10/2010
+
+ +

With the 0.9.7 release, the Date constructor has become more flexible. You may omit any of the 'year', 'month' or 'day' fields:

+ + +
+> = Date { year = 2008 }
+2008-01-01 12:00:00
+> = Date { month = 3 }
+2011-03-01 12:00:00
+> = Date { day = 20 }
+2011-10-20 12:00:00
+> = Date { hour = 14, min = 30 }
+2011-10-13 14:30:00
+
+ +

If 'year' is omitted, then the current year is assumed, and likewise for 'month'.

+ +

To set the time on such a partial date, you can use the fact that the 'setter' +methods return the date object and so you can 'chain' these methods.

+ + +
+> d = Date { day = 03 }
+> = d:hour(18):min(30)
+2011-10-03 18:30:00
+
+ +

Finally, Date also now accepts positional arguments:

+ + +
+> = Date(2011,10,3)
+2011-10-03 12:00:00
+> = Date(2011,10,3,18,30,23)
+2011-10-03 18:30:23
+
+ +

Date.Format has been extended. If you construct an instance without a pattern, then it will try to match against a set of known formats. This is useful for human-input dates, since keeping to a strict format is not one of the strong points of users. It assumes that there will be a date, and then a time.

+ + +
+> df = Date.Format()
+> = df:parse '5.30pm'
+2011-10-13 17:30:00
+> = df:parse '1730'
+nil     day out of range: 1730 is not between 1 and 31
+> = df:parse '17.30'
+2011-10-13 17:30:00
+> = df:parse 'mar'
+2011-03-01 12:00:00
+> = df:parse '3 March'
+2011-03-03 12:00:00
+> = df:parse '15 March'
+2011-03-15 12:00:00
+> = df:parse '15 March 2008'
+2008-03-15 12:00:00
+> = df:parse '15 March 2008 1.30pm'
+2008-03-15 13:30:00
+> = df:parse '2008-10-03 15:30:23'
+2008-10-03 15:30:23
+
+ +

ISO date format is of course a good idea if you need to deal with users from +different countries. Here is the default behaviour for 'short' dates:

+ + +
+> = df:parse '24/02/12'
+2012-02-24 12:00:00
+
+ +

That's not what Americans expect! It's tricky to work out in a cross-platform way +exactly what the expected format is, so there is an explicit flag:

+ + +
+> df:US_order(true)
+> = df:parse '9/11/01'
+2001-11-09 12:00:00
+
+ + + +
+
+
+generated by LDoc 1.4.6 +
+
+ + diff --git a/Data/Libraries/Penlight/docs/manual/06-data.md.html b/Data/Libraries/Penlight/docs/manual/06-data.md.html new file mode 100644 index 0000000..585e23e --- /dev/null +++ b/Data/Libraries/Penlight/docs/manual/06-data.md.html @@ -0,0 +1,1633 @@ + + + + + Penlight Documentation + + + + +
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Data

+ +

+

Reading Data Files

+ +

The first thing to consider is this: do you actually need to write a custom file +reader? And if the answer is yes, the next question is: can you write the reader +in as clear a way as possible? Correctness, Robustness, and Speed; pick the first +two and the third can be sorted out later, if necessary.

+ +

A common sort of data file is the configuration file format commonly used on Unix +systems. This format is often called a property file in the Java world.

+ + +
+# Read timeout in seconds
+read.timeout=10
+
+# Write timeout in seconds
+write.timeout=10
+
+ +

Here is a simple Lua implementation:

+ + +
+-- property file parsing with Lua string patterns
+props = {}
+for line in io.lines() do
+    if line:find('#',1,true) ~= 1 and not line:find('^%s*$') then
+        local var,value = line:match('([^=]+)=(.*)')
+        props[var] = value
+    end
+end
+
+ +

Very compact, but it suffers from a similar disease in equivalent Perl programs; +it uses odd string patterns which are 'lexically noisy'. Noisy code like this +slows the casual reader down. (For an even more direct way of doing this, see the +next section, 'Reading Configuration Files')

+ +

Another implementation, using the Penlight libraries:

+ + +
+-- property file parsing with extended string functions
+require 'pl'
+stringx.import()
+props = {}
+for line in io.lines() do
+    if not line:startswith('#') and not line:isspace() then
+        local var,value = line:splitv('=')
+        props[var] = value
+    end
+end
+
+ +

This is more self-documenting; it is generally better to make the code express +the intention, rather than having to scatter comments everywhere - comments are +necessary, of course, but mostly to give the higher view of your intention that +cannot be expressed in code. It is slightly slower, true, but in practice the +speed of this script is determined by I/O, so further optimization is unnecessary.

+ +

+

Reading Unstructured Text Data

+ +

Text data is sometimes unstructured, for example a file containing words. The pl.input module has a number of functions which make processing such files easier. For example, a script to count the number of words in standard input using input.words:

+ + +
+-- countwords.lua
+require 'pl'
+local k = 0
+for w in input.words(io.stdin) do
+    k = k + 1
+end
+print('count',k)
+
+ +

Or this script to calculate the average of a set of numbers using input.numbers:

+ + +
+-- average.lua
+require 'pl'
+local k = 0
+local sum = 0
+for n in input.numbers(io.stdin) do
+    sum = sum + n
+    k = k + 1
+end
+print('average',sum/k)
+
+ +

These scripts can be improved further by eliminating explicit loops. In the last case, there is a perfectly good function seq.sum which can take a sequence of numbers and calculate the sum and count for us:

+ + +
+-- average2.lua
+require 'pl'
+local total,n = seq.sum(input.numbers())
+print('average',total/n)
+
+ +

A further simplification here is that if numbers or words are not passed an +argument, they will grab their input from standard input. The first script can +be rewritten:

+ + +
+-- countwords2.lua
+require 'pl'
+print('count',seq.count(input.words()))
+
+ +

A useful feature of a sequence generator like numbers is that it can read from +a string source. Here is a script to calculate the sums of the numbers on each +line in a file:

+ + +
+-- sums.lua
+for line in io.lines() do
+    print(seq.sum(input.numbers(line)))
+end
+
+ +

+

Reading Columnar Data

+ +

It is very common to find data in columnar form, either space or comma-separated, +perhaps with an initial set of column headers. Here is a typical example:

+ + +
+EventID    Magnitude    LocationX    LocationY    LocationZ
+981124001    2.0    18988.4    10047.1    4149.7
+981125001    0.8    19104.0    9970.4    5088.7
+981127003    0.5    19012.5    9946.9    3831.2
+...
+
+ +

input.fields is designed to extract several columns, given some delimiter (defaulting to whitespace). Here is a script to calculate the average X location of all the events:

+ + +
+-- avg-x.lua
+require 'pl'
+io.read() -- skip the header line
+local sum,count = seq.sum(input.fields {3})
+print(sum/count)
+
+ +

input.fields is passed either a field count, or a list of column indices, +starting at one as usual. So in this case we're only interested in column 3. If +you pass it a field count, then you get every field up to that count:

+ + +
+for id,mag,locX,locY,locZ in input.fields (5) do
+....
+end
+
+ +

input.fields by default tries to convert each field to a number. It will skip +lines which clearly don't match the pattern, but will abort the script if there +are any fields which cannot be converted to numbers.

+ +

The second parameter is a delimiter, by default spaces. ' ' is understood to mean +'any number of spaces', i.e. '%s+'. Any Lua string pattern can be used.

+ +
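For instance, a comma-delimited source (here a string, as a sketch) can be handled by passing ',' as the delimiter:

```lua
local input = require 'pl.input'

-- two comma-separated fields per line, converted to numbers
for x, y in input.fields(2, ',', '10,20\n30,40\n') do
    print(x + y)   -- 30, then 70
end
```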

The third parameter is a data source, by default standard input (defined by +input.create_getter.) It assumes that the data source has a read method which +brings in the next line, i.e. it is a 'file-like' object. As a special case, a +string will be split into its lines:

+ + +
+> for x,y in input.fields(2,' ','10 20\n30 40\n') do print(x,y) end
+10      20
+30      40
+
+ +

Note the default behaviour for bad fields, which is to show the offending line +number:

+ + +
+> for x,y in input.fields(2,' ','10 20\n30 40x\n') do print(x,y) end
+10      20
+line 2: cannot convert '40x' to number
+
+ +

This behaviour of input.fields is appropriate for a script which you want to +fail immediately with an appropriate user error message if conversion fails. +The fourth optional parameter is an options table: {no_fail=true} means that +conversion is attempted but if it fails it just returns the string, rather as AWK +would operate. You are then responsible for checking the type of the returned +field. {no_convert=true} switches off conversion altogether and all fields are +returned as strings.

+ + +

Sometimes it is useful to bring a whole dataset into memory, for operations such +as extracting columns. Penlight provides a flexible reader specifically for +reading this kind of data, using the data module. Given a file looking like this:

+ + +
+x,y
+10,20
+2,5
+40,50
+
+ +

Then data.read will create a table like this, with each row represented by a +sublist:

+ + +
+> t = data.read 'test.txt'
+> pretty.dump(t)
+{{10,20},{2,5},{40,50},fieldnames={'x','y'},delim=','}
+
+ +

You can now analyze this returned table using the supplied methods. For instance, +the method column_by_name returns a table of all the values of that column.

+ + +
+-- testdata.lua
+require 'pl'
+d = data.read('fev.txt')
+for _,name in ipairs(d.fieldnames) do
+    local col = d:column_by_name(name)
+    if type(col[1]) == 'number' then
+        local total,n = seq.sum(col)
+        utils.printf("Average for %s is %f\n",name,total/n)
+    end
+end
+
+ +

data.read tries to be clever when given data; by default it expects a first line of column names, unless any of them are numbers. It tries to deduce the column delimiter by looking at the first line. Sometimes it guesses wrong, so these things can be specified explicitly. The second optional parameter is an options table: you can override delim (a string pattern), fieldnames (a list or comma-separated string), specify no_convert (default is to convert), numfields (indices of columns known to be numbers, as a list) and thousands_dot (when the thousands separator in Excel CSV is '.')

+ +

A very powerful feature is a way to execute SQL-like queries on such data:

+ + +
+-- queries on tabular data
+require 'pl'
+local d = data.read('xyz.txt')
+local q = d:select('x,y,z where x > 3 and z < 2 sort by y')
+for x,y,z in q do
+    print(x,y,z)
+end
+
+ +

Please note that the format of queries is restricted to the following syntax:

+ + +
+FIELDLIST [ 'where' CONDITION ] [ 'sort by' FIELD [asc|desc]]
+
+ +

Any valid Lua code can appear in CONDITION; remember it is not SQL and you +have to use == (this warning comes from experience.)

+ +

For this to work, field names must be Lua identifiers. So read will massage +fieldnames so that all non-alphanumeric chars are replaced with underscores. +However, the original_fieldnames field always contains the original un-massaged +fieldnames.

+ +
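A small sketch of this massaging, reading from a string source via pl.stringio:

```lua
local data = require 'pl.data'
local stringio = require 'pl.stringio'

local d = data.read(stringio.open 'First Name,Last-Name\nalice,smith\n')
-- massaged names are valid Lua identifiers:
print(d.fieldnames[1])            -- e.g. 'First_Name'
-- the un-massaged originals survive:
print(d.original_fieldnames[1])   -- 'First Name'
```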

read can handle standard CSV files fine, although it doesn't try to be a full-blown CSV parser. With the csv=true option, it's possible to have double-quoted fields, which may contain commas; then trailing commas become significant as well.

+ +

Spreadsheet programs are not always the best tool to +process such data, strange as this might seem to some people. This is a toy CSV +file; to appreciate the problem, imagine thousands of rows and dozens of columns +like this:

+ + +
+Department Name,Employee ID,Project,Hours Booked
+sales,1231,overhead,4
+sales,1255,overhead,3
+engineering,1501,development,5
+engineering,1501,maintenance,3
+engineering,1433,maintenance,10
+
+ +

The task is to reduce the dataset to a relevant set of rows and columns, perhaps +do some processing on row data, and write the result out to a new CSV file. The +write_row method uses the delimiter to write the row to a file; +Data.select_row is like Data.select, except it iterates over rows, not +fields; this is necessary if we are dealing with a lot of columns!

+ + +
+names = {[1501]='don',[1433]='dilbert'}
+keepcols = {'Employee_ID','Hours_Booked'}
+t:write_row (outf,{'Employee','Hours_Booked'})
+q = t:select_row {
+    fields=keepcols,
+    where=function(row) return row[1]=='engineering' end
+}
+for row in q do
+    row[1] = names[row[1]]
+    t:write_row(outf,row)
+end
+
+ +

Data.select_row and Data.select can be passed a table specifying the query; a +list of field names, a function defining the condition and an optional parameter +sort_by. It isn't really necessary here, but if we had a more complicated row +condition (such as belonging to a specified set) then it is not generally +possible to express such a condition as a query string, without resorting to +hackery such as global variables.

+ +

With 1.0.3, you can specify explicit conversion functions for selected columns. +For instance, this is a log file with a Unix date stamp:

+ + +
+Time Message
+1266840760 +# EE7C0600006F0D00C00F06010302054000000308010A00002B00407B00
+1266840760 closure data 0.000000 1972 1972 0
+1266840760 ++ 1266840760 EE 1
+1266840760 +# EE7C0600006F0D00C00F06010302054000000408020A00002B00407B00
+1266840764 closure data 0.000000 1972 1972 0
+
+ +

We would like the first column as an actual date object, so the convert +field sets an explicit conversion for column 1. (Note that we have to explicitly +convert the string to a number first.)

+ + +
+Date = require 'pl.Date'
+
+function date_convert (ds)
+    return Date(tonumber(ds))
+end
+
+d = data.read(f,{convert={[1]=date_convert},last_field_collect=true})
+
+ +

This gives us a two-column dataset, where the first column contains Date objects +and the second column contains the rest of the line. Queries can then easily +pick out events on a day of the week:

+ + +
+q = d:select "Time,Message where Time:weekday_name()=='Sun'"
+
+ +

Data does not have to come from files, nor does it necessarily come from the lab +or the accounts department. On Linux, ps aux gives you a full listing of all +processes running on your machine. It is straightforward to feed the output of +this command into data.read and perform useful queries on it. Notice that +non-identifier characters like '%' get converted into underscores:

+ + +
+require 'pl'
+f = io.popen 'ps aux'
+s = data.read (f,{last_field_collect=true})
+f:close()
+print(s.fieldnames)
+print(s:column_by_name 'USER')
+qs = 'COMMAND,_MEM where _MEM > 5 and USER=="steve"'
+for name,mem in s:select(qs) do
+    print(mem,name)
+end
+
+ +

I've always been an admirer of the AWK programming language; with filter you +can get Lua programs which are just as compact:

+ + +
+-- printxy.lua
+require 'pl'
+data.filter 'x,y where x > 3'
+
+ +

It is common enough to have data files without headers of field names. +data.read makes a special exception for such files if all fields are numeric. +Since there are no column names to use in query expressions, you can use AWK-like +column indexes, e.g. '$1,$2 where $1 > 3'. I have a little executable script on +my system called lf which looks like this:

+ + +
+#!/usr/bin/env lua
+require 'pl.data'.filter(arg[1])
+
+ +

And it can be used generally as a filter command to extract columns from data. +(The column specifications may be expressions or even constants.)

+ + +
+$ lf '$1,$5/10' < test.dat
+
+ +

(As with AWK, please note the single-quotes used in this command; this prevents +the shell trying to expand the column indexes. If you are on Windows, then you +must quote the expression in double-quotes so +it is passed as one argument to your batch file.)

+ +

As a tutorial resource, have a look at test-data.lua in the PL tests directory +for other examples of use, plus comments.

+ +

The data returned by read or constructed by Data.copy_select from a query is basically just an array of rows: {{1,2},{3,4}}. So you may use read to pull in any array-like dataset, and process it with any function that expects such a structure. In particular, the functions in array2d will work fine with this data. In fact, these functions are available as methods; e.g. array2d.flatten can be called directly like so to give us a one-dimensional list:

+ + +
+v = data.read('dat.txt'):flatten()
+
+ +

The data is also in exactly the right shape to be treated as matrices by +LuaMatrix:

+ + +
+> matrix = require 'matrix'
+> m = matrix(data.read 'mat.txt')
+> = m
+1       0.2     0.3
+0.2     1       0.1
+0.1     0.2     1
+> = m^2  -- same as m*m
+1.07    0.46    0.62
+0.41    1.06    0.26
+0.24    0.42    1.05
+
+ +

write will write matrices back to files for you.

+ +

Finally, for the curious, the global variable _DEBUG can be used to print out +the actual iterator function which a query generates and dynamically compiles. By +using code generation, we can get pretty much optimal performance out of +arbitrary queries.

+ + +
+> lua -lpl -e "_DEBUG=true" -e "data.filter 'x,y where x > 4 sort by x'" < test.txt
+return function (t)
+        local i = 0
+        local v
+        local ls = {}
+        for i,v in ipairs(t) do
+            if v[1] > 4  then
+                    ls[#ls+1] = v
+            end
+        end
+        table.sort(ls,function(v1,v2)
+            return v1[1] < v2[1]
+        end)
+        local n = #ls
+        return function()
+            i = i + 1
+            v = ls[i]
+            if i > n then return end
+            return v[1],v[2]
+        end
+end
+
+10,20
+40,50
+
+ +

+

Reading Configuration Files

+ +

The config module provides a simple way to convert several kinds of +configuration files into a Lua table. Consider the simple example:

+ + +
+# test.config
+# Read timeout in seconds
+read.timeout=10
+
+# Write timeout in seconds
+write.timeout=5
+
+#acceptable ports
+ports = 1002,1003,1004
+
+ +

This can be easily brought in using config.read and the result shown using +pretty.write:

+ + +
+-- readconfig.lua
+local config = require 'pl.config'
+local pretty= require 'pl.pretty'
+
+local t = config.read(arg[1])
+print(pretty.write(t))
+
+ +

and the output of lua readconfig.lua test.config is:

+ + +
+{
+  ports = {
+    1002,
+    1003,
+    1004
+  },
+  write_timeout = 5,
+  read_timeout = 10
+}
+
+ +

That is, config.read will bring in all key/value pairs, ignore # comments, and +ensure that the key names are proper Lua identifiers by replacing non-identifier +characters with '_'. If the values are numbers, then they will be converted. (So +the value of t.write_timeout is the number 5). In addition, any values which +are separated by commas will be converted likewise into an array.

+ +

Any line can be continued with a backslash. So this will all be considered one +line:

+ + +
+names=one,two,three, \
+four,five,six,seven, \
+eight,nine,ten
+
+ +

Windows-style INI files are also supported. The section structure of INI files +translates naturally to nested tables in Lua:

+ + +
+; test.ini
+[timeouts]
+read=10 ; Read timeout in seconds
+write=5 ; Write timeout in seconds
+[portinfo]
+ports = 1002,1003,1004
+
+ +

The output is:

+ + +
+{
+  portinfo = {
+    ports = {
+      1002,
+      1003,
+      1004
+    }
+  },
+  timeouts = {
+    write = 5,
+    read = 10
+  }
+}
+
+ +

You can now refer to the write timeout as t.timeouts.write.

+ +

As a final example of the flexibility of config.read, if passed this simple +comma-delimited file

+ + +
+one,two,three
+10,20,30
+40,50,60
+1,2,3
+
+ +

it will produce the following table:

+ + +
+{
+  { "one", "two", "three" },
+  { 10, 20, 30 },
+  { 40, 50, 60  },
+  { 1, 2, 3 }
+}
+
+ +

config.read isn't designed to read all CSV files in general, but intended to +support some Unix configuration files not structured as key-value pairs, such as +'/etc/passwd'.

+ +

This function is intended to be a Swiss Army Knife of configuration readers, but it does have to make assumptions, and you may not like them. So there is an optional extra parameter allowing some control: a table that may have the following fields:

+ + +
+{
+   variablilize = true,
+   convert_numbers = tonumber,
+   trim_space = true,
+   list_delim = ',',
+   trim_quotes = true,
+   ignore_assign = false,
+   keysep = '=',
+   smart = false,
+}
+
+ +

variablilize is the option that converted write.timeout in the first example +to the valid Lua identifier write_timeout. If convert_numbers is true, then +an attempt is made to convert any string that starts like a number. You can +specify your own function (say one that will convert a string like '5224 kb' into +a number.)

+ +

trim_space ensures that there is no starting or trailing whitespace with +values, and list_delim is the character that will be used to decide whether to +split a value up into a list (it may be a Lua string pattern such as '%s+'.)

+ +

For instance, the password file in Unix is colon-delimited:

+ + +
+t = config.read('/etc/passwd',{list_delim=':'})
+
+ +

This produces the following output on my system (only last two lines shown):

+ + +
+{
+  ...
+  {
+    "user",
+    "x",
+    "1000",
+    "1000",
+    "user,,,",
+    "/home/user",
+    "/bin/bash"
+  },
+  {
+    "sdonovan",
+    "x",
+    "1001",
+    "1001",
+    "steve donovan,28,,",
+    "/home/sdonovan",
+    "/bin/bash"
+  }
+}
+
+ +

You can get this into a more sensible format, where the usernames are the keys, +with this (the tablex.pairmap function must return value, key!)

+ + +
+t = tablex.pairmap(function(k,v) return v,v[1] end,t)
+
+ +

and you get:

+ + +
+{ ...
+  sdonovan = {
+    "sdonovan",
+    "x",
+    "1001",
+    "1001",
+    "steve donovan,28,,",
+    "/home/sdonovan",
+    "/bin/bash"
+  }
+...
+}
+
+ +

Many common Unix configuration files can be read by tweaking these parameters. +For /etc/fstab, the options {list_delim='%s+',ignore_assign=true} will +correctly separate the columns. It's common to find 'KEY VALUE' assignments in +files such as /etc/ssh/ssh_config; the options {keysep=' '} make +config.read return a table where each KEY has a value VALUE.

+ +
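These two cases, directly as sketched above (the paths assume a typical Linux system):

```lua
local config = require 'pl.config'

-- columnar file: split on runs of whitespace, don't treat '=' specially
local fstab = config.read('/etc/fstab', {list_delim='%s+', ignore_assign=true})

-- 'KEY VALUE' style: a space separates key from value
local ssh = config.read('/etc/ssh/ssh_config', {keysep=' '})
```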

Files in the Linux procfs usually use ':' as the field delimiter:

+ + +
+> t = config.read('/proc/meminfo',{keysep=':'})
+> = t.MemFree
+220140 kB
+
+ +

That result is a string, since tonumber balks at the 'kB' suffix, but defining the convert_numbers option as function(s) return tonumber((s:gsub(' kB$',''))) end will get the memory figures as actual numbers in the result. (The extra parentheses are necessary so that tonumber only gets the first result from gsub.) From tests/test-config.lua:

+ + +
+testconfig([[
+MemTotal:        1024748 kB
+MemFree:          220292 kB
+]],
+{ MemTotal = 1024748, MemFree = 220292 },
+{
+ keysep = ':',
+ convert_numbers = function(s)
+    s = s:gsub(' kB$','')
+    return tonumber(s)
+  end
+ }
+)
+
+ +

The smart option lets config.read make a reasonable guess for you; there are examples in tests/test-config.lua, but basically these common file formats (and those following the same pattern) can be processed directly in smart mode: '/etc/fstab', '/proc/XXXX/status', 'ssh_config' and 'updatedb.conf'.

+ +

Please note that config.read can be passed a file-like object; if it's not a +string and supports the read method, then that will be used. For instance, to +read a configuration from a string, use stringio.open.

+ + +

+ +
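A minimal sketch of reading a configuration held in a string this way:

```lua
local config = require 'pl.config'
local stringio = require 'pl.stringio'

-- stringio.open wraps the string in a file-like object with a read method
local t = config.read(stringio.open 'read.timeout=10\nwrite.timeout=5\n')
print(t.read_timeout, t.write_timeout)   -- 10    5
```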

+

Lexical Scanning

+ +

Although Lua's string pattern matching is very powerful, there are times when +something more powerful is needed. pl.lexer.scan provides lexical scanners +which tokenize a string, classifying tokens into numbers, strings, etc.

+ + +
+> lua -lpl
+Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
+> tok = lexer.scan 'alpha = sin(1.5)'
+> = tok()
+iden    alpha
+> = tok()
+=       =
+> = tok()
+iden    sin
+> = tok()
+(       (
+> = tok()
+number  1.5
+> = tok()
+)       )
+> = tok()
+(nil)
+
+ +

The scanner is a function, which is repeatedly called and returns the type and value of the token. Recognized basic types are 'iden', 'string', 'number' and 'space', and everything else is represented by itself. Note that by default the scanner will skip any 'space' tokens.

+ +

'comment' and 'keyword' aren't applicable to the plain scanner, which is not +language-specific, but a scanner which understands Lua is available. It +recognizes the Lua keywords, and understands both short and long comments and +strings.

+ + +
+> for t,v in lexer.lua 'for i=1,n do' do print(t,v) end
+keyword for
+iden    i
+=       =
+number  1
+,       ,
+iden    n
+keyword do
+
+ +

A lexical scanner is useful where you have highly-structured data which is not nicely delimited by newlines. For example, here is a snippet of an in-house file format which it was my task to maintain:

+ + +
+points
+    (818344.1,-20389.7,-0.1),(818337.9,-20389.3,-0.1),(818332.5,-20387.8,-0.1)
+    ,(818327.4,-20388,-0.1),(818322,-20387.7,-0.1),(818316.3,-20388.6,-0.1)
+    ,(818309.7,-20389.4,-0.1),(818303.5,-20390.6,-0.1),(818295.8,-20388.3,-0.1)
+    ,(818290.5,-20386.9,-0.1),(818285.2,-20386.1,-0.1),(818279.3,-20383.6,-0.1)
+    ,(818274,-20381.2,-0.1),(818274,-20380.7,-0.1);
+
+ +

Here is code to extract the points using pl.lexer:

+ + +
+-- assume 's' contains the text above...
+local lexer = require 'pl.lexer'
+local expecting = lexer.expecting
+local append = table.insert
+
+local tok = lexer.scan(s)
+
+local points = {}
+local t,v = tok() -- should be 'iden','points'
+
+while t ~= ';' do
+    c = {}
+    expecting(tok,'(')
+    c.x = expecting(tok,'number')
+    expecting(tok,',')
+    c.y = expecting(tok,'number')
+    expecting(tok,',')
+    c.z = expecting(tok,'number')
+    expecting(tok,')')
+    t,v = tok()  -- either ',' or ';'
+    append(points,c)
+end
+
+ +

The expecting function grabs the next token and if the type doesn't match, it +throws an error. (pl.lexer, unlike other PL libraries, raises errors if +something goes wrong, so you should wrap your code in pcall to catch the error +gracefully.)

+ +

The scanners all have a second optional argument, which is a table controlling whether you want to exclude spaces and/or comments. The default for lexer.lua is {space=true,comments=true}. There is a third optional argument which determines how string and number tokens are to be processed.

+ +
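For instance, passing an empty filter table means nothing is excluded, so the space and comment tokens come through as well (a sketch; with the defaults they would be dropped):

```lua
local lexer = require 'pl.lexer'

-- empty filter: 'space' and 'comment' tokens are emitted
-- alongside iden, '=' and number
for t, v in lexer.lua('x = 1 -- set x', {}) do
    print(t, v)
end
```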

The ultimate highly-structured data is of course, program source. Here is a +snippet from 'text-lexer.lua':

+ + +
+require 'pl'
+
+lines = [[
+for k,v in pairs(t) do
+    if type(k) == 'number' then
+        print(v) -- array-like case
+    else
+        print(k,v)
+    end
+end
+]]
+
+ls = List()
+for tp,val in lexer.lua(lines,{space=true,comments=true}) do
+    assert(tp ~= 'space' and tp ~= 'comment')
+    if tp == 'keyword' then ls:append(val) end
+end
+test.asserteq(ls,List{'for','in','do','if','then','else','end','end'})
+
+ +

Here is a useful little utility that identifies all common global variables found in a Lua module (ignoring those declared locally, for the moment):

+ + +
+-- testglobal.lua
+require 'pl'
+
+local txt,err = utils.readfile(arg[1])
+if not txt then return print(err) end
+
+local globals = List()
+for t,v in lexer.lua(txt) do
+    if t == 'iden' and _G[v] then
+        globals:append(v)
+    end
+end
+pretty.dump(seq.count_map(globals))
+
+ +

Rather than dumping the whole list, with its duplicates, we pass it through seq.count_map which turns the list into a table where the keys are the values, and the associated values are the number of times those values occur in the sequence. Typical output looks like this:

+ + +
+{
+  type = 2,
+  pairs = 2,
+  table = 2,
+  print = 3,
+  tostring = 2,
+  require = 1,
+  ipairs = 4
+}
+
+ +

You could further pass this through tablex.keys to get a unique list of +symbols. This can be useful when writing 'strict' Lua modules, where all global +symbols must be defined as locals at the top of the file.

+ +
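Continuing the example, a sketch of extracting that unique list from the counts (a small hand-made list stands in for the globals gathered above):

```lua
local tablex = require 'pl.tablex'
local seq = require 'pl.seq'

local counts = seq.count_map {'print', 'print', 'pairs', 'type'}
local unique = tablex.keys(counts)   -- keys come back in arbitrary order
table.sort(unique)
print(table.concat(unique, ','))     -- pairs,print,type
```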

For a more detailed use of lexer.scan, please look at testxml.lua in the +examples directory.

+ +

+

XML

+ +

New in the 0.9.7 release is some support for XML. This is a large topic, and +Penlight does not provide a full XML stack, which is properly the task of a more +specialized library.

+ +

Parsing and Pretty-Printing

+ +

The semi-standard XML parser in the Lua universe is lua-expat. In particular, it has a function called lxp.lom.parse which will parse XML into the Lua Object Model (LOM) format. However, it does not provide a way to convert this data back into XML text. xml.parse will use this function if lua-expat is available, and otherwise falls back to a pure Lua parser originally written by Roberto Ierusalimschy.

+ +

The resulting document object knows how to render itself as a string, which is useful for debugging:

+ + +
+> d = xml.parse "<nodes><node id='1'>alice</node></nodes>"
+> = d
+<nodes><node id='1'>alice</node></nodes>
+> pretty.dump (d)
+{
+  {
+    "alice",
+    attr = {
+      "id",
+      id = "1"
+    },
+    tag = "node"
+  },
+  attr = {
+  },
+  tag = "nodes"
+}
+
+ +

Looking at the actual shape of the data reveals the structure of LOM:

+ +
    +
  • every element has a tag field with its name
  • +
  • plus an attr field, which is a table containing the attributes as fields, and also as an array. It is always present.
  • +
  • the children of the element are the array part of the element, so d[1] is the first child of d, etc.
  • +
+ +

It could be argued that having attributes also as the array part of attr is not essential (you cannot depend on attribute order in XML) but that's how it goes with this standard.

+ +

lua-expat is another soft dependency of Penlight; generally, the fallback parser is good enough for straightforward XML as is commonly found in configuration files, etc. doc.basic_parse is not intended to be a proper conforming parser (it's only sixty lines) but it handles simple kinds of documents that do not have comments or DTD directives. It is intelligent enough to ignore the <?xml directive and that is about it.

+ +

You can get pretty-printing by explicitly calling xml.tostring and passing it the initial indent and the per-element indent:

+ + +
+> = xml.tostring(d,'','  ')
+
+<nodes>
+  <node id='1'>alice</node>
+</nodes>
+
+ +

There is a fourth argument which is the attribute indent:

+ + +
+> a = xml.parse "<frodo name='baggins' age='50' type='hobbit'/>"
+> = xml.tostring(a,'','  ','  ')
+
+<frodo
+  type='hobbit'
+  name='baggins'
+  age='50'
+/>
+
+ +

Parsing and Working with Configuration Files

+ +

It's common to find configurations expressed with XML these days. It's straightforward to 'walk' the LOM data and extract the data in the form you want:

+ + +
+require 'pl'
+
+local config = [[
+<config>
+    <alpha>1.3</alpha>
+    <beta>10</beta>
+    <name>bozo</name>
+</config>
+]]
+local d,err = xml.parse(config)
+
+local t = {}
+for item in d:childtags() do
+    t[item.tag] = item[1]
+end
+
+pretty.dump(t)
+--->
+{
+  beta = "10",
+  alpha = "1.3",
+  name = "bozo"
+}
+
+ +

The only gotcha is that here we must use the Doc:childtags method, which will skip over any text elements.

+ +

A more involved example is this excerpt from serviceproviders.xml, which is usually found at /usr/share/mobile-broadband-provider-info/serviceproviders.xml on Debian/Ubuntu Linux systems.

+ + +
+d = xml.parse [[
+<serviceproviders format="2.0">
+...
+<country code="za">
+    <provider>
+        <name>Cell-c</name>
+        <gsm>
+            <network-id mcc="655" mnc="07"/>
+            <apn value="internet">
+                <username>Cellcis</username>
+                <dns>196.7.0.138</dns>
+                <dns>196.7.142.132</dns>
+            </apn>
+        </gsm>
+    </provider>
+    <provider>
+        <name>MTN</name>
+        <gsm>
+            <network-id mcc="655" mnc="10"/>
+            <apn value="internet">
+                <dns>196.11.240.241</dns>
+                <dns>209.212.97.1</dns>
+            </apn>
+        </gsm>
+    </provider>
+    <provider>
+        <name>Vodacom</name>
+        <gsm>
+            <network-id mcc="655" mnc="01"/>
+            <apn value="internet">
+                <dns>196.207.40.165</dns>
+                <dns>196.43.46.190</dns>
+            </apn>
+            <apn value="unrestricted">
+                <name>Unrestricted</name>
+                <dns>196.207.32.69</dns>
+                <dns>196.43.45.190</dns>
+            </apn>
+        </gsm>
+    </provider>
+    <provider>
+        <name>Virgin Mobile</name>
+        <gsm>
+            <apn value="vdata">
+                <dns>196.7.0.138</dns>
+                <dns>196.7.142.132</dns>
+            </apn>
+        </gsm>
+    </provider>
+</country>
+....
+</serviceproviders>
+]]
+
+ +

Getting the names of the providers per-country is straightforward:

+ + +
+local t = {}
+for country in d:childtags() do
+    local providers = {}
+    t[country.attr.code] = providers
+    for provider in country:childtags() do
+        table.insert(providers,provider:child_with_name('name'):get_text())
+    end
+end
+
+pretty.dump(t)
+-->
+{
+  za = {
+    "Cell-c",
+    "MTN",
+    "Vodacom",
+    "Virgin Mobile"
+  }
+  ....
+}
+
+ +

Generating XML with 'xmlification'

+ +

This feature is inspired by the htmlify function used by Orbit to simplify HTML generation, except that no function environment magic is used; the tags function returns a set of constructors for elements of the given tag names.

+ + +
+> nodes, node = xml.tags 'nodes, node'
+> = node 'alice'
+<node>alice</node>
+> = nodes { node {id='1','alice'}}
+<nodes><node id='1'>alice</node></nodes>
+
+ +

The flexibility of Lua tables is very useful here, since both the attributes and the children of an element can be encoded naturally. The argument to these tag constructors is either a single value (like a string) or a table where the attributes are the named keys and the children are the array values.

+ +
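For instance, a sketch combining attributes and several children in one constructor call (the compact single-line rendering follows the earlier examples):

```lua
local xml = require 'pl.xml'
local nodes, node = xml.tags 'nodes, node'
-- attributes as named keys, children as array values
local d = nodes { node {id='1','alice'}, node {id='2','bob'} }
print(d)
-- should render roughly as:
-- <nodes><node id='1'>alice</node><node id='2'>bob</node></nodes>
```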

Generating XML using Templates

+ +

A template is a little XML document which contains dollar-variables. The subst method on a document is fed an array of tables containing values for these variables. Note how the parent tag name is specified:

+ + +
+> templ = xml.parse "<node id='$id'>$name</node>"
+> = templ:subst {tag='nodes', {id=1,name='alice'},{id=2,name='john'}}
+<nodes><node id='1'>alice</node><node id='2'>john</node></nodes>
+
+ +

Substitution is closely related to filtering documents. One of the annoying things about XML is that it is a document markup language first, and a data language second. Standard parsers will assume you really care about all those extra text elements. Consider this fragment, which has been changed by a five-year-old:

+ + +
+T = [[
+  <weather>
+    boops!
+    <current_conditions>
+      <condition data='$condition'/>
+      <temp_c data='$temp'/>
+      <bo>whoops!</bo>
+    </current_conditions>
+  </weather>
+]]
+
+ +

Conformant parsers will give you text elements containing the line feed after <current_conditions>, even though this makes handling the data more irritating.

+ + +
+local function parse (str)
+    return xml.parse(str,false,true)
+end
+
+ +

The second argument means 'string, not file', and the third argument means 'use the built-in Lua parser' (instead of LuaExpat, if available), which by default is not interested in keeping such strings.

+ +

How do we remove the string 'boops!'? clone (also called filter when called as a method) copies a LOM document. It can be passed a filter function, which is applied to each string found. The powerful thing about this is that this function receives structural information - the parent node, and whether this was a tag name, a text element or an attribute name:

+ + +
+d = parse (T)
+c = d:filter(function(s,kind,parent)
+    print(stringx.strip(s),kind,parent and parent.tag or '?')
+    if kind == '*TEXT' and #parent > 1 then return nil end
+    return s
+end)
+--->
+weather    *TAG    ?
+boops!    *TEXT    weather
+current_conditions    *TAG    weather
+condition    *TAG    current_conditions
+$condition    data    condition
+temp_c    *TAG    current_conditions
+$temp    data    temp_c
+bo    *TAG    current_conditions
+whoops!    *TEXT    bo
+
+ +

We can pull out 'boops' and not 'whoops' by discarding text elements which are not the single child of an element.

+ + + +
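To check the result, we can render the filtered copy; this is just a sanity-check sketch:

```lua
-- 'boops!' should be gone from the output, while 'whoops!'
-- survives because it is the single child of <bo>
print(xml.tostring(c, '', '  '))
```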

Extracting Data using Templates

+ +

Matching goes in the opposite direction. We have a document, and would like to extract values from it using a pattern.

+ +

A common use of this is parsing the XML result of API queries. The (undocumented and subsequently discontinued) Google Weather API is a good example. Grabbing the result of http://www.google.com/ig/api?weather=Johannesburg,ZA we get something like this, after pretty-printing:

+ + +
+<xml_api_reply version='1'>
+  <weather module_id='0' tab_id='0' mobile_zipped='1' section='0' row='0'
+           mobile_row='0'>
+<forecast_information>
+  <city data='Johannesburg, Gauteng'/>
+  <postal_code data='Johannesburg,ZA'/>
+  <latitude_e6 data=''/>
+  <longitude_e6 data=''/>
+  <forecast_date data='2010-10-02'/>
+  <current_date_time data='2010-10-02 18:30:00 +0000'/>
+  <unit_system data='US'/>
+</forecast_information>
+<current_conditions>
+  <condition data='Clear'/>
+  <temp_f data='75'/>
+  <temp_c data='24'/>
+  <humidity data='Humidity: 19%'/>
+  <icon data='/ig/images/weather/sunny.gif'/>
+  <wind_condition data='Wind: NW at 7 mph'/>
+</current_conditions>
+<forecast_conditions>
+  <day_of_week data='Sat'/>
+  <low data='60'/>
+  <high data='89'/>
+  <icon data='/ig/images/weather/sunny.gif'/>
+  <condition data='Clear'/>
+</forecast_conditions>
+....
+</weather>
+</xml_api_reply>
+
+ +

Assume that the above XML has been read into google. The idea is to write a pattern looking like a template, and use it to extract some values of interest:

+ + +
+t = [[
+  <weather>
+    <current_conditions>
+      <condition data='$condition'/>
+      <temp_c data='$temp'/>
+    </current_conditions>
+  </weather>
+]]
+
+local res, ret = google:match(t)
+pretty.dump(res)
+
+ +

And the output is:

+ + +
+{
+  condition = "Clear",
+  temp = "24"
+}
+
+ +

The match method can be passed a LOM document or some text, which will be parsed first.

+ +

But what if we need to extract values from repeated elements? Match templates may contain 'array matches' which are enclosed in '{{..}}':

+ + +
+<weather>
+  {{<forecast_conditions>
+    <day_of_week data='$day'/>
+    <low data='$low'/>
+    <high data='$high'/>
+    <condition data='$condition'/>
+  </forecast_conditions>}}
+</weather>
+
+ +

And the match result is:

+ + +
+{
+  {
+    low = "60",
+    high = "89",
+    day = "Sat",
+    condition = "Clear",
+  },
+  {
+    low = "53",
+    high = "86",
+    day = "Sun",
+    condition = "Clear",
+  },
+  {
+    low = "57",
+    high = "87",
+    day = "Mon",
+    condition = "Clear",
+  },
+  {
+    low = "60",
+    high = "84",
+    day = "Tue",
+    condition = "Clear",
+  }
+}
+
+ +

With this array of tables, you can use tablex or List to reshape it into the desired form, if you choose. Just as with reading a Unix password file with config, you can make the array into a map of days to conditions using:

+ + +
+tablex.pairmap('|k,v| v,v.day',conditions)
+
+ +

(Here we are using the alternative string lambda option.)

+ +

However, xml matches can shape the structure of the output. By replacing the day_of_week line of the template with <day_of_week data='$_'/> we get the same effect; $_ is a special symbol that means that this captured value (or simply capture) becomes the key.

+ +

Note that $NUMBER means a numerical index, so that $1 is the first element of the resulting array, and so forth. You can mix numbered and named captures, but it's strongly advised to make the numbered captures form a proper array sequence (everything from 1 to n inclusive). $0 has a special meaning; if it is the only capture ({[0]='foo'}) then the table is collapsed into 'foo'.

+ + +
+<weather>
+  {{<forecast_conditions>
+    <day_of_week data='$_'/>
+    <low data='$1'/>
+    <high data='$2'/>
+    <condition data='$3'/>
+  </forecast_conditions>}}
+</weather>
+
+ +

Now the result is:

+ + +
+{
+  Tue = {
+    "60",
+    "84",
+    "Clear"
+  },
+  Sun = {
+    "53",
+    "86",
+    "Clear"
+  },
+  Sat = {
+    "60",
+    "89",
+    "Clear"
+  },
+  Mon = {
+    "57",
+    "87",
+    "Clear"
+  }
+}
+
+ +

Applying matches to this config file poses another problem, because the actual tags matched are themselves meaningful.

+ + +
+<config>
+    <alpha>1.3</alpha>
+    <beta>10</beta>
+    <name>bozo</name>
+</config>
+
+ +

So there are tag 'wildcards' which are element names ending with a hyphen.

+ + +
+<config>
+    {{<key->$value</key->}}
+</config>
+
+ +

You will then get {{alpha='1.3'},...}. The most convenient format would be returned by this (note that _- behaves just like $_):

+ + +
+<config>
+    {{<_->$0</_->}}
+</config>
+
+ +

which would return {alpha='1.3',beta='10',name='bozo'}.

+ +

We could play this game endlessly, and encode ways of converting captures, but the scheme is complex enough, and it's easy to do the conversion later:

+ + +
+local numbers = {alpha=true,beta=true}
+for k,v in pairs(res) do
+    if numbers[k] then res[k] = tonumber(v) end
+end
+
+ +

HTML Parsing

+ +

HTML is an unusually degenerate form of XML, and Dennis Schridde has contributed a feature which makes parsing it easier. For instance, from the tests:

+ + +
+doc = xml.parsehtml [[
+<BODY>
+Hello dolly<br>
+HTML is <b>slack</b><br>
+</BODY>
+]]
+
+asserteq(xml.tostring(doc),[[
+<body>
+Hello dolly<br/>
+HTML is <b>slack</b><br/></body>]])
+
+ +

That is, all tags are converted to lowercase, and empty HTML elements like br are properly closed; attributes do not need to be quoted.

+ +

Also, DOCTYPE directives and comments are skipped. For truly badly formed HTML, this is not the tool for you!

+ + + + + +
+
+
+generated by LDoc 1.4.6 +
+
+ + diff --git a/Data/Libraries/Penlight/docs/manual/07-functional.md.html b/Data/Libraries/Penlight/docs/manual/07-functional.md.html new file mode 100644 index 0000000..d4ca655 --- /dev/null +++ b/Data/Libraries/Penlight/docs/manual/07-functional.md.html @@ -0,0 +1,834 @@ + + + + + Penlight Documentation + + + + +
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Functional Programming

+ +

+

Sequences

+ + +

A Lua iterator (in its simplest form) is a function which can be repeatedly called to return a set of one or more values. The for in statement understands these iterators, and loops until the function returns nil. There are standard sequence adapters for tables in Lua (ipairs and pairs), and io.lines returns an iterator over all the lines in a file. In the Penlight libraries, such iterators are also called sequences. A sequence of single values (say from io.lines) is called single-valued, whereas the sequence defined by pairs is double-valued.

+ +

pl.seq provides a number of useful iterators, and some functions which operate on sequences. At first sight this example looks like an attempt to write Python in Lua (with the sequence being inclusive):

+ + +
+> for i in seq.range(1,4) do print(i) end
+1
+2
+3
+4
+
+ +

But range is actually equivalent to Python's xrange, since it generates a sequence, not a list. To get a list, use seq.copy(seq.range(1,10)), which takes any single-valued sequence and makes a table from the result. seq.list is like ipairs except that it does not give you the index, just the value.

+ + +
+> for x in seq.list {1,2,3} do print(x) end
+1
+2
+3
+
+ +
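A short sketch of the range/copy combination described above:

```lua
-- materialize an inclusive range as a list-like table
local seq = require 'pl.seq'
local t = seq.copy(seq.range(1,4))
assert(#t == 4 and t[1] == 1 and t[4] == 4)  -- t is {1,2,3,4}
```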

enum takes a sequence and turns it into a double-valued sequence consisting of a sequence number and the value, so enum(list(ls)) is actually equivalent to ipairs. A more interesting example prints out a file with line numbers:

+ + +
+for i,v in seq.enum(io.lines(fname)) do print(i..' '..v) end
+
+ +

Sequences can be combined, either by 'zipping' them or by concatenating them.

+ + +
+> for x,y in seq.zip(l1,l2) do print(x,y) end
+10      1
+20      2
+30      3
+> for x in seq.splice(l1,l2) do print(x) end
+10
+20
+30
+1
+2
+3
+
+ +

seq.printall is useful for printing out single-valued sequences, and provides some finer control over formatting, such as a delimiter, the number of fields per line, and a format string to use (see string.format):

+ + +
+> seq.printall(seq.random(10))
+0.0012512588885159 0.56358531449324 0.19330423902097 ....
+> seq.printall(seq.random(10), ',', 4, '%4.2f')
+0.17,0.86,0.71,0.51
+0.30,0.01,0.09,0.36
+0.15,0.17,
+
+ +

map will apply a function to a sequence.

+ + +
+> seq.printall(seq.map(string.upper, {'one','two'}))
+ONE TWO
+> seq.printall(seq.map('+', {10,20,30}, 1))
+11 21 31
+
+ +

filter will filter a sequence using a boolean function (often called a predicate). For instance, this code only prints lines in a file which are composed of digits:

+ + +
+for l in seq.filter(io.lines(file), stringx.isdigit) do print(l) end
+
+ +

The following returns a table consisting of all the positive values in the original table (equivalent to tablex.filter(ls, '>', 0)):

+ + +
+ls = seq.copy(seq.filter(ls, '>', 0))
+
+ +

We've already encountered seq.sum when discussing input.numbers. This can also be expressed with seq.reduce:

+ + +
+> seq.reduce(function(x,y) return x + y end, seq.list{1,2,3,4})
+10
+
+ +

seq.reduce applies a binary function in a recursive fashion, so that:

+ + +
+reduce(op,{1,2,3}) => op(1,reduce(op,{2,3})) => op(1,op(2,3))
+
+ +

It's now possible to easily generate other cumulative operations; the standard operations declared in pl.operator are useful here:

+ + +
+> ops = require 'pl.operator'
+> -- can also say '*' instead of ops.mul
+> = seq.reduce(ops.mul,input.numbers '1 2 3 4')
+24
+
+ +

There are functions to extract statistics from a sequence of numbers:

+ + +
+> l1 = List {10,20,30}
+> l2 = List {1,2,3}
+> = seq.minmax(l1)
+10      30
+> = seq.sum(l1)
+60      3
+
+ +

It is common to get sequences where values are repeated, say the words in a file. count_map will take such a sequence and count the values, returning a table where the keys are the unique values, and the value associated with each key is the number of times they occurred:

+ + +
+> t = seq.count_map {'one','fred','two','one','two','two'}
+> = t
+{one=2,fred=1,two=3}
+
+ +

This will also work on numerical sequences, but you cannot expect the result to be a proper list, i.e. having no 'holes'. Instead, you always need to use pairs to iterate over the result - note that there is a hole at index 5:

+ + +
+> t = seq.count_map {1,2,4,2,2,3,4,2,6}
+> for k,v in pairs(t) do print(k,v) end
+1       1
+2       4
+3       1
+4       2
+6       1
+
+ +

unique uses count_map to return a list of the unique values, that is, just the keys of the resulting table.

+ +
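A minimal sketch (the ordering of the result should not be relied upon):

```lua
local seq = require 'pl.seq'
local words = seq.unique(seq.list {'one','fred','one','two'})
-- 'words' contains each of 'one', 'fred' and 'two' exactly once
assert(#words == 3)
```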

last turns a single-valued sequence into a double-valued sequence with the current value and the last value:

+ + +
+> for current,last in seq.last {10,20,30,40} do print (current,last) end
+20      10
+30      20
+40      30
+
+ +

This makes it easy to do things like identify repeated lines in a file, or construct differences between values. filter can handle double-valued sequences as well, so one could filter such a sequence to only return cases where the current value is less than the last value, by using operator.lt or just '<'. This code then copies the resulting values into a table.

+ + +
+> ls = {10,9,10,3}
+> = seq.copy(seq.filter(seq.last(ls),'<'))
+{9,3}
+
+ +

+

Sequence Wrappers

+ +

The functions in pl.seq cover the common patterns when dealing with sequences, but chaining these functions together can lead to ugly code. Consider the last example of the previous section; seq is repeated three times and the resulting expression has to be read right-to-left. The first issue can be helped by local aliases, so that the expression becomes copy(filter(last(s),'<')), but the second issue refers to the somewhat unnatural order of function application. We tend to prefer reading operations from left to right, which is one reason why object-oriented notation has become popular. Sequence adapters allow this expression to be written like so:

+ + +
+seq(s):last():filter('<'):copy()
+
+ +

With this notation, the operation becomes a chain of method calls running from left to right.

+ +

'Sequence' is not a basic Lua type; sequences are generally functions or callable objects. The expression seq(s) wraps a sequence in a sequence wrapper, an object which understands all the functions in pl.seq as methods. This object then explicitly represents sequences.

+ +

As a special case, the constructor (which is when you call the table seq) will make a wrapper for a plain list-like table. Here we apply the length operator to a sequence of strings, and print them out.

+ + +
+> seq{'one','tw','t'} :map '#' :printall()
+3 2 1
+
+ +

As a convenience, there is a function seq.lines which behaves just like io.lines except it wraps the result as an explicit sequence type. This takes the first 10 lines from standard input, makes them uppercase, turns the result into a sequence with a count and the value, glues these together with the concatenation operator, and finally prints out the sequence delimited by a newline.

+ + +
+seq.lines():take(10):upper():enum():map('..'):printall '\n'
+
+ +

Note the method upper, which is not a seq function. If an unknown method is called, sequence wrappers apply that method to all the values in the sequence (this is an implicit use of mapmethod).

+ +
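So, for example, this sketch maps string.upper over the wrapped values:

```lua
local seq = require 'pl.seq'
-- 'upper' is unknown to seq, so it is applied to each string value
seq{'one','two'}:upper():printall()
-- prints something like: ONE TWO
```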

It is straightforward to create custom sequences that can be used in this way. On Unix, /dev/random gives you an endless sequence of random bytes, so we use take to limit the sequence, and then map to scale the result into the desired range. The key step is to use seq to wrap the iterator function:

+ + +
+-- random.lua
+local seq = require 'pl.seq'
+
+function dev_random()
+    local f = io.open('/dev/random')
+    local byte = string.byte
+    return seq(function()
+        -- read two bytes into a string and convert into a 16-bit number
+        local s = f:read(2)
+        return byte(s,1) + 256*byte(s,2)
+    end)
+end
+
+-- print 10 random numbers from 0 to 1 !
+dev_random():take(10):map('%',100):map('/',100):printall ','
+
+ +

Another Linux one-liner depends on the /proc filesystem and makes a list of all the currently running processes:

+ + +
+pids = seq(lfs.dir '/proc'):filter(stringx.isdigit):map(tonumber):copy()
+
+ +

This version of Penlight has an experimental feature which relies on the fact that all Lua types can have metatables, including functions. This makes implicit sequence wrapping possible:

+ + +
+> seq.import()
+> seq.random(5):printall(',',5,'%4.1f')
+ 0.0, 0.1, 0.4, 0.1, 0.2
+
+ +

This avoids the awkward seq(seq.random(5)) construction. Or the iterator can come from somewhere else completely:

+ + +
+> ('one two three'):gfind('%a+'):printall(',')
+one,two,three,
+
+ +

After seq.import, it is no longer necessary to explicitly wrap sequence functions.

+ +

But there is a price to pay for this convenience. Every function is affected, so that any function can be used, appropriate or not:

+ + +
+> math.sin:printall()
+..seq.lua:287: bad argument #1 to '(for generator)' (number expected, got nil)
+> a = tostring
+> = a:find(' ')
+function: 0042C920
+
+ +

What function is returned? It's almost certain to be something that makes no sense in the current context. So implicit sequences may make certain kinds of programming mistakes harder to catch - they are best used for interactive exploration and small scripts.

+ +

+ +

+

List Comprehensions

+ +

List comprehensions are a compact way to create tables by specifying their elements. In Python, you can say this:

+ + +
+ls = [x for x in range(5)]  # == [0,1,2,3,4]
+
+ +

In Lua, using pl.comprehension:

+ + +
+> C = require('pl.comprehension').new()
+> = C ('x for x=1,10') ()
+{1,2,3,4,5,6,7,8,9,10}
+
+ +

C is a function which compiles a list comprehension string into a function. In this case, the function has no arguments. The parentheses are redundant for a function taking a string argument, so this works as well:

+ + +
+> = C 'x^2 for x=1,4' ()
+{1,4,9,16}
+> = C '{x,x^2} for x=1,4' ()
+{{1,1},{2,4},{3,9},{4,16}}
+
+ +

Note that the expression can be any function of the variable x!

+ +

The basic syntax so far is <expr> for <set>, where <set> can be anything that the Lua for statement understands. <set> can also just be the variable, in which case the values will come from the argument of the comprehension. Here I'm emphasizing that a comprehension is a function which can take a list argument:

+ + +
+> = C '2*x for x' {1,2,3}
+{2,4,6}
+> dbl = C '2*x for x'
+> = dbl {10,20,30}
+{20,40,60}
+
+ +

Here is a somewhat more explicit way of saying the same thing; _1 is a placeholder referring to the first argument passed to the comprehension.

+ + +
+> = C '2*x for _,x in pairs(_1)' {10,20,30}
+{20,40,60}
+> = C '_1(x) for x'(tostring,{1,2,3,4})
+{'1','2','3','4'}
+
+ +

This extended syntax is useful when you wish to collect the result of some iterator, such as io.lines. This comprehension creates a function which builds a table of all the lines in a file:

+ + +
+> f = io.open('array.lua')
+> lines = C 'line for line in _1:lines()' (f)
+> = #lines
+118
+
+ +

There are a number of functions that may be applied to the result of a comprehension:

+ + +
+> = C 'min(x for x)' {1,44,0}
+0
+> = C 'max(x for x)' {1,44,0}
+44
+> = C 'sum(x for x)' {1,44,0}
+45
+
+ +

(These are equivalent to a reduce operation on a list.)

+ +

After the for part, there may be a condition, which filters the output. This comprehension collects the even numbers from a list:

+ + +
+> = C 'x for x if x % 2 == 0' {1,2,3,4,5}
+{2,4}
+
+ +

There may be a number of for parts:

+ + +
+> = C '{x,y} for x = 1,2 for y = 1,2' ()
+{{1,1},{1,2},{2,1},{2,2}}
+> = C '{x,y} for x for y' ({1,2},{10,20})
+{{1,10},{1,20},{2,10},{2,20}}
+
+ +

These comprehensions are useful when dealing with functions of more than one variable, and are not so easily achieved with the other Penlight functional forms.

+ +

+ +

+

Creating Functions from Functions

+ + +

Lua functions may be treated like any other value, although of course you cannot multiply or add them. One operation that makes sense is function composition, which chains function calls (so (f * g)(x) is f(g(x))).

+ + +
+> func = require 'pl.func'
+> printf = func.compose(io.write,string.format)
+> printf("hello %s\n",'world')
+hello world
+true
+
+ +

Many functions require you to pass a function as an argument, say to apply to all values of a sequence or as a callback. Often useful functions have the wrong number of arguments. So there is a need to construct a function of one argument from one of two arguments, binding the extra argument to a given value.

+ +

Partial application takes a function of n arguments and returns a function of n-1 arguments where the first argument is bound to some value:

+ + +
+> p2 = func.bind1(print,'start>')
+> p2('hello',2)
+start>  hello   2
+> ops = require 'pl.operator'
+> = tablex.filter({1,-2,10,-1,2},bind1(ops.gt,0))
+{-2,-1}
+> tablex.filter({1,-2,10,-1,2},bind1(ops.le,0))
+{1,10,2}
+
+ +

The last example unfortunately reads backwards, because bind1 always binds the first argument! Also unfortunately, in my youth I confused 'currying' with 'partial application', so the old name for bind1 is curry - this alias still exists.

+ +

This is a specialized form of function argument binding. Here is another way to say the print example:

+ + +
+> p2 = func.bind(print,'start>',func._1,func._2)
+> p2('hello',2)
+start>  hello   2
+
+ +

where _1 and _2 are placeholder variables, corresponding to the first and second argument respectively.

+ +

Having func all over the place is distracting, so it's useful to pull all of pl.func into the local context. Here is the filter example, this time the right way around:

+ + +
+> utils.import 'pl.func'
+> tablex.filter({1,-2,10,-1,2},bind(ops.gt, _1, 0))
+{1,10,2}
+
+ +

tablex.merge does a general merge of two tables. This example shows the usefulness of binding the last argument of a function.

+ + +
+> S1 = {john=27, jane=31, mary=24}
+> S2 = {jane=31, jones=50}
+> intersection = bind(tablex.merge, _1, _2, false)
+> union = bind(tablex.merge, _1, _2, true)
+> = intersection(S1,S2)
+{jane=31}
+> = union(S1,S2)
+{mary=24,jane=31,john=27,jones=50}
+
+ +

When using bind with print, we got a function of precisely two arguments, whereas we really want our function to use varargs like print. This is the role of _0:

+ + +
+> _DEBUG = true
+> p = bind(print,'start>', _0)
+return function (fn,_v1)
+    return function(...) return fn(_v1,...) end
+end
+
+> p(1,2,3,4,5)
+start>  1       2       3       4       5
+
+ +

I've turned on the global _DEBUG flag, so that the function generated is printed out. It is actually a function which generates the required function; the first call binds the value of _v1 to 'start>'.

+ +

+

Placeholder Expressions

+ +

A common pattern in Penlight is a function which applies another function to all elements in a table or a sequence, such as tablex.map or seq.filter. Lua does anonymous functions well, although they can be a bit tedious to type:

+ + +
+> = tablex.map(function(x) return x*x end, {1,2,3,4})
+{1,4,9,16}
+
+ +

pl.func allows you to define placeholder expressions, which can cut down on the typing required, and also make your intent clearer. First, we bring the contents of pl.func into our context, and then supply an expression using placeholder variables, such as _1, _2, etc. (C++ programmers will recognize this from the Boost libraries.)

+ + +
+> utils.import 'pl.func'
+> = tablex.map(_1*_1, {1,2,3,4})
+{1,4,9,16}
+
+ +

Functions of up to 5 arguments can be generated.

+ + +
+> = tablex.map2(_1+_2,{1,2,3}, {10,20,30})
+{11,22,33}
+
+ +

These expressions can use arbitrary functions, although they must first be registered with the functional library. func.register brings in a single function, and func.import brings in a whole table of functions, such as math.

+ + +
+> sin = register(math.sin)
+> = tablex.map(sin(_1), {1,2,3,4})
+{0.8414709848079,0.90929742682568,0.14112000805987,-0.75680249530793}
+> import 'math'
+> = tablex.map(cos(2*_1),{1,2,3,4})
+{-0.41614683654714,-0.65364362086361,0.96017028665037,-0.14550003380861}
+
+ +

A common operation is calling a method of a set of objects:

+ + +
+> = tablex.map(_1:sub(1,1), {'one','four','x'})
+{'o','f','x'}
+
+ +

There are some restrictions on what operators can be used in PEs. For instance, because the __len metamethod cannot be overridden by plain Lua tables, we need to define a special function to express #_1:

+ + +
+> = tablex.map(Len(_1), {'one','four','x'})
+{3,4,1}
+
+ +

Likewise for comparison operators, which cannot be overloaded for different types, and thus also have to be expressed as a special function:

+ + +
+> = tablex.filter(Gt(_1,0), {1,-1,2,4,-3})
+{1,2,4}
+
+ +

It is useful to express the fact that a function returns multiple values. For instance, tablex.pairmap expects a function that will be called with the key and the value, and returns the new value and the key, in that order.

+ + +
+> = pairmap(Args(_2,_1:upper()),{fred=1,alice=2})
+{ALICE=2,FRED=1}
+
+ +

PEs cannot contain nil values, since PE function arguments are represented as an array. Instead, a special value called Nil is provided. So say _1:f(Nil,1) instead of _1:f(nil,1).

+ +

A placeholder expression cannot be automatically used as a Lua function. The technical reason is that the call operator must be overloaded to construct function calls like _1(1). If you want to force a PE to return a function, use func.I.

+ + +
+> = tablex.map(_1(10),{I(2*_1),I(_1*_1),I(_1+2)})
+{20,100,12}
+
+ +

Here we make a table of functions taking a single argument, and then call them all with a value of 10.

+ +

The essential idea with PEs is to 'quote' an expression so that it is not immediately evaluated, but instead turned into a function that can be applied later to some arguments. The basic mechanism is to wrap values and placeholders so that the usual Lua operators have the effect of building up an expression tree. (It turns out that you can do symbolic algebra using PEs, see symbols.lua in the examples directory, and its test runner testsym.lua, which demonstrates symbolic differentiation.)

+ +

The rule is that if any operator has a PE operand, the result will be quoted. Sometimes we need to quote things explicitly. For instance, say we want to pass a function to a filter that must return true if the element value is in a set. set[_1] is the obvious expression, but it does not give the desired result, since it evaluates directly, giving nil. Indexing works differently than a binary operation like addition (set+_1 is properly quoted) so there is a need for an explicit quoting or wrapping operation. This is the job of the _ function; the PE in this case should be _(set)[_1]. This works for functions as well, as a convenient alternative to registering functions: _(math.sin)(_1). This is equivalent to using the 'lines' method:

+ + +
+for line in I(_(f):read()) do print(line) end
+
+ +

Now this will work for any 'file-like' object which has a read method returning the next line. If you had a LuaSocket client which was being 'pushed' by lines sent from a server, then _(s):receive '*l' would create an iterator for accepting input. These forms can be convenient for adapting your data flow so that it can be passed to the sequence functions in pl.seq.

+ +

Placeholder expressions can be mixed with sequence wrapper expressions. +lexer.lua will give us a double-valued sequence of tokens, where the first +value is a type, and the second is a value. We filter out only the values where +the type is 'iden', extract the actual value using map, get the unique values +and finally copy to a list.

+ + +
+> str = 'for i=1,10 do for j = 1,10 do print(i,j) end end'
+> = seq(lexer.lua(str)):filter('==','iden'):map(_2):unique():copy()
+{i,print,j}
+
+ +

This is a particularly intense line (and I don't always suggest making everything +a one-liner!); the key is the behaviour of map, which will take both values of +the sequence, so _2 returns the value part. (Since filter here takes extra +arguments, it only operates on the type values.)

+ +

There are some performance considerations to using placeholder expressions. Instantiating a PE requires constructing and compiling a function, which is not such a fast operation. So to get the best performance, factor out PEs from loops like this:

+ + +
+local fn = I(_1:f() + _2:g())
+for i = 1,n do
+    res[i] = tablex.map2(fn,first[i],second[i])
+end
+
+ + + +
+
+
+generated by LDoc 1.4.6 +
+
diff --git a/Data/Libraries/Penlight/docs/manual/08-additional.md.html b/Data/Libraries/Penlight/docs/manual/08-additional.md.html
new file mode 100644
index 0000000..d13ac6e
--- /dev/null
+++ b/Data/Libraries/Penlight/docs/manual/08-additional.md.html
@@ -0,0 +1,815 @@
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Additional Libraries

+ +

Libraries in this section are no longer considered to be part of the Penlight +core, but still provide specialized functionality when needed.

+ +

+ +

+

Simple Input Patterns

+ +

Lua string pattern matching is very powerful, and usually you will not need a +traditional regular expression library. Even so, sometimes Lua code ends up +looking like Perl, which happens because string patterns are not always the +easiest things to read, especially for the casual reader. Here is a program +which needs to understand three distinct date formats:

+ + +
+-- parsing dates using Lua string patterns
+months={Jan=1,Feb=2,Mar=3,Apr=4,May=5,Jun=6,
+Jul=7,Aug=8,Sep=9,Oct=10,Nov=11,Dec=12}
+
+function check_and_process(d,m,y)
+    d = tonumber(d)
+    m = tonumber(m)
+    y = tonumber(y)
+    ....
+end
+
+for line in f:lines() do
+    -- ordinary (English) date format
+    local d,m,y = line:match('(%d+)/(%d+)/(%d+)')
+    if d then
+        check_and_process(d,m,y)
+    else -- ISO date??
+        y,m,d = line:match('(%d+)%-(%d+)%-(%d+)')
+        if y then
+            check_and_process(d,m,y)
+        else -- <day> <month-name> <year>?
+            d,mm,y = line:match('(%d+)%s+(%a+)%s+(%d+)')
+            m = months[mm]
+            check_and_process(d,m,y)
+        end
+    end
+end
+
+ +

These aren't particularly difficult patterns, but already typical issues are +appearing, such as having to escape '-'. Also, string.match returns its +captures, so that we're forced to use a slightly awkward nested if-statement.

+ +

Verification issues will further cloud the picture, since regular expression +people try to enforce constraints (like year cannot be more than four digits) +using regular expressions, on the usual grounds that you shouldn't stop using a +hammer when you are enjoying yourself.

+ +

pl.sip provides a simple, intuitive way to detect patterns in strings and +extract relevant parts.

+ + +
+> sip = require 'pl.sip'
+> dump = require('pl.pretty').dump
+> res = {}
+> c = sip.compile 'ref=$S{file}:$d{line}'
+> = c('ref=hello.c:10',res)
+true
+> dump(res)
+{
+  line = 10,
+  file = "hello.c"
+}
+> = c('ref=long name, no line',res)
+false
+
+ +

sip.compile creates a pattern matcher function, which takes a string and a +table as arguments. If the string matches the pattern, then true is returned +and the table is populated according to the captures within the pattern.

+ +

Here is another version of the date parser:

+ + +
+-- using SIP patterns
+function check(t)
+    check_and_process(t.day,t.month,t.year)
+end
+
+shortdate = sip.compile('$d{day}/$d{month}/$d{year}')
+longdate = sip.compile('$d{day} $v{mon} $d{year}')
+isodate = sip.compile('$d{year}-$d{month}-$d{day}')
+
+for line in f:lines() do
+    local res = {}
+    if shortdate(line,res) then
+        check(res)
+    elseif isodate(line,res) then
+        check(res)
+    elseif longdate(line,res) then
+        res.month = months[res.mon]
+        check(res)
+    end
+end
+
+ +

SIP captures start with '$', then a one-character type, and then an +optional variable name in curly braces.

+ + +
+Type      Meaning
+v         identifier
+i         possibly signed integer
+f         floating-point number
+r         rest of line
+q         quoted string (quoted using either ' or ")
+p         a path name
+(         anything inside balanced parentheses
+[         anything inside balanced brackets
+{         anything inside balanced curly brackets
+<         anything inside balanced angle brackets
+
+ +

If a type is not one of the above, then it's assumed to be one of the standard +Lua character classes, and will match one or more repetitions of that class. +Any spaces you leave in your pattern will match any number of spaces, including +zero, unless the spaces are between two identifier characters or patterns +matching them; in that case, at least one space will be matched.

+ +
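To make the translation concrete, here is a rough sketch of how capture types can map onto Lua string patterns; the to_pattern helper and the abbreviated type table are illustrative only, not sip's actual code:

```lua
-- Hypothetical sketch: translate a few SIP capture types into
-- Lua pattern captures (sip.lua's real translation is richer).
local types = {
  v = '[%a_][%w_]*',   -- identifier
  d = '%d+',           -- digits
  i = '[-+]?%d+',      -- possibly signed integer
}

-- turn e.g. 'ref=$v:$d' into a Lua pattern with captures
local function to_pattern(spec)
  return (spec:gsub('%$(%a)', function(t)
    return '(' .. types[t] .. ')'
  end))
end

local pat = to_pattern('ref=$v:$d')
print(('ref=hello:10'):match(pat))
```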

SIP captures (like $v{mon}) do not have to be named. You can use just $v, but +you have to be consistent; if a pattern contains unnamed captures, then all +captures must be unnamed. In this case, the result table is a simple list of +values.

+ +

sip.match is a useful shortcut if you want to compile and match in one call, +without saving the compiled pattern. It caches the result, so it is not much +slower than explicitly using sip.compile.

+ + +
+> sip.match('($q{first},$q{second})','("john","smith")',res)
+true
+> res
+{second='smith',first='john'}
+> res = {}
+> sip.match('($q,$q)','("jan","smit")',res)  -- unnamed captures
+true
+> res
+{'jan','smit'}
+> sip.match('($q,$q)','("jan", "smit")',res)
+false   ---> oops! Can't handle extra space!
+> sip.match('( $q , $q )','("jan", "smit")',res)
+true
+
+ +

As a general rule, allow for whitespace in your patterns.

+ +

Finally, putting a '$' at the end of a pattern means 'capture the rest of the +line, starting at the first non-space'. It is a shortcut for '$r{rest}', +or just '$r' if no named captures are used.

+ + +
+> sip.match('( $q , $q ) $','("jan", "smit") and a string',res)
+true
+> res
+{'jan','smit','and a string'}
+> res = {}
+> sip.match('( $q{first} , $q{last} ) $','("jan", "smit") and a string',res)
+true
+> res
+{first='jan',rest='and a string',last='smit'}
+
+ +

+ +

+

Command-line Programs with Lapp

+ +

pl.lapp is a small and focused Lua module which aims to make standard command-line parsing easier and more intuitive. It implements the standard GNU style, i.e. short flags with one letter start with '-', and there may be an additional long flag which starts with '--'. Generally, options which take an argument expect to find it as the next parameter (e.g. 'gcc test.c -o test'), but single short options taking a value can dispense with the space (e.g. 'head -n4 test.c' or 'gcc -I/usr/include/lua/5.1 ...').

+ +

As far as possible, Lapp will convert parameters into their equivalent Lua types, +i.e. convert numbers and convert filenames into file objects. If any conversion +fails, or a required parameter is missing, an error will be issued and the usage +text will be written out. So there are two necessary tasks, supplying the flag +and option names and associating them with a type.

+ +

For any non-trivial script, even for personal consumption, it's necessary to +supply usage text. The novelty of Lapp is that it starts from that point and +defines a loose format for usage strings which can specify the names and types of +the parameters.

+ +

An example will make this clearer:

+ + +
+-- scale.lua
+  lapp = require 'pl.lapp'
+  local args = lapp [[
+  Does some calculations
+    -o,--offset (default 0.0)  Offset to add to scaled number
+    -s,--scale  (number)  Scaling factor
+    <number> (number)  Number to be scaled
+  ]]
+
+  print(args.offset + args.scale * args.number)
+
+ +

Here is a command-line session using this script:

+ + +
+$ lua scale.lua
+scale.lua:missing required parameter: scale
+
+Does some calculations
+ -o,--offset (default 0.0)  Offset to add to scaled number
+ -s,--scale  (number)  Scaling factor
+  <number> (number )  Number to be scaled
+
+$ lua scale.lua -s 2.2 10
+22
+
+$ lua scale.lua -s 2.2 x10
+scale.lua:unable to convert to number: x10
+
+....(usage as before)
+
+ +

There are two kinds of lines in Lapp usage strings which are meaningful: option and parameter lines. An option line gives the short option, optionally followed by the corresponding long option. A type specifier in parentheses may follow. Similarly, a parameter line starts with '<NAME>', followed by a type specifier.

+ +

Type specifiers usually start with a type name: one of 'boolean', 'string', 'number', 'file-in' or 'file-out'. You may leave this out, but must then say 'default' followed by a value. If a flag or parameter has a default, it is not required and is set to the default. The actual type is deduced from this value (number, string, file or boolean) if not provided directly. 'Deduce' is a fancy word for 'guess' and it can be wrong, e.g. '(default 1)' will always be a number. You can say '(string default 1)' to override the guess. There are file values for the predefined console streams: stdin, stdout and stderr.

+ +
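The deduction rule can be sketched in plain Lua; deduce_type is a hypothetical helper for illustration, not part of Lapp's code:

```lua
-- Illustrative sketch of deducing a parameter type from its
-- default value, following the rule described above.
local function deduce_type(default)
  if default == 'stdin' or default == 'stdout' or default == 'stderr' then
    return 'file'                 -- predefined console streams
  elseif tonumber(default) then
    return 'number'
  elseif default == 'true' or default == 'false' then
    return 'boolean'
  else
    return 'string'
  end
end

print(deduce_type('1'))      --> number
print(deduce_type('save'))   --> string
print(deduce_type('stdin'))  --> file
```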

The boolean type is the default for flags. Not providing the type specifier is equivalent to '(boolean default false)'. If the flag is meant to be 'turned off' then either the full '(boolean default true)' or the shortcut '(default true)' will work.

+ +

An alternative to default is optional:

+ + +
+local lapp = require 'pl.lapp'
+local args = lapp [[
+   --cmd (optional string) Command to run.
+]]
+
+if args.cmd then
+  os.execute(args.cmd)
+end
+
+ +

Here we're implying that cmd need not be specified (just as with default) but if not +present, then args.cmd is nil, which will always test false.

+ +

The rest of the line is ignored and can be used for explanatory text.

+ +

This script shows the relation between the specified parameter names and the +fields in the output table.

+ + +
+-- simple.lua
+local args = require ('pl.lapp') [[
+Various flags and option types
+  -p          A simple optional flag, defaults to false
+  -q,--quiet  A simple flag with long name
+  -o  (string)  A required option with argument
+  -s  (default 'save') Optional string with default 'save' (single quotes ignored)
+  -n  (default 1) Optional numerical flag with default 1
+  -b  (string default 1)  Optional string flag with default '1' (type explicit)
+  <input> (default stdin)  Optional input file parameter, reads from stdin
+]]
+
+for k,v in pairs(args) do
+    print(k,v)
+end
+
+ +

I've just dumped out all values of the args table; note that args.quiet has +become true, because it's specified; args.p defaults to false. If there is a long +name for an option, that will be used in preference as a field name. A type or +default specifier is not necessary for simple flags, since the default type is +boolean.

+ + +
+$ simple -o test -q simple.lua
+p       false
+input   file (781C1BD8)
+quiet   true
+o       test
+input_name      simple.lua
+D:\dev\lua\lapp>simple -o test simple.lua one two three
+1       one
+2       two
+3       three
+p       false
+quiet   false
+input   file (781C1BD8)
+o       test
+input_name      simple.lua
+
+ +

The parameter input has been set to an open read-only file object - we know it +must be a read-only file since that is the type of the default value. The field +input_name is automatically generated, since it's often useful to have access to +the original filename.

+ +

Notice that any extra parameters supplied will be put in the result table with +integer indices, i.e. args[i] where i goes from 1 to #args.

+ +

Files don't really have to be closed explicitly for short scripts with a quick +well-defined mission, since the result of garbage-collecting file objects is to +close them.

+ +

Enforcing a Range and Enumerations

+ +

The type specifier can also be of the form '(' MIN '..' MAX ')' or a set of strings +separated by '|'.

+ + +
+local lapp = require 'pl.lapp'
+local args = lapp [[
+    Setting ranges
+    <x> (1..10)  A number from 1 to 10
+    <y> (-5..1e6) Bigger range
+    <z> (slow|medium|fast)
+]]
+
+print(args.x,args.y)
+
+ +

Here the meaning of a range is that the value must be greater than or equal to MIN and less than or equal to MAX. An 'enum' is a string that can only have values from a specified set.

+ +

Custom Types

+ +

There is no built-in way to force a parameter to be a whole number, but you may define a custom type that does this:

+ + +
+lapp = require ('pl.lapp')
+
+lapp.add_type('integer','number',
+    function(x)
+        lapp.assert(math.ceil(x) == x, 'not an integer!')
+    end
+)
+
+local args =  lapp [[
+    <ival> (integer) Process PID
+]]
+
+print(args.ival)
+
+ +

lapp.add_type takes three parameters, a type name, a converter and a constraint +function. The constraint function is expected to throw an assertion if some +condition is not true; we use lapp.assert because it fails in the standard way +for a command-line script. The converter argument can either be a type name known +to Lapp, or a function which takes a string and generates a value.

+ +

Here's a useful custom type that allows dates to be input as pl.Date values:

+ + +
+local Date = require 'pl.Date'
+local df = Date.Format()
+
+lapp.add_type('date',
+    function(s)
+        local d,e = df:parse(s)
+        lapp.assert(d,e)
+        return d
+    end
+)
+
+ +

'varargs' Parameter Arrays

+ + +
+lapp = require 'pl.lapp'
+local args = lapp [[
+Summing numbers
+    <numbers...> (number) A list of numbers to be summed
+]]
+
+local sum = 0
+for i,x in ipairs(args.numbers) do
+    sum = sum + x
+end
+print ('sum is '..sum)
+
+ +

The parameter numbers has a trailing '...', which indicates that it is a 'varargs' parameter. It must be the last parameter, and args.numbers will be an array.

+ +

Consider this implementation of the head utility from Mac OS X:

+ + +
+-- implements a BSD-style head
+-- (see http://www.manpagez.com/man/1/head/osx-10.3.php)
+
+lapp = require ('pl.lapp')
+
+local args = lapp [[
+Print the first few lines of specified files
+   -n         (default 10)    Number of lines to print
+   <files...> (default stdin) Files to print
+]]
+
+-- by default, lapp converts file arguments to an actual Lua file object.
+-- But the actual filename is always available as <file>_name.
+-- In this case, 'files' is a varargs array, so that 'files_name' is
+-- also an array.
+local nline = args.n
+local nfile = #args.files
+for i = 1,nfile do
+    local file = args.files[i]
+    if nfile > 1 then
+        print('==> '..args.files_name[i]..' <==')
+    end
+    local n = 0
+    for line in file:lines() do
+        print(line)
+        n = n + 1
+        if n == nline then break end
+    end
+end
+
+ +

Note how we have access to all the filenames, because the auto-generated field +files_name is also an array!

+ +

(This is probably not a very considerate script, since Lapp will open all the +files provided, and only close them at the end of the script. See the xhead.lua +example for another implementation.)

+ +

Flags and options may also be declared as vararg arrays, and can occur anywhere. +If there is both a short and long form, then the trailing "..." must happen after the long form, +for example "-x,--network... (string)...",

+ +

Bear in mind that short options can be combined (like 'tar -xzf'), so it's +perfectly legal to have '-vvv'. But normally the value of args.v is just a simple +true value.

+ + +
+local args = require ('pl.lapp') [[
+   -v...  Verbosity level; can be -v, -vv or -vvv
+]]
+vlevel = not args.v[1] and 0 or #args.v
+print(vlevel)
+
+ +

The vlevel assignment is a bit of Lua voodoo, so consider the cases:

+ + +
+* No -v flag, v is just { false }
+* One -v flag, v is { true }
+* Two -v flags, v is { true, true }
+* Three -v flags, v is { true, true, true }
+
+ +
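The cases above can be checked directly; vlevel here is a hypothetical helper wrapping the same expression:

```lua
-- Verifying the listed cases for the vlevel expression.
local function vlevel(v)
  return not v[1] and 0 or #v
end

print(vlevel({false}))            --> 0
print(vlevel({true}))             --> 1
print(vlevel({true, true, true})) --> 3
```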

Defining a Parameter Callback

+ +

If a script implements lapp.callback, then Lapp will call it after each +argument is parsed. The callback is passed the parameter name, the raw unparsed +value, and the result table. It is called immediately after assignment of the +value, so the corresponding field is available.

+ + +
+lapp = require ('pl.lapp')
+
+function lapp.callback(parm,arg,args)
+    print('+',parm,arg)
+end
+
+local args = lapp [[
+Testing parameter handling
+    -p               Plain flag (defaults to false)
+    -q,--quiet       Plain flag with GNU-style optional long name
+    -o  (string)     Required string option
+    -n  (number)     Required number option
+    -s (default 1.0) Option that takes a number, but will default
+    <start> (number) Required number argument
+    <input> (default stdin)  A parameter which is an input file
+    <output> (default stdout) One that is an output file
+]]
+print 'args'
+for k,v in pairs(args) do
+    print(k,v)
+end
+
+ +

This produces the following output:

+ + +
+$ args -o name -n 2 10 args.lua
++       o       name
++       n       2
++       start   10
++       input   args.lua
+args
+p       false
+s       1
+input_name      args.lua
+quiet   false
+output  file (781C1B98)
+start   10
+input   file (781C1BD8)
+o       name
+n       2
+
+ +

Callbacks are needed when you want to take action immediately on parsing an +argument.

+ +

Slack Mode

+ +

If you'd like to use a multi-letter 'short' parameter you need to set +the lapp.slack variable to true.

+ +

In the following example we also see how default false and default true flags can be used +and how to overwrite the default -h help flag (--help still works fine) - this applies +to non-slack mode as well.

+ + +
+-- Parsing the command line ----------------------------------------------------
+-- test.lua
+local lapp = require 'pl.lapp'
+local pretty = require 'pl.pretty'
+lapp.slack = true
+local args = lapp [[
+Does some calculations
+   -v, --video              (string)             Specify input video
+   -w, --width              (default 256)        Width of the video
+   -h, --height             (default 144)        Height of the video
+   -t, --time               (default 10)         Seconds of video to process
+   -sk,--seek               (default 0)          Seek number of seconds
+   -f1,--flag1                                   A false flag
+   -f2,--flag2                                   A false flag
+   -f3,--flag3              (default true)       A true flag
+   -f4,--flag4              (default true)       A true flag
+]]
+
+pretty.dump(args)
+
+ +

And here we can see the output of test.lua:

+ + +
+$> lua test.lua -v abc --time 40 -h 20 -sk 15 --flag1 -f3
+---->
+{
+  width = 256,
+  flag1 = true,
+  flag3 = false,
+  seek = 15,
+  flag2 = false,
+  video = "abc",
+  time = 40,
+  height = 20,
+  flag4 = true
+}
+
+ +

+

Simple Test Framework

+ +

pl.test was originally developed for the sole purpose of testing Penlight itself, +but you may find it useful for your own applications. (There are many other options.)

+ +

Most of the goodness is in test.asserteq. It uses tablex.deepcompare on its two arguments, +and by default quits the test application with a non-zero exit code, and an informative +message printed to stderr:

+ + +
+local test = require 'pl.test'
+
+test.asserteq({10,20,30},{10,20,30.1})
+
+--~ test-test.lua:3: assertion failed
+--~ got:    {
+--~  [1] = 10,
+--~  [2] = 20,
+--~  [3] = 30
+--~ }
+--~ needed:    {
+--~  [1] = 10,
+--~  [2] = 20,
+--~  [3] = 30.1
+--~ }
+--~ these values were not equal
+
+ +

This covers most cases, but it's also useful to compare strings using string.match:

+ + +
+-- must start with bonzo the dog
+test.assertmatch ('bonzo the dog is here','^bonzo the dog')
+-- must end with an integer
+test.assertmatch ('hello 42','%d+$')
+
+ +

Since Lua errors are usually strings, this matching strategy is used to test 'exceptions':

+ + +
+test.assertraise(function()
+    local t = nil
+    print(t.bonzo)
+end,'nil value')
+
+ +

(Some care is needed to match the essential part of the thrown error if you care +for portability, since in Lua 5.2 +the exact error is "attempt to index local 't' (a nil value)" and in Lua 5.3 the error +is "attempt to index a nil value (local 't')")

+ +

There is an extra optional argument to these test functions, which is helpful when writing test helper functions. There you want to highlight the failed line, not the actual call to asserteq or assertmatch; line 33 here is the call to is_iden:

+ + +
+function is_iden(str)
+    test.assertmatch(str,'^[%a_][%w_]*$',1)
+end
+
+is_iden 'alpha_dog'
+is_iden '$dollars'
+
+--~ test-test.lua:33: assertion failed
+--~ got:    "$dollars"
+--~ needed:    "^[%a_][%w_]*$"
+--~ these strings did not match
+
+ +

Useful Lua functions often return multiple values, and test.tuple is a convenient way to +capture these values, whether they contain nils or not.

+ + +
+T = test.tuple
+
+--- common error pattern
+function failing()
+    return nil,'failed'
+end
+
+test.asserteq(T(failing()),T(nil,'failed'))
+
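The trick that makes this work is counting arguments with select('#',...), so that trailing nils are not lost; here is a sketch of the idea (not pl.test's actual code):

```lua
-- Sketch of a nil-safe 'tuple' capture: record the true argument
-- count so {nil,'failed'} and {} compare differently.
local function tuple(...)
  return {n = select('#', ...), ...}
end

local function tuple_eq(a, b)
  if a.n ~= b.n then return false end
  for i = 1, a.n do
    if a[i] ~= b[i] then return false end
  end
  return true
end

local function failing() return nil, 'failed' end

print(tuple_eq(tuple(failing()), tuple(nil, 'failed')))  --> true
print(tuple_eq(tuple(nil), tuple()))                     --> false
```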
+ + + +
+
+
+
diff --git a/Data/Libraries/Penlight/docs/manual/09-discussion.md.html b/Data/Libraries/Penlight/docs/manual/09-discussion.md.html
new file mode 100644
index 0000000..4e7dd69
--- /dev/null
+++ b/Data/Libraries/Penlight/docs/manual/09-discussion.md.html
@@ -0,0 +1,233 @@
+ +
+ +
+
+
+ + +
+ + + + + + +
+ + +

Technical Choices

+ +

+

Modularity and Granularity

+ +

In an ideal world, a program should only load the libraries it needs. Penlight is +intended to work in situations where an extra 100Kb of bytecode could be a +problem. It is straightforward but tedious to load exactly what you need:

+ + +
+local data = require 'pl.data'
+local List = require 'pl.List'
+local array2d = require 'pl.array2d'
+local seq = require 'pl.seq'
+local utils = require 'pl.utils'
+
+ +

This is the style that I follow in Penlight itself, so that modules don't mess +with the global environment; also, stringx.import() is not used because it will +update the global string table.

+ +

But require 'pl' is more convenient in scripts; the question is how to ensure that one doesn't load the whole kitchen sink as the price of convenience. The strategy is to only load modules when they are referenced. In 'init.lua' (which is loaded by require 'pl') a metatable is attached to the global table with an __index metamethod. Any unknown name is looked up in the list of modules, and if found, we require it and make that module globally available. So when tablex.deepcompare is encountered, looking up tablex causes 'pl.tablex' to be required.

+ +
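The mechanism can be sketched in a few lines; this is a simplification (the stub module registered in package.preload and the known table are stand-ins for the real Penlight module list and loader):

```lua
-- Self-contained sketch of load-on-demand globals via __index.
-- A stub module stands in for pl.tablex for illustration.
package.preload['pl.tablex'] = function()
  return { deepcompare = function(a, b) return a == b end }
end

local known = { tablex = 'pl.tablex' }   -- names we know how to load

setmetatable(_G, {
  __index = function(g, name)
    local mod = known[name]
    if mod then
      local m = require(mod)
      rawset(g, name, m)   -- cache, so __index fires only once per module
      return m
    end
  end,
})

-- 'tablex' is not a global yet; the metamethod loads it on first use
print(tablex.deepcompare(1, 1))  --> true
```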

Modifying the behaviour of the global table has consequences. For instance, there is the famous module strict which comes with Lua itself (perhaps the only standard Lua module written in Lua itself) which also does this modification, so that global variables must be defined before use. So the implementation in 'init.lua' allows for a 'not found' hook, which 'pl.strict.lua' uses. Other libraries may install their own metatables for _G, but Penlight will now forward any unknown name to the __index defined by the original metatable.

+ +

But the strategy is worth the effort: the old 'kitchen sink' 'init.lua' would pull in about 260K of bytecode, whereas now typical programs use about 100K less, and short scripts do even better - for instance, if they only need the functionality in utils.

+ +

There are some functions which mark their output table with a special metatable, +when it seems particularly appropriate. For instance, tablex.makeset creates a +Set, and seq.copy creates a List. But this does not automatically result in +the loading of pl.Set and pl.List; only if you try to access any of these +methods. In 'utils.lua', there is an exported table called stdmt:

+ + +
+stdmt = { List = {}, Map = {}, Set = {}, MultiMap = {} }
+
+ +

If you go through 'init.lua', then these plain little 'identity' tables get an +__index metamethod which forces the loading of the full functionality. Here is +the code from 'list.lua' which starts the ball rolling for lists:

+ + +
+List = utils.stdmt.List
+List.__index = List
+List._name = "List"
+List._class = List
+
+ +

The 'load-on-demand' strategy helps to modularize the library. Especially for +more casual use, require 'pl' is a good compromise between convenience and +modularity.

+ +

In this current version, I have generally reduced the amount of trickery involved. Previously, Map was defined in pl.class; now it is sensibly defined in pl.Map; pl.class only contains the basic class mechanism (and returns that function). For consistency, List is returned directly by require 'pl.List' (note the uppercase 'L'). Also, the number of module dependencies in the non-core libraries like pl.config has been reduced.

+ +

+

Defining what is Callable

+ +

'utils.lua' exports function_arg which is used extensively throughout Penlight. +It defines what is meant by 'callable'. Obviously true functions are immediately +passed back. But what about strings? The first option is that it represents an +operator in 'operator.lua', so that '<' is just an alias for operator.lt.

+ +
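A sketch of this resolution order (a simplification; the real utils.function_arg also consults function factories registered for the value's metatable, and its ops table covers all of operator.lua):

```lua
-- Illustrative sketch: resolve a 'callable' argument as described
-- above - functions pass through, operator strings are looked up.
local ops = {
  ['<'] = function(a, b) return a < b end,
  ['+'] = function(a, b) return a + b end,
}

local function function_arg(f)
  if type(f) == 'function' then return f end
  if type(f) == 'string' and ops[f] then return ops[f] end
  error('argument is not callable')
end

print(function_arg('<')(1, 2))  --> true
print(function_arg('+')(2, 3))  --> 5
```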

We then check whether there is a function factory defined for the metatable of +the value.

+ +

(It is true that strings can be made callable, but in practice this turns out to +be a cute but dubious idea, since all strings share the same metatable. A +common programming error is to pass the wrong kind of object to a function, and +it's better to get a nice clean 'attempting to call a string' message rather than +some obscure trace from the bowels of your library.)

+ +

The other module that registers a function factory is pl.func. Placeholder expressions cannot be called directly, and so need to be instantiated and cached as efficiently as possible.

+ +

(An inconsistency is that utils.is_callable does not do this thorough check.)

+ + + + +
+
+
+