author: chai <chaifix@163.com> 2021-10-30 11:32:16 +0800
committer: chai <chaifix@163.com> 2021-10-30 11:32:16 +0800
commit: 42ec7286b2d36a9ba22925f816a17cb1cc2aa5ce (patch)
tree: 24bc7009457a8d7500f264e89946dc20d069294f /Data/Libraries/Penlight/docs_topics/01-introduction.md
parent: 164885fd98d48703bd771f802d79557b7db97431 (diff)
+ Penlight
Diffstat (limited to 'Data/Libraries/Penlight/docs_topics/01-introduction.md')
-rw-r--r-- Data/Libraries/Penlight/docs_topics/01-introduction.md | 621
1 files changed, 621 insertions, 0 deletions
diff --git a/Data/Libraries/Penlight/docs_topics/01-introduction.md b/Data/Libraries/Penlight/docs_topics/01-introduction.md
new file mode 100644
index 0000000..a8bf26a
--- /dev/null
+++ b/Data/Libraries/Penlight/docs_topics/01-introduction.md
@@ -0,0 +1,621 @@
+## Introduction
+
+### Purpose
+
+It is often said of Lua that it does not include batteries. That is because the
+goal of Lua is to produce a lean expressive language that will be used on all
+sorts of machines (some of which don't even have hierarchical filesystems). The
+Lua language is the equivalent of an operating system kernel; the creators of Lua
+do not see it as their responsibility to create a full software ecosystem around
+the language. That is the role of the community.
+
+A principle of software design is to recognize common patterns and reuse them. If
+you find yourself writing things like `io.write(string.format('the answer is %d
+',42))` more than a few times then it becomes useful just to define a
+function `printf`. This is good, not just because repeated code is harder to
+maintain, but because such code is easier to read, once people understand your
+libraries.
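+
+As a minimal sketch, assuming nothing beyond the standard library, such a
+helper could be written as follows (Penlight ships its own `printf` in
+`pl.utils`, so this local version is only for illustration):
+
+```lua
+-- Illustrative local helper: format with string.format, write to stdout.
+local function printf(fmt, ...)
+  io.write(string.format(fmt, ...))
+end
+
+printf('the answer is %d\n', 42)
+printf('%s has %d items\n', 'the list', 3)
+```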
+
+Penlight captures many such code patterns, so that the intent of your code
+becomes clearer. For instance, a Lua idiom to copy a table is `{unpack(t)}`, but
+this will only work for 'small' tables (for a given value of 'small') so it is
+not very robust. Also, the intent is not clear. So `tablex.deepcopy` is provided,
+which will also copy nested tables and associated metatables, so it can be
+used to clone complex objects.
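+
+The difference can be sketched in plain Lua. The `deepcopy` helper below is a
+hypothetical stand-in written for this example, not Penlight's implementation;
+it only illustrates what copying nested tables and metatables involves:
+
+```lua
+-- Hypothetical helper: copies nested tables and reuses each table's metatable.
+local function deepcopy(t, seen)
+  if type(t) ~= 'table' then return t end
+  seen = seen or {}
+  if seen[t] then return seen[t] end    -- handle shared or cyclic references
+  local res = {}
+  seen[t] = res
+  for k, v in pairs(t) do
+    res[deepcopy(k, seen)] = deepcopy(v, seen)
+  end
+  return setmetatable(res, getmetatable(t))
+end
+
+local orig = {1, 2, inner = {x = 10}}
+local unpack = table.unpack or unpack   -- Lua 5.2+ or Lua 5.1
+local shallow = {unpack(orig)}          -- array part only; 'inner' is lost
+local deep = deepcopy(orig)
+deep.inner.x = 99                       -- does not touch orig.inner
+print(shallow.inner, orig.inner.x, deep.inner.x)  --> nil 10 99
+```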
+
+The default error handling policy follows that of the Lua standard libraries: if
+an argument is the wrong type, then an error will be thrown, but otherwise we
+return `nil,message` if there is a problem. There are some exceptions; functions
+like `input.fields` default to shutting down the program immediately with a
+useful message. This is more appropriate behaviour for a _script_ than providing
+a stack trace. (However, this default can be changed.) The lexer functions always
+throw errors, to simplify coding, and so should be wrapped in `pcall`.
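+
+Both conventions can be seen in the standard library itself. In this sketch
+the file path is an assumption (it is presumed not to exist):
+
+```lua
+-- Convention 1: a recoverable problem comes back as nil plus a message.
+local f, err = io.open('/no/such/dir/file.txt')
+if not f then
+  print('could not open: ' .. err)
+end
+
+-- Convention 2: a wrong argument type throws, so wrap the call in pcall
+-- when you need to recover from it.
+local ok, msg = pcall(string.rep, {}, 2)
+print(ok, msg)
+```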
+
+If you are used to Python conventions, please note that all indices consistently
+start at 1.
+
+The Lua function `table.foreach` has been deprecated in favour of the `for in`
+statement, but such an operation becomes particularly useful with the
+higher-order function support in Penlight. Note that `tablex.foreach` reverses
+the order, so that the function is passed the value and then the key. Although
+perverse, this matches the intended use better.
+
+The only important external dependency of Penlight is
+[LuaFileSystem](http://keplerproject.github.com/luafilesystem/manual.html)
+(`lfs`). If you want `dir.copyfile` to work cleanly on Windows, you will also
+need either [alien](http://alien.luaforge.net/) or
+[LuaJIT](http://luajit.org). (The fallback is to call the equivalent
+shell commands.)
+
+### To Inject or not to Inject?
+
+It was realized a long time ago that large programs needed a way to keep names
+distinct by putting them into tables (Lua), namespaces (C++) or modules
+(Python). It is obviously impossible to run a company where everyone is called
+'Bruce', except in Monty Python skits. These 'namespace clashes' are more of a
+problem in a simple language like Lua than in C++, because C++ does more
+complicated lookup over 'injected namespaces'. However, in a small group of
+friends, 'Bruce' is usually unique, so in particular situations it's useful to
+drop the formality and not use last names. It depends entirely on what kind of
+program you are writing, whether it is a ten line script or a ten thousand line
+program.
+
+So the Penlight library provides the formal way and the informal way, without
+imposing any preference. You can do it formally like:
+
+ local utils = require 'pl.utils'
+ utils.printf("%s\n","hello, world!")
+
+or informally like:
+
+ require 'pl'
+ utils.printf("%s\n","That feels better")
+
+`require 'pl'` makes all the separate Penlight modules available, without needing
+to require them each individually.
+
+Generally, the formal way is better when writing modules, since then there are no
+global side-effects and the dependencies of your module are made explicit.
+
+Andrew Starks has contributed another way, which balances nicely between the
+formal need to keep the global table uncluttered and the informal need for
+convenience. `require'pl.import_into'` returns a function, which accepts a table
+for injecting Penlight into, or if no table is given, it passes back a new one.
+
+ local pl = require'pl.import_into'()
+
+The table `pl` is a 'lazy table' which loads modules as needed, so we can then
+use `pl.utils.printf` and so forth, without an explicit `require` or harming any
+globals.
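+
+How such a lazy table might work can be sketched with an `__index` metamethod.
+The `lazy_modules` helper and its `prefix` parameter are hypothetical, and this
+is not the code in `pl.import_into`; standard library names are used so that
+the example runs without Penlight:
+
+```lua
+-- Sketch of a lazy module table: a module is required on first access
+-- and then cached in the table itself.
+local function lazy_modules(prefix)
+  return setmetatable({}, {
+    __index = function(t, name)
+      local ok, mod = pcall(require, prefix .. name)
+      if not ok then return nil end
+      rawset(t, name, mod)    -- cache, so __index is not consulted again
+      return mod
+    end
+  })
+end
+
+local std = lazy_modules('')          -- standard libraries can be required by name
+print(rawget(std, 'math'))            -- nil: nothing loaded yet
+print(std.math.floor(3.7))            -- 3: 'math' is fetched on first use
+print(rawget(std, 'math') == math)    -- true: now cached
+```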
+
+If you are using `_ENV` with Lua 5.2 to define modules, then here is a way to
+make Penlight available within a module:
+
+ local _ENV,M = require 'pl.import_into' ()
+
+ function answer ()
+ -- all the Penlight modules are available!
+ return pretty.write(utils.split '10 20 30', '')
+ end
+
+ return M
+
+The default is to put Penlight into `\_ENV`, which has the unintended effect of
+making it available from the module (much as `module(...,package.seeall)` does).
+To satisfy both convenience and safety, you may pass `true` to this function, and
+then the _module_ `M` is not the same as `\_ENV`, but only contains the exported
+functions.
+
+Otherwise, Penlight will _not_ bring functions into the global table, or
+clobber standard tables like `io`. `require('pl')` will bring tables like
+`utils`, `tablex`, etc. into the global table _if they are used_. This
+'load-on-demand' strategy ensures that the whole kitchen sink is not loaded up
+front, so this method is as efficient as explicitly loading required modules.
+
+You have an option to bring the `pl.stringx` methods into the standard string
+table. All strings have a metatable that allows for automatic lookup in `string`,
+so we can say `s:upper()`. Importing `stringx` allows for its functions to also
+be called as methods, such as `s:strip()`:
+
+ require 'pl'
+ stringx.import()
+
+or, more explicitly:
+
+ require('pl.stringx').import()
+
+A more delicate operation is importing tables into the local environment. This is
+convenient when the context makes the meaning of a name very clear:
+
+ > require 'pl'
+ > utils.import(math)
+ > = sin(1.2)
+ 0.93203908596723
+
+`utils.import` can also be passed a module name as a string, which is first
+required and then imported. If used in a module, `import` will bring the symbols
+into the module context.
+
+Keeping the global scope simple is essential with dynamic languages. Using
+global variables in a big program is always asking for trouble, especially since
+you do not have the spell-checking provided by a compiler. The `pl.strict`
+module enforces a simple rule: globals must be 'declared'. This means that they
+must be assigned before use; assigning to `nil` is sufficient.
+
+ > require 'pl.strict'
+ > print(x)
+ stdin:1: variable 'x' is not declared
+ > x = nil
+ > print(x)
+ nil
+
+The `strict` module provided by Penlight is compatible with the 'load-on-demand'
+scheme used by `require 'pl'`.
+
+`strict` also disallows assignment to global variables, except in the main
+program. Generally, modules have no business messing with global scope; if you
+must do it, then use a call to `rawset`. Similarly, if you have to check for the
+existence of a global, use `rawget`.
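+
+The underlying idea can be sketched with a metatable. This is an assumption
+about the general technique rather than the source of `pl.strict`, and it uses
+a private table `env` in place of the real global table:
+
+```lua
+-- Sketch of 'globals must be declared', applied to a private table.
+local declared = {}
+local env = setmetatable({}, {
+  __newindex = function(t, k, v)
+    declared[k] = true            -- any assignment, even of nil, declares the name
+    rawset(t, k, v)
+  end,
+  __index = function(_, k)
+    if not declared[k] then
+      error("variable '" .. k .. "' is not declared", 2)
+    end
+    return nil
+  end
+})
+
+print(pcall(function() return env.x end))   -- false, plus the error message
+env.x = nil                                 -- declare it
+print(pcall(function() return env.x end))   -- true, nil
+print(rawget(env, 'y'))                     -- nil, no error: rawget bypasses __index
+```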
+
+If you wish to enforce strictness globally, then just add `require 'pl.strict'`
+at the end of `pl/init.lua`; otherwise call it from your main program.
+
+As of 1.1.0, this module provides a `strict.module` function which creates (or
+modifies) modules so that accessing an unknown function or field causes an error.
+
+For example,
+
+ -- mymod.lua
+ local strict = require 'pl.strict'
+ local M = strict.module (...)
+
+ function M.answer ()
+ return 42
+ end
+
+ return M
+
+If you were to accidentally type `mymod.Answer()`, then you would get a runtime
+error: "variable 'Answer' is not declared in 'mymod'".
+
+This can be applied to existing modules. You may desire to have the same level
+of checking for the Lua standard libraries:
+
+ strict.make_all_strict(_G)
+
+Thereafter a typo such as `math.cosine` will give you an explicit error, rather
+than merely returning a `nil` that will cause problems later.
+
+### What are function arguments in Penlight?
+
+Many functions in Penlight themselves take function arguments, like `map` which
+applies a function to a list, element by element. You can use existing
+functions, like `math.max`, anonymous functions (like `function(x,y) return x > y
+end`), or operations by name (e.g. '*' or '..'). The module `pl.operator` exports
+all the standard Lua operations, like the Python module of the same name.
+Penlight allows these to be referred to by name, so `operator.gt` can be more
+concisely expressed as '>'.
+
+Note that the `map` functions pass any extra arguments to the function, so we can
+have `ls:filter('>',0)`, which is a shortcut for
+`ls:filter(function(x) return x > 0 end)`.
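+
+A rough sketch of how operator names and extra arguments could be handled
+follows. The `ops` table and the `to_function` and `filter` helpers are
+hypothetical stand-ins, not Penlight's own code:
+
+```lua
+-- Hypothetical table mapping operator names to functions.
+local ops = {
+  ['>']  = function(a, b) return a > b end,
+  ['*']  = function(a, b) return a * b end,
+  ['..'] = function(a, b) return a .. b end,
+}
+
+-- Accept either a real function or an operator name.
+local function to_function(f)
+  if type(f) == 'string' then
+    return assert(ops[f], 'unknown operator ' .. f)
+  end
+  return f
+end
+
+-- Extra arguments are forwarded to the predicate after each value.
+local function filter(t, pred, ...)
+  pred = to_function(pred)
+  local res = {}
+  for _, v in ipairs(t) do
+    if pred(v, ...) then res[#res + 1] = v end
+  end
+  return res
+end
+
+print(table.concat(filter({-1, 5, 0, 3}, '>', 0), ','))  --> 5,3
+```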
+
+Finally, `pl.func` supports _placeholder expressions_ in the Boost lambda style,
+so that an anonymous function to multiply the two arguments can be expressed as
+`\_1*\_2`.
+
+To use them directly, note that _all_ function arguments in Penlight go through
+`utils.function_arg`. `pl.func` registers itself with this function, so that you
+can directly use placeholder expressions with standard methods:
+
+ > _1 = func._1
+ > = List{10,20,30}:map(_1+1)
+ {11,21,31}
+
+Another option for short anonymous functions is provided by
+`utils.string_lambda`; this is invoked automatically:
+
+ > = List{10,20,30}:map '|x| x + 1'
+ {11,21,31}
+
+### Pros and Cons of Loopless Programming
+
+The standard loops-and-ifs 'imperative' style of programming is dominant, and
+often seems to be the 'natural' way of telling a machine what to do. It is in
+fact very much how the machine does things, but we need to take a step back and
+find ways of expressing solutions in a higher-level way. For instance, applying
+a function to all elements of a list is a common operation:
+
+ local res = {}
+ for i = 1,#ls do
+ res[i] = fun(ls[i])
+ end
+
+This can be efficiently and succinctly expressed as `ls:map(fun)`. Not only is
+there less typing but the intention of the code is clearer. If readers of your
+code spend too much time trying to guess your intention by analyzing your loops,
+then you have failed to express yourself clearly. Similarly, `ls:filter('>',0)`
+will give you all the values in a list greater than zero. (Of course, if you
+don't feel like using `List`, or have non-list-like tables, then `pl.tablex`
+offers the same facilities. In fact, the `List` methods are implemented using
+`tablex` functions.)
+
+A common observation is that loopless programming is less efficient, particularly
+in the way it uses memory. `ls1:map2('*',ls2):reduce '+'` will give you the dot
+product of two lists, but an unnecessary temporary list is created. But
+efficiency is relative to the actual situation; it may turn out to be _fast
+enough_, or may not appear in any crucial inner loops, etc.
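+
+For comparison, the explicit loop below computes the same dot product without
+the temporary list. The `dot` function is a hypothetical helper written for
+this example:
+
+```lua
+-- Dot product with a single accumulator and no intermediate table.
+local function dot(a, b)
+  local sum = 0
+  for i = 1, math.min(#a, #b) do
+    sum = sum + a[i] * b[i]
+  end
+  return sum
+end
+
+print(dot({1, 2, 3}, {4, 5, 6}))  --> 32
+```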
+
+Writing loops is 'error-prone and tedious', as Stroustrup says. But any
+half-decent editor can be taught to do much of that typing for you. The question
+should actually be: is it tedious to _read_ loops? As with natural language,
+programmers tend to read chunks at a time. A for-loop causes no surprise, and
+probably little brain activity. One argument for loopless programming is the
+loops that you _do_ write stand out more, and signal 'something different
+happening here'. It should not be an all-or-nothing thing, since most programs
+require a mixture of idioms that suit the problem. Some languages (like APL) do
+nearly everything with map and reduce operations on arrays, and so solutions can
+sometimes seem forced. Wisdom is knowing when a particular idiom makes a
+particular problem easy to _solve_ and the solution easy to _explain_ afterwards.
+
+### Generally useful functions
+
+The function `printf` discussed earlier is included in `pl.utils` because it
+makes properly formatted output easier. (There is an equivalent `fprintf` which
+also takes a file object parameter, just like the C function.)
+
+Splitting a string using a delimiter is a fairly common operation, hence `split`.
+
+Utility functions like `is_type` help with identifying what
+kind of animal you are dealing with.
+The Lua `type` function handles the basic types, but can't distinguish between
+different kinds of objects, which are all tables. So `is_type` handles both
+cases, like `is_type(s,"string")` and `is_type(ls,List)`.
+
+A common pattern when working with Lua varargs is capturing all the arguments in
+a table:
+
+ function t(...)
+ local args = {...}
+ ...
+ end
+
+But this will bite you someday when `nil` is one of the arguments, since this
+will put a 'hole' in your table. In particular, `#args` may only give you the size
+up to the `nil` value. Hence the need for `table.pack` - this is a new Lua 5.2
+function which Penlight defines also for Lua 5.1. It returns a table whose `n`
+field holds the true number of arguments.
+
+ function t(...)
+ local args = table.pack(...)
+ for i = 1,args.n do
+ ...
+ end
+ end
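+
+A small check of why the count matters, in plain Lua. The `pack` fallback and
+the `count` helper are assumptions made for this sketch so that it also runs
+on Lua 5.1:
+
+```lua
+-- Use the built-in table.pack where available, else an equivalent fallback.
+local pack = table.pack or function(...)
+  return {n = select('#', ...), ...}
+end
+
+local function count(...)
+  local args = pack(...)
+  return args.n               -- exact, even when some arguments are nil
+end
+
+print(count(1, nil, 3))   --> 3
+print(count(nil, nil))    --> 2
+```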
+
+The 'memoize' pattern occurs when you have a function which is expensive to call,
+but will always return the same value subsequently. `utils.memoize` is given a
+function, and returns another function. This calls the function the first time,
+saves the value for that argument, and thereafter for that argument returns the
+saved value. This is a more flexible alternative to building a table of values
+upfront, since in general you won't know what values are needed.
+
+ sum = utils.memoize(function(n)
+ local sum = 0
+ for i = 1,n do sum = sum + i end
+ return sum
+ end)
+ ...
+ s = sum(1e8) --takes time!
+ ...
+ s = sum(1e8) --returned saved value!
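+
+The mechanism can be sketched in a few lines for functions of one argument.
+This `memoize` is a hypothetical simplification, not the code in `pl.utils`:
+
+```lua
+-- Cache results by argument; call fn only the first time each argument is seen.
+local function memoize(fn)
+  local cache = {}
+  return function(arg)
+    local v = cache[arg]
+    if v == nil then
+      v = fn(arg)
+      cache[arg] = v
+    end
+    return v
+  end
+end
+
+local calls = 0
+local square = memoize(function(n)
+  calls = calls + 1           -- count how often the real work runs
+  return n * n
+end)
+
+local first = square(12)      -- computed
+local second = square(12)     -- served from the cache
+print(first, second, calls)   --> 144 144 1
+```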
+
+Penlight is fully compatible with Lua 5.1, 5.2 and LuaJIT 2. To ensure this,
+`utils` also defines the global Lua 5.2
+[load](http://www.lua.org/work/doc/manual.html#pdf-load) function as `utils.load`,
+which takes these arguments:
+
+ * the input (either a string or a function)
+ * the source name used in debug information
+ * the mode is a string that can have either or both of 'b' or 't', depending on
+whether the source is a binary chunk or text code (default is 'bt')
+ * the environment for the compiled chunk
+
+Using `utils.load` should reduce the need to call the deprecated function `setfenv`,
+and make your Lua 5.1 code 5.2-friendly.
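+
+As a sketch of the four arguments listed above, here is the native Lua 5.2
+`load` (on Lua 5.1, `utils.load` is described as taking the same arguments).
+The chunk text, the chunk name `example` and the `env` table are made up for
+the example:
+
+```lua
+-- Requires Lua 5.2 or later: load(chunk, chunkname, mode, env)
+local env = {x = 41}                       -- the chunk sees only this table
+local fn, err = load('return x + 1', 'example', 't', env)
+assert(fn, err)                            -- compile errors come back as nil, message
+print(fn())                                --> 42
+```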
+
+The `utils` module exports `getfenv` and `setfenv` for
+Lua 5.2 as well, based on code by Sergey Rozhenko. Note that these functions can fail
+for functions which don't access any globals.
+
+### Application Support
+
+`app.parse_args` is a simple command-line argument parser. If called without any
+arguments, it tries to use the global `arg` array. It returns the _flags_
+(options beginning with '-') as a table of name/value pairs, and the _arguments_
+as an array. It knows about long GNU-style flag names, e.g. `--value`, and
+groups of short flags are understood, so that `-ab` is short for `-a -b`. The
+flags result would then look like `{value=true,a=true,b=true}`.
+
+Flags may take values. The command-line `--value=open -n10` would result in
+`{value='open',n='10'}`; generally you can use '=' or ':' to separate the flag
+from its value, except in the special case where a short flag is followed by an
+integer. Or you may specify upfront that some flags have associated values, and
+then the values will follow the flag.
+
+ > require 'pl'
+ > flags,args = app.parse_args({'-o','fred','-n10','fred.txt'},{o=true})
+ > pretty.dump(flags)
+ {o='fred',n='10'}
+
+`parse_args` is not intelligent or psychic; it will not convert any flag values
+or arguments for you, or raise errors. For that, have a look at
+@{08-additional.md.Command_line_Programs_with_Lapp|Lapp}.
+
+An application which consists of several files usually cannot use `require` to
+load files in the same directory as the main script. `app.require_here()`
+ensures that the Lua module path is modified so that files found locally are
+found first. In the `examples` directory, `test-symbols.lua` uses this function
+to ensure that it can find `symbols.lua` even if it is not run from this directory.
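+
+The idea of searching the script's own directory first can be sketched by
+editing `package.path` directly. The `prepend_script_dir` helper is
+hypothetical and is not necessarily how `app.require_here` is implemented:
+
+```lua
+-- Put the directory containing 'script' at the front of the module search path.
+local function prepend_script_dir(script)
+  local dir = script:match('^(.*)[/\\][^/\\]*$') or '.'
+  package.path = dir .. '/?.lua;' .. package.path
+  return dir
+end
+
+print(prepend_script_dir('examples/test-symbols.lua'))  --> examples
+```
+
+In a real script the argument would typically be `arg[0]`, the script name
+that the standalone interpreter provides.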
+
+`app.appfile` will create a filename that your application can use to store its
+private data, based on the script name. For example, `app.appfile "test.txt"`
+from a script called `testapp.lua` produces the following file on my Windows
+machine:
+
+ @plain
+ C:\Documents and Settings\SJDonova\.testapp\test.txt
+
+and the equivalent on my Linux machine:
+
+ @plain
+ /home/sdonovan/.testapp/test.txt
+
+If `.testapp` does not exist, it will be created.
+
+Penlight makes it convenient to save application data in Lua format. You can use
+`pretty.dump(t,filename)` to write a Lua table in a human-readable form to a file,
+and `pretty.read(file.read(filename))` to generate the table again, using the
+`pretty` module.
+
+
+### Simplifying Object-Oriented Programming in Lua
+
+Lua is similar to JavaScript in that the concept of class is not directly
+supported by the language. In fact, Lua has a very general mechanism for
+extending the behaviour of tables which makes it straightforward to implement
+classes. A table's behaviour is controlled by its metatable. If that metatable
+has a `\_\_index` function or table, this will handle looking up anything which is
+not found in the original table. A class is just a table with an `__index` key
+pointing to itself. Creating an object involves making a table and setting its
+metatable to the class; then when handling `obj.fun`, Lua first looks up `fun` in
+the table `obj`, and if not found it looks it up in the class. `obj:fun(a)` is
+just short for `obj.fun(obj,a)`. So with the metatable mechanism and this bit of
+syntactic sugar, it is straightforward to implement classic object orientation.
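+
+Written out by hand, without any library, the mechanism looks roughly like
+this (`Account` is a made-up class for illustration):
+
+```lua
+local Account = {}
+Account.__index = Account             -- the class is the __index of its objects
+
+function Account.new(balance)
+  return setmetatable({balance = balance}, Account)
+end
+
+function Account:deposit(n)
+  self.balance = self.balance + n
+end
+
+local acc = Account.new(100)
+acc:deposit(50)                       -- sugar for acc.deposit(acc, 50)
+print(acc.balance)                    --> 150
+print(rawget(acc, 'deposit'))         --> nil: the method lives in the class
+```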
+
+ -- animal.lua
+
+ class = require 'pl.class'
+
+ class.Animal()
+
+ function Animal:_init(name)
+ self.name = name
+ end
+
+ function Animal:__tostring()
+ return self.name..': '..self:speak()
+ end
+
+ class.Dog(Animal)
+
+ function Dog:speak()
+ return 'bark'
+ end
+
+ class.Cat(Animal)
+
+ function Cat:_init(name,breed)
+ self:super(name) -- must init base!
+ self.breed = breed
+ end
+
+ function Cat:speak()
+ return 'meow'
+ end
+
+ class.Lion(Cat)
+
+ function Lion:speak()
+ return 'roar'
+ end
+
+ fido = Dog('Fido')
+ felix = Cat('Felix','Tabby')
+ leo = Lion('Leo','African')
+
+ $ lua -i animal.lua
+ > = fido,felix,leo
+ Fido: bark Felix: meow Leo: roar
+ > = leo:is_a(Animal)
+ true
+ > = leo:is_a(Dog)
+ false
+ > = leo:is_a(Cat)
+ true
+
+All Animal does is define `\_\_tostring`, which Lua will use whenever a string
+representation is needed of the object. In turn, this relies on `speak`, which is
+not defined. So it's what C++ people would call an abstract base class; the
+specific derived classes like Dog define `speak`. Please note that _if_ derived
+classes have their own constructors, they must explicitly call the base
+constructor for their base class; this is conveniently available as the `super`
+method.
+
+Note that (as always) there are multiple ways to implement OOP in Lua; this method
+uses the classic 'a class is the `__index` of its objects' but does 'fat inheritance';
+methods from the base class are copied into the new class. The advantage of this is
+that you are not penalized for long inheritance chains, for the price of larger classes,
+but generally objects outnumber classes! (If not, something odd is going on with your design.)
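+
+'Fat inheritance' can be sketched as follows. The `derive` helper is
+hypothetical and much simpler than `pl.class`; it only shows methods being
+copied rather than chained:
+
+```lua
+-- Hypothetical helper: build a new class by copying the base class's fields.
+local function derive(base)
+  local klass = {}
+  for k, v in pairs(base) do klass[k] = v end
+  klass.__index = klass               -- replaces the copied base.__index
+  return klass
+end
+
+local Animal = {}
+Animal.__index = Animal
+function Animal:describe() return self.name .. ': ' .. self:speak() end
+
+local Dog = derive(Animal)
+function Dog:speak() return 'bark' end
+
+local fido = setmetatable({name = 'Fido'}, Dog)
+print(fido:describe())                              --> Fido: bark
+print(rawget(Dog, 'describe') == Animal.describe)   --> true: copied, not chained
+```
+
+In this sketch, a method added to `Animal` after `derive` has run will not
+appear in `Dog`, which is the other side of the trade-off.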
+
+All such objects will have an `is_a` method, which looks up the inheritance chain
+to find a match. Another form is `class_of`, which can be safely called on all
+objects, so instead of `leo:is_a(Animal)` one can say `Animal:class_of(leo)`.
+
+There are two ways to define a class, either `class.Name()` or `Name = class()`;
+both work identically, except that the first form will always put the class in
+the current environment (whether global or module); the second form provides more
+flexibility about where to store the class. The first form does _name_ the class
+by setting the `_name` field, which can be useful in identifying the objects of
+this type later. This session illustrates the usefulness of having named classes,
+if no `__tostring` method is explicitly defined.
+
+ > class.Fred()
+ > a = Fred()
+ > = a
+ Fred: 00459330
+ > Alice = class()
+ > b = Alice()
+ > = b
+ table: 00459AE8
+ > Alice._name = 'Alice'
+ > = b
+ Alice: 00459AE8
+
+So `Alice = class(); Alice._name = 'Alice'` is exactly the same as `class.Alice()`.
+
+This useful notation is borrowed from Hugo Etchegoyen's
+[classlib](http://lua-users.org/wiki/MultipleInheritanceClasses) which further
+extends this concept to allow for multiple inheritance. Notice that the
+more convenient form puts the class name in the _current environment_! That is,
+you may use it safely within modules using the old-fashioned `module()`
+or the new `_ENV` mechanism.
+
+There is always more than one way of doing things in Lua; some may prefer this
+style for creating classes:
+
+ local class = require 'pl.class'
+
+ class.Named {
+ _init = function(self,name)
+ self.name = name
+ end;
+
+ __tostring = function(self)
+ return 'boo '..self.name
+ end;
+ }
+
+ b = Named 'dog'
+ print(b)
+ --> boo dog
+
+Note that you have to explicitly declare `self` and end each function definition
+with a semi-colon or comma, since this is a Lua table. To inherit from a base class,
+set the special field `_base` to the class in this table.
+
+Penlight provides a number of useful classes; there is `List`, which is a Lua
+clone of the standard Python list object, and `Set` which represents sets. There
+are three kinds of _map_ defined: `Map`, `MultiMap` (where a key may have
+multiple values) and `OrderedMap` (where the order of insertion is remembered).
+There is nothing special about these classes and you may inherit from them.
+
+A powerful thing about dynamic languages is that you can redefine existing classes
+and functions, which is often called 'monkey patching'. It's entertaining and convenient,
+but ultimately anti-social; you may modify `List` but then any other modules using
+this _shared_ resource can no longer be sure about its behaviour. (This is why you
+must say `stringx.import()` explicitly if you want the extended string methods - it
+would be a bad default.) Lua is particularly open to modification but the
+community is not as tolerant of monkey-patching as the Ruby community, say. You may
+wish to add some new methods to `List`? Cool, but that's what subclassing is for.
+
+ class.Strings(List)
+
+ function Strings:my_method()
+ ...
+ end
+
+It's definitely more useful to define exactly how your objects behave
+in _unknown_ conditions. All classes have a `catch` method you can use to set
+a handler for unknown lookups; the function you pass looks exactly like the
+`__index` metamethod.
+
+ Strings:catch(function(self,name)
+ return function() error("no such method "..name,2) end
+ end)
+
+In this case we're just customizing the error message, but
+creative things can be done. Consider this code from `test-vector.lua`:
+
+ Strings:catch(List.default_map_with(string))
+
+ ls = Strings{'one','two','three'}
+ asserteq(ls:upper(),{'ONE','TWO','THREE'})
+ asserteq(ls:sub(1,2),{'on','tw','th'})
+
+So we've converted an unknown method invocation into a map using the function of
+that name found in `string`. So for a `Vector` (which is a specialization of `List`
+for numbers) it makes sense to make `math` the default map so that `v:sin()` makes
+sense.
+
+Note that `map` operations return an object of the same type - this is often called
+_covariance_. So `ls:upper()` itself returns a `Strings` object.
+
+This is not _always_ what you want, but objects can always be cast to the desired type.
+(`cast` doesn't create a new object, but returns the object passed.)
+
+ local sizes = ls:map '#'
+ asserteq(sizes, {3,3,5})
+ asserteq(utils.type(sizes),'Strings')
+ asserteq(sizes:is_a(Strings),true)
+ sizes = Vector:cast(sizes)
+ asserteq(utils.type(sizes),'Vector')
+ asserteq(sizes+1,{4,4,6})
+
+About `utils.type`: it can only return a string for a class type if that class does
+in fact have a `_name` field.
+
+
+_Properties_ are a useful object-oriented pattern. We wish to control access to a
+field, but don't wish to force the user of the class to say `obj:get_field()`
+etc. This excerpt from `tests/test-class.lua` shows how it is done:
+
+
+ local MyProps = class(class.properties)
+ local setted_a, got_b
+
+ function MyProps:_init ()
+ self._a = 1
+ self._b = 2
+ end
+
+ function MyProps:set_a (v)
+ setted_a = true
+ self._a = v
+ end
+
+ function MyProps:get_b ()
+ got_b = true
+ return self._b
+ end
+
+ local mp = MyProps()
+
+ mp.a = 10
+
+ asserteq(mp.a,10)
+ asserteq(mp.b,2)
+ asserteq(setted_a and got_b, true)
+
+The convention is that the internal field name is prefixed with an underscore;
+when reading `mp.a`, there is first a check for an explicit _getter_ `get_a`, and only
+then a lookup of `_a`. Similarly, writing `mp.a` causes the _setter_ `set_a` to be used.
+
+This is cool behaviour, but like much Lua metaprogramming, it is not free. Method
+lookup on such objects goes through `\_\_index` as before, but now `\_\_index` is a
+function which has to explicitly look up methods in the class, before doing any
+property indexing, which is not going to be as fast as field lookup. If, however,
+your accessors actually do non-trivial things, then the extra overhead could be
+worth it.
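+
+A sketch of how such dispatch could be written with `__index` and `__newindex`
+functions. `Props` is a hypothetical class and this is a simplification for
+illustration, not the code behind `class.properties`:
+
+```lua
+local Props = {}
+
+Props.__index = function(obj, key)
+  local method = Props[key]                  -- methods are looked up first
+  if method ~= nil then return method end
+  local getter = Props['get_' .. key]
+  if getter then return getter(obj) end
+  return rawget(obj, '_' .. key)             -- fall back to the underscored field
+end
+
+Props.__newindex = function(obj, key, value)
+  local setter = Props['set_' .. key]
+  if setter then
+    setter(obj, value)
+  else
+    rawset(obj, key, value)                  -- no setter: plain field write
+  end
+end
+
+function Props:set_a(v) self._a = v end          -- setter stores the private field
+function Props:get_b() return self._b * 2 end    -- getter computes a value
+
+local p = setmetatable({_b = 2}, Props)
+p.a = 10          -- routed through set_a
+print(p.a, p.b)   --> 10 4
+```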
+
+This is not really intended for _access control_ because external code can write
+to `mp._a` directly. It is possible to have this kind of control in Lua, but it
+again comes with run-time costs.