diff options
Diffstat (limited to 'ThirdParty/lpeg-1.0.2/lpeg.html')
-rw-r--r-- | ThirdParty/lpeg-1.0.2/lpeg.html | 1439 |
1 files changed, 1439 insertions, 0 deletions
diff --git a/ThirdParty/lpeg-1.0.2/lpeg.html b/ThirdParty/lpeg-1.0.2/lpeg.html new file mode 100644 index 0000000..8b9f59c --- /dev/null +++ b/ThirdParty/lpeg-1.0.2/lpeg.html @@ -0,0 +1,1439 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" + "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> +<head> + <title>LPeg - Parsing Expression Grammars For Lua</title> + <link rel="stylesheet" + href="http://www.inf.puc-rio.br/~roberto/lpeg/doc.css" + type="text/css"/> + <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/> +</head> +<body> + +<!-- $Id: lpeg.html $ --> + +<div id="container"> + +<div id="product"> + <div id="product_logo"> + <a href="http://www.inf.puc-rio.br/~roberto/lpeg/"> + <img alt="LPeg logo" src="lpeg-128.gif"/></a> + + </div> + <div id="product_name"><big><strong>LPeg</strong></big></div> + <div id="product_description"> + Parsing Expression Grammars For Lua, version 1.0 + </div> +</div> <!-- id="product" --> + +<div id="main"> + +<div id="navigation"> +<h1>LPeg</h1> + +<ul> + <li><strong>Home</strong> + <ul> + <li><a href="#intro">Introduction</a></li> + <li><a href="#func">Functions</a></li> + <li><a href="#basic">Basic Constructions</a></li> + <li><a href="#grammar">Grammars</a></li> + <li><a href="#captures">Captures</a></li> + <li><a href="#ex">Some Examples</a></li> + <li><a href="re.html">The <code>re</code> Module</a></li> + <li><a href="#download">Download</a></li> + <li><a href="#license">License</a></li> + </ul> + </li> +</ul> +</div> <!-- id="navigation" --> + +<div id="content"> + + +<h2><a name="intro">Introduction</a></h2> + +<p> +<em>LPeg</em> is a new pattern-matching library for Lua, +based on +<a href="http://pdos.csail.mit.edu/%7Ebaford/packrat/"> +Parsing Expression Grammars</a> (PEGs). +This text is a reference manual for the library. +For a more formal treatment of LPeg, +as well as some discussion about its implementation, +see +<a href="http://www.inf.puc-rio.br/~roberto/docs/peg.pdf"> +A Text Pattern-Matching Tool based on Parsing Expression Grammars</a>. +(You may also be interested in my +<a href="http://vimeo.com/1485123">talk about LPeg</a> +given at the III Lua Workshop.) +</p> + +<p> +Following the Snobol tradition, +LPeg defines patterns as first-class objects. +That is, patterns are regular Lua values +(represented by userdata). +The library offers several functions to create +and compose patterns. +With the use of metamethods, +several of these functions are provided as infix or prefix +operators. +On the one hand, +the result is usually much more verbose than the typical +encoding of patterns using the so called +<em>regular expressions</em> +(which typically are not regular expressions in the formal sense). +On the other hand, +first-class patterns allow much better documentation +(as it is easy to comment the code, +to break complex definitions in smaller parts, etc.) +and are extensible, +as we can define new functions to create and compose patterns. +</p> + +<p> +For a quick glance of the library, +the following table summarizes its basic operations +for creating patterns: +</p> +<table border="1"> +<tbody><tr><td><b>Operator</b></td><td><b>Description</b></td></tr> +<tr><td><a href="#op-p"><code>lpeg.P(string)</code></a></td> + <td>Matches <code>string</code> literally</td></tr> +<tr><td><a href="#op-p"><code>lpeg.P(n)</code></a></td> + <td>Matches exactly <code>n</code> characters</td></tr> +<tr><td><a href="#op-s"><code>lpeg.S(string)</code></a></td> + <td>Matches any character in <code>string</code> (Set)</td></tr> +<tr><td><a href="#op-r"><code>lpeg.R("<em>xy</em>")</code></a></td> + <td>Matches any character between <em>x</em> and <em>y</em> (Range)</td></tr> +<tr><td><a href="#op-pow"><code>patt^n</code></a></td> + <td>Matches at least <code>n</code> repetitions of <code>patt</code></td></tr> +<tr><td><a href="#op-pow"><code>patt^-n</code></a></td> + <td>Matches at most <code>n</code> repetitions of <code>patt</code></td></tr> +<tr><td><a href="#op-mul"><code>patt1 * patt2</code></a></td> + <td>Matches <code>patt1</code> followed by <code>patt2</code></td></tr> +<tr><td><a href="#op-add"><code>patt1 + patt2</code></a></td> + <td>Matches <code>patt1</code> or <code>patt2</code> + (ordered choice)</td></tr> +<tr><td><a href="#op-sub"><code>patt1 - patt2</code></a></td> + <td>Matches <code>patt1</code> if <code>patt2</code> does not match</td></tr> +<tr><td><a href="#op-unm"><code>-patt</code></a></td> + <td>Equivalent to <code>("" - patt)</code></td></tr> +<tr><td><a href="#op-len"><code>#patt</code></a></td> + <td>Matches <code>patt</code> but consumes no input</td></tr> +<tr><td><a href="#op-behind"><code>lpeg.B(patt)</code></a></td> + <td>Matches <code>patt</code> behind the current position, + consuming no input</td></tr> +</tbody></table> + +<p>As a very simple example, +<code>lpeg.R("09")^1</code> creates a pattern that +matches a non-empty sequence of digits. +As a not so simple example, +<code>-lpeg.P(1)</code> +(which can be written as <code>lpeg.P(-1)</code>, +or simply <code>-1</code> for operations expecting a pattern) +matches an empty string only if it cannot match a single character; +so, it succeeds only at the end of the subject. +</p> + +<p> +LPeg also offers the <a href="re.html"><code>re</code> module</a>, +which implements patterns following a regular-expression style +(e.g., <code>[09]+</code>). +(This module is 260 lines of Lua code, +and of course it uses LPeg to parse regular expressions and +translate them to regular LPeg patterns.) +</p> + + +<h2><a name="func">Functions</a></h2> + + +<h3><a name="f-match"></a><code>lpeg.match (pattern, subject [, init])</code></h3> +<p> +The matching function. +It attempts to match the given pattern against the subject string. +If the match succeeds, +returns the index in the subject of the first character after the match, +or the <a href="#captures">captured values</a> +(if the pattern captured any value). +</p> + +<p> +An optional numeric argument <code>init</code> makes the match +start at that position in the subject string. +As usual in Lua libraries, +a negative value counts from the end. +</p> + +<p> +Unlike typical pattern-matching functions, +<code>match</code> works only in <em>anchored</em> mode; +that is, it tries to match the pattern with a prefix of +the given subject string (at position <code>init</code>), +not with an arbitrary substring of the subject. +So, if we want to find a pattern anywhere in a string, +we must either write a loop in Lua or write a pattern that +matches anywhere. +This second approach is easy and quite efficient; +see <a href="#ex">examples</a>. +</p> + +<h3><a name="f-type"></a><code>lpeg.type (value)</code></h3> +<p> +If the given value is a pattern, +returns the string <code>"pattern"</code>. +Otherwise returns nil. +</p> + +<h3><a name="f-version"></a><code>lpeg.version ()</code></h3> +<p> +Returns a string with the running version of LPeg. +</p> + +<h3><a name="f-setstack"></a><code>lpeg.setmaxstack (max)</code></h3> +<p> +Sets a limit for the size of the backtrack stack used by LPeg to +track calls and choices. +(The default limit is 400.) +Most well-written patterns need little backtrack levels and +therefore you seldom need to change this limit; +before changing it you should try to rewrite your +pattern to avoid the need for extra space. +Nevertheless, a few useful patterns may overflow. +Also, with recursive grammars, +subjects with deep recursion may also need larger limits. +</p> + + +<h2><a name="basic">Basic Constructions</a></h2> + +<p> +The following operations build patterns. +All operations that expect a pattern as an argument +may receive also strings, tables, numbers, booleans, or functions, +which are translated to patterns according to +the rules of function <a href="#op-p"><code>lpeg.P</code></a>. +</p> + + + +<h3><a name="op-p"></a><code>lpeg.P (value)</code></h3> +<p> +Converts the given value into a proper pattern, +according to the following rules: +</p> +<ul> + +<li><p> +If the argument is a pattern, +it is returned unmodified. +</p></li> + +<li><p> +If the argument is a string, +it is translated to a pattern that matches the string literally. +</p></li> + +<li><p> +If the argument is a non-negative number <em>n</em>, +the result is a pattern that matches exactly <em>n</em> characters. +</p></li> + +<li><p> +If the argument is a negative number <em>-n</em>, +the result is a pattern that +succeeds only if the input string has less than <em>n</em> characters left: +<code>lpeg.P(-n)</code> +is equivalent to <code>-lpeg.P(n)</code> +(see the <a href="#op-unm">unary minus operation</a>). +</p></li> + +<li><p> +If the argument is a boolean, +the result is a pattern that always succeeds or always fails +(according to the boolean value), +without consuming any input. +</p></li> + +<li><p> +If the argument is a table, +it is interpreted as a grammar +(see <a href="#grammar">Grammars</a>). +</p></li> + +<li><p> +If the argument is a function, +returns a pattern equivalent to a +<a href="#matchtime">match-time capture</a> over the empty string. +</p></li> + +</ul> + + +<h3><a name="op-behind"></a><code>lpeg.B(patt)</code></h3> +<p> +Returns a pattern that +matches only if the input string at the current position +is preceded by <code>patt</code>. +Pattern <code>patt</code> must match only strings +with some fixed length, +and it cannot contain captures. +</p> + +<p> +Like the <a href="#op-len">and predicate</a>, +this pattern never consumes any input, +independently of success or failure. +</p> + + +<h3><a name="op-r"></a><code>lpeg.R ({range})</code></h3> +<p> +Returns a pattern that matches any single character +belonging to one of the given <em>ranges</em>. +Each <code>range</code> is a string <em>xy</em> of length 2, +representing all characters with code +between the codes of <em>x</em> and <em>y</em> +(both inclusive). +</p> + +<p> +As an example, the pattern +<code>lpeg.R("09")</code> matches any digit, +and <code>lpeg.R("az", "AZ")</code> matches any ASCII letter. +</p> + + +<h3><a name="op-s"></a><code>lpeg.S (string)</code></h3> +<p> +Returns a pattern that matches any single character that +appears in the given string. +(The <code>S</code> stands for <em>Set</em>.) +</p> + +<p> +As an example, the pattern +<code>lpeg.S("+-*/")</code> matches any arithmetic operator. +</p> + +<p> +Note that, if <code>s</code> is a character +(that is, a string of length 1), +then <code>lpeg.P(s)</code> is equivalent to <code>lpeg.S(s)</code> +which is equivalent to <code>lpeg.R(s..s)</code>. +Note also that both <code>lpeg.S("")</code> and <code>lpeg.R()</code> +are patterns that always fail. +</p> + + +<h3><a name="op-v"></a><code>lpeg.V (v)</code></h3> +<p> +This operation creates a non-terminal (a <em>variable</em>) +for a grammar. +The created non-terminal refers to the rule indexed by <code>v</code> +in the enclosing grammar. +(See <a href="#grammar">Grammars</a> for details.) +</p> + + +<h3><a name="op-locale"></a><code>lpeg.locale ([table])</code></h3> +<p> +Returns a table with patterns for matching some character classes +according to the current locale. +The table has fields named +<code>alnum</code>, +<code>alpha</code>, +<code>cntrl</code>, +<code>digit</code>, +<code>graph</code>, +<code>lower</code>, +<code>print</code>, +<code>punct</code>, +<code>space</code>, +<code>upper</code>, and +<code>xdigit</code>, +each one containing a correspondent pattern. +Each pattern matches any single character that belongs to its class. +</p> + +<p> +If called with an argument <code>table</code>, +then it creates those fields inside the given table and +returns that table. +</p> + + +<h3><a name="op-len"></a><code>#patt</code></h3> +<p> +Returns a pattern that +matches only if the input string matches <code>patt</code>, +but without consuming any input, +independently of success or failure. +(This pattern is called an <em>and predicate</em> +and it is equivalent to +<em>&patt</em> in the original PEG notation.) +</p> + + +<p> +This pattern never produces any capture. +</p> + + +<h3><a name="op-unm"></a><code>-patt</code></h3> +<p> +Returns a pattern that +matches only if the input string does not match <code>patt</code>. +It does not consume any input, +independently of success or failure. +(This pattern is equivalent to +<em>!patt</em> in the original PEG notation.) +</p> + +<p> +As an example, the pattern +<code>-lpeg.P(1)</code> matches only the end of string. +</p> + +<p> +This pattern never produces any captures, +because either <code>patt</code> fails +or <code>-patt</code> fails. +(A failing pattern never produces captures.) +</p> + + +<h3><a name="op-add"></a><code>patt1 + patt2</code></h3> +<p> +Returns a pattern equivalent to an <em>ordered choice</em> +of <code>patt1</code> and <code>patt2</code>. +(This is denoted by <em>patt1 / patt2</em> in the original PEG notation, +not to be confused with the <code>/</code> operation in LPeg.) +It matches either <code>patt1</code> or <code>patt2</code>, +with no backtracking once one of them succeeds. +The identity element for this operation is the pattern +<code>lpeg.P(false)</code>, +which always fails. +</p> + +<p> +If both <code>patt1</code> and <code>patt2</code> are +character sets, +this operation is equivalent to set union. +</p> +<pre class="example"> +lower = lpeg.R("az") +upper = lpeg.R("AZ") +letter = lower + upper +</pre> + + +<h3><a name="op-sub"></a><code>patt1 - patt2</code></h3> +<p> +Returns a pattern equivalent to <em>!patt2 patt1</em>. +This pattern asserts that the input does not match +<code>patt2</code> and then matches <code>patt1</code>. +</p> + +<p> +When successful, +this pattern produces all captures from <code>patt1</code>. +It never produces any capture from <code>patt2</code> +(as either <code>patt2</code> fails or +<code>patt1 - patt2</code> fails). +</p> + +<p> +If both <code>patt1</code> and <code>patt2</code> are +character sets, +this operation is equivalent to set difference. +Note that <code>-patt</code> is equivalent to <code>"" - patt</code> +(or <code>0 - patt</code>). +If <code>patt</code> is a character set, +<code>1 - patt</code> is its complement. +</p> + + +<h3><a name="op-mul"></a><code>patt1 * patt2</code></h3> +<p> +Returns a pattern that matches <code>patt1</code> +and then matches <code>patt2</code>, +starting where <code>patt1</code> finished. +The identity element for this operation is the +pattern <code>lpeg.P(true)</code>, +which always succeeds. +</p> + +<p> +(LPeg uses the <code>*</code> operator +[instead of the more obvious <code>..</code>] +both because it has +the right priority and because in formal languages it is +common to use a dot for denoting concatenation.) +</p> + + +<h3><a name="op-pow"></a><code>patt^n</code></h3> +<p> +If <code>n</code> is nonnegative, +this pattern is +equivalent to <em>patt<sup>n</sup> patt*</em>: +It matches <code>n</code> or more occurrences of <code>patt</code>. +</p> + +<p> +Otherwise, when <code>n</code> is negative, +this pattern is equivalent to <em>(patt?)<sup>-n</sup></em>: +It matches at most <code>|n|</code> +occurrences of <code>patt</code>. +</p> + +<p> +In particular, <code>patt^0</code> is equivalent to <em>patt*</em>, +<code>patt^1</code> is equivalent to <em>patt+</em>, +and <code>patt^-1</code> is equivalent to <em>patt?</em> +in the original PEG notation. +</p> + +<p> +In all cases, +the resulting pattern is greedy with no backtracking +(also called a <em>possessive</em> repetition). +That is, it matches only the longest possible sequence +of matches for <code>patt</code>. +</p> + + + +<h2><a name="grammar">Grammars</a></h2> + +<p> +With the use of Lua variables, +it is possible to define patterns incrementally, +with each new pattern using previously defined ones. +However, this technique does not allow the definition of +recursive patterns. +For recursive patterns, +we need real grammars. +</p> + +<p> +LPeg represents grammars with tables, +where each entry is a rule. +</p> + +<p> +The call <code>lpeg.V(v)</code> +creates a pattern that represents the nonterminal +(or <em>variable</em>) with index <code>v</code> in a grammar. +Because the grammar still does not exist when +this function is evaluated, +the result is an <em>open reference</em> to the respective rule. +</p> + +<p> +A table is <em>fixed</em> when it is converted to a pattern +(either by calling <code>lpeg.P</code> or by using it wherein a +pattern is expected). +Then every open reference created by <code>lpeg.V(v)</code> +is corrected to refer to the rule indexed by <code>v</code> in the table. +</p> + +<p> +When a table is fixed, +the result is a pattern that matches its <em>initial rule</em>. +The entry with index 1 in the table defines its initial rule. +If that entry is a string, +it is assumed to be the name of the initial rule. +Otherwise, LPeg assumes that the entry 1 itself is the initial rule. +</p> + +<p> +As an example, +the following grammar matches strings of a's and b's that +have the same number of a's and b's: +</p> +<pre class="example"> +equalcount = lpeg.P{ + "S"; -- initial rule name + S = "a" * lpeg.V"B" + "b" * lpeg.V"A" + "", + A = "a" * lpeg.V"S" + "b" * lpeg.V"A" * lpeg.V"A", + B = "b" * lpeg.V"S" + "a" * lpeg.V"B" * lpeg.V"B", +} * -1 +</pre> +<p> +It is equivalent to the following grammar in standard PEG notation: +</p> +<pre class="example"> + S <- 'a' B / 'b' A / '' + A <- 'a' S / 'b' A A + B <- 'b' S / 'a' B B +</pre> + + +<h2><a name="captures">Captures</a></h2> + +<p> +A <em>capture</em> is a pattern that produces values +(the so called <em>semantic information</em>) +according to what it matches. +LPeg offers several kinds of captures, +which produces values based on matches and combine these values to +produce new values. +Each capture may produce zero or more values. +</p> + +<p> +The following table summarizes the basic captures: +</p> +<table border="1"> +<tbody><tr><td><b>Operation</b></td><td><b>What it Produces</b></td></tr> +<tr><td><a href="#cap-c"><code>lpeg.C(patt)</code></a></td> + <td>the match for <code>patt</code> plus all captures + made by <code>patt</code></td></tr> +<tr><td><a href="#cap-arg"><code>lpeg.Carg(n)</code></a></td> + <td>the value of the n<sup>th</sup> extra argument to + <code>lpeg.match</code> (matches the empty string)</td></tr> +<tr><td><a href="#cap-b"><code>lpeg.Cb(name)</code></a></td> + <td>the values produced by the previous + group capture named <code>name</code> + (matches the empty string)</td></tr> +<tr><td><a href="#cap-cc"><code>lpeg.Cc(values)</code></a></td> + <td>the given values (matches the empty string)</td></tr> +<tr><td><a href="#cap-f"><code>lpeg.Cf(patt, func)</code></a></td> + <td>a <em>folding</em> of the captures from <code>patt</code></td></tr> +<tr><td><a href="#cap-g"><code>lpeg.Cg(patt [, name])</code></a></td> + <td>the values produced by <code>patt</code>, + optionally tagged with <code>name</code></td></tr> +<tr><td><a href="#cap-p"><code>lpeg.Cp()</code></a></td> + <td>the current position (matches the empty string)</td></tr> +<tr><td><a href="#cap-s"><code>lpeg.Cs(patt)</code></a></td> + <td>the match for <code>patt</code> + with the values from nested captures replacing their matches</td></tr> +<tr><td><a href="#cap-t"><code>lpeg.Ct(patt)</code></a></td> + <td>a table with all captures from <code>patt</code></td></tr> +<tr><td><a href="#cap-string"><code>patt / string</code></a></td> + <td><code>string</code>, with some marks replaced by captures + of <code>patt</code></td></tr> +<tr><td><a href="#cap-num"><code>patt / number</code></a></td> + <td>the n-th value captured by <code>patt</code>, +or no value when <code>number</code> is zero.</td></tr> +<tr><td><a href="#cap-query"><code>patt / table</code></a></td> + <td><code>table[c]</code>, where <code>c</code> is the (first) + capture of <code>patt</code></td></tr> +<tr><td><a href="#cap-func"><code>patt / function</code></a></td> + <td>the returns of <code>function</code> applied to the captures + of <code>patt</code></td></tr> +<tr><td><a href="#matchtime"><code>lpeg.Cmt(patt, function)</code></a></td> + <td>the returns of <code>function</code> applied to the captures + of <code>patt</code>; the application is done at match time</td></tr> +</tbody></table> + +<p> +A capture pattern produces its values only when it succeeds. +For instance, +the pattern <code>lpeg.C(lpeg.P"a"^-1)</code> +produces the empty string when there is no <code>"a"</code> +(because the pattern <code>"a"?</code> succeeds), +while the pattern <code>lpeg.C("a")^-1</code> +does not produce any value when there is no <code>"a"</code> +(because the pattern <code>"a"</code> fails). +A pattern inside a loop or inside a recursive structure +produces values for each match. +</p> + +<p> +Usually, +LPeg does not specify when (and if) it evaluates its captures. +(As an example, +consider the pattern <code>lpeg.P"a" / func / 0</code>. +Because the "division" by 0 instructs LPeg to throw away the +results from the pattern, +LPeg may or may not call <code>func</code>.) +Therefore, captures should avoid side effects. +Moreover, +most captures cannot affect the way a pattern matches a subject. +The only exception to this rule is the +so-called <a href="#matchtime"><em>match-time capture</em></a>. +When a match-time capture matches, +it forces the immediate evaluation of all its nested captures +and then calls its corresponding function, +which defines whether the match succeeds and also +what values are produced. +</p> + +<h3><a name="cap-c"></a><code>lpeg.C (patt)</code></h3> +<p> +Creates a <em>simple capture</em>, +which captures the substring of the subject that matches <code>patt</code>. +The captured value is a string. +If <code>patt</code> has other captures, +their values are returned after this one. +</p> + + +<h3><a name="cap-arg"></a><code>lpeg.Carg (n)</code></h3> +<p> +Creates an <em>argument capture</em>. +This pattern matches the empty string and +produces the value given as the n<sup>th</sup> extra +argument given in the call to <code>lpeg.match</code>. +</p> + + +<h3><a name="cap-b"></a><code>lpeg.Cb (name)</code></h3> +<p> +Creates a <em>back capture</em>. +This pattern matches the empty string and +produces the values produced by the <em>most recent</em> +<a href="#cap-g">group capture</a> named <code>name</code> +(where <code>name</code> can be any Lua value). +</p> + +<p> +<em>Most recent</em> means the last +<em>complete</em> +<em>outermost</em> +group capture with the given name. +A <em>Complete</em> capture means that the entire pattern +corresponding to the capture has matched. +An <em>Outermost</em> capture means that the capture is not inside +another complete capture. +</p> + +<p> +In the same way that LPeg does not specify when it evaluates captures, +it does not specify whether it reuses +values previously produced by the group +or re-evaluates them. +</p> + +<h3><a name="cap-cc"></a><code>lpeg.Cc ([value, ...])</code></h3> +<p> +Creates a <em>constant capture</em>. +This pattern matches the empty string and +produces all given values as its captured values. +</p> + + +<h3><a name="cap-f"></a><code>lpeg.Cf (patt, func)</code></h3> +<p> +Creates a <em>fold capture</em>. +If <code>patt</code> produces a list of captures +<em>C<sub>1</sub> C<sub>2</sub> ... C<sub>n</sub></em>, +this capture will produce the value +<em>func(...func(func(C<sub>1</sub>, C<sub>2</sub>), C<sub>3</sub>)..., + C<sub>n</sub>)</em>, +that is, it will <em>fold</em> +(or <em>accumulate</em>, or <em>reduce</em>) +the captures from <code>patt</code> using function <code>func</code>. +</p> + +<p> +This capture assumes that <code>patt</code> should produce +at least one capture with at least one value (of any type), +which becomes the initial value of an <em>accumulator</em>. +(If you need a specific initial value, +you may prefix a <a href="#cap-cc">constant capture</a> to <code>patt</code>.) +For each subsequent capture, +LPeg calls <code>func</code> +with this accumulator as the first argument and all values produced +by the capture as extra arguments; +the first result from this call +becomes the new value for the accumulator. +The final value of the accumulator becomes the captured value. +</p> + +<p> +As an example, +the following pattern matches a list of numbers separated +by commas and returns their addition: +</p> +<pre class="example"> +-- matches a numeral and captures its numerical value +number = lpeg.R"09"^1 / tonumber + +-- matches a list of numbers, capturing their values +list = number * ("," * number)^0 + +-- auxiliary function to add two numbers +function add (acc, newvalue) return acc + newvalue end + +-- folds the list of numbers adding them +sum = lpeg.Cf(list, add) + +-- example of use +print(sum:match("10,30,43")) --> 83 +</pre> + + +<h3><a name="cap-g"></a><code>lpeg.Cg (patt [, name])</code></h3> +<p> +Creates a <em>group capture</em>. +It groups all values returned by <code>patt</code> +into a single capture. +The group may be anonymous (if no name is given) +or named with the given name +(which can be any non-nil Lua value). +</p> + +<p> +An anonymous group serves to join values from several captures into +a single capture. +A named group has a different behavior. +In most situations, a named group returns no values at all. +Its values are only relevant for a following +<a href="#cap-b">back capture</a> or when used +inside a <a href="#cap-t">table capture</a>. +</p> + + +<h3><a name="cap-p"></a><code>lpeg.Cp ()</code></h3> +<p> +Creates a <em>position capture</em>. +It matches the empty string and +captures the position in the subject where the match occurs. +The captured value is a number. +</p> + + +<h3><a name="cap-s"></a><code>lpeg.Cs (patt)</code></h3> +<p> +Creates a <em>substitution capture</em>, +which captures the substring of the subject that matches <code>patt</code>, +with <em>substitutions</em>. +For any capture inside <code>patt</code> with a value, +the substring that matched the capture is replaced by the capture value +(which should be a string). +The final captured value is the string resulting from +all replacements. +</p> + + +<h3><a name="cap-t"></a><code>lpeg.Ct (patt)</code></h3> +<p> +Creates a <em>table capture</em>. +This capture returns a table with all values from all anonymous captures +made by <code>patt</code> inside this table in successive integer keys, +starting at 1. +Moreover, +for each named capture group created by <code>patt</code>, +the first value of the group is put into the table +with the group name as its key. +The captured value is only the table. +</p> + + +<h3><a name="cap-string"></a><code>patt / string</code></h3> +<p> +Creates a <em>string capture</em>. +It creates a capture string based on <code>string</code>. +The captured value is a copy of <code>string</code>, +except that the character <code>%</code> works as an escape character: +any sequence in <code>string</code> of the form <code>%<em>n</em></code>, +with <em>n</em> between 1 and 9, +stands for the match of the <em>n</em>-th capture in <code>patt</code>. +The sequence <code>%0</code> stands for the whole match. +The sequence <code>%%</code> stands for a single <code>%</code>. +</p> + + +<h3><a name="cap-num"></a><code>patt / number</code></h3> +<p> +Creates a <em>numbered capture</em>. +For a non-zero number, +the captured value is the n-th value +captured by <code>patt</code>. +When <code>number</code> is zero, +there are no captured values. +</p> + + +<h3><a name="cap-query"></a><code>patt / table</code></h3> +<p> +Creates a <em>query capture</em>. +It indexes the given table using as key the first value captured by +<code>patt</code>, +or the whole match if <code>patt</code> produced no value. +The value at that index is the final value of the capture. +If the table does not have that key, +there is no captured value. +</p> + + +<h3><a name="cap-func"></a><code>patt / function</code></h3> +<p> +Creates a <em>function capture</em>. +It calls the given function passing all captures made by +<code>patt</code> as arguments, +or the whole match if <code>patt</code> made no capture. +The values returned by the function +are the final values of the capture. +In particular, +if <code>function</code> returns no value, +there is no captured value. +</p> + + +<h3><a name="matchtime"></a><code>lpeg.Cmt(patt, function)</code></h3> +<p> +Creates a <em>match-time capture</em>. +Unlike all other captures, +this one is evaluated immediately when a match occurs +(even if it is part of a larger pattern that fails later). +It forces the immediate evaluation of all its nested captures +and then calls <code>function</code>. +</p> + +<p> +The given function gets as arguments the entire subject, +the current position (after the match of <code>patt</code>), +plus any capture values produced by <code>patt</code>. +</p> + +<p> +The first value returned by <code>function</code> +defines how the match happens. +If the call returns a number, +the match succeeds +and the returned number becomes the new current position. +(Assuming a subject <em>s</em> and current position <em>i</em>, +the returned number must be in the range <em>[i, len(s) + 1]</em>.) +If the call returns <b>true</b>, +the match succeeds without consuming any input. +(So, to return <b>true</b> is equivalent to return <em>i</em>.) +If the call returns <b>false</b>, <b>nil</b>, or no value, +the match fails. +</p> + +<p> +Any extra values returned by the function become the +values produced by the capture. +</p> + + + + +<h2><a name="ex">Some Examples</a></h2> + +<h3>Using a Pattern</h3> +<p> +This example shows a very simple but complete program +that builds and uses a pattern: +</p> +<pre class="example"> +local lpeg = require "lpeg" + +-- matches a word followed by end-of-string +p = lpeg.R"az"^1 * -1 + +print(p:match("hello")) --> 6 +print(lpeg.match(p, "hello")) --> 6 +print(p:match("1 hello")) --> nil +</pre> +<p> +The pattern is simply a sequence of one or more lower-case letters +followed by the end of string (-1). +The program calls <code>match</code> both as a method +and as a function. +In both sucessful cases, +the match returns +the index of the first character after the match, +which is the string length plus one. +</p> + + +<h3>Name-value lists</h3> +<p> +This example parses a list of name-value pairs and returns a table +with those pairs: +</p> +<pre class="example"> +lpeg.locale(lpeg) -- adds locale entries into 'lpeg' table + +local space = lpeg.space^0 +local name = lpeg.C(lpeg.alpha^1) * space +local sep = lpeg.S(",;") * space +local pair = lpeg.Cg(name * "=" * space * name) * sep^-1 +local list = lpeg.Cf(lpeg.Ct("") * pair^0, rawset) +t = list:match("a=b, c = hi; next = pi") --> { a = "b", c = "hi", next = "pi" } +</pre> +<p> +Each pair has the format <code>name = name</code> followed by +an optional separator (a comma or a semicolon). +The <code>pair</code> pattern encloses the pair in a group pattern, +so that the names become the values of a single capture. +The <code>list</code> pattern then folds these captures. +It starts with an empty table, +created by a table capture matching an empty string; +then for each capture (a pair of names) it applies <code>rawset</code> +over the accumulator (the table) and the capture values (the pair of names). +<code>rawset</code> returns the table itself, +so the accumulator is always the table. +</p> + +<h3>Splitting a string</h3> +<p> +The following code builds a pattern that +splits a string using a given pattern +<code>sep</code> as a separator: +</p> +<pre class="example"> +function split (s, sep) + sep = lpeg.P(sep) + local elem = lpeg.C((1 - sep)^0) + local p = elem * (sep * elem)^0 + return lpeg.match(p, s) +end +</pre> +<p> +First the function ensures that <code>sep</code> is a proper pattern. +The pattern <code>elem</code> is a repetition of zero of more +arbitrary characters as long as there is not a match against +the separator. +It also captures its match. +The pattern <code>p</code> matches a list of elements separated +by <code>sep</code>. +</p> + +<p> +If the split results in too many values, +it may overflow the maximum number of values +that can be returned by a Lua function. +In this case, +we can collect these values in a table: +</p> +<pre class="example"> +function split (s, sep) + sep = lpeg.P(sep) + local elem = lpeg.C((1 - sep)^0) + local p = lpeg.Ct(elem * (sep * elem)^0) -- make a table capture + return lpeg.match(p, s) +end +</pre> + + +<h3>Searching for a pattern</h3> +<p> +The primitive <code>match</code> works only in anchored mode. +If we want to find a pattern anywhere in a string, +we must write a pattern that matches anywhere. +</p> + +<p> +Because patterns are composable, +we can write a function that, +given any arbitrary pattern <code>p</code>, +returns a new pattern that searches for <code>p</code> +anywhere in a string. +There are several ways to do the search. +One way is like this: +</p> +<pre class="example"> +function anywhere (p) + return lpeg.P{ p + 1 * lpeg.V(1) } +end +</pre> +<p> +This grammar has a straight reading: +it matches <code>p</code> or skips one character and tries again. +</p> + +<p> +If we want to know where the pattern is in the string +(instead of knowing only that it is there somewhere), +we can add position captures to the pattern: +</p> +<pre class="example"> +local I = lpeg.Cp() +function anywhere (p) + return lpeg.P{ I * p * I + 1 * lpeg.V(1) } +end + +print(anywhere("world"):match("hello world!")) -> 7 12 +</pre> + +<p> +Another option for the search is like this: +</p> +<pre class="example"> +local I = lpeg.Cp() +function anywhere (p) + return (1 - lpeg.P(p))^0 * I * p * I +end +</pre> +<p> +Again the pattern has a straight reading: +it skips as many characters as possible while not matching <code>p</code>, +and then matches <code>p</code> (plus appropriate captures). +</p> + +<p> +If we want to look for a pattern only at word boundaries, +we can use the following transformer: +</p> + +<pre class="example"> +local t = lpeg.locale() + +function atwordboundary (p) + return lpeg.P{ + [1] = p + t.alpha^0 * (1 - t.alpha)^1 * lpeg.V(1) + } +end +</pre> + + +<h3><a name="balanced"></a>Balanced parentheses</h3> +<p> +The following pattern matches only strings with balanced parentheses: +</p> +<pre class="example"> +b = lpeg.P{ "(" * ((1 - lpeg.S"()") + lpeg.V(1))^0 * ")" } +</pre> +<p> +Reading the first (and only) rule of the given grammar, +we have that a balanced string is +an open parenthesis, +followed by zero or more repetitions of either +a non-parenthesis character or +a balanced string (<code>lpeg.V(1)</code>), +followed by a closing parenthesis. +</p> + + +<h3>Global substitution</h3> +<p> +The next example does a job somewhat similar to <code>string.gsub</code>. +It receives a pattern and a replacement value, +and substitutes the replacement value for all occurrences of the pattern +in a given string: +</p> +<pre class="example"> +function gsub (s, patt, repl) + patt = lpeg.P(patt) + patt = lpeg.Cs((patt / repl + 1)^0) + return lpeg.match(patt, s) +end +</pre> +<p> +As in <code>string.gsub</code>, +the replacement value can be a string, +a function, or a table. +</p> + + +<h3><a name="CSV"></a>Comma-Separated Values (CSV)</h3> +<p> +This example breaks a string into comma-separated values, +returning all fields: +</p> +<pre class="example"> +local field = '"' * lpeg.Cs(((lpeg.P(1) - '"') + lpeg.P'""' / '"')^0) * '"' + + lpeg.C((1 - lpeg.S',\n"')^0) + +local record = field * (',' * field)^0 * (lpeg.P'\n' + -1) + +function csv (s) + return lpeg.match(record, s) +end +</pre> +<p> +A field is either a quoted field +(which may contain any character except an individual quote, +which may be written as two quotes that are replaced by one) +or an unquoted field +(which cannot contain commas, newlines, or quotes). +A record is a list of fields separated by commas, +ending with a newline or the string end (-1). +</p> + +<p> +As it is, +the previous pattern returns each field as a separated result. +If we add a table capture in the definition of <code>record</code>, +the pattern will return instead a single table +containing all fields: +</p> +<pre> +local record = lpeg.Ct(field * (',' * field)^0) * (lpeg.P'\n' + -1) +</pre> + + +<h3>UTF-8 and Latin 1</h3> +<p> +It is not difficult to use LPeg to convert a string from +UTF-8 encoding to Latin 1 (ISO 8859-1): +</p> + +<pre class="example"> +-- convert a two-byte UTF-8 sequence to a Latin 1 character +local function f2 (s) + local c1, c2 = string.byte(s, 1, 2) + return string.char(c1 * 64 + c2 - 12416) +end + +local utf8 = lpeg.R("\0\127") + + lpeg.R("\194\195") * lpeg.R("\128\191") / f2 + +local decode_pattern = lpeg.Cs(utf8^0) * -1 +</pre> +<p> +In this code, +the definition of UTF-8 is already restricted to the +Latin 1 range (from 0 to 255). +Any encoding outside this range (as well as any invalid encoding) +will not match that pattern. +</p> + +<p> +As the definition of <code>decode_pattern</code> demands that +the pattern matches the whole input (because of the -1 at its end), +any invalid string will simply fail to match, +without any useful information about the problem. +We can improve this situation redefining <code>decode_pattern</code> +as follows: +</p> +<pre class="example"> +local function er (_, i) error("invalid encoding at position " .. i) end + +local decode_pattern = lpeg.Cs(utf8^0) * (-1 + lpeg.P(er)) +</pre> +<p> +Now, if the pattern <code>utf8^0</code> stops +before the end of the string, +an appropriate error function is called. +</p> + + +<h3>UTF-8 and Unicode</h3> +<p> +We can extend the previous patterns to handle all Unicode code points. +Of course, +we cannot translate them to Latin 1 or any other one-byte encoding. +Instead, our translation results in a array with the code points +represented as numbers. +The full code is here: +</p> +<pre class="example"> +-- decode a two-byte UTF-8 sequence +local function f2 (s) + local c1, c2 = string.byte(s, 1, 2) + return c1 * 64 + c2 - 12416 +end + +-- decode a three-byte UTF-8 sequence +local function f3 (s) + local c1, c2, c3 = string.byte(s, 1, 3) + return (c1 * 64 + c2) * 64 + c3 - 925824 +end + +-- decode a four-byte UTF-8 sequence +local function f4 (s) + local c1, c2, c3, c4 = string.byte(s, 1, 4) + return ((c1 * 64 + c2) * 64 + c3) * 64 + c4 - 63447168 +end + +local cont = lpeg.R("\128\191") -- continuation byte + +local utf8 = lpeg.R("\0\127") / string.byte + + lpeg.R("\194\223") * cont / f2 + + lpeg.R("\224\239") * cont * cont / f3 + + lpeg.R("\240\244") * cont * cont * cont / f4 + +local decode_pattern = lpeg.Ct(utf8^0) * -1 +</pre> + + +<h3>Lua's long strings</h3> +<p> +A long string in Lua starts with the pattern <code>[=*[</code> +and ends at the first occurrence of <code>]=*]</code> with +exactly the same number of equal signs. +If the opening brackets are followed by a newline, +this newline is discarded +(that is, it is not part of the string). +</p> + +<p> +To match a long string in Lua, +the pattern must capture the first repetition of equal signs and then, +whenever it finds a candidate for closing the string, +check whether it has the same number of equal signs. +</p> + +<pre class="example"> +equals = lpeg.P"="^0 +open = "[" * lpeg.Cg(equals, "init") * "[" * lpeg.P"\n"^-1 +close = "]" * lpeg.C(equals) * "]" +closeeq = lpeg.Cmt(close * lpeg.Cb("init"), function (s, i, a, b) return a == b end) +string = open * lpeg.C((lpeg.P(1) - closeeq)^0) * close / 1 +</pre> + +<p> +The <code>open</code> pattern matches <code>[=*[</code>, +capturing the repetitions of equal signs in a group named <code>init</code>; +it also discharges an optional newline, if present. +The <code>close</code> pattern matches <code>]=*]</code>, +also capturing the repetitions of equal signs. +The <code>closeeq</code> pattern first matches <code>close</code>; +then it uses a back capture to recover the capture made +by the previous <code>open</code>, +which is named <code>init</code>; +finally it uses a match-time capture to check +whether both captures are equal. +The <code>string</code> pattern starts with an <code>open</code>, +then it goes as far as possible until matching <code>closeeq</code>, +and then matches the final <code>close</code>. +The final numbered capture simply discards +the capture made by <code>close</code>. +</p> + + +<h3>Arithmetic expressions</h3> +<p> +This example is a complete parser and evaluator for simple +arithmetic expressions. +We write it in two styles. +The first approach first builds a syntax tree and then +traverses this tree to compute the expression value: +</p> +<pre class="example"> +-- Lexical Elements +local Space = lpeg.S(" \n\t")^0 +local Number = lpeg.C(lpeg.P"-"^-1 * lpeg.R("09")^1) * Space +local TermOp = lpeg.C(lpeg.S("+-")) * Space +local FactorOp = lpeg.C(lpeg.S("*/")) * Space +local Open = "(" * Space +local Close = ")" * Space + +-- Grammar +local Exp, Term, Factor = lpeg.V"Exp", lpeg.V"Term", lpeg.V"Factor" +G = lpeg.P{ Exp, + Exp = lpeg.Ct(Term * (TermOp * Term)^0); + Term = lpeg.Ct(Factor * (FactorOp * Factor)^0); + Factor = Number + Open * Exp * Close; +} + +G = Space * G * -1 + +-- Evaluator +function eval (x) + if type(x) == "string" then + return tonumber(x) + else + local op1 = eval(x[1]) + for i = 2, #x, 2 do + local op = x[i] + local op2 = eval(x[i + 1]) + if (op == "+") then op1 = op1 + op2 + elseif (op == "-") then op1 = op1 - op2 + elseif (op == "*") then op1 = op1 * op2 + elseif (op == "/") then op1 = op1 / op2 + end + end + return op1 + end +end + +-- Parser/Evaluator +function evalExp (s) + local t = lpeg.match(G, s) + if not t then error("syntax error", 2) end + return eval(t) +end + +-- small example +print(evalExp"3 + 5*9 / (1+1) - 12") --> 13.5 +</pre> + +<p> +The second style computes the expression value on the fly, +without building the syntax tree. +The following grammar takes this approach. +(It assumes the same lexical elements as before.) +</p> +<pre class="example"> +-- Auxiliary function +function eval (v1, op, v2) + if (op == "+") then return v1 + v2 + elseif (op == "-") then return v1 - v2 + elseif (op == "*") then return v1 * v2 + elseif (op == "/") then return v1 / v2 + end +end + +-- Grammar +local V = lpeg.V +G = lpeg.P{ "Exp", + Exp = lpeg.Cf(V"Term" * lpeg.Cg(TermOp * V"Term")^0, eval); + Term = lpeg.Cf(V"Factor" * lpeg.Cg(FactorOp * V"Factor")^0, eval); + Factor = Number / tonumber + Open * V"Exp" * Close; +} + +-- small example +print(lpeg.match(G, "3 + 5*9 / (1+1) - 12")) --> 13.5 +</pre> +<p> +Note the use of the fold (accumulator) capture. +To compute the value of an expression, +the accumulator starts with the value of the first term, +and then applies <code>eval</code> over +the accumulator, the operator, +and the new term for each repetition. +</p> + + + +<h2><a name="download"></a>Download</h2> + +<p>LPeg +<a href="http://www.inf.puc-rio.br/~roberto/lpeg/lpeg-1.0.2.tar.gz">source code</a>.</p> + + +<h2><a name="license">License</a></h2> + +<p> +Copyright © 2007-2019 Lua.org, PUC-Rio. +</p> +<p> +Permission is hereby granted, free of charge, +to any person obtaining a copy of this software and +associated documentation files (the "Software"), +to deal in the Software without restriction, +including without limitation the rights to use, +copy, modify, merge, publish, distribute, sublicense, +and/or sell copies of the Software, +and to permit persons to whom the Software is +furnished to do so, +subject to the following conditions: +</p> + +<p> +The above copyright notice and this permission notice +shall be included in all copies or substantial portions of the Software. +</p> + +<p> +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, +INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. +IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, +DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, +TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. +</p> + +</div> <!-- id="content" --> + +</div> <!-- id="main" --> + +</div> <!-- id="container" --> + +</body> +</html> |