Diffstat (limited to 'ThirdParty/lpeg-1.0.2/re.html')
-rw-r--r-- | ThirdParty/lpeg-1.0.2/re.html | 494 |
1 files changed, 494 insertions, 0 deletions
diff --git a/ThirdParty/lpeg-1.0.2/re.html b/ThirdParty/lpeg-1.0.2/re.html
new file mode 100644
index 0000000..ad60d50
--- /dev/null
+++ b/ThirdParty/lpeg-1.0.2/re.html
@@ -0,0 +1,494 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html>
+<head>
+ <title>LPeg.re - Regex syntax for LPEG</title>
+ <link rel="stylesheet"
+ href="http://www.inf.puc-rio.br/~roberto/lpeg/doc.css"
+ type="text/css"/>
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
+</head>
+<body>
+
+<!-- $Id: re.html $ -->
+
+<div id="container">
+
+<div id="product">
+ <div id="product_logo">
+ <a href="http://www.inf.puc-rio.br/~roberto/lpeg/">
+ <img alt="LPeg logo" src="lpeg-128.gif"/>
+ </a>
+ </div>
+ <div id="product_name"><big><strong>LPeg.re</strong></big></div>
+ <div id="product_description">
+ Regex syntax for LPEG
+ </div>
+</div> <!-- id="product" -->
+
+<div id="main">
+
+<div id="navigation">
+<h1>re</h1>
+
+<ul>
+ <li><a href="#basic">Basic Constructions</a></li>
+ <li><a href="#func">Functions</a></li>
+ <li><a href="#ex">Some Examples</a></li>
+ <li><a href="#license">License</a></li>
+ </ul>
+ </li>
+</ul>
+</div> <!-- id="navigation" -->
+
+<div id="content">
+
+<h2><a name="basic"></a>The <code>re</code> Module</h2>
+
+<p>
+The <code>re</code> module
+(provided by file <code>re.lua</code> in the distribution)
+supports a somewhat conventional regex syntax
+for pattern usage within <a href="lpeg.html">LPeg</a>.
+</p>
+
+<p>
+The next table summarizes <code>re</code>'s syntax.
+A <code>p</code> represents an arbitrary pattern;
+<code>num</code> represents a number (<code>[0-9]+</code>);
+<code>name</code> represents an identifier
+(<code>[a-zA-Z][a-zA-Z0-9_]*</code>).
+Constructions are listed in order of decreasing precedence.
+<table border="1">
+<tbody><tr><td><b>Syntax</b></td><td><b>Description</b></td></tr>
+<tr><td><code>( p )</code></td> <td>grouping</td></tr>
+<tr><td><code>'string'</code></td> <td>literal string</td></tr>
+<tr><td><code>"string"</code></td> <td>literal string</td></tr>
+<tr><td><code>[class]</code></td> <td>character class</td></tr>
+<tr><td><code>.</code></td> <td>any character</td></tr>
+<tr><td><code>%name</code></td>
+ <td>pattern <code>defs[name]</code> or a pre-defined pattern</td></tr>
+<tr><td><code>name</code></td><td>non terminal</td></tr>
+<tr><td><code><name></code></td><td>non terminal</td></tr>
+<tr><td><code>{}</code></td> <td>position capture</td></tr>
+<tr><td><code>{ p }</code></td> <td>simple capture</td></tr>
+<tr><td><code>{: p :}</code></td> <td>anonymous group capture</td></tr>
+<tr><td><code>{:name: p :}</code></td> <td>named group capture</td></tr>
+<tr><td><code>{~ p ~}</code></td> <td>substitution capture</td></tr>
+<tr><td><code>{| p |}</code></td> <td>table capture</td></tr>
+<tr><td><code>=name</code></td> <td>back reference
+</td></tr>
+<tr><td><code>p ?</code></td> <td>optional match</td></tr>
+<tr><td><code>p *</code></td> <td>zero or more repetitions</td></tr>
+<tr><td><code>p +</code></td> <td>one or more repetitions</td></tr>
+<tr><td><code>p^num</code></td> <td>exactly <code>num</code> repetitions</td></tr>
+<tr><td><code>p^+num</code></td>
+ <td>at least <code>num</code> repetitions</td></tr>
+<tr><td><code>p^-num</code></td>
+ <td>at most <code>num</code> repetitions</td></tr>
+<tr><td><code>p -> 'string'</code></td> <td>string capture</td></tr>
+<tr><td><code>p -> "string"</code></td> <td>string capture</td></tr>
+<tr><td><code>p -> num</code></td> <td>numbered capture</td></tr>
+<tr><td><code>p -> name</code></td> <td>function/query/string capture
+equivalent to <code>p / defs[name]</code></td></tr>
+<tr><td><code>p => name</code></td> <td>match-time capture
+equivalent to <code>lpeg.Cmt(p, defs[name])</code></td></tr>
+<tr><td><code>p ~> name</code></td> <td>fold capture
+equivalent to <code>lpeg.Cf(p, defs[name])</code></td></tr>
+<tr><td><code>& p</code></td> <td>and predicate</td></tr>
+<tr><td><code>! p</code></td> <td>not predicate</td></tr>
+<tr><td><code>p1 p2</code></td> <td>concatenation</td></tr>
+<tr><td><code>p1 / p2</code></td> <td>ordered choice</td></tr>
+<tr><td>(<code>name <- p</code>)<sup>+</sup></td> <td>grammar</td></tr>
+</tbody></table>
+<p>
+Any space appearing in a syntax description can be
+replaced by zero or more space characters and Lua-style comments
+(<code>--</code> until end of line).
+</p>
+
+<p>
+Character classes define sets of characters.
+An initial <code>^</code> complements the resulting set.
+A range <em>x</em><code>-</code><em>y</em> includes in the set
+all characters with codes between the codes of <em>x</em> and <em>y</em>.
+A pre-defined class <code>%</code><em>name</em> includes all
+characters of that class.
+A simple character includes itself in the set.
+The only special characters inside a class are <code>^</code>
+(special only if it is the first character);
+<code>]</code>
+(can be included in the set as the first character,
+after the optional <code>^</code>);
+<code>%</code> (special only if followed by a letter);
+and <code>-</code>
+(can be included in the set as the first or the last character).
+</p>
+
+<p>
+Currently the pre-defined classes are similar to those of
+Lua's string library
+(<code>%a</code> for letters,
+<code>%A</code> for non letters, etc.).
+There is also a class <code>%nl</code>
+containing only the newline character,
+which is particularly handy for grammars written inside long strings,
+as long strings do not interpret escape sequences like <code>\n</code>.
+</p>
+
+
+<h2><a name="func">Functions</a></h2>
+
+<h3><code>re.compile (string [, defs])</code></h3>
+<p>
+Compiles the given string and
+returns an equivalent LPeg pattern.
+The given string may define either an expression or a grammar.
+The optional <code>defs</code> table provides extra Lua values
+to be used by the pattern.
+</p>
+
+<h3><code>re.find (subject, pattern [, init])</code></h3>
+<p>
+Searches for the given pattern in the given subject.
+If it finds a match,
+returns the index where this occurrence starts and
+the index where it ends.
+Otherwise, returns nil.
+</p>
+
+<p>
+An optional numeric argument <code>init</code> makes the search
+start at that position in the subject string.
+As usual in Lua libraries,
+a negative value counts from the end.
+</p>
+
+<h3><code>re.gsub (subject, pattern, replacement)</code></h3>
+<p>
+Does a <em>global substitution</em>,
+replacing all occurrences of <code>pattern</code>
+in the given <code>subject</code> by <code>replacement</code>.</p>
+
+<h3><code>re.match (subject, pattern)</code></h3>
+<p>
+Matches the given pattern against the given subject,
+returning all captures.
+</p>
+
+<h3><code>re.updatelocale ()</code></h3>
+<p>
+Updates the pre-defined character classes to the current locale.
+</p>
+
+
+<h2><a name="ex">Some Examples</a></h2>
+
+<h3>A complete simple program</h3>
+<p>
+The next code shows a simple complete Lua program using
+the <code>re</code> module:
+</p>
+<pre class="example">
+local re = require"re"
+
+-- find the position of the first numeral in a string
+print(re.find("the number 423 is odd", "[0-9]+")) --> 12 14
+
+-- returns all words in a string
+print(re.match("the number 423 is odd", "({%a+} / .)*"))
+--> the number is odd
+
+-- returns the first numeral in a string
+print(re.match("the number 423 is odd", "s <- {%d+} / . s"))
+--> 423
+
+print(re.gsub("hello World", "[aeiou]", "."))
+--> h.ll. W.rld
+</pre>
+
+
+<h3>Balanced parentheses</h3>
+<p>
+The following call will produce the same pattern produced by the
+Lua expression in the
+<a href="lpeg.html#balanced">balanced parentheses</a> example:
+</p>
+<pre class="example">
+b = re.compile[[ balanced <- "(" ([^()] / balanced)* ")" ]]
+</pre>
+
+<h3>String reversal</h3>
+<p>
+The next example reverses a string:
+</p>
+<pre class="example">
+rev = re.compile[[ R <- (!.) -> '' / ({.} R) -> '%2%1']]
+print(rev:match"0123456789") --> 9876543210
+</pre>
+
+<h3>CSV decoder</h3>
+<p>
+The next example replicates the <a href="lpeg.html#CSV">CSV decoder</a>:
+</p>
+<pre class="example">
+record = re.compile[[
+ record <- {| field (',' field)* |} (%nl / !.)
+ field <- escaped / nonescaped
+ nonescaped <- { [^,"%nl]* }
+ escaped <- '"' {~ ([^"] / '""' -> '"')* ~} '"'
+]]
+</pre>
+
+<h3>Lua's long strings</h3>
+<p>
+The next example matches Lua long strings:
+</p>
+<pre class="example">
+c = re.compile([[
+ longstring <- ('[' {:eq: '='* :} '[' close)
+ close <- ']' =eq ']' / . close
+]])
+
+print(c:match'[==[]]===]]]]==]===[]') --> 17
+</pre>
+
+<h3>Abstract Syntax Trees</h3>
+<p>
+This example shows a simple way to build an
+abstract syntax tree (AST) for a given grammar.
+To keep our example simple,
+let us consider the following grammar
+for lists of names:
+</p>
+<pre class="example">
+p = re.compile[[
+ listname <- (name s)*
+ name <- [a-z][a-z]*
+ s <- %s*
+]]
+</pre>
+<p>
+Now, we will add captures to build a corresponding AST.
+As a first step, the pattern will build a table to
+represent each non terminal;
+terminals will be represented by their corresponding strings:
+</p>
+<pre class="example">
+c = re.compile[[
+ listname <- {| (name s)* |}
+ name <- {| {[a-z][a-z]*} |}
+ s <- %s*
+]]
+</pre>
+<p>
+Now, a match against <code>"hi hello bye"</code>
+results in the table
+<code>{{"hi"}, {"hello"}, {"bye"}}</code>.
+</p>
+<p>
+For such a simple grammar,
+this AST is more than enough;
+actually, the tables around each single name
+are already overkill.
+More complex grammars,
+however, may need some more structure.
+Specifically,
+it would be useful if each table had
+a <code>tag</code> field telling which non terminal
+that table represents.
+We can add such a tag using
+<a href="lpeg.html#cap-g">named group captures</a>:
+</p>
+<pre class="example">
+x = re.compile[[
+ listname <- {| {:tag: '' -> 'list':} (name s)* |}
+ name <- {| {:tag: '' -> 'id':} {[a-z][a-z]*} |}
+ s <- ' '*
+]]
+</pre>
+<p>
+With these group captures,
+a match against <code>"hi hello bye"</code>
+results in the following table:
+</p>
+<pre class="example">
+{tag="list",
+ {tag="id", "hi"},
+ {tag="id", "hello"},
+ {tag="id", "bye"}
+}
+</pre>
+
+
+<h3>Indented blocks</h3>
+<p>
+This example breaks indented blocks into tables,
+respecting the indentation:
+</p>
+<pre class="example">
+p = re.compile[[
+ block <- {| {:ident:' '*:} line
+ ((=ident !' ' line) / &(=ident ' ') block)* |}
+ line <- {[^%nl]*} %nl
+]]
+</pre>
+<p>
+As an example,
+consider the following text:
+</p>
+<pre class="example">
+t = p:match[[
+first line
+  subline 1
+  subline 2
+second line
+third line
+  subline 3.1
+    subline 3.1.1
+  subline 3.2
+]]
+</pre>
+<p>
+The resulting table <code>t</code> will be like this:
+</p>
+<pre class="example">
+ {'first line'; {'subline 1'; 'subline 2'; ident = '  '};
+ 'second line';
+ 'third line'; { 'subline 3.1'; {'subline 3.1.1'; ident = '    '};
+ 'subline 3.2'; ident = '  '};
+ ident = ''}
+</pre>
+
+<h3>Macro expander</h3>
+<p>
+This example implements a simple macro expander.
+Macros must be defined as part of the pattern,
+following some simple rules:
+</p>
+<pre class="example">
+p = re.compile[[
+ text <- {~ item* ~}
+ item <- macro / [^()] / '(' item* ')'
+ arg <- ' '* {~ (!',' item)* ~}
+ args <- '(' arg (',' arg)* ')'
+ -- now we define some macros
+ macro <- ('apply' args) -> '%1(%2)'
+ / ('add' args) -> '%1 + %2'
+ / ('mul' args) -> '%1 * %2'
+]]
+
+print(p:match"add(mul(a,b), apply(f,x))") --> a * b + f(x)
+</pre>
+<p>
+A <code>text</code> is a sequence of items,
+wherein we apply a substitution capture to expand any macros.
+An <code>item</code> is either a macro,
+any character different from parentheses,
+or a parenthesized expression.
+A macro argument (<code>arg</code>) is a sequence
+of items different from a comma.
+(Note that a comma may appear inside an item,
+e.g., inside a parenthesized expression.)
+Again we do a substitution capture to expand any macro
+in the argument before expanding the outer macro.
+<code>args</code> is a list of arguments separated by commas.
+Finally we define the macros.
+Each macro is a string substitution;
+it replaces the macro name and its arguments by its corresponding string,
+with each <code>%</code><em>n</em> replaced by the <em>n</em>-th argument.
+</p>
+
+<h3>Patterns</h3>
+<p>
+This example shows the complete syntax
+of patterns accepted by <code>re</code>.
+</p>
+<pre class="example">
+p = [=[
+
+pattern <- exp !.
+exp <- S (grammar / alternative)
+
+alternative <- seq ('/' S seq)*
+seq <- prefix*
+prefix <- '&' S prefix / '!' S prefix / suffix
+suffix <- primary S (([+*?]
+ / '^' [+-]? num
+ / '->' S (string / '{}' / name)
+ / '=>' S name) S)*
+
+primary <- '(' exp ')' / string / class / defined
+ / '{:' (name ':')? exp ':}'
+ / '=' name
+ / '{}'
+ / '{~' exp '~}'
+ / '{' exp '}'
+ / '.'
+ / name S !arrow
+ / '<' name '>' -- old-style non terminals
+
+grammar <- definition+
+definition <- name S arrow exp
+
+class <- '[' '^'? item (!']' item)* ']'
+item <- defined / range / .
+range <- . '-' [^]]
+
+S <- (%s / '--' [^%nl]*)* -- spaces and comments
+name <- [A-Za-z][A-Za-z0-9_]*
+arrow <- '<-'
+num <- [0-9]+
+string <- '"' [^"]* '"' / "'" [^']* "'"
+defined <- '%' name
+
+]=]
+
+print(re.match(p, p)) -- a self description must match itself
+</pre>
+
+
+
+<h2><a name="license">License</a></h2>
+
+<p>
+Copyright © 2008-2015 Lua.org, PUC-Rio.
+</p>
+<p>
+Permission is hereby granted, free of charge,
+to any person obtaining a copy of this software and
+associated documentation files (the "Software"),
+to deal in the Software without restriction,
+including without limitation the rights to use,
+copy, modify, merge, publish, distribute, sublicense,
+and/or sell copies of the Software,
+and to permit persons to whom the Software is
+furnished to do so,
+subject to the following conditions:
+</p>
+
+<p>
+The above copyright notice and this permission notice
+shall be included in all copies or substantial portions of the Software.
+</p>
+
+<p>
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED,
+INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
+DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+</p>
+
+</div> <!-- id="content" -->
+
+</div> <!-- id="main" -->
+
+</div> <!-- id="container" -->
+
+</body>
+</html>
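The `init` rules that the added `re.html` gives for `re.find` (a start position, a negative value counting from the end, `nil` when nothing matches) can be checked with a short script. This is a minimal sketch, assuming LPeg 1.0.2 is installed so that `require"re"` loads its `re.lua`; the subject `"a1b22"` is an arbitrary illustration.

```lua
-- assumes LPeg 1.0.2 (and its re.lua) is installed and on package.path
local re = require"re"

local s = "a1b22"

-- first numeral: the single digit "1" at position 2
local i, j = re.find(s, "[0-9]+")

-- search starts at position 3, so the match is "22" at positions 4-5
local k, l = re.find(s, "[0-9]+", 3)

-- a negative init counts from the end: -2 is position 4 of this 5-char subject
local m, n = re.find(s, "[0-9]+", -2)

-- no match at all: re.find returns nil
local none = re.find("abc", "[0-9]+")

print(i, j, k, l, m, n, none)  --> 2 2 4 5 4 5 nil
```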
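The `re.gsub` entry says only that each occurrence is replaced by `replacement`. Assuming, as we read LPeg's `re.lua`, that the replacement is attached to the pattern with LPeg's `/` operator, it can be a string that refers to captures as `%1`, `%2`, ... or a function that receives the captures (or the whole match when there are none). The sketch below exercises both forms on made-up inputs.

```lua
-- assumes LPeg 1.0.2 (and its re.lua) is installed and on package.path
local re = require"re"

-- string replacement: %1, %2, %3 refer to the pattern's three captures
-- (assumption: replacement is applied as with LPeg's "patt / string")
local dmy = re.gsub("2024-01-15", "{%d+} '-' {%d+} '-' {%d+}", "%3/%2/%1")
print(dmy)   --> 15/01/2024

-- function replacement: with no captures the function gets the whole
-- match, and its result replaces that match
local loud = re.gsub("hello world", "%a+", string.upper)
print(loud)  --> HELLO WORLD
```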
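The syntax-table rows for `p -> name` and `p ~> name` bind names inside a pattern to entries of the `defs` table given to `re.compile`. The following sketch, assuming LPeg 1.0.2 (whose `re` still accepts `~>`), uses two such entries, the standard `tonumber` and a hypothetical `add` function, to parse a comma-separated list of integers and fold it into its sum.

```lua
-- assumes LPeg 1.0.2 (and its re.lua) is installed and on package.path
local re = require"re"

-- Lua values the pattern refers to by name; "add" is a hypothetical helper
local defs = {
  tonumber = tonumber,
  add = function (acc, x) return acc + x end,
}

-- 'num -> tonumber' acts like patt / defs.tonumber;
-- '(...) ~> add' acts like lpeg.Cf(patt, defs.add), folding all numbers;
-- '!.' requires the whole subject to be consumed
local sum = re.compile([[
  list <- (num (',' num)*) ~> add !.
  num  <- {%d+} -> tonumber
]], defs)

print(sum:match("1,20,300"))  --> 321
print(sum:match("1,x"))       --> nil
```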
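A match-time capture `p => name` calls `defs[name]` during matching, as `lpeg.Cmt` does. In this minimal sketch the hypothetical `even` function accepts a numeral only when it is even; it relies on LPeg's `Cmt` convention that the function receives the subject, the current position, and the captures of `p`, that a false first result makes the match fail, and that `true` lets it succeed with any further results becoming captures.

```lua
-- assumes LPeg 1.0.2 (and its re.lua) is installed and on package.path
local re = require"re"

local defs = {
  -- hypothetical match-time check: subject, position after p, then p's captures
  even = function (subject, pos, digits)
    -- false fails the match; true keeps it and passes "digits" on as a capture
    return tonumber(digits) % 2 == 0, digits
  end,
}

local even = re.compile("{%d+} => even !.", defs)

print(even:match("42"))  --> 42
print(even:match("7"))   --> nil
```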
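The three counted repetitions in the syntax table (`p^num`, `p^+num`, `p^-num`) are easiest to compare on one subject. This sketch assumes LPeg 1.0.2; each pattern is wrapped in `{ }` so that `re.match` returns the matched text, and the subjects are arbitrary.

```lua
-- assumes LPeg 1.0.2 (and its re.lua) is installed and on package.path
local re = require"re"

-- p^num: exactly num repetitions, so only the first three digits match
local three = re.match("12345", "{%d^3}")

-- p^+num: at least num repetitions (greedy), failing on a shorter run
local atleast2 = re.match("12345", "{%d^+2}")
local tooshort = re.match("1", "{%d^+2}")

-- p^-num: at most num repetitions
local atmost2 = re.match("12345", "{%d^-2}")

print(three, atleast2, tooshort, atmost2)  --> 123 12345 nil 12
```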
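The character-class rules in the added documentation (a leading `]` and a trailing `-` are literal, a leading `^` complements, `%name` inserts a pre-defined class) can be tried with a few one-line patterns. This is a sketch assuming LPeg 1.0.2; the subjects are arbitrary illustrations.

```lua
-- assumes LPeg 1.0.2 (and its re.lua) is installed and on package.path
local re = require"re"

-- ']' first and '-' last are literal: the class is the set { ']', 'a', '-' }
local run = re.match("a-]]x", "{[]a-]+}")

-- after '^', a leading ']' is again literal: any character except ']'
local upto = re.match("key]rest", "{[^]]*}")

-- %a and %d are pre-defined classes; '_' is a simple character
local ident = re.match("x_42 y", "{[%a_][%d%a_]*}")

print(run, upto, ident)  --> a-]] key x_42
```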