=================================================
PRegEx Version 2.0 Specification and Documentation 
=================================================

What's New in 2.0?

    See section "What's New in 2.0", later in this file.

What is PRegEx?

    PRegEx is a free, cross-platform scripting Xtra for Director 11+.

    (For Director 7-10, you must download and use PRegEx version 1.0.)

    It does searching, replacing, data extraction, and more.

    It provides the search features of PCRE (the Perl-Compatible
    Regular Expression library from http://pcre.org/), while adding
    its own replace capabilities.

    It also supplies Lingo versions of some powerful features of Perl
    pertaining to manipulating string data, lists, and property lists,
    and converting between all those different formats.

    You don't have to know anything about Perl to use it.

    You don't have to know about regular expressions to use it.  But
    you can do some pretty cool stuff if you do know about them.

    It uses the iconv library for full support of Unicode and all
    other character (text file) encoding formats.

Who should use it?

    If you have ever wished you could use Lingo to:

     - do any kind of text searching
     - modify strings
     - parse anything
     - extract data from a file
     - standardize data formats
     - clean/canonicalize/validate user-provided data fields
     - manipulate lists and property lists
     - copy or deep-copy lists / property lists
     - reverse lists
     - convert a list of one kind of thing into another kind of thing
     - use custom sort functions to sort lists
     - sort lists without modifying the original
     - filter lists
     - deal with binary data buffers in Lingo
     - do any of the above with very large string buffers
     - call a handler, passing arguments, and get return value
     - have a way for a callback function to signal its caller
     - quickly read/write entire files into/from memory
     - globally map characters in buffers
     - convert files between different character encodings
     - etc.

   ... then PRegEx is for you.

Help! What is a Regular Expression?  What's going on here?

    Please see the Introduction and Examples sections near the end of
    this doc.  There are also lots of helpful tutorials on the web.

    If you have not used regular expressions before, then as you learn
    them, you will hardly believe how powerful they are.  They are
    like a whole new programming language unto themselves.  Enjoy.

What does it cost?

    Nothing.  PRegEx is a free, open-source project.

    See "PRegEx Licensing", below, for full details.


Where do I get the latest version?

    PRegEx is released on the Web site http://openxtras.org/.

    Latest updates, notes, or issues will be posted there, too.


Who made it?

    PRegEx authors are:

      Chris Thorman <chris@thorman.com>  
         Ravi Singh <ravi@ravware.com>

    Philip Hazel (see below) wrote PCRE, upon which PRegEx heavily
    relies, but he was not directly involved in PRegEx itself.


What other libraries is it based on?

    PCRE 7.8

    PCRE, the regular expression library that PRegEx uses, is included
    with this distribution.  It was written by:

    Philip Hazel
    University of Cambridge Computing Service,
    Copyright (c) 1997-2008 University of Cambridge

    Please see http://www.pcre.org/license.txt for more info.

    ICONV 1.12

    The iconv library enables the file-reading and -writing features.
    It is available from the Free Software Foundation, here:
    http://www.gnu.org/software/libiconv/.  It is licensed under LGPL.

    DIRECTOR XDK 11

    Of course, PRegEx also uses MOA, the Macromedia Open Architecture,
    and is built using the Director 11 XDK from Adobe at
    http://www.adobe.com/.

Who supports it?

    Nobody supports PRegEx for free.  It's free to begin with.
    However...


Can I pay for support or additional features?

    If you need support for PRegEx for a project-critical need, we
    recommend that you hire someone to support that need.

    Because the source is OPEN, you are completely free to approach
    and make an offer to anyone you like, and they are free to add
    your custom features or create any other derivative work you may
    require, subject only to the liberal licensing restrictions
    outlined in this document.

    You may especially wish to approach RavWare, one of the companies
    that helped write PRegEx.  RavWare is in the business of creating
    Xtras for others.  (See complete description up above.)
    http://ravware.com/

    Please do not be offended if the PRegEx authors or others that you
    approach are unable to assist you.  We apologize in advance if a
    lack of free or inexpensive or even available support means you
    are unable to use PRegEx for your project.

    On the other hand, we believe PRegEx is quite robust in its current
    feature set and anticipate you will have few problems making use
    of it.


Can I see some examples?  

    1) Some function descriptions include examples.

    2) See the "Examples" section at the end.

    3) See PRegExTestMovie.dir, which you should have received with
       this package.  It has a full test suite which can be used to
       torture-test every feature of the Xtra, including heavy leak
       testing.  There are literally hundreds of usage examples there.
       It also has a few fun little features that let you import the
       spec file you are reading now and manipulate it.

       
How well tested is it?

    We feel that PRegExTestMovie.dir extensively tests all PRegEx
    features by calling it literally millions of times in 30 seconds
    or so, and thereby demonstrates that PRegEx is free of any leaks
    and that it performs with jaw-dropping speed.  Please try to prove
    us wrong.  We'd be grateful for bug reports.


Where do I send bug reports?

    Please send reports of confirmed or suspected bugs to:

        PRegEx Bugs <pregex-bugs@openxtras.org>

    Do not send the source code for your project.  Send the simplest
    possible 2-5-line example or set of steps, or a simple test movie
    that demonstrates the problem (without anything else in it).  Or,
    best yet, send a modified copy of PRegExTestMovie.dir with a new
    test added that demonstrates the problem.

    Be sure to state clearly in your report what you expected to
    happen, what did happen instead, and why you believe it's an error
    in the software.

    Bug reports that include a Lingo example that conclusively
    demonstrates the problem will get attention more quickly.

    Please be aware that we will be grateful for the reports, but may
    or may not have the time to reply.  


===============================================
PRegEx Licensing
===============================================

How did PRegEx get here?

    PRegEx is an "open-source" project.  


What do I get for free?

    You are free to use the accompanying version of the PRegEx Xtra in
    any way you see fit: in any project, for any purpose, at any time,
    now, or in the future, or in the past, free of charge.


Can I change the PRegEx source code?

    You may create derivative versions of the Xtra, or re-use any
    source code you find in it, but if you do so for pay or profit,
    you must provide the recipient with both the original, full, PRegEx
    package, including source code, along with any modifications you
    have made, including source code.  It would also be polite but not
    required to contribute the derived version back to the copyright
    holder via the contact information that you will find at
    http://openxtras.org/.


Is PRegEx supported or guaranteed to work?

    No! PRegEx is provided without support or warranty of any kind.  In
    particular, nobody guarantees that this code is fit for any
    purpose, or that it will not cause you and your customers great
    physical harm when you use it.  In fact, assume it will cause harm
    until you have tested it to your own satisfaction.  You accept all
    risks associated with using this software, should you choose to do
    so.
      

Can I contribute?
    
    The best way you can contribute is to give YOUR TIME to test,
    review, use, verify, and debug this code, to make it better,
    stronger, faster, and more powerful for others.


Can I contribute financially?

    If you find that this Xtra was insanely useful, which you will,
    and then you also feel motivated to contribute $$ to help offset
    its considerable development costs and express gratitude for the
    hours and weeks of time it has saved you, or the impossible
    projects it made possible, please log on to http://openxtras.org/
    and select one of the contribution options shown there.
    Contributions will be used to help maintain the OpenXtras web site
    and anything left over will be used to feed and clothe the
    authors' families.


What about Shockwave?

    PRegEx is not currently Shockwave-safe, and the authors do not
    intend to do any work or spend any $$ to make it so.  However, you
    have the full source here.  You're free to accept the challenge --
    and the legal responsibility -- for making a Shockwave-safe
    version for whatever use you desire.  Just be sure you follow the
    guidelines laid out in this document if you distribute modified
    versions of PRegEx to anyone.


What about future versions?
    
    This liberal licensing policy may or may not apply to future
    versions of PRegEx created by Chris Thorman, the copyright holder.
    
    However, this liberal licensing policy will always apply to this
    and earlier versions and to any derivative works based on it/them.


-------------------------------------------------------------------------
Regular Expression Xtra Licensing Statement
Version 2.0
-------------------------------------------------------------------------

This is a Scripting Xtra for Macromedia Director which lets you use regular
expressions as implemented by PCRE http://pcre.org/, plus a whole lot more.

Written by:

      Chris Thorman <chris@thorman.com>
         Ravi Singh <ravi@ravware.com>

Copyright (c) 2001-2008 Chris Thorman

-----------------------------------------------------------------------------
Permission is granted to anyone to use this software for any purpose on any
computer system, and to redistribute it freely, subject to the following
restrictions:

1. This software is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

2. The origin of this software must not be misrepresented, either by
   explicit claim or by omission.

3. Altered versions must be plainly marked as such, and must not be
   misrepresented as being the original software.

4. If PRegEx is embedded in any software that is released under the GNU
   General Purpose License (GPL), then the terms of that license shall
   supersede any condition above with which it is incompatible.

5. The PCRE and iconv and Director XDK components have their own
   licensing requirements, with which you obviously should comply.

(Thanks to Philip Hazel, creator of PCRE, for the above licensing statement.)
-----------------------------------------------------------------------------

=========================================
What's New in 2.0
=========================================

Mac OS X Universal Binary
-------------------------

    PRegEx is now a Universal binary.  That means it runs natively on
    Intel-based Macs, and also on older PowerPC (PPC) Macs, without
    emulation.  However, it is now Mac OS X only.  (In fact, it only
    supports v.10.4 ("Tiger") and later, the same as Director 11.)
    

Director 11+ Only
-----------------

    PRegEx 1.0 supported Director versions 7-10.  The older version
    does NOT work on Dir. 11+ (even if it seems to work on Windows).
        
Unicode
-------

    In Director 11, Adobe changed the internal string format to
    UTF-8 (Unicode).  This is great news, but completely changes the
    way that PRegEx needs to work.  Here is a summary of the changes:

    Reading/Writing files:

        Reading files into memory and writing them back out again now
        requires careful attention to text encodings.  (In Director
        7-10, all files were simply assumed to be MacRoman or
        Windows1252 files, whether they were or not, and this was OK).
        The great news is that PRegEx now supports essentially *all*
        known text file formats (by fully incorporating the
        open-source iconv library), plus some additional custom
        formats that will be helpful to PRegEx users.  See
        ReadFileToString and WriteStringToFile for the details.

    Length limit "feature" when writing files:      

        Users complained that the length-limit feature of the
        now-deprecated WriteEntireFile function was dangerous enough
        to be more like a bug.  This feature has been dropped in the
        successor function, WriteStringToFile, which will now always
        write the entire string to the file.  Please be aware of this
        when porting your projects.
                
    Escape Codes:           

        PRegEx supports "interpolation" of special escape codes to
        generate special characters in strings.  Interpolation is used in
        3 places: Replace (in the replacement string), Translate (in the
        input and output mapping strings), and Interpolate.  In Director
        7-10, any 8-bit value was legal in strings.  In Director 11, all
        characters in strings must be valid UTF-8, or Director could
        crash.  So the meanings of the following escapes have changed:
    
            \200-\377 octal escapes - formerly inserted 8-bit char/byte, now Unicode code points 128-255
            \x80-\xFF hex escapes - formerly inserted 8-bit char/byte, now Unicode code points 128-255
    
        And these new escapes have been added:
    
            \400-\777 new octal escapes for Unicode code points 256 through 511
            \x{0}-\x{7FFFFFFF} new hex escapes for *any* valid Unicode code points
    
        Please note that not all Unicode code points between 0 and
        7FFFFFFF are valid!  You should restrict yourself to valid Unicode
        code points as defined in the latest Unicode specifications.  
    
        Also note that the UTF-8 hexadecimal representations of Unicode
        characters are NOT the same as the Unicode code point numbers.
        For example, the correct Unicode code point specification for
        the cent sign is U+00A2, which can be specified as \x{A2} or
        \x{00A2}.  The two bytes C2 A2 are the UTF-8 encoding of that
        symbol, but the escape code \x{C2A2} can NOT be used to
        interpolate it into a string.  PRegEx provides no way to
        expressly indicate the UTF-8 representation of a character.
        Director and PRegEx and PCRE and iconv always figure out the UTF-8
        encodings for you.
    
        These escape codes are the same as PCRE's octal and hexadecimal
        escape codes, so you can use the same encodings in both the Search
        and Replace strings of any PRegEx function.
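
        As a minimal sketch of the point above (not taken from the
        PRegEx test suite): the cent sign is written as its code
        point, \x{A2}, never as its UTF-8 bytes.  The sample strings
        and variable names are made up for illustration.

            -- Interpolate expands the escape into the character U+00A2
            set price = PRegEx_Interpolate("50\x{A2}")
            put price

            -- The same escape can be used in a replacement string
            set buf = ["50 cents"]
            set n = PRegEx_Replace(buf, " cents", "g", "\x{A2}")
            if n > 0 then
              put buf[1]
            end if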

    Translate Function:

        Because of how Unicode works, the Translate function can no
        longer work with non-ASCII characters.  Specifically:

        - Any non-ASCII characters in the InputTable and OutputTable
          will simply be ignored, as if they were not present at all.

        - If used in a "range specifier", non-ASCII characters will
          prevent the range from being recognized as a range.

        - Any non-ASCII characters in the SrchStrL (string being
          modified) will be untouched.  I.e. they will never be
          modified by the Translate function.
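
        A short sketch of these rules, assuming a Perl-style "a-z"
        range specifier (the exact range syntax is described with the
        Translate function, not here).  The sample string is made up.

            set buf = ["hello world"]
            -- map lowercase ASCII letters to uppercase, in place
            set changed = PRegEx_Translate(buf, "a-z", "A-Z")
            if changed < 0 then
              put PRegEx_DescribeError(changed)
            else
              put buf[1]
            end if

        Had the buffer contained accented letters, they would pass
        through unchanged, because only ASCII characters are mapped.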

    Quotemeta function:
          
        Quotemeta formerly would put a backslash in front of non-ASCII
        characters.  Now, it will not.  (Those characters are always
        literal in PCRE.)

    String lengths:

        As in Lingo, string lengths returned by PRegEx functions and
        accepted as arguments are always in terms of character length,
        never byte length.  (Prior to Director 11 and Unicode/UTF-8, these
        concepts were the same.)  For strings that are 100% ASCII, the
        lengths are the same.  For non-ASCII strings, the length in bytes
        is dependent on the UTF-8 representation.  
    
        The exception is when writing a string to a file: the return value
        is the size of the file on disk, in bytes, and is dependent upon the
        character encoding chosen and the content of the string, possibly
        being higher or lower than the number of characters written.


Windows Binary
--------------

    The Release binary of PRegEx on Windows is now compiled with
    Maximize Speed (/O2) optimization in Visual C++.
    
    
Bug Fixes   
---------

Fixed in 2.0:

    - Calling join() with an empty list crashed the Mac (and maybe Windows)

    - ListToSPListSym()/SetAProp() crashes on Windows during leak testing
  
    - Could not write file names longer than 31 characters

    - The "s" option did not always function correctly
    
    - An error message said "...with setting" rather than "...without
      setting".


2.0 API Updates
---------------

New functions:
    
    PRegEx_ReadFileToString  (FilePath, TextEncoding)  ==> StringBufferList
    PRegEx_WriteStringToFile (FilePath, TextEncoding, StringBufferList) ==> 1/-Err
    PRegEx_GetICONVVersion   () ==> Version string 
    
    
Deprecated:
    
    PRegEx_ReadEntireFile  (FilePath)  ==> StringBufferList
    PRegEx_WriteEntireFile (FilePath, StringBufferList) ==> 1/-Err
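
    Porting a call from the deprecated functions might look like the
    sketch below.  The encoding name "UTF-8" and the file names are
    assumptions for illustration; see ReadFileToString for the
    encoding names PRegEx actually accepts.

        set buf = PRegEx_ReadFileToString("@:Settings.txt", "UTF-8")

        -- on failure, expect void or an empty list rather than a number
        set ok = 0
        if listP(buf) then
          if count(buf) > 0 then
            set ok = 1
          end if
        end if

        if ok then
          set result = PRegEx_WriteStringToFile("@:SettingsCopy.txt", "UTF-8", buf)
          if result < 0 then
            put PRegEx_DescribeError(result)
          end if
        else
          put PRegEx_DescribeError()   -- no argument: describes LastErrCode
        end if

    Unlike WriteEntireFile, the new function always writes the whole
    string, so any code that relied on the old length limit needs to
    trim the buffer itself before writing.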


Removed:

    PRegEx_SearchBegin   (SrchStrL, RE, [Opts]) ==> 1 (success) or -Err
    PRegEx_SearchContinue() ==> 1: Found; 0: Done; Negative: -Err   
    

New build methodology (for building the Xtras from source)
----------------------------------------------------------

    - Better supports source control techniques
    - Uses modern development tools (Xcode, Visual Studio .NET 2003 as patched)
    - No longer has to worry about Mac resource forks (new OSX binary format)
    - Uses .zip format instead of .sit for distribution
    - See DeveloperNotes.txt files in make_mac and make_win directories for details

==========================================
PRegEx Quick-Reference / Interface Summary
==========================================

A complete detailed description of all functions follows later in this
document.  This is just a summary for quick reference.

Housekeeping functions:
-----------------------

PRegEx_Clear            ([Complete]) ==> void; partial or complete reset
PRegEx_GetPRegExVersion () ==> Version string of PRegEx   (e.g. "2.0")
PRegEx_GetPCREVersion   () ==> Version string of PCRE     (e.g. "7.8p1")
PRegEx_GetICONVVersion  () ==> Version string of LIBICONV (e.g. "1.12p1")

Search/Replace low-level interface:
-----------------------------------

PRegEx_SetSearchString     (SrchStrL)   ==> True or -Err
PRegEx_SetMatchPattern     (RE, [Opts]) ==> True or -Err
PRegEx_GetNextMatch        ([noBlastBR])==> True or -Err
PRegEx_ReplaceString       (ReplPat)    ==> True or -Err


Search/Replace high-level interface:
------------------------------------

PRegEx_Search        (SrchStrL, RE, [Opts]) ==> FoundCount or -Err
PRegEx_SearchExec    (SrchStrL, RE,  Opts, #Callback, [ArgList])

PRegEx_Replace      (SrchStrL, RE, Opts, ReplPat) ==> FoundCount
PRegEx_ReplaceExec  (SrchStrL, RE, Opts, #ReplFunction, [ArgList])


Search/Extract utilities:
-------------------------

PRegEx_Split               (SrchStrL, RE, [Opts, InitList, Max])=>List
PRegEx_ExtractIntoList     (SrchStrL, RE, [Opts, InitList])=>List
PRegEx_ExtractIntoSPList   (SrchStrL, RE, [Opts, InitList])=>PList
PRegEx_ExtractIntoSPListSym(SrchStrL, RE, [Opts, InitList])=>PList


Match Status functions:
-----------------------

PRegEx_FoundCount     () ==> Running or final count of match events

PRegEx_GetPos         () ==> Char pos where last left off; next begins
PRegEx_SetPos         (num) ==> Change pos (0 <= Pos <= buffer len)

PRegEx_GetMatchBRCount() ==> Number of back refs in last matched RE

PRegEx_GetMatchString ([num]) ==> Last matched str (entire -or- BR #)
PRegEx_GetMatchStart  ([num]) ==> Start pos of ""  (entire -or- BR #)
PRegEx_GetMatchLen    ([num]) ==> Length of ""     (entire -or- BR #)


Error-handling functions:
-------------------------

PRegEx_LastErrCode         () ==> Error code for last failed call
PRegEx_DescribeError       ([Err]) ==> Error msg (Err or LastErrCode)

PRegEx_CompiledOK          () ==> True if last expression compiled

PRegEx_MemError            () ==> True if last op failed due to memory
PRegEx_MemErrorSticky      () ==> True if any op has failed due to mem
PRegEx_MemErrorStickyReset () ==> Reset sticky err; return prev value

Preference flags:
-----------------

PRegEx_ErrorsToMessageWindow    ([Bool]) ==> Echo all errors to Msg wind.


String-manipulation utility functions: 
--------------------------------------

PRegEx_QuoteMeta  (String) ==> String with RE-special chars quoted
PRegEx_Translate  (SrchStrL, InputTable, OutputTable) ==> ChangeCount
PRegEx_Interpolate(String, [VarsPList]) ==> String


List-manipulation utility functions:
------------------------------------

PRegEx_CopyList(ListOrPList, [Deep, InitList]) ==> CopiedListOrPList

PRegEx_Grep   (List, RE, [Opts])         ==> NewList ("PRegEx mode")
PRegEx_Grep   (List, #Filter, [ArgList]) ==> NewList ("Filter mode")

PRegEx_Map    (List, #MapFunction, [ArgList]) ==> MappedList

PRegEx_Sort   (List, DeepCopy, #SortFunction, [ArgList]) ==> NewList
PRegEx_Reverse(List, [DeepCopy])    ==> Reversed copy
PRegEx_Join   (List, [DelimiterString]) ==> String

PRegEx_Keys    (PList, [InitList]) ==> KeyList
PRegEx_Values  (PList, [InitList]) ==> ValueList

PRegEx_GetSlice(List, Keys, [InitList]) ==> SliceList
PRegEx_SetSlice(List, Keys, Values) ==> List

PRegEx_PListToList       (PList, [InitList])  ==> List
PRegEx_PListToListStrings(PList, [InitList])  ==> List

PRegEx_ListToSPList      (List,  [InitPList]) ==> SPList
PRegEx_ListToSPListSym   (List,  [InitPList]) ==> SPList


General utility functions: 
--------------------------

PRegEx_ReadFileToString  (FilePath, TextEncoding)  ==> StringBufferList
PRegEx_WriteStringToFile (FilePath, TextEncoding, StringBufferList) ==> 1/-Err

Deprecated functions included for backward compatibility:

PRegEx_ReadEntireFile  (FilePath)  ==> StringBufferList
PRegEx_WriteEntireFile (FilePath, StringBufferList) ==> 1/-Err

Callback-related functions:
---------------------------

PRegEx_CallHandler   (#CallbackFunction, [ArgList1, ArgList2])

PRegEx_CallbackAbort([bool]) ==> Stop operation and fail with error
PRegEx_CallbackStop ([bool]) ==> Stop before this iteration, but succeed
PRegEx_CallbackLast ([bool]) ==> Stop after this iteration, but succeed
PRegEx_CallbackSkip ([bool]) ==> Skip this iteration, but continue


Error code constants:
---------------------
PRegEx_ErrCode_OutOfMemory()
PRegEx_ErrCode_SearchStrLMustBeList()
PRegEx_ErrCode_SearchStrLMustContainString()
PRegEx_ErrCode_SearchStrLLengthArgMustBeInteger()
PRegEx_ErrCode_REMustNotBeEmpty()
PRegEx_ErrCode_REDidNotCompile()
PRegEx_ErrCode_ReplPatMustBeString()
PRegEx_ErrCode_CallbackFuncMustBeSymbol()
PRegEx_ErrCode_CallbackFuncDidNotReturnString()
PRegEx_ErrCode_QuoteMetaNeedsString()
PRegEx_ErrCode_TriedToMatchWithoutSearchStrL()
PRegEx_ErrCode_TriedToMatchWithoutSearchPattern()
PRegEx_ErrCode_TriedToReplaceWithoutMatching()
PRegEx_ErrCode_CallbackRequestedAbort()
PRegEx_ErrCode_UnexpectedMOAError()
PRegEx_ErrCode_UnexpectedInternalError()
PRegEx_ErrCode_CallbackFunctionNotFound()
PRegEx_ErrCode_ExpectedListArgument()
PRegEx_ErrCode_ExpectedPListArgument()
PRegEx_ErrCode_GrepNeedsFunctionNameOrPRegEx()
PRegEx_ErrCode_ExpectedStringArgument()
PRegEx_ErrCode_SortFunctionDidNotReturnInteger()
PRegEx_ErrCode_ListIndicesMustBeIntegers()
PRegEx_ErrCode_FileNotFound()
PRegEx_ErrCode_ErrorOpeningFile()
PRegEx_ErrCode_ErrorReadingFile()
PRegEx_ErrCode_ErrorWritingFile()


Perl-ish shorter function names:
--------------------------------

    These Perl-friendlier "aliases" to certain of the PRegEx functions
    have been provided.  Their syntax is more evocative for Perl
    programmers, and others will appreciate their brevity.

    re_m            ==> PRegEx_Search   (aka "match")
    re_s            ==> PRegEx_Replace  (aka "substitute")
    re_search       ==> PRegEx_Search
    re_replace      ==> PRegEx_Replace

    re_get          ==> PRegEx_GetMatchString
    re_pos          ==> PRegEx_GetPos

    re_extract      ==> PRegEx_ExtractIntoList
    re_extractp     ==> PRegEx_ExtractIntoSPList
    re_extractps    ==> PRegEx_ExtractIntoSPListSym

    re_call         ==> PRegEx_CallHandler
    re_abort        ==> PRegEx_CallbackAbort
    re_stop         ==> PRegEx_CallbackStop
    re_last         ==> PRegEx_CallbackLast
    re_skip         ==> PRegEx_CallbackSkip

    re_quotemeta    ==> PRegEx_QuoteMeta
    re_tr           ==> PRegEx_Translate
    re_i            ==> PRegEx_Interpolate

    re_split        ==> PRegEx_Split
    re_join         ==> PRegEx_Join

    re_grep         ==> PRegEx_Grep
    re_map          ==> PRegEx_Map
    re_sort         ==> PRegEx_Sort
    re_reverse      ==> PRegEx_Reverse
    re_copy         ==> PRegEx_CopyList

    re_keys         ==> PRegEx_Keys
    re_values       ==> PRegEx_Values

    re_slice        ==> PRegEx_GetSlice
    re_slice_set    ==> PRegEx_SetSlice

    re_list         ==> PRegEx_PListToList
    re_list_strs    ==> PRegEx_PListToListStrings

    re_hash         ==> PRegEx_ListToSPList
    re_hash_syms    ==> PRegEx_ListToSPListSym

    re_read2        ==> PRegEx_ReadFileToString
    re_write2       ==> PRegEx_WriteStringToFile

    re_read         ==> PRegEx_ReadEntireFile  (NOTE: deprecated)
    re_write        ==> PRegEx_WriteEntireFile (NOTE: deprecated)

    re_err          ==> PRegEx_LastErrCode
    re_debug        ==> PRegEx_ErrorsToMessageWindow
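
    For example, a sketch using only the short names.  It assumes, as
    the word "alias" suggests, that each alias takes the same
    arguments as the function it stands for.  The sample data is made
    up.

        set buf = ["one  two   three"]
        set n = re_s(buf, " +", "g", " ")     -- collapse runs of spaces
        if n > -1 then
          set words = re_split(buf, " ")      -- split on single spaces
          put re_join(words, ", ")
        else
          put PRegEx_DescribeError(n)
        end if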


=========================================
PRegEx Return Values: General Principles
=========================================

Unless otherwise noted, search/replace related functions return an
integer saying how many matches were successfully made, even if fewer
replacements were completed due to some being skipped by the program.

For functions returning match counts, a return value of 0 means
successful operation, but means that 0 matches were found (and of
course 0 replacements were done).

Any NEGATIVE INTEGER returned by any function is an ERROR CODE, which
may be interpreted using the Error-related features of PRegEx,
described later.

Some functions return a 1 meaning successful completion or a negative
error code if an error occurred.

Consequently, you should never treat the return of PRegEx functions as
Booleans when checking whether a match was done, because Lingo
considers all non-zero numbers, even negative numbers, to be "true".

Instead, you should check integer results for being > 0 or > -1,
depending on your interest.

Wrong:   if (PRegEx_Search(str, "foo", "g")    ) then put "Found!"

Right:   if (PRegEx_Search(str, "foo", "g") > 0) then put "Found!"

Most functions that do not ordinarily return integers will either
return void or empty strings or empty lists when there is an error
encountered, and their error code is then set in the LastErrCode flag,
which may subsequently be queried.

Remember, a failure to match is never an "Error" from PRegEx's point of
view.  An "Error" always means a parameter error, syntax error, or
runtime error, such as memory or disk problems.  A failure to match is
viewed as the successful completion of a match request whose answer
happened to be "zero matches".
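
The three possible outcomes of a match-counting call can be told
apart as in the sketch below.  The handler name CountFoos and the
pattern are hypothetical; only the PRegEx calls come from this
document.

    on CountFoos str
      set buf = [str]                      -- wrap the string in a SrchStrL
      set n = PRegEx_Search(buf, "foo", "g")
      if n < 0 then
        -- negative: an error code, not a match count
        put "PRegEx error: " & PRegEx_DescribeError(n)
      else if n = 0 then
        -- zero: the search ran fine and found nothing
        put "No matches"
      else
        put "Found " & n & " match(es)"
      end if
      return n
    end

Testing "n > 0" answers "did anything match?", while "n > -1" answers
"did the call succeed?".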


=========================================
PRegEx Parameters: General Descriptions
=========================================

In all function prototypes shown above and below, sample argument
names are used consistently to represent arguments of a particular
type or meeting certain criteria.  For example, "RE" always means a
Regular Expression string, "Opts" always means a 0-7-character string
of option flags, etc.

This section is a glossary explaining each of these standard argument
types.  Unless otherwise noted, the descriptions here apply to all
functions in which these named parameters appear.


RE -- Regular Expression pattern

Example: "(dog)|(cat)"

This is a simple Lingo string containing literal characters and/or
special character sequences that specify what is to be searched for.

See the PCRE and/or Perl documentation for precise details of the
regular expression syntax.


Opts -- Options string

Example: "gisx"

A string of 0-7 option flag chars in any order.  

Any other type of argument is treated like an empty string ("") and
results in all options being turned off.  Any other characters in Opts
are silently ignored.

The 7 option flags are:

     Pattern matching flags:

      i == case Insensitive matching 

           Corresponds to PCRE option PCRE_CASELESS 

      s == "Single line" mode (. and \s match newline)

           Corresponds to PCRE option PCRE_DOTALL

      m == "Multi line" mode (^ and $ match internal line start/end)

           Corresponds to PCRE option PCRE_MULTILINE 

      x == eXtended mode

           Ignores whitespace in patterns; allows comments.

           Corresponds to PCRE option PCRE_EXTENDED

     Behavior control flags:

      t == sTudy; optimize the RE by "Studying" it first.

      g == Global; re-do Srch or Srch/Repl till no more match

      e == Exec; call a callback function on each iteration
           (see also SearchExec and ReplaceExec.)
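
As a brief sketch of how the flags combine (sample strings and
variable names are made up for illustration):

    -- g + i: count every match, ignoring case
    set buf = ["Cat cat CAT"]
    set n = PRegEx_Search(buf, "cat", "gi")
    if n > 0 then
      put "Matches: " & n
    end if

    -- x: whitespace inside the pattern is ignored, so it can be
    -- spaced out for readability
    set dateBuf = ["2008-03-25"]
    set n2 = PRegEx_Search(dateBuf, "(\d+) - (\d+) - (\d+)", "x")
    if n2 > 0 then
      put PRegEx_GetMatchString(1)   -- back reference 1
    end if

Because the flag characters may appear in any order, "gi" and "ig"
request the same behavior.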


SrchStrL -- String to be searched ("String Buffer List")

Examples: 
          
    ["my data my data my data"]        -- string only
    ["my data my data my data", 23]    -- with optional length
    ["my data my data my data", 23, 0] -- 0 means no NUL chars

You must pass search string buffers to PRegEx in a special, arguably
unusual, way.  Instead of passing a string as you normally would when
calling a Lingo command, you pass a LIST CONTAINING A STRING upon
which searching/replacing commands can operate.

SrchStrL is a regular Lingo list.  The operation occurs on the FIRST
ELEMENT of the list.  It is a parameter error if SrchStrL is not a
list, if the list is empty, or if its first element is not a string.

The second, optional, element of the list is a length value.  If
supplied, it is taken to be the intended length (even if not the
actual length) of the first element.  Of course, this value should be
no greater than length(SrchStrL[1]) and no less than zero.

The length value is always specified as a count of *characters*.  When
non-ASCII characters are used (e.g. special, accented, or non-Roman
characters), then the length of the string in *characters* may not be
the same as the length of the string in memory, or as the length of
the string when saved in a file.

The string buffer to be searched may contain any amount of binary data,
including ASCII zero (NUL), which does NOT signify end-of-string.
(However, you should be aware of bugs in Director's Message and Debug
windows which incorrectly display string buffers that have NULs in
them as if the buffers were truncated at that position.  Don't worry:
the data is still in the buffer even if it is printed out wrong.)

Supplying the length element overrides the Xtra's perceived length of
the buffer.  This allows the search or other operation to take place
on a reduced subset of the string. (Warning: doing a replace on this
string will truncate it at the specified point.  Writing a file from
this string will also truncate the resulting file.)
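
As a rough illustration of the length element, the sketch below
searches the same text with and without a length of 10, which covers
only "alpha beta".  The buffer contents are made up for illustration,
and the second result assumes the length is honored exactly as
described above:

    set Whole = ["alpha beta gamma"]
    set Part  = ["alpha beta gamma", 10]  -- consider first 10 chars only
    put PRegEx_Search(Whole, "gamma")     -- "gamma" is within the buffer
    put PRegEx_Search(Part,  "gamma")     -- "gamma" lies beyond char 10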

The third, optional, element is a boolean integer (0 or 1) which says
whether the string buffer in element 1 is known to contain NUL
characters (this is set for you by ReadFileToString, for your
convenience, because you may want to use the data with
non-NUL-friendly Xtras and it will be helpful to know if it has
"binary" data that could trip them up).  Its value is never observed
by PRegEx and so does not alter the behavior of any PRegEx functions
-- all PRegEx functions are NUL-safe.  They never assume that your
data does not contain NULs.

Other elements of the list, if any, are left untouched by any
functions that modify your SrchStrL.  Normally, you would not use this
list for storage of other data.

WHY THE LIST/STRING APPROACH?  Lingo passes lists by reference, so
storing the string in a list minimizes copying of the string.  It also
lets you hold the string in a single, named Lingo variable, while
calling multiple Search and/or Replace commands that will modify the
string buffer in place for you without replacing or renaming your
variable.
This also allows you to pass your string buffer around from one Lingo
function to another and to PRegEx functions without copies of the
string data getting made each time you make a function call.
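
A minimal sketch of the idea, using two hypothetical movie-script
handlers (StripDigits and StripDigitsDemo are illustrative names, not
part of PRegEx).  Because Lingo passes the list by reference, the
replacement made inside StripDigits should be visible to the caller
without any string being copied or returned:

    on StripDigits Buf
      -- Buf is the caller's own list, not a copy of it
      return PRegEx_Replace(Buf, "[0-9]+", "g", "")
    end

    on StripDigitsDemo
      set Buf = ["a1b22c333"]
      StripDigits(Buf)
      put Buf[1]   -- expected to show "abc"
    end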

This sample would read a tab-delimited file of settings and values:

set File = PRegEx_ReadEntireFile("@:SettingsFile.txt") -- is a SrchStrL
PRegEx_Replace(File, "(\x0D\x0A)|[\x0D\x0A]", "g", "\n") -- line ends
PRegEx_Replace(File, "\n+", "g", "\n") -- remove blank lines
PRegEx_Replace(File, "\t+", "g", "\t") -- multiple tabs --> single tab
set SettingProps = PRegEx_ExtractIntoSPList(File, "(.*?)[\t\n]", "g")


ReplPat -- Replacement string or pattern

Example: "Date: \1 Time: \3 Place: \2\n"
    
This is the replacement string for any PRegEx functions that do
replacing. It can be a simple string, OR it may also contain special
escape sequences to specify backreferences \1, \2, etc. or other
special characters.

Here is a complete list of special "escape codes" recognized within
ReplPat string:

\\            a single backslash character
\t            a single tab character (same as numtochar(9))
\n            a single newline character (aka Lingo "return" constant; aka numtochar(13); aka Macintosh newline; aka Carriage Return; aka CR)
\x##          a single UTF-8 character with 2-digit hex value, range 00 - FF (Unicode Code point number)
\x{#.......}  a single UTF-8 character with 1-8-digit hex value, range 0 - 7FFFFFFF (Unicode Code point number)
\0            a single UTF-8 NUL character (aka ASCII zero byte)
\# or \##     insert backreference by number (only recognized in replacement strings or after a match) (backslash followed by 1 or 2 digits), range 1-99
\###          a single UTF-8 character with three-digit octal value, range 000-777 (Unicode Code point number)
\(other char) insert the character itself. (e.g. \b = literal "b")
${stringkey}  string key lookup in optional caller-supplied property list (value must be a string)
${#symbolkey} symbol key lookup in optional caller-supplied property list (value must be a string)

The process of interpreting these escape sequences and converting them
into the actual output string is called "interpolation".  It is done
automatically on replacement strings, and may also be done explicitly
by calling the PRegEx function PRegEx_Interpolate().  (It is also done
in the Table arguments to Translate.)

Don't get confused: these sequences are not generally recognized by
Lingo; they are only interpreted within PRegEx search patterns (REs)
and replacement patterns (ReplPats), and by PRegEx_Interpolate().
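
For instance, a ReplPat can reorder captured groups.  The following
sketch (with a made-up date string) swaps a year-month-day date into
day/month/year order using \1, \2, and \3; Opts is left empty, so
only the first match is replaced:

    set Buf = ["Due: 2008-03-15"]
    PRegEx_Replace(Buf, "(\d+)-(\d+)-(\d+)", "", "\3/\2/\1")
    put Buf[1]   -- expected to show "Due: 15/03/2008"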


InitList

For most PRegEx functions whose purpose is to create a list, an
optional InitList parameter may be specified.  If specified, then the
function will begin with that list and modify it, rather than creating
a new list for you.  Otherwise, all list-generating functions
automatically begin with a new, empty, list.

This allows you to progressively build up a list through several
invocations of PRegEx_ routines, or to use any PRegEx_ routines to
append items to an existing list.
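
A short sketch of building one list across two calls, assuming
PRegEx_ExtractIntoList (described later) appends to the InitList it
is given, as stated above:

    set Words = PRegEx_ExtractIntoList(["red green"], "(\w+)", "g")
    set Words = PRegEx_ExtractIntoList(["blue"], "(\w+)", "g", Words)
    put Words   -- expected to show ["red", "green", "blue"]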


ArgList

Any function that takes a callback function also takes an optional
ArgList argument (which defaults to [], the empty list).  The values
inside the ArgList will be passed to the callback function, AFTER any
other task-centric values that must be passed.

So, for example, a #FilterFunction that must take a single argument
and return a boolean saying whether that argument should be "in" or
"out" is passed the item to be filtered as its first argument,
followed by any additional arguments taken from the supplied ArgList.

Additional arguments could include data to be compared against, or
perhaps other lists or property lists or instance objects that can be
used to access a database or other external resources, or to serve as
persistent state between multiple calls to the callback function.

Using ArgList is a good practice because it lets you call callback
functions by name without relying on global variables to communicate
with those functions -- pass any parameters the function needs in
order to operate in ArgList rather than using globals.


#ReplFunction -- Callback function for replacement

The SYMBOL name of a Lingo handler to be called during one of the
_Replace* commands.

The function is called EVERY time the command makes a successful match
(0 or 1 time if "global" option is off; 0 or more times if "global" is
on).

The return value, which MUST be a string, is inserted as the
replacement text.

The replacement command itself does not pass any arguments to the
function, but you may specify an optional ArgList parameter, whose
elements, if any, will be passed, each time, as arguments to
#ReplFunction.

#ReplFunction may request backrefs or the entire match string by
calling PRegEx_GetMatchString(N), and may discover which of multiple
iterations it is on by calling PRegEx_FoundCount().

Note that there is no way for the #ReplFunction to know whether it is
being called for the last time during a global replace (there is no
final "cleanup" call).

As with all callback functions in PRegEx, #ReplFunction may signal to
the function that is calling it that the function should abort, stop,
skip, or "last" -- see PRegEx_CallbackAbort, etc.

Examples of typical uses:

    - selectively replace based on calculated criteria

    - terminate a replacement early based on calculated criteria

    - look up or translate symbols from a property list or database at
      runtime and insert them into the correct locations in a buffer.

    - extract some data before/while it is being replaced


#Callback -- General-purpose callback function

This is the symbol name of a Lingo handler in the Movie scope that
will be called, generally with arguments optionally supplied by the
calling routine, and may do anything it wishes, but should avoid
actions that would stop playback or otherwise terminate the caller's
context.


=========================================
PRegEx: Detailed Function Descriptions
=========================================

Note: common parameters are described in detail in the section above.
That information is not generally reiterated in the descriptions
below.

Housekeeping functions:
----------------------

PRegEx_Clear           ([Complete]) ==> void; partial or complete reset

    Clears internal state, search strings, back references, buffers,
    error codes, etc, except for MemErrorSticky.

    "Complete" option also clears call stack, if any, callback flags,
    and other info.  DO NOT USE "Complete" option except when first
    starting up.

    Clear is automatically called by all high-level search/replace
    functions, so you should never need to use it.

PRegEx_GetPRegExVersion () ==> Version string of PRegEx (e.g. "1.0")
PRegEx_GetPCREVersion  () ==> Version string of PCRE  (e.g. "3.4")

    As described.


Search/Replace low-level interface:
-----------------------------------

    Note: For best results, avoid using these "low-level" routines
    directly.  They are really intended only for someone who needs to
    directly control the individual steps of setting up a search
    and/or replace, or who, for efficiency reasons, would like to keep
    a single SrchStrL variable and repeatedly apply multiple REs to
    it.  The low-level routines ignore the "global" option.  They
    assume the caller wants to control multiple matches.

PRegEx_SetSearchString     (SrchStrL)   ==> True or -Err

    Sets a new string to be operated on.  Resets all counters and
    buffers and flags, except the match pattern.  Resets Pos to zero.
                          

PRegEx_SetMatchPattern     (RE, [Opts]) ==> True or -Err

    Initializes engine and then compiles new RE.  Sets Opts for
    subsequent operations.  Resets all counters and buffers and flags,
    except the search string.  Resets Pos to zero.


PRegEx_GetNextMatch        ([noBlastBR])==> True or -Err

    Performs one single search event in the current string, using the
    current pattern and options, beginning at the current Pos, either
    the Pos left from the immediate previous search (of any kind), or
    from a Pos you determine by first using SetPos().
                          
    When GetNextMatch succeeds, any previous global back-reference
    data is replaced by the new back-reference data (see "Match Status
    Functions" below).

    When it fails, all back-reference buffers are cleared out and
    MatchStatus functions will all return zero/empty/void.

    The optional noBlastBR argument tells GetNextMatch to not blow
    away the back-reference buffers when it FAILS, but instead, to
    keep the information there from the previous successful match.

    Important special case: If Entire Match is zero-length (i.e. a
    match succeeded but matched string had no length), then Pos will
    be increased before the next iteration; this guarantees that a
    global match will terminate by stepping through the string
    character-by-character rather than spinning endlessly at the
    starting position.  This behavior applies to all matching
    functions in PRegEx.


PRegEx_ReplaceString       (ReplPat)    ==> True or -Err

    ONLY AFTER a successful match, replaces the entire matched segment
    with ReplPat, after "interpolations" have been performed
    (i.e. inserting back references or other special escape sequences
    into a copy of ReplPat before then inserting the resulting string
    into the search buffer). 

    Note that all Replace functions in PRegEx MODIFY the original
    buffer.  They never return a copy.
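
    If you do need the low-level interface, the calls are typically
    strung together as in the sketch below.  CollectMatches is a
    hypothetical helper, not a PRegEx function.  The call order
    follows the "Equivalent to" steps given for PRegEx_Search below,
    and the loop tests for a result > 0 on the assumption that a
    failed match yields zero and an error yields a negative code:

    on CollectMatches Buf, RE
      set Found = []
      if PRegEx_SetMatchPattern(RE) < 0 then return Found
      if PRegEx_SetSearchString(Buf) < 0 then return Found
      repeat while PRegEx_GetNextMatch() > 0
        add Found, PRegEx_GetMatchString(0)
      end repeat
      return Found
    end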


Search/Replace high-level interface:
------------------------------------

    You should almost always choose to use these "high-level"
    functions and avoid the "low-level" interface whenever possible.

    Only the high-level functions are aware of the "g" (global) flag.

    These "high-level" search/replace functions, and any other
    functions that use SrchStrL, RE, or Opts arguments, always
    internally call the low-level functions listed above, or their
    equivalents, as needed to perform their documented tasks.

    Their function is abstractly described here partially in terms of
    the low-level functions above; and these routines have the same
    effect as if they were implemented by actually calling the
    low-level routines.

    However, in actual fact, they may or may not be implemented
    exactly that way; for example, doing a global replace is
    implemented more efficiently by doing all the searching in one
    shot and then all of the replacing, rather than by repeatedly
    calling GetNextMatch and ReplaceString.

    Consequently, do not rely on any particular assumptions about the
    contents of a string buffer DURING the course of operation of a
    single high-level Replace (say, for example, inside a callback
    function being called in the middle of a global Replace).


PRegEx_Search        (SrchStrL, RE, [Opts]) ==> FoundCount or -Err

    Sets up and does a search, comparing SrchStrL to RE.

    If Global, the search is repeated continuously until it cannot
    match anymore.

    Afterwards, the Match Status functions only return information
    pertaining to the LAST successful search done.  If there were zero
    matches, then the Match Status information will all be empty/void.

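    Returns the FoundCount or Err code.  In non-global mode, the
    FoundCount will be 0 or 1.  In global mode it will be 0 or higher
    and can be treated as a count of the number of entire matches.
    Because error codes are negative, and therefore also non-zero,
    test for a result > 0 rather than treating it as a boolean.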

    If "e" (exec) option is supplied, then Search behaves exactly like
    SearchExec, documented below.

    Equivalent to:

    - Call PRegEx_SetMatchPattern; or fail if error

    - Call PRegEx_SetSearchString; or fail if error

    - Call PRegEx_GetNextMatch 1 time, or until search fails if
      global; return Err if error.  (Retain back refs from ultimate
      successful search when in global mode.)

    - return PRegEx_FoundCount()

PRegEx_SearchExec    (SrchStrL, RE,  Opts, #Callback, [ArgList])

    Like PRegEx_Search, but takes a #Callback function, which is
    called, with arguments from optional ArgList, after each
    SUCCESSFUL match that takes place.  Callback may use any of the
    Match Status functions to inquire about the current match.
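
    A sketch of a callback that accumulates results in a list handed
    in through ArgList, so no global variable is needed.  NoteMatch
    and NoteMatchDemo are hypothetical handler names:

    on NoteMatch Hits
      -- Hits is the list supplied as the first ArgList element
      add Hits, PRegEx_GetMatchString(0)
    end

    on NoteMatchDemo
      set Hits = []
      PRegEx_SearchExec(["cat hat bat"], "\w+at", "g", #NoteMatch, [Hits])
      put Hits   -- expected to show ["cat", "hat", "bat"]
    end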


PRegEx_Replace      (SrchStrL, RE, Opts, ReplPat) ==> FoundCount

    Sets up and performs a single or global search and replace in
    SrchStrL using RE and Opts.  ReplPat is interpolated and inserted
    on each successful match.

    If "e" (exec) option is supplied, then Replace behaves exactly
    like ReplaceExec, documented below (ReplPat is replaced by an
    executable #ReplFunction, with optional argument list).

PRegEx_ReplaceExec  (SrchStrL, RE, Opts, #ReplFunction, [ArgList])

    Like Replace, but instead of using a fixed ReplPat string, calls
    #ReplFunction, optionally supplying any arguments from ArgList.
    (Note: Replace does NOT supply any information about the match
    directly to #ReplFunction.  #ReplFunction should use any of the
    MatchStatus routines for that information, if needed.)

    #ReplFunction is REQUIRED to return a string each time it is
    called.  Failure to do so causes immediate termination of
    ReplaceExec, with an error code being returned.

    The string returned by #ReplFunction is used as the replacement
    for the entire matched string.  Returning the empty string, then,
    causes the matched string to be deleted from the string buffer.
    Returning PRegEx_GetMatchString(0) causes the original string to
    replace itself, essentially skipping this replacement.

    The string returned by #ReplFunction is not subject to
    interpolation, but rather inserted literally into the buffer. So
    don't try to return "Joe \1 Blow" and expect \1 to convert into a
    back-reference.  But you could call "Interpolate" specifically:
    return(PRegEx_Interpolate("Joe \1 Blow")).

    The #ReplFunction may and should use the Callback-related
    Abort/Stop/Skip/Last flags, described later, in order to signal
    ReplaceExec to alter its default looping behavior.
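
    As a sketch, the hypothetical handler below replaces <word>
    placeholders with values from a property list passed in through
    ArgList, and leaves unknown placeholders untouched by returning
    the entire match.  The handler names and sample data are
    illustrative only:

    on FillPlaceholder Dict
      set Val = getaProp(Dict, PRegEx_GetMatchString(1))
      if stringP(Val) then return Val
      return PRegEx_GetMatchString(0)   -- keep the original text
    end

    on FillPlaceholderDemo
      set Dict = ["name": "Joe", "town": "Davis"]
      set Buf  = ["Hi <name> of <town>, <zip>!"]
      PRegEx_ReplaceExec(Buf, "<(\w+)>", "g", #FillPlaceholder, [Dict])
      put Buf[1]   -- expected to show "Hi Joe of Davis, <zip>!"
    end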


Search/Extract utilities:
-------------------------

    Searching with parentheses and then checking back-references is
    the standard way to retrieve searched/matched data from a string
    buffer.  The Search and Replace functions, combined with the Match
    Status functions, make it easy to extract values one at a time or
    in small clusters.

    The Search/Extract utilities, on the other hand, provide
    convenient ways to extract an arbitrary number of data values from
    a string buffer in one or a few quick operations.  Please study
    the purpose of these functions since they're almost always more
    convenient than the simple search functions:


PRegEx_Split        (SrchStrL, RE, [Opts, InitList, Max])=>List

    "Splits" a string buffer, using the pattern specified in RE as a
    delimiter.  The matched portions of the string are REMOVED, and
    the intervening segments are extracted into a list.
    
    However, if the RE contains backreferences, then ALL of the
    backreferences generated by the RE, in numeric order, will be
    inserted, each as a separate element, into the resulting list at
    the appropriate point in the list.  This allows retention of all
    the matched portions of the original string, as well.

    Here's another way to think about Split: it's the same as
    PRegEx_ExtractIntoList, but in addition to extracting the
    backreferences from each match, also adds all of the strings
    BETWEEN each matched segment, effectively "split"ting the string
    into multiple strings.

    The optional MaxItems argument (Max in the signature above), which
    must be 2 or greater to be meaningful, limits the maximum number
    of items that the list will be split into (i.e. limits the max
    number of successful matches to (MaxItems - 1)).  Omitting the
    optional Opts argument or omitting the "g" flag from Opts has the
    same effect as setting MaxItems = 2 because only one match will be
    performed and the string will be split into two parts.

    If MaxItems is zero or unspecified, Split will remove any empty
    trailing items that would result if the delimiter RE is found to
    match at the very end of the search string.  In other words,
    splitting "1,2," on comma would yield ["1", "2"].  However, if
    MaxItems is ANY NEGATIVE NUMBER, then empty trailing items will
    not be removed and the result would be ["1", "2", ""].  Note: in
    order to be able to pass MaxItems, you'll be forced to also pass
    values for Opts and InitList.  These can be defaulted to "" and
    [], respectively.

    Examples:

    put PRegEx_Split(["1 2 3"], "\s+", "g") -- splitting whitespace
    -- ["1", "2", "3"]

    put PRegEx_Split(["1 2 3"], "\s+", "g", [], 2) -- max 2 items
    -- ["1", "2 3"]

    put PRegEx_Split(["1 2 3"], "(\s+)", "g") -- keeping whitespace
    -- ["1", " ", "2", " ", "3"]

    put PRegEx_Split(["1 2 3"], "(\w+)", "g", [],  0) -- delim @ start,end
    -- ["", "1", " ", "2", " ", "3"] -- note "" at start, but not end

    put PRegEx_Split(["1 2 3"], "(\w+)", "g", [], -1) -- note Max = -1
    -- ["", "1", " ", "2", " ", "3", ""] -- note "" at start, AND at end


PRegEx_ExtractIntoList     (SrchStrL, RE, [Opts, InitList])=>List

    Does a global or non-global search, putting ALL MATCHED BACK
    REFERENCES (omitting non-matched ones, but keeping empty matches)
    from each iteration into a lingo list; if global, repeats until
    matching fails, gathering up all the back references from all
    iterations along the way.
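
    Example (the output shown is what the description above implies
    for this made-up input):

    put PRegEx_ExtractIntoList(["a=1, b=2"], "(\w+)=(\d+)", "g")
    -- ["a", "1", "b", "2"]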


PRegEx_ExtractIntoSPList   (SrchStrL, RE, [Opts, InitList])=>PList
PRegEx_ExtractIntoSPListSym(SrchStrL, RE, [Opts, InitList])=>PList

    These Extract routines are the same as PRegEx_ExtractIntoList, but
    using a sorted property list; strings extracted using the current
    set of matched backreferences are inserted pairwise into the list.

    Here is how it works... as each complete pair is retrieved:

    - Use first item in pair as the key, second item as the value.

    - Add/Replace an entry into the SPList

    - If odd number of items, then use <void> as final value.

    The properties generated by ExtractIntoSPList are "String"
    properties, which IS allowed in Lingo, and can be absolutely any
    string.

    ExtractIntoSPListSym is identical except that it converts all
    property strings to symbols before inserting them into the list.
    Consequently, it is imperative to ensure that all strings destined
    to become properties can actually be converted into legal Lingo
    symbols.  

    (Lingo places many restrictions on what characters may legally
    appear in property names (aka symbols).  It is your responsibility
    to ensure the input is going to be clean, or some funky, broken,
    or illegal symbols could result.)

    Examples:

    put PRegEx_ExtractIntoSPList   (["c d a b"], "(\w+)", "g")
    -- ["a":"b", "c":"d"]

    put PRegEx_ExtractIntoSPListSym(["c d a b"], "(\w+)", "g")
    -- [ #a:"b",  #c:"d"]


Match Status functions:
-----------------------

    These functions return information about the last successful match
    AND any backreference substrings that are available due to the use
    of parentheses inside the RE.


PRegEx_FoundCount     () ==> Running or final count of match events

    This returns the number of matches completed by a previous search
    event, or done up to this point in an ongoing search.

    Always re-set to 0 at the start of any match-related function
    except GetNextMatch itself.  Incremented by 1 each time a match
    happens, and always before any callback routines, so callback
    routines may call this to find out the iteration count of a global
    search IN PROGRESS.
    
    Note: this function does not count backreference matches.  It
    counts each entire successful match as one event, regardless of
    the number of successful backreference matches each might have had
    within it.


PRegEx_GetPos         () ==> Char pos where last left off; next begins
PRegEx_SetPos         (num) ==> Change pos (0 <= Pos <= buffer len)

    "Pos" is the character offset within the currently-active SrchStrL
    of where the current or most recent successful match STOPPed
    (which is also the beginning point for the next attempted match,
    unless the string buffer or RE is replaced).

    GetPos returns this value.

    SetPos lets you set the Pos for the following GetNextMatch either
    ahead or backward.  SetPos(0) would always restart from the
    beginning.  The legal bounds of Pos are 0 <= Pos <=
    length(SrchStrL[1]).

    Generally, it is recommended that you avoid calling SetPos during
    the middle of any of the high-level Search/Replace routines,
    especially the Replace routines, or unpredictable results could
    occur.  Instead, call SetPos() only when working with the
    low-level interface routines.

    High-level routines always re-set Pos to zero before they start,
    because they internally call the low-level routines
    SetMatchPattern and SetSearchString, which have this effect as
    well.

    Recommendation: instead of ever using GetPos or SetPos, use the
    power of REs to extract the data you need based on its pattern and
    nearby context, rather than trying to search at specific character
    positions within a buffer.


PRegEx_GetMatchBRCount() ==> Number of back refs in last matched RE

    Returns the number of backreference-generating parenthesis pairs
    that were in the currently-successfully-matched RE.

    This number serves as the upper bound of the "num" argument to the
    following routines -- i.e. it gives the number of the
    highest-available numbered back reference from the current match.


PRegEx_GetMatchString ([num]) ==> Last matched str (entire -or- BR #)
PRegEx_GetMatchStart  ([num]) ==> Start pos of ""  (entire -or- BR #)
PRegEx_GetMatchLen    ([num]) ==> Length of ""     (entire -or- BR #)

    These return the entire string, its start position within the
    original buffer, and its length, for the Entire Match, or, if num
    is supplied and > 0, for any numbered backreference string.

    If GetMatchString and GetMatchLen return "" and 0, respectively,
    it means the corresponding match string was a successful match,
    but empty, and GetMatchStart will still give the correct offset of
    that matched position.

    If they return void, it means that there is no corresponding
    successful match, and GetMatchStart will also return void.

    For example:

    put PRegEx_Search(["Ravi is a nice guy"], "((Chris)|(Ravi))")
    -- 1

    put PRegEx_GetMatchString(0)
    -- "Ravi"

    put PRegEx_GetMatchString(1)
    -- "Ravi"

    put PRegEx_GetMatchString(2)
    -- <Void>           -- 2nd set of parens did not kick in

    put PRegEx_GetMatchString(3)
    -- "Ravi"

    You can use this to check which of several alternate cases in
    a match pattern was the successful one:
            
    if PRegEx_GetMatchString(2) = void then put "Ravi matched."
    if PRegEx_GetMatchString(3) = void then put "Chris matched."
    -- "Ravi matched."


Error-handling functions:
-------------------------

PRegEx_LastErrCode        () ==> Error code for last failed call

    Yields the numeric error code generated by the immediate previous
    PRegEx function call.

    0 means success.  All other codes are negative values.

    Some functions return their error codes, and LastErrCode() will
    agree with those; others do not return integers, and so checking
    LastErrCode() is the only way to check the exact error in case
    they return an unexpected result.

PRegEx_DescribeError       ([Err]) ==> Error msg (Err or LastErrCode)

    Given an Error code, returns a string message explaining it.
    
    If no Err is supplied, then describes PRegEx_LastErrCode().

    Returns empty string if the Error code is zero (success).

    Example:

    put PRegEx_DescribeError(PRegEx_ErrCode_SearchStrLMustBeList())
    -- "PRegEx: SearchStrL argument must be a Lingo list."
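
    A typical checking pattern, sketched here for a call that returns
    FoundCount or a negative error code.  Passing a plain string
    instead of a list is used only to provoke a parameter error:

    set Result = PRegEx_Search("not a list", "x")
    if Result < 0 then put PRegEx_DescribeError(Result)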


PRegEx_CompiledOK          () ==> True if last expression compiled

   Returns true if and only if the last attempted compilation of a
   regular expression succeeded, even if there have been other
   intervening errors since then.


PRegEx_MemError            () ==> True if last op failed due to memory

   Returns true if the last PRegEx function generated a memory error.
   
   Each new PRegEx function call resets this value.


PRegEx_MemErrorSticky      () ==> True if any op has failed due to mem
PRegEx_MemErrorStickyReset () ==> Reset sticky err; return prev value

   MemErrorSticky() returns true if ANY PRegEx function has generated a
   memory error at any point since the last call to PRegEx_Clear(1)
   ("Complete" reset), or since the last call to
   PRegEx_MemErrorStickyReset(), which turns off this flag until the
   next memory error occurs.
   
   This flag could be checked after a long sequence of PRegEx calls to
   see if there was a problem encountered.  Or, it could be checked
   every time through an idle loop, perhaps.


Preference flags:
-----------------

    Functions listed in this section act as both the Get() and
    Set(1/0) functions for the correspondingly-named preferences.
    (Call with no arguments to Get() the value, and call with 1
    argument to Set the value, which is also returned to you.)


PRegEx_ErrorsToMessageWindow ([Bool]) ==> Echo all errors to Msg wind.

    Tells PRegEx to echo the string description of any error codes
    generated by any PRegEx routine directly to the message window
    immediately as they occur.  This can be left on all the time, if
    desired, because it will have no effect during projector playback,
    since projectors lack a message window.


String-manipulation utility functions: 
--------------------------------------

PRegEx_QuoteMeta (String) ==> String with RE-special chars quoted

    Takes a Lingo string and returns a copy of the string with any
    potentially special "meta" characters "quoted" ("escaped") by
    having a backslash inserted in front of them.  This makes the
    string "safe" to use in an RE, even when its contents or origin
    cannot be known or trusted in advance (e.g. searching for
    user-supplied data with a potentially untrusted user, or any time
    when you know you want to search literally for a string that might
    have special characters in it and you may or may not know that in
    advance.  Maybe you want to search for "?" or backslash, for
    example).

    The characters that get escaped are EVERY CHARACTER EXCEPT a-z,
    A-Z, 0-9, and underscore, and non-ASCII characters.

    As a special case, NUL characters in the input are escaped as
    "\0", so the output of QuoteMeta is 100% compatible with the
    ReplPat argument to the Replace functions.

    In other words, the QuoteMeta function is equivalent to this Lingo
    example (except it does NOT have the side effect of modifying the
    current search string, pattern, or Match Strings etc. as calling
    PRegEx_Replace would do):

    on QuoteMeta String
      set myStr = [String]
      PRegEx_Replace(myStr, "([^A-Z_0-9\x{7F}-\x{7FFFFFFF}])", "gi", "\\\1")
      PRegEx_Replace(myStr, "\0",           "g",  "\\0" )
      return myStr[1]
    end QuoteMeta

    Note: PRegEx_Interpolate can be used to reverse the processing done
    by QuoteMeta.
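
    A usage sketch with made-up data: without QuoteMeta, the "." and
    parentheses in UserText would be treated as RE operators rather
    than literal characters.

    set UserText = "3.50 (approx)"
    set Buf      = ["Price: 3.50 (approx) today"]
    put PRegEx_Search(Buf, PRegEx_QuoteMeta(UserText))
    -- expected to show 1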


PRegEx_Translate(SrchStrL, InputTable, OutputTable)

    Converts chars in SrchStrL using the mapping specified.

    InputTable and OutputTable are a pair of strings specifying
    input-chars and corresponding output-chars; any input-char
    found in SrchStrL will be mapped to the corresponding
    output-char.  Others will be untouched.

    Dashes can be used in InputTable and OutputTable to signify a
    range of characters.

    Example: 

    PRegEx_Translate(SrchStrL, "a-z", "n-za-m") -- Rot13 encode/decode

    Supports interpolation of \t, \n, \0, \\, \xDD for hex, \123 for
    octal in the InputTable and the OutputTable.  But, does NOT
    support back-reference interpolation as that would almost never be
    helpful.  \# and \## are ignored, consequently, except for \0.
    Does NOT support variable and symbol interpolation syntax.
    "Translate" has its own, different, syntax.  Non-ASCII characters
    will be interpolated but then ignored (see note below).

    InputTable and OutputTable may contain ascii-zero (NUL)
    characters.

    If you want to mention a literal dash in either the InputTable or
    OutputTable, that character must either be the first or last
    character in the table, where it couldn't possibly be interpreted
    as a range specifier.

    If for any reason there are fewer characters in the Output table
    than in the Input table, then the last character is understood to
    be replicated as necessary.

    Examples: 

    PRegEx_Translate(SrchStrL, "-.", "M") -- dash or dot become M

    PRegEx_Translate(SrchStrL, "\000-\177", "\177-\000") -- invert all ASCII chars

    PRegEx_Translate(SrchStrL, "a-zA-Z", "_") -- all alpha chars become underscores
    
    Returns number of characters that changed; 0 if none did; or a
    negative error code if there is an error in the parameters.

    Translate ONLY works with ASCII characters (also known as Unicode
    code points zero through 127 also known as "7-bit" characters).
    This means: Only ASCII characters are recognized in Input and
    Output tables -- non-ASCII characters are ignored, but will
    disrupt interpretation of "range" specifiers.  Non-ASCII
    characters in SrchStrL (the string being altered) will always be
    completely ignored.  (Yes, this is a step *backward* from PRegEx
    1.0 functionality, but it is an unavoidable consequence of using
    Unicode rather than a fixed 8-bit character set.)  If you want to
    do substitutions on non-ASCII characters, please use the regular
    search/replace features.
    
PRegEx_Interpolate(String, [VarsPList]) ==> String

    Does the pre-processing step that PRegEx_ReplaceString would do
    before it does a replace, and returns the interpolated string.

    Note: Since interpolation is usually done on short-ish
    programmer-supplied strings rather than large buffers, the
    incoming argument is a simple string, not a String Buffer (list).

    Supports all of the escape codes listed under "ReplPat",
    including insertion of back-references, if any (see "escape codes"
    above for details).

    IN ADDITION to the normal interpolation, IF the optional VarsPList
    argument is supplied, then the sequence ${Foobar} inside the
    String will be replaced with the value of the property (string)
    "Foobar" from VarsPList, and ${#Foobar} will be replaced with the
    value of the property (symbol) #Foobar.

    Properties whose values are absent or not of type "string" will
    result in an empty string being inserted.

    Example:

    set Props     = [#FirstName: "Joe"]
    set Location  = "Town: Davis County: Sacramento"

    PRegEx_Search([Location], "Town: (.*?) County:") -- sets \1 
    put re_i("\1 says \x22Welcome, ${#FirstName}!\x22", Props)
     -- "Davis says "Welcome, Joe!""
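
    The VarsPList substitution rule can be modeled in a few lines of
    Python.  This sketch covers only the ${...} step (not escape
    codes or back-references) and follows the strict reading in
    which string and symbol property names are distinct.  The name
    interpolate_vars is hypothetical, and representing a symbol
    property such as #FirstName by the dictionary key "#FirstName"
    is an assumption made purely for illustration.

```python
import re


def interpolate_vars(text, vars_plist):
    """Replace ${Name} / ${#Name} with the matching property value."""
    def lookup(match):
        value = vars_plist.get(match.group(1))
        # Absent properties and non-string values insert an empty string.
        return value if isinstance(value, str) else ""
    return re.sub(r"\$\{([^}]+)\}", lookup, text)


props = {"#FirstName": "Joe", "Age": 42}
print(interpolate_vars("Welcome, ${#FirstName}!", props))   # Welcome, Joe!
print(interpolate_vars("Age: [${Age}]", props))             # Age: []
```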

    Note: Although not documented to behave this way, in the current
    MOA implementation, searching a property list for the property "a"
    is considered equivalent to searching for the property #a, and
    vice versa.  Consequently, Interpolate also has this behavior --
    i.e. it does not distinguish between the string and symbol forms
    of the property name.  However, if MOA ever "corrects" this
    behavior, then Interpolate will behave with the more strict
    interpretation documented above.  Just be sure to use or omit the
    "#" as documented here, and your code will be upwardly-compatible
    with future versions of MOA.  Then, if you never intermix symbol
    properties and string properties in the same property list, you
    probably will not have to worry about this subtlety.

    Note, however, that strings can contain any character(s) in any
    length, while symbols have a more limited range of legal
    characters.  However, symbols are much faster to look up in a
    large property list.

List-manipulation utility functions:
------------------------------------

    These are PRegEx-supplied variants of favorite built-in Perl
    functions.  In Perl, regular expressions and list manipulation are
    tightly coupled, so it's only natural that PRegEx should strive for
    the same.

    You'll notice that many of these functions are generally useful
    for list-manipulation, even if you don't need to do any searching,
    replacing, and extracting.


PRegEx_CopyList(ListOrPList, [Deep, InitList]) ==> CopiedListOrPList

    Returns a copy (shallow by default, deep if Deep is true) of the
    given List or Property List.

    If a memory error occurs, returns an error code instead of a list.

    Warning: Deep copying does not check for recursive list inclusion.
    If you try to Deep copy a recursive data structure, the routine
    will run for a VERY LONG TIME till memory is filled up and then
    fail with a memory error.  

    If InitList is passed, it must be the same type of list as
    ListOrPList.  If present, the items copied from ListOrPList will
    be copied into InitList.  This is a way to use CopyList to deeply
    or shallowly APPEND items from a list onto another list (or in the
    case of PLists, ADD those key/value pairs).

    Note: Assumes that all new PLists should be marked as "sorted" (so
    it does).

    Note: Deep copying only makes deep copies of elements that
    themselves are Lists or PLists.  Otherwise, any other type of
    object is shallowly copied.  (Possible future improvement: if a
    child object has a "clone" method, Deep mode could check for that
    method and try to call it to allow the object to clone itself.)
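
    As a rough Python model of the shallow/deep distinction and of
    InitList (hypothetical name copy_list; a Python dict stands in
    for a PList, and returning the InitList object itself is an
    assumption): only nested lists and PLists are duplicated in deep
    mode, while every other kind of value is shared with the
    original.

```python
def _copy_item(value, deep):
    # Deep mode duplicates only nested lists/dicts; other objects are shared.
    if deep and isinstance(value, (list, dict)):
        return copy_list(value, deep=True)
    return value


def copy_list(source, deep=False, init=None):
    """Copy source (list or dict), optionally appending into init."""
    if isinstance(source, dict):
        result = {} if init is None else init
        for key, value in source.items():
            result[key] = _copy_item(value, deep)
        return result
    result = [] if init is None else init
    for value in source:
        result.append(_copy_item(value, deep))
    return result


nested = [1, [2, 3]]
shallow = copy_list(nested)
deep = copy_list(nested, deep=True)
print(shallow[1] is nested[1])   # True: inner list is shared
print(deep[1] is nested[1])      # False: inner list was duplicated
print(copy_list([3], init=[1, 2]))   # [1, 2, 3]
```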

PRegEx_Grep   (List, RE, [Opts]) ==> NewList ("Regexp mode")

    Grep produces a new list derived by filtering an existing one.

    Grep has two modes.  This is the first one.  It is triggered by
    supplying a STRING (RE) as the second argument and optional Opts as
    the third.

    Returns a new list whose contents are the elements of List that,
    when matched against RE/Opts, produce at least 1 match.

    Elements of the incoming List must be plain strings, or SrchStrL
    string buffers (i.e. lists containing a string and optional length
    integer).

    Elements that do not meet these requirements will simply be
    skipped.  Errors encountered in matching (e.g. failure of RE to
    compile correctly, memory errors), will cause Grep to finish
    prematurely, returning only the items that have been matched up to
    that point.  Checking LastErrCode() after calling Grep will
    indicate the error code, if any.

    Example: 

    put PRegEx_Grep([1,"abc","","fo","",["w"],"b",#symb], "\w+", "g")   
    -- ["abc", "fo", ["w"], "b"]

    Notice how 3 strings and 1 String Buffer object within the list
    were successfully matched by Grep.  The integer, the non-matching
    empty strings, and the symbol did not match and so did not appear
    in the returned list.
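
    The selection rule of "Regexp mode" can be approximated in
    Python as follows.  This is a sketch only: grep_regexp is a
    hypothetical name, Opts are not modeled, Python's re module
    stands in for PCRE, and a bare object() stands in for the Lingo
    symbol #symb.

```python
import re


def grep_regexp(items, pattern):
    """Keep strings, and [string, ...] buffers, that match pattern."""
    compiled = re.compile(pattern)
    kept = []
    for item in items:
        if isinstance(item, str):
            subject = item
        elif isinstance(item, list) and item and isinstance(item[0], str):
            subject = item[0]          # string buffer: search its first item
        else:
            continue                   # other types are simply skipped
        if compiled.search(subject):
            kept.append(item)          # the original element is kept as-is
    return kept


symb = object()  # stand-in for #symb
print(grep_regexp([1, "abc", "", "fo", "", ["w"], "b", symb], r"\w+"))
# ['abc', 'fo', ['w'], 'b']
```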

PRegEx_Grep   (List, #Filter, [ArgList]) ==> NewList ("Filter mode")
                    
    Grep produces a new list derived by filtering an existing one.

    Grep has two modes.  This is the second one.  It is triggered by
    supplying a symbol (#Filter) as the second argument.

    Filters list according to the boolean results returned by the
    "#Filter" function, which can be your own custom handler or any
    Lingo built-in function whose results can be interpreted as
    Boolean (e.g. #symbolP, #stringP, #integerP, #length).

    Returns a new list whose contents are the elements of List for
    which, when passed to #Filter with optional additional arguments
    from ArgList as described above, #Filter returns true.  

    In this "Filter" mode, Grep is similar to Map or ReplaceExec in
    its recognition of any CallbackAbort/Stop/etc. flags set by the
    #Filter callback function.

    Example:

    put PRegEx_Grep([1,"abc","","fo","",["w"],"b",#symb], #length)  
    -- ["abc", "fo", "b"]

    Notice how only items for which the Lingo built-in "length"
    function returned a non-zero number were selected, so the empty
    strings and all non-string objects were removed.


PRegEx_Map    (List, #MapFunction, [ArgList]) ==> MappedList

    Map takes one list and makes another list where (generally) each
    item in the new list corresponds to an item in the original list.

    It uses a #MapFunction to convert an original item into its
    counterpart in the new list.

    Calls #MapFunction on each element in List.  On each call, first
    argument to #MapFunction is the element being processed.

    Subsequent arguments to #MapFunction are derived from the optional
    ArgList parameter in the manner described earlier.

    #MapFunction should be prepared to convert its first argument into
    the desired output value (of any type), using its additional
    arguments in whatever way needed.

    MapFunction may use PRegEx_CallbackAbort, Stop, etc. to affect the
    behavior of PRegEx_Map.  

    Abort: stop and discard any work done so far; delete
    partially-built result list and return empty list instead.  Set
    LastErrCode to indicate that an Abort was requested.

    Stop: stop and return only elements successfully mapped prior to
    this point; ignore current return value of #MapFunction.

    Last: keeps this current return value but then stops and
    successfully returns the list created up to that point.

    Skip: skips adding a value for the current invocation, but
    continues to process others.  Clever use of "Skip" allows Map to
    do conversion and filtering (similar to Grep's filtering) at the
    same time -- it can "Skip" items that should not make their way
    into the new list, while mapping the items that should.
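
    The four signals can be pictured with a small Python model of
    the Map loop.  Everything here is an illustrative assumption
    rather than the Xtra's implementation: the Flags class and the
    module-level FLAGS object stand in for the
    PRegEx_CallbackAbort/Stop/Last/Skip flags, map_list is a
    hypothetical name, and the order in which simultaneously set
    flags are checked is a guess.

```python
class Flags:
    """Stand-in for the callback control flags."""
    def __init__(self):
        self.reset()

    def reset(self):
        self.abort = self.stop = self.last = self.skip = False


FLAGS = Flags()


def map_list(items, func, extra_args=()):
    result = []
    for item in items:
        FLAGS.reset()
        value = func(item, *extra_args)   # element is always the first argument
        if FLAGS.abort:                   # discard everything done so far
            FLAGS.reset()
            return []
        if FLAGS.stop:                    # ignore this value, keep earlier ones
            break
        if not FLAGS.skip:                # Skip: add nothing, keep going
            result.append(value)
        if FLAGS.last:                    # keep this value, then finish
            break
    FLAGS.reset()
    return result


def halve_evens(n):
    """Map and filter at once: skip odd numbers, stop at a negative one."""
    if n < 0:
        FLAGS.stop = True
    elif n % 2:
        FLAGS.skip = True
    return n // 2


print(map_list([2, 3, 4, -1, 6], halve_evens))   # [1, 2]
```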


PRegEx_Sort   (List, DeepCopy, #SortFunction, [ArgList]) ==> NewList

    Returns a new list consisting of a shallow OR Deep copy of the old
    list, sorted according to the ordering implied by #SortFunction,
    which takes as arguments two values (of any type), here dubbed A
    and B, from the list to be compared, plus optional additional
    arguments if required.  

    For any pair of items, #SortFunction must return -1 if A is less
    than B, 0 if A == B, and 1 if A > B.

    Unlike Lingo's "sort" function, Sort does NOT modify the original
    list in any way.  Rather, it makes a sorted copy which you may, at
    your option, choose to use in place of the original.
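
    A three-way comparison function of this kind, applied to a copy
    so that the original stays untouched, looks like this in a
    Python model.  The names sort_copy and by_length are
    hypothetical, and copy.deepcopy is a simplification: it
    duplicates every nested object, whereas the Deep mode described
    for PRegEx duplicates only nested Lists and PLists.

```python
import copy
from functools import cmp_to_key


def sort_copy(items, deep_copy, compare, extra_args=()):
    """Return a sorted copy; compare(a, b, *extra_args) returns -1, 0 or 1."""
    working = copy.deepcopy(items) if deep_copy else list(items)
    working.sort(key=cmp_to_key(lambda a, b: compare(a, b, *extra_args)))
    return working


def by_length(a, b):
    # -1 if a is shorter than b, 0 if equal length, 1 if longer
    return (len(a) > len(b)) - (len(a) < len(b))


words = ["pear", "fig", "banana"]
print(sort_copy(words, False, by_length))   # ['fig', 'pear', 'banana']
print(words)                                # ['pear', 'fig', 'banana']
```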


PRegEx_Reverse (List, [DeepCopy, InitList]) ==> Reversed copy
                
    Returns a copy (shallow or deep -- default is shallow) of List
    whose elements are in the reverse order of what they were in List.

    If InitList is supplied, then reversed list is appended onto it.


PRegEx_Join   (List, [DelimiterString]) ==> String

    Returns a string which is a concatenation of all strings in List,
    with the optional DelimiterString between each pair (it's the
    opposite of PRegEx_Split -- it rejoins a list of strings into a
    single string).
    
    Delimiter string may be empty, which is the default.

    Example:

    put PRegEx_Join(PRegEx_Split(["a,b,c,d,e"], ",", "g"), ":")
    -- "a:b:c:d:e"


PRegEx_Keys    (PList, [InitList]) ==> KeyList
PRegEx_Values  (PList, [InitList]) ==> ValueList

    Creates a list of the keys (properties) or values in PList and
    either returns it as a new list or appends the items to the
    optional InitList (a regular list), if provided.

    These functions do NOT attempt to change the sorting behavior of
    the incoming PList; each returns keys or values in the order that
    MOA yields them, and, if Keys and Values are called without the
    list being altered, then the items yielded by each should
    correspond.  If the PList is modified between calls to Keys and
    Values, then no correspondence is guaranteed, or even likely.

    To get all the keys and values intermixed together pairwise in a
    single list, use PRegEx_PListToList, described below.

    Examples:

    put PRegEx_Keys  ([#a:10,#b:11,#c:12], ["dog", "cow"])
    -- ["dog", "cow", #a, #b, #c]

    put PRegEx_Values([#a:10,#b:11,#c:12], ["dog", "cow"])
    -- ["dog", "cow", 10, 11, 12]


PRegEx_GetSlice(List, Keys, [InitList]) ==> SliceList

    Given a List (regular OR PList) and a list of (item numbers /
    keys), which are said to define a "slice" of the first list,
    creates a new regular list of values corresponding to those
    specified by the "slice", and either appends the resulting list of
    values to optional InitList or returns it as a new List.

    Examples:

    put PRegEx_GetSlice([#a   ,#b   ,#c   ], [3,  2])
     -- [#c,#b]
    put PRegEx_GetSlice([#a:10,#b:11,#c:12], [#b,#a])
     -- [11,10]


PRegEx_SetSlice(List, Keys, Values) ==> List

    Given a List (regular or PList) and a list of (item numbers or
    keys), which are said to define a "slice" of the list, plus a
    third list of values corresponding to the keys, sets the
    keys/values accordingly in the incoming List, MODIFYING THE LIST.

    For convenience, also returns the same List/PList that was
    modified, allowing you to start with a list specified directly in
    Lingo, including an empty one, if you need.
    
    If the incoming List was a PList, SetSlice will mark it "Sorted".

    Calling SetSlice with an empty PList [:] is a way to convert a
    list of keys and a corresponding list of values into an SPList.

    Calling SetSlice with an existing PList is a way to add all the
    keys and values from one property list into another.

    Note that any list positions that are modified by SetSlice will
    have their existing values REPLACED (like SetAt and SetAProp would
    do).

    Examples:

    put PRegEx_SetSlice([#a:1], [#d, #c, #b], [2, 3, 4])
    -- [#a:1, #b:4, #c:3, #d:2]

    put PRegEx_SetSlice([#a, #b], [2, 4, 3], ["dog", "cat", "cow"])
    -- [#a, "dog", "cow", "cat"]
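
    The two examples above can be reproduced with a small Python
    model.  It is a sketch under stated assumptions: set_slice is a
    hypothetical name, a dict stands in for a PList, the "Sorted"
    marking is imitated by re-inserting entries in key order (which
    requires mutually comparable keys), list positions are 1-based
    as in Lingo, and padding a too-short list with None is a guess
    at behavior the text does not specify.

```python
def set_slice(target, keys, values):
    """Set target[key] = value pairwise, modifying and returning target."""
    if isinstance(target, dict):
        for key, value in zip(keys, values):
            target[key] = value            # existing values are replaced
        ordered = sorted(target.items())   # imitate the "Sorted" marking
        target.clear()
        target.update(ordered)
        return target
    for position, value in zip(keys, values):
        while len(target) < position:      # assumed padding for short lists
            target.append(None)
        target[position - 1] = value       # 1-based position
    return target


print(set_slice({"a": 1}, ["d", "c", "b"], [2, 3, 4]))
# {'a': 1, 'b': 4, 'c': 3, 'd': 2}
print(set_slice(["a", "b"], [2, 4, 3], ["dog", "cat", "cow"]))
# ['a', 'dog', 'cow', 'cat']
```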


PRegEx_PListToList       (PList, [InitList])     ==> List
PRegEx_PListToListStrings(PList, [InitList])     ==> List

    "Flattens" PList into a regular list: [key, value, key, value....]

    PRegEx_PListToListStrings does the same, but converts any keys of
    type "symbol" into strings before adding them to the new List.

    Either a new list is created, or items are appended to optional
    InitList, if provided.  

    Examples:

    put PRegEx_PListToList([#a: 2, #b: 4])
    -- [#a, 2, #b, 4]

    put PRegEx_PListToList([#a: 2, #b: 4], ["dog", "cat"])
    -- ["dog", "cat", #a, 2, #b, 4]

    put PRegEx_PListToListStrings([#a: 2, #b: 4, 1: 3])
    -- ["a", 2, "b", 4, 1, 3]


PRegEx_ListToSPList      (List,  [InitPList]) ==> SPList
PRegEx_ListToSPListSym   (List,  [InitPList]) ==> SPList

    "Unflattens" List into a sorted PList, taking elements pairwise
    from List. Any odd key left over at the end gets a void value.

    PRegEx_ListToSPListSym does the same, but converts any string keys
    to symbols before adding to the PList.  Other types of keys are
    left unaltered.

    As with other PRegEx functions that create symbols, the symbol
    created is subject to Lingo's rules governing symbols.  Attempt to
    create invalid symbols at your own risk: MOA's default behavior
    will govern.

    Either a new SP list is created, or items are appended to optional
    InitPList, if provided.  In either case, the resulting list will
    be marked as "sort"ed.

    Examples:

    put PRegEx_ListToSPList([#a, 2, #b, 4])
    -- [#a: 2, #b: 4]

    put PRegEx_ListToSPListSym(["a", "dog", "b", "box", #c, 2])
    -- [#a: "dog", #b: "box", #c: 2]
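
    Flattening and unflattening are inverse operations, which a
    short Python model makes easy to see.  The names plist_to_list
    and list_to_plist are hypothetical, a dict stands in for a
    PList, None stands in for void, and neither the "sorted"
    marking nor the string-to-symbol conversion is modeled.

```python
def plist_to_list(plist, init=None):
    """Flatten {key: value} into [key, value, key, value, ...]."""
    flat = [] if init is None else init
    for key, value in plist.items():
        flat.extend([key, value])
    return flat


def list_to_plist(flat, init=None):
    """Unflatten pairwise; an odd key left at the end gets None (void)."""
    plist = {} if init is None else init
    for i in range(0, len(flat), 2):
        value = flat[i + 1] if i + 1 < len(flat) else None
        plist[flat[i]] = value
    return plist


print(plist_to_list({"a": 2, "b": 4}, ["dog", "cat"]))
# ['dog', 'cat', 'a', 2, 'b', 4]
print(list_to_plist(["a", 2, "b"]))
# {'a': 2, 'b': None}
```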


General utility functions: 
--------------------------

PRegEx_ReadFileToString  (FilePath, TextEncoding)  ==> StringBufferList
PRegEx_WriteStringToFile (FilePath, StringBufferList, TextEncoding) ==> 1/-Err

    ReadFileToString and WriteStringToFile create and accept
    StringBufferList (SrchStrL) objects -- that is, a list containing
    a string buffer in item 1.

    Reading:

    ReadFileToString reads an entire file whose path is specified as a
    MOA-style FilePath and resolved according to Director's documented
    pathname-resolution algorithm (including obeying the canonical
    "@:" syntax), and returns a StringBufferList.

    Conveniently, the StringBufferList may be used as a
    PRegEx-compatible SrchStrL argument, allowing the file buffer to
    be immediately searched and/or manipulated by PRegEx's
    search/replace routines.

    Writing: 

    WriteStringToFile takes a StringBufferList and saves to a file.

    The FilePath may be relative or absolute, and may use any of the
    standard Director path name conventions, but it MUST contain at
    least one directory component.  If it does not, a "directory not
    found" error will occur.  WriteStringToFile does NOT attempt to
    create directories; only files.

    All the characters in the StringBufferList will always be
    written. (NOTE AND WARNING: this is a change from the
    WriteEntireFile function in PRegEx 1.0, in which an optional
    second integer element in the StringBufferList would be
    interpreted as a character length limit.  This was found to be a
    real pitfall by many programmers.)

    On success, returns # of bytes actually written (i.e. the actual
    size of the file), possibly zero.  Note: because of file encoding
    issues, this number is *not* necessarily the same as the number of
    characters written... it could be more or less than that number.
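
    To see how the byte count depends on the chosen encoding, here
    is a short Python illustration (Python's own codecs are used
    here, not PRegEx or iconv): the same 10-character string
    occupies a different number of bytes in each encoding.

```python
text = "na\u00efve caf\u00e9"   # 10 characters, two of them non-ASCII

# Bytes needed to store the same text in three different encodings.
sizes = {enc: len(text.encode(enc)) for enc in ("utf-8", "latin-1", "utf-16-le")}

print(len(text))   # 10
print(sizes)       # {'utf-8': 12, 'latin-1': 10, 'utf-16-le': 20}
```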

    On failure, tries to delete any created or partially-(over)written
    file, if any, and returns a negative error code.

    So: any negative return value should be interpreted as an error
    code.

    Text Encodings:

    Director 11+ uses Unicode UTF-8 encoding for all strings.
    Therefore, it is necessary to convert any data read from a file
    into UTF-8 before it can be stored in a Director String, and
    optionally to convert it back again when writing to a file.

    Your data files might or might not be stored in UTF-8 format, so
    PRegEx_ReadFileToString and PRegEx_WriteStringToFile each take a
    TextEncoding argument, which is a string giving the name of an
    encoding that should be used to read or write the file.  (When
    reading, PRegEx will convert *from* the format you specify, into
    UTF-8.  When writing, it will convert from UTF-8 *to* the format
    you specify.)
    
    PRegEx permits all of the encodings defined by the iconv library
    (listed in detail below), plus 2 additional fully bi-directional
    8-bit text encodings that were created for this project:

    MACROMANFULL (also known as:) MACFULL MACINTOSHFULL 
    CP1252FULL   (also known as:) MS-ANSI-FULL  WINDOWS-1252-FULL

    For details of the encodings, see these source files, included
    with the PRegEx distribution:

        pregex/project/sources/iconv_custom/MACROMANFULL.TXT
        pregex/project/sources/iconv_custom/CP1252FULL.TXT

    These encodings are called "full" and "bi-directional" because
    they can be used to read *ANY* binary file into memory, and
    although they will have a different format in memory (UTF-8), if
    written back out again with the same encoding, the identical
    binary bytes will be retained.  This should permit you to
    manipulate binary files with PRegEx if you are careful (that is,
    if, having read the binary files into strings, you never alter
    those strings to contain any characters that are not part of the
    encoding you used when reading them in).  Again, see the files
    above for details on each character in the encodings.
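
    The "full, bi-directional" property can be demonstrated in
    Python.  MACROMANFULL and CP1252FULL are not available there, so
    this sketch uses Python's built-in "latin-1" codec as a stand-in
    for an encoding that assigns a character to every byte value,
    and Python's strict "cp1252" codec as an example of an 8-bit
    encoding that does not.

```python
all_bytes = bytes(range(256))

# A full 8-bit encoding maps every byte to a character, so decoding
# and re-encoding reproduces the original bytes exactly.
as_text = all_bytes.decode("latin-1")
round_trip = as_text.encode("latin-1")
print(round_trip == all_bytes)          # True

# In memory (UTF-8) the same data has a different size and layout.
print(len(as_text.encode("utf-8")))     # 384

# A non-full encoding leaves some byte values unassigned.
undefined = []
for value in range(256):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(value)
print([hex(v) for v in undefined])      # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```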

    KEEP IN MIND: not all encodings are 8-bit, and not all 8-bit ones
    are bi-directional.  Therefore, if you are reading binary files,
    or Mac Roman, or Windows 1252 (Windows Latin) files, we suggest
    you use one of the 8-bit bi-directional encodings listed above.

    Here is the full list of supported text encodings.  For details on
    each encoding, please visit the iconv web page, mentioned earlier:

    ANSI_X3.4-1968 ANSI_X3.4-1986 ASCII CP367 IBM367 ISO-IR-6 ISO646-US ISO_646.IRV:1991 US US-ASCII CSASCII
    UTF-8
    ISO-10646-UCS-2 UCS-2 CSUNICODE
    UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11
    UCS-2LE UNICODELITTLE
    ISO-10646-UCS-4 UCS-4 CSUCS4
    UCS-4BE
    UCS-4LE
    UTF-16
    UTF-16BE
    UTF-16LE
    UTF-32
    UTF-32BE
    UTF-32LE
    UNICODE-1-1-UTF-7 UTF-7 CSUNICODE11UTF7
    UCS-2-INTERNAL
    UCS-2-SWAPPED
    UCS-4-INTERNAL
    UCS-4-SWAPPED
    C99
    JAVA
    CP819 IBM819 ISO-8859-1 ISO-IR-100 ISO8859-1 ISO_8859-1 ISO_8859-1:1987 L1 LATIN1 CSISOLATIN1
    ISO-8859-2 ISO-IR-101 ISO8859-2 ISO_8859-2 ISO_8859-2:1987 L2 LATIN2 CSISOLATIN2
    ISO-8859-3 ISO-IR-109 ISO8859-3 ISO_8859-3 ISO_8859-3:1988 L3 LATIN3 CSISOLATIN3
    ISO-8859-4 ISO-IR-110 ISO8859-4 ISO_8859-4 ISO_8859-4:1988 L4 LATIN4 CSISOLATIN4
    CYRILLIC ISO-8859-5 ISO-IR-144 ISO8859-5 ISO_8859-5 ISO_8859-5:1988 CSISOLATINCYRILLIC
    ARABIC ASMO-708 ECMA-114 ISO-8859-6 ISO-IR-127 ISO8859-6 ISO_8859-6 ISO_8859-6:1987 CSISOLATINARABIC
    ECMA-118 ELOT_928 GREEK GREEK8 ISO-8859-7 ISO-IR-126 ISO8859-7 ISO_8859-7 ISO_8859-7:1987 ISO_8859-7:2003 CSISOLATINGREEK
    HEBREW ISO-8859-8 ISO-IR-138 ISO8859-8 ISO_8859-8 ISO_8859-8:1988 CSISOLATINHEBREW
    ISO-8859-9 ISO-IR-148 ISO8859-9 ISO_8859-9 ISO_8859-9:1989 L5 LATIN5 CSISOLATIN5
    ISO-8859-10 ISO-IR-157 ISO8859-10 ISO_8859-10 ISO_8859-10:1992 L6 LATIN6 CSISOLATIN6
    ISO-8859-11 ISO8859-11 ISO_8859-11
    ISO-8859-13 ISO-IR-179 ISO8859-13 ISO_8859-13 L7 LATIN7
    ISO-8859-14 ISO-CELTIC ISO-IR-199 ISO8859-14 ISO_8859-14 ISO_8859-14:1998 L8 LATIN8
    ISO-8859-15 ISO-IR-203 ISO8859-15 ISO_8859-15 ISO_8859-15:1998 LATIN-9
    ISO-8859-16 ISO-IR-226 ISO8859-16 ISO_8859-16 ISO_8859-16:2001 L10 LATIN10
    KOI8-R CSKOI8R
    KOI8-U
    KOI8-RU
    CP1250 MS-EE WINDOWS-1250
    CP1251 MS-CYRL WINDOWS-1251
    CP1252 MS-ANSI WINDOWS-1252
    CP1253 MS-GREEK WINDOWS-1253
    CP1254 MS-TURK WINDOWS-1254
    CP1255 MS-HEBR WINDOWS-1255
    CP1256 MS-ARAB WINDOWS-1256
    CP1257 WINBALTRIM WINDOWS-1257
    CP1258 WINDOWS-1258
    850 CP850 IBM850 CSPC850MULTILINGUAL
    862 CP862 IBM862 CSPC862LATINHEBREW
    866 CP866 IBM866 CSIBM866
    MAC MACINTOSH MACROMAN CSMACINTOSH
    MACCENTRALEUROPE
    MACICELAND
    MACCROATIAN
    MACROMANIA
    MACCYRILLIC
    MACUKRAINE
    MACGREEK
    MACTURKISH
    MACHEBREW
    MACARABIC
    MACTHAI
    HP-ROMAN8 R8 ROMAN8 CSHPROMAN8
    NEXTSTEP
    ARMSCII-8
    GEORGIAN-ACADEMY
    GEORGIAN-PS
    KOI8-T
    CP154 CYRILLIC-ASIAN PT154 PTCP154 CSPTCP154
    KZ-1048 RK1048 STRK1048-2002 CSKZ1048
    MULELAO-1
    CP1133 IBM-CP1133
    ISO-IR-166 TIS-620 TIS620 TIS620-0 TIS620.2529-1 TIS620.2533-0 TIS620.2533-1
    CP874 WINDOWS-874
    VISCII VISCII1.1-1 CSVISCII
    TCVN TCVN-5712 TCVN5712-1 TCVN5712-1:1993
    ISO-IR-14 ISO646-JP JIS_C6220-1969-RO JP CSISO14JISC6220RO
    JISX0201-1976 JIS_X0201 X0201 CSHALFWIDTHKATAKANA
    ISO-IR-87 JIS0208 JIS_C6226-1983 JIS_X0208 JIS_X0208-1983 JIS_X0208-1990 X0208 CSISO87JISX0208
    ISO-IR-159 JIS_X0212 JIS_X0212-1990 JIS_X0212.1990-0 X0212 CSISO159JISX02121990
    CN GB_1988-80 ISO-IR-57 ISO646-CN CSISO57GB1988
    CHINESE GB_2312-80 ISO-IR-58 CSISO58GB231280
    CN-GB-ISOIR165 ISO-IR-165
    ISO-IR-149 KOREAN KSC_5601 KS_C_5601-1987 KS_C_5601-1989 CSKSC56011987
    EUC-JP EUCJP EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE CSEUCPKDFMTJAPANESE
    MS_KANJI SHIFT-JIS SHIFT_JIS SJIS CSSHIFTJIS
    CP932
    ISO-2022-JP CSISO2022JP
    ISO-2022-JP-1
    ISO-2022-JP-2 CSISO2022JP2
    CN-GB EUC-CN EUCCN GB2312 CSGB2312
    GBK
    CP936 MS936 WINDOWS-936
    GB18030
    ISO-2022-CN CSISO2022CN
    ISO-2022-CN-EXT
    HZ HZ-GB-2312
    EUC-TW EUCTW CSEUCTW
    BIG-5 BIG-FIVE BIG5 BIGFIVE CN-BIG5 CSBIG5
    CP950
    BIG5-HKSCS:1999
    BIG5-HKSCS:2001
    BIG5-HKSCS BIG5-HKSCS:2004 BIG5HKSCS
    EUC-KR EUCKR CSEUCKR
    CP949 UHC
    CP1361 JOHAB
    ISO-2022-KR CSISO2022KR
    CP856
    CP922
    CP943
    CP1046
    CP1124
    CP1129
    CP1161 IBM-1161 IBM1161 CSIBM1161
    CP1162 IBM-1162 IBM1162 CSIBM1162
    CP1163 IBM-1163 IBM1163 CSIBM1163
    DEC-KANJI
    DEC-HANYU
    437 CP437 IBM437 CSPC8CODEPAGE437
    CP737
    CP775 IBM775 CSPC775BALTIC
    852 CP852 IBM852 CSPCP852
    CP853
    855 CP855 IBM855 CSIBM855
    857 CP857 IBM857 CSIBM857
    CP858
    860 CP860 IBM860 CSIBM860
    861 CP-IS CP861 IBM861 CSIBM861
    863 CP863 IBM863 CSIBM863
    CP864 IBM864 CSIBM864
    865 CP865 IBM865 CSIBM865
    869 CP-GR CP869 IBM869 CSIBM869
    CP1125
    EUC-JISX0213
    SHIFT_JISX0213
    ISO-2022-JP-3
    BIG5-2003
    ISO-IR-230 TDS565
    ATARI ATARIST
    RISCOS-LATIN1

    Warnings about text encodings:

    In general, you should be certain that your files are in the
    correct encoding.  Invalid character codes will be ignored/omitted
    when your file is read in, and/or file conversion will stop at the
    first invalid code encountered.  (As an example, please note that
    ISO-8859 has several unmapped code points -- you may wish to use
    CP1252FULL instead -- see above.)
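
    The "stops at the first invalid code" outcome can be imitated in
    Python to show what a truncated result looks like.  This is a
    model of the symptom only, using Python's UTF-8 decoder rather
    than iconv; decode_until_invalid is a hypothetical name.

```python
def decode_until_invalid(data):
    """Decode UTF-8 bytes, keeping only the text before the first bad byte."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first undecodable byte.
        return data[:err.start].decode("utf-8")


print(decode_until_invalid(b"abc\xffdef"))           # abc
print(decode_until_invalid(b"caf\xc3\xa9\x80 au lait"))   # café
```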

    "UTF-8" encoding:

    Note that if you use the "UTF-8" encoding, the file reading will
    STOP at the first invalid character found, and the string will
    appear to be truncated.

    "raw" encoding:
    
    PRegEx also defines a "raw" encoding that permits a binary file to
    be read directly into a Director string.  The data *really should*
    be in UTF-8 format, but the format will *not* be verified upon
    reading.  So, any attempt to use that string (print it, modify it,
    search or replace within it, view it in the debugger window, put
    it to the message window, etc.), could result in Director crashing
    or other unpredictable behavior, because all of the code paths
    mentioned above (and maybe others) will be assuming that the
    string contains valid UTF-8 characters.  Therefore, this mode
    should not be used, or at the least, should be used only by
    advanced users who are willing to accept the inherent risks.  If
    you read a string with "raw" mode, you should only use "raw" mode
    when writing it back out again, since the same issue will occur
    there -- only "raw" mode will write a string to a file without
    first examining its characters for UTF-8 conformance.


PRegEx_ReadEntireFile  (FilePath)  ==> StringBufferList
PRegEx_WriteEntireFile (FilePath, StringBufferList) ==> 1/-Err

    These functions are *deprecated*.  Please do not use
    them in new code, and please take them out of any existing projects
    (including their aliases, re_read and re_write).  Please see the
    documentation for ReadFileToString and WriteStringToFile, above,
    for an explanation of why they are deprecated.

    For backward compatibility with the prior version of PRegEx, these
    have been redefined to do roughly the same as calling
    ReadFileToString and WriteStringToFile, described immediately
    above, but with "MACROMANFULL" or "CP1252FULL" filled in for
    you as the TextEncoding, depending on which platform you are
    using:

    Mac:

    PRegEx_ReadFileToString (FilePath, "MACROMANFULL")
    PRegEx_WriteStringToFile(FilePath, StringBufferList, "MACROMANFULL")

    Windows:

    PRegEx_ReadFileToString (FilePath, "CP1252FULL")
    PRegEx_WriteStringToFile(FilePath, StringBufferList, "CP1252FULL")

    "MACROMANFULL" and "CP1252FULL" were chosen as the default
    encodings because they are (as described earlier):

     - full 8-bit (support all 256 possible binary bytes and no more)

     - bi-directional (data read in then written back out using same
       encoding will be unaltered on disk, as long as no characters
       not present in the encoding are added in the meanwhile)

     - compatible with Director 7-10 behavior, where 8-bit characters
       read into strings were simply interpreted as being MacRoman on
       the Mac, and Windows Latin 1 (aka Windows 1252 aka CP 1252) on
       Windows.

    If the files you are reading and writing will only ever contain
    7-bit ASCII characters, then there is no harm in continuing to use
    these functions.  However, you should still switch to the newer
    versions so that, if your future needs change, your Lingo code
    will already reflect the need to expressly choose a text encoding
    when reading and writing files.

    Similarly, if the strings your old projects are reading and
    writing from/to files do not rely on specific encodings of
    non-ASCII characters, there should be no harm in using these
    deprecated functions since the non-ASCII characters should still
    work as they did.

    NOTE: To maintain compatibility with PRegEx 1.0, an optional
    second element in the StringBufferList will be interpreted as a
    character length limit by WriteEntireFile.  This was found to be a
    real pitfall by many programmers, especially since this value is
    set by ReadEntireFile, but is NOT kept in sync as the string is
    altered by PRegEx functions (or other Lingo code).  The
    length-limit behavior was discontinued in the replacement
    function, WriteStringToFile.


Callback-related functions:
---------------------------

    PRegEx's internal callback mechanism is so flexible that we decided
    to expose it in this API so Lingo functions can be created that
    can elegantly make callbacks to other Lingo functions, something
    that is essentially impossible to do using regular Lingo.


PRegEx_CallHandler   (#CallbackFunction, [ArgList1, ArgList2])

    Calls any function by symbol name.  ArgList1 and ArgList2 are both
    optional.  Together they are flattened to produce a single
    argument list for the callback function.  

    In other words, each ArgList is separately treated this way:

    If not a list (i.e. any other kind of value, even "void"), the
    value itself becomes an argument to the #CallbackFunction. If a
    list, it is shallowly flattened and its elements become arguments
    to the #CallbackFunction, in the order they appear in the list.

    Note: if what you really want is to pass the actual list object
    itself and be sure it does not get flattened, just be sure to put
    the list you want to pass inside another temporary list, like
    this:

    PRegEx_CallHandler(#MyFunction, [myList1 ,  myList2]) or this:
    PRegEx_CallHandler(#MyFunction, [myList1], [myList2]) -- equivalent

    ... where [] is the Lingo list-construction operator, of course.
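
    The flattening rule can be written out as a Python model
    (hypothetical names flatten_args and call_handler; how an
    omitted ArgList is treated is not modeled, so each ArgList here
    is one that was actually supplied):

```python
def flatten_args(*arg_lists):
    """Build one argument list from the supplied ArgLists."""
    args = []
    for arg_list in arg_lists:
        if isinstance(arg_list, list):
            args.extend(arg_list)    # shallow: nested lists stay intact
        else:
            args.append(arg_list)    # any non-list value passes through
    return args


def call_handler(func, *arg_lists):
    return func(*flatten_args(*arg_lists))


print(flatten_args([1, 2], 3))            # [1, 2, 3]
print(flatten_args([[1, 2], [3]]))        # [[1, 2], [3]]
print(flatten_args([[1, 2]], [[3]]))      # [[1, 2], [3]]
print(call_handler(max, [1, 5], 3))       # 5
```

    The second and third calls show why wrapping a list inside a
    temporary list protects it: only the outer wrapper is flattened.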

    Why have two optional arg lists?  Because you may wish to use this
    function when implementing a callback feature in a Lingo handler
    that you're designing.

    Just as some of the PRegEx callback-oriented functions do, you
    might use ArgList1 for the arguments YOU are supplying to the
    callback function, if any, and pass through ArgList2 for the
    arguments YOUR CALLER is supplying to the callback function, if
    any.

    This is how all the other PRegEx_ functions that take callbacks
    also behave (they all use CallHandler internally, in fact).  You
    don't have to do it this way, but this is a logical and clean way
    to implement any routine that offers to make calls to a callback
    function.

    Note: You may wish to allow the CallbackFunction to call
    PRegEx_CallbackAbort etc. to set those flags while running.  If you
    do allow this, then it is your responsibility to check those flags
    and then to reset them to zero each time after calling
    PRegEx_CallHandler.  Otherwise, those flags may persist and
    incorrectly affect another routine in your call stack.  If there
    is any chance at all that the callback function will set these,
    then be sure to re-set them to zero after it returns.
    
    PRegEx transparently takes care of saving and restoring settings of
    the callback control flags in stack frames below yours, so you
    never have to worry that setting these flags might inadvertently
    interrupt their use in a lower stack frame, if any.


PRegEx_CallbackAbort([bool]) ==> Stop operation and fail with error
PRegEx_CallbackStop ([bool]) ==> Stop before this iteration, but succeed
PRegEx_CallbackLast ([bool]) ==> Stop after this iteration, but succeed
PRegEx_CallbackSkip ([bool]) ==> Skip this iteration, but continue

    These flags may be set by any callback function that wishes to
    send a signal to its caller.  The caller may either be a built-in
    PRegEx routine OR, a Lingo-authored routine that called the
    function using the PRegEx_CallHandler utility routine.

    These flags should NOT be set by any function that doesn't believe
    it is currently being called as a callback by some PRegEx function.

    As an extended example, consider how these may be called from
    within a ReplFunction to set a flag that tells the ReplaceExec
    function to end its loop after the next time the ReplFunction
    returns.  

    Each one would cause ReplaceExec to terminate slightly
    differently.

    CallbackLast says that the current replacement should be done, but
    then it will be the last one (do not keep searching), terminating
    the replacement successfully (including keeping any replacements
    up to this point).

    CallbackStop says to NOT do the current replacement (ignoring the
    return value of the ReplFunction), and terminate the replacement
    successfully (including keeping any replacements up to this
    point).

    CallbackAbort is the same as CallbackStop, but "aborts", causing
    ReplaceExec to leave the search string untouched, not set any back
    refs, and set FoundCount to zero, much as if the very first search
    had simply not succeeded in the first place.

    Stopping using CallbackLast or CallbackStop could be useful if
    replacement should stop once a certain token is reached in the
    input.

    Aborting could be useful if there is a memory failure or other
    serious failure encountered by the callback function and it needs
    to gracefully abort any further potentially memory-consuming
    activity.
    
    CallbackSkip could be useful if a particular item should be
    ignored/untouched/omitted/left unchanged, but you want your
    calling function to continue with whatever loop it is currently
    processing.
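
    The differences between the four signals are easiest to see in a
    model of a replace loop.  The Python sketch below is an
    assumption-laden illustration, not ReplaceExec itself: the Flags
    object passed to the callback stands in for the
    PRegEx_Callback... functions, the callback signature
    (matched_text, flags) and the name replace_exec are
    hypothetical, and the result is returned as a (new_text,
    found_count) pair.

```python
import re
from dataclasses import dataclass


@dataclass
class Flags:
    abort: bool = False
    stop: bool = False
    last: bool = False
    skip: bool = False


def replace_exec(text, pattern, repl_func):
    pieces, cursor, count = [], 0, 0
    for match in re.finditer(pattern, text):
        flags = Flags()                      # fresh flags for each callback
        replacement = repl_func(match.group(0), flags)
        if flags.abort:
            return text, 0                   # as if nothing had matched
        if flags.stop:
            break                            # drop this replacement, keep earlier
        if not flags.skip:                   # Skip leaves this match unchanged
            pieces.append(text[cursor:match.start()])
            pieces.append(replacement)
            cursor = match.end()
            count += 1
        if flags.last:
            break                            # keep this replacement, then finish
    pieces.append(text[cursor:])
    return "".join(pieces), count


def shout_until_end(word, flags):
    if word == "END":
        flags.stop = True
    return word.upper()


print(replace_exec("one two END three", r"\w+", shout_until_end))
# ('ONE TWO END three', 2)
```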


Error code constants:
---------------------
    Each of these "constant" functions returns the corresponding
    numeric PRegEx error code.  This can be helpful if you want to
    write code that checks for these specific error cases, either with
    functions that return error codes directly or with those that
    merely set PRegEx_LastErrCode.

PRegEx_ErrCode_OutOfMemory()
PRegEx_ErrCode_SearchStrLMustBeList()
PRegEx_ErrCode_SearchStrLMustContainString()
PRegEx_ErrCode_SearchStrLLengthArgMustBeInteger()
PRegEx_ErrCode_REMustNotBeEmpty()
PRegEx_ErrCode_REDidNotCompile()
PRegEx_ErrCode_ReplPatMustBeString()
PRegEx_ErrCode_CallbackFuncMustBeSymbol()
PRegEx_ErrCode_CallbackFuncDidNotReturnString()
PRegEx_ErrCode_QuoteMetaNeedsString()
PRegEx_ErrCode_TriedToMatchWithoutSearchStrL()
PRegEx_ErrCode_TriedToMatchWithoutSearchPattern()
PRegEx_ErrCode_TriedToReplaceWithoutMatching()
PRegEx_ErrCode_CallbackRequestedAbort()
PRegEx_ErrCode_UnexpectedMOAError()
PRegEx_ErrCode_UnexpectedInternalError()
PRegEx_ErrCode_CallbackFunctionNotFound()
PRegEx_ErrCode_ExpectedListArgument()
PRegEx_ErrCode_ExpectedPListArgument()
PRegEx_ErrCode_GrepNeedsFunctionNameOrPRegEx()
PRegEx_ErrCode_ExpectedStringArgument()
PRegEx_ErrCode_SortFunctionDidNotReturnInteger()
PRegEx_ErrCode_FileNotFound()
PRegEx_ErrCode_ErrorOpeningFile()
PRegEx_ErrCode_ErrorReadingFile()
PRegEx_ErrCode_ErrorWritingFile()

    Example:

    put PRegEx_DescribeError(PRegEx_ErrCode_SearchStrLMustBeList())
    -- "PRegEx: SearchStrL argument must be a Lingo list."


==================================================================
Help! What is a Regular Expression?  What's going on here?
==================================================================

[ASIDE TO NEWBIES: If you don't already know what regular expressions
are and are now burning with desire to use them, then you are facing a
pretty steep, but immensely gratifying, learning curve. Hang in there!
It's worth the effort to learn!]

This is a very brief intro.  Don't expect much.  Try Google.

Regular Expression = Search String or Pattern

That's all there is to it.

Longer explanation: A Regular Expression (or RE or regex or regexp) is
a search specification that can contain special syntax (think:
wildcard characters on steroids) that allows you to perform extremely
complex search, search/replace, or extraction operations on text
buffers of any size.

Examples: 

dog                  -- matches just these letters
(dog)|(cat)          -- matches the letters "dog" or "cat"
organi[sz]e          -- matches US or British spelling of "organize"
^\w{1,8}\.\w{1,3}$   -- matches any DOS 8.3-style file name
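
If you want to try patterns like these outside Director, most
PCRE-style engines will do.  The sketch below uses Python's "re"
module purely as a stand-in (it is not part of PRegEx, and the helper
name "matches" is made up for this example).  Note that the dot in the
file-name pattern is escaped here so that it matches only a literal
period.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

print(matches(r"dog", "hotdog stand"))                       # True
print(matches(r"(dog)|(cat)", "my cat"))                     # True
print(matches(r"organi[sz]e", "organise"))                   # True
print(matches(r"^\w{1,8}\.\w{1,3}$", "README.TXT"))          # True
print(matches(r"^\w{1,8}\.\w{1,3}$", "LONGFILENAME.TXT"))    # False
```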

In addition to many dozens of special syntax characters like the ones
hinted at above, some special "escape" sequences, triggered by a
backslash, are also recognized within the RE pattern.  

    \n matches a return char (same as Lingo "return" or char(13))
    \t matches a tab char

(There are many others -- see definition of all "escape codes" earlier
in this file.  See also the documentation for the PCRE project.)

Backreferences, written as \N, such as \1, \2 ... \99, mean "match (or
insert when replacing) the parenthesized expression number N in this
spot".

Backreference example A:

"((Chris)|(Ravi)).*?\1" 

... finds the name "Chris" or "Ravi" in a string, provided it is
followed some distance later by the same name.

Backreference example B:

"(<(\w+)(.*?)>)(.*?)(</\2.*?>)" 

... Matches most pairs of balanced HTML/XML tags, such as: <P>....</P>
or <B>...</B> or <A HREF=foo.html>Home</A>.

In this last example, the backreference substrings would be assigned
(and individually retrievable!) as follows:

Backreference 1: "<A HREF=foo.html>"
Backreference 2: "A"
Backreference 3: " HREF=foo.html"
Backreference 4: "Home"
Backreference 5: "</A>"

Backreferences can be used to extract pieces of data from a string
when searching, and, equally importantly, can be used in a Replacement
pattern when doing a search/replace, so you can insert part or parts
of the matched expression directly into the replaced string.
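
The two backreference examples above can be checked in any
PCRE-style engine.  This sketch uses Python's "re" module as a
stand-in for illustration only (PRegEx itself is called from Lingo,
and the sample strings are made up).  It shows a backreference used
inside the pattern, the five captured substrings, and a backreference
used in a replacement string.

```python
import re

# Example A: \1 must repeat whichever name was matched first.
m = re.search(r"((Chris)|(Ravi)).*?\1", "Ravi called Chris, then Ravi left")
print(m.group(0))    # Ravi called Chris, then Ravi

# Example B: each parenthesized group is retrievable on its own.
tag = re.search(r"(<(\w+)(.*?)>)(.*?)(</\2.*?>)",
                "Go <A HREF=foo.html>Home</A> now")
print(tag.groups())
# ('<A HREF=foo.html>', 'A', ' HREF=foo.html', 'Home', '</A>')

# A backreference in the replacement inserts the matched text.
wrapped = re.sub(r"(abc+)", r"### \1 ###", "xxabccyy")
print(wrapped)       # xx### abcc ###yy
```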

HOW TO LEARN REGULAR EXPRESSION SYNTAX: 

1) There are whole BOOKS written about regular expression syntax and
   its subtleties.  We are not going to try to teach you anything more
   about them in this document.  Buy one of those books now, if you
   are interested.  http://amazon.com/.

2) Another good way to get started: ask a friend for help and
   pointers.  (Preferably you'll be asking someone other than Chris or
   Ravi :-)).

3) The PCRE documentation, included with this Xtra and on the Web,
   gives a thorough, possibly overly-technical, overview of the
   precise features of the regular expression language supported by
   it, and consequently supported by PRegEx.  (To get the most out of
   it: ignore all the deeper technical stuff; just read about the
   syntax.)  http://pcre.org/man.html   

4) Also, if you have access to perl, be sure to read the "perlre"
   manual page that comes with every perl distribution.  99% of the
   syntax documented there applies here.

5) Practice, practice, practice.  Have a copy of Director open while
   learning.  Try every example in the message window.  Try to make a
   test case for every different feature or behavior you learn about,
   and test it right then and there.  Read and understand the test
   cases in the test movie that accompanies the Xtra.


TWO NOTES FOR PERL USERS ONLY

Note 1: Surrounding the RE with forward slashes is NOT NECESSARY.  In
Perl, the slashes are string delimiters, much like quote marks, and
are not part of the search pattern itself.

Note 2: $-sign and @-sign interpolation are not normally performed by
any of the functions that process the other backslashed escape codes,
as those are features of Perl's built-in string interpolation, not
features of regular expressions per se.  If you need to build up a
replacement pattern string out of pieces, just use normal Lingo & and
&& or other means of concatenation, such as PRegEx_Join.  Or read above
about PRegEx_Interpolate, which does all the usual interpolation and
can optionally look up values from a property list and interpolate
them into a string, similar to Perl's $-sign interpolation feature.

Note that if you plan to search using a RE that has had user-supplied
data interpolated into it, you almost certainly need to call QuoteMeta
either on the user-supplied parts before they are interpolated, or on
the interpolated whole, depending on what you can assume about the
data.
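
To see why quoting matters, here is a small sketch in Python, used
only as a stand-in for illustration.  Python's re.escape is assumed
here to play a role comparable to QuoteMeta; the sample strings are
made up.  Without quoting, characters such as "+", "(" and "?" in the
user's text are read as pattern syntax, so the search quietly looks
for something else.

```python
import re

user_input = "1+1=2 (really?)"
text = "Is 1+1=2 (really?) true"

# Unquoted: "1+" means "one or more 1s", "(...)" is a group, "y?" is optional.
unsafe = re.search(user_input, text)

# Quoted: every special character is matched literally.
safe = re.search(re.escape(user_input), text)

print(unsafe)          # None
print(safe.group(0))   # 1+1=2 (really?)
```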
    

=========================================
Additional Examples
=========================================

Searching and/or Extracting
---------------------------

==> Search for a string

set FoundCount = max(PRegEx_Search(foo, "(abc+)", ""), 0)

==> Search a string and then extract backrefs by number

if (PRegEx_Search(foo, "(abc+)([,;])", "") > 0) then 
  set ABC =   PRegEx_GetMatchString(1)
  set Punct = PRegEx_GetMatchString(2)
end if
set FoundCount    = PRegEx_FoundCount()

==> Search a string, extracting matching subexpressions into a list or
sorted property list

set NRs = PRegEx_ExtractIntoList   (foo, "Name: (.*?) Rank: (.*?)", "")
set NRs = PRegEx_ExtractIntoSPList (foo, "Name: (.*?) Rank: (.*?)", "")
set FoundCount    = PRegEx_FoundCount()

==> Same, but "globally" -- repeating the search till the end of the
string, extracting _all_ backreferences along the way into a Lingo
list or sorted property list

set NRs = PRegEx_ExtractIntoList   (foo, "Name: (.*?) Rank: (.*?)", "g")
set NRs = PRegEx_ExtractIntoSPList (foo, "Name: (.*?) Rank: (.*?)", "g")
set FoundCount    = PRegEx_FoundCount()


Searching and Replacing
-----------------------

==> Search and replace with a simple string

set FoundCount = max(PRegEx_Replace(foo, "(abc+)", "i", "ABC"), 0)

==> Search and replace with a string with escape codes for back references

set FoundCount = max(PRegEx_Replace(foo, "(abc+)", "i", "### \1 ###"), 0)

==> "Global" flag -- i.e. replace one vs. replace all.

set FoundCount = max(PRegEx_Replace(foo, "(abc+)", "ig", "ABC"), 0)

==> Replace functions also extract backrefs, like search functions.
So you can retrieve an item at the same time you delete or modify it:

if (PRegEx_Replace(foo, "(abc+)", "", "") > 0) then 
  set ABC = PRegEx_GetMatchString(1)
end if
set ItemsReplaced  = PRegEx_FoundCount()

==> Search and replace, but a function gets called to perform each
replacement

on NameCnv nameLookup
  return("Name:" && nameLookup[PRegEx_GetMatchString(1)])
end NameCnv

PRegEx_ReplaceExec(foo, "Name: (\S+)", "ig", #NameCnv, [nameLookup])
set ChangeCount = PRegEx_FoundCount()