=================================================
PRegEx Version 1.0 Specification and Documentation - This document last updated 2001-06-04
=================================================

What is PRegEx?

PRegEx is a free, cross-platform scripting Xtra for Director 7+.

It does searching, replacing, data extraction, and more.

It provides the "search" features of PCRE (the Perl-Compatible Regular
Expression library from http://pcre.org/), while adding its own "replace"
capabilities.

It also supplies Lingo versions of some powerful features of Perl
pertaining to manipulating string data, lists, and property lists, and
converting between and among all those different formats.

You don't have to know anything about Perl to use it. You don't have to
know about regular expressions to use it. But you can do some pretty cool
stuff if you do know about them.

Who should use it?

If you have ever needed to use Lingo to:

- do any kind of text searching
- modify strings
- parse anything
- extract data from a file
- standardize data formats
- clean/canonicalize/validate user-provided data fields
- manipulate lists and property lists
- copy or deep-copy lists / property lists
- reverse lists
- convert a list of one kind of thing into another kind of thing
- use custom sort functions to sort lists
- sort lists without modifying the original
- filter lists
- deal with binary data buffers in Lingo
- do any of the above with very large string buffers
- call a handler, passing arguments, and get its return value
- have a way for a callback function to signal its caller
- quickly read/write entire files into/from memory
- globally map characters in buffers
- etc.

... then PRegEx is for you.

If you have not used regular expressions before, then as you learn them,
you will hardly believe how powerful they are. They are like a whole new
programming language unto themselves. Enjoy.

Help! What is a Regular Expression? What's going on here?

Please see the Introduction and Examples sections near the end of this doc.

What does it cost?

Nothing. PRegEx is a free, open-source project. See "PRegEx Licensing",
below, for full details.

Where do I get the latest version?

PRegEx is released on the Web site http://openxtras.org/. Latest updates,
notes, or issues will be posted there, too.

Who made it?

PRegEx authors are:

    Chris Thorman
    Ravi Singh

Chris needed it for a project his company was doing. Chris designed it and
wrote this spec, and his company hired RavWare to do most of the
programming work. Ravi Singh at RavWare also did some of the work for free
as a contribution to the Director community.

About RavWare: RavWare develops components and software that allow
developers to easily extend their applications with powerful features.
They work to make each product consistent and efficient while providing
optimal performance. RavWare creates custom ActiveX controls, Director
Xtras, and applications. Please contact RavWare Sales or visit
http://ravware.com/ for further information.

Philip Hazel (see below) wrote PCRE, upon which PRegEx heavily relies, but
he was not directly involved in PRegEx itself.

What other libraries is it based on?

PCRE, the regular expression library that PRegEx uses, is included with
this distribution. It was written by:

    Philip Hazel
    University of Cambridge Computing Service,
    Cambridge, England. Phone: +44 1223 334714.

    Copyright (c) 1997-2000 University of Cambridge

We all owe Philip a pint or two sometime.

Please see file "pregex-1.X/pcre-3.4/Copying" for more info. Or, visit the
PCRE home page at: http://pcre.org/.

Of course, PRegEx also uses MOA, the Macromedia Open Architecture, and is
built using the Director 7/8 XDK from Macromedia at
http://www.macromedia.com/.

Who supports it?

Nobody supports PRegEx for free. It's free to begin with. However...

Can I pay for support or additional features?

If you need support for PRegEx for a project-critical need, we recommend
that you hire someone to support that need. Because the source is OPEN,
you are completely free to approach and make an offer to anyone you like,
and they are free to add your custom features or create any other
derivative work you may require, subject only to the liberal licensing
restrictions outlined in this document.

You may especially wish to approach RavWare, one of the companies that
helped write PRegEx. RavWare is in the business of creating Xtras for
others. (See complete description up above.) http://ravware.com/

Please do not be offended if the PRegEx authors or others that you
approach are unable to assist you. We apologize in advance if a lack of
free or inexpensive or even available support means you are unable to use
PRegEx for your project.

On the other hand, we believe PRegEx is quite robust in its current
feature set and anticipate you will have few problems making use of it.

Can I see some examples?

1) Some function descriptions include examples.

2) See "Examples" section at end.

3) See PRegExTestMovie.dir, which you should have received with this
   package. It has a full test suite which can be used to torture-test
   every feature of the Xtra, including heavy leak testing. There are
   literally hundreds of usage examples there. It also has a few fun
   little features that let you import the spec file you are reading now
   and manipulate it.

How well tested is it?

We feel that PRegExTestMovie.dir extensively tests all PRegEx features by
calling them literally millions of times in 30 seconds or so, and thereby
demonstrates that PRegEx is free of any leaks and that it performs with
jaw-dropping speed.

Please try to prove us wrong. We'd be grateful for bug reports.

Where do I send bug reports?

Please send reports of confirmed or suspected bugs to:

    PRegEx Bugs

Do not send the source code for your project.
Send the simplest possible 2-5-line example or set of steps, or a simple
test movie that demonstrates the problem (without anything else in it).
Or, best yet, send a modified copy of PRegExTestMovie.dir with a new test
added that demonstrates the problem.

Be sure to state clearly in your report what you expected to happen, what
did happen instead, and why you believe it's an error in the software.

Bug reports that include a Lingo example that conclusively demonstrates
the problem will get attention more quickly.

Please be aware that we will be grateful for the reports, but may or may
not have the time to reply.

===============================================
PRegEx Licensing
===============================================

How did PRegEx get here?

PRegEx is an "open-source" project.

What do I get for free?

You are free to use the accompanying version of the PRegEx Xtra in any way
you see fit: in any project, for any purpose, at any time, now, or in the
future, or in the past, free of charge.

Can I change the PRegEx source code?

You may create derivative versions of the Xtra, or re-use any source code
you find in it, but if you do so for pay or profit, you must provide the
recipient with both the original, full PRegEx package, including source
code, and any modifications you have made, including source code.

It would also be polite, but is not required, to contribute the derived
version back to the copyright holder via the contact information that you
will find at http://openxtras.org/.

Is PRegEx supported or guaranteed to work?

No! PRegEx is provided without support or warranty of any kind. In
particular, nobody guarantees that this code is fit for any purpose, or
that it will not cause you and your customers great physical harm when you
use it. In fact, assume it will cause harm until you have tested it to
your own satisfaction. You accept all risks associated with using this
software, should you choose to do so.

Can I contribute?

The best way you can contribute is to give YOUR TIME to test, review, use,
verify, and debug this code, to make it better, stronger, faster, and more
powerful for others.

Can I contribute financially?

If you find that this Xtra was insanely useful, which you will, and then
you also feel motivated to contribute $$ to help offset its considerable
development costs and express gratitude for the hours and weeks of time it
has saved you, or the impossible projects it made possible, please log on
to http://openxtras.org/ and select one of the contribution options shown
there. Contributions will be used to help maintain the OpenXtras web site
and anything left over will be used to feed and clothe the authors'
families.

What about Shockwave?

PRegEx is not currently Shockwave-safe, and the authors do not intend to
do any work or spend any $$ to make it so. However, you have the full
source here. You're free to accept the challenge -- and the legal
responsibility -- for making a Shockwave-safe version for whatever use you
desire. Just be sure you follow the guidelines laid out in this document
if you distribute modified versions of PRegEx to anyone.

What about future versions?

This liberal licensing policy may or may not apply to future versions of
PRegEx created by If.Net, Inc., the copyright holder. However, this
liberal licensing policy will always apply to this and earlier versions
and to any derivative works based on it/them.

-------------------------------------------------------------------------
Regular Expression Xtra Licensing Statement Version 1.0
-------------------------------------------------------------------------

This is a Scripting Xtra for Macromedia Director which lets you use
regular expressions as implemented by PCRE http://pcre.org/, plus a whole
lot more.

Written by:

    Chris Thorman
    Ravi Singh

Copyright (c) 2001 If.Net, Inc.
-----------------------------------------------------------------------------

Permission is granted to anyone to use this software for any purpose on
any computer system, and to redistribute it freely, subject to the
following restrictions:

1. This software is distributed in the hope that it will be useful, but
   WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

2. The origin of this software must not be misrepresented, either by
   explicit claim or by omission.

3. Altered versions must be plainly marked as such, and must not be
   misrepresented as being the original software.

4. If PRegEx is embedded in any software that is released under the GNU
   General Public License (GPL), then the terms of that license shall
   supersede any condition above with which it is incompatible.

(Thanks to Philip Hazel, creator of PCRE, for the above licensing
statement.)

-----------------------------------------------------------------------------

=========================================
PRegEx Quick-Reference / Interface Summary
=========================================

A complete detailed description of all functions follows later in this
document. This is just a summary for quick reference.

Housekeeping functions:
-----------------------

PRegEx_Clear ([Complete])     ==> void; partial or complete reset
PRegEx_GetPRegExVersion ()    ==> Version string of PRegEx (e.g. "1.0")
PRegEx_GetPCREVersion ()      ==> Version string of PCRE (e.g. "3.4")

Search/Replace low-level interface:
-----------------------------------

PRegEx_SetSearchString (SrchStrL)     ==> True or -Err
PRegEx_SetMatchPattern (RE, [Opts])   ==> True or -Err
PRegEx_GetNextMatch ([noBlastBR])     ==> True or -Err
PRegEx_ReplaceString (ReplPat)        ==> True or -Err

Search/Replace high-level interface:
------------------------------------

PRegEx_Search (SrchStrL, RE, [Opts])          ==> FoundCount or -Err
PRegEx_SearchExec (SrchStrL, RE, Opts, #Callback, [ArgList])
PRegEx_SearchBegin (SrchStrL, RE, [Opts])     ==> 1 (success) or -Err
PRegEx_SearchContinue()                       ==> 1: Found; 0: Done; Negative: -Err
PRegEx_Replace (SrchStrL, RE, Opts, ReplPat)  ==> FoundCount
PRegEx_ReplaceExec (SrchStrL, RE, Opts, #ReplFunction, [ArgList])

Search/Extract utilities:
-------------------------

PRegEx_Split (SrchStrL, RE, [Opts, InitList, Max])            => List
PRegEx_ExtractIntoList (SrchStrL, RE, [Opts, InitList])       => PList
PRegEx_ExtractIntoSPList (SrchStrL, RE, [Opts, InitList])     => PList
PRegEx_ExtractIntoSPListSym(SrchStrL, RE, [Opts, InitList])   => PList

Match Status functions:
-----------------------

PRegEx_FoundCount ()           ==> Running or final count of match events
PRegEx_GetPos ()               ==> Char pos where last left off; next begins
PRegEx_SetPos (num)            ==> Change pos (0 <= Pos <= buffer len)
PRegEx_GetMatchBRCount()       ==> Number of back refs in last matched RE
PRegEx_GetMatchString ([num])  ==> Last matched str (entire -or- BR #)
PRegEx_GetMatchStart ([num])   ==> Start pos of last matched str (entire -or- BR #)
PRegEx_GetMatchLen ([num])     ==> Length of last matched str (entire -or- BR #)

Error-handling functions:
-------------------------

PRegEx_LastErrCode ()          ==> Error code for last failed call
PRegEx_DescribeError ([Err])   ==> Error msg (Err or LastErrCode)
PRegEx_CompiledOK ()           ==> True if last expression compiled
PRegEx_MemError ()             ==> True if last op failed due to memory
PRegEx_MemErrorSticky ()       ==> True if any op has failed due to mem
PRegEx_MemErrorStickyReset ()  ==> Reset sticky err; return prev value

Preference flags:
-----------------

PRegEx_ErrorsToMessageWindow ([Bool]) ==> Echo all errors to Message window

String-manipulation utility functions:
--------------------------------------

PRegEx_QuoteMeta (String)   ==> String with RE-special chars quoted
PRegEx_Translate (SrchStrL, InputTable, OutputTable) ==> ChangeCount
PRegEx_Interpolate(String, [VarsPList]) ==> String

List-manipulation utility functions:
------------------------------------

PRegEx_CopyList(ListOrPList, [Deep, InitList]) ==> CopiedListOrPList
PRegEx_Grep (List, RE, [Opts])                 ==> NewList ("PRegEx mode")
PRegEx_Grep (List, #Filter, [ArgList])         ==> NewList ("Filter mode")
PRegEx_Map (List, #MapFunction, [ArgList])     ==> MappedList
PRegEx_Sort (List, DeepCopy, #SortFunction, [ArgList]) ==> NewList
PRegEx_Reverse (List, [DeepCopy])              ==> Reversed copy
PRegEx_Join (List, [DelimiterString])          ==> String
PRegEx_Keys (PList, [InitList])                ==> KeyList
PRegEx_Values (PList, [InitList])              ==> ValueList
PRegEx_GetSlice(List, Keys, [InitList])        ==> SliceList
PRegEx_SetSlice(List, Keys, Values)            ==> List
PRegEx_PListToList (PList, [InitList])         ==> List
PRegEx_PListToListStrings(PList, [InitList])   ==> List
PRegEx_ListToSPList (List, [InitPList])        ==> SPList
PRegEx_ListToSPListSym (List, [InitPList])     ==> SPList

General utility functions:
--------------------------

PRegEx_ReadEntireFile (FilePath) ==> StringBufferList
PRegEx_WriteEntireFile (FilePath, StringBufferList, Append) ==> 1/-Err

Callback-related functions:
---------------------------

PRegEx_CallHandler (#CallbackFunction, [ArgList1, ArgList2])
PRegEx_CallbackAbort([bool]) ==> Stop operation and fail with error
PRegEx_CallbackStop ([bool]) ==> Stop before this iteration, but succeed
PRegEx_CallbackLast ([bool]) ==> Stop after this iteration, but succeed
PRegEx_CallbackSkip ([bool]) ==> Skip this iteration, but continue

Error code constants:
---------------------

PRegEx_ErrCode_OutOfMemory()
PRegEx_ErrCode_SearchStrLMustBeList()
PRegEx_ErrCode_SearchStrLMustContainString()
PRegEx_ErrCode_SearchStrLLengthArgMustBeInteger()
PRegEx_ErrCode_REMustNotBeEmpty()
PRegEx_ErrCode_REDidNotCompile()
PRegEx_ErrCode_ReplPatMustBeString()
PRegEx_ErrCode_CallbackFuncMustBeSymbol()
PRegEx_ErrCode_CallbackFuncDidNotReturnString()
PRegEx_ErrCode_QuoteMetaNeedsString()
PRegEx_ErrCode_TriedToMatchWithoutSearchStrL()
PRegEx_ErrCode_TriedToMatchWithoutSearchPattern()
PRegEx_ErrCode_TriedToReplaceWithoutMatching()
PRegEx_ErrCode_CallbackRequestedAbort()
PRegEx_ErrCode_UnexpectedMOAError()
PRegEx_ErrCode_UnexpectedInternalError()
PRegEx_ErrCode_CallbackFunctionNotFound()
PRegEx_ErrCode_ExpectedListArgument()
PRegEx_ErrCode_ExpectedPListArgument()
PRegEx_ErrCode_GrepNeedsFunctionNameOrPRegEx()
PRegEx_ErrCode_ExpectedStringArgument()
PRegEx_ErrCode_SortFunctionDidNotReturnInteger()
PRegEx_ErrCode_ListIndicesMustBeIntegers()
PRegEx_ErrCode_FileNotFound()
PRegEx_ErrCode_ErrorOpeningFile()
PRegEx_ErrCode_ErrorReadingFile()
PRegEx_ErrCode_ErrorWritingFile()

Perl-ish shorter function names:
-------------------------------

These Perl-friendlier "aliases" to certain of the PRegEx functions have
been provided. Their syntax is more evocative for Perl programmers, and
the shorter names might be preferred by the typing-impaired.

re_m         ==> PRegEx_Search (aka "match")
re_s         ==> PRegEx_Replace (aka "substitute")

re_search    ==> PRegEx_Search
re_replace   ==> PRegEx_Replace

re_get       ==> PRegEx_GetMatchString
re_pos       ==> PRegEx_GetPos

re_extract   ==> PRegEx_ExtractIntoList
re_extractp  ==> PRegEx_ExtractIntoSPList
re_extractps ==> PRegEx_ExtractIntoSPListSym

re_call      ==> PRegEx_CallHandler
re_abort     ==> PRegEx_CallbackAbort
re_stop      ==> PRegEx_CallbackStop
re_last      ==> PRegEx_CallbackLast
re_skip      ==> PRegEx_CallbackSkip

re_quotemeta ==> PRegEx_QuoteMeta
re_tr        ==> PRegEx_Translate
re_i         ==> PRegEx_Interpolate

re_split     ==> PRegEx_Split
re_join      ==> PRegEx_Join
re_grep      ==> PRegEx_Grep
re_map       ==> PRegEx_Map
re_sort      ==> PRegEx_Sort
re_reverse   ==> PRegEx_Reverse
re_copy      ==> PRegEx_CopyList
re_keys      ==> PRegEx_Keys
re_values    ==> PRegEx_Values
re_slice     ==> PRegEx_GetSlice
re_slice_set ==> PRegEx_SetSlice
re_list      ==> PRegEx_PListToList
re_list_strs ==> PRegEx_PListToListStrings
re_hash      ==> PRegEx_ListToSPList
re_hash_syms ==> PRegEx_ListToSPListSym

re_read      ==> PRegEx_ReadEntireFile
re_write     ==> PRegEx_WriteEntireFile

re_err       ==> PRegEx_LastErrCode
re_debug     ==> PRegEx_ErrorsToMessageWindow

=========================================
PRegEx Return Values: General Principles
=========================================

Unless otherwise noted, search/replace related functions return an integer
saying how many matches were successfully made, even if fewer replacements
were completed due to some being skipped by the program.

For functions returning match counts, a return value of 0 means successful
operation, but means that 0 matches were found (and of course 0
replacements were done).

Any NEGATIVE INTEGER returned by any function is an ERROR CODE, which may
be interpreted using the Error-related features of PRegEx, described
later.

Some functions return a 1 meaning successful completion or a negative
error code if an error occurred.

Consequently, you should never treat the return of PRegEx functions as
Booleans when checking whether a match was done, because Lingo considers
all non-zero numbers, even negative numbers, to be "true". Instead, you
should check integer results for being > 0 or > -1, depending on your
interest.

    Wrong:  if (PRegEx_Search(str, "foo", "g")    ) then put "Found!"
    Right:  if (PRegEx_Search(str, "foo", "g") > 0) then put "Found!"

Most functions that do not ordinarily return integers will return void, an
empty string, or an empty list when an error is encountered, and their
error code is then set in the LastErrCode flag, which may subsequently be
queried.

Remember, a failure to match is never an "Error" from PRegEx's point of
view. An "Error" always means a parameter error, syntax error, or runtime
error, such as memory or disk problems. A failure to match is viewed as
the successful completion of a match request whose answer happened to be
"zero matches".

=========================================
PRegEx Parameter: General Descriptions
=========================================

In all function prototypes shown above and below, sample argument names
are used consistently to represent arguments of a particular type or
meeting certain criteria. For example, "RE" always means a Regular
Expression string, "Opts" always means a 0-7-character string of option
flags, etc.

This section is a glossary explaining each of these standard argument
types. Unless otherwise noted, the descriptions here apply to all
functions in which these named parameters appear.

RE -- Regular Expression pattern

    Example: "(dog)|(cat)"

    This is a simple Lingo string containing literal characters and/or
    special character sequences that specify what is to be searched for.
    See above section, and PCRE and/or Perl documentation for precise
    details of the RE syntax.

Opts -- Options string

    Example: "gisx"

    A string of 0-7 option flag chars in any order.
    Any other type of argument is treated like an empty string ("") and
    results in all options being turned off. Any other characters in Opts
    are silently ignored.

    The 7 option flags are:

    Pattern matching flags:

    i == case Insensitive matching
         Corresponds to PCRE option PCRE_CASELESS
    s == "Single line" mode (. and \s match newline)
         Corresponds to PCRE option PCRE_DOTALL
    m == "Multi line" mode (^ and $ match internal line start/end)
         Corresponds to PCRE option PCRE_MULTILINE
    x == eXtended mode
         Ignores whitespace in patterns; allows comments.
         Corresponds to PCRE option PCRE_EXTENDED

    Behavior control flags:

    t == sTudy; optimize the PRegEx by "Studying" it first.
    g == Global; re-do Srch or Srch/Repl till no more match
    e == Exec; call a callback function on each iteration
         (see also SearchExec, ReplaceExec, SearchBegin.)

SrchStrL -- String to be searched ("String Buffer List")

    Examples:

    ["my data my data my data"]          -- string only
    ["my data my data my data", 23]      -- with optional length
    ["my data my data my data", 23, 0]   -- 0 means no NUL chars

    You must pass search string buffers to PRegEx in a special, arguably
    unusual, way. Instead of passing a string as you normally would when
    calling a Lingo command, you pass a LIST CONTAINING A STRING upon
    which searching/replacing commands can operate.

    SrchStrL is a regular Lingo list. The operation occurs on the FIRST
    ELEMENT of the list. If SrchStrL is not a list, it's a param error. A
    non-string first element or an empty list is considered a parameter
    error.

    The second, optional, element of the list is a length value. If
    supplied, it is taken to be the intended length (even if not the
    actual length) of the first element. Of course, this value should be
    no greater than length(SrchStrL[1]) and no less than zero.

    String buffer to be searched may contain any amount of binary data,
    including ascii zero (NUL), which does NOT signify end-of-string.
    (However, you should be aware of bugs in Director's Message and Debug
    windows which incorrectly display string buffers that have NULs in
    them as if the buffers were truncated at that position. Don't worry:
    the data is still in the buffer even if it is printed out wrong.)

    Supplying the length element overrides the Xtra's perceived length of
    the buffer. This allows the search or other operation to take place on
    a reduced subset of the string. (Warning: doing a replace on this
    string will truncate it at the specified point. Writing a file from
    this string will also truncate the resulting file.)

    The third, optional, element is a boolean integer (0 or 1) which says
    whether the string buffer in element 1 is known to contain NUL
    characters (this is set for you by ReadEntireFile, for your
    convenience, because you may want to use the data with
    non-NUL-friendly Xtras and it will be helpful to know if it has
    "binary" data that could trip them up). Its value is never observed by
    PRegEx and so does not alter the behavior of any PRegEx functions --
    all PRegEx functions are NUL-safe. They never assume that your data
    does not contain NULs.

    Other elements of the list, if any, are left untouched by any
    functions that modify your SrchStrL.

    WHY THE LIST/STRING APPROACH?

    Lingo passes lists by reference, so storing the string in a list is
    how we get pass-by-reference behavior and minimize copying of the
    string. It also allows you to hold the string in a single, named Lingo
    variable, while calling multiple Search and/or Replace commands that
    will modify the string buffer in place for you without replacing or
    renaming your variable.

    This also allows you to pass your string buffer around from one Lingo
    function to another and to PRegEx functions without copies of the
    string data getting made each time you make a function call.

    For example:

    set File = PRegEx_ReadEntireFile("@:SettingsFile.txt")    -- is a SrchStrL
    PRegEx_Replace(File, "(\x0D\x0A)|[\x0D\x0A]", "g", "\n")  -- line ends
    PRegEx_Replace(File, "\n+", "g", "\n")   -- remove blank lines
    PRegEx_Replace(File, "\t+", "g", "\t")   -- multiple tabs --> single tab
    set SettingProps = PRegEx_ExtractIntoSPList(File, "(.*?)[\t\n]", "g")

ReplPat -- Replacement string or pattern

    Example: "Date: \1 Time: \3 Place: \2\n"

    This is the replacement string for any PRegEx functions that do
    replacing. It can be a simple string, OR it may also contain special
    escape sequences to specify backreferences \1, \2, etc. or other
    special characters.

    Special escape sequences recognized within ReplPat string:

    \n     == inserts a return char (same as RETURN constant in Lingo)
    \t     == inserts a tab char
    \xHH   == inserts character HH (hex)
    \0     == inserts the ascii zero (NUL) character
    \#[##] (\ + 1 or 2 digits) == inserts backreference # or ##
    \###   (\ + 3 digits)      == character ### in octal
    \0###  (\ + 0 + 3 digits)  == character ### in octal
    \\     == inserts a backslash character itself
    \{any other char} ==> just the character. (e.g. \a = char "a")

    The process of interpreting these escape sequences and converting them
    into the actual data is called "interpolation". It is done
    automatically on replacement strings, and may also be done explicitly
    by calling the PRegEx function PRegEx_Interpolate().

    Don't get confused: these sequences are not generally recognized by
    Lingo; they are only interpreted within PRegEx search patterns (REs)
    and replacement patterns (ReplPats), and by PRegEx_Interpolate().

InitList

    For most PRegEx functions whose purpose is to create a list, an
    optional InitList parameter may be specified. If specified, then the
    function will begin with that list and modify it, rather than creating
    a new list for you. Otherwise, all list-generating functions
    automatically begin with a new, empty, list.
    This allows you to progressively build up a list through several
    invocations of PRegEx_ routines, or to use any PRegEx_ routines to
    append items to an existing list.

ArgList

    Any function that takes a Callback function also takes an optional
    ArgList argument (which defaults to [], the empty list). The values
    inside the ArgList will be passed to the callback function, AFTER any
    other task-centric values that must be passed.

    So, for example, a #FilterFunction that must take a single argument
    and return a boolean saying whether that argument should be "in" or
    "out", gets passed the item to be filtered as its first argument, PLUS
    additional arguments, if any, taken from the supplied ArgList.

    Additional arguments could include data to be compared against, or
    perhaps other lists or property lists or instance objects that can be
    used to access a database or other external resources, or to serve as
    persistent state between multiple calls to the callback function.

    Using ArgList is a good practice because it lets you call callback
    functions by name without relying on global variables to communicate
    with those functions -- pass any parameters the function needs in
    order to operate in ArgList rather than using globals.

#ReplFunction -- Callback function for replacement

    The SYMBOL name of a Lingo handler to be called during one of the
    _Replace* commands. The function is called EVERY time the command
    makes a successful match (0 or 1 time if "global" option is off; 0 or
    more times if "global" is on).

    The return value, which MUST be a string, is inserted as the
    replacement text.

    The replacement command itself does not pass any arguments to the
    function, but you may specify an optional ArgList parameter, whose
    elements, if any, will be passed, each time, as arguments to
    #ReplFunction.

    #ReplFunction may request backrefs or the entire match string by
    calling PRegEx_GetMatchString(N), and may discover which of multiple
    iterations it is on by calling PRegEx_FoundCount().

    Note that there is no way for the #ReplFunction to know whether it is
    being called for the last time during a global replace (there is no
    final "cleanup" call).

    As with all callback functions in PRegEx, #ReplFunction may signal to
    the function that is calling it that the function should abort, stop,
    skip, or "last" -- see PRegEx_CallbackAbort, etc.

    Examples of typical uses:

    - selectively replace based on calculated criteria
    - terminate a replacement early based on calculated criteria
    - look up or translate symbols from a property list or database at
      runtime and insert them into the correct locations in a buffer.
    - extract some data before/while it is being replaced

#Callback -- General-purpose callback function

    This is the symbol name of a Lingo handler in the Movie scope that
    will be called, generally with arguments optionally supplied by the
    calling routine. It may do anything it wishes, but should avoid
    actions that would stop playback or otherwise terminate the caller's
    context.

=========================================
PRegEx: Detailed Function Descriptions
=========================================

Note: common parameters are described in detail in the section above. That
information is not generally reiterated in the descriptions below.

Housekeeping functions:
----------------------

PRegEx_Clear ([Complete]) ==> void; partial or complete reset

    Clears internal state, search strings, back references, buffers, error
    codes, etc, except for MemErrorSticky.

    "Complete" option also clears call stack, if any, callback flags, and
    other info. DO NOT USE "Complete" option except when first starting
    up.

    Clear is automatically called by all high-level search/replace
    functions.

PRegEx_GetPRegExVersion () ==> Version string of PRegEx (e.g. "1.0")
PRegEx_GetPCREVersion () ==> Version string of PCRE (e.g. "3.4")

    As described.

Search/Replace low-level interface:
-----------------------------------

Note: For best results, avoid using these "low-level" routines directly.
They are really intended only for someone who needs to directly control
the individual steps of setting up a search and/or replace, or who, for
efficiency reasons, would like to keep a single SrchStrL variable and
repeatedly apply multiple REs to it.

The low-level routines ignore the "global" option. They assume the caller
wants to control multiple matches.

PRegEx_SetSearchString (SrchStrL) ==> True or -Err

    Sets a new string to be operated on. Resets all counters and buffers
    and flags, except the match pattern. Resets Pos to zero.

PRegEx_SetMatchPattern (RE, [Opts]) ==> True or -Err

    Initializes engine and then compiles new RE. Sets Opts for subsequent
    operations. Resets all counters and buffers and flags, except the
    search string. Resets Pos to zero.

PRegEx_GetNextMatch ([noBlastBR]) ==> True or -Err

    Performs one single search event in the current string, using the
    current pattern and options, beginning at the current Pos, either the
    Pos left from the immediate previous search (of any kind), or from a
    Pos you determine by first using SetPos().

    When GetNextMatch succeeds, any previous global back-reference data is
    replaced by the new back-reference data (see "Match Status Functions"
    below). When it fails, all back-reference buffers are cleared out and
    MatchStatus functions will all return zero/empty/void.

    The optional noBlastBR argument tells GetNextMatch to not blow away
    the back-reference buffers when it FAILS, but instead, to keep the
    information there from the previous successful match.

    Important special case: If Entire Match is zero-length (i.e. a match
    succeeded but matched string had no length), then Pos will be
    increased before the next iteration; this guarantees that a global
    match will terminate by stepping through the string
    character-by-character rather than spinning endlessly at the starting
    position. This behavior applies to all matching functions in PRegEx.
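
    The zero-length special case can be illustrated outside Director. The
    following is a minimal Python sketch, not PRegEx code and not the
    Xtra's implementation: it uses Python's standard `re` module rather
    than PCRE, and the helper name `count_matches` is hypothetical. It
    only shows why Pos has to be bumped after an empty match for a global
    loop to finish.

```python
import re

def count_matches(pattern, text):
    """Count matches the way a 'global' loop might, stepping past empty matches."""
    regex = re.compile(pattern)
    pos = 0      # plays the role of Pos: where the next search begins
    found = 0    # plays the role of FoundCount
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break                  # no further match: the loop ends normally
        found += 1
        if m.end() == m.start():
            pos = m.end() + 1      # empty match: advance one char so we do not spin
        else:
            pos = m.end()          # normal case: resume where the match left off
    return found

# "x*" matches the empty string at every position of "abc", including the end.
empty_hits = count_matches("x*", "abc")
```

    Without the `pos = m.end() + 1` branch, the loop above would keep
    finding the same empty match at position 0 forever.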

PRegEx_ReplaceString (ReplPat) ==> True or -Err

    ONLY AFTER a successful match, replaces the entire matched segment
    with ReplPat, after "interpolations" have been performed (i.e.
    inserting back references or other special escape sequences into a
    copy of ReplPat before then inserting the resulting string into the
    search buffer).

    Note that all Replace functions in PRegEx MODIFY the original buffer.
    They never return a copy.

Search/Replace high-level interface:
------------------------------------

You should almost always choose to use these "high-level" functions and
avoid the "low-level" interface whenever possible. Only the high-level
functions are aware of the "g" (global) flag.

These "high-level" search/replace functions, and any other functions that
use SrchStrL, RE, or Opts arguments, always internally call the low-level
functions listed above, or their equivalents, as needed to perform their
documented tasks.

Their function is abstractly described here partially in terms of the
low-level functions above; and these routines have the same effect as if
they were implemented by actually calling the low-level routines.

However, in actual fact, they may or may not be implemented exactly that
way; for example, doing a global replace is implemented more efficiently
by doing all the searching in one shot and then all of the replacing,
rather than by repeatedly calling GetNextMatch and ReplaceString.

Consequently, do not rely on any particular assumptions about the contents
of a string buffer DURING the course of operation of a single high-level
Replace (say, for example, inside a callback function being called in the
middle of a global Replace).

PRegEx_Search (SrchStrL, RE, [Opts]) ==> FoundCount or -Err

    Sets up and does a search, comparing SrchStrL to RE. If Global, the
    search is repeated continuously until it cannot match anymore.

    Afterwards, the Match Status functions only return information
    pertaining to the LAST successful search done.
If there were zero matches, then the Match Status information will all be empty/void.

Returns the FoundCount or Err code. In non-global mode, this will be 0 or 1, but should not be. In global mode it will be 0 or higher and can be treated as a count of the number of entire matches.

If "e" (exec) option is supplied, then Search behaves exactly like SearchExec, documented below.

Equivalent to:

- Call PRegEx_SetMatchPattern; or fail if error
- Call PRegEx_SetSearchString; or fail if error
- Call PRegEx_GetNextMatch 1 time, or until search fails if global; return Err if error (retain back refs from the last successful search when in global mode).
- Return PRegEx_FoundCount()

PRegEx_SearchExec (SrchStrL, RE, Opts, #Callback, [ArgList])

Like PRegEx_Search, but takes a #Callback function, which is called, with arguments from optional ArgList, after each SUCCESSFUL match that takes place. Callback may use any of the Match Status functions to inquire about the current match.

PRegEx_SearchBegin (SrchStrL, RE, [Opts]) ==> 1 (success) or -Err
PRegEx_SearchContinue() ==> 1: Found; 0: Done; Negative: -Err

These two functions are used as a pair if you want to execute some Lingo code in-line, each time a successful match takes place, like this:

    if (PRegEx_SearchBegin(str, "(\w+)", "g") > 0) then
      repeat while (PRegEx_SearchContinue() > 0)
        put PRegEx_GetMatchString(1)
        if PRegEx_FoundCount() > 3 then exit repeat
      end repeat
    end if

PRegEx_Replace (SrchStrL, RE, Opts, ReplPat) ==> FoundCount

Sets up and performs a single or global search and replace in SrchStrL using RE and Opts. ReplPat is interpolated and inserted on each successful match.

If "e" (exec) option is supplied, then Replace behaves exactly like ReplaceExec, documented below (ReplPat is replaced by an executable #ReplFunction, with optional argument list).
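
As an illustration only, here is a small sketch of a non-global PRegEx_Replace that reorders the parts of a date using back references in ReplPat. The handler name ReorderDate and the sample date are hypothetical. It assumes that an empty Opts string selects the default, non-global behavior and that a negative return value is an error code.

```lingo
-- Hypothetical example: turn "2001-06-04" into month/day/year order.
on ReorderDate
  set buf = ["2001-06-04"]         -- SrchStrL; Replace modifies it in place
  -- \1, \2, \3 in ReplPat are interpolated from the three parenthesized groups.
  set n = PRegEx_Replace(buf, "(\d+)-(\d+)-(\d+)", "", "\2/\3/\1")
  if n < 0 then
    put PRegEx_DescribeError(n)    -- negative result: report the error text
  else
    put buf[1]                     -- the buffer now holds the rewritten string
  end if
end
```

Because Replace never returns a copy, the result is read back from item 1 of the same list that was passed in.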
PRegEx_ReplaceExec (SrchStrL, RE, Opts, #ReplFunction, [ArgList])

Like Replace, but instead of using a fixed ReplPat string, calls #ReplFunction, optionally supplying any arguments from ArgList. (Note: Replace does NOT supply any information about the match directly to #ReplFunction. #ReplFunction should use any of the MatchStatus routines for that information, if needed.)

#ReplFunction is REQUIRED to return a string each time it is called. Failure to do so causes immediate termination of ReplaceExec, with an error code being returned.

The string returned by #ReplFunction is used as the replacement for the entire matched string. Returning the empty string, then, causes the matched string to be deleted from the string buffer. Returning PRegEx_GetMatchString(0) causes the original string to replace itself, essentially skipping this replacement.

The string returned by #ReplFunction is not subject to interpolation, but is inserted literally into the buffer. So don't try to return "Joe \1 Blow" and expect \1 to convert into a back-reference.

The #ReplFunction may and should use the Callback-related Abort/Stop/Skip/Last flags, described later, in order to signal ReplaceExec to alter its default looping behavior.

Search/Extract utilities:
-------------------------

Searching with parentheses and then checking back-references is the standard way to retrieve searched/matched data from a string buffer. The Search and Replace functions, combined with the Match Status functions, make it easy to extract values one at a time or in small clusters.

The Search/Extract utilities, on the other hand, provide convenient ways to extract an arbitrary number of data values from a string buffer in one or a few quick operations.

PRegEx_Split (SrchStrL, RE, [Opts, InitList, Max]) ==> List

"Splits" a string buffer, using the pattern specified in RE as a delimiter. The matched portions of the string are REMOVED, and the intervening segments are extracted into a list.
However, if the RE contains backreferences, then ALL of the backreferences generated by the RE, in numeric order, will be inserted, each as a separate element, into the resulting list at the appropriate point in the list. This allows retention of all the matched portions of the original string, as well. Here's another way to think about Split: it's the same as PRegEx_ExtractIntoList, but in addition to extracting the backreferences from each match, also adds all of the strings BETWEEN each matched segment, effectively "split"ting the string into multiple strings. Optional MaxItems argument, which must be 2 or greater to be meaningful, limits the maximum number of items that the list will be split into. (i.e. limits the max number of successful matches to (MaxItems - 1)). Omitting the optional Opts argument or omitting the "g" flag from Opts has the same effect as setting Max = 2 because only one match will be performed and the string will be split into two parts. If MaxItems is zero or unspecified, Split will remove any empty trailing items that would result if the delimiter RE is found to match at the very end of the search string. In other words, splitting "1,2," on comma would yield ["1", "2"]. However, if MaxItems is ANY NEGATIVE NUMBER, then empty trailing items will not be removed and the result would be ["1", "2", ""]. Note: in order to be able to pass MaxItems, you'll be forced to also pass values for Opts and InitialList. These can be defaulted to "" and [], respectively. 
Examples:

    put PRegEx_Split(["1 2 3"], "\s+", "g")            -- splitting whitespace
    -- ["1", "2", "3"]
    put PRegEx_Split(["1 2 3"], "\s+", "g", [], 2)     -- max 2 items
    -- ["1", "2 3"]
    put PRegEx_Split(["1 2 3"], "(\s+)", "g")          -- keeping whitespace
    -- ["1", " ", "2", " ", "3"]
    put PRegEx_Split(["1 2 3"], "(\w+)", "g", [], 0)   -- delim @ start,end
    -- ["", "1", " ", "2", " ", "3"]       -- note "" at start, but not end
    put PRegEx_Split(["1 2 3"], "(\w+)", "g", [], -1)  -- note Max = -1
    -- ["", "1", " ", "2", " ", "3", ""]   -- note "" at start, AND at end

PRegEx_ExtractIntoList (SrchStrL, RE, [Opts, InitList]) ==> List

Does a global or non-global search, putting ALL MATCHED BACK REFERENCES (omitting non-matched ones, but keeping empty matches) from each iteration into a Lingo list; if global, repeats until matching fails, gathering up all the back references from all iterations along the way.

Equivalent to the following:

- Start with InitList or create an empty list to hold elements.
- Enter a Begin/Continue loop; if errors, return empty list.
- On each iteration, call PRegEx_GetMatchBRCount to count backrefs.
- For each back reference:
    - Call PRegEx_GetMatchString
    - If error, abandon partial list & return an empty list.
    - Insert string into list
- Return list.

PRegEx_ExtractIntoSPList (SrchStrL, RE, [Opts, InitList]) ==> PList
PRegEx_ExtractIntoSPListSym (SrchStrL, RE, [Opts, InitList]) ==> PList

These Extract routines are the same as PRegEx_ExtractIntoList, but using a sorted property list; strings extracted using the current set of matched backreferences are inserted pairwise into the list.

Here is how it works... as each complete pair is retrieved:

- Use first item in pair as the key, second item as the value.
- Add/Replace an entry into the SPList
- If odd number of items, then use as final value.

The properties generated by ExtractIntoSPList are "String" properties, which ARE allowed in Lingo, and can be absolutely any string.
ExtractIntoSPListSym is identical except that it converts all property strings to symbols before inserting them into the list. Consequently, it is imperative to ensure that all strings destined to become properties can actually be converted into legal Lingo symbols. (Lingo places many restrictions on what characters may legally appear in property names (aka symbols). It is your responsibility to ensure the input is going to be clean, or some funky, broken, or illegal symbols could result.)

Examples:

    put PRegEx_ExtractIntoSPList   (["c d b a"], "(\w+)", "g")
    -- ["a":"b", "c":"d"]
    put PRegEx_ExtractIntoSPListSym(["c d b a"], "(\w+)", "g")
    -- [ #a:"b", #c:"d"]

Match Status functions:
-----------------------

These functions return information about the last successful match AND any backreference substrings that are available due to the use of parentheses inside the RE.

PRegEx_FoundCount () ==> Running or final count of match events

This returns the number of matches completed by a previous search event, or done up to this point in an ongoing search. Always re-set to 0 at the start of any match-related function except GetNextMatch itself. Incremented by 1 each time a match happens, and always before any callback routines, so callback routines may call this to find out the iteration count of a global search IN PROGRESS.

Note: this function does not count backreference matches. It counts each entire successful match as one event, regardless of the number of successful backreference matches each might have had within it.

PRegEx_GetPos () ==> Char pos where last left off; next begins
PRegEx_SetPos (num) ==> Change pos (0 <= Pos <= buffer len)

"Pos" is the character offset within the currently-active SrchStrL of where the current or most recent successful match STOPPed (which is also the beginning point for the next attempted match, unless the string buffer or RE is replaced). GetPos returns this value.
SetPos lets you set the Pos for the following GetNextMatch either ahead or backward. SetPos(0) would always restart from the beginning. The legal bounds of Pos are 0 <= Pos <= length(SrchStrL[1]).

Generally, it is recommended that you avoid calling SetPos during the middle of any of the high-level Search/Replace routines, especially the Replace routines, or unpredictable results could occur. Instead, call SetPos() only when working with the low-level interface routines.

High-level routines always re-set Pos to zero before they start, because they internally call the low-level routines SetMatchPattern and SetSearchString, which have this effect as well.

Recommendation: instead of ever using GetPos or SetPos, use the power of REs to extract the data you need based on its pattern and nearby context, rather than trying to search at specific character positions within a buffer.

PRegEx_GetMatchBRCount() ==> Number of back refs in last matched RE

Returns the number of backreference-generating parenthesis pairs that were in the currently-successfully-matched RE. This number serves as the upper bound of the "num" argument to the following routines -- i.e. it gives the number of the highest-available numbered back reference from the current match.

PRegEx_GetMatchString ([num]) ==> Last matched str (entire -or- BR #)
PRegEx_GetMatchStart ([num]) ==> Start pos of "" (entire -or- BR #)
PRegEx_GetMatchLen ([num]) ==> Length of "" (entire -or- BR #)

These return the entire string, its start position within the original buffer, and its length, for the Entire Match, or, if num is supplied and > 0, for any numbered backreference string.

If GetMatchString and GetMatchLen return "" and 0, respectively, it means the corresponding match string was a successful match, but empty, and GetMatchStart will still give the correct offset of that matched position. If they return void, it means that there is no corresponding successful match, and GetMatchStart will also return void.
For example:

    put PRegEx_Search(["Ravi is a nice guy"], "((Chris)|(Ravi))")
    -- 1
    put PRegEx_GetMatchString(0)
    -- "Ravi"
    put PRegEx_GetMatchString(1)
    -- "Ravi"
    put PRegEx_GetMatchString(2)
    --                      -- 2nd set of parens did not kick in
    put PRegEx_GetMatchString(3)
    -- "Ravi"

You can use this to check which of several alternate cases in a match pattern was the successful one:

    if PRegEx_GetMatchString(2) = void then put "Ravi matched."
    if PRegEx_GetMatchString(3) = void then put "Chris matched."
    -- "Ravi matched."

Error-handling functions:
-------------------------

PRegEx_LastErrCode () ==> Error code for last failed call

Yields the numeric error code generated by the immediately previous PRegEx function call. 0 means success. All other codes are negative values.

Some functions return their error codes, and LastErrCode() will agree with those; others do not return integers, and so checking LastErrCode() is the only way to check the exact error in case they return an unexpected result.

PRegEx_DescribeError ([Err]) ==> Error msg (Err or LastErrCode)

Given an Error code, returns a string message explaining it. If no Err is supplied, then describes PRegEx_LastErrCode(). Returns empty string if the Error code is zero (success).

Example:

    put PRegEx_DescribeError(PRegEx_ErrCode_SearchStrLMustBeList())
    -- "PRegEx: SearchStrL argument must be a Lingo list."

PRegEx_CompiledOK () ==> True if last expression compiled

Returns true if and only if the last attempted compilation of a regular expression succeeded, even if there have been other intervening errors since then.

PRegEx_MemError () ==> True if last op failed due to memory

Returns true if the last PRegEx function generated a memory error. Each new PRegEx function call resets this value.
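
A short sketch of how the error functions above might be used together follows. The handler name SafeSearch is hypothetical; the PRegEx calls are the ones documented in this spec. It assumes, as stated for PRegEx_Search, that a negative return value is an error code.

```lingo
-- Hypothetical wrapper: run a global search and report any error as text.
on SafeSearch buf, re
  set n = PRegEx_Search(buf, re, "g")
  if n >= 0 then return n                     -- 0 or more matches: success
  -- Compare against an error-code constant to single out bad patterns.
  if n = PRegEx_ErrCode_REDidNotCompile() then
    put "Bad pattern: " & re
  else
    put PRegEx_DescribeError(n)               -- generic description of the code
  end if
  return 0
end
```

For functions that do not return integers, the same checks could be made against PRegEx_LastErrCode() immediately after the call.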
PRegEx_MemErrorSticky () ==> True if any op has failed due to mem
PRegEx_MemErrorStickyReset () ==> Reset sticky err; return prev value

MemErrorSticky() returns true if ANY PRegEx function has generated a memory error at any point since the last call to PRegEx_Clear(1) ("Complete" reset), or since the last call to PRegEx_MemErrorStickyReset(), which turns off this flag until the next memory error occurs.

This flag could be checked after a long sequence of PRegEx calls to see if there was a problem encountered. Or, it could be checked every time through an idle loop, perhaps.

Preference flags:
-----------------

Functions listed in this section act as both the Get() and Set(1/0) functions for the correspondingly-named preferences. (Call with no arguments to Get() the value, and call with 1 argument to Set the value, which is also returned to you.)

PRegEx_ErrorsToMessageWindow ([Bool]) ==> Echo all errors to Msg wind.

Tells PRegEx to echo the string description of any error codes generated by any PRegEx routine directly to the message window immediately as they occur. This can be left on all the time, if desired, because it has no effect during projector playback: projectors lack a message window.

String-manipulation utility functions:
--------------------------------------

PRegEx_QuoteMeta (String) ==> String with RE-special chars quoted

Takes a Lingo string and returns a copy of the string with any potentially special "meta" characters "quoted" ("escaped") by having a backslash inserted in front of them. This makes the string "safe" to use in an RE, even when its contents or origin cannot be known or trusted in advance (e.g. searching for user-supplied data with a potentially untrusted user, or any time you want to search literally for a string that might have special characters in it, whether or not you know that in advance). Maybe you want to search for "?" or backslash, for example.
The characters that get escaped are EVERY CHARACTER EXCEPT a-z, A-Z, 0-9, and underscore. As a special case, NUL characters in the input are escaped as "\0", so the output of QuoteMeta is 100% compatible with the ReplPat argument to the Replace functions.

In other words, the QuoteMeta function is equivalent to this Lingo example (except it does NOT have the side effect of modifying the current search string, pattern, or Match Strings etc. as calling PRegEx_Replace would do):

    on QuoteMeta String
      set myStr = [String]
      PRegEx_Replace(myStr, "([^A-Z_0-9])", "gi", "\\\1")
      PRegEx_Replace(myStr, "\0", "g", "\\0" )
      return myStr[1]
    end QuoteMeta

Note: PRegEx_Interpolate can be used to reverse the processing done by QuoteMeta.

PRegEx_Translate(SrchStrL, InputTable, OutputTable)

Converts chars in SrchStrL using the mapping specified.

InputTable and OutputTable are a pair of strings specifying input-chars and corresponding output-chars; any input-char found in SrchStrL will be mapped to the corresponding output-char. Others will be untouched.

Dashes can be used in InputTable and OutputTable to signify a range of characters.

Example:

    PRegEx_Translate(SrchStrL, "a-z", "n-za-m")   -- Rot13 encode/decode

Supports interpolation of \t, \n, \0, \\, \xDD for hex, \123 for octal in the InputTable and the OutputTable. But, does NOT support back-reference interpolation. That would almost never be helpful. \# and \## are ignored, consequently, except for \0.

Does NOT support regular expression syntax. "Translate" has its own, different, syntax.

InputTable and OutputTable may not contain ascii-zero (NUL) characters (if they do, they'll be effectively truncated at that point), but the escape code \0 may be used to specify this character for mapping purposes.

If you want to mention a literal dash in either the InputTable or OutputTable, that character must either be the first or last character in the table, where it couldn't possibly be interpreted as a range specifier.
If for any reason there are fewer characters in the Output table than in the Input table, then the last character is understood to be replicated as necessary.

Examples:

    PRegEx_Translate(SrchStrL, "-.", "M")          -- dash or dot become M
    PRegEx_Translate(SrchStrL, "\177-\377", "_")   -- high-ascii --> _

Returns number of characters that changed; 0 if none did; or a negative error code if there is an error in the parameters.

PRegEx_Interpolate(String, [VarsPList]) ==> String

Does the pre-processing step that PRegEx_ReplaceString would do before it does a replace, and returns the interpolated string.

Note: Since interpolation is usually done on short-ish programmer-supplied strings rather than large buffers, the incoming argument is a simple string, not a String Buffer (list).

Supports all of the escape codes mentioned in the "ReplPat", including insertion of back-references, if any.

IN ADDITION to the normal interpolation, and IF the optional argument of VarsPList is supplied, then the sequence ${Foobar} inside the String will be replaced with the value of the property (string) "Foobar" from VarsPList, and ${#Foobar} will be replaced with the value of the property (symbol) #Foobar. Properties whose values are absent or not of type "string" will result in an empty string being inserted.

Example:

    set Props = [#FirstName: "Joe"]
    set Location = "Town: Davis County: Sacramento"
    PRegEx_Search([Location], "Town: (.*?) County:")   -- sets \1
    put PRegEx_Interpolate("\1 says \x22Welcome, ${#FirstName}!\x22", Props)
    -- "Davis says "Welcome, Joe!""

Note: Although not documented to behave this way, in the current MOA implementation, searching a property list for the property "a" is considered equivalent to searching for the property #a, and vice versa. Consequently, Interpolate also has this behavior -- i.e. it does not distinguish between the string and symbol forms of the property name.
However, if MOA ever "corrects" this behavior, then Interpolate will behave with the more strict interpretation documented above. Just be sure to use or omit the "#" as documented here, and your code will be upwardly-compatible with future versions of MOA. Then, if you never intermix symbol properties and string properties in the same property list, you probably will not have to worry about this subtlety. Note, however, that strings can contain any character(s) in any length, while symbols have a more limited range of legal characters. However, symbols are much faster to look up in a large property list. List-manipulation utility functions: ------------------------------------ These are PRegEx-supplied variants of favorite built-in Perl functions. In Perl, regular expressions and list manipulation are tightly coupled, so it's only natural that PRegEx should strive for the same. You'll notice that many of these functions are generically useful list-manipulation functions, even if you don't need to do any searching, replacing, and extracting. PRegEx_CopyList(ListOrPList, [Deep, InitList]) ==> CopiedListOrPList Returns a copy (shallow by default, deep if Deep is true) of the given List. If a memory error occurs, returns an error code instead of a list. Warning: Deep copying does not check for recursive list inclusion. If you try to Deep copy a recursive data structure, the routine will run for a VERY LONG TIME till memory is filled up and then fail with a memory error. If InitList is passed, it must be the same type of list as ListOrPList. If present, the items copied from ListOrPList will be copied into InitList. This is a way to use CopyList to deeply or shallowly APPEND items from a list onto another list (or in the case of PLists, ADD those arguments). Note: Assumes that all new PLists should be marked as "sorted". Note: Deep copying only makes deep copies of elements that themselves are Lists or PLists. Otherwise, any other type of object is shallowly copied. 
(Possible future improvement: if a child object has a "clone" method, Deep mode could check for that method and try to call it to allow the object to clone itself.)

PRegEx_Grep (List, RE, [Opts]) ==> NewList ("RE mode")

Grep produces a new list derived by filtering an existing one. Grep has two modes. This is the first one. It is triggered by supplying a STRING (RE) as the second argument.

Returns a new list whose contents are the elements of List which, when matched against RE/Opts, produce at least 1 match.

Elements of the incoming List must be plain strings, or SrchStrL string buffers (a list containing a string and optional length integer). Elements that do not meet these requirements will simply be skipped.

Errors encountered in matching (e.g. failure of RE to compile correctly, memory errors) will cause Grep to finish prematurely, returning only the items that have been matched up to that point. Checking LastErrCode() after calling Grep will indicate the error code, if any.

Example:

    put PRegEx_Grep([1,"abc","","fo","",["w"],"b",#symb], "\w+", "g")
    -- ["abc", "fo", ["w"], "b"]

Notice how both the strings and the one String Buffer object within the list were successfully matched by Grep.

PRegEx_Grep (List, #Filter, [ArgList]) ==> NewList ("Filter mode")

Grep produces a new list derived by filtering an existing one. Grep has two modes. This is the second one. It is triggered by supplying a symbol (#Filter) as the second argument.

Filters list according to the boolean results returned by the "#Filter" function, which can be your own custom handler or any Lingo built-in function whose results can be interpreted as Boolean (e.g. #symbolP, #stringP, #integerP, #length).

Returns a new list whose contents are the elements of List for which, when passed to #Filter with optional additional arguments from ArgList as described above, #Filter returns true.

In this "Filter" mode, Grep is similar to Map or ReplaceExec in its recognition of any CallbackAbort/Stop/etc.
flags set by the #Filter callback function.

Example:

    put PRegEx_Grep([1,"abc","","fo","",["w"],"b",#symb], #length)
    -- ["abc", "fo", "b"]

Notice how only items for which the Lingo built-in "length" function returned a non-zero number were selected, so any empty strings, as well as any non-string objects, were removed.

PRegEx_Map (List, #MapFunction, [ArgList]) ==> MappedList

Map takes one list and makes another list where (generally) each item in the new list corresponds to an item in the original list. It uses a #MapFunction to convert an original item into its counterpart in the new list.

Calls #MapFunction on each element in List. On each call, the first argument to #MapFunction is the element being processed. Subsequent arguments to #MapFunction are derived from the optional ArgList parameter in the manner described earlier.

#MapFunction should be prepared to convert its first argument into the desired output value (of any type), using its additional arguments in whatever way needed.

MapFunction may use PRegEx_CallbackAbort, Stop, etc. to affect the behavior of PRegEx_Map.

Abort: stop and discard any work done so far; delete partially-built result list and return empty list instead. Set LastErrCode to indicate that an Abort was requested.

Stop: stop and return only elements successfully mapped prior to this point; ignore current return value of #MapFunction.

Last: keeps this current return value but then stops and successfully returns the list created up to that point.

Skip: skips adding a value for the current invocation, but continues to process others.

Clever use of "Skip" allows Map to do conversion and filtering (similar to grep's filtering) at the same time -- it can "Skip" items that should not make their way into the new list, while mapping the items that should.
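
To make the "Skip" idea concrete, here is a minimal sketch of a map function that doubles integers and skips everything else. The handler names DoubleInts and TestMap are hypothetical. It assumes the value returned alongside a Skip request is simply discarded, as the description of Skip above suggests.

```lingo
-- Hypothetical #MapFunction: doubles integers, filters out other values.
on DoubleInts elem
  if not integerP(elem) then
    PRegEx_CallbackSkip(1)     -- ask Map to leave this element out
    return 0                   -- assumed to be ignored when Skip is set
  end if
  return elem * 2
end

-- Hypothetical caller: maps a mixed list through DoubleInts.
on TestMap
  put PRegEx_Map([1, "x", 3], #DoubleInts)
end
```

With this approach a single pass both converts and filters, so a separate call to PRegEx_Grep is not needed.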
PRegEx_Sort (List, DeepCopy, #SortFunction, [ArgList]) ==> NewList Returns a new list consisting of a shallow OR Deep copy of the old list, sorted according to the ordering implied by #SortFunction, which takes as arguments two values (of any type), here dubbed A and B, from the list to be compared, plus optional additional arguments if required. For any pair of items, #SortFunction must return -1 if A is less than B, 0 if A == B, and 1 if A > B. Sort does NOT modify the original list in any way, as Lingo's "sort" function does. Rather, it makes a sorted copy which you may, at your option, choose to use in place of the original. PRegEx_Reverse (List, [DeepCopy, InitList]) ==> Reversed copy Returns a copy (shallow or deep -- default is shallow) of List whose elements are in the reverse order of what they were in List. If InitList is supplied, then reversed list is appended onto it. PRegEx_Join (List, [DelimiterString]) ==> String Returns a string which is a concatenation of all strings in List, with the optional DelimiterString between each pair (it's the opposite of PRegEx_Split -- it rejoins a list of strings into a single string). Delimiter string may be empty, which is the default. Example: put PRegEx_Join(PRegEx_Split(["a,b,c,d,e"], ",", "g"), ":") -- "a:b:c:d:e" PRegEx_Keys (PList, [InitList]) ==> KeyList PRegEx_Values (PList, [InitList]) ==> ValueList Create a list of the keys (properties) or values in PList and either returns them in a new list or appends them to the optional InitList (a regular list), if provided. These functions do NOT attempt to change the sorting behavior of the incoming PList; each returns keys or values in the order that MOA yields them, and, if Keys and Values are called without the list being altered, then the items yielded by each should correspond. If the PList is modified between calls to Keys and Values, then no correspondence is guaranteed, or even likely. 
To get all the keys and values intermixed together pairwise in a single list, use PRegEx_PListToList, described below.

Examples:

    put PRegEx_Keys  ([#a:10,#b:11,#c:12], ["dog", "cow"])
    -- ["dog", "cow", #a, #b, #c]
    put PRegEx_Values([#a:10,#b:11,#c:12], ["dog", "cow"])
    -- ["dog", "cow", 10, 11, 12]

PRegEx_GetSlice(List, Keys, [InitList]) ==> SliceList

Given a List (regular OR PList) and a list of (item numbers / keys), which are said to define a "slice" of the first list, creates a new regular list of values corresponding to those specified by the "slice", and either appends the resulting list of values to optional InitList or returns it as a new List.

Examples:

    put PRegEx_GetSlice([#a   ,#b   ,#c   ], [3, 2])
    -- [#c,#b]
    put PRegEx_GetSlice([#a:10,#b:11,#c:12], [#b,#a])
    -- [11,10]

PRegEx_SetSlice(List, Keys, Values) ==> List

Given a List (regular or PList) and a list of (item numbers or keys), which are said to define a "slice" of the list, plus a third list of values corresponding to the keys, sets the keys/values accordingly in the incoming List, MODIFYING THE LIST. For convenience, also returns the same List/PList that was modified, allowing you to start with a list specified directly in Lingo, including an empty one, if you need.

If the incoming List was a PList, SetSlice will mark it "Sorted".

Calling SetSlice with an empty PList [:] is a way to convert a list of keys and a corresponding list of values into an SPList.

Calling SetSlice with an existing PList is a way to add all the keys and values from one property list into another.

Note that any list positions that are modified by SetSlice will have their existing values REPLACED (like SetAt and SetAProp would do).
Examples:

    put PRegEx_SetSlice([#a:1], [#d, #c, #b], [2, 3, 4])
    -- [#a:1, #b:4, #c:3, #d:2]
    put PRegEx_SetSlice([#a, #b], [2, 4, 3], ["dog", "cat", "cow"])
    -- [#a, "dog", "cow", "cat"]

PRegEx_PListToList (PList, [InitList]) ==> List
PRegEx_PListToListStrings(PList, [InitList]) ==> List

"Flattens" PList into a regular list: [key, value, key, value....]

PRegEx_PListToListStrings does the same, but converts any keys of type "symbol" into strings before adding them to the new List.

Either a new list is created, or items are appended to optional InitList, if provided.

Examples:

    put PRegEx_PListToList([#a: 2, #b: 4])
    -- [#a, 2, #b, 4]
    put PRegEx_PListToList([#a: 2, #b: 4], ["dog", "cat"])
    -- ["dog", "cat", #a, 2, #b, 4]
    put PRegEx_PListToListStrings([#a: 2, #b: 4, 1: 3])
    -- ["a", 2, "b", 4, 1, 3]

PRegEx_ListToSPList (List, [InitPList]) ==> SPList
PRegEx_ListToSPListSym (List, [InitPList]) ==> SPList

"Unflattens" List into a sorted PList, taking elements pairwise from List. Any odd key left over at the end gets a void value.

PRegEx_ListToSPListSym does the same, but converts any string keys to symbols before adding to the PList. Other types of keys are left unaltered. As with other PRegEx functions that create symbols, the symbol created is subject to Lingo's rules governing symbols. Attempt to create invalid symbols at your own risk: MOA's default behavior will govern.

Either a new SP list is created, or items are appended to optional InitPList, if provided. In either case, the resulting list will be marked as "sort"ed.

Examples:

    put PRegEx_ListToSPList([#a, 2, #b, 4])
    -- [#a: 2, #b: 4]
    put PRegEx_ListToSPListSym(["a", "dog", "b", "box", #c, 2])
    -- [#a: "dog", #b: "box", #c: 2]

General utility functions:
--------------------------

The items included here are here because the authors thought they are helpful and they work particularly well in conjunction with other PRegEx functions.
PRegEx_ReadEntireFile (FilePath) ==> StringBufferList

ReadEntireFile and WriteEntireFile accept and create StringBufferList (SrchStrL) objects -- that is, a list containing a required string buffer in item 1, and an optional data length field in item 2.

ReadEntireFile reads an entire file whose name is specified by a MOA-style FilePath and resolved according to Director's documented pathname-resolution algorithm (including obeying the canonical "@:" syntax), and returns a StringBufferList.

Conveniently, the StringBufferList may be used as a PRegEx-compatible SrchStrL argument, allowing the file buffer to be immediately searched and/or manipulated by PRegEx's search/replace routines.

PRegEx_WriteEntireFile (FilePath, StringBufferList) ==> BytesWritten or -Err

Takes a StringBufferList and writes it out to the given file.

The FilePath may be relative or absolute, and may use any of the standard Director path name conventions, but it MUST contain at least one directory component. If it does not, a directory not found error will occur.

WriteEntireFile does NOT attempt to create directories; only files.

On success, returns # of bytes actually written, possibly zero. This will always be the shorter of: 1) the length specified in the StringBufferList, if any, and 2) the actual length of the string (according to the Lingo "length" operator), but of course never smaller than zero.

On failure, deletes created or partially-(over)written file, if any, and returns a negative error code. So: any negative return value should be interpreted as an error code.

Callback-related functions:
---------------------------

PRegEx's internal callback mechanism is so flexible that we decided to expose it in this API so Lingo functions can be created that can elegantly make callbacks to other Lingo functions, something that is essentially impossible to do using regular Lingo.

PRegEx_CallHandler (#CallbackFunction, [ArgList1, ArgList2])

Calls any function by symbol name. ArgList1 and ArgList2 are both optional.
Together they are flattened to produce a single argument list for the callback function. In other words, each ArgList is separately treated this way:

If not a list (i.e. any other kind of value, even "void"), the value itself becomes an argument to the #CallbackFunction.

If a list, it is shallowly flattened and its elements become arguments to the #CallbackFunction, in the order they appear in the list.

Note: if what you really want is to pass the actual list object itself and be sure it does not get flattened, just be sure to put the list you want to pass inside another temporary list, like this:

    PRegEx_CallHandler(#MyFunction, [myList1 , myList2])

or this:

    PRegEx_CallHandler(#MyFunction, [myList1], [myList2])   -- equivalent

... where [] is the Lingo list-construction operator, of course.

Why have two optional arg lists? Because you may wish to use this function when implementing a callback feature in a Lingo handler that you're designing. Just as some of the PRegEx callback-oriented functions do, you might use ArgList1 for the arguments YOU are supplying to the callback function, if any, and pass through ArgList2 for the arguments YOUR CALLER is supplying to the callback function, if any.

This is how all the other PRegEx_ functions that take callbacks also behave (they all use CallHandler internally, in fact). You don't have to do it this way, but this is a logical and clean way to implement any routine that offers to make calls to a callback function.

Note: You may wish to allow the CallbackFunction to call PRegEx_CallbackAbort etc. to set those flags while running. If you do allow this, then it is your responsibility to check those flags and then to reset them to zero each time after calling PRegEx_CallHandler. Otherwise, those flags may persist and incorrectly affect another routine in your call stack. If there is any chance at all that the callback function will set these, then be sure to re-set them to zero after it returns.
PRegEx transparently takes care of saving and restoring settings of
the callback control flags in stack frames below yours, so you never
have to worry that setting these flags might inadvertently interrupt
their use in a lower stack frame, if any.

PRegEx_CallbackAbort([bool]) ==> Stop operation and fail with error
PRegEx_CallbackStop ([bool]) ==> Stop before this iteration, but succeed
PRegEx_CallbackLast ([bool]) ==> Stop after this iteration, but succeed
PRegEx_CallbackSkip ([bool]) ==> Skip this iteration, but continue

These flags may be set by any callback function that wishes to send a
signal to its caller. The caller may be either a built-in PRegEx
routine or a Lingo-authored routine that called the function using the
PRegEx_CallHandler utility routine.

These flags should NOT be set by any function that doesn't believe it
is currently being called as a callback by some PRegEx function.

As an extended example, consider how these may be called from within a
ReplFunction to set a flag that tells the ReplaceExec function to end
its loop after the next time the ReplFunction returns. Each one would
cause ReplaceExec to terminate slightly differently.

CallbackLast says that the current replacement should be done, but
then it will be the last one (do not keep searching), terminating the
replacement successfully (including keeping any replacements up to
this point).

CallbackStop says to NOT do the current replacement (ignoring the
return value of the ReplFunction), and terminate the replacement
successfully (including keeping any replacements up to this point).

CallbackAbort is the same as CallbackStop, but "aborts", causing
ReplaceExec to leave the search string untouched, not set any back
refs, and set FoundCount to zero, much as if the very first search had
simply not succeeded in the first place.

Stopping using CallbackLast or CallbackStop could be useful if
replacement should stop once a certain token is reached in the input.
Aborting could be useful if there is a memory failure or other serious
failure encountered by the callback function and it needs to
gracefully abort any further potentially memory-consuming activity.

CallbackSkip could be useful if a particular item should be
ignored/untouched/omitted/left unchanged, but you want your calling
function to continue with whatever loop it is currently processing.

Error code constants:
---------------------

Each of these "constant" functions returns the corresponding numeric
PRegEx error code. This can be helpful if you want to write code that
checks for these specific error cases, either with functions that
return error codes directly, or for those that merely set
PRegEx_LastErrCode.

    PRegEx_ErrCode_OutOfMemory()
    PRegEx_ErrCode_SearchStrLMustBeList()
    PRegEx_ErrCode_SearchStrLMustContainString()
    PRegEx_ErrCode_SearchStrLLengthArgMustBeInteger()
    PRegEx_ErrCode_REMustNotBeEmpty()
    PRegEx_ErrCode_REDidNotCompile()
    PRegEx_ErrCode_ReplPatMustBeString()
    PRegEx_ErrCode_CallbackFuncMustBeSymbol()
    PRegEx_ErrCode_CallbackFuncDidNotReturnString()
    PRegEx_ErrCode_QuoteMetaNeedsString()
    PRegEx_ErrCode_TriedToMatchWithoutSearchStrL()
    PRegEx_ErrCode_TriedToMatchWithoutSearchPattern()
    PRegEx_ErrCode_TriedToReplaceWithoutMatching()
    PRegEx_ErrCode_CallbackRequestedAbort()
    PRegEx_ErrCode_UnexpectedMOAError()
    PRegEx_ErrCode_UnexpectedInternalError()
    PRegEx_ErrCode_CallbackFunctionNotFound()
    PRegEx_ErrCode_ExpectedListArgument()
    PRegEx_ErrCode_ExpectedPListArgument()
    PRegEx_ErrCode_GrepNeedsFunctionNameOrPRegEx()
    PRegEx_ErrCode_ExpectedStringArgument()
    PRegEx_ErrCode_SortFunctionDidNotReturnInteger()
    PRegEx_ErrCode_FileNotFound()
    PRegEx_ErrCode_ErrorOpeningFile()
    PRegEx_ErrCode_ErrorReadingFile()
    PRegEx_ErrCode_ErrorWritingFile()

Example:

    put PRegEx_DescribeError(PRegEx_ErrCode_SearchStrLMustBeList())
    -- "PRegEx: SearchStrL argument must be a Lingo list."

==================================================================
Help! What is a Regular Expression?
What's going on here?
==================================================================

[ASIDE TO NEWBIES: If you don't already know what regular expressions
are and are now burning with desire to use them, then you are facing a
pretty steep, but immensely gratifying, learning curve. Hang in there!
It's worth the effort to learn!]

This is a very brief intro. Don't expect much.

    Regular Expression = Search String or Pattern

That's all there is to it.

Longer explanation:

A Regular Expression (or RE or pregex or PRegEx or pregexp) is a
search specification that can contain special syntax (think: wildcard
characters on drugs) that allows you to perform mindblowingly-complex
searches and replaces on any size text string or data buffer.

Examples:

    dog                  -- matches just these letters
    (dog)|(cat)          -- matches the letters "dog" or "cat"
    organi[sz]e          -- matches US or British spelling of "organize"
    ^\w{1,8}\.\w{1,3}$   -- matches any DOS 8.3-style file name

In addition to many dozens of special syntax characters like the ones
hinted at above, some special "escape" sequences, triggered by a
backslash, are also recognized within the RE pattern:

    \n     == matches a return char (same as Lingo "return" or char(13))
    \t     == matches a tab char
    \xHH   == matches character HH (hex)
    \0     == matches the ASCII zero (NUL) character
    \#[##] (\ + 1 or 2 digits) == matches backreference # or ##
    \###   (\ + 3 digits) == character ### in octal
    \0###  (\ + 0 + 3 digits) == character ### in octal
    \\     == matches a backslash character itself

    (... PLUS all others mentioned in PCRE documentation, of course)

Backreferences, written as \#, such as \1, \2 ... \99, mean "match (or
insert when replacing) the parenthesized expression number # in this
spot".

Backreference example A:

    "((Chris)|(Ravi)).*?\1"

... finds the name "Chris" or "Ravi" in a string, provided it is
followed some distance later by the same name again.

Backreference example B:

    "(<(\w+)(.*?)>)(.*?)(</\2>)"

... matches most pairs of balanced HTML/XML tags, such as:

    <A HREF=foo.html>Home</A>

In this example, the backreference substrings would be assigned (and
individually retrievable!) as follows:

    Backreference 1: "<A HREF=foo.html>"
    Backreference 2: "A"
    Backreference 3: " HREF=foo.html"
    Backreference 4: "Home"
    Backreference 5: "</A>"

Backreferences can be used to extract pieces of data from a string
when searching, and, equally importantly, can be used in a Replacement
pattern when doing a search/replace, so you can insert part or parts
of the matched expression directly into the replaced string.

HOW TO LEARN REGULAR EXPRESSION SYNTAX:

1) There are whole BOOKS written about regular expression syntax and
   its subtleties. We are not going to try to teach you anything more
   about them in this document. Buy one of those books now, if you are
   interested. http://amazon.com/.

2) Another good way to get started: ask a friend for help and
   pointers. (Preferably you'll be asking someone other than Chris or
   Ravi :-)).

3) The PCRE documentation, included with this Xtra and on the Web,
   gives a thorough, possibly overly-technical, overview of the
   precise features of the regular expression language supported by
   it, and consequently supported by PRegEx. (To get the most out of
   it: ignore all the deeper technical stuff; just read about the
   syntax.)

       On the Web: http://pcre.org/man.html
       Local copy: pregex-1.XX/pcre-3.4/doc/pcre.html

4) Also, if you have access to perl, be sure to read the "perlre"
   manual page that comes with every perl distribution. 99% of the
   syntax documented there applies here. Mainly, the variable
   interpolation and pre-compilation features do not apply here.

5) Practice, practice, practice. Have a copy of Director open while
   learning. Try every example in the message window. Try to make a
   test case for every different feature or behavior you learn about,
   and test it right then and there.

TWO NOTES FOR PERL USERS ONLY

Note 1: Surrounding the RE with slashes is NOT NECESSARY.
In Perl, the slashes are string delimiters, much like quote marks, and
are not part of the search pattern itself.

Note 2: $-sign and @-sign interpolation are not normally performed by
any of the functions that process the other backslashed escape codes,
as those are features of Perl's built-in string interpolation, not
features of regular expressions per se. If you need to build up a
replacement pattern string out of pieces, just use normal Lingo & and
&& or other means of concatenation, such as PRegEx_Join. OR, read
above about PRegEx_Interpolate, which does all the usual interpolation
functions, plus can optionally look up values from a property list and
interpolate them into a string, similar to Perl's $-sign interpolation
feature.

Note that if you plan to search using a RE that has had user-supplied
data interpolated into it, you almost certainly need to call QuoteMeta
either on the user-supplied parts before they are interpolated, or on
the interpolated whole, depending on what you can assume about the
data.

=========================================
Additional Examples
=========================================

Searching and/or Extracting
---------------------------

==> Search for a string

    set FoundCount = max(PRegEx_Search(foo, "(abc+)", ""), 0)

==> Search a string and then extract backrefs by number

    if (PRegEx_Search(foo, "(abc+)([,;])", "") > 0) then
      set ABC = PRegEx_GetMatchString(1)
      set Punct = PRegEx_GetMatchString(2)
    end if
    set FoundCount = PRegEx_FoundCount()

==> Search a string, extracting matching subexpressions into a list or
    sorted property list

    set NRs = PRegEx_ExtractIntoList (foo, "Name: (.*?) Rank: (.*?)", "")
    set NRs = PRegEx_ExtractIntoSPList (foo, "Name: (.*?) Rank: (.*?)", "")
    set FoundCount = PRegEx_FoundCount()

==> Same, but "globally" -- repeating the search till the end of the
    string, extracting _all_ backreferences along the way into a Lingo
    list or sorted property list

    set NRs = PRegEx_ExtractIntoList (foo, "Name: (.*?) Rank: (.*?)", "g")
    set NRs = PRegEx_ExtractIntoSPList (foo, "Name: (.*?) Rank: (.*?)", "g")
    set FoundCount = PRegEx_FoundCount()

==> Search "globally", but in a "repeat while" loop, so that code can
    be executed upon each match.

    PRegEx_SearchBegin (foo, "Date: (\S+)", "g")
    repeat while (PRegEx_SearchContinue () > 0)
      put PRegEx_GetMatchString(1)
    end repeat
    set FoundCount = PRegEx_FoundCount()

Searching and Replacing
-----------------------

==> Search and replace with a simple string

    set FoundCount = max(PRegEx_Replace(foo, "(abc+)", "i", "ABC"), 0)

==> Search and replace with a string with escape codes for back
    references

    set FoundCount = max(PRegEx_Replace(foo, "(abc+)", "i", "### \1 ###"), 0)

==> "Global" flag -- i.e. replace one vs. replace all.

    set FoundCount = max(PRegEx_Replace(foo, "(abc+)", "ig", "ABC"), 0)

==> The replace functions also extract backreferences, just as the
    search functions do. So you can retrieve an item at the same time
    you delete or modify it:

    if (PRegEx_Replace(foo, "(abc+)", "", "") > 0) then
      set ABC = PRegEx_GetMatchString(1)
    end if
    set ItemsReplaced = PRegEx_FoundCount()

==> Search and replace, but a function gets called to perform each
    replacement

    on NameCnv nameLookup
      return("Name:" && nameLookup[PRegEx_GetMatchString(1)])
    end NameCnv

    PRegEx_ReplaceExec(foo, "Name: (\S+)", "ig", #NameCnv, [nameLookup])
    set ChangeCount = PRegEx_FoundCount()

===============================================
Document History
===============================================

2001-06-04: CPT 1.0. Final release draft. Added Ravware info.
2001-05-31: CPT 1.0. Final draft for review.
2001-05-29: CPT Pre-1.0. Finished re_tr and re_i; documented.
2001-05-28: CPT Pre-1.0. Moved new list utils off of wish list.
2001-05-27: CPT Pre-1.0. Implemented aliases & added to main doc.
2001-05-26: CPT Pre-1.0. Added some more items to wish list section.
2001-05-24: CPT Pre-1.0. Rewrote some prose. Added examples.
2001-05-23: CPT Pre-1.0. Updated to reflect new features.