xregex.rb
Path: lib/xregex.rb
Modified: Sun Dec 01 20:14:18 Arabian Standard Time 2002

Extended Regular Expression library

Copyright (c) 2002 Martin DeMello

Martin DeMello martindemello@yahoo.com

Usage

The basic syntax is (?.var=pattern) to capture a named subexpression and \{var} to refer back to it within the same regexp.

The captured text is stored in the returned extended MatchData object, which behaves just like an ordinary MatchData but also supports accessors of the form matchdata['var'] and matchdata.var.
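On Ruby 1.9 or later, whose regex engine supports named groups natively, these two constructs map closely onto (?<var>pattern) and \k<var>. The sketch below is a hypothetical translator (the name xregex_to_ruby is an assumption, not part of this library) that rewrites the xregex syntax into that form. It is not how xregex.rb itself works, and it is a naive textual rewrite that does not skip markers inside character classes or escaped parentheses.

```ruby
# Hypothetical helper, not part of xregex.rb: rewrites xregex-style syntax
# into Ruby (1.9+) named-group syntax.
#   (?.var=pattern)  ->  (?<var>pattern)
#   \{var}           ->  \k<var>
# Naive textual rewrite: markers inside [...] or after an escaping
# backslash are NOT treated specially.
def xregex_to_ruby(source)
  source
    .gsub(/\(\?\.(\w+)=/) { "(?<#{$1}>" }   # named-group opener
    .gsub(/\\\{(\w+)\}/) { "\\k<#{$1}>" }   # named back-reference
end

pattern = xregex_to_ruby("(?.num=\\d+):(?.alpha=[\\w']+)")
# pattern now reads (?<num>\d+):(?<alpha>[\w']+)
m = Regexp.new(pattern).match("120:abc")
puts m[1]       # numeric access still works
puts m["num"]   # named access by string
puts m[:alpha]  # ...or by symbol
```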

Examples

Named Subexpressions

Create a new xregexp:

 a = rx("(?.num=\\d+):(?.alpha=[\\w']+)")

Match it against a string:

 b = a.match("120:abc")

The variables can then be accessed in the old numeric style:

 puts b[1]

or by name:

 puts b['num']
 puts b.alpha
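Ruby's built-in MatchData (1.9 or later) already supports m['num'] and m[:num], but not method-style access such as b.alpha. One way to get that, sketched here with a hypothetical NamedMatch wrapper built on the standard delegate library, is to forward everything to the MatchData and resolve unknown zero-argument method names against its capture names. This illustrates the idea only; it is not the library's XMatchData.

```ruby
require "delegate"

# Hypothetical wrapper (not xregex.rb's XMatchData): behaves like the wrapped
# MatchData, but also answers md.name for each named capture.
class NamedMatch < SimpleDelegator
  def method_missing(name, *args, &block)
    match = __getobj__
    # A zero-argument, block-less call whose name is a capture name is a lookup.
    if args.empty? && block.nil? && match.names.include?(name.to_s)
      match[name.to_s]
    else
      super # normal delegation: [], pre_match, captures, ...
    end
  end

  def respond_to_missing?(name, include_private = false)
    __getobj__.names.include?(name.to_s) || super
  end
end

b = NamedMatch.new(/(?<num>\d+):(?<alpha>[\w']+)/.match("120:abc"))
puts b[1]       # numeric style
puts b["num"]   # by name
puts b.alpha    # method style
```

Because every other call falls through to the wrapped object, the wrapper still answers the ordinary MatchData methods.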

Regexp-like =~, ~ and last_match

 a = rx("id #(?.ident=\\d{4})")
 b = (a =~ "Ford Prefect, id #1234")
 c = a.last_match
 p b, c.ident
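Core Ruby provides the same trio for ordinary regexps: =~ returns the match offset (or nil), Regexp.last_match returns the MatchData of the most recent match in the current scope, and ~re matches against $_. The sketch below assumes Ruby 1.9 or later for the (?<ident>...) syntax. When a regexp literal without interpolation is on the left of =~, Ruby also binds each named group to a local variable of the same name.

```ruby
str = "Ford Prefect, id #1234"

# =~ returns the offset of the match; because the regexp literal is on the
# left, the named group is also assigned to a local variable called ident.
b = /id #(?<ident>\d{4})/ =~ str
c = Regexp.last_match   # MatchData for the match just performed

p b, c[:ident], ident
```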

Inlining Subexpressions

Named variables are useful when you want to build up a regex from several subexpressions without having to keep track of the positions of the $n variables in the final regex.

 day = "(0?[1-9]|[12][0-9]|3[01])"
 month = "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
 year = '(19|20)\d{2}'
 date1 = "(?.month=#{month}) (?.day=#{day}), (?.year=#{year})"
 date2 = "(?.day=#{day})th (?.month=#{month}), (?.year=#{year})"
 date = rx("(#{date1}|#{date2})")
 a = date.match("Copyright (c) Martin DeMello, Mar 14, 1999")

Each named variable must be matched only once, or an exception is raised (I'm working on other ways of resolving this; suggestions welcome).

 dates = rx("#{date1} #{date2}") # ==> raises a conflict exception
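Ruby's own engine does not raise in this situation: with duplicate names, MatchData#[] returns the last group of that name that participated. If you want xregex-style strictness on top of built-in named groups, one option, sketched here with a hypothetical match_unique helper and NameConflictError class, is to check after matching that each name was captured at most once. Unlike the rx call above, which reports the conflict when the regexp is built, this check happens at match time.

```ruby
# Hypothetical error class and helper; not part of xregex.rb or Ruby core.
class NameConflictError < StandardError; end

# Matches re against str; raises if any named group matched more than once.
def match_unique(re, str)
  md = re.match(str)
  return nil unless md

  re.named_captures.each do |name, indices|
    hits = indices.count { |i| !md[i].nil? }  # groups that participated
    raise NameConflictError, "#{name} matched #{hits} times" if hits > 1
  end
  md
end

day   = "(?:0?[1-9]|[12][0-9]|3[01])"
month = "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
year  = '(?:19|20)\d{2}'
date1 = "(?<month>#{month}) (?<day>#{day}), (?<year>#{year})"
date2 = "(?<day>#{day})th (?<month>#{month}), (?<year>#{year})"

either = Regexp.new("#{date1}|#{date2}") # one alternative matches: fine
both   = Regexp.new("#{date1} #{date2}") # both halves match: conflict

p match_unique(either, "Mar 14, 1999")[:day]
begin
  match_unique(both, "Mar 14, 1999 14th Mar, 2000")
rescue NameConflictError => e
  puts "conflict: #{e.message}"
end
```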

Referencing variables within the regexp

 twiplet = rx('^(?.tlw=...).*?\{tlw}', 'i')
 ['ionisation', 'bedaubed', 'zyzzyva', 'ingoing', 'twiplet'].each do |word|
   puts "matched #{word}" if twiplet.match(word)
 end
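For comparison, core Ruby (1.9 and later) expresses the same back-reference with \k<name>. A sketch using only the standard Regexp class; Regexp#match? assumes Ruby 2.4 or later.

```ruby
# ^ anchors the three-letter capture at the start of the word; \k<tlw>
# requires the same three letters to occur again later, and /i makes
# the comparison case-insensitive.
twiplet = /^(?<tlw>...).*?\k<tlw>/i

words = %w[ionisation bedaubed zyzzyva ingoing twiplet]
matched = words.select { |word| twiplet.match?(word) }
puts matched
```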

Required files

tools.rb

Classes and Modules

Module XRegex
  Class XRegex::XMatchData
  Class XRegex::XRegex

Included modules

XRegex