Extended Regular Expression library
Copyright (c) 2002 Martin DeMello
Martin DeMello martindemello@yahoo.com
Basic usage is (?.var=pattern) to capture a named subexpression and \{var} to refer to it.
The match is returned as an extended matchdata object, which behaves just like an ordinary matchdata but also supports accessors of the form matchdata['var'] and matchdata.var.
create a new xregexp:
a = rx("(?.num=\\d+):(?.alpha=[\\w']+)")
match it against a string
b = a.match("120:abc")
and the variables can be accessed in the old numeric style
puts b[1]
or by name
puts b['num']
puts b.alpha
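For readers without the library at hand, the extended syntax can be approximated on top of Ruby's built-in named groups. The following is only a sketch: `to_named_regexp` is a hypothetical helper, not the library's `rx`, and it assumes that `(?.var=pattern)` and `\{var}` correspond directly to `(?<var>pattern)` and `\k<var>`. It returns a plain Regexp, so the `matchdata.var` accessor is not reproduced.

```ruby
# Hypothetical helper (not the library's rx): rewrite the extended syntax
# into Ruby's built-in named-group syntax and compile the result.
def to_named_regexp(source, ignore_case = false)
  translated = source
    .gsub(/\(\?\.(\w+)=/) { "(?<#{$1}>" }   # (?.var=  becomes  (?<var>
    .gsub(/\\\{(\w+)\}/)  { "\\k<#{$1}>" }  # \{var}   becomes  \k<var>
  Regexp.new(translated, ignore_case ? Regexp::IGNORECASE : 0)
end

a = to_named_regexp("(?.num=\\d+):(?.alpha=[\\w']+)")
b = a.match("120:abc")
puts b[1]        # numeric access, prints 120
puts b['num']    # access by name, prints 120
puts b['alpha']  # prints abc
```

Note that with built-in named groups, plain parentheses in the same pattern stop capturing, so numeric indices count only the named groups.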
regex-like =~, ~ and last_match
a = rx("id #(?.ident=\\d{4})")
b = (a =~ "Ford Prefect, id #1234")
c = a.last_match
p b, c.ident
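As a point of comparison, the same flow can be sketched with Ruby's built-in Regexp, where `=~` returns the index of the match start and the most recent match is retrieved with `Regexp.last_match` (a class method, unlike the instance-level `last_match` shown above). The method name `find_ident` is made up for this illustration.

```ruby
# Sketch using only built-in Regexp features; find_ident is a hypothetical name.
def find_ident(text)
  pattern = /id #(?<ident>\d{4})/
  index = (pattern =~ text)            # start index of the match, or nil
  return nil unless index
  [index, Regexp.last_match[:ident]]   # MatchData of the most recent match
end

p find_ident("Ford Prefect, id #1234")  # prints [14, "1234"]
```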
Named variables are useful when you want to build up a regex from several subexpressions without having to keep track of the positions of the $n variables in the final regex:
day = "(0?[1-9]|[12][0-9]|3[01])"
month = "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
year = '(19|20)\d{2}'
date1 = "(?.month=#{month}) (?.day=#{day}), (?.year=#{year})"
date2 = "(?.day=#{day})th (?.month=#{month}), (?.year=#{year})"
date = rx("(#{date1}|#{date2})")
a = date.match("Copyright (c) Martin DeMello, Mar 14, 1999")
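To show what the named fields buy you here, the following sketch rebuilds the same date pattern with Ruby's built-in `(?<name>...)` groups instead of the library's syntax. `date_fields` is a hypothetical wrapper, and the day pattern assumes days run from 1 to 31. Because the names are looked up rather than counted, the caller does not need to know which alternative matched or how many inner parentheses each subexpression contains.

```ruby
# Sketch with built-in named groups; date_fields is a hypothetical name.
def date_fields(text)
  day   = "(0?[1-9]|[12][0-9]|3[01])"
  month = "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
  year  = '(19|20)\d{2}'
  date1 = "(?<month>#{month}) (?<day>#{day}), (?<year>#{year})"
  date2 = "(?<day>#{day})th (?<month>#{month}), (?<year>#{year})"
  m = Regexp.new("(#{date1}|#{date2})").match(text)
  # Each name resolves to whichever alternative actually matched.
  m && [m[:month], m[:day], m[:year]]
end

p date_fields("Copyright (c) Martin DeMello, Mar 14, 1999")  # ["Mar", "14", "1999"]
p date_fields("14th Mar, 1999")                              # ["Mar", "14", "1999"]
```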
Named variables must be uniquely matched, or an exception is thrown (I'm working on other methods of resolving this - suggestions welcome):
dates = rx("#{date1} #{date2}") # ==> raises a conflict exception
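One possible reading of "uniquely matched" is that, in any given match, at most one group per name may take part. The sketch below checks that at match time using the built-in `Regexp#named_captures`, which maps each name to its group numbers. This is an assumption about the rule, not the library's implementation; `NameConflict` and `match_unique` are hypothetical names.

```ruby
# Hypothetical error class; the library's actual exception type is not stated.
class NameConflict < StandardError; end

# Match, then raise if any name was captured by more than one group.
def match_unique(re, str)
  m = re.match(str)
  return nil unless m
  re.named_captures.each do |name, indices|
    hits = indices.count { |i| !m[i].nil? }   # groups that took part in the match
    raise NameConflict, "'#{name}' matched #{hits} times" if hits > 1
  end
  m
end

either = /(?<n>\d+)|(?<n>[a-z]+)/   # alternation: only one branch takes part
both   = /(?<n>\d+) (?<n>\d+)/      # sequence: both groups take part

p match_unique(either, "abc")[:n]   # prints "abc"
begin
  match_unique(both, "1 2")
rescue NameConflict => e
  puts "conflict: #{e.message}"     # prints conflict: 'n' matched 2 times
end
```

Under this reading, an alternation that reuses names is accepted, while a sequence that reuses them is rejected.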
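Back-references with \{var} can be used inside the pattern itself, here to find words whose first three letters recur later in the word:
twiplet = rx('^(?.tlw=...).*?\{tlw}','i')
['ionisation', 'bedaubed', 'zyzzyva', 'ingoing', 'twiplet'].each {|i|
  puts "matched #{i}" if twiplet.match(i)
}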
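For comparison, the same test can be sketched with Ruby's built-in named back-reference `\k<name>`, assuming it behaves like the library's `\{var}`. The constant name `TLW` is made up for this illustration.

```ruby
# Built-in equivalent: the first three letters must occur again later in the word.
TLW = /^(?<tlw>...).*?\k<tlw>/i

words = ['ionisation', 'bedaubed', 'zyzzyva', 'ingoing', 'twiplet']
words.each { |w| puts "matched #{w}" if TLW.match?(w) }
# prints: matched ionisation, matched bedaubed, matched ingoing (one per line)
```

The pattern is not anchored at the end, so the repeated letters may appear anywhere after the opening three.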