weighted-regexparray libraryRelaxed dependency on array library from < 0.4 to < 0.5 to support building with array-0.4 shipped with GHC 7.4.1.
BangPatterns language extensionAdded the BangPatterns language extension to the cabal file because without it the generated parser fails to build using GHC 7.2.
Added new module Text.RegExp.Internal that exposes internal data types and matching functions. Users probably don’t want to use it unless they implement their own matching functions.
Moved build dependencies for QuickCheck and Criterion test programs under a conditional so they are only pulled in if one actually compiles these programs using the flags -fQuickCheck or -fCriterion. (Thank you Brent!)
Currently, GHC can SPECIALIZE functions only where they are defined. The types Leftmost, Longest, and LeftLong are now defined in separate modules to bring them into the scope of the matching functions. Specialization makes the matching functions almost three times faster for the types mentioned above.
This workaround allows to specialize the matching functions for types defined in this package. Users, however, must use the matching functions unspecialized for their own types.
Along with this change, the constructors of the matching types are no longer exported.
The functions fullMatch and partialMatch now both have the type
Weight a b w => RegExp a -> [b] -> w
whereas previously the signatures have been:
fullMatch :: Semiring w => RegExp c -> [c] -> w
partialMatch :: Weight c (Int,c) w => RegExp c -> [c] -> w
The change allows users to provide custom symbol weights in full matchings and to do partial matchings with arbitrary symbols weights instead of having to use only characters and their positions.
This generalization leads to a slight performance penalty in small examples but has a negligible effect when matching large inputs.
accept to acceptFull, added acceptPartialBased on the more general partialMatch function, the function acceptPartial was added for the Bool semiring. The accept function has been appropriately renamed.
The lazy definition of arithmetic operations for the Numeric semiring has been dropped in favour of the more efficient standard implementation. As a consequence, matchingCount no longer works with infinite regular expressions.
The generalization of the matching functions leads to a memory leak that can be avoided by specializing them for concrete semirings. Corresponding pragmas have been added for Bool and for Numeric types but not for the more complex semirings defined in the extra matching modules. It is unclear what is the best way to specialize them too because the pragma must be placed in the module where the matching functions are defined but, there, not all semirings are in scope. See GHC ticket 4227.
In the group of partial matchings, the benchmark for Bool accidentally used full matching. It now uses partial matching which, unsurprisingly, is slower.
noMatchText.RegExp now provides a combinator
noMatch :: RegExp c
which is an identity of alt. With this combinator, regular expressions form a semiring with
zero = noMatch
one = eps
(.+.) = alt
(.*.) = seq_
A corresponding Semiring instance is not defined due to the lack of an appropriate Eq instance.
permText.RegExp now provides a combinator
perm :: [RegExp c] -> RegExp c
that matches the given regular expressions in sequence. Each expression must be matched exactly once but in arbitrary order. For example, the regular expression
perm (map char "abc")
is equivalent to abc|acb|bca|bac|cba|cab and represented as a(bc|cb)|b(ca|ac)|c(ba|ab).