Krugle Regular Expression Syntax in Solr Queries

Krugle supports a subset of regular expression syntax. The engine is not Perl-compatible, but it supports a range of useful operators. A regular expression must be enclosed between two forward slashes, for example /ab.*/.

Automatic anchoring

Krugle anchors every regular expression automatically: the pattern must match from the first character of a term through its last character. In other words, the pattern has to match the entire term, not just part of it. For the term abcde:

ab.*     # match
abcd     # no match
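Python's re.fullmatch behaves like this automatic anchoring, because it succeeds only when the pattern covers the whole string, while re.search finds a pattern anywhere. The sketch below uses Python's re module purely as a stand-in to contrast the two behaviours on the examples above; it is not Krugle's engine, and the two syntaxes differ in other respects (reserved characters, quoting).

```python
import re

term = "abcde"

for pattern in ["ab.*", "abcd"]:
    # re.fullmatch must consume the whole term, mirroring Krugle's anchoring.
    anchored = re.fullmatch(pattern, term) is not None
    # re.search accepts a match anywhere in the term, i.e. no anchoring.
    unanchored = re.search(pattern, term) is not None
    print(f"{pattern:8} anchored={anchored} unanchored={unanchored}")
```

Only the anchored column reflects Krugle's behaviour: abcd would be found by an unanchored search, but it does not cover the whole term abcde.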

Allowed characters

Any Unicode characters may be used in the pattern, but certain characters are reserved and must be escaped. The Krugle regular expression reserved characters are:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /

Any reserved character can be escaped with a backslash. For example, \* matches a literal asterisk instead of acting as the "zero-or-more" operator. The backslash itself is escaped as \\.
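When a pattern is built from user-supplied text, the escaping rule can be applied mechanically. The helper below, escape_krugle, is hypothetical (not part of any Krugle API): a minimal sketch that prefixes every character from the reserved list above with a backslash. It assumes that the two-character tokens && and || are neutralized by escaping each & and | individually.

```python
# Reserved characters from the list above. "&&" and "||" are handled by
# escaping every "&" and "|" individually (an assumption of this sketch).
RESERVED = set('+-&|!(){}[]^"~*?:\\/')


def escape_krugle(text: str) -> str:
    """Hypothetical helper: backslash-escape every reserved character."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


print(escape_krugle("a*b"))      # a\*b
print(escape_krugle("C:\\tmp"))  # C\:\\tmp
```

The period acts as the any-character operator but does not appear in the reserved list, so this helper leaves it untouched; enclosing the text in double quotes is an alternative when a literal period is needed.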

Additionally, any characters other than double quotes are interpreted literally when enclosed in double quotes. For example, the following pattern matches the term john@regex-demo.com:

john"@regex-demo.com"
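To test quoted patterns like this one outside Krugle, the quoted spans have to be translated into another engine's literal syntax. The function quoted_to_python below is a hypothetical translator, sketched under the assumptions that double quotes always come in pairs and that the unquoted parts need no further translation; it passes each quoted span through Python's re.escape.

```python
import re


def quoted_to_python(pattern: str) -> str:
    """Hypothetical translator: turn each "..." span into an escaped
    literal for Python's re module. Assumes quotes come in pairs and
    that the unquoted parts are already valid Python regex syntax."""
    parts = pattern.split('"')
    # Odd-indexed pieces were inside double quotes, so escape them.
    return "".join(re.escape(p) if i % 2 else p for i, p in enumerate(parts))


py_pattern = quoted_to_python('john"@regex-demo.com"')
print(bool(re.fullmatch(py_pattern, "john@regex-demo.com")))  # True
print(bool(re.fullmatch(py_pattern, "john@regex-demoXcom")))  # False
```

The second check fails because the period inside the quotes is literal, so it no longer matches an arbitrary character.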

Match any character

The period . can be used to represent any single character. For the term abcde:

ab...   # match
a.c.e   # match
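Without quantifiers, each period consumes exactly one character, so an anchored pattern must have the same length as the term. The sketch below checks the two examples with Python's re.fullmatch as a stand-in engine, plus one extra pattern (ab.., not from the list above) that is one character too short.

```python
import re

term = "abcde"

# Each "." consumes exactly one character of the term.
examples = {"ab...": True, "a.c.e": True, "ab..": False}
for pattern, expected in examples.items():
    matched = re.fullmatch(pattern, term) is not None
    assert matched == expected, pattern
    print(pattern, "match" if matched else "no match")
```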

One-or-more

The plus sign + repeats the preceding element (the shortest preceding pattern, such as a single character or a parenthesized group) one or more times. For the term aaabbb:

a+b+        # match
aa+bb+      # match
a+.+        # match
aa+bbb+     # match
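Seeing how a match divides the term makes it clearer that + repeats only the single preceding element. The sketch below verifies the four examples with Python's re.fullmatch, then wraps two of them in capture groups. Capture groups are a Python feature used here only for illustration; this section does not describe them as part of Krugle's syntax.

```python
import re

term = "aaabbb"

# All four patterns from the list above match the whole term.
for p in ["a+b+", "aa+bb+", "a+.+", "aa+bbb+"]:
    assert re.fullmatch(p, term), p

# "+" applies only to the character right before it: in "aa+", the
# first "a" is required once and the second "a" repeats.
print(re.fullmatch(r"(aa+)(bbb+)", term).groups())  # ('aaa', 'bbb')
print(re.fullmatch(r"(a+)(.+)", term).groups())     # ('aaa', 'bbb')
```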

Zero-or-more

The asterisk * matches the preceding element zero or more times. For the term aaabbb:

a*b*        # match
a*b*c*      # match
.*bbb.*     # match
aaa*bbb*    # match
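The example a*b*c* matches aaabbb only because c* is allowed to match nothing. The sketch below confirms the four examples using Python's re.fullmatch as a stand-in engine, and uses Python capture groups (for illustration only) to show the empty match for c*.

```python
import re

term = "aaabbb"

# All four patterns from the list above match the whole term.
for p in ["a*b*", "a*b*c*", ".*bbb.*", "aaa*bbb*"]:
    assert re.fullmatch(p, term), p

# "c*" succeeds by matching zero characters at the end of the term.
print(re.fullmatch(r"(a*)(b*)(c*)", term).groups())  # ('aaa', 'bbb', '')
```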

Min-to-max

Braces { } specify a minimum and, optionally, a maximum number of times the preceding element (the shortest preceding pattern) can repeat. The allowed forms are:

{5}     # repeat exactly 5 times
{2,5}   # repeat at least twice and at most 5 times
{2,}    # repeat at least twice

For the term aaabbb:

a{3}b{3}        # match
a{2,4}b{2,4}    # match
a{2,}b{2,}      # match
.{3}.{3}        # match
a{4}b{4}        # no match
a{4,6}b{4,6}    # no match
a{4,}b{4,}      # no match
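One way to read the brace syntax is as shorthand for an alternation over the allowed repeat counts. The helper expand_bounds below is hypothetical: a sketch that rewrites atom{lo,hi} using only concatenation, | and *, assuming the atom is a single character or a parenthesized group. Python's re module (not Krugle's engine) then confirms that each brace form and its expansion accept the same runs of a's.

```python
import re
from typing import Optional


def expand_bounds(atom: str, lo: int, hi: Optional[int]) -> str:
    """Hypothetical helper: rewrite atom{lo,hi} with only concatenation,
    "|" and "*". hi=None stands for an open upper bound, as in {lo,}.
    atom must be a single character or a parenthesized group."""
    if hi is None:
        # {lo,} is lo copies followed by zero or more further copies.
        return atom * lo + atom + "*"
    # {lo,hi} is an alternation over every allowed repeat count.
    return "(" + "|".join(atom * n for n in range(lo, hi + 1)) + ")"


# Compare each brace form with its expansion on runs of 0..7 a's.
cases = {
    "a{3}": expand_bounds("a", 3, 3),
    "a{2,4}": expand_bounds("a", 2, 4),
    "a{2,}": expand_bounds("a", 2, None),
}
for braces, expanded in cases.items():
    for n in range(8):
        s = "a" * n
        assert bool(re.fullmatch(braces, s)) == bool(re.fullmatch(expanded, s))
    print(f"{braces:7} == {expanded}")
```

The expansion grows with the width of the range, so it is a way to reason about the braces rather than a practical replacement for them.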

Logical OR

The pipe symbol | acts as an OR operator: the match succeeds if the pattern on either the left-hand side or the right-hand side matches. Alternation takes the longest possible operands, that is, everything on each side of the pipe rather than just the adjacent character; parentheses limit its scope. For the term aabb:

aabb|bbaa   # match
aacc|bb     # no match
aa(cc|bb)   # match
a+|b+       # no match
a+b+|b+a+   # match
a+(b|c)+    # match
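The precedence rule is easiest to see by adding the implied parentheses explicitly. The sketch below checks all six examples with Python's re.fullmatch as a stand-in engine, then shows that aacc|bb behaves exactly like (aacc)|(bb), while aa(cc|bb) restricts the alternation to the last two characters.

```python
import re

term = "aabb"

expected = {
    "aabb|bbaa": True,
    "aacc|bb": False,
    "aa(cc|bb)": True,
    "a+|b+": False,
    "a+b+|b+a+": True,
    "a+(b|c)+": True,
}
for pattern, should_match in expected.items():
    assert (re.fullmatch(pattern, term) is not None) == should_match, pattern

# "|" splits the whole pattern, so the first two spellings are equivalent
# and neither side covers all of "aabb"; grouping changes the result.
for pattern in ["aacc|bb", "(aacc)|(bb)", "aa(cc|bb)"]:
    print(pattern, bool(re.fullmatch(pattern, term)))
```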

Alphanumeric character ranges

Ranges of potential characters may be represented as character classes by enclosing them in square brackets [ ]. A leading ^ negates the character class. The allowed forms are:

[abc]    # 'a' or 'b' or 'c'
[a-c]    # 'a' or 'b' or 'c'
[-abc]   # '-' or 'a' or 'b' or 'c'
[abc\-]  # '-' or 'a' or 'b' or 'c'
[^abc]   # any character except 'a' or 'b' or 'c'
[^a-c]   # any character except 'a' or 'b' or 'c'
[^-abc]  # any character except '-' or 'a' or 'b' or 'c'
[^abc\-] # any character except '-' or 'a' or 'b' or 'c'

Note that the dash - denotes a range of characters unless it is the first character in the class (after any leading ^) or is escaped with a backslash.

For the term abcd:

ab[cd]+     # match
[a-d]+      # match
[^a-d]+     # no match
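The placement of the dash changes what a class accepts. The sketch below uses Python's re module as a stand-in engine: it first checks the three examples, then tests each of the characters a, b, c and - against a range, a class with a leading dash, and a class with an escaped dash. The last two classes are extra cases, not from the list above, and assume Krugle treats them as the note on dashes describes.

```python
import re

# Examples from the text, term "abcd".
term = "abcd"
assert re.fullmatch("ab[cd]+", term)
assert re.fullmatch("[a-d]+", term)
assert re.fullmatch("[^a-d]+", term) is None

# A range, a leading literal dash, and an escaped literal dash.
for cls in ["[a-c]", "[-ac]", r"[a\-c]"]:
    accepted = [ch for ch in "abc-" if re.fullmatch(cls, ch)]
    print(cls, accepted)
```

Only [a-c] accepts b; in the other two classes the dash is an ordinary member, so they accept a, c and - instead.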