Java Regex Explained!


 

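We use Pattern.matches(regex, input) from java.util.regex to check each regex. It returns true only if the whole input string matches the regex, not just a part of it.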
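To make the whole-string rule concrete, here is a small sketch that contrasts Pattern.matches() with Matcher.find(), which looks for the regex anywhere inside the input. The class and method names (MatchesVsFind, wholeMatch, foundInside) are only for illustration.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // Pattern.matches(regex, input) is shorthand for
    // Pattern.compile(regex).matcher(input).matches(): the WHOLE input must match.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Matcher.find() instead searches for the regex ANYWHERE inside the input.
    static boolean foundInside(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("[abc]", "a"));    // true
        System.out.println(wholeMatch("[abc]", "xay"));  // false: x and y are extra
        System.out.println(foundInside("[abc]", "xay")); // true: 'a' is found inside
        // The long form behaves exactly like the shorthand:
        System.out.println(Pattern.compile("[abc]").matcher("xay").matches()); // false
    }
}
```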



DOT

.                (dot) any single character (by default, except line terminators such as \n)

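import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

List<String> list = new ArrayList<>();
list.add("a");
list.add("!");
list.add("he");
list.add("~");
// "." matches only if the string is exactly one character long
list.forEach(x -> System.out.println(Pattern.matches(".", x) + " -> " + x));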
true -> a
true -> !
false -> he
true -> ~
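Two details about the dot are easy to miss: by default it does not match line terminators such as \n, and to match a literal dot you have to escape it. A minimal sketch using the standard Pattern.DOTALL flag (the class DotDemo and the helper dotMatches are illustrative names):

```java
import java.util.regex.Pattern;

class DotDemo {
    // Checks whether the regex "." matches s as a whole, optionally with the
    // DOTALL flag, which lets "." match line terminators too.
    static boolean dotMatches(String s, boolean dotAll) {
        Pattern p = dotAll ? Pattern.compile(".", Pattern.DOTALL) : Pattern.compile(".");
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(dotMatches("~", false));  // true
        System.out.println(dotMatches("\n", false)); // false: '\n' is a line terminator
        System.out.println(dotMatches("\n", true));  // true with DOTALL
        // An escaped dot ("\\." in Java source, \. in the regex) matches only '.'
        System.out.println(Pattern.matches("\\.", ".")); // true
        System.out.println(Pattern.matches("\\.", "a")); // false
    }
}
```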


CHARACTER CLASSES

Each character class below matches exactly one character. To match longer strings, add a quantifier (covered in the next section).


[abc]            a, b, or c
[^abc]           any character except a, b, or c
[a-z]            any lowercase English letter
[A-Z]            any uppercase English letter
[a-zA-Z]         any English letter, lowercase or uppercase
[a-dm-p]         a through d, or m through p
[a-dM-P]         a through d (lowercase), or M through P (uppercase)
[a-z&&[^bc]]     a through z, except b and c (same as: [ad-z])
[a-z&&[^m-p]]    a through z, except m through p (same as: [a-lq-z])

System.out.println("a is " + Pattern.matches("[abc]", "a") + " for regex [abc]");
System.out.println("ab is " + Pattern.matches("[abc]", "ab") + " for regex [abc]");
System.out.println("a is " + Pattern.matches("[^abc]", "a") + " for regex [^abc]");
System.out.println("d is " + Pattern.matches("[^abc]", "d") + " for regex [^abc]");
System.out.println("G is " + Pattern.matches("[A-Z]", "G") + " for regex [A-Z]");
System.out.println("g is " + Pattern.matches("[A-Z]", "g") + " for regex [A-Z]");
System.out.println("c is " + Pattern.matches("[a-dM-P]", "c") + " for regex [a-dM-P]");
System.out.println("K is " + Pattern.matches("[a-dM-P]", "K") + " for regex [a-dM-P]");
System.out.println("n is " + Pattern.matches("[a-z&&[^m-p]]", "n") + " for regex [a-z&&[^m-p]]");
System.out.println("f is " + Pattern.matches("[a-z&&[^m-p]]", "f") + " for regex [a-z&&[^m-p]]");
a is true for regex [abc]
ab is false for regex [abc]                   // 2 characters, but the class matches exactly 1
a is false for regex [^abc]
d is true for regex [^abc]
G is true for regex [A-Z]
g is false for regex [A-Z]                    // g is lowercase
c is true for regex [a-dM-P]
K is false for regex [a-dM-P]                 // K is neither in a-d nor in M-P
n is false for regex [a-z&&[^m-p]]            // n is between m and p, so it is excluded
f is true for regex [a-z&&[^m-p]]
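The "same as" claims for the && (intersection) classes can be checked by brute force. This sketch compares two regexes on every letter from a to z; IntersectionCheck and sameOnLowercase are made-up names for illustration.

```java
import java.util.regex.Pattern;

class IntersectionCheck {
    // Returns true if both regexes give the same answer for every letter a-z.
    static boolean sameOnLowercase(String regexA, String regexB) {
        for (char c = 'a'; c <= 'z'; c++) {
            String s = String.valueOf(c);
            if (Pattern.matches(regexA, s) != Pattern.matches(regexB, s)) {
                return false; // found a letter where the two classes disagree
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameOnLowercase("[a-z&&[^bc]]", "[ad-z]"));    // true
        System.out.println(sameOnLowercase("[a-z&&[^m-p]]", "[a-lq-z]")); // true
        System.out.println(sameOnLowercase("[a-z&&[^m-p]]", "[a-z]"));    // false: m-p differ
    }
}
```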


REGEX QUANTIFIERS

In the examples below, X stands for the literal string "X". A quantifier applies to the element right before it, so you can also put a character class in place of X.


X?        X occurs once or not at all
X+        X occurs one or more times
X*        X occurs zero or more times
X{n}      X occurs exactly n times
X{n,}     X occurs n or more times
X{n,m}    X occurs at least n times but no more than m times

System.out.println("X is " + Pattern.matches("X+", "X") + " for regex X+");
System.out.println("XXXX is " + Pattern.matches("X+", "XXXX") + " for regex X+");
System.out.println("(empty) is " + Pattern.matches("X+", "") + " for regex X+");
System.out.println("X is " + Pattern.matches("X*", "X") + " for regex X*");
System.out.println("XXXX is " + Pattern.matches("X*", "XXXX") + " for regex X*");
System.out.println("(empty) is " + Pattern.matches("X*", "") + " for regex X*");
System.out.println("X is " + Pattern.matches("X{2,}", "X") + " for regex X{2,}");
System.out.println("XXXX is " + Pattern.matches("X{2,}", "XXXX") + " for regex X{2,}");
System.out.println("XX is " + Pattern.matches("X{2,}", "XX") + " for regex X{2,}");
X is true for regex X+
XXXX is true for regex X+
(empty) is false for regex X+                 // + needs at least one X
X is true for regex X*
XXXX is true for regex X*
(empty) is true for regex X*                  // * also accepts zero X's
X is false for regex X{2,}                    // fewer than 2
XXXX is true for regex X{2,}
XX is true for regex X{2,}
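Both bounds of X{n,m} are inclusive. A quick sketch to confirm this with X{2,4}, using a hypothetical helper matchesXTimes that builds a string of n X's (String.repeat needs Java 11 or later):

```java
import java.util.regex.Pattern;

class BoundedQuantifier {
    // Builds a string of n X's and checks it against the regex as a whole.
    static boolean matchesXTimes(String regex, int n) {
        return Pattern.matches(regex, "X".repeat(n));
    }

    public static void main(String[] args) {
        // X{2,4}: at least 2 and at most 4 X's (both bounds inclusive).
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " X's -> " + matchesXTimes("X{2,4}", n));
        }
        // X?: zero or one X.
        System.out.println(matchesXTimes("X?", 0)); // true
        System.out.println(matchesXTimes("X?", 2)); // false
    }
}
```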


CHARACTER CLASSES WITH REGEX QUANTIFIERS


Put a quantifier right after a character class to set how many characters from that class the string may contain.

[!_.%&']+            one or more of these characters: ! _ . % & '
[0-9]*               digits only, zero or more
[a-z&&[^kmn]]{3,}    a through z except k, m and n, at least 3 characters
[A-Z0-9]{8,16}       uppercase letters and digits, 8 to 16 characters

System.out.println("(empty) is " + Pattern.matches("[!_.%&']+", "") + " for regex [!_.%&']+");
System.out.println("!% is " + Pattern.matches("[!_.%&']+", "!%") + " for regex [!_.%&']+");
System.out.println("2_ is " + Pattern.matches("[!_.%&']+", "2_") + " for regex [!_.%&']+");
System.out.println("(empty) is " + Pattern.matches("[0-9]*", "") + " for regex [0-9]*");
System.out.println("1984 is " + Pattern.matches("[0-9]*", "1984") + " for regex [0-9]*");
System.out.println("RC45 is " + Pattern.matches("[0-9]*", "RC45") + " for regex [0-9]*");
System.out.println("abxyz is " + Pattern.matches("[a-z&&[^kmn]]{3,}", "abxyz") + " for regex [a-z&&[^kmn]]{3,}");
System.out.println("ab is " + Pattern.matches("[a-z&&[^kmn]]{3,}", "ab") + " for regex [a-z&&[^kmn]]{3,}");
System.out.println("abk is " + Pattern.matches("[a-z&&[^kmn]]{3,}", "abk") + " for regex [a-z&&[^kmn]]{3,}");
System.out.println("FEVER105 is " + Pattern.matches("[A-Z0-9]{8,16}", "FEVER105") + " for regex [A-Z0-9]{8,16}");
System.out.println("Fever105 is " + Pattern.matches("[A-Z0-9]{8,16}", "Fever105") + " for regex [A-Z0-9]{8,16}");
System.out.println("FE105 is " + Pattern.matches("[A-Z0-9]{8,16}", "FE105") + " for regex [A-Z0-9]{8,16}");

(empty) is false for regex [!_.%&']+          // + needs at least one character
!% is true for regex [!_.%&']+
2_ is false for regex [!_.%&']+               // 2 is not in the set
(empty) is true for regex [0-9]*
1984 is true for regex [0-9]*
RC45 is false for regex [0-9]*                // R and C are not digits
abxyz is true for regex [a-z&&[^kmn]]{3,}
ab is false for regex [a-z&&[^kmn]]{3,}       // fewer than 3 characters
abk is false for regex [a-z&&[^kmn]]{3,}      // k is excluded
FEVER105 is true for regex [A-Z0-9]{8,16}
Fever105 is false for regex [A-Z0-9]{8,16}    // contains lowercase letters
FE105 is false for regex [A-Z0-9]{8,16}       // fewer than 8 characters
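Two things worth noting about the table above: inside square brackets the dot loses its special meaning and is just a literal '.', and the quantifier repeats the whole class, so each position may be a different member of the set. A small sketch (the class ClassQuantifierNotes and the helper check are illustrative names):

```java
import java.util.regex.Pattern;

class ClassQuantifierNotes {
    // Thin wrapper around Pattern.matches() so the examples read like the table.
    static boolean check(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Inside [...] the dot is a literal '.', not "any character".
        System.out.println(check("[!_.%&']+", "..."));   // true
        System.out.println(check("[!_.%&']+", "abc"));   // false
        // {3} repeats the class, so each of the 3 positions may pick a different member.
        System.out.println(check("[!_.%&']{3}", "!%_")); // true
        // A quantifier binds only to the element right before it:
        // one uppercase letter, then one or more digits.
        System.out.println(check("[A-Z][0-9]+", "R2024")); // true
        System.out.println(check("[A-Z][0-9]+", "RC45"));  // false
    }
}
```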


REGEX METACHARACTERS

Think of them as shorthand aliases for common character classes.

.     Any character (by default, except line terminators)
\d    A digit (same as: [0-9])
\D    A non-digit (same as: [^0-9])
\s    A whitespace character (same as: [ \t\n\x0B\f\r])
\S    A non-whitespace character (same as: [^\s])
\w    A word character (same as: [a-zA-Z_0-9])
\W    A non-word character (same as: [^\w])
\b    A word boundary (a position between characters, not a character)
\B    A non-word boundary

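PS: \w includes the underscore, so [\w&&[^_]] means word characters except the underscore (the same as [a-zA-Z0-9]). We use it in the e-mail example below. Also, in Java source code every backslash must be doubled inside a string literal, so the regex \d is written "\\d".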
System.out.println("19216845124 is " + Pattern.matches("[\\d]{11}", "19216845124") + " for regex [\\d]{11}");
System.out.println("05325320532 is " + Pattern.matches("[\\d]{10}", "05325320532") + " for regex [\\d]{10}");
System.out.println("Atif Imal is " + Pattern.matches("[\\w]{2,}", "Atif Imal") + " for regex [\\w]{2,}");
System.out.println("Atif Imal is " + Pattern.matches("[\\w\\s]{2,}", "Atif Imal") + " for regex [\\w\\s]{2,}");

19216845124 is true for regex [\d]{11}
05325320532 is false for regex [\d]{10}       // 11 digits, but exactly 10 are required
Atif Imal is false for regex [\w]{2,}         // the space is not a word character
Atif Imal is true for regex [\w\s]{2,}
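\b and \B match positions rather than characters, so they are easier to see with Matcher.find(), which searches inside a longer text instead of matching it whole. A minimal sketch, where WordBoundaryDemo and countMatches are illustrative names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Counts how many times the regex is found anywhere in the text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "cat concat cat.";
        // Without \b, the "cat" inside "concat" is counted too.
        System.out.println(countMatches("cat", text));       // 3
        // \b sits between a word character and a non-word character (or the
        // start/end of the text), so \bcat\b only counts "cat" as a whole word.
        System.out.println(countMatches("\\bcat\\b", text)); // 2
    }
}
```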


CHAINING CHARACTER CLASSES

Let's build a simple e-mail regex piece by piece:


Username: letters, digits, dot and underscore, at least 3 characters
    [\w\d.]{3,}              or    [a-zA-Z0-9._]{3,}
Then the @ sign
    @                        or    [@]
Then the domain name: letters and digits, 3 to 25 characters (for example)
    [\w\d&&[^_]]{3,25}       or    [a-zA-Z0-9]{3,25}
Then a dot
    \.                       or    [.]
Then the domain extension: at least 1 character (for example)
    [\w&&[^_]]+              or    [a-zA-Z]+

To get the full regex, just put the parts one after another:

[\w\d.]{3,}@[\w\d&&[^_]]{3,25}\.[\w&&[^_]]+

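Or use the explicit alternates (you can also mix the two styles). One difference: the alternate [a-zA-Z]+ for the extension accepts letters only, while [\w&&[^_]]+ also accepts digits.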

[a-zA-Z0-9._]{3,}[@][a-zA-Z0-9]{3,25}[.][a-zA-Z]+
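In Java code, the parts can be kept as separate string constants, concatenated, and compiled once into a Pattern that is reused for every check. A sketch under those assumptions, using the metacharacter form with doubled backslashes; EmailRegexParts, the constant names and isEmail are hypothetical names:

```java
import java.util.regex.Pattern;

class EmailRegexParts {
    // The five parts from the table above, written as Java string literals.
    static final String USERNAME  = "[\\w\\d.]{3,}";
    static final String AT        = "@";
    static final String DOMAIN    = "[\\w\\d&&[^_]]{3,25}";
    static final String DOT       = "\\.";
    static final String EXTENSION = "[\\w&&[^_]]+";

    // Concatenate the parts and compile the result once, for reuse.
    static final Pattern EMAIL = Pattern.compile(USERNAME + AT + DOMAIN + DOT + EXTENSION);

    static boolean isEmail(String s) {
        return EMAIL.matcher(s).matches(); // whole-string match, like Pattern.matches()
    }

    public static void main(String[] args) {
        System.out.println(EMAIL.pattern());             // [\w\d.]{3,}@[\w\d&&[^_]]{3,25}\.[\w&&[^_]]+
        System.out.println(isEmail("testing@test.tes")); // true
        System.out.println(isEmail("testing@t.tes"));    // false: domain shorter than 3
    }
}
```

Compiling once matters when many strings are checked: Pattern.matches() compiles its regex again on every call.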


List<String> list19 = new ArrayList<>();
list19.add(".testing@test.tes");
list19.add("tes.ting@test.tes");
list19.add("testing@test.tes");
list19.add("testing.@test.tes");
list19.add("testing@.test.tes");
list19.add("testing@test..tes");
list19.add("testing@test.");
list19.add("testing@test.t");
list19.add("testing@t.tes");
list19.add("@test.tes");
list19.add(".@test.tes");
list19.add("@.");
list19.add("t@t.t");
list19.add("ttttttttttttttttt@test.tes");
list19.add("tt@test.tes");
list19.add("ttt@test.tes");
// ^ and $ are optional here: Pattern.matches() already requires the whole string to match
list19.forEach(x -> System.out.println(
        Pattern.matches("^[\\w\\d.]{3,}@[\\w\\d&&[^_]]{3,25}\\.[\\w&&[^_]]+$", x) + " -> " + x));
true -> .testing@test.tes
true -> tes.ting@test.tes
true -> testing@test.tes
true -> testing.@test.tes
false -> testing@.test.tes
false -> testing@test..tes
false -> testing@test.
true -> testing@test.t
false -> testing@t.tes
false -> @test.tes
false -> .@test.tes
false -> @.
false -> t@t.t
true -> ttttttttttttttttt@test.tes
false -> tt@test.tes
true -> ttt@test.tes

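So, we have some problems: the regex accepts usernames that start or end with a dot, such as .testing@test.tes and testing.@test.tes. We don't want a username to start or end with a dot or an underscore.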
We need to do ...
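One possible fix, using only the character classes covered so far, is sketched below. It is an assumption about the goal, not necessarily what the next part will do: the first and last username characters must be a letter or digit, dots and underscores are allowed only in between, and the 3-character minimum is kept. StricterUsername, USERNAME and isEmail are hypothetical names.

```java
import java.util.regex.Pattern;

class StricterUsername {
    // Hypothetical stricter username: letter/digit, then one or more word
    // characters or dots, then letter/digit (so still at least 3 characters).
    static final String USERNAME = "[\\w&&[^_]][\\w.]+[\\w&&[^_]]";
    // Domain and extension parts are unchanged from the original regex.
    static final Pattern EMAIL =
            Pattern.compile(USERNAME + "@[\\w\\d&&[^_]]{3,25}\\.[\\w&&[^_]]+");

    static boolean isEmail(String s) {
        return EMAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {".testing@test.tes", "testing.@test.tes", "_test@test.tes",
                           "tes.ting@test.tes", "te_st@test.tes", "ttt@test.tes", "tt@test.tes"};
        for (String s : inputs) {
            System.out.println(isEmail(s) + " -> " + s);
        }
    }
}
```

Note that this sketch still accepts consecutive dots in the middle, such as a..b@test.tes.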

To be continued. 
