Java regex match

Java Regex

The Java Regex or Regular Expression is an API to define a pattern for searching or manipulating strings.

It is widely used to define constraints on strings, such as password and email validation. After working through this Java regex tutorial, you will be able to test your regular expressions with the Java Regex Tester Tool.

The Java Regex API provides 1 interface and 3 classes in the java.util.regex package.

java.util.regex package

The Matcher and Pattern classes provide the core of Java's regular expression support. The java.util.regex package provides the following classes and interfaces for regular expressions.

  1. MatchResult interface
  2. Matcher class
  3. Pattern class
  4. PatternSyntaxException class

Matcher class

It implements the MatchResult interface. It is a regex engine which is used to perform match operations on a character sequence.

No. | Method | Description
1 | boolean matches() | Tests whether the entire input sequence matches the pattern.
2 | boolean find() | Finds the next subsequence of the input that matches the pattern.
3 | boolean find(int start) | Resets the matcher and finds the next subsequence that matches the pattern, starting at the given index.
4 | String group() | Returns the matched subsequence.
5 | int start() | Returns the start index of the matched subsequence.
6 | int end() | Returns the index just past the last character of the matched subsequence.
7 | int groupCount() | Returns the number of capturing groups in the matcher's pattern.

Pattern class

It is the compiled version of a regular expression. It is used to define a pattern for the regex engine.

No. | Method | Description
1 | static Pattern compile(String regex) | Compiles the given regex and returns a Pattern instance.
2 | Matcher matcher(CharSequence input) | Creates a matcher that matches the given input against the pattern.
3 | static boolean matches(String regex, CharSequence input) | Works as the combination of the compile and matcher methods: it compiles the regular expression and matches the entire input against it.
4 | String[] split(CharSequence input) | Splits the given input around matches of the pattern.
5 | String pattern() | Returns the regex from which the pattern was compiled.

Example of Java Regular Expressions

There are three ways to write the regex example in Java.
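The three approaches can be sketched as follows. This is a minimal illustration rather than the tutorial's original listing: the class name RegexThreeWays, the method names, and the sample pattern ".s" (any character followed by s) are assumptions made for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexThreeWays {
    // 1st way: compile the pattern, create a matcher, then test the whole input.
    static boolean viaMatcher(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        return m.matches();
    }

    // 2nd way: the same steps chained into a single expression.
    static boolean viaChain(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // 3rd way: the static convenience method, which compiles and matches in one call.
    static boolean viaStatic(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(viaMatcher(".s", "as")); // true
        System.out.println(viaChain(".s", "as"));   // true
        System.out.println(viaStatic(".s", "as"));  // true
    }
}
```

The first form is usually preferable when the same pattern is applied to many inputs, because the compiled Pattern can be reused.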

Regular Expression . Example

The . (dot) represents a single character.
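A small sketch of the dot in use is given here. The class name DotExample and the sample inputs are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

class DotExample {
    // Pattern.matches() succeeds only if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches(".s", "as"));   // true: one character, then s
        System.out.println(matches(".s", "sk"));   // false: second character is not s
        System.out.println(matches(".s", "mst"));  // false: input has three characters
        System.out.println(matches("..s", "mas")); // true: two characters, then s
    }
}
```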

Regex Character classes

No. | Character Class | Description
1 | [abc] | a, b, or c (simple class)
2 | [^abc] | Any character except a, b, or c (negation)
3 | [a-zA-Z] | a through z or A through Z, inclusive (range)
4 | [a-d[m-p]] | a through d, or m through p: [a-dm-p] (union)
5 | [a-z&&[def]] | d, e, or f (intersection)
6 | [a-z&&[^bc]] | a through z, except for b and c: [ad-z] (subtraction)
7 | [a-z&&[^m-p]] | a through z, and not m through p: [a-lq-z] (subtraction)

Regular Expression Character classes Example
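The following sketch exercises a few of the character classes from the table. The class name CharacterClassExample and the inputs are assumptions made for illustration.

```java
import java.util.regex.Pattern;

class CharacterClassExample {
    // Pattern.matches() requires the whole input to match the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[amn]", "a"));        // true: a is in the class
        System.out.println(matches("[amn]", "abcd"));     // false: a class matches exactly one character
        System.out.println(matches("[^amn]", "d"));       // true: d is not a, m, or n
        System.out.println(matches("[a-z&&[^bc]]", "b")); // false: b is subtracted from a-z
        System.out.println(matches("[a-z&&[^bc]]", "d")); // true: d remains in the class
    }
}
```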

Regex Quantifiers

The quantifiers specify the number of occurrences of a character.

Regex | Description
X? | X occurs once or not at all
X+ | X occurs one or more times
X* | X occurs zero or more times
X{n} | X occurs exactly n times
X{n,} | X occurs n or more times
X{y,z} | X occurs at least y times but not more than z times

Regular Expression Character classes and Quantifiers Example
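A short sketch combining character classes with quantifiers follows. The class name QuantifierExample and the inputs are assumptions made for illustration; note in particular that the upper bound in X{y,z} is inclusive.

```java
import java.util.regex.Pattern;

class QuantifierExample {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[amn]?", "a"));       // true: one occurrence
        System.out.println(matches("[amn]?", "aaa"));     // false: more than one occurrence
        System.out.println(matches("[amn]+", "aammmnn")); // true: one or more of a, m, n
        System.out.println(matches("[amn]*", ""));        // true: zero occurrences are allowed
        System.out.println(matches("a{2,3}", "aaa"));     // true: 3 is within the bounds
        System.out.println(matches("a{2,3}", "aaaa"));    // false: 4 exceeds the upper bound
    }
}
```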

Regex Metacharacters

The regular expression metacharacters work as shorthand codes for common character classes.

Regex | Description
. | Any character (may or may not match line terminators)
\d | Any digit, short for [0-9]
\D | Any non-digit, short for [^0-9]
\s | Any whitespace character, short for [ \t\n\x0B\f\r]
\S | Any non-whitespace character, short for [^\s]
\w | Any word character, short for [a-zA-Z_0-9]
\W | Any non-word character, short for [^\w]
\b | A word boundary
\B | A non-word boundary

Regular Expression Metacharacters Example
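The sketch below tries a few of the shorthand classes. The class name MetacharacterExample and the inputs are assumptions made for illustration; each backslash is doubled because the regex is written inside a Java string literal.

```java
import java.util.regex.Pattern;

class MetacharacterExample {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\d", "1"));       // true: a single digit
        System.out.println(matches("\\d", "4443"));    // false: more than one digit
        System.out.println(matches("\\D*", "mak"));    // true: zero or more non-digits
        System.out.println(matches("\\w+", "user_42")); // true: letters, digits and underscore
        System.out.println(matches("\\s", " "));       // true: a space is whitespace
    }
}
```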

Java Regex Finder Example
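A finder of this kind can be sketched as below. This is an assumed reconstruction, not the original listing: the class name RegexFinder and the method findAll are hypothetical, and the pattern and text are hard-coded here instead of being read from the console.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexFinder {
    // Collects one report line for every match of regex found in text.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            found.add("I found the text " + matcher.group()
                    + " starting at index " + matcher.start()
                    + " and ending at index " + matcher.end());
        }
        return found;
    }

    public static void main(String[] args) {
        for (String line : findAll("java", "this is java, do you know java")) {
            System.out.println(line);
        }
    }
}
```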

Output:

    Enter regex pattern: java
    Enter text: this is java, do you know java
    I found the text java starting at index 8 and ending at index 12
    I found the text java starting at index 26 and ending at index 30

Source: https://www.javatpoint.com/java-regex

Summary of regular-expression constructs

Construct | Matches

Characters
x | The character x
\\ | The backslash character
\0n | The character with octal value 0n (0 <= n <= 7)
\0nn | The character with octal value 0nn (0 <= n <= 7)
\0mnn | The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh | The character with hexadecimal value 0xhh
\uhhhh | The character with hexadecimal value 0xhhhh
\x{h...h} | The character with hexadecimal value 0xh...h (Character.MIN_CODE_POINT <= 0xh...h <= Character.MAX_CODE_POINT)
\t | The tab character ('\u0009')
\n | The newline (line feed) character ('\u000A')
\r | The carriage-return character ('\u000D')
\f | The form-feed character ('\u000C')
\a | The alert (bell) character ('\u0007')
\e | The escape character ('\u001B')
\cx | The control character corresponding to x

Character classes
[abc] | a, b, or c (simple class)
[^abc] | Any character except a, b, or c (negation)
[a-zA-Z] | a through z or A through Z, inclusive (range)
[a-d[m-p]] | a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] | d, e, or f (intersection)
[a-z&&[^bc]] | a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] | a through z, and not m through p: [a-lq-z] (subtraction)

Predefined character classes
. | Any character (may or may not match line terminators)
\d | A digit: [0-9]
\D | A non-digit: [^0-9]
\s | A whitespace character: [ \t\n\x0B\f\r]
\S | A non-whitespace character: [^\s]
\w | A word character: [a-zA-Z_0-9]
\W | A non-word character: [^\w]

POSIX character classes (US-ASCII only)
\p{Lower} | A lower-case alphabetic character: [a-z]
\p{Upper} | An upper-case alphabetic character: [A-Z]
\p{ASCII} | All ASCII: [\x00-\x7F]
\p{Alpha} | An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit} | A decimal digit: [0-9]
\p{Alnum} | An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct} | Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph} | A visible character: [\p{Alnum}\p{Punct}]
\p{Print} | A printable character: [\p{Graph}\x20]
\p{Blank} | A space or a tab: [ \t]
\p{Cntrl} | A control character: [\x00-\x1F\x7F]
\p{XDigit} | A hexadecimal digit: [0-9a-fA-F]
\p{Space} | A whitespace character: [ \t\n\x0B\f\r]

java.lang.Character classes (simple java character type)
\p{javaLowerCase} | Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase} | Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace} | Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored} | Equivalent to java.lang.Character.isMirrored()

Classes for Unicode scripts, blocks, categories and binary properties
\p{IsLatin} | A Latin script character (script)
\p{InGreek} | A character in the Greek block (block)
\p{Lu} | An uppercase letter (category)
\p{IsAlphabetic} | An alphabetic character (binary property)
\p{Sc} | A currency symbol
\P{InGreek} | Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]] | Any letter except an uppercase letter (subtraction)

Boundary matchers
^ | The beginning of a line
$ | The end of a line
\b | A word boundary
\B | A non-word boundary
\A | The beginning of the input
\G | The end of the previous match
\Z | The end of the input but for the final terminator, if any
\z | The end of the input

Greedy quantifiers
X? | X, once or not at all
X* | X, zero or more times
X+ | X, one or more times
X{n} | X, exactly n times
X{n,} | X, at least n times
X{n,m} | X, at least n but not more than m times

Reluctant quantifiers
X?? | X, once or not at all
X*? | X, zero or more times
X+? | X, one or more times
X{n}? | X, exactly n times
X{n,}? | X, at least n times
X{n,m}? | X, at least n but not more than m times

Possessive quantifiers
X?+ | X, once or not at all
X*+ | X, zero or more times
X++ | X, one or more times
X{n}+ | X, exactly n times
X{n,}+ | X, at least n times
X{n,m}+ | X, at least n but not more than m times

Logical operators
XY | X followed by Y
X|Y | Either X or Y
(X) | X, as a capturing group

Back references
\n | Whatever the nth capturing group matched
\k<name> | Whatever the named-capturing group "name" matched

Quotation
\ | Nothing, but quotes the following character
\Q | Nothing, but quotes all characters until \E
\E | Nothing, but ends quoting started by \Q

Special constructs (named-capturing and non-capturing)
(?<name>X) | X, as a named-capturing group
(?:X) | X, as a non-capturing group
(?idmsuxU-idmsuxU) | Nothing, but turns match flags i d m s u x U on - off
(?idmsux-idmsux:X) | X, as a non-capturing group with the given flags i d m s u x on - off
(?=X) | X, via zero-width positive lookahead
(?!X) | X, via zero-width negative lookahead
(?<=X) | X, via zero-width positive lookbehind
(?<!X) | X, via zero-width negative lookbehind
(?>X) | X, as an independent, non-capturing group

Source: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

Java Regex - Java Regular Expressions

Java regex is the official Java regular expression API. The term Java regex is an abbreviation of Java regular expression. The Java regex API is located in the java.util.regex package, which has been part of standard Java (JSE) since Java 1.4. This Java regex tutorial will explain how to use this API to match regular expressions against text.

Although Java regex has been part of standard Java since Java 1.4, this Java regex tutorial covers the Java regex API released with Java 8.

Regular Expressions

A regular expression is a textual pattern used to search in text. You do so by "matching" the regular expression against the text. The result of matching a regular expression against a text is either:

  • A true / false value specifying whether the regular expression matched the text.
  • A set of matches - one match for every occurrence of the regular expression found in the text.

For instance, you could use a regular expression to search a Java String for email addresses, URLs, telephone numbers, dates etc. This would be done by matching different regular expressions against the String. The result of matching each regular expression against the String would be a set of matches, one set for each regular expression (each regular expression may match more than once).

I will show you some examples of how to match regular expressions against text with the Java regex API further down this page. But first I will introduce the core classes of the Java regex API in the following section.

Java Regex Core Classes

The Java regex API consists of two core classes. These are:

  • Pattern
  • Matcher

The Pattern class is used to create patterns (regular expressions). A pattern is a precompiled regular expression in object form (as a Pattern instance), capable of matching itself against a text.

The Matcher class is used to match a given regular expression (Pattern instance) against a text multiple times. In other words, to look for multiple occurrences of the regular expression in the text. The Matcher will tell you where in the text (character index) it found the occurrences. You can obtain a Matcher instance from a Pattern instance.

Both the Pattern and Matcher classes are covered in detail in their own texts in this Java regex tutorial trail.

Java Regular Expression Example

As mentioned above the Java regex API can either tell you if a regular expression matches a certain String, or return all the matches of that regular expression in the String. The following sections will show you examples of both of these ways to use the Java regex API.

Pattern Example

Here is a simple Java regex example that uses a regular expression to check if a text contains the substring http:// :

    String text = "This is the text to be searched " +
                  "for occurrences of the http:// pattern.";
    String regex = ".*http://.*";
    boolean matches = Pattern.matches(regex, text);
    System.out.println("matches = " + matches);

The text variable contains the text to be checked with the regular expression.

The regex variable contains the regular expression as a String. The regular expression matches all texts which contain zero or more characters (.*) followed by the text http:// followed by zero or more characters (.*).

The third line uses the static Pattern.matches() method to check if the regular expression (pattern) matches the text. If the regular expression matches the text, then Pattern.matches() returns true. If the regular expression does not match the text, Pattern.matches() returns false.

The example does not actually check if the found http:// string is part of a valid URL, with domain name and suffix (.com, .net etc.). The regular expression just checks for an occurrence of the string http:// .

Matcher Example

Here is another Java regex example which uses the Matcher class to locate multiple occurrences of the substring "is" inside a text:

    String text = "This is the text which is to be searched " +
                  "for occurrences of the word 'is'.";
    String regex = "is";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(text);
    int count = 0;
    while (matcher.find()) {
        count++;
        System.out.println("found: " + count + " : "
                + matcher.start() + " - " + matcher.end());
    }

From the Pattern instance a Matcher instance is obtained. Via this Matcher instance the example finds all occurrences of the regular expression in the text.

Java Regular Expression Syntax

A key aspect of regular expressions is the regular expression syntax. Java is not the only programming language that has support for regular expressions. Most modern programming languages support regular expressions. The syntax each language uses to define regular expressions is not exactly the same, though. Therefore you will need to learn the syntax used by your programming language.

In the following sections of this Java regex tutorial I will give you examples of the Java regular expression syntax, to get you started with the Java regex API and regular expressions in general. The regular expression syntax used by the Java regex API is covered in detail in the text about the Java regular expression syntax.

Matching Characters

The first thing to look at is how to write a regular expression that matches characters against a given text. For instance, the regular expression defined here:

String regex = "http://";

will match all strings that are exactly the same as the regular expression, when the whole text is matched against it (for instance with Pattern.matches()). There can be no characters before or after the http:// or the regular expression will not match the text. For instance, the above regex will match this text:

String text1 = "http://";

But not this text:

String text2 = "The URL is: http://mydomain.com";

The second string contains characters both before and after the http:// that is matched against.
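The difference between matching the whole text and searching inside it can be sketched as follows. The class name WholeVersusPartial and its method names are assumptions made for illustration; Pattern.matches() and Matcher.find() are the standard API calls.

```java
import java.util.regex.Pattern;

class WholeVersusPartial {
    // True only if the entire text matches the regex.
    static boolean wholeMatch(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    // True if any part of the text matches the regex.
    static boolean containsMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String regex = "http://";
        String text1 = "http://";
        String text2 = "The URL is: http://mydomain.com";

        System.out.println(wholeMatch(regex, text1));    // true
        System.out.println(wholeMatch(regex, text2));    // false: extra characters around http://
        System.out.println(containsMatch(regex, text2)); // true: find() searches within the text
    }
}
```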

Metacharacters

Metacharacters are characters in a regular expression that are interpreted to have special meanings. These metacharacters are:

    < > ( ) [ ] { } \ ^ - = $ ! | ? * + .

What exactly these metacharacters mean will be explained further down this Java Regex tutorial. Just keep in mind that if you include e.g. a "." (fullstop) in a regular expression it will not match a fullstop character, but match something else which is defined by that metacharacter (also explained later).

Escaping Characters

As mentioned above, metacharacters in Java regular expressions have a special meaning. If you really want to match these characters in their literal form, and not their metacharacter meaning, you must "escape" the metacharacter you want to match. To escape a metacharacter you use the Java regular expression escape character, the backslash. Escaping a character means preceding it with the backslash character. For instance, like this:

\.

In this example the . character is preceded (escaped) by the \ character. When escaped, the full stop character will actually match a full stop character in the input text. The special metacharacter meaning of an escaped metacharacter is ignored; only its actual literal value (e.g. a full stop) is used.

Java regular expression syntax uses the backslash character as escape character, just like Java Strings do. This gives a little challenge when writing a regular expression in a Java string. Look at this regular expression example:

String regex = "\\.";

Notice that the regular expression String contains two backslashes after each other, and then a . (full stop). The reason is that the Java compiler first interprets the two \\ characters as an escaped Java String character. After the Java compiler is done, only one \ is left, as \\ means the character \ in a Java String. The string thus looks like this:

\.

Now the Java regular expression interpreter kicks in, and interprets the remaining backslash as an escape character. The following character is now interpreted to mean an actual full stop, not to have the special regular expression meaning it otherwise has. The remaining regular expression thus matches for the full stop character and nothing more.

Several characters have a special meaning in the Java regular expression syntax. If you want to match for that explicit character and not use it with its special meaning, you need to escape it with the backslash character first. For instance, to match for the full stop character, you need to write:

String regex = "\\.";

To match for the backslash character itself, you need to write:

String regex = "\\\\";

Getting the escaping of characters right in regular expressions can be tricky. For advanced regular expressions you might have to play around with it a while before you get it right.
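The escaping rules above can be checked with a short sketch. The class name EscapingExample and the helper matchesLiteral are assumptions made for illustration; Pattern.quote() is a standard method that escapes a whole literal string, which can be easier than escaping each metacharacter by hand.

```java
import java.util.regex.Pattern;

class EscapingExample {
    // Treats the given text as a literal, not as a regex, by quoting it first.
    static boolean matchesLiteral(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches(".", "a"));       // true: unescaped dot matches any character
        System.out.println(Pattern.matches("\\.", "a"));     // false: escaped dot matches only a full stop
        System.out.println(Pattern.matches("\\.", "."));     // true
        System.out.println(Pattern.matches("\\\\", "\\"));   // true: regex \\ matches one backslash
        System.out.println(Pattern.matches("1+1=2", "1+1=2")); // false: + acts as a quantifier
        System.out.println(matchesLiteral("1+1=2", "1+1=2")); // true: + is matched literally
    }
}
```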

Matching Any Character

So far we have only seen how to match specific characters like "h", "t", "p" etc. However, you can also just match any character without regard to what character it is. The Java regular expression syntax lets you do that using the . character (period / full stop). Here is an example regular expression that matches any character:

String regex = ".";

This regular expression matches a single character, no matter what character it is.

The . character can be combined with other characters to create more advanced regular expressions. Here is an example:

String regex = "H.llo";

This regular expression will match any Java string that contains the characters "H" followed by any character, followed by the characters "llo". Thus, this regular expression will match all of the strings "Hello", "Hallo", "Hullo", "Hxllo" etc.

Matching Any of a Set of Characters

Java regular expressions support matching any of a specified set of characters using what is referred to as character classes. Here is a character class example:

String regex = "H[ae]llo";

The character class (set of characters to match) is enclosed in the square brackets, in other words the [ae] part of the regular expression. The square brackets are not matched, only the characters inside them.

The character class will match one of the enclosed characters regardless of which, but no more than one. Thus, the regular expression above will match either of the two strings "Hallo" or "Hello", but no other strings. Only an "a" or an "e" is allowed between the "H" and the "llo".

You can match a range of characters by specifying the first and the last character in the range with a dash in between. For instance, the character class [a-z] will match all characters between a lowercase a and a lowercase z, both a and z included.

You can have more than one character range within a character class. For instance, the character class [a-zA-Z] will match all letters between a and z or between A and Z.

You can also use ranges for digits. For instance, the character class [0-9] will match the characters between 0 and 9, both included.

If you want to actually match one of the square brackets in a text, you will need to escape them. Here is how escaping the square brackets looks:

String regex = "H\\[llo";

The \\[ is the escaped left square bracket. This regular expression will match the string "H[llo".

If you want to match the square brackets inside a character class, here is how that looks:

String regex = "H[\\[\\]]llo";

The character class is this part: [\\[\\]]. The character class contains the two square brackets escaped (\\[ and \\]).

This regular expression will match the strings "H[llo" and "H]llo".
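The character class examples from this section can be verified with a sketch like the one below. The class name BracketClassExample and the extra inputs "Hullo" and "H[]llo" are assumptions added for illustration.

```java
import java.util.regex.Pattern;

class BracketClassExample {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("H[ae]llo", "Hello"));        // true
        System.out.println(matches("H[ae]llo", "Hallo"));        // true
        System.out.println(matches("H[ae]llo", "Hullo"));        // false: u is not in the class
        System.out.println(matches("H\\[llo", "H[llo"));         // true: escaped bracket is literal
        System.out.println(matches("H[\\[\\]]llo", "H]llo"));    // true: class contains [ and ]
        System.out.println(matches("H[\\[\\]]llo", "H[]llo"));   // false: a class matches one character only
    }
}
```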

Matching a Range of Characters

The Java regex API allows you to specify a range of characters to match. Specifying a range of characters is easier than explicitly specifying each character to match. For instance, you can match the characters a to z like this:

String regex = "[a-z]";

This regular expression will match any single character from a to z in the alphabet.

The character classes are case sensitive. To match all characters from a to z regardless of case, you must include both uppercase and lowercase character ranges. Here is how that looks:

String regex = "[a-zA-Z]";

Matching Digits

You can match digits of a number with the predefined character class with the code \d. The digit character class corresponds to the character class [0-9].

Since the \ character is also an escape character in Java, you need two backslashes in the Java string to get a \d in the regular expression. Here is how such a regular expression string looks:

String regex = "Hi\\d";

This regular expression will match strings starting with "Hi" followed by a digit (0 to 9). Thus, it will match the string "Hi5" but not the string "Hip".

Matching Non-digits

Matching non-digits can be done with the predefined character class \D (uppercase D). Here is a regular expression containing the non-digit character class:

String regex = "Hi\\D";

This regular expression will match any string which starts with "Hi" followed by one character which is not a digit.

Matching Word Characters

You can match word characters with the predefined character class with the code \w. The word character class corresponds to the character class [a-zA-Z_0-9].

String regex = "Hi\\w";

This regular expression will match any string that starts with "Hi" followed by a single word character.

Matching Non-word Characters

You can match non-word characters with the predefined character class \W (uppercase W). Since the \ character is also an escape character in Java, you need two backslashes in the Java string to get a \W in the regular expression. Here is a regular expression example using the non-word character class:

String regex = "Hi\\W";
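The four predefined classes from the preceding sections can be compared side by side in a short sketch. The class name PredefinedClassExample and the inputs "Hi_" and "Hi!" are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

class PredefinedClassExample {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("Hi\\d", "Hi5")); // true: 5 is a digit
        System.out.println(matches("Hi\\d", "Hip")); // false: p is not a digit
        System.out.println(matches("Hi\\D", "Hip")); // true: p is a non-digit
        System.out.println(matches("Hi\\w", "Hi_")); // true: underscore is a word character
        System.out.println(matches("Hi\\W", "Hi!")); // true: ! is a non-word character
        System.out.println(matches("Hi\\W", "Hi5")); // false: 5 is a word character
    }
}
```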

Boundaries

The Java Regex API can also match boundaries in a string. A boundary could be the beginning of a string, the end of a string, the beginning of a word etc. The Java Regex API supports the following boundaries:

Symbol | Description
^ | The beginning of a line.
$ | The end of a line.
\b | A word boundary (where a word starts or ends, e.g. space, tab etc.).
\B | A non-word boundary.
\A | The beginning of the input.
\G | The end of the previous match.
\Z | The end of the input but for the final terminator (if any).
\z | The end of the input.

Some of these boundary matchers are explained below.

Beginning of Line (or String)

The ^ boundary matcher matches the beginning of a line according to the Java API specification. By default, however, it only matches the beginning of the input string; it matches at the start of every line only when the pattern is compiled with the Pattern.MULTILINE flag. For instance, the following example, which uses the default mode, only gets a single match at index 0:

    String text = "Line 1\nLine2\nLine3";
    Pattern pattern = Pattern.compile("^");
    Matcher matcher = pattern.matcher(text);
    while (matcher.find()) {
        System.out.println("Found match at: " + matcher.start() + " to " + matcher.end());
    }

Even though the input string contains several line breaks, in the default mode the ^ character only matches the beginning of the input string, not the beginning of each line (after each line break).
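To match at the start of every line, the pattern can be compiled with the standard Pattern.MULTILINE flag. The sketch below contrasts the two modes; the class name MultilineAnchors and the method lineStarts are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineAnchors {
    // Returns the index of every position where ^ matches under the given flags.
    static List<Integer> lineStarts(String text, int flags) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = Pattern.compile("^", flags).matcher(text);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "Line 1\nLine2\nLine3";
        System.out.println(lineStarts(text, 0));                 // [0]
        System.out.println(lineStarts(text, Pattern.MULTILINE)); // [0, 7, 13]
    }
}
```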

The beginning of line / string matcher is often used in combination with other characters, to check if a string begins with a certain substring. For instance, this example checks if the input string starts with the substring http:// :

    String text = "http://jenkov.com";
    Pattern pattern = Pattern.compile("^http://");
    Matcher matcher = pattern.matcher(text);
    while (matcher.find()) {
        System.out.println("Found match at: " + matcher.start() + " to " + matcher.end());
    }

This example finds a single match of the substring http:// from index 0 to index 7 in the input string. Even if the input string had contained more instances of the substring http://, they would not have been matched by this regular expression, since the regular expression starts with the ^ character.

End of Line (or String)

The $ boundary matcher matches the end of a line according to the Java specification. By default, however, it only matches at the end of the input string (or just before a final line terminator); with the Pattern.MULTILINE flag it also matches before each line terminator.

The end of line (or string) matcher is often used in combination with other characters, most commonly to check if a string ends with a certain substring. Here is an example of the end of line / string matcher. Note that the dot is escaped so that it matches a literal full stop rather than any character:

    String text = "http://jenkov.com";
    Pattern pattern = Pattern.compile("\\.com$");
    Matcher matcher = pattern.matcher(text);
    while (matcher.find()) {
        System.out.println("Found match at: " + matcher.start() + " to " + matcher.end());
    }

This example will find a single match at the end of the input string.

Word Boundaries

The \b boundary matcher matches a word boundary, meaning a location in an input string where a word either starts or ends.

Here is a Java regex word boundary example:

    String text = "Mary had a little lamb";
    Pattern pattern = Pattern.compile("\\b");
    Matcher matcher = pattern.matcher(text);
    while (matcher.find()) {
        System.out.println("Found match at: " + matcher.start() + " to " + matcher.end());
    }

This example matches all word boundaries found in the input string. Notice how the word boundary matcher is written as \\b, with two \ (backslash) characters. The reason for this is explained in the section about escaping characters. The Java compiler uses \ as an escape character, and thus requires two backslash characters after each other in order to insert a single backslash character into the string.

The output of running this example would be:

    Found match at: 0 to 0
    Found match at: 4 to 4
    Found match at: 5 to 5
    Found match at: 8 to 8
    Found match at: 9 to 9
    Found match at: 10 to 10
    Found match at: 11 to 11
    Found match at: 17 to 17
    Found match at: 18 to 18
    Found match at: 22 to 22

The output lists all the locations where a word either starts or ends in the input string. As you can see, the indices of word beginnings point to the first character of the word, whereas the indices of word endings point to the first character after the word.

You can combine the word boundary matcher with other characters to search for words beginning with specific characters. Here is an example:

    String text = "Mary had a little lamb";
    Pattern pattern = Pattern.compile("\\bl");
    Matcher matcher = pattern.matcher(text);
    while (matcher.find()) {
        System.out.println("Found match at: " + matcher.start() + " to " + matcher.end());
    }

This example will find all the locations where a word starts with the letter l (lowercase). Each match covers only the pattern itself, so it starts at the word boundary and ends right after the lowercase l.

Non-word Boundaries

The \B boundary matcher matches non-word boundaries. A non-word boundary is any position that is not a word boundary, for example a position between two characters which are both part of the same word. In other words, the character combination is not a word-to-non-word character sequence (which is a word boundary). Here is a simple Java regex non-word boundary matcher example:

    String text = "Mary had a little lamb";
    Pattern pattern = Pattern.compile("\\B");
    Matcher matcher = pattern.matcher(text);
    while (matcher.find()) {
        System.out.println("Found match at: " + matcher.start() + " to " + matcher.end());
    }

This example will give the following output:

    Found match at: 1 to 1
    Found match at: 2 to 2
    Found match at: 3 to 3
    Found match at: 6 to 6
    Found match at: 7 to 7
    Found match at: 12 to 12
    Found match at: 13 to 13
    Found match at: 14 to 14
    Found match at: 15 to 15
    Found match at: 16 to 16
    Found match at: 19 to 19
    Found match at: 20 to 20
    Found match at: 21 to 21

Notice how these match indexes correspond to boundaries between characters within the same word.

Quantifiers

Quantifiers can be used to match characters more than once. There are several types of quantifiers which are listed in the Java Regex Syntax. I will introduce some of the most commonly used quantifiers here.

The first two quantifiers are the * and + characters. You put one of these characters after the character you want to match multiple times. Here is a regular expression with a quantifier:

String regex = "Hello*";

This regular expression matches strings with the text "Hell" followed by zero or more o characters. Thus, the regular expression will match "Hell", "Hello", "Helloo" etc.

If the quantifier had been the + character instead of the * character, the string would have had to end with 1 or more o characters.

If you want to match any of the two quantifier characters you will need to escape them. Here is an example of escaping the + quantifier:

String regex = "Hell\\+";

This regular expression will match the string "Hell+".

You can also match an exact number of a specific character using the {n} quantifier, where n is the number of characters you want to match. Here is an example:

String regex = "Hello{2}";

This regular expression will match the string "Helloo" (with two o characters in the end).

You can set an upper and a lower bound on the number of characters you want to match, like this:

String regex = "Hello{2,4}";

This regular expression will match the strings "Helloo", "Hellooo" and "Helloooo". In other words, the string "Hell" with 2, 3 or 4 o characters in the end.
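The quantifier examples above can be verified with a brief sketch. The class name HelloQuantifiers is an assumption, as is the final grouped pattern "(Hello){2}", which is added only to show that a quantifier applies to the element directly before it unless a group is used.

```java
import java.util.regex.Pattern;

class HelloQuantifiers {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("Hello*", "Hell"));          // true: zero o characters
        System.out.println(matches("Hello*", "Helloo"));        // true: two o characters
        System.out.println(matches("Hello+", "Hell"));          // false: at least one o required
        System.out.println(matches("Hell\\+", "Hell+"));        // true: escaped + is literal
        System.out.println(matches("Hello{2}", "Helloo"));      // true: exactly two o characters
        System.out.println(matches("Hello{2,4}", "Helloooo"));  // true: four o characters
        System.out.println(matches("Hello{2,4}", "Hellooooo")); // false: five exceeds the upper bound
        System.out.println(matches("(Hello){2}", "HelloHello")); // true: the group is repeated
    }
}
```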

Logical Operators

The Java Regex API supports a set of logical operators which can be used to combine multiple subpatterns within a single regular expression. The Java Regex API supports two logical operators: The and operator and the or operator.

The and operator is implicit. If two characters (or other subpatterns) follow each other in a regular expression, that means that both the first and the second subpattern must match the target string. Here is an example of a regular expression that uses an implicit and operator:

    String text = "Cindarella and Sleeping Beauty sat in a tree";
    Pattern pattern = Pattern.compile("[Cc][Ii].*");
    Matcher matcher = pattern.matcher(text);
    System.out.println("matcher.matches() = " + matcher.matches());

Notice the 3 subpatterns [Cc], [Ii] and .*

Since there are no characters between these subpatterns in the regular expression, there is implicitly an and operator in between them. This means that the target string must match all 3 subpatterns in the given order to match the regular expression as a whole. As you can see from the string, the expression matches the string. The string should start with either an uppercase or lowercase C, followed by an uppercase or lowercase I and then zero or more characters. The string meets these criteria.

The or operator is explicit and is represented by the pipe character |. Here is an example of a regular expression that contains two subexpressions with the logical or operator in between:

    String text = "Cindarella and Sleeping Beauty sat in a tree";
    Pattern pattern = Pattern.compile(".*Ariel.*|.*Sleeping Beauty.*");
    Matcher matcher = pattern.matcher(text);
    System.out.println("matcher.matches() = " + matcher.matches());

As you can see, the pattern will match either the subpattern Ariel or the subpattern Sleeping Beauty somewhere in the target string. Since the target string contains the text Sleeping Beauty, the regular expression matches the target string.

Java String Regex Methods

The Java String class has a few regular expression methods too. I will cover some of those here:

matches()

The Java String matches() method takes a regular expression as parameter, and returns true if the regular expression matches the whole string, and false if not.

Here is a matches() example:

    String text = "one two three two one";
    boolean matches = text.matches(".*two.*");

split()

The Java String split() method splits the string into N substrings and returns a String array with these substrings. The split() method takes a regular expression as parameter and splits the string at all positions in the string where the regular expression matches a part of the string. The text matched by the regular expression is not included in the returned substrings.

Here is a split() example:

    String text = "one two three two one";
    String[] twos = text.split("two");

This example will return the three strings "one ", " three " and " one". The spaces around each "two" are kept, because only the matched text itself is removed.

replaceFirst()

The Java String replaceFirst() method returns a new String in which the first match of the regular expression passed as first parameter is replaced with the string value of the second parameter.

Here is a replaceFirst() example:

    String text = "one two three two one";
    String s = text.replaceFirst("two", "five");

This example will return the string "one five three two one".

replaceAll()

The Java String replaceAll() method returns a new String in which all matches of the regular expression passed as first parameter are replaced with the string value of the second parameter.

Here is a replaceAll() example:

    String text = "one two three two one";
    String t = text.replaceAll("two", "five");

This example will return the string "one five three five one".
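The four String methods can be tried together in one small program. The class name StringRegexMethods and the TEXT constant are assumptions made for this sketch; the method calls are the standard java.lang.String API.

```java
import java.util.Arrays;

class StringRegexMethods {
    static final String TEXT = "one two three two one";

    public static void main(String[] args) {
        // matches() tests the whole string against the regex.
        System.out.println(TEXT.matches(".*two.*"));            // true

        // split() removes only the matched text, so surrounding spaces remain.
        System.out.println(Arrays.toString(TEXT.split("two"))); // [one ,  three ,  one]

        // replaceFirst() and replaceAll() return new strings; TEXT itself is unchanged.
        System.out.println(TEXT.replaceFirst("two", "five"));   // one five three two one
        System.out.println(TEXT.replaceAll("two", "five"));     // one five three five one
    }
}
```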

Source: http://tutorials.jenkov.com/java-regex/index.html

Java - Regular Expressions



Java provides the java.util.regex package for pattern matching with regular expressions. Java regular expressions are very similar to those of the Perl programming language and are easy to learn.

A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. They can be used to search, edit, or manipulate text and data.

The java.util.regex package primarily consists of the following three classes −

  • Pattern Class − A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile() methods, which will then return a Pattern object. These methods accept a regular expression as the first argument.

  • Matcher Class − A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher() method on a Pattern object.

  • PatternSyntaxException − A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.

Capturing Groups

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".

Capturing groups are numbered by counting their opening parentheses from the left to the right. In the expression ((A)(B(C))), for example, there are four such groups −

  • ((A)(B(C)))
  • (A)
  • (B(C))
  • (C)

To find out how many groups are present in the expression, call the groupCount method on a matcher object. The groupCount method returns an int showing the number of capturing groups present in the matcher's pattern.

There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount.
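The numbering can be checked directly. In this sketch the class name GroupCountSketch and the helper groups are illustrative; the helper returns group 0 followed by the numbered groups:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountSketch {

    // Returns group 0 followed by groups 1..groupCount(), or an empty array if there is no match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("((A)(B(C)))").matcher("ABC");
        // groupCount() depends only on the pattern; group 0 is not counted.
        System.out.println("groupCount = " + m.groupCount()); // groupCount = 4

        System.out.println(Arrays.toString(groups("((A)(B(C)))", "ABC"))); // [ABC, ABC, A, BC, C]
    }
}
```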

Example

The following example illustrates how to find a digit string in a given alphanumeric string:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

    public static void main(String[] args) {
        // String to be scanned to find the pattern.
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";

        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);

        // Now create matcher object.
        Matcher m = r.matcher(line);
        if (m.find()) {
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
        } else {
            System.out.println("NO MATCH");
        }
    }
}

This will produce the following result −

Output

Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0
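Group 2 holds only "0" because the first (.*) is greedy: it takes as much text as it can and leaves a single digit for (\\d+). If the intent is to capture the whole number, one option is to make the first group reluctant with .*?. This is a sketch of that variation, not part of the original example; the class name and helper are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantGroupSketch {

    // Returns the first run of digits in the line, or null if there is none.
    static String firstNumber(String line) {
        // (.*?) takes as little text as possible, so (\\d+) starts at the first digit.
        Matcher m = Pattern.compile("(.*?)(\\d+)(.*)").matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("This order was placed for QT3000! OK?")); // 3000
    }
}
```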

Regular Expression Syntax

Here is a table listing the regular expression metacharacter syntax available in Java:

Subexpression    Matches
^                Matches the beginning of the input (the beginning of each line when the MULTILINE flag is set).
$                Matches the end of the input (the end of each line when the MULTILINE flag is set).
.                Matches any single character except a line terminator. The DOTALL flag, (?s), allows it to match line terminators as well.
[...]            Matches any single character in brackets.
[^...]           Matches any single character not in brackets.
re*              Matches 0 or more occurrences of the preceding expression.
re+              Matches 1 or more occurrences of the preceding expression.
re?              Matches 0 or 1 occurrence of the preceding expression.
re{n}            Matches exactly n occurrences of the preceding expression.
re{n,}           Matches n or more occurrences of the preceding expression.
re{n,m}          Matches at least n and at most m occurrences of the preceding expression.
a|b              Matches either a or b.
(re)             Groups regular expressions and remembers the matched text.
(?:re)           Groups regular expressions without remembering the matched text.
(?>re)           Matches the independent pattern without backtracking.
\w               Matches the word characters.
\W               Matches the nonword characters.
\s               Matches the whitespace. Equivalent to [ \t\n\x0B\f\r].
\S               Matches the nonwhitespace.
\d               Matches the digits. Equivalent to [0-9].
\D               Matches the nondigits.
\A               Matches the beginning of the entire string.
\Z               Matches the end of the entire string, except for an allowable final line terminator. If a final newline exists, it matches just before it.
\z               Matches the end of the entire string.
\G               Matches the point where the last match finished.
\1, \2, ...      Back-reference to the capture group with the given number.
\b               Matches the word boundaries when outside the brackets. Matches the backspace (0x08) when inside the brackets.
\B               Matches the nonword boundaries.
\n, \r, \t, etc. Matches newlines, carriage returns, tabs, etc.
\Q               Escape (quote) all characters up to \E.
\E               Ends quoting begun with \Q.
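Several of these constructs are often combined. The following sketch uses \b, a capturing group, \s+ and a back-reference to find words that are written twice in a row. The class name RepeatedWordSketch, the helper repeatedWords and the sample sentence are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedWordSketch {

    // \b anchors at a word boundary, (\w+) captures a word,
    // \s+ skips whitespace, and \1 requires the captured word again.
    private static final Pattern REPEATED = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Collects each word that is immediately followed by itself.
    static List<String> repeatedWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = REPEATED.matcher(text);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(repeatedWords("This is is a test test sentence")); // [is, test]
    }
}
```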

Methods of the Matcher Class

Here is a list of useful instance methods −

Index Methods

Index methods provide useful index values that show precisely where the match was found in the input string −

Sr.No.  Method & Description

1. public int start()
   Returns the start index of the previous match.

2. public int start(int group)
   Returns the start index of the subsequence captured by the given group during the previous match operation.

3. public int end()
   Returns the offset after the last character matched.

4. public int end(int group)
   Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.
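As a rough illustration of the group variants, the sketch below reports where each group of the first match sits in the input. The class name, the helper groupSpan, the pattern and the sample text are all illustrative:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexSketch {

    // Returns {start, end} of the given group in the first match, or null if nothing matches.
    static int[] groupSpan(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(group), m.end(group) };
    }

    public static void main(String[] args) {
        String regex = "(\\w+)@(\\w+)";
        String input = "mail bob@example now";

        System.out.println(Arrays.toString(groupSpan(regex, input, 0))); // [5, 16] whole match
        System.out.println(Arrays.toString(groupSpan(regex, input, 1))); // [5, 8]  "bob"
        System.out.println(Arrays.toString(groupSpan(regex, input, 2))); // [9, 16] "example"
    }
}
```

The end index is exclusive, so input.substring(start, end) yields the text of the group.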

Study Methods

Study methods review the input string and return a Boolean indicating whether or not the pattern is found −

Sr.No.  Method & Description

1. public boolean lookingAt()
   Attempts to match the input sequence, starting at the beginning of the region, against the pattern.

2. public boolean find()
   Attempts to find the next subsequence of the input sequence that matches the pattern.

3. public boolean find(int start)
   Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.

4. public boolean matches()
   Attempts to match the entire region against the pattern.

Replacement Methods

Replacement methods are useful methods for replacing text in an input string −

Sr.No.  Method & Description

1. public Matcher appendReplacement(StringBuffer sb, String replacement)
   Implements a non-terminal append-and-replace step.

2. public StringBuffer appendTail(StringBuffer sb)
   Implements a terminal append-and-replace step.

3. public String replaceAll(String replacement)
   Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.

4. public String replaceFirst(String replacement)
   Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.

5. public static String quoteReplacement(String s)
   Returns a literal replacement String for the specified String. This method produces a String that will work as a literal replacement s in the appendReplacement method of the Matcher class.
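quoteReplacement matters when the replacement text contains "$" or "\", because the replacement methods treat those characters specially ("$5", for instance, is read as a reference to capture group 5). A minimal sketch, with an illustrative class name, helper and sample text:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementSketch {

    // Replaces every match of regex with the replacement taken literally.
    static String replaceLiterally(String input, String regex, String replacement) {
        // Without quoteReplacement, "$" and "\" in the replacement would be
        // interpreted as group references and escapes.
        return Pattern.compile(regex)
                .matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceLiterally("Total: PRICE", "PRICE", "$5")); // Total: $5
    }
}
```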

The start and end Methods

The following example counts the number of times the word "cat" appears in the input string:

Example

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

    private static final String REGEX = "\\bcat\\b";
    private static final String INPUT = "cat cat cat cattie cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(INPUT);   // get a matcher object
        int count = 0;

        while (m.find()) {
            count++;
            System.out.println("Match number " + count);
            System.out.println("start(): " + m.start());
            System.out.println("end(): " + m.end());
        }
    }
}

This will produce the following result −

Output

Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11
Match number 4
start(): 19
end(): 22

You can see that this example uses word boundaries to ensure that the letters "c" "a" "t" are not merely a substring in a longer word. It also gives some useful information about where in the input string the match has occurred.

The start method returns the start index of the previous match, and the end method returns the index of the last character matched, plus one.

The matches and lookingAt Methods

The matches and lookingAt methods both attempt to match an input sequence against a pattern. The difference, however, is that matches requires the entire input sequence to be matched, while lookingAt does not.

Both methods always start at the beginning of the input string (more precisely, at the beginning of the matcher's region). Here is an example showing the difference:

Example

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

    private static final String REGEX = "foo";
    private static final String INPUT = "fooooooooooooooooo";
    private static Pattern pattern;
    private static Matcher matcher;

    public static void main(String[] args) {
        pattern = Pattern.compile(REGEX);
        matcher = pattern.matcher(INPUT);

        System.out.println("Current REGEX is: " + REGEX);
        System.out.println("Current INPUT is: " + INPUT);

        System.out.println("lookingAt(): " + matcher.lookingAt());
        System.out.println("matches(): " + matcher.matches());
    }
}

This will produce the following result −

Output

Current REGEX is: foo
Current INPUT is: fooooooooooooooooo
lookingAt(): true
matches(): false

The replaceFirst and replaceAll Methods

The replaceFirst and replaceAll methods replace the text that matches a given regular expression. As their names indicate, replaceFirst replaces the first occurrence, and replaceAll replaces all occurrences.

Here is an example that uses replaceAll:

Example

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

    private static String REGEX = "dog";
    private static String INPUT = "The dog says meow. " + "All dogs say meow.";
    private static String REPLACE = "cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);

        // get a matcher object
        Matcher m = p.matcher(INPUT);
        INPUT = m.replaceAll(REPLACE);
        System.out.println(INPUT);
    }
}

This will produce the following result −

Output

The cat says meow. All cats say meow.

The appendReplacement and appendTail Methods

The Matcher class also provides appendReplacement and appendTail methods for text replacement.

Here is an example that replaces every match of a*b with a dash:

Example

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

    private static String REGEX = "a*b";
    private static String INPUT = "aabfooaabfooabfoob";
    private static String REPLACE = "-";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);

        // get a matcher object
        Matcher m = p.matcher(INPUT);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, REPLACE);
        }
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}

This will produce the following result −

Output

-foo-foo-foo-
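With a fixed replacement string, replaceAll would give the same result. The append methods become useful when the replacement has to be computed from each match. The sketch below capitalizes the first letter of every word; the class name, the helper capitalizeWords and the pattern are illustrative choices:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendReplacementSketch {

    // Upper-cases each lower-case letter that starts a word.
    static String capitalizeWords(String input) {
        Matcher m = Pattern.compile("\\b[a-z]").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // The replacement is derived from the current match;
            // quoteReplacement keeps it from being read as a group reference.
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group().toUpperCase()));
        }
        // Copy whatever follows the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWords("hello big world")); // Hello Big World
    }
}
```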

PatternSyntaxException Class Methods

A PatternSyntaxException is an unchecked exception that indicates a syntax error in a regular expression pattern. The PatternSyntaxException class provides the following methods to help you determine what went wrong −

Sr.No.  Method & Description

1. public String getDescription()
   Retrieves the description of the error.

2. public int getIndex()
   Retrieves the error index.

3. public String getPattern()
   Retrieves the erroneous regular expression pattern.

4. public String getMessage()
   Returns a multi-line string containing the description of the syntax error and its index, the erroneous regular expression pattern, and a visual indication of the error index within the pattern.
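A short sketch of how these methods might be used when a pattern comes from untrusted input. The class name and the helper describe are illustrative, and the exact description text and index depend on the JDK, so they are not shown here:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class PatternSyntaxSketch {

    // Returns "OK" for a valid regex, otherwise a one-line summary of the syntax error.
    static String describe(String regex) {
        try {
            Pattern.compile(regex);
            return "OK";
        } catch (PatternSyntaxException e) {
            return e.getDescription() + " at index " + e.getIndex() + " in " + e.getPattern();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("a*b"));  // OK
        System.out.println(describe("(abc")); // summary of the error for the unclosed group
    }
}
```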

Source: https://www.tutorialspoint.com/java/java_regular_expressions.htm

Java Regex - Matcher

The Java Matcher class (java.util.regex.Matcher) is used to search through a text for multiple occurrences of a regular expression. You can also use a Matcher to search for the same regular expression in different texts.

The Java Matcher class has a lot of useful methods. I will cover the core methods of the Java Matcher class in this tutorial. For a full list, see the official JavaDoc for the Matcher class.

Java Matcher Example

Here is a quick Java Matcher example so you can get an idea of how the Matcher class works:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherExample {

    public static void main(String[] args) {
        String text = "This is the text to be searched " +
                      "for occurrences of the http:// pattern.";

        String patternString = ".*http://.*";

        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(text);
        boolean matches = matcher.matches();
    }
}

First a Pattern instance is created from a regular expression, and from the Pattern instance a Matcher instance is created. Then the matches() method is called on the Matcher instance. The matches() method returns true if the regular expression matches the text, and false if not.

You can do a whole lot more with the Matcher class. The rest is covered throughout the rest of this tutorial. The Pattern class is covered separately in my Java Regex Pattern tutorial.

Creating a Matcher

Creating a Matcher is done via the matcher() method in the Pattern class. Here is an example:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class CreateMatcherExample {

    public static void main(String[] args) {
        String text = "This is the text to be searched " +
                      "for occurrences of the http:// pattern.";

        String patternString = ".*http://.*";

        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(text);
    }
}

At the end of this example the matcher variable will contain a Matcher instance which can be used to match the regular expression used to create it against different text input.

matches()

The matches() method in the Matcher class matches the regular expression against the whole text passed to the Pattern.matcher() method when the Matcher was created. Here is an example:

String text = "This is the text to be searched for occurrences of the http:// pattern.";
String patternString = ".*http://.*";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
boolean matches = matcher.matches();

If the regular expression matches the whole text, then the matches() method returns true. If not, the matches() method returns false.

You cannot use the matches() method to search for multiple occurrences of a regular expression in a text. For that, you need to use the find(), start() and end() methods.

lookingAt()

The lookingAt() method works like the matches() method with one major difference. The lookingAt() method only matches the regular expression against the beginning of the text, whereas matches() matches the regular expression against the whole text. In other words, if the regular expression matches the beginning of a text but not the whole text, lookingAt() will return true, whereas matches() will return false.

Here is an example:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class CreateMatcherExample {

    public static void main(String[] args) {
        String text = "This is the text to be searched " +
                      "for occurrences of the http:// pattern.";

        String patternString = "This is the";

        Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);

        System.out.println("lookingAt = " + matcher.lookingAt());
        System.out.println("matches = " + matcher.matches());
    }
}

This example matches the regular expression against both the beginning of the text, and against the whole text. Matching the regular expression against the beginning of the text (lookingAt()) will return true.

Matching the regular expression against the whole text (matches()) will return false, because the text has more characters than the regular expression accounts for. With matches(), the regular expression must cover the text exactly, with no extra characters before or after the matched part.

find() + start() + end()

The find() method searches for occurrences of the regular expression in the text passed to the Pattern.matcher(text) method when the Matcher was created. If multiple matches can be found in the text, the find() method will find the first, and then for each subsequent call to find() it will move to the next match.

The methods start() and end() will give the indexes into the text where the found match starts and ends. Actually end() returns the index of the character just after the end of the matching section. Thus, you can use the return values of start() and end() inside a String.substring() call.

Here is a Java Matcher find(), start() and end() example:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherFindStartEndExample {

    public static void main(String[] args) {
        String text = "This is the text which is to be searched " +
                      "for occurrences of the word 'is'.";

        String patternString = "is";

        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(text);

        int count = 0;
        while (matcher.find()) {
            count++;
            System.out.println("found: " + count + " : "
                    + matcher.start() + " - " + matcher.end());
        }
    }
}

This example will find the pattern "is" four times in the searched string. The first match is the "is" inside "This", since the pattern does not require word boundaries. The output printed will be this:

found: 1 : 2 - 4
found: 2 : 5 - 7
found: 3 : 23 - 25
found: 4 : 70 - 72

reset()

The Matcher reset() method resets the matching state internally in the Matcher. In case you have started matching occurrences in a string via the find() method, the Matcher will internally keep a state about how far it has searched through the input text. By calling reset() the matching will start from the beginning of the text again.

There is also a reset(CharSequence) method. This method resets the Matcher, and makes the Matcher search through the CharSequence passed as parameter, instead of the CharSequence the Matcher was originally created with.
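A small sketch of both variants. The class name MatcherResetSketch, the helper nextStart and the sample texts are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherResetSketch {

    // Returns the start index of the next match, or -1 if there is none.
    static int nextStart(Matcher matcher) {
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        Matcher matcher = Pattern.compile("is").matcher("This is it");

        System.out.println(nextStart(matcher)); // 2
        System.out.println(nextStart(matcher)); // 5

        // reset() discards the search position, so find() starts over.
        matcher.reset();
        System.out.println(nextStart(matcher)); // 2

        // reset(CharSequence) also swaps in a new input text.
        matcher.reset("is this");
        System.out.println(nextStart(matcher)); // 0
    }
}
```

Reusing one Matcher this way avoids creating a new Matcher for every input text.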

group()

Imagine you are searching through a text for URLs, and you would like to extract the found URLs out of the text. Of course you could do this with the start() and end() methods, but it is easier to do so with the group functions.

Groups are marked with parentheses in the regular expression. For instance:

(John)

This regular expression matches the text John. The parentheses are not part of the text that is matched. The parentheses mark a group. When a match is found in a text, you can get access to the part of the text that matched the group.

You access a group using the group(int groupNo) method. A regular expression can have more than one group. Each group is thus marked with a separate set of parentheses. To get access to the text that matched the subpart of the expression in a specific group, pass the number of the group to the group(int groupNo) method.

The group with number 0 is always the whole regular expression. Groups marked by parentheses are numbered starting from 1.

Here is an example:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherGroupExample {

    public static void main(String[] args) {
        String text = "John writes about this, and John writes about that," +
                      " and John writes about everything. ";

        String patternString1 = "(John)";

        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println("found: " + matcher.group(1));
        }
    }
}

This example searches the text for occurrences of the word John. For each match found, group number 1 is extracted, which is what matched the group marked with parentheses. The output of the example is:

found: John
found: John
found: John

Multiple Groups

As mentioned earlier, a regular expression can have multiple groups. Here is a regular expression illustrating that:

(John) (.+?)

This expression matches the text John followed by a space, and then one or more characters. You cannot see it in the expression above, but there is a space after the last group too.

This expression contains a few characters with special meanings in a regular expression. The . means "any character". The + means "one or more times", and relates to the . (any character, one or more times). The ? means "match as small a number of characters as possible".

Here is a full code example:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherGroupExample {

    public static void main(String[] args) {
        String text = "John writes about this, and John Doe writes about that," +
                      " and John Wayne writes about everything.";

        String patternString1 = "(John) (.+?) ";

        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println("found: " + matcher.group(1) +
                               " " + matcher.group(2));
        }
    }
}

Notice the references to the two groups, group(1) and group(2). The characters matched by those groups are printed to System.out. Here is what the example prints out:

found: John writes
found: John Doe
found: John Wayne

Groups Inside Groups

It is possible to have groups inside groups in a regular expression. Here is an example:

((John) (.+?))

Notice how the two groups from the examples earlier are now nested inside a larger group. (Again, you cannot see the space at the end of the expression, but it is there.)

When groups are nested inside each other, they are numbered based on when the left parenthesis of the group is met. Thus, group 1 is the big group. Group 2 is the group with the expression John inside. Group 3 is the group with the expression .+? inside. This is important to know when you need to reference the groups via the group(int groupNo) method.

Here is an example that uses the above nested groups:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherGroupsExample {

    public static void main(String[] args) {
        String text = "John writes about this, and John Doe writes about that," +
                      " and John Wayne writes about everything.";

        String patternString1 = "((John) (.+?)) ";

        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println("found: <" + matcher.group(1) +
                               "> <" + matcher.group(2) +
                               "> <" + matcher.group(3) + ">");
        }
    }
}

Here is the output from the above example:

found: <John writes> <John> <writes>
found: <John Doe> <John> <Doe>
found: <John Wayne> <John> <Wayne>

Notice how the value matched by the first group (the outer group) contains the values matched by both of the inner groups.

replaceAll() + replaceFirst()

The Matcher replaceAll() and replaceFirst() methods can be used to replace parts of the string the Matcher is searching through. The replaceAll() method replaces all matches of the regular expression. The replaceFirst() method only replaces the first match.

Before any matching is carried out, the Matcher is reset, so that matching starts from the beginning of the input text.

Here are two examples:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherReplaceExample {

    public static void main(String[] args) {
        String text = "John writes about this, and John Doe writes about that," +
                      " and John Wayne writes about everything.";

        String patternString1 = "((John) (.+?)) ";

        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);

        String replaceAll = matcher.replaceAll("Joe Blocks ");
        System.out.println("replaceAll = " + replaceAll);

        String replaceFirst = matcher.replaceFirst("Joe Blocks ");
        System.out.println("replaceFirst = " + replaceFirst);
    }
}

And here is what the example outputs, one line per println() call:

replaceAll = Joe Blocks about this, and Joe Blocks writes about that, and Joe Blocks writes about everything.
replaceFirst = Joe Blocks about this, and John Doe writes about that, and John Wayne writes about everything.

Notice how the first string printed has all occurrences of John with a word after replaced with the string Joe Blocks. The second string only has the first occurrence replaced.

appendReplacement() + appendTail()

The Matcher appendReplacement() and appendTail() methods are used to replace string tokens in an input text, and append the resulting string to a StringBuffer.

When you have found a match using the find() method, you can call appendReplacement(). Doing so results in the characters from the input text being appended to the StringBuffer, and the matched text being replaced. Only the characters starting from the end of the last match, and until just before the matched characters, are copied.

The appendReplacement() method keeps track of what has been copied into the StringBuffer, so you can continue searching for matches using find() until no more matches are found in the input text.

Once the last match has been found, a part of the input text will still not have been copied into the StringBuffer. These are the characters from the end of the last match until the end of the input text. By calling appendTail() you can append these last characters to the StringBuffer too.

Here is an example:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherReplaceExample {

    public static void main(String[] args) {
        String text = "John writes about this, and John Doe writes about that," +
                      " and John Wayne writes about everything.";

        String patternString1 = "((John) (.+?)) ";

        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);
        StringBuffer stringBuffer = new StringBuffer();

        while (matcher.find()) {
            matcher.appendReplacement(stringBuffer, "Joe Blocks ");
            System.out.println(stringBuffer.toString());
        }
        matcher.appendTail(stringBuffer);

        System.out.println(stringBuffer.toString());
    }
}

Notice how appendReplacement() is called inside the while(matcher.find()) loop, and appendTail() is called just after the loop.

The output from this example is four lines, one per println() call:

Joe Blocks
Joe Blocks about this, and Joe Blocks
Joe Blocks about this, and Joe Blocks writes about that, and Joe Blocks
Joe Blocks about this, and Joe Blocks writes about that, and Joe Blocks writes about everything.

As you can see, the StringBuffer is built up by characters and replacements from the input text, one match at a time.

Source: http://tutorials.jenkov.com/java-regex/matcher.html
Java Regular Expressions

What is a Regular Expression?

A regular expression is a sequence of characters that forms a search pattern. When you search for data in a text, you can use this search pattern to describe what you are searching for.

A regular expression can be a single character, or a more complicated pattern.

Regular expressions can be used to perform all types of text search and text replace operations.

Java does not have a built-in Regular Expression class, but we can import the java.util.regex package to work with regular expressions. The package includes the following classes:

  • Pattern Class - Defines a pattern (to be used in a search)
  • Matcher Class - Used to search for the pattern
  • PatternSyntaxException Class - Indicates syntax error in a regular expression pattern

Example

Find out if there are any occurrences of the word "w3schools" in a sentence:
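A minimal sketch of such a search, written to be consistent with the explanation that follows. The sentence text, the printed messages, the class name FindWordSketch and the helper containsWord are illustrative assumptions:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindWordSketch {

    // Returns true if "w3schools" occurs anywhere in the sentence, ignoring case.
    static boolean containsWord(String sentence) {
        Pattern pattern = Pattern.compile("w3schools", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(sentence);
        return matcher.find();
    }

    public static void main(String[] args) {
        if (containsWord("Visit W3Schools!")) {
            System.out.println("Match found");
        } else {
            System.out.println("Match not found");
        }
    }
}
```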

Example Explained

In this example, the word "w3schools" is being searched for in a sentence.

First, the pattern is created using the Pattern.compile() method. The first parameter indicates which pattern is being searched for and the second parameter is a flag indicating that the search should be case-insensitive. The second parameter is optional.

The matcher() method is used to search for the pattern in a string. It returns a Matcher object which contains information about the search that was performed.

The find() method returns true if the pattern was found in the string and false if it was not found.


Flags

Flags in the compile() method change how the search is performed. Here are a few of them:

  • Pattern.CASE_INSENSITIVE - The case of letters will be ignored when performing a search.
  • Pattern.LITERAL - Special characters in the pattern will not have any special meaning and will be treated as ordinary characters when performing a search.
  • Pattern.UNICODE_CASE - Use it together with the CASE_INSENSITIVE flag to also ignore the case of letters outside of the English alphabet
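The effect of these flags can be seen in a short sketch. The class name, the helper found and the sample strings are illustrative; the accented letters are written as Unicode escapes so the example does not depend on the source file encoding:

```java
import java.util.regex.Pattern;

public class PatternFlagsSketch {

    // Returns true if the regex, compiled with the given flags, is found in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // Without LITERAL, '.' matches any character.
        System.out.println(found("a.c", 0, "abc"));               // true
        // With LITERAL, '.' only matches a real dot.
        System.out.println(found("a.c", Pattern.LITERAL, "abc")); // false
        System.out.println(found("a.c", Pattern.LITERAL, "a.c")); // true

        String lower = "\u00e9t\u00e9"; // "été"
        String upper = "\u00c9T\u00c9"; // "ÉTÉ"
        // CASE_INSENSITIVE alone only folds ASCII letters.
        System.out.println(found(lower, Pattern.CASE_INSENSITIVE, upper)); // false
        // Adding UNICODE_CASE folds the accented letters as well.
        System.out.println(found(lower, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE, upper)); // true
    }
}
```

Flags are combined with the bitwise OR operator, as in the last call.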

Regular Expression Patterns

The first parameter of the Pattern.compile() method is the pattern. It describes what is being searched for.

Brackets are used to find a range of characters:

Expression    Description
[abc]         Find one character from the options between the brackets
[^abc]        Find one character NOT between the brackets
[0-9]         Find one character from the range 0 to 9

Metacharacters

Metacharacters are characters with a special meaning:

Metacharacter    Description
|                Find a match for any one of the patterns separated by | as in: cat|dog|fish
.                Find just one instance of any character
^                Finds a match at the beginning of a string as in: ^Hello
$                Finds a match at the end of the string as in: World$
\d               Find a digit
\s               Find a whitespace character
\b               Find a match at the beginning of a word like this: \bWORD, or at the end of a word like this: WORD\b
\uxxxx           Find the Unicode character specified by the hexadecimal number xxxx

Quantifiers

Quantifiers define quantities:

Quantifier    Description
n+            Matches any string that contains at least one n
n*            Matches any string that contains zero or more occurrences of n
n?            Matches any string that contains zero or one occurrences of n
n{x}          Matches any string that contains a sequence of X n's
n{x,y}        Matches any string that contains a sequence of X to Y n's
n{x,}         Matches any string that contains a sequence of at least X n's

Note: If your expression needs to search for one of the special characters you can use a backslash ( \ ) to escape them. In Java, backslashes in strings need to be escaped themselves, so two backslashes are needed to escape special characters. For example, to search for a literal question mark you can use the expression "\\?", and to match a run of one or more question marks you can use "\\?+".
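The escaping rules can be tried out with a small sketch. The class name, the helper countMatches and the sample text are illustrative; Pattern.quote() is a standard alternative to escaping by hand:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeSketch {

    // Counts how many times the regex is found in the input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Really??? Yes?";

        // "\\?" is the regex \? and matches one literal question mark at a time.
        System.out.println(countMatches("\\?", text));  // 4
        // "\\?+" treats each run of question marks as a single match.
        System.out.println(countMatches("\\?+", text)); // 2
        // Pattern.quote() escapes a whole string without manual backslashes.
        System.out.println(countMatches(Pattern.quote("?"), text)); // 4
    }
}
```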



Source: https://www.w3schools.com/java/java_regex.asp
