Get Start End Offset of Named Group in JDK7


The Problem
We want the start and end offsets of a named group, but in JDK 7 Matcher.start() and Matcher.end() don't accept a group name as their parameter.

JDK 7 added support for named capturing groups:
(1) (?<NAME>X) defines a named group "NAME".
(2) \\k<NAME> (as written in a Java string literal) back-references the named group "NAME".
(3) ${NAME} references the captured group in the matcher's replacement string.
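The three constructs above can be exercised together in a short sketch. The class and method names (NamedGroupSyntaxDemo, findDoubledWord, isoToDayFirst) are hypothetical and only for illustration; the regex features themselves are the JDK 7 ones listed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupSyntaxDemo {
  // (1) (?<word>...) defines a group named "word";
  // (2) \k<word> back-references the text that "word" captured,
  //     so this finds a word immediately repeated, e.g. "is is".
  static String findDoubledWord(String text) {
    Matcher m = Pattern.compile("\\b(?<word>\\w+) \\k<word>\\b").matcher(text);
    return m.find() ? m.group("word") : null;
  }

  // (3) ${name} refers to a named group inside a replacement string.
  static String isoToDayFirst(String date) {
    return date.replaceAll("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})",
        "${day}/${month}/${year}");
  }

  public static void main(String[] args) {
    System.out.println(findDoubledWord("this is is a test")); // prints "is"
    System.out.println(isoToDayFirst("2013-05-27"));          // prints "27/05/2013"
  }
}
```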

Matcher.group(String name) returns the input subsequence captured by the given named group, but JDK 7 has no corresponding start(String name) or end(String name).
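A named group is still an ordinary capturing group with a number, assigned by counting opening parentheses from left to right, so in JDK 7 start(int) and end(int) already work once that number is known. The sketch below hard-codes the number 2 for the named group "capture" in the example pattern used throughout this post; the helper name firstMatchSpan is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupNumberDemo {
  // Hypothetical helper: returns {start, end} of numbered group `group`
  // in the first match of `regex` in `input`, or null if nothing matches.
  static int[] firstMatchSpan(String regex, String input, int group) {
    Matcher matcher = Pattern.compile(regex).matcher(input);
    if (!matcher.find()) {
      return null;
    }
    return new int[] {matcher.start(group), matcher.end(group)};
  }

  public static void main(String[] args) {
    // group 1 = ((?<capture>abc).*d), group 2 = (?<capture>abc), group 3 = (ef)
    String regex = "((?<capture>abc).*d)(ef)";
    Matcher matcher = Pattern.compile(regex).matcher("abcxxdef");
    if (matcher.find()) {
      // the named group and group number 2 capture the same text
      System.out.println(matcher.group("capture").equals(matcher.group(2))); // true
    }
    int[] span = firstMatchSpan(regex, "abcxxdef", 2);
    System.out.println(span[0] + ":" + span[1]); // 0:3
  }
}
```

The catch is that hard-coding the number breaks as soon as someone adds a group in front of it, which is why the rest of this post looks up the number from the name.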

The Solution
Looking at the JDK 7 source, this is how Matcher.group(String name) is implemented:
public String group(String name) {
    if (name == null)
        throw new NullPointerException("Null group name");
    if (first < 0)
        throw new IllegalStateException("No match found");
    if (!parentPattern.namedGroups().containsKey(name))
        throw new IllegalArgumentException("No group with name <" + name + ">");
    int group = parentPattern.namedGroups().get(name);
    if ((groups[group*2] == -1) || (groups[group*2+1] == -1))
        return null;
    return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
}
Matcher.group(String name) calls parentPattern.namedGroups().get(name) to look up the group number of the named group. In Pattern, however, namedGroups() is not public; it is package-private:
Map<String, Integer> namedGroups() {
    if (namedGroups == null)
        namedGroups = new HashMap<>(2);
    return namedGroups;
}
We can't call it directly, but we can use Java reflection to call this package-private method.

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public void testGetNamedGroupPositionInJDK7() throws Exception {
  Pattern pattern = Pattern.compile("((?<capture>abc).*d)(ef)");
  Integer groupPos = getNamedGroupPositionInJDK7(pattern, "capture");
  if (groupPos == null) {
    System.out.println("Doesn't contain named group: capture, the pattern: "
        + pattern.toString());
    return; // unboxing a null groupPos below would throw NPE
  }
  Matcher matcher = pattern.matcher("abcxxdef");
  while (matcher.find()) {
    // same text as matcher.group("capture")
    String matchedText = matcher.group(groupPos);
    System.out.println(matchedText + " " + matcher.start(groupPos)
        + ":" + matcher.end(groupPos)); // prints "abc 0:3"
  }
}

// Returns Integer, not int: the map returns null if the regex doesn't
// contain the named group, and unboxing that null would throw NPE.
@SuppressWarnings("unchecked")
private Integer getNamedGroupPositionInJDK7(Pattern pattern,
    String namedGroup) throws NoSuchMethodException,
    IllegalAccessException, InvocationTargetException {
  // Pattern.namedGroups() is package-private, so look it up reflectively
  Method namedGroupsMethod = Pattern.class.getDeclaredMethod("namedGroups");
  namedGroupsMethod.setAccessible(true);

  Map<String, Integer> namedGroups = (Map<String, Integer>) namedGroupsMethod
      .invoke(pattern);
  return namedGroups.get(namedGroup);
}
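On JDK 16 and later, the setAccessible(true) call into java.util.regex fails with InaccessibleObjectException unless the JVM is started with --add-opens java.base/java.util.regex=ALL-UNNAMED (and on JDK 8+ the JDK 8 API below is the better choice anyway). If reflection is not an option, one swapped-in technique, not the one used above, is to scan the regex yourself and number the capturing groups in the same left-to-right order Pattern uses. The helper below (NamedGroupScanner.namedGroupNumbers, a hypothetical name) is a simplified sketch under stated assumptions: it handles backslash escapes, flat character classes and non-capturing (?...) constructs, but not \Q...\E quoting, nested classes such as [a-z&&[^aeiou]], or comments enabled by Pattern.COMMENTS.

```java
import java.util.HashMap;
import java.util.Map;

public class NamedGroupScanner {
  // Hypothetical helper: maps each group name in `regex` to its group number
  // by counting capturing '(' from left to right. Assumes `regex` compiles.
  static Map<String, Integer> namedGroupNumbers(String regex) {
    Map<String, Integer> numbers = new HashMap<>();
    int groupCount = 0;
    boolean inClass = false;
    for (int i = 0; i < regex.length(); i++) {
      char c = regex.charAt(i);
      if (c == '\\') {
        i++; // skip the escaped character, e.g. \( or \]
      } else if (inClass) {
        if (c == ']') {
          inClass = false; // end of a (non-nested) character class
        }
      } else if (c == '[') {
        inClass = true; // '(' inside [...] is a literal, not a group
      } else if (c == '(') {
        if (!regex.startsWith("?", i + 1)) {
          groupCount++; // plain capturing group: (X)
        } else if (regex.startsWith("?<", i + 1) && i + 3 < regex.length()
            && isAsciiLetter(regex.charAt(i + 3))) {
          // named group (?<name>X): the name starts with a letter, which
          // distinguishes it from lookbehind (?<=X) and (?<!X)
          int close = regex.indexOf('>', i + 3);
          groupCount++;
          numbers.put(regex.substring(i + 3, close), groupCount);
          i = close;
        }
        // any other (?...) construct, e.g. (?:X) or (?=X), does not capture
      }
    }
    return numbers;
  }

  private static boolean isAsciiLetter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
  }
}
```

For the example pattern, namedGroupNumbers("((?<capture>abc).*d)(ef)").get("capture") yields 2, which can then be passed to matcher.start(int) and matcher.end(int) exactly like the reflective result.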
Get Start End Offset of Named Group in JDK8
JDK 8 addresses this by adding Matcher.start(String name) and Matcher.end(String name), which return the start and end offsets of a named group directly.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public void testGetNamedGroupPositionInJDK8() throws Exception {
  Pattern pattern = Pattern.compile("((?<capture>abc).*d)(ef)");
  Matcher matcher = pattern.matcher("abcxxdef");
  while (matcher.find()) {
    // if the regex doesn't contain the named group, this throws
    // IllegalArgumentException: No group with name <capture>
    System.out.println(matcher.group("capture") + " "
        + matcher.start("capture") + ":" + matcher.end("capture")); // prints "abc 0:3"
  }
}
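Two edge cases of the JDK 8 methods are worth seeing side by side: a named group that exists but did not take part in the match gives -1 from start(name) and end(name) (mirroring the -1 check in the group(String name) source quoted earlier, which returns null), while an unknown name throws IllegalArgumentException. The helper firstMatchOffsets is a hypothetical name used only for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupOffsetsJdk8 {
  // Hypothetical helper: "start:end" of the named group in the first match,
  // or null if the regex doesn't match or defines no group with that name.
  static String firstMatchOffsets(String regex, String input, String name) {
    Matcher matcher = Pattern.compile(regex).matcher(input);
    if (!matcher.find()) {
      return null;
    }
    try {
      // -1:-1 if the group exists but did not participate in this match
      return matcher.start(name) + ":" + matcher.end(name);
    } catch (IllegalArgumentException e) {
      return null; // "No group with name <...>"
    }
  }

  public static void main(String[] args) {
    String regex = "(?<digits>\\d+)|(?<letters>[a-z]+)";
    System.out.println(firstMatchOffsets(regex, "abc", "letters")); // 0:3
    System.out.println(firstMatchOffsets(regex, "abc", "digits"));  // -1:-1
    System.out.println(firstMatchOffsets(regex, "abc", "missing")); // null
  }
}
```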
