Thursday, September 19, 2013

Regexp (regular expression) matching in ABAP

 

Hi folks,
while searching for an ABAP pattern across the whole ABAP repository with the interesting "ABAP_SOURCE_CODE_SCAN" report, I realized that SAP has built regexp pattern matching into ABAP.

I had been waiting a long time for such a feature.

An overview of the underlying ABAP syntax can be found by looking at the demo source code, and it is quite simple:

          IF nocase = 'X'.
            FIND REGEX regex IN TABLE result_it IGNORING CASE
                 SUBMATCHES sub1 sub2 sub3 sub4 sub5 sub6.

            IF first = 'X'.
              REPLACE REGEX regex IN TABLE result_it
                      WITH new_marked IGNORING CASE.

            ELSE. " all = 'X'
              REPLACE ALL OCCURRENCES OF REGEX regex IN TABLE result_it
                      WITH new_marked IGNORING CASE.

            ENDIF.
          ELSE. " case = 'X'
            FIND REGEX regex IN TABLE result_it
                 SUBMATCHES sub1 sub2 sub3 sub4 sub5 sub6.

            IF first = 'X'.
              REPLACE REGEX regex IN TABLE result_it
                      WITH new_marked.
            ELSE. " all = 'X'
              REPLACE ALL OCCURRENCES OF REGEX regex IN TABLE result_it
                      WITH new_marked.

            ENDIF.
          ENDIF.

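Here, text lines in ABAP are represented as a "table of string data elements" (result_it), which is quite common. FIND REGEX searches that table and fills up to six submatch variables (sub1 to sub6), while REPLACE REGEX substitutes either the first match or all matches, optionally ignoring case.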
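
To try the table variant on its own, here is a minimal sketch. The report name z_regex_table_demo, the lt_/lv_ variables and the sample lines are made up for illustration; only the FIND ... IN TABLE statement with its MATCH and SUBMATCHES additions is the built-in part.

          REPORT z_regex_table_demo.

          DATA: lt_lines TYPE TABLE OF string,
                lv_line  TYPE i,
                lv_off   TYPE i,
                lv_len   TYPE i,
                lv_sub1  TYPE string.

          " Hypothetical sample data: two text lines, only the second has digits.
          APPEND 'first line without match' TO lt_lines.
          APPEND 'order 4711 was shipped'   TO lt_lines.

          " Search all lines for the first run of digits.
          " The parentheses make the digits available as submatch 1.
          FIND FIRST OCCURRENCE OF REGEX '(\d+)' IN TABLE lt_lines
               MATCH LINE   lv_line
               MATCH OFFSET lv_off
               MATCH LENGTH lv_len
               SUBMATCHES   lv_sub1.

          " sy-subrc = 0 means a match was found.
          IF sy-subrc = 0.
            WRITE: / 'Line:', lv_line, 'Offset:', lv_off, 'Length:', lv_len.
            WRITE: / 'Submatch 1:', lv_sub1.
          ENDIF.

With this sample data the match should be reported in line 2 at offset 6, with 4711 as the first submatch.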

If you are interested in regexp, there is an interesting tutorial here: regexp tutorial

The syntax should be POSIX compliant, with the usual shorthand operators such as \w, \s and \b also available. I tried some simple stuff and it works just fine.

SAP has released a nice toy test program, actually called
DEMO_REGEX_TOY

Here is a screenshot:



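So what does \b(\w+)\s+\1\b mean?

\b matches a word boundary, here the beginning of a word
(\w+) matches 1 or more word characters (letters, digits, underscore) and captures them as the first subpattern
\s+ means 1 or more whitespace chars
\1 is a backreference to whatever the first subpattern matched
\b is again a word boundary, here the end of the word.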

So basically you ask the regexp engine to match
a word made of 1 or more word characters, followed by one or more whitespace chars, followed by the same text again (whatever the (\w+) subpattern matched), ending at a word boundary. In other words, it finds doubled words such as "the the".
Funny, isn't it?
But quite useful and powerful!
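
If you want to try the doubled-word pattern outside the toy program, something like the following sketch should do. The report name z_regex_doubled_word, the variable names and the sample sentence are my own assumptions, not taken from the SAP demo.

          REPORT z_regex_doubled_word.

          DATA: lv_text TYPE string,
                lv_word TYPE string.

          " Hypothetical sample text containing the doubled word "is".
          lv_text = 'This is is a test'.

          " Find the first doubled word; submatch 1 receives the word itself.
          " The "is" inside "This" does not count, because \b requires
          " the match to start at a word boundary.
          FIND REGEX '\b(\w+)\s+\1\b' IN lv_text SUBMATCHES lv_word.
          IF sy-subrc = 0.
            WRITE: / 'Repeated word:', lv_word.
          ENDIF.

          " Collapse every doubled word into a single one.
          " In the replacement text, $1 refers to the first subpattern.
          REPLACE ALL OCCURRENCES OF REGEX '\b(\w+)\s+\1\b' IN lv_text WITH '$1'.
          WRITE: / lv_text.

For the sample sentence this should report "is" as the repeated word and leave lv_text as "This is a test".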