- 1. Regex course – part one. Basic concepts.
- 2. Regex course – part two. Writing more elegant and precise patterns.
- 3. Regex course – part three. Grouping and using ES6 features.
- 4. Regex course – part four. Avoiding catastrophic backtracking using lookahead.
We have covered quite a few features of regex so far. There is a lot more, though. Today we will deal with more advanced concepts, like grouping, and cover more of the RegExp object features in JavaScript. We will also learn how to use some of the features that ES6 brought us. Let’s go!
exec
exec is a method that searches a string for a match, similar to the test method, but it returns a result array (or null) instead of a boolean. The array has additional properties, such as index and input.
    const string = 'fileName.png, fileName2.png, fileName3.png';
    // The dot is escaped so that it matches a literal "." only
    const regexp = /fileName[0-9]?\.png/g;

    regexp.exec(string);
    // [
    //   0: "fileName.png",
    //   index: 0,
    //   input: "fileName.png, fileName2.png, fileName3.png"
    // ]
The index property is the position of the match, and input is the string that was searched. Note that the pattern uses the global flag, which was mentioned in the first part of the course. Thanks to it, we can find more than one match in the string by calling exec repeatedly: after each successful match, exec sets the lastIndex property of the RegExp object to the position right after that match, and the next call continues from there.
    let resultArray;
    while ((resultArray = regexp.exec(string)) !== null) {
      console.log(resultArray[0], regexp.lastIndex);
    }
    // fileName.png 12
    // fileName2.png 27
    // fileName3.png 42
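The loop above can be wrapped in a small helper that collects every match together with its position. This is only a sketch: the name collectMatches is hypothetical, and it assumes the pattern carries the global flag (without it, exec would keep returning the first match and the loop would never end).

```javascript
// Hypothetical helper: gathers all matches of a global pattern in a string.
function collectMatches(regexp, string) {
  if (!regexp.global) {
    throw new Error('collectMatches expects a pattern with the g flag');
  }
  regexp.lastIndex = 0; // start from the beginning, whatever happened before
  const matches = [];
  let result;
  while ((result = regexp.exec(string)) !== null) {
    matches.push({ text: result[0], index: result.index });
    // An empty match does not move lastIndex, so nudge it to avoid looping forever
    if (result[0] === '') {
      regexp.lastIndex++;
    }
  }
  return matches;
}

const foundFiles = collectMatches(
  /fileName[0-9]?\.png/g,
  'fileName.png, fileName2.png, fileName3.png'
);
console.log(foundFiles.map((m) => `${m.text} @ ${m.index}`));
```

Resetting lastIndex at the start is a design choice: it makes the helper give the same answer no matter how the pattern was used earlier.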
Grouping in regex
With regular expressions, we can not only check a string for matches but also extract certain information while ignoring unnecessary characters. To do this, we use grouping with parentheses.
    function getDateFromString(dateString) {
      const regexp = /([0-9]{2})-([0-9]{2})-([0-9]{4})/;
      const result = regexp.exec(dateString);
      if (result) {
        return {
          day: result[1],
          month: result[2],
          year: result[3]
        };
      }
    }

    getDateFromString('14-05-2018');
    // {
    //   day: '14',
    //   month: '05',
    //   year: '2018'
    // }
In this case, we extracted three groups of characters and ignored the dashes. Just note that result[0] will be the full string of characters matched.
There is a named capture groups proposal that has already reached stage 4 (it became part of ES2018) and proves helpful in use cases such as the one above. It is nicely described in an article on the 2ality blog by Axel Rauschmayer.
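To give an idea of what named groups look like, here is a minimal sketch of the same date example. The function name getNamedDateFromString is hypothetical, and the code assumes an environment that supports the (?<name>...) syntax.

```javascript
// Sketch: the same date pattern, with names instead of group numbers.
function getNamedDateFromString(dateString) {
  const regexp = /(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})/;
  const result = regexp.exec(dateString);
  if (result) {
    // Named groups are exposed on result.groups
    const { day, month, year } = result.groups;
    return { day, month, year };
  }
  return null;
}

const namedDate = getNamedDateFromString('14-05-2018');
console.log(namedDate);
```

The numbered entries result[1] to result[3] are still filled in, so names can be introduced without breaking code that relies on positions.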
Nested groups
You can actually nest groups:
    function getYearFromString(dateString) {
      const regexp = /[0-9]{2}-[0-9]{2}-([0-9]{2}([0-9]{2}))/;
      const result = regexp.exec(dateString);
      if (result) {
        return {
          year: result[1],
          yearShort: result[2]
        };
      }
    }

    getYearFromString('14-05-2018');
    // {
    //   year: '2018',
    //   yearShort: '18'
    // }
Here, in the ([0-9]{2}([0-9]{2})) part of our pattern, we nest one group inside the other. Thanks to that, we get both the long and the short form of the year.
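When groups are nested, it helps to know how they are numbered: by the position of their opening parenthesis, counted from the left. The sketch below uses an illustrative pattern (not one from the examples above) to show the order.

```javascript
// Groups are numbered in the order their opening parentheses appear.
const nestedPattern = /(([0-9]{2})-([0-9]{2}))-(([0-9]{2})([0-9]{2}))/;
const nestedResult = nestedPattern.exec('14-05-2018');

console.log(nestedResult[1]); // '14-05' (outer group, opens first)
console.log(nestedResult[2]); // '14'
console.log(nestedResult[3]); // '05'
console.log(nestedResult[4]); // '2018' (second outer group)
console.log(nestedResult[5]); // '20'
console.log(nestedResult[6]); // '18'
```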
Conditional patterns
Another useful feature is alternation, which works like an OR. We write it with the pipe character:
    function doYearsMatch(firstDateString, secondDateString) {
      const execResult = /[0-9]{2}-[0-9]{2}-([0-9]{4})/.exec(firstDateString);
      if (execResult) {
        const year = execResult[1];
        // Last two digits of the year, e.g. '18' for '2018'
        const yearShort = year.slice(2);
        return new RegExp(`[0-9]{2}-[0-9]{2}-(${year}|${yearShort})`).test(secondDateString);
      }
      return false;
    }

    doYearsMatch('14-05-2018', '12-02-2018'); // true
    doYearsMatch('14-05-2018', '24-04-18'); // true
In our pattern, (${year}|${yearShort}) makes the years match even if the second one is given in its short form.
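One caveat: the pattern above is not anchored, so the short alternative also matches a longer year that merely starts with the same digits, such as '12-02-1899'. A possible stricter variant, sketched here under the assumption that each string contains nothing but the date, adds ^ and $ around the pattern. The name doYearsMatchExactly is hypothetical.

```javascript
// Sketch of a stricter variant: both strings must consist of the date only.
function doYearsMatchExactly(firstDateString, secondDateString) {
  const execResult = /^[0-9]{2}-[0-9]{2}-([0-9]{4})$/.exec(firstDateString);
  if (!execResult) {
    return false;
  }
  const year = execResult[1];
  const yearShort = year.slice(2);
  // ^ and $ force the alternation to cover the whole year part
  const pattern = new RegExp(`^[0-9]{2}-[0-9]{2}-(${year}|${yearShort})$`);
  return pattern.test(secondDateString);
}

console.log(doYearsMatchExactly('14-05-2018', '12-02-2018')); // true
console.log(doYearsMatchExactly('14-05-2018', '24-04-18')); // true
console.log(doYearsMatchExactly('14-05-2018', '12-02-1899')); // false
```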
Capture all
While working with groups, there is a particular one that might come in handy: (.*)
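    function getResolution(resolutionString) {
      // The first group is lazy, (.*?), so that an optional space before "x"
      // is matched by " ?" instead of ending up in the captured width
      const execResult = /(.*?) ?x ?(.*)/.exec(resolutionString);
      if (execResult) {
        return {
          width: execResult[1],
          height: execResult[2]
        };
      }
    }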
    getResolution('1024x768');
    // {
    //   width: '1024',
    //   height: '768'
    // }
Thanks to the optional spaces ( ?) around the x, it also works if there are additional spaces:
    getResolution('1920 x 1080');
    // {
    //   width: '1920',
    //   height: '1080'
    // }
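A detail worth checking when using (.*): it is greedy, so it takes as many characters as it can, including a space that an optional " ?" right after it could have consumed. The sketch below compares the greedy form with the lazy form (.*?) and with a digits-only group; the variable names are made up for this illustration.

```javascript
const resolutionInput = '1920 x 1080';

// Greedy: the first group swallows the space before "x"
const greedyResult = /(.*) ?x ?(.*)/.exec(resolutionInput);
// Lazy: the first group stops as early as possible, leaving the space to " ?"
const lazyResult = /(.*?) ?x ?(.*)/.exec(resolutionInput);
// Digits only: the group cannot contain a space at all
const digitsResult = /([0-9]+) ?x ?([0-9]+)/.exec(resolutionInput);

console.log(JSON.stringify(greedyResult[1])); // "1920 " (with a trailing space)
console.log(JSON.stringify(lazyResult[1])); // "1920"
console.log(JSON.stringify(digitsResult[1])); // "1920"
```

When the expected content is known, as with digits here, a specific character class is usually the safer choice than a capture-all group.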
Sticky flag
As you’ve already seen, the RegExp object has a property called lastIndex. When the search is global (with the appropriate flag), it is used to continue the pattern matching in the right place. With the sticky flag, y, introduced in ES6, we can force the match to be attempted only at the index stored in lastIndex.
    function getDateFromString(dateString) {
      const regexp = /([0-9]{2})-([0-9]{2})-([0-9]{4})/y;
      regexp.lastIndex = 14;
      const result = regexp.exec(dateString);
      if (result) {
        return {
          day: result[1],
          month: result[2],
          year: result[3]
        };
      }
    }

    getDateFromString('Current date: 14-05-2018');
Remember that running a sticky pattern against a string (for example with exec) changes the lastIndex property, so if you would like it to stay the same between multiple sticky searches, set it again before each one. If the match fails, lastIndex is reset to 0.
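The way lastIndex moves during sticky matching can be observed directly. The following is a small sketch using the same date string; the variable names are only illustrative.

```javascript
const stickyDate = /[0-9]{2}-[0-9]{2}-[0-9]{4}/y;
const stickyText = 'Current date: 14-05-2018';

stickyDate.lastIndex = 14;
const firstAttempt = stickyDate.test(stickyText); // true: the date starts at index 14
const indexAfterMatch = stickyDate.lastIndex; // 24: moved right after the match

const secondAttempt = stickyDate.test(stickyText); // false: nothing left at index 24
const indexAfterFailure = stickyDate.lastIndex; // 0: reset after the failed match

// At index 0 the string has 'C'; a sticky pattern does not scan ahead
const thirdAttempt = stickyDate.test(stickyText); // false

console.log(firstAttempt, indexAfterMatch);
console.log(secondAttempt, indexAfterFailure);
console.log(thirdAttempt);
```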
It is a good time to note that you can check which flags a RegExp object has enabled.
    const regexp = /([0-9]{2})-([0-9]{2})-([0-9]{4})/y;
    regexp.lastIndex = 14;
    console.log(regexp.sticky); // true
The same goes for the other flags; for more, visit the MDN web docs.
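As a quick illustration, each flag has a matching boolean property, and the flags property returns all enabled flags as one string. The pattern below is arbitrary and serves only as an example.

```javascript
// The order in which the flags are written does not matter
const flaggedPattern = /[a-z]+/yig;

console.log(flaggedPattern.global); // true
console.log(flaggedPattern.ignoreCase); // true
console.log(flaggedPattern.sticky); // true
console.log(flaggedPattern.multiline); // false
console.log(flaggedPattern.unicode); // false

// flags lists the enabled flags in a fixed order, not as they were typed
console.log(flaggedPattern.flags); // 'giy'
```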
Unicode flag
ES6 brought better support for Unicode, too. Adding the Unicode flag, u, enables additional Unicode-related features. Thanks to it, you can use \u{x} in your patterns, where x is the hexadecimal code point of the desired character.
    /\u{24}/u.test('$'); // true
It won’t work without the u flag. It is important to know that the flag affects more than just that, though. Some more exotic Unicode characters can be used without it:
    /😹/.test('😹'); // true
but it will fail us in more advanced cases:
    /a.b/.test('a😹b'); // false
    /a.b/u.test('a😹b'); // true

    /😹{2}/.test('😹😹'); // false
    /😹{2}/u.test('😹😹'); // true

    /[a😹b]/.test('😹'); // true, but only half of the surrogate pair is matched
    /[😹🐶]/u.test('😹'); // true

    /^[^x]$/.test('😹'); // false
    /[^x]/.test('😹'); // true
    /^[^x]$/u.test('😹'); // true

    /^[ab😹]$/.test('😹'); // false
    /[ab😹]/.test('😹'); // true
    /^[ab😹]$/u.test('😹'); // true
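The reason behind these results is that JavaScript strings are sequences of UTF-16 code units, and a character such as 😹 takes two of them. Without the u flag the pattern works on single code units; with it, on whole code points. The following sketch makes that visible; the variable names are only illustrative.

```javascript
const catEmoji = '😹';

console.log(catEmoji.length); // 2: two UTF-16 code units
console.log([...catEmoji].length); // 1: one code point

// Without u, "." matches a single code unit, so one "." cannot cover the emoji
console.log(/^.$/.test(catEmoji)); // false
console.log(/^..$/.test(catEmoji)); // true
// With u, "." matches the whole code point
console.log(/^.$/u.test(catEmoji)); // true

// Build a \u{...} escape from the actual code point instead of hard-coding it
const catHex = catEmoji.codePointAt(0).toString(16);
const catEscapePattern = new RegExp(`^\\u{${catHex}}$`, 'u');
console.log(catEscapePattern.test(catEmoji)); // true
```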
We can easily conclude that it is good practice to include the u flag in our patterns, especially if there is any chance that the input contains characters beyond standard ASCII.
If you combine the u flag with the ignore case flag, i, an escaped character matches in both its lowercase and uppercase forms.
    /\u{78}/ui.test('X'); // true
An interesting note is that the pattern attribute of input elements in HTML has this flag enabled by default.
Summary
Today we learned more about the RegExp object in JavaScript and how we can use this knowledge with a great feature of regular expressions: grouping. We’ve also learned two new flags: sticky and Unicode. Hopefully, you now see more and more use cases for regular expressions. Until next time!