Introduction to look-around regular expressions:
Unlike a normal regular expression, a look-around expression checks what comes before or after the position of a match without making that text part of the match. A normal regular expression simply matches (and consumes) the characters described by its pattern, according to the standard rules.
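Here is a small sketch using Python's standard re module on a made-up sample string. The text and variable names are our own, not from the article. The ordinary pattern returns the $ as part of each match. The look-ahead version only checks that the $ is there.

```python
import re

text = "tea costs 10$ and coffee costs 12$"  # hypothetical sample text

# Ordinary pattern: the $ is consumed and returned as part of each match.
plain = re.findall(r"[0-9]{2}\$", text)

# Look-ahead: the $ must follow the digits, but it is not part of the match.
lookahead = re.findall(r"[0-9]{2}(?=\$)", text)

print(plain)      # ['10$', '12$']
print(lookahead)  # ['10', '12']
```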
Look-around regular expressions come in two types: look-ahead and look-behind. In both types, the expression can be divided into two parts:
- a fixed (constant) part, which is only checked;
- a variable part, which is actually matched.
We go through each of them in the sections below.
Look-Ahead regular expression:
In a look-ahead regular expression, the part ahead (to the right of the match) is fixed, while the part behind it is variable. In other words, we match the variable part wherever it is immediately followed by the fixed part. The fixed part itself is checked but not returned.
Example sentence: 12 cups of coffee costs 10$, 3 cups of black coffee costs 12$, 4 cups of cappuccino costs 15$.
Explanation of look-ahead:
Suppose we want to extract the prices of the different varieties of coffee using a regular expression.
In the sentence above, the prices are 10$, 12$ and 15$, so we want to extract 10, 12 and 15, because each of these numbers has a $ symbol right after it. The price 10 is the variable part and $ is the fixed part. Since $ comes after (ahead of) 10, this is a look-ahead.
How to construct a look-ahead regular expression in Python?
Syntax: variable-part(?=ahead-part-constant)
Python example for the look-ahead problem: re.compile(r'[0-9]{2}(?=\$)')
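Explanation of the above expression:
- [0-9]{2} –> matches any two-digit number, such as 10 or 12.
- (?=...) –> is the look-ahead syntax. The pattern inside must follow the match but is not included in it.
- \$ –> is the dollar symbol, escaped with a backslash because $ has a special meaning (end of string) in regular expressions.
Output: 10, 12, 15. The 12 in "12 cups" is not returned, because it is followed by a space rather than by $.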
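The look-ahead can be checked directly on the example sentence. This is a minimal sketch; the variable names are hypothetical.

```python
import re

sentence = ("12 cups of coffee costs 10$, 3 cups of black coffee costs 12$, "
            "4 cups of cappuccino costs 15$.")

# Two digits that are immediately followed by a dollar sign.
price_pattern = re.compile(r"[0-9]{2}(?=\$)")
prices = price_pattern.findall(sentence)
print(prices)  # ['10', '12', '15']
```

Note that [0-9]{2} only handles two-digit prices: a price such as 100$ would yield 00. Writing [0-9]+(?=\$) instead returns whole numbers of any length.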
How to construct a negative look-ahead?
Once you have the concept of look-ahead, a negative look-ahead means that the regular expression matches only those digits that are not followed by a dollar symbol.
Syntax: variable-part(?!ahead-part-constant)
In the syntax above, the exclamation mark "!" takes the place of "=" and denotes negation.
Python example for the negative look-ahead problem: re.compile(r'[0-9]{2}(?!\$)')
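Explanation of the above expression:
- [0-9]{2} –> matches any two-digit number, such as 10 or 12.
- (?!...) –> is the negative look-ahead syntax. The pattern inside must not follow the match.
- \$ –> is the dollar symbol, escaped with a backslash because $ has a special meaning (end of string) in regular expressions.
Output: 12. Only the 12 in "12 cups" qualifies: [0-9]{2} requires exactly two digits, so the single-digit quantities 3 and 4 cannot match, and every other two-digit number is followed by $.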
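Because [0-9]{2} requires exactly two digits, the negative look-ahead above can never return the single-digit quantities 3 and 4. A tempting fix, [0-9]{1,2}(?!\$), backtracks into the prices themselves.

The sketch below compares the two patterns. It also shows one alternative, [0-9]+(?![0-9$]), which additionally refuses to stop just before another digit. This third pattern is our own suggestion, not taken from the article.

```python
import re

sentence = ("12 cups of coffee costs 10$, 3 cups of black coffee costs 12$, "
            "4 cups of cappuccino costs 15$.")

# Exactly two digits not followed by $: only the quantity 12 qualifies.
two_digits = re.findall(r"[0-9]{2}(?!\$)", sentence)

# One or two digits: for "10$" the engine backtracks to "1", which is
# followed by "0" (not "$"), so fragments of the prices leak into the result.
backtracking = re.findall(r"[0-9]{1,2}(?!\$)", sentence)

# A whole run of digits followed by neither another digit nor $.
quantities = re.findall(r"[0-9]+(?![0-9$])", sentence)

print(two_digits)    # ['12']
print(backtracking)  # ['12', '1', '3', '1', '4', '1']
print(quantities)    # ['12', '3', '4']
```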
Look-behind regular expressions:
Put simply, look-behind expressions are the opposite of look-ahead. The part behind (to the left of the match) is constant, and the part ahead is variable.
Let's take an example sentence: 2 cups of black coffee cost $10, 5 cups of cappuccino cost $40. The temperature is 40c.
In this case we want to extract only 10 and 40 (the prices), because these digits are preceded by the $ symbol. The 40 in 40c is not.
Syntax of look-behind: (?<=constantPart)variablePart
Python regular expression for the above problem: re.compile(r'(?<=\$)[0-9]{2}')
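Running the look-behind on the example sentence confirms which numbers it picks up. The sketch also illustrates a restriction of Python's re module that the syntax line does not show: the constant part of a look-behind must have a fixed width. A variable-width pattern such as \$+ inside (?<=...) is rejected with re.error. The variable names here are hypothetical.

```python
import re

sentence = ("2 cups of black coffee cost $10, 5 cups of cappuccino cost $40. "
            "The temperature is 40c.")

# Two digits immediately preceded by a dollar sign.
prices = re.compile(r"(?<=\$)[0-9]{2}").findall(sentence)
print(prices)  # ['10', '40']

# The re module only accepts fixed-width look-behind patterns,
# so the variable-width \$+ is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=\$+)[0-9]{2}")
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)  # False
```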
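Negative look-behind regular expressions:
Syntax of negative look-behind: (?<!constantPart)variablePart
Python regular expression for the above problem: re.compile(r'(?<!\$)[0-9]{2}'). On the example sentence it returns only 40, the temperature in 40c, since every other two-digit number is preceded by $.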
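The sketch below runs the negative look-behind on the example sentence. It then shows a pitfall with longer numbers: with $100, the match can simply start one digit later, where the preceding character is a digit rather than $. The corrected pattern (?<![0-9$])[0-9]+ is our own suggestion, not from the article, and the extra sample strings are made up.

```python
import re

sentence = ("2 cups of black coffee cost $10, 5 cups of cappuccino cost $40. "
            "The temperature is 40c.")

# Two digits not preceded by $: only the temperature qualifies here.
not_prices = re.findall(r"(?<!\$)[0-9]{2}", sentence)
print(not_prices)  # ['40']

# Pitfall: in "$100" the match starts at the first 0, which is preceded by "1".
pitfall = re.findall(r"(?<!\$)[0-9]{2}", "a bottle costs $100")
print(pitfall)  # ['00']

# One possible fix: also forbid a preceding digit and take the whole number.
fixed = re.findall(r"(?<![0-9$])[0-9]+", "a bottle costs $100, it is 40c")
print(fixed)  # ['40']
```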
Python Code:
```python
import re

# Look-ahead: the part ahead (the $ in 12$) is constant,
# the digits behind it are variable.
# Extract only the digits that have a $ following them.
lookahead = re.findall(r'[0-9]{1,2}(?=\$)', '10 cup of tea cost 12$ 34$ 56')
print(lookahead)  # ['12', '34']

# Negative look-ahead: extract only the digits that are not followed by a $.
lookaheadNegative = re.findall(r'[0-9]{2}(?!\$)', '10 cup of tea cost 12$ 34$ 56')
print(lookaheadNegative)  # ['10', '56']

# Look-behind with a hyphen as the constant part:
# extract the word just after the hyphen.
example = re.findall(r'(?<=-)\w+', 'spam-egg dim-dm')
print(example)  # ['egg', 'dm']

# Look-behind: the part behind (the $ in $10) is constant,
# the digits ahead of it are variable.
lookbehindPositive = re.findall(r'(?<=\$)[0-9]{2}', '10 cup of tea cost $10 $20 56')
print(lookbehindPositive)  # ['10', '20']

# Negative look-behind: extract only the digits that are not preceded by a $.
lookbehindNegative = re.findall(r'(?<!\$)[0-9]{2}', '10 cup of tea cost $10 $20 56')
print(lookbehindNegative)  # ['10', '56']

# The constant part can be longer than one character, e.g. the currency code "inr ".
inrPrices = re.findall(r'(?<=inr )[0-9]{2,3}', '10 cup of tea cost inr 100 inr 202 56')
print(inrPrices)  # ['100', '202']

# Matching at the end of the string: $ only matches at the very end,
# so only the last file name qualifies.
EOString = re.findall(r'\w+\.jpg$', 'image.jpg, written.doc, image2.jpg')
print(EOString)  # ['image2.jpg']
```
Conclusion of look-around regular expressions:
In this article we have explored look-ahead and look-behind regular expressions, in both their positive and negative forms. We hope it helps you understand this rarely touched topic of regular expressions.