We found this regular expression for finding HTML tags in a string online at haacked.com. I used it in the new Gustavus eCard program that I have been working on (stay tuned) as a way to get the text of captions which contain hyperlinks. The regular expression was written by Phil Haack.
</?\w+(((\s|\n)+\w+((\s|\n)*=(\s|\n)*(?:".*?"|'.*?'|[^'">\s]+))?)+(\s|\n)*|(\s|\n)*)/?>
This expression is smart enough to account for things like newline characters between attributes and angle brackets that happen to appear in data, such as inside a quoted attribute value.
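To make those two cases concrete, here is a minimal sketch in Python, with the backslash escapes (\w, \s, \n) written out in the pattern. Python is only my choice for illustration, and the names ORIGINAL_TAG, find_tags, and sample are made up for this example; they are not from Phil Haack's post.

```python
import re

# The expression above, with its backslash escapes written out.
# ORIGINAL_TAG and find_tags are hypothetical names for this sketch.
ORIGINAL_TAG = re.compile(
    r"""</?\w+(((\s|\n)+\w+((\s|\n)*=(\s|\n)*(?:".*?"|'.*?'|[^'">\s]+))?)+(\s|\n)*|(\s|\n)*)/?>"""
)


def find_tags(text):
    """Return every substring of text that the expression treats as a tag."""
    return [match.group(0) for match in ORIGINAL_TAG.finditer(text)]


# Made-up input: a ">" inside a quoted attribute value, and a newline
# between two attributes of the same tag.
sample = '<a href="page.html?x>1"\n   title=\'Next\'>next</a>'

for tag in find_tags(sample):
    print(repr(tag))
```

The opening tag comes back as one match, newline and all, because the quoted href value is consumed as a unit before the expression looks for the closing bracket.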
Update: As Haacked pointed out, (\s|\n) is redundant because \s already matches newline characters, so the updated regular expression should be as follows:
</?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?>
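For anyone who wants to try the caption use case, here is a rough sketch of stripping tags with the updated expression. It is written in Python purely for illustration and is not the eCard program's actual code; TAG, caption_text, and the sample caption are names and data I invented for the example.

```python
import re

# The updated expression, with backslash escapes written out.
# TAG and caption_text are hypothetical names for this sketch.
TAG = re.compile(
    r"""</?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?>"""
)


def caption_text(caption_html):
    """Return the caption with anything the expression sees as a tag removed."""
    return TAG.sub("", caption_html)


# Made-up caption containing a hyperlink.
caption = 'See <a href="http://example.com/">the photo gallery</a> for more.'
print(caption_text(caption))
```

Replacing each match with an empty string leaves only the visible text. Note that this does not decode entities such as &amp;amp;, so a real program would probably need a separate step for that.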