Regular Expression to find HTML links with title attributes

Posted on November 21st, 2005 by

I recently needed to write a regular expression to find HTML and XHTML links with title attributes (<a href="http://www.gustavus.edu" title="Visit the Gustavus Adolphus College homepage">Gustavus</a>). Here it is:

<a[[:print:]]*title=('|"")?(.*?(?=\1))\1?[[:print:]]*>([[:print:]]*)</a>

Note that the [[:print:]] parts are applicable to ColdFusion regular expressions and would have to be changed to something else if you aren’t using ColdFusion. Additionally, the ('|"") part uses the double-double quote escape syntax because the regular expression is passed in between double quotes.
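As a rough illustration of that porting step, here is a minimal Python sketch. The helper name port_from_coldfusion is hypothetical, and it assumes the printable ASCII range (space through tilde) is an acceptable stand-in for [[:print:]]. It also assumes the pattern contains no doubled quote other than the ColdFusion string escape.

```python
import re

# Assumption: printable ASCII (space through tilde) stands in for [[:print:]].
PRINTABLE_ASCII = r"[\x20-\x7E]"


def port_from_coldfusion(pattern):
    """Rewrite the ColdFusion-specific parts of a pattern for Python's re module."""
    pattern = pattern.replace("[[:print:]]", PRINTABLE_ASCII)
    # The doubled quote is only a ColdFusion string escape, so undo it.
    return pattern.replace('""', '"')


# Only the tail of the pattern, which captures the link text.
link_text = re.compile(port_from_coldfusion("<a[[:print:]]*>([[:print:]]*)</a>"))
print(link_text.search('<a href="#top">Back to top</a>').group(1))  # Back to top
```

Other engines may offer their own spelling of the same character class, so the replacement string is the part most likely to need adjusting.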

I was at a major roadblock when writing this regular expression until I found out about the non-greedy wildcard quantifier and combined it with a positive lookahead and a backreference. The critical section is the part that says (.*?(?=\1)). This basically means: grab any number of any characters until the next occurrence of whatever the first backreference matched (in this case either ' or "). Brilliant.
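To see the lazy quantifier and the lookahead/backreference combo at work outside ColdFusion, here is a small Python sketch. It is an adaptation rather than the exact expression above: the name find_titled_links is hypothetical, [^>] and [^<] replace [[:print:]] so a match stays inside one tag, and the quote is required because Python's re module fails a backreference to a group that did not take part in the match.

```python
import re

# Assumption: the title value is always quoted. Group 1 is the opening quote,
# group 2 lazily grabs characters until the lookahead sees that same quote.
LINK_WITH_TITLE = re.compile(
    r"""<a\b[^>]*\btitle=('|")(.*?(?=\1))\1[^>]*>([^<]*)</a>""",
    re.IGNORECASE,
)


def find_titled_links(html):
    """Return (title, link text) pairs for anchors that carry a title attribute."""
    return [(m.group(2), m.group(3)) for m in LINK_WITH_TITLE.finditer(html)]


sample = (
    '<a href="http://www.gustavus.edu" '
    'title="Visit the Gustavus Adolphus College homepage">Gustavus</a>'
)
print(find_titled_links(sample))
```

Because the backreference remembers which quote opened the value, a title such as 'Say "hi"' is captured whole instead of stopping at the first double quote.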



2 Comments

  1. Hackspirit says:

    hello…what about if i want to use the same reg exp in PHP
    <a>([[:print:]]*)</a>