About 3 weeks ago I got contacted by Will Drewry from the oCERT Team about a possible XSS attack vector in the popoon externalinput class, which was discovered by Alexios Fakos. It used a (again stupid) behaviour by browsers which treat a “/” like a space, so something like <body/onload=alert(/hello/)> was not cleaned up… The actual fix was quite easy and a no-brainer. But as I always said that externalinput.php alone should not be used as your own line of defense, I went for a more thorough solution.
Many issues with cleaning XSS attacks comes from the fact, that browsers accept a lot of really f***ed up markup like the example above and therefore before actually trying to clean unwanted stuff away one should make a well formed HTML out of the input. Tidy does that job pretty well, the loadHTML method of PHP's DomDocument class also produces well-formed XHTML as output. That's what I always told people to do and that's what we do since always in Flux CMS. That's why Flux CMS is not affected by this exploit at all – some of the advantages of XML based CMS systems.
To more enforce that approach, I wrote a new class called lx_externalinput_clean, which uses the same regexes but does by default filter first the input with tidy, and if that's not available with DomDocument->loadHTML and if even that is not available, it does a striptags(), just to be sure. This can be overridden, so that if you do that cleaning already before, you don't have to do that again. I hope this helps people just blindly copying code :)
Having said that, the class still tries to clean many nasty things without first having to tidy up the HTML to get proper markup (and the above attack is also handled without tidy et al.). But there are most certainly more attacks possible which come from WTF-a-browser-does-parse-that? markup and using those tools should get rid of them for once and all. You have been warned, if you don't use them.
Thanks a lot to the oCERT team to give me this much time ahead to fix it and to inform a lot of other projects in advance which are apparently using my code.
The full advisory can be found at ocert.org/advisories/ocert-2008-012.html