One thing I need to do when scrubbing information that is structured by HTML or XML (or XHTML) is remove extraneous nodes that contain no data. For instance, I would remove <style/> blocks. Using a tool like TextPad for Windows, you can use the following regular expression to select an entire node (i.e., the start and end tags and everything in between, including line breaks).
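<style\b[^>]*>([^<>]*)</style>

This regex lets me search and replace (with nothing) to remove the <style/> blocks. To remove another kind of block, just replace the two instances of "style" with the tag you want to find (e.g., "script"). One caveat: the content group [^<>]* excludes angle brackets, so a block whose content contains a < or > character (a comparison inside a script, or a CSS child selector, for example) will not match and will be left in place.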
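If you would rather script the cleanup than run it in an editor, the same pattern carries over to most regex engines. The following is only a minimal sketch in Python, using the standard re module; the function name remove_blocks and the sample markup are made up for illustration and are not part of TextPad or any library.

```python
import re


def remove_blocks(markup: str, tag: str) -> str:
    """Remove every <tag ...>...</tag> node matched by the pattern above.

    remove_blocks is a hypothetical helper written for this example.
    The content group [^<>]* spans line breaks, but it stops at any
    angle bracket, so blocks containing < or > are left untouched.
    """
    name = re.escape(tag)
    pattern = re.compile(r"<" + name + r"\b[^>]*>([^<>]*)</" + name + r">")
    return pattern.sub("", markup)


# Illustrative input: a multi-line style block inside a head element.
sample = (
    "<head>\n"
    '<style type="text/css">\n'
    "body { color: red; }\n"
    "</style>\n"
    "<title>Demo</title>\n"
    "</head>"
)

cleaned = remove_blocks(sample, "style")
print(cleaned)
```

Calling remove_blocks(sample, "script") instead would target script blocks, which mirrors swapping the two instances of "style" in the editor. For anything beyond simple, non-nested blocks, a real HTML or XML parser is likely the safer choice.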