Thursday, May 2, 2013

Adobe AIR: Getting HTML Source from the MX HTML Component

Problem:
I decided to write a tool that would aggregate data from web sites as one surfs the web, kind of like a browser with some extra functionality. It soon became apparent that the <mx:HTML> component didn't have a simple "source" property where I could see all the HTML from the website.

Solution:
It turned out to be a simple task. The HTML component's htmlLoader property gives you the HTMLLoader object associated with the component. Through its window property, the HTMLLoader object offers direct DOM access using pseudo-DOM objects (plain objects with a few preset properties, exposed as Nodes and NodeLists). Many of the JavaScript DOM functions exist in this representation and work just fine.

For example:
getElementById
getElementsByClassName
getElementsByTagName

Using the last one along with the innerHTML property, we can get the markup inside the body element.

html_component.htmlLoader.window.document.getElementsByTagName('body')[0].innerHTML;
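One thing to watch for: the line above only returns something useful once the page has finished loading. Here is a minimal sketch of how I'd wire that up, assuming the code lives in the script block of the same MXML file that declares an <mx:HTML id="html_component"/>. The function names watchPage and onPageComplete and the example URL are made up for illustration.

```actionscript
import flash.events.Event;

// Hypothetical helper: start listening, then point the component at a page.
private function watchPage():void
{
    // The HTML component dispatches "complete" after the page has loaded.
    html_component.addEventListener(Event.COMPLETE, onPageComplete);
    html_component.location = "http://www.example.com/"; // placeholder URL
}

// Hypothetical handler: read the body markup once loading is done.
private function onPageComplete(event:Event):void
{
    var doc:Object = html_component.htmlLoader.window.document;
    var bodies:Object = doc.getElementsByTagName("body");

    // Guard against documents without a body element.
    if (bodies != null && bodies.length > 0)
    {
        trace(bodies[0].innerHTML);
    }
}
```

The DOM objects are typed as Object here because ActionScript sees them as untyped JavaScript objects, so there is no compile-time checking on names like getElementsByTagName.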

You could presumably get the entire document source the same way if you wanted to examine the page head, etc., but that wasn't useful for me.
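For the whole document, something along these lines should work. I haven't needed it myself, so treat it as a sketch: it assumes the WebKit build inside AIR supports outerHTML on the root element, and getDocumentSource is a made-up helper name.

```actionscript
// Hypothetical helper: returns the markup of the <html> element,
// including <head> and <body>, for the currently loaded page.
private function getDocumentSource():String
{
    var doc:Object = html_component.htmlLoader.window.document;

    // documentElement is the root <html> node of the DOM.
    return doc.documentElement.outerHTML;
}
```

Note that this is the current DOM serialized back to markup, not the original bytes from the server, so it reflects any changes scripts have made and will not include the doctype.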

Good luck!
