Subj : Re: Making a Spider in Java with Rhino
To   : Gonzalo Floría <gfloria@tcpsi.es>
From : Igor Bukanov <igor@fastmail.fm>
Date : Tue Mar 25 2003 07:14 pm

Gonzalo Floría wrote:
> Thanks, I can see I'm greener than I thought. From your words I understand:
> 1.- I need to have a DOM implementation: a class able to "understand" the structure
> of the HTML I'm working with, separating HTML from JS code. This I already have.
> 2.- My DOM should create a Document object. This object should implement the
> org.w3c.dom.html.HTMLDocument interface. This I need to do.
> 3.- Here is the part I don't know how to do: I should "give" this document object to
> the Rhino interpreter before processing the JS code from the document. How?

In general you cannot parse the HTML or build the DOM separately from the
execution of JavaScript, since a script can change the HTML source via
document.write or modify the DOM via any DOM mutation function. One solution
is to create a document object representing an empty DOM tree and then build
the tree there, so scripts will see the current DOM tree during their execution.
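
To make the idea concrete, here is a rough sketch using only the JDK's DOM
classes. It is not a real spider: the Piece class and the build method are
hypothetical names made up for illustration, the "parser" is just a list of
pre-split pieces, and each script is a plain Java callback standing in for
JavaScript. With Rhino you would presumably expose the same document object
in the script scope (for example with ScriptableObject.putProperty and
Context.javaToJS) and run the script text with Context.evaluateString at
the point where the callback is invoked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class IncrementalDomSketch {

    /** Hypothetical stand-in for one parsed piece of the page. */
    static final class Piece {
        final String tag;                // element name, e.g. "p" or "script"
        final Consumer<Document> script; // non-null only for script pieces

        Piece(String tag, Consumer<Document> script) {
            this.tag = tag;
            this.script = script;
        }
    }

    /** Starts from an empty document and grows it piece by piece. */
    static Document build(List<Piece> pieces) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();          // empty DOM tree
        Element body = doc.createElement("body");
        doc.appendChild(body);

        for (Piece piece : pieces) {
            body.appendChild(doc.createElement(piece.tag));
            if (piece.script != null) {
                // The script runs now, against the tree built so far.
                // This is where a real spider would hand the script text
                // to the JavaScript engine instead of calling a callback.
                piece.script.accept(doc);
            }
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> seen = new ArrayList<>();
        List<Piece> pieces = new ArrayList<>();
        pieces.add(new Piece("p", null));
        pieces.add(new Piece("script", d -> {
            // Record how many <p> elements exist when the script runs,
            // then mutate the tree the way a DOM mutation call would.
            seen.add(d.getElementsByTagName("p").getLength());
            d.getDocumentElement().appendChild(d.createElement("div"));
        }));
        pieces.add(new Piece("p", null));

        Document doc = build(pieces);
        System.out.println("p elements visible to the script: " + seen.get(0));
        System.out.println("children of body at the end: "
                + doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

The script only sees the first paragraph, because the second one has not
been added yet, and the element it appends lands between the two. That
ordering is what you lose if you build the whole tree first and run the
scripts afterwards.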

Regards, Igor
