Subj : Re: Get URLs
To   : "John J. Lee"
From : Brendan Eich
Date : Tue Oct 07 2003 09:10 pm

John J. Lee wrote:
> Brendan Eich writes:
>
>>You can't execute the script on the server side. It needs the
>
> [...]
>
> Who said anything about a server? Did I miss it?

Berurier wrote:
> Hello,
>
> I make a crawler which must *parse* javascript

Crawler, server; potato, potahto. My point was that neither a crawler
nor a server is the client JS environment, with a client DOM, events,
cookies.txt, and a real user sitting there with whom to interact.

>>you need to run
>>in the client. Or you need a proxy between client and server.
>
> For which XPCOM is what you want (or other browsers' automation
> interfaces: MSIE has COM automation interfaces, Konqueror has KParts
> and DCOP).

By proxy I meant something more like an HTTP proxy. In other words,
reverse the sense of the crawl, and wait for URIs (however computed)
to come to the proxy, rather than attempting a "static" or even
"dynamic" (but offline, crawler-side if not server-side) analysis of
the JS in the hope of discerning URI strings. A sketch of this is
below, after my sig.

> There are other ways, too: client libraries like HttpUnit (in Java).
> Swings and roundabouts.

Right, but if the URI(s) to be filtered are computed from client
per-user state or preferences, they can't be computed any other way.
So if the idea is to cope with any contingent URI, crawling may not be
the best way to go.

But I admit I'm not sure what manulenantais@yahoo.fr wanted to do with
those URI strings found by his crawler. What's more, I'm sorry to say
I confused him with emmanuel.fernandez@crf.canon.fr, and now I'm not
sure whether they're the same person.

So instead of guessing, I'm asking: what is the crawler trying to
discover? The set of all possible URIs that clients may request, given
the JS and HTML content? Or just the ones easily found by static
analysis, or even by "default user" dynamic analysis (running the JS
in the crawler, with default-user cookies and other settings)?

/be
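
P.S. A minimal sketch of the HTTP-proxy idea, in modern Python and
purely for illustration -- none of this is from the thread. The class
name and port are made up, and a real deployment would forward request
headers and handle CONNECT for HTTPS, which this toy does not:

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class URILoggingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # When a browser talks to a proxy, the request line carries the
        # absolute URI, so self.path is the URI the client actually
        # computed -- including anything its JS built at runtime.
        print(self.path)  # harvest the URI here
        try:
            upstream = urlopen(self.path)  # fetch on the client's behalf
            body = upstream.read()
            self.send_response(upstream.status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except Exception:
            self.send_error(502)

if __name__ == "__main__":
    # Point the browser's HTTP proxy setting at localhost:8080 and let
    # a real user (or a driven browser) do the crawling for you.
    HTTPServer(("localhost", 8080), URILoggingProxy).serve_forever()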
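
P.P.S. For contrast, the "static" pass amounts to scanning script text
for URI-shaped string literals, which by construction misses anything
computed at runtime. Again Python, again just a sketch with made-up
names:

import re

# Matches quoted absolute http(s) URIs and quoted root-relative paths.
URI_LITERAL = re.compile(r"""["'](https?://[^"']+|/[^"'\s]*)["']""")

def static_uris(script_text):
    return [m.group(1) for m in URI_LITERAL.finditer(script_text)]

print(static_uris('var u = "http://example.com/page?" + userId;'))
# -> ['http://example.com/page?']
# The per-user part the JS concatenates on is invisible to this pass;
# a "default user" dynamic pass would run the script instead, with a
# stock cookie jar, and capture whatever URI it actually produces.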