[Cialug] Interesting problem [OT?]
Stuart Thiessen
thiessenstuart at aol.com
Fri Nov 14 17:56:18 CST 2008
The img tag shows (for example)
<img src="http://www.signbank.org/swis/glyph.php?code=45568">.
The glyph script returns the appropriate image file for that code
number. I have the dump of the html, but it is these <img> tags that
are blocking me from having a complete offline copy. That was why I
was also thinking of trying to automate some kind of PDF dump of each
page.
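
One way around the <img> tags, as a rough sketch only: rewrite each
glyph.php reference in the dumped HTML to point at a local file, then
fetch each glyph once. The directory name "glyphs", the glyph_<code>.png
naming, and the assumption that the script returns a PNG are all
hypothetical; only the glyph.php?code= URL comes from the page itself.

```python
import re
import urllib.request
from pathlib import Path

# Matches the glyph script URL inside a src attribute, absolute or relative.
# Tolerates stray whitespace (such as a line break) before the closing quote.
GLYPH_SRC = re.compile(
    r'src="(?:http://www\.signbank\.org/swis/)?glyph\.php\?code=(\d+)\s*"'
)


def localize_glyphs(html, image_dir="glyphs"):
    """Return (rewritten_html, glyph codes referenced, sorted numerically)."""
    codes = set()

    def swap(match):
        code = match.group(1)
        codes.add(code)
        # Hypothetical local naming scheme: glyphs/glyph_<code>.png
        return f'src="{image_dir}/glyph_{code}.png"'

    return GLYPH_SRC.sub(swap, html), sorted(codes, key=int)


def fetch_glyphs(codes, image_dir="glyphs"):
    """Download each glyph once; assumes the script returns a PNG image."""
    Path(image_dir).mkdir(exist_ok=True)
    for code in codes:
        url = f"http://www.signbank.org/swis/glyph.php?code={code}"
        target = Path(image_dir) / f"glyph_{code}.png"
        if not target.exists():
            urllib.request.urlretrieve(url, target)
```

Running localize_glyphs over each saved page and writing the result
back out, then calling fetch_glyphs with the collected codes, should
leave pages that display offline, provided the image type guess holds.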
Tony, I did look at the Selenium website, but I am not sure how it
will help with this task. Could you explain?
Thanks,
Stuart
On Nov 14, 2008, at 17:36 , Matthew Nuzum wrote:
> On Fri, Nov 14, 2008 at 5:20 PM, Stuart Thiessen <thiessenstuart at aol.com> wrote:
>> Anyway, my technical challenge is that the organization developing
>> this writing system has published a PHP database of the symbols at:
>> http://www.signbank.org/swis/data.php?subset=&bs_code=*
>>
>> I need to get an offline dump of each of the basesymbol child pages
>> listed on that page. I can't do a simple download of the page as HTML
>> because the image file showing the symbol is actually a link to a
>> script that finds the right symbol and plugs it in, so when I use
>> programs like wget, a broken link for the symbol image appears when I
>> try to look at it offline.
>
> Does wget download the symbols and give them a funny name like
> glyph.php?code=368.html or glyph.php?code=368.png?
>
> I don't see anything sneaky on that page that would prevent you from
> using an automated tool; my guess is that the names are getting
> mangled. If so, view the source of your downloaded page and compare
> the filename it expects with the one that was actually generated. If
> they don't differ, and the problem is really that the name is illegal
> for the filesystem, then you may be able to just use a script to
> rename the images and the paths to the images in the html files.
>
> I've run into this problem before and just tried a different
> downloader program. It's been a while since I've used one so I can't
> think of one to suggest at the moment.
>
> --
> Matthew Nuzum
> newz2000 on freenode