#!/local/bin/perl
# showdead.pl Daniel MacKay Daniel.MacKay@Dal.Ca
# 990922 DEM Scan the log from a "htdig -s" run and produce pages listing
#            all the dead links for your web managers to browse.
# dig_report.pl a variant by Malcolm.Austen@oucs.ox.ac.uk
#            for use on daneel.ox.ac.uk
# 990929-30 MDA first hack with a view to counting pages at each level
# 991201 MDA tidy up the presentation
# 000119 MDA exclude "not found" and "redirect" urls from the table
# 0103.. MDA reordered server lists by reverse IPname
#            added lists of all pages indexed (+hop counts)
# Typical(?) usage:
#   perl dig_report.pl < htdig.stdout

#use strict;

$prefix = "db/new/report/" ; # that's where the html pages will be written
$title = "ht://Dig report generated on " . localtime(); # this is the
Jump forward to lists of pages that were not indexed.
This table shows the number of pages indexed at various depths from
each server.
The depth is measured from (one of) the server start
points. Some level counts may be misleading when, for example, a high-level
page has already been indexed at a lower level from another server
start point, or when one server has several start point entries (with
different start directories).
The "?" column gives the count of
pages that had a dubious (unset?) depth in the log record.
| Overall | \n"; print SERV "$totcount | $badtot | \n"; for ($i=0;$i<=$maxd;$i++) { print SERV "$dcount[$i] | \n" }; print SERV "
| Server\\Depth | \n"; print SERV "Total | ? | \n"; for ($i=0;$i<=$maxd;$i++) { print SERV "$i\n" }; print SERV " |
|---|---|---|---|
| $s | \n"; print SERV "$servtot{$s} | \n"; print SERV "$bad{$s} | \n"; for ($i=0;$i<=$maxd;$i++) { print SERV "$sites{$s}->[$i] | \n" }; print SERV "
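The counts printed in the table above could be accumulated roughly as follows. This is only a sketch: the part of the script that parses the htdig log is not shown here, so the `@records` list of `[url, depth]` pairs is a hypothetical stand-in for already-parsed log entries, and the rule that pages with a dubious depth still count towards the per-server total is an assumption. The tally variables reuse the names that appear in the table code (`%sites`, `%servtot`, `%bad`, `@dcount`, `$totcount`, `$badtot`, `$maxd`).

```perl
use strict;
use warnings;

# Hypothetical input: [url, depth] pairs already extracted from the log.
# An undef depth stands in for a "dubious (unset?)" log record.
my @records = (
    [ 'http://www.example.org/',         0 ],
    [ 'http://www.example.org/a.html',   1 ],
    [ 'http://www.example.org/b/c.html', 2 ],
    [ 'http://docs.example.org/',        0 ],
    [ 'http://docs.example.org/x.html',  undef ],
);

my (%sites, %servtot, %bad, @dcount);
my ($totcount, $badtot, $maxd) = (0, 0, 0);

for my $rec (@records) {
    my ($url, $depth) = @$rec;

    # Take the host part of the URL as the server name; skip anything else.
    my ($s) = $url =~ m{^\w+://([^/:]+)} or next;
    $s = lc $s;

    # Assumption: every page counts towards the totals, even with a bad depth.
    $servtot{$s}++;
    $totcount++;

    if (defined $depth && $depth =~ /^\d+$/) {
        $sites{$s}->[$depth]++;              # per-server count at this depth
        $dcount[$depth]++;                   # overall count at this depth
        $maxd = $depth if $depth > $maxd;    # widest depth column needed
    } else {
        $bad{$s}++;                          # feeds the "?" column
        $badtot++;
    }
}

# Plain-text rendering of the same rows the HTML table would hold.
for my $s (sort keys %servtot) {
    my @row = map { ($sites{$s} || [])->[$_] || 0 } 0 .. $maxd;
    printf "%-20s total=%d ?=%d depths=%s\n",
        $s, $servtot{$s}, $bad{$s} || 0, join(',', @row);
}
```

Keeping the dubious-depth pages in a separate `%bad` tally rather than dropping them means the "Total" column can still be reconciled against the depth columns plus the "?" column.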
These lists (one per server) show the pages that ht://Dig could not index. The page may be missing but it may simply be that the ht://Dig process was forbidden access to the page.