Date: Thu, 12 Jul 2001 13:43:11 -0500 (CDT)
From: Gilles Detillieux
To: "ht://Dig mailing list"
Subject: Re: [htdig] PATCH - HtFile.cc bug in 3.2.0b* (was: List of numbers chokes htdig 3.1.5)

According to me...
> I tried it with the 3.2.0b4-070801 snapshot, using a file:// URL, and
> it took 8 minutes on an 866 MHz Pentium III, but it messed up somehow.
> It seems to have lost all the newlines from the file, so it tried to
> index 1 big number. I'll need to look into whether this is a problem
> with the HtFile handler or the Plaintext parser. Once I debug it, I'll
> try profiling it too, although the word indexing is quite different in
> 3.2.

Different indeed! Once I fixed a glaring error in HtFile.cc, htdig
3.2.0b4 (070801 snapshot) correctly indexed the spp2.txt file in a few
short seconds. So, even though HtFile.cc has been in there since before
3.2.0b1, it's pretty obvious that NOBODY EVER TESTED THIS CODE BEFORE!!!!!
ARRRGGH! Why are major revisions and big additions continually being
committed to this source tree without even the most basic testing?

Here is my patch to this latest snapshot, which I'll be committing later
today. It fixes this bug, and also a couple apparent problems in the
mime.types handling. It seems to me that if it's unable to open the
mime.types file, it will keep trying on every request. Also, if the
mime.types file is there but empty, it doesn't fall back to the built-in
rules, which it does now with this patch (sort of a side effect, but I
think a good one, of my fix to keep it from continually trying to open
the file).
--- htnet/HtFile.cc.readbug  Sun May 20 02:13:53 2001
+++ htnet/HtFile.cc  Thu Jul 12 12:57:28 2001
@@ -88,10 +88,10 @@ HtFile::DocStatus HtFile::Request()
 
    if (!mime_map)
      {
+       mime_map = new Dictionary();
        ifstream in(config->Find("mime_types").get());
        if (in)
          {
-           mime_map = new Dictionary();
            String line;
            while (in >> line)
              {
@@ -170,7 +170,7 @@ HtFile::DocStatus HtFile::Request()
    if (ext == NULL)
      return Transport::Document_not_local;
 
-   if (mime_map)
+   if (mime_map && mime_map->Count())
      {
        String *mime_type = (String *)mime_map->Find(ext + 1);
        if (mime_type)
@@ -190,20 +190,21 @@ HtFile::DocStatus HtFile::Request()
 
    _response._modification_time = new HtDateTime(stat_buf.st_mtime);
 
-   ifstream in((const char *)_url.path());
-   if (!in)
+   FILE *f = fopen((const char *)_url.path(), "r");
+   if (f == NULL)
      return Document_not_found;
 
-   String tmp;
-   while (in >> tmp)
+   char docBuffer[8192];
+   int bytesRead;
+   while ((bytesRead = fread(docBuffer, 1, sizeof(docBuffer), f)) > 0)
      {
-       if (_response._contents.length()+tmp.length() > _max_document_size)
-         tmp.chop(_response._contents.length()+tmp.length()
-                  - _max_document_size);
-       _response._contents.append(tmp);
-       if (_response._contents.length() >= _max_document_size)
-         break;
+       if (_response._contents.length() + bytesRead > _max_document_size)
+         bytesRead = _max_document_size - _response._contents.length();
+       _response._contents.append(docBuffer, bytesRead);
+       if (_response._contents.length() >= _max_document_size)
+         break;
      }
+   fclose(f);
 
    _response._content_length = stat_buf.st_size;
    _response._document_length = _response._contents.length();

-- 
Gilles R. Detillieux              E-mail:
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
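P.S. The mime.types part of the patch may be easier to follow outside the diff, so here is a rough standalone sketch of the idea. It is only an illustration under several assumptions: std::map stands in for the ht://Dig Dictionary, a bool flag plays the role of the non-null mime_map pointer, the file is assumed to hold simple "type extension" pairs, and the class name MimeLookup, its members, and the empty-string return convention are all hypothetical. The two points it tries to show are that the load is attempted once even when the file can't be opened, and that an empty map sends the caller to the built-in rules.

```cpp
#include <fstream>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the mime_map handling in HtFile::Request().
class MimeLookup
{
public:
    explicit MimeLookup(std::string path) : path_(std::move(path)) {}

    // Returns the mapped MIME type, or "" to tell the caller to fall back
    // to its built-in extension rules.
    std::string find(const std::string &ext)
    {
        if (!loaded_)
        {
            // Mark the map as initialized BEFORE trying to open the file,
            // so a missing file is attempted once, not on every request.
            loaded_ = true;
            ++load_attempts_;
            std::ifstream in(path_);
            std::string type, e;
            while (in >> type >> e)     // assumed "type extension" pairs
                map_[e] = type;
        }
        if (!map_.empty())              // an empty map means "use built-ins"
        {
            std::map<std::string, std::string>::const_iterator it = map_.find(ext);
            if (it != map_.end())
                return it->second;
        }
        return "";
    }

    int load_attempts() const { return load_attempts_; }

private:
    std::string path_;
    std::map<std::string, std::string> map_;
    bool loaded_ = false;
    int load_attempts_ = 0;
};
```

With a path that doesn't exist, repeated calls to find() all return the empty string and load_attempts() stays at 1, which is the behaviour the first two hunks are after.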