Date: Fri, 5 Apr 2002 14:02:15 -0600 (CST)
From: Gilles Detillieux
To: "Tsai, Jin"
Cc: "ht://Dig mailing list"
Subject: Re: [htdig] ignores PDF, Word, Excel if from https

According to Tsai, Jin:
> The output of the suggested handler.pl command execution, as follows:
> [root@fhlx010 bin]# ./handler.pl https
> 'https://fh2k018.fhmis.net/quickplace/mis/PageLibrary85256B3500795FD6.nsf/h_
> index/C6D21E7CEE35AC8185256B3C0055D8F1/$FILE/Technology+Standards.doc'
> /etc/htdig.conf | cat -v -t -e | grep '^.:'
> s:^I200$
> r:^IOK^M$
> t:^Iapplication/msword^M$
> l:^I72192^M$
> m:^IThu, 31 Jan 2002 18:34:48 GMT^M$
>
> Would it work if I add single quote to handlargs[argi++] = (char
> *)_URL.get().get(); in ExternalTransport.cc?

No, that's not the problem!  ExternalTransport.cc calls the handler script
directly via execv(), not using a shell, so you can't add single quotes
to the argument, because they'd never get parsed.

The problem is the CR character (represented by ^M in cat -v output),
which I believe is preventing htdig from recognizing the content-type
string.  Try this patch to the code, if you can apply it.  Use
"patch -p0 < stuff-below".

--- htdig/ExternalTransport.cc.orig	Sat Feb  2 07:40:21 2002
+++ htdig/ExternalTransport.cc	Fri Apr  5 13:43:21 2002
@@ -202,6 +202,7 @@ Transport::DocStatus ExternalTransport::
 
     while (in_header && readLine(input, line))
     {
+	line.chop('\r');
 	token1 = strtok(line, "\t");
 	if (token1 == NULL)
 	{

I'll commit this change today, and maybe add some debugging output to
it too.

-- 
Gilles R. Detillieux              E-mail:
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list
To unsubscribe, send a message to with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
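
To illustrate why the stray CR in the handler output matters, here is a minimal standalone sketch. It is not htdig code: it uses std::string rather than htdig's String class, and the helper names (chopTrailingCR, headerValue, isMsWord) are hypothetical. It assumes that chop('\r') strips a trailing carriage return, and it assumes an exact string comparison on the content type, which is only one way a trailing CR could defeat recognition; the message above does not show how htdig actually compares the value.

```cpp
#include <string>

// Hypothetical helper: strip trailing carriage returns from a header line.
// This stands in for what line.chop('\r') is assumed to do in the patch.
std::string chopTrailingCR(std::string line) {
    while (!line.empty() && line.back() == '\r')
        line.pop_back();
    return line;
}

// Hypothetical helper: for a handler line of the form "<key>:\t<value>",
// return <value> if the key matches, or an empty string otherwise.
std::string headerValue(const std::string& line, char key) {
    // Expect exactly: key, ':', TAB, then the value.
    if (line.size() < 3 || line[0] != key || line[1] != ':' || line[2] != '\t')
        return "";
    return line.substr(3);
}

// Sketch of an exact-match content-type check (an assumption, not htdig's
// real logic). A value ending in '\r' does not equal the expected string.
bool isMsWord(const std::string& headerLine) {
    return headerValue(headerLine, 't') == "application/msword";
}
```

With the line exactly as the handler emitted it, "t:\tapplication/msword\r", isMsWord() returns false because the value still ends in CR. Passing the same line through chopTrailingCR() first makes the comparison succeed, which is the effect the one-line patch is aiming for.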