Mark,
Are you using triggers at all, or JWAIT (which uses triggers internally)?
It looks as if faxq has just "lost" track of the job that that
particular hfaxd has submitted. The only way I know of for that to
happen is the trigger bug, which has been fixed recently (but is still
present in 4.3.3).
Also - the slowness of the faxq process when handling large queues is a
well-known deficiency of the scheduler in faxq. If you're interested in
handling large queues (with or without batching), I would recommend you
try out current CVS. You can get a snapshot of it from:
ftp://ftp.hylafax.org/source/hylafax-SNAPSHOT.tar.gz
or get it straight from CVS:
:pserver:cvs:cvs@xxxxxxxxxxxxxxx:/cvsroot
or from Git:
git://cvs.hylafax.org/HylaFAX
For busy queues with all devices almost always busy, the new scheduler
is essentially O(1), whereas the original HylaFAX scheduler was O(N),
and the one in 4.3 is O(N**2).
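To illustrate what that difference means per dispatch, here is a toy sketch in Python. It is not the faxq code and makes no claim about how either scheduler is implemented; the job tuples, the priority values, and both function names are made up for the example. It only contrasts scanning the whole queue for every dispatch with taking the head of a queue that is already kept in dispatch order.

```python
from collections import deque

def dispatch_by_scan(queue):
    """Pick the best job by scanning every entry: O(N) work per dispatch."""
    best_index = 0
    for i, (priority, _job_id) in enumerate(queue):
        # Hypothetical convention: a lower number means a higher priority.
        if priority < queue[best_index][0]:
            best_index = i
    return queue.pop(best_index)

def dispatch_from_ready(ready):
    """Take the head of a queue already kept in dispatch order: O(1) per dispatch."""
    return ready.popleft()

# Hypothetical jobs as (priority, job id) pairs.
jobs = [(127, "job-a"), (64, "job-b"), (200, "job-c")]
scan_queue = list(jobs)
ready_queue = deque(sorted(jobs))  # ordering is paid for up front, not at dispatch time
```

Both functions return the same job for the same input; with thousands of queued jobs, the first one does thousands of comparisons every time a modem becomes free, while the second does none.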
* Mark Hunting <mark@xxxxxxxxxx> [070508 06:33]:
The server is very busy today, and I already have a new hanging
sendfax process. It has been hanging for almost an hour now, and I
guess it will never finish:
strace -p26875
uucp 26875 0.0 0.1 4152 1664 ? S 11:42 0:00
/usr/bin/sendfax -m -T 3 -I 300 -n -k now + 3 days -P 128 -f
xxxxxx@xxxxxxxxxxxxx -d 084xxxxxxx 92726.pdf
This is the strace:
Process 26875 attached - interrupt to quit
read(3,
Under normal circumstances the last line becomes something like
read(3, "200 Job 118161 submitted.\r\n", 1024) = 27
etc...
And here is the strace of the corresponding hfaxd process:
strace -p26876
Process 26876 attached - interrupt to quit
select(5, [0 4], [], [], NULL
Which under normal circumstances becomes something like
select(5, [0 4], [], [], NULL***) = 1 (in [4])
read(4, "S*\0", 2047) = 3
read(4, 0xbfb03500, 2047) = -1 EAGAIN (Resource temporarily unavailable)
write(1, "200 Job 119050 submitted.\r\n", 27) = 27
fcntl64(0, F_SETFL, O_RDWR|O_NONBLOCK) = 0
etc...
I also notice that adding faxes to the queue (using sendfax) is
always very slow when the queue is long (7000+ faxes). It becomes so
slow that the queue never grows beyond roughly 8000 faxes. At that
point adding faxes to the queue (from my Perl loop) proceeds at the
same speed as the sending of the faxes itself using 60 phone lines. In
the past this problem was even worse, until I set MaxBatchJobs to 1. I
don't know why adding faxes is still slow when the queue is long.
When I run strace on those slow processes, they hang for some
seconds (or longer) at the same point as in the straces above.
Apparently these slow processes are sometimes not only slow, but hang
forever. Slow processes are no problem, but the faxes should be sent
at some point, and not hang forever.
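Until the underlying cause is fixed, a submitting loop can at least avoid blocking forever on one hung client by running each submission with a time limit. The following is a minimal sketch in Python rather than Perl; the helper name run_with_timeout and the 600-second limit are assumptions for illustration, and nothing here is part of HylaFAX itself. It relies only on the standard subprocess module.

```python
import subprocess

def run_with_timeout(argv, timeout_s):
    """Run a command; return its exit code, or None if it hit the time limit."""
    try:
        completed = subprocess.run(argv, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before raising.
        return None
    return completed.returncode

# Hypothetical usage with the arguments seen in the process listing above:
#   run_with_timeout(["/usr/bin/sendfax", "-n", "-d", "084xxxxxxx", "92726.pdf"], 600)
```

A None result only says that the client did not return in time. It does not say whether the job reached the queue, so that would still have to be checked separately before resubmitting, to avoid sending a fax twice.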
I hope you can help me with this problem. Please let me know if you
need more information.
Best regards,
Mark