Mark,
Are you using triggers at all, or JWAIT (which uses triggers internally)?
It looks as if faxq has simply "lost" track of the job that this
particular hfaxd submitted. The only way I know of for that to happen
is the trigger bug, which was fixed recently (but is still present in
4.3.3).
Also, the slowness of the faxq process when handling large queues is a
well-known deficiency of the scheduler in faxq. If you need to handle
large queues (with or without batching), I would recommend you try the
current CVS. You can get a snapshot of it from:
ftp://ftp.hylafax.org/source/hylafax-SNAPSHOT.tar.gz
or, get it straight from CVS:
:pserver:cvs:cvs@xxxxxxxxxxxxxxx:/cvsroot
or from GIT:
git://cvs.hylafax.org/HylaFAX
For busy queues where all devices are almost always busy, the new
scheduler is essentially O(1), whereas the original HylaFAX scheduler
was O(N), and the one in 4.3 is O(N**2).
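To illustrate why the "all devices busy" case matters, here is a minimal toy model. It is an assumption, not faxq's actual code: one hypothetical scheduler walks the whole queue on every submission (O(N) per submission, so O(N**2) over N submissions), while the other does no queue work at all while every modem is busy, which is the situation where the new scheduler is described as essentially O(1). It does not reproduce the 4.3 scheduler's own quadratic pass; it only counts how many queued jobs each approach has to look at.

```python
from collections import deque


class ScanningScheduler:
    """Hypothetical model: re-examines every queued job on each submission."""

    def __init__(self, free_modems):
        self.free_modems = free_modems
        self.queue = []
        self.inspections = 0  # total number of queued jobs looked at

    def submit(self, job):
        self.queue.append(job)
        # Walk the whole queue looking for dispatchable jobs: O(N) per call.
        remaining = []
        for queued in self.queue:
            self.inspections += 1
            if self.free_modems > 0:
                self.free_modems -= 1  # dispatch this job to a free modem
            else:
                remaining.append(queued)
        self.queue = remaining


class ShortCircuitScheduler:
    """Hypothetical model: does no queue work while every modem is busy."""

    def __init__(self, free_modems):
        self.free_modems = free_modems
        self.queue = deque()
        self.inspections = 0

    def submit(self, job):
        self.queue.append(job)  # O(1) enqueue
        # Only touch the queue when a modem is actually free.
        while self.free_modems > 0 and self.queue:
            self.inspections += 1
            self.queue.popleft()
            self.free_modems -= 1


def simulate(scheduler_cls, jobs, modems):
    """Submit `jobs` jobs with no modem ever finishing; return the work done."""
    sched = scheduler_cls(modems)
    for job in range(jobs):
        sched.submit(job)
    return sched.inspections
```

With 60 modems and 1000 submissions, the scanning model inspects 60 + 940*941/2 = 442330 queued jobs, while the short-circuit model inspects only the 60 it dispatches.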
* Mark Hunting <mark@xxxxxxxxxx> [070508 06:33]:
The server is very busy today, and I already have another hanging
sendfax process. It has been hanging for almost an hour now, and I
guess it will never finish:
strace -p26875
uucp 26875 0.0 0.1 4152 1664 ? S 11:42 0:00 /usr/bin/sendfax -m -T 3 -I 300 -n -k now + 3 days -P 128 -f xxxxxx@xxxxxxxxxxxxx -d 084xxxxxxx 92726.pdf
This is the strace:
Process 26875 attached - interrupt to quit
read(3,
Under normal circumstances the last line becomes something like
read(3, "200 Job 118161 submitted.\r\n", 1024) = 27
etc...
And here is the strace of the corresponding hfaxd process:
strace -p26876
Process 26876 attached - interrupt to quit
select(5, [0 4], [], [], NULL
Which under normal circumstances becomes something like
select(5, [0 4], [], [], NULL) = 1 (in [4])
read(4, "S*\0", 2047) = 3
read(4, 0xbfb03500, 2047) = -1 EAGAIN (Resource temporarily unavailable)
write(1, "200 Job 119050 submitted.\r\n", 27) = 27
fcntl64(0, F_SETFL, O_RDWR|O_NONBLOCK) = 0
etc...
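In the sendfax trace the client is blocked in read(3, ...) waiting for a reply line such as "200 Job 118161 submitted.", and hfaxd is itself blocked in select() with a NULL timeout, so nothing ever times out. The sketch below is a hypothetical helper (not part of sendfax or any HylaFAX library) showing how a client could read that reply with a timeout, so that a job faxq has lost surfaces as a failure instead of an indefinite hang. The reply regex is an assumption based only on the reply text visible in these traces; it does not fix the underlying faxq problem.

```python
import re
import socket

# Assumed reply format, taken from the "200 Job N submitted." lines in the traces.
JOB_REPLY = re.compile(rb"^200 Job (\d+) submitted\.\r\n$")


def wait_for_job_reply(sock, timeout_s):
    """Read one reply line; return the job number, or None on timeout/mismatch."""
    sock.settimeout(timeout_s)
    buf = b""
    try:
        while not buf.endswith(b"\r\n"):
            chunk = sock.recv(1024)
            if not chunk:
                raise ConnectionError("server closed the connection")
            buf += chunk
    except socket.timeout:
        return None  # no reply in time: the hang seen in the strace output
    match = JOB_REPLY.match(buf)
    return int(match.group(1)) if match else None
```

A caller would treat None as "job status unknown" and report it, rather than waiting forever.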
I also notice that adding faxes to the queue (using sendfax) is always
very slow when the queue is long (7000+ faxes). It becomes so slow that
the queue never grows beyond about 8000 faxes. At that point, faxes are
added to the queue (from my perl loop) at the same rate as they are
sent over the 60 phone lines. In the past this problem was even worse,
until I set MaxBatchJobs to 1. I don't know why adding faxes is still
slow when the queue is long. When I strace those slow processes, they
hang for several seconds (or longer) at the same point as in the traces
above. Apparently some of these slow processes are not just slow but
hang forever. Slow processes are not a problem, but the faxes need to be
sent eventually rather than hang forever.
I hope you can help me with this problem. Please let me know if you need
more information.
Best regards,
Mark