* Dominique dHotman <
dominique@xxxxxxxxxxx> [100713 09:27]:
> Hi everyone,
>
> We have found what we think might be a bug but are not sure if the fax
> scheduler is behaving like this by design. First some background:
> The problem that we have found is that our audit hook is not getting all the
> JOB events and because of this we are recording the job as completed
> successfully instead of failed. And this causes all kinds of problems in
> our processing system.
Hm... The log shows that it *is* getting all the events. How are you
expecting to classify "failed/success" from the JOB/SEND events?
Unfortunately, there aren't "failed" or "success" events directly:
Event Class Mask Description
JOB_CREATE J 0x0001 job created
JOB_SUSPEND J 0x0002 job suspended
JOB_READY J 0x0004 job ready to send
JOB_SLEEP J 0x0008 job sleeping awaiting time-to-send
JOB_DEAD J 0x0010 job marked dead
JOB_PROCESS J 0x0020 job processed by scheduler
JOB_REAP J 0x0040 job corpus reaped
JOB_ACTIVE J 0x0080 job activated
JOB_REJECT J 0x0100 job rejected
JOB_KILL J 0x0200 job killed
JOB_BLOCKED J 0x0400 job blocked by other job
JOB_DELAYED J 0x0800 job delayed by tod restriction or similar
JOB_ALTERED¹ J 0x1000 job parameters altered
JOB_TIMEDOUT J 0x2000 job kill timer expired
JOB_PREP_BEGIN J 0x4000 job preparation started
JOB_PREP_END J 0x8000 job preparation finished
SEND_BEGIN S 0x0001 fax, send attempt started
SEND_CALL S 0x0002 fax, call placed
SEND_CONNECTED S 0x0004 fax, call answered by fax
SEND_PAGE S 0x0008 fax, page transmit done
SEND_DOC S 0x0010 fax, document transmit done
SEND_POLLRCVD S 0x0020 fax, document retrieved by poll operation
SEND_POLLDONE S 0x0040 fax, poll operation completed
SEND_END S 0x0080 fax, send attempt finished
SEND_REFORMAT S 0x0100 fax, job being reformatted
SEND_REQUEUE S 0x0200 fax, job requeued
SEND_DONE S 0x0400 fax, send job done
A normal failure would give you a a SEND_BEGIN/SEND_DONE at least, but so would
a normal success. The AuditHook is built on triggers, which signal events, but
don't transmit state... I'm guessing that when Sam designed them at first, he
assumed that people wanting job state would just examine the job. At least,
that's how I've use triggers/audit hooks.
But there is no event to signal which state the jos is in. Just that the job
is "dead", meaning it's in it's final state.. The usual means of inspecting
jobs is to watch the triggers to know an event happened, and then inspect the
job to see whatever state you want from it.
> Here is the listing of events (from /var/log/messages) for a failed job
> where this happened:
>
> Jul 13 09:39:49 hylafax01 FaxQueuer[9000]: SUBMIT JOB 129182
> Jul 13 09:39:49 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0001 "129182"
> Jul 13 09:39:49 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0004 "129182"
> Jul 13 09:39:50 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0020 "129182"
> Jul 13 09:39:50 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0080 "129182"
> Jul 13 09:39:50 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x4000 "129182"
> Jul 13 09:44:36 hylafax01 FaxQueuer[9835]: JOB 129182: CONVERT DOCUMENT: job time limit exceeded
> Jul 13 09:44:58 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x8000 "129182"
> Jul 13 09:44:58 hylafax01 FaxQueuer[9000]: NOTIFY: bin/notify "doneq/q129182" "format_failed" ""
> Jul 13 09:44:58 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0010 "129182"
> Jul 13 09:44:58 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0040 "129182"
So you have a similar set of events on a job that failed for fax session
reasons? With the exception of some SEND_* events, the hsould be the same,
right?
> So my question here is: is there a bug somewhere that is causing this "JOB
> 129182: CONVERT DOCUMENT: job time limit exceeded" not be passed to the
> Audit Hook or is this right?
I think it's "right"... There is no event for format failed. Only
JOB_PREP_BEGIN and JOB_PREP_END (again, the triggers are on events, not the
state of jobs).
> If this is working properly then how can the Audit Hook script that we have
> get notified that the job preparation phase has actually failed with the
> failure reason being "CONVERT DOCUMENT: job time limit exceeded"
I don't think audit hook alone can give you that... Just like it can't give yo
if the job was sent successfully or not either. I guess you could deduce that
since you got JOB_DEAD without any SEND_* attempts, it must have failed to
prepare correctly...
I wouldn't build a system *only* with triggers... In the systems I've built,
I've always used the triggers as a point when its time to check the state of a
job/fax/modem for what ever particular state I'm monitoring.
Even ignoring the format conversion failure, and just expecting normal fax
session errors, the triggers don't even provide the necessary means to classify
success or failure. I guess you could go try and deduce "success" by counting
the SEND_PAGE events associated with a job, and consider it a 'success' IIF the
count matches the "expected" count, but that's sound like more work than just
checking the job state directly in the first place.
a.
--
Aidan Van Dyk aidan@xxxxxxxx
Senior Software Developer +1 215 825-8700 x8103
iFAX Solutions, Inc. http://www.ifax.com/