HylaFAX The world's most advanced open source fax server |
* Dominique dHotman <dominique@xxxxxxxxxxx> [100713 09:27]: > Hi everyone, > > We have found what we think might be a bug but are not sure if the fax > scheduler is behaving like this by design. First some background: > The problem that we have found is that our audit hook is not getting all the > JOB events and because of this we are recording the job as completed > successfully instead of failed. And this causes all kinds of problems in > our processing system. Hm... The log shows that it *is* getting all the events. How are you expecting to classify "failed/success" from the JOB/SEND events? Unfortunately, there aren't "failed" or "success" events directly: Event Class Mask Description JOB_CREATE J 0x0001 job created JOB_SUSPEND J 0x0002 job suspended JOB_READY J 0x0004 job ready to send JOB_SLEEP J 0x0008 job sleeping awaiting time-to-send JOB_DEAD J 0x0010 job marked dead JOB_PROCESS J 0x0020 job processed by scheduler JOB_REAP J 0x0040 job corpus reaped JOB_ACTIVE J 0x0080 job activated JOB_REJECT J 0x0100 job rejected JOB_KILL J 0x0200 job killed JOB_BLOCKED J 0x0400 job blocked by other job JOB_DELAYED J 0x0800 job delayed by tod restriction or similar JOB_ALTERED¹ J 0x1000 job parameters altered JOB_TIMEDOUT J 0x2000 job kill timer expired JOB_PREP_BEGIN J 0x4000 job preparation started JOB_PREP_END J 0x8000 job preparation finished SEND_BEGIN S 0x0001 fax, send attempt started SEND_CALL S 0x0002 fax, call placed SEND_CONNECTED S 0x0004 fax, call answered by fax SEND_PAGE S 0x0008 fax, page transmit done SEND_DOC S 0x0010 fax, document transmit done SEND_POLLRCVD S 0x0020 fax, document retrieved by poll operation SEND_POLLDONE S 0x0040 fax, poll operation completed SEND_END S 0x0080 fax, send attempt finished SEND_REFORMAT S 0x0100 fax, job being reformatted SEND_REQUEUE S 0x0200 fax, job requeued SEND_DONE S 0x0400 fax, send job done A normal failure would give you a a SEND_BEGIN/SEND_DONE at least, but so would a normal success. The AuditHook is built on triggers, which signal events, but don't transmit state... I'm guessing that when Sam designed them at first, he assumed that people wanting job state would just examine the job. At least, that's how I've use triggers/audit hooks. But there is no event to signal which state the jos is in. Just that the job is "dead", meaning it's in it's final state.. The usual means of inspecting jobs is to watch the triggers to know an event happened, and then inspect the job to see whatever state you want from it. > Here is the listing of events (from /var/log/messages) for a failed job > where this happened: > > Jul 13 09:39:49 hylafax01 FaxQueuer[9000]: SUBMIT JOB 129182 > Jul 13 09:39:49 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0001 "129182" > Jul 13 09:39:49 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0004 "129182" > Jul 13 09:39:50 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0020 "129182" > Jul 13 09:39:50 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0080 "129182" > Jul 13 09:39:50 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x4000 "129182" > Jul 13 09:44:36 hylafax01 FaxQueuer[9835]: JOB 129182: CONVERT DOCUMENT: job time limit exceeded > Jul 13 09:44:58 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x8000 "129182" > Jul 13 09:44:58 hylafax01 FaxQueuer[9000]: NOTIFY: bin/notify "doneq/q129182" "format_failed" "" > Jul 13 09:44:58 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0010 "129182" > Jul 13 09:44:58 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0040 "129182" So you have a similar set of events on a job that failed for fax session reasons? With the exception of some SEND_* events, the hsould be the same, right? > So my question here is: is there a bug somewhere that is causing this "JOB > 129182: CONVERT DOCUMENT: job time limit exceeded" not be passed to the > Audit Hook or is this right? I think it's "right"... There is no event for format failed. Only JOB_PREP_BEGIN and JOB_PREP_END (again, the triggers are on events, not the state of jobs). > If this is working properly then how can the Audit Hook script that we have > get notified that the job preparation phase has actually failed with the > failure reason being "CONVERT DOCUMENT: job time limit exceeded" I don't think audit hook alone can give you that... Just like it can't give yo if the job was sent successfully or not either. I guess you could deduce that since you got JOB_DEAD without any SEND_* attempts, it must have failed to prepare correctly... I wouldn't build a system *only* with triggers... In the systems I've built, I've always used the triggers as a point when its time to check the state of a job/fax/modem for what ever particular state I'm monitoring. Even ignoring the format conversion failure, and just expecting normal fax session errors, the triggers don't even provide the necessary means to classify success or failure. I guess you could go try and deduce "success" by counting the SEND_PAGE events associated with a job, and consider it a 'success' IIF the count matches the "expected" count, but that's sound like more work than just checking the job state directly in the first place. a. -- Aidan Van Dyk aidan@xxxxxxxx Senior Software Developer +1 215 825-8700 x8103 iFAX Solutions, Inc. http://www.ifax.com/
Attachment:
signature.asc
Description: Digital signature