HylaFAX The world's most advanced open source fax server

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [hylafax-users] Audit hook not getting all Job events



Hi Aidan,

Thanks for the quick reply.  I understand that the audit hook script will just be called ever time an event is fired and that the audit hook then needs to determine the job state.

The way that we have done this is to look in the job file determining first if the job file is in the sendq or doneq.  Our manner of getting this information is not very advanced, we just use /bin/grep to parse the data out of the job file and populate the variables we are interested in.  Then depending on the event class (Job, Send or Recv) and event ID we classify the event as something that we are interested in and if so insert a record into our database.

So the answer to my first question is basically answered: The events are only fired based on different stages of the fax job and so we are definitely getting all the events. 

The problem that we are having is that something happens to the fax job during the preparation phase (the auto rotate times out) and we are not able to determine the reason for this failure by looking in the job file.  The only way we know something has happened is to check when we the "job marked dead" event is fired.  So when "job marked dead" is fired we check if the previous event is "send job done".  If not then we have to parse /var/log/message and and find out what exactly the output from the job scheduler is.  Then take this reason and load a job failed status into our database.

Is there a better place to get this job preparation state that from parsing machines system log?  So is it possible to look somewhere when the "job prep done" event is fired and find out if the the preparation was successful or not?

Last point:  I can post a copy of our audit hook script if it would be helpful but I'd have to anonimise it a bit first.  We have spent a lot of time working on this and it forms a core part of our overall fax infrastructure. We'd be very happy to share it with the community if others will benefit from it. 

Thanks
Dominique



On Tue, Jul 13, 2010 at 5:37 PM, Aidan Van Dyk <aidan@xxxxxxxx> wrote:
* Dominique dHotman <dominique@xxxxxxxxxxx> [100713 09:27]:
> Hi everyone,
>
> We have found what we think might be a bug but are not sure if the fax
> scheduler is behaving like this by design.  First some background:

> The problem that we have found is that our audit hook is not getting all the
> JOB events and because of this we are recording the job as completed
> successfully instead of failed.  And this causes all kinds of problems in
> our processing system.

Hm... The log shows that it *is* getting all the events.  How are you
expecting to classify "failed/success" from the JOB/SEND events?
Unfortunately, there aren't "failed" or "success" events directly:
      Event                Class  Mask       Description

      JOB_CREATE           J      0x0001     job created
      JOB_SUSPEND          J      0x0002     job suspended
      JOB_READY            J      0x0004     job ready to send
      JOB_SLEEP            J      0x0008     job sleeping awaiting time-to-send
      JOB_DEAD             J      0x0010     job marked dead
      JOB_PROCESS          J      0x0020     job processed by scheduler
      JOB_REAP             J      0x0040     job corpus reaped
      JOB_ACTIVE           J      0x0080     job activated
      JOB_REJECT           J      0x0100     job rejected
      JOB_KILL             J      0x0200     job killed
      JOB_BLOCKED          J      0x0400     job blocked by other job
      JOB_DELAYED          J      0x0800     job delayed by tod restriction or similar
      JOB_ALTERED¹         J      0x1000     job parameters altered
      JOB_TIMEDOUT         J      0x2000     job kill timer expired
      JOB_PREP_BEGIN       J      0x4000     job preparation started
      JOB_PREP_END         J      0x8000     job preparation finished

      SEND_BEGIN           S      0x0001     fax, send attempt started
      SEND_CALL            S      0x0002     fax, call placed
      SEND_CONNECTED       S      0x0004     fax, call answered by fax
      SEND_PAGE            S      0x0008     fax, page transmit done
      SEND_DOC             S      0x0010     fax, document transmit done
      SEND_POLLRCVD        S      0x0020     fax, document retrieved by poll operation
      SEND_POLLDONE        S      0x0040     fax, poll operation completed
      SEND_END             S      0x0080     fax, send attempt finished
      SEND_REFORMAT        S      0x0100     fax, job being reformatted
      SEND_REQUEUE         S      0x0200     fax, job requeued
      SEND_DONE            S      0x0400     fax, send job done

A normal failure would give you a a SEND_BEGIN/SEND_DONE at least, but so would
a normal success.  The AuditHook is built on triggers, which signal events, but
don't transmit state...   I'm guessing that when Sam designed them at first, he
assumed that people wanting job state would just examine the job.  At least,
that's how I've use triggers/audit hooks.

But there is no event to signal which state the jos is in.  Just that the job
is "dead", meaning it's in it's final state..  The usual means of inspecting
jobs is to watch the triggers to know an event happened, and then inspect the
job to see whatever state you want from it.

> Here is the listing of events (from /var/log/messages) for a failed job
> where this happened:
>
> Jul 13 09:39:49 hylafax01 FaxQueuer[9000]: SUBMIT JOB 129182
> Jul 13 09:39:49 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0001 "129182"
> Jul 13 09:39:49 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0004 "129182"
> Jul 13 09:39:50 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0020 "129182"
> Jul 13 09:39:50 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0080 "129182"
> Jul 13 09:39:50 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x4000 "129182"
> Jul 13 09:44:36 hylafax01 FaxQueuer[9835]: JOB 129182: CONVERT DOCUMENT: job time limit exceeded
> Jul 13 09:44:58 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x8000 "129182"
> Jul 13 09:44:58 hylafax01 FaxQueuer[9000]: NOTIFY: bin/notify "doneq/q129182" "format_failed" ""
> Jul 13 09:44:58 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0010 "129182"
> Jul 13 09:44:58 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0040 "129182"

So you have a similar set of events on a job that failed for fax session
reasons?  With the exception of some SEND_* events, the hsould be the same,
right?

> So my question here is:  is there a bug somewhere that is causing this "JOB
> 129182: CONVERT DOCUMENT: job time limit exceeded" not be passed to the
> Audit Hook or is this right?

I think it's "right"... There is no event for format failed. Only
JOB_PREP_BEGIN and JOB_PREP_END (again, the triggers are on events, not the
state of jobs).

> If this is working properly then how can the Audit Hook script that we have
> get notified that the job preparation phase has actually failed with the
> failure reason being "CONVERT DOCUMENT: job time limit exceeded"

I don't think audit hook alone can give you that... Just like it can't give yo
if the job was sent successfully or not either.  I guess you could deduce that
since you got JOB_DEAD without any SEND_* attempts, it must have failed to
prepare correctly...

I wouldn't build a system *only* with triggers...  In the systems I've built,
I've always used the triggers as a point when its time to check the state of a
job/fax/modem for what ever particular state I'm monitoring.

Even ignoring the format conversion failure, and just expecting normal fax
session errors, the triggers don't even provide the necessary means to classify
success or failure.  I guess you could go try and deduce  "success" by counting
the SEND_PAGE events associated with a job, and consider it a 'success' IIF the
count matches the "expected" count, but that's sound like more work than just
checking the job state directly in the first place.


a.

--
Aidan Van Dyk                                             aidan@xxxxxxxx
Senior Software Developer                          +1 215 825-8700 x8103
iFAX Solutions, Inc.                                http://www.ifax.com/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iD8DBQFMPIgxuVxNPsxNPScRAqrnAKCqOG3drJiC6Wdws2oMr23hOFPKZgCcDujI
llC4kVnrNP3fDX3ivWyQvkk=
=L55s
-----END PGP SIGNATURE-----





Project hosted by iFAX Solutions