Re: [hylafax-users] Audit hook not getting all Job events

To: Aidan Van Dyk <aidan@xxxxxxxx>
Subject: Re: [hylafax-users] Audit hook not getting all Job events
From: Dominique dHotman <dominique@xxxxxxxxxxx>
Cc: hylafax-users <hylafax-users@xxxxxxxxxxx>
Date: Tue, 13 Jul 2010 21:36:30 +0200

Hi Aidan,

Thanks for the quick reply. I understand that the audit hook script will just be called ever time an event is fired and that the audit hook then needs to determine the job state.

The way that we have done this is to look in the job file determining first if the job file is in the sendq or doneq. Our manner of getting this information is not very advanced, we just use /bin/grep to parse the data out of the job file and populate the variables we are interested in. Then depending on the event class (Job, Send or Recv) and event ID we classify the event as something that we are interested in and if so insert a record into our database.

So the answer to my first question is basically answered: The events are only fired based on different stages of the fax job and so we are definitely getting all the events.

The problem that we are having is that something happens to the fax job during the preparation phase (the auto rotate times out) and we are not able to determine the reason for this failure by looking in the job file. The only way we know something has happened is to check when we the "job marked dead" event is fired. So when "job marked dead" is fired we check if the previous event is "send job done". If not then we have to parse /var/log/message and and find out what exactly the output from the job scheduler is. Then take this reason and load a job failed status into our database.

Is there a better place to get this job preparation state that from parsing machines system log? So is it possible to look somewhere when the "job prep done" event is fired and find out if the the preparation was successful or not?

Last point: I can post a copy of our audit hook script if it would be helpful but I'd have to anonimise it a bit first. We have spent a lot of time working on this and it forms a core part of our overall fax infrastructure. We'd be very happy to share it with the community if others will benefit from it.

Thanks

Dominique

On Tue, Jul 13, 2010 at 5:37 PM, Aidan Van Dyk <aidan@xxxxxxxx> wrote:

* Dominique dHotman <dominique@xxxxxxxxxxx> [100713 09:27]:

> Hi everyone,
>
> We have found what we think might be a bug but are not sure if the fax
> scheduler is behaving like this by design. First some background:

> The problem that we have found is that our audit hook is not getting all the
> JOB events and because of this we are recording the job as completed
> successfully instead of failed. And this causes all kinds of problems in
> our processing system.

Hm... The log shows that it *is* getting all the events. How are you
expecting to classify "failed/success" from the JOB/SEND events?
Unfortunately, there aren't "failed" or "success" events directly:
Event Class Mask Description

JOB_CREATE J 0x0001 job created
JOB_SUSPEND J 0x0002 job suspended
JOB_READY J 0x0004 job ready to send
JOB_SLEEP J 0x0008 job sleeping awaiting time-to-send
JOB_DEAD J 0x0010 job marked dead
JOB_PROCESS J 0x0020 job processed by scheduler
JOB_REAP J 0x0040 job corpus reaped
JOB_ACTIVE J 0x0080 job activated
JOB_REJECT J 0x0100 job rejected
JOB_KILL J 0x0200 job killed
JOB_BLOCKED J 0x0400 job blocked by other job
JOB_DELAYED J 0x0800 job delayed by tod restriction or similar
JOB_ALTERED¹ J 0x1000 job parameters altered
JOB_TIMEDOUT J 0x2000 job kill timer expired
JOB_PREP_BEGIN J 0x4000 job preparation started
JOB_PREP_END J 0x8000 job preparation finished

SEND_BEGIN S 0x0001 fax, send attempt started
SEND_CALL S 0x0002 fax, call placed
SEND_CONNECTED S 0x0004 fax, call answered by fax
SEND_PAGE S 0x0008 fax, page transmit done
SEND_DOC S 0x0010 fax, document transmit done
SEND_POLLRCVD S 0x0020 fax, document retrieved by poll operation
SEND_POLLDONE S 0x0040 fax, poll operation completed
SEND_END S 0x0080 fax, send attempt finished
SEND_REFORMAT S 0x0100 fax, job being reformatted
SEND_REQUEUE S 0x0200 fax, job requeued
SEND_DONE S 0x0400 fax, send job done

A normal failure would give you a a SEND_BEGIN/SEND_DONE at least, but so would
a normal success. The AuditHook is built on triggers, which signal events, but
don't transmit state... I'm guessing that when Sam designed them at first, he
assumed that people wanting job state would just examine the job. At least,
that's how I've use triggers/audit hooks.

But there is no event to signal which state the jos is in. Just that the job
is "dead", meaning it's in it's final state.. The usual means of inspecting
jobs is to watch the triggers to know an event happened, and then inspect the
job to see whatever state you want from it.

> Here is the listing of events (from /var/log/messages) for a failed job
> where this happened:
>
> Jul 13 09:39:49 hylafax01 FaxQueuer[9000]: SUBMIT JOB 129182
> Jul 13 09:39:49 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0001 "129182"
> Jul 13 09:39:49 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0004 "129182"
> Jul 13 09:39:50 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0020 "129182"
> Jul 13 09:39:50 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0080 "129182"
> Jul 13 09:39:50 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x4000 "129182"
> Jul 13 09:44:36 hylafax01 FaxQueuer[9835]: JOB 129182: CONVERT DOCUMENT: job time limit exceeded
> Jul 13 09:44:58 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x8000 "129182"
> Jul 13 09:44:58 hylafax01 FaxQueuer[9000]: NOTIFY: bin/notify "doneq/q129182" "format_failed" ""
> Jul 13 09:44:58 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0010 "129182"
> Jul 13 09:44:58 hylafax01 FaxQueuer[9000]: HOOK CMD /usr/local/bin/ooba_aud_hook.sh JOB 0x0040 "129182"

So you have a similar set of events on a job that failed for fax session
reasons? With the exception of some SEND_* events, the hsould be the same,
right?

> So my question here is: is there a bug somewhere that is causing this "JOB
> 129182: CONVERT DOCUMENT: job time limit exceeded" not be passed to the
> Audit Hook or is this right?

I think it's "right"... There is no event for format failed. Only
JOB_PREP_BEGIN and JOB_PREP_END (again, the triggers are on events, not the
state of jobs).

> If this is working properly then how can the Audit Hook script that we have
> get notified that the job preparation phase has actually failed with the
> failure reason being "CONVERT DOCUMENT: job time limit exceeded"

I don't think audit hook alone can give you that... Just like it can't give yo
if the job was sent successfully or not either. I guess you could deduce that
since you got JOB_DEAD without any SEND_* attempts, it must have failed to
prepare correctly...

I wouldn't build a system *only* with triggers... In the systems I've built,
I've always used the triggers as a point when its time to check the state of a
job/fax/modem for what ever particular state I'm monitoring.

Even ignoring the format conversion failure, and just expecting normal fax
session errors, the triggers don't even provide the necessary means to classify
success or failure. I guess you could go try and deduce "success" by counting
the SEND_PAGE events associated with a job, and consider it a 'success' IIF the
count matches the "expected" count, but that's sound like more work than just
checking the job state directly in the first place.

a.

--
Aidan Van Dyk aidan@xxxxxxxx
Senior Software Developer +1 215 825-8700 x8103
iFAX Solutions, Inc. http://www.ifax.com/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iD8DBQFMPIgxuVxNPsxNPScRAqrnAKCqOG3drJiC6Wdws2oMr23hOFPKZgCcDujI
llC4kVnrNP3fDX3ivWyQvkk=
=L55s
-----END PGP SIGNATURE-----

Follow-Ups:
- Re: [hylafax-users] Audit hook not getting all Job events
  - From: Dominique dHotman
- Re: [hylafax-users] Audit hook not getting all Job events
  - From: Aidan Van Dyk

References:
- [hylafax-users] Audit hook not getting all Job events
  - From: Dominique dHotman
- Re: [hylafax-users] Audit hook not getting all Job events
  - From: Aidan Van Dyk

Prev by Date: Re: [hylafax-users] Audit hook not getting all Job events
Next by Date: Re: [hylafax-users] Audit hook not getting all Job events
Previous by thread: Re: [hylafax-users] Audit hook not getting all Job events
Next by thread: Re: [hylafax-users] Audit hook not getting all Job events
Index(es):
- Main
- Thread

Project hosted by iFAX Solutions