HylaFAX The world's most advanced open source fax server


Re: [hylafax-users] faxgetty <defunct> - 'Extra' FaxGetty



Hans Strickler wrote:

4588 ? 00:00:00 faxgetty <defunct>


Several of the HylaFAX processes fork so that some action, such as flow control or image decoding, can occur as needed without interrupting the parent process, which stays free to do other things.

"<defunct>" processes are also known as "zombies". A zombie exists when a child process has exited but its parent process, which is still running, has not yet acknowledged the child's exit and cleaned it up. Normally this situation shouldn't cause any errors. If you do some googling you'll find that a lot of software over time has left zombie processes behind. HylaFAX intends to clean up after itself, and normally it does. But in your case the existence of the zombie process may help us figure out what's wrong.

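To make the zombie state concrete, here is a minimal sketch (not HylaFAX code; the function name forkAndReap is made up for illustration). The parent forks a child that exits at once. If the child exits before the parent gets around to waitpid(), ps lists it as <defunct> until that call is made.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: fork a child that exits with childStatus, then reap it.
// If the child exits before the parent reaches waitpid(), the child sits in
// the process table as a zombie (<defunct>) until waitpid() collects it.
// Returns the child's exit status, or -1 on error.
int forkAndReap(int childStatus)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              // fork failed
    if (pid == 0)
        _exit(childStatus);     // child: exit right away

    // Parent: this call acknowledges the child's exit and removes the zombie.
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A parent that never makes the waitpid() call, or makes it on the wrong pid, leaves the <defunct> entry in place for as long as the parent keeps running.
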
Often we see “faxgetty <defunct>” when we arrive in the morning, ALWAYS from the same sender. When we grep the process list for “fax”, I notice that there is a 3rd faxgetty (which is defunct).

We have ruled out this being line noise by switching the phone lines between the 2 modems (ttyS14 USED to do it) in the Box. This is a send/receive error.

Sometimes we get errors, sometimes it will not give errors. (Sender broadcasting multiple faxes overnight). Sometimes it will happen on the 1st or 2nd fax, other times after the 10th. Simply must be a send-receive issue. Sender SAYS nothing has changed on their end w/ their fax-server and we have not changed anything on ours, yet this happened kind of ‘out of the blue’.

Solution so far has been to kill the faxgetty instances and re-start them. What could be causing these?


From the rest of your e-mail you seem to be saying that faxgetty will stop answering calls in these situations. The log you provide shows a clear completion of the fax protocol, but I do not see the usual "RECV FAX" log entries at the bottom (including the one detailing the faxrcvd command), so that gives us further clues. In fact, it points the finger at the last two forks in faxd/FaxRecv.c++.


In FaxRecv.c++ we have three non-priority logging and document-processing events: notifyRecvBegun, notifyPageRecvd, and notifyDocumentRecvd. They are run in the background in child processes because they're not prioritized, but they must occur in that order.
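
One way to keep background children in order is for the parent to wait for the previous child before forking the next one. The sketch below is only an illustration of that idea, not the HylaFAX code: runOrderedNotifies and the single-letter tags are made up, and each child simply writes its tag to a pipe in place of the real notify call.

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical illustration: run three stub "notifications" in child
// processes, one after another, and return the order in which they ran.
std::string runOrderedNotifies()
{
    int fds[2];
    if (pipe(fds) != 0)
        return "";

    // Stand-ins for: receive begun, page received, document received.
    const char tags[] = { 'B', 'P', 'D' };
    pid_t prev = -1;
    for (char tag : tags) {
        // Parent reaps the previous child first, so that child has
        // finished (and is not left as a zombie) before the next starts.
        if (prev > 0)
            (void) waitpid(prev, nullptr, 0);
        prev = fork();
        if (prev == 0) {
            // Child: record the tag, then exit without returning.
            close(fds[0]);
            _exit(write(fds[1], &tag, 1) == 1 ? 0 : 1);
        }
    }
    if (prev > 0)
        (void) waitpid(prev, nullptr, 0);   // reap the last child too
    close(fds[1]);

    // Read back the tags in the order the children wrote them.
    std::string order;
    char c;
    while (read(fds[0], &c, 1) == 1)
        order += c;
    close(fds[0]);
    return order;
}
```

The cost of this approach is that the parent blocks until the previous child is done, so it only suits background work that finishes reasonably quickly.
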

In looking at that section of code again, it appears to be doing something a bit shady, and that may be where the problem lies. Please try the attached patch to see if it resolves things for you. If you need a tarball or a RedHat/Fedora RPM then let me know.

I'm not exactly sure why this would only happen to you with one particular sender, though. Maybe it's just bad luck with that sender. Maybe that sender is doing things in a particular way that makes the problem more likely to occur. Anyway, please try the patch.

Thanks,

Lee.

--- hylafax.orig/faxd/FaxRecv.c++	2006-01-16 07:13:56.000000000 -0800
+++ hylafax/faxd/FaxRecv.c++	2006-01-26 09:27:29.622047904 -0800
@@ -69,7 +69,7 @@
 	     * If the system is busy then notifyRecvBegun may not return
 	     * quickly.  Thus we run it in a child process and move on.
 	     */
-	    waitNotifyPid = fork();
+	    waitNotifyPid = fork();	// waitNotifyPid keeps the notifies ordered
 	    switch (waitNotifyPid) {
 		case 0:
 		    // NB: partially fill in info for notification call
@@ -219,11 +219,10 @@
 	 * If syslog is busy then notifyDocumentRecvd may not return
 	 * quickly.  Thus we run it in a child process and move on.
 	 */
-	pid_t pid = waitNotifyPid;
+	if (waitNotifyPid > 0) (void) Sys::waitpid(waitNotifyPid);	// keep the notifies ordered
 	waitNotifyPid = fork();
 	switch (waitNotifyPid) {
 	    case 0:
-		if (pid > 0) (void) Sys::waitpid(pid);
 		notifyDocumentRecvd(info);
 		sleep(1);		// XXX give parent time
 		exit(0);
@@ -284,11 +283,10 @@
 	 * Thus we run it in a child process and move on.  Timestamps
 	 * in syslog cannot be expected to have exact precision anyway.
 	 */
-	pid_t pid = waitNotifyPid;
+	if (waitNotifyPid > 0) (void) Sys::waitpid(waitNotifyPid);	// keep the notifies ordered
 	waitNotifyPid = fork();
 	switch (waitNotifyPid) {
 	    case 0:
-		if (pid > 0) (void) Sys::waitpid(pid);
 		notifyPageRecvd(tif, info, ppm);
 		sleep(1);		// XXX give parent time
 		exit(0);
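
A note on why the removed lines look suspect. This is a reading of the diff, not something stated in the message above, and it assumes Sys::waitpid behaves like plain waitpid(). In the old code the newly forked child waited on the pid of the previous child. That process is its sibling, not its own child, and POSIX waitpid() only works on the caller's own children, so such a call fails with ECHILD. The sketch below (siblingWaitErrno is a made-up name) shows that behaviour in isolation.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical illustration: fork child A, then fork child B and have B call
// waitpid() on A's pid. Returns the errno value B observed (passed back
// through B's exit status), 0 if B's wait succeeded, or -1 on error.
int siblingWaitErrno()
{
    pid_t a = fork();
    if (a < 0)
        return -1;
    if (a == 0)
        _exit(0);                       // child A: exits immediately

    pid_t b = fork();
    if (b < 0) {
        (void) waitpid(a, nullptr, 0);  // still reap A before giving up
        return -1;
    }
    if (b == 0) {
        // Child B: A is B's sibling, not B's child, so this wait cannot succeed.
        int rc = waitpid(a, nullptr, 0);
        _exit(rc == -1 ? errno : 0);
    }

    // Parent: collect B's report, then reap A. Only the parent can reap A.
    int status = 0;
    int result = -1;
    if (waitpid(b, &status, 0) == b && WIFEXITED(status))
        result = WEXITSTATUS(status);
    (void) waitpid(a, nullptr, 0);
    return result;
}
```

Under that reading, the old arrangement neither enforced the ordering nor reaped the earlier child, which would be consistent with a leftover <defunct> faxgetty. The patched code does the wait in the parent, where it can succeed.
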


