HylaFAX The world's most advanced open source fax server

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [hylafax-users] Modem gets stuck on some incoming faxes



On 2003.10.20 11:19 Michael Evans wrote:
I've got HylaFAX installed on a SuSE 7.2 system with a Zoom 2920 internal faxmodem. The HylaFAX version is 4.6 from cvs, patched to enable ECM. Sending faxes (via WHFC) works, although sometimes slowly due to the ECM. Some faxes are received properly, as well. However, some incoming faxes cause the modem to get "stuck" - it holds the phone line infinitely, so that any new callers hear a busy signal. Meanwhile, every 3 seconds or so a "Timeout awaiting synchronization sequence" message gets written to the log file, resulting in some enormous logs. Getting the modem to hang up the phone requires stopping faxgetty, stopping hylafax, then restarting faxgetty and hylafax.

Not that you're ever *supposed* to need to do it, but restarting faxgetty probably would be sufficient.


Whenever this happens, some very weird stuff is written to the log file, namely:

Oct 20 09:04:29.00: [ 3583]: Timeout awaiting synchronization sequence
Oct 20 09:04:29.00: [ 3583]: Bad HDLC terminating flag received.
Oct 20 09:04:32.00: [ 3583]: Timeout awaiting synchronization sequence
Oct 20 09:04:32.00: [ 3583]: Bad HDLC terminating flag received.
Oct 20 09:04:35.00: [ 3583]: Timeout awaiting synchronization sequence
Oct 20 09:04:35.00: [ 3583]: Bad HDLC terminating flag received.
Oct 20 09:04:38.00: [ 3583]: Timeout awaiting synchronization sequence
Oct 20 09:04:38.00: [ 3583]: Bad HDLC terminating flag received.
Oct 20 09:04:41.00: [ 3583]: Timeout awaiting synchronization sequence
Oct 20 09:05:16.52: [ 3583]: --> [10:NO CARRIER]

Something's interfering with the communication. Since we're all the way down to 2400 bps, this isn't surprising.


Oct 20 09:05:16.52: [ 3583]: <-- [9:AT+FRH=3\r]
Oct 20 09:05:16.61: [ 3583]: --> [7:CONNECT]
Oct 20 09:05:17.80: [ 3583]: --> [2:OK]
Oct 20 09:05:17.80: [ 3583]: RECV recv PPS (partial page signal)
Oct 20 09:05:17.80: [ 3583]: RECV recv EOP (no more pages or documents)
Oct 20 09:05:17.80: [ 3583]: RECV received 105 frames of block 1 of page 3
Oct 20 09:05:17.80: [ 3583]: <-- [9:AT+FRS=7\r]
Oct 20 09:05:17.88: [ 3583]: --> [2:OK]
Oct 20 09:05:17.88: [ 3583]: <-- [9:AT+FTH=3\r]
Oct 20 09:05:18.74: [ 3583]: --> [7:CONNECT]
Oct 20 09:05:18.74: [ 3583]: <-- data [35]
Oct 20 09:05:18.74: [ 3583]: <-- data [2]
Oct 20 09:05:20.06: [ 3583]: --> [2:OK]
Oct 20 09:05:20.06: [ 3583]: RECV send PPR (partial page request)
Oct 20 09:05:20.06: [ 3583]: <-- [10:AT+FRM=24\r]
Oct 20 09:05:21.90: [ 3583]: --> [7:CONNECT]
Oct 20 09:05:24.00: [ 3583]: Timeout awaiting synchronization sequence
Oct 20 09:05:24.00: [ 3583]: <-- [10:AT+FRM=24\r]
Oct 20 09:05:24.11: [ 3583]: --> [750:ÿÿÿûùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùù

This here is actual output from the modem itself in response to the AT+FRM=24 command. It indicates some kind of error occurred with the modem (hardware problem/firmware bug).


ùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùù
ùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùù
ùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùù
ùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùù
ùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùù
ùùùùùùùùùùùùùùùù}`°P4"Þ¬ÀhPàì

At this point I am not certain whether this is still coming from the modem or elsewhere, but I suspect it's garbage coming from the modem as it resets itself due to the error above. We see this with other modems when they reset while the tty is reading from it.


°}oçí"ñÎ;Ø8lÎnéu·tIK{&
Þt4rÇ Å Ã s ãô\!.îêrÆ7÷êXGÊ:x0ÔHgLH ¹v5Õ¦ëK{&Þ
;fb%Bb
8""·¦èLä
ú¾/À:}À4Ü8GÁç0ïfCBË
-RE\æ¦ëK{&Þtb""""rSÆÄÀT0
<¬GºG

At this point the garbage from the modem seems to have triggered a memory dump via the HylaFAX session logger from the system somehow... and that's why we get this...


--More--
Most commands optionally preceded by integer argument k. Defaults in brackets.
Star (*) indicates argument becomes new default.
-------------------------------------------------------------------------------
<space> Display next k lines of text [current screen size]
z Display next k lines of text [current screen size]*
<return> Display next k lines of text [1]*
d or ctrl-D Scroll k lines [current scroll size, initially 11]*
q or Q or <interrupt> Exit from more
s Skip forward k lines of text [1]
f Skip forward k screenfuls of text [1]
b or ctrl-B Skip backwards k screenfuls of text [1]
' Go to place where previous search started
= Display current line number
/<regular expression> Search for kth occurrence of regular expression [1]
n Search for kth occurrence of last r.e [1]
!<cmd> or :!<cmd> Execute <cmd> in a subshell
v Start up /usr/bin/vi at current line
ctrl-L Redraw screen
:n Go to kth next file [1]
:p Go to kth previous file [1]
:f Display current file name and line number
. Repeat previous command
-------------------------------------------------------------------------------
Ú6ÈG{&ÞtÄ
ôÈ
Ô]

The reception attempt should ideally end here, since it's irrecoverable from this point on.


Oct 20 09:05:24.11: [ 3583]: <-- [10:AT+FRM=24\r]
Oct 20 09:05:24.11: [ 3583]: --> [2:OK]

The command above resulted in "OK", meaning it wasn't interpreted but was merely a call to return to command-mode. The OK means we're back in command mode.


Oct 20 09:05:24.11: [ 3583]: <-- [10:AT+FRM=24\r]
Oct 20 09:05:24.22: [ 3583]: --> [7:CONNECT]

This would indicate that the remote is still on-line... somehow...


Oct 20 09:05:27.00: [ 3583]: Timeout awaiting synchronization sequence
...

Once these "Timeouts" start, they go on indefinitely, until I intervene by stopping faxgetty and hylafax:

Oct 20 09:31:30.00: [ 3583]: Timeout awaiting synchronization sequence
Oct 20 09:31:30.00: [ 3583]: <-- [10:AT+FRM=24\r]
Oct 20 09:31:30.00: [ 3583]: --> [2:]
Oct 20 09:31:30.00: [ 3583]: <-- [10:AT+FRM=24\r]
Oct 20 09:31:30.00: [ 3583]: --> [10:NO CARRIER]

After 25 minutes the remote is certainly gone, so these are not surprising...


Oct 20 09:31:30.00: [ 3583]: <-- [10:AT+FRM=24\r]
Oct 20 09:31:30.11: [ 3583]: --> [7:CONNECT]

But this one is surprising. Seems to be another firmware bug, responding CONNECT when no carrier is present.


Oct 20 09:31:33.00: [ 3583]: Timeout awaiting synchronization sequence
Oct 20 09:31:33.00: [ 3583]: <-- [10:AT+FRM=24\r]
Oct 20 09:31:33.00: [ 3583]: --> [2:]
Oct 20 09:31:33.00: [ 3583]: <-- [10:AT+FRM=24\r]
Oct 20 09:31:33.00: [ 3583]: --> [10:NO CARRIER]
Oct 20 09:31:33.00: [ 3583]: <-- [10:AT+FRM=24\r]
Oct 20 09:31:33.11: [ 3583]: --> [7:CONNECT]
Oct 20 09:31:36.00: [ 3583]: Timeout awaiting synchronization sequence

The attached patch is an error-handling fix and should fix all of the poor handling of the problematic responses above. Please test and lets see how it goes. Seems to not inhibit normal behavior here.


I don't know how what appears to be help instructions for vi get written to a hylafax log, but there they are.

The good news is that I configured HylaFAX to convert incoming faxes to PDF files and send them as e-mail attachments to our receptionist, and when it's able to receive a fax properly (about 50% of the time), this part works perfectly.

50% is a very bad success ratio.


Any ideas, though, what the problem could be that prevents successful fax reception the other 50% of the time?

Do 50% of the faxes result in logs like this one (with faxgetty hanging)? Or can you show us a log of the usual problem type?


Thanks.

Lee.
diff -Nru hylafax.orig/faxd/Class1Recv.c++ hylafax/faxd/Class1Recv.c++
--- hylafax.orig/faxd/Class1Recv.c++	Thu Oct 16 23:09:34 2003
+++ hylafax/faxd/Class1Recv.c++	Mon Oct 20 12:48:11 2003
@@ -445,8 +445,9 @@
 	     * high speed carrier traffic other than the TCF).
 	     */
 	    fxStr rmCmd(curcap[HasShortTraining(curcap)].value, rmCmdFmt);
+	    u_short attempts = 0;
 	    ATResponse rmResponse = AT_NOTHING;
-	    while (rmResponse != AT_CONNECT && rmResponse != AT_TIMEOUT && rmResponse != AT_ERROR) {
+	    while ((rmResponse == AT_NOTHING || rmResponse == AT_FCERROR) && attempts++ < 20) {
 		(void) atCmd(rmCmd, AT_NOTHING);
 		rmResponse = atResponse(rbuf, conf.t2Timer);
 	    }
@@ -726,6 +727,7 @@
 	u_short rcpcnt = 0;
 	u_short pprcnt = 0;
 	u_int fcount = 0;
+	u_short syncattempts = 0;
 	bool blockgood = false;
 	do {
 	    sentERR = false;
@@ -910,18 +912,25 @@
 		    free(block);
 		    return (false);
 		}
+	    } else {
+		if (syncattempts++ > 20) {
+		    emsg = "Cannot synchronize ECM frame reception.";
+		    free(block);
+		    return(false);
+		}
 	    }
 	    if (! blockgood) {		// back to high-speed carrier
 		setInputBuffering(true);
 		if (flowControl == FLOW_XONXOFF)
 		    (void) setXONXOFF(FLOW_NONE, FLOW_XONXOFF, ACT_FLUSH);
 		fxStr rmCmd(curcap[HasShortTraining(curcap)].value, rmCmdFmt);
+		u_short attempts = 0;
 		ATResponse response = AT_NOTHING;
-		while (response != AT_CONNECT && response != AT_TIMEOUT && response != AT_ERROR) {
+		while ((response == AT_NOTHING || response == AT_FCERROR) && attempts++ < 20) {
 		    (void) atCmd(rmCmd, AT_NOTHING);
 		    response = atResponse(rbuf, conf.t2Timer);
 		}
-		if (response == AT_TIMEOUT || response == AT_ERROR) {
+		if (response != AT_CONNECT) {
 		    free(block);
 		    return (false);
 		}
@@ -948,12 +957,13 @@
 	    if (flowControl == FLOW_XONXOFF)
 		(void) setXONXOFF(FLOW_NONE, FLOW_XONXOFF, ACT_FLUSH);
 	    fxStr rmCmd(curcap[HasShortTraining(curcap)].value, rmCmdFmt);
+	    u_short attempts = 0;
 	    ATResponse response = AT_NOTHING;
-	    while (response != AT_CONNECT && response != AT_TIMEOUT && response != AT_ERROR) {
+	    while ((response == AT_NOTHING || response == AT_FCERROR) && attempts++ < 20) {
 		(void) atCmd(rmCmd, AT_NOTHING);
 		response = atResponse(rbuf, conf.t2Timer);
 	    }
-	    if (response == AT_TIMEOUT || response == AT_ERROR) {
+	    if (response != AT_CONNECT) {
 		free(block);
 		return (false);
 	    }



Project hosted by iFAX Solutions