HylaFAX The world's most advanced open source fax server


Re: Large number of errors with Hylafax and ZyXEL Omni modem



Rob, Shuvam, et al,

First, let me say that I found the recent messages on the nature of faxing
and the problems with electronic faxing, particularly the message from
Dr. Pollack, to be the most interesting and educational things I've seen
go by on this list in a long time.  That one email alone taught me
more about the internals of faxing than three months of picking up tidbits
here and there from the various FAQs and other material on the net.  Who do
I make my check out to?

On a second note, I'm getting ready to start spending time on our faxing
systems again, which will involve scaling up from a mere 3K a night as
more and more clients access our services.  So anything I can
do to reduce the overhead of calls, etc., is of high importance!  In any
case, here are some numbers for a month's worth of faxes across 8 USR Total
Control modems, which are the chassis version of the high-end Couriers.  I
have also included a Perl version of the gawk script for those who don't
speak gawk that well.

Modem: total
Total faxes sent:                                            48263

Error                                                           No      % Fl
---------------------------------------
Normal and proper end of connection                            510       1.1
Unable to configure modem for fax use                            2       0.0
No response to EOP repeated 3 times                            206       0.4
Busy signal detected                                          5250      10.9
Can not open document file                                      98       0.2
Job aborted by user                                            108       0.2
DCS sent 3 times without response; Giving up after 3 attempt     2       0.0
RSPREC error/got DCN                                           233       0.5
No carrier detected                                           6601      13.7
No local dialtone                                               19       0.0
DTE to DCE data underflow                                        9       0.0
DCS sent 3 times without response                             1741       3.6
No response to MPS repeated 3 times                              2       0.0
Unspecified Transmit Phase B error                              18       0.0
Unknown problem (check modem power)                              2       0.0
No answer (T.30 T1 timeout)                                    325       0.7
---------------------------------------
Total                                                        15126      31.3

For me, the error ratio, once you remove the 10.9% of busy signals and the
13.7% of no carrier, is down to 6.7%.  The errors I am worried about, and am
interested in tracking down and fixing, are the roughly 5% that
appear to be errors in the faxmodem protocol:

-  DCS errors - appear to be related to random modem failures
-  Normal and proper end of connection - is actually a different class
   of error, I don't know why it's reported like this.

and one that looks strongly like a bug in Hylafax's non-fax code.

-  Can not open document file - looks like some sort of locking/race
   condition, having to do with jobs that share the same document file.
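
As a cross-check on the percentages above, here is a minimal Python sketch that recomputes them from the counts in the table. The counts are copied from the table; the variable and function names are only illustrative. Working from the raw counts rather than subtracting rounded percentages gives 6.8% instead of 6.7% for everything other than busy and no carrier, which is the same figure within rounding.

```python
# Counts copied from the table above; all names here are illustrative.
TOTAL_SENT = 48263
errors = {
    "Normal and proper end of connection": 510,
    "Unable to configure modem for fax use": 2,
    "No response to EOP repeated 3 times": 206,
    "Busy signal detected": 5250,
    "Can not open document file": 98,
    "Job aborted by user": 108,
    "DCS sent 3 times without response; Giving up after 3 attempt": 2,
    "RSPREC error/got DCN": 233,
    "No carrier detected": 6601,
    "No local dialtone": 19,
    "DTE to DCE data underflow": 9,
    "DCS sent 3 times without response": 1741,
    "No response to MPS repeated 3 times": 2,
    "Unspecified Transmit Phase B error": 18,
    "Unknown problem (check modem power)": 2,
    "No answer (T.30 T1 timeout)": 325,
}


def pct(count, total=TOTAL_SENT):
    """Return count as a percentage of all faxes sent."""
    return 100.0 * count / total


all_errors = sum(errors.values())

# Failures that say more about the far end than about our setup.
line_errors = errors["Busy signal detected"] + errors["No carrier detected"]
other_errors = all_errors - line_errors

# The two classes singled out as likely protocol trouble.
protocol_suspects = sum(
    count
    for name, count in errors.items()
    if name.startswith("DCS sent") or name.startswith("Normal and proper")
)

print(f"all errors:           {all_errors:6d} {pct(all_errors):5.1f}%")
print(f"minus busy/no carrier:{other_errors:6d} {pct(other_errors):5.1f}%")
print(f"DCS + 'normal end':   {protocol_suspects:6d} {pct(protocol_suspects):5.1f}%")
```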

Oddly enough, with the fax errors, the same number will often work on
another day or another try.  Why?  Who knows...line noise?  Phase of the
moon?  Composition of document?

David.

On Thu, 2 Apr 1998, Robert Colquhoun wrote:

> Hi Shuvam,
> It's hard to make sense of the numbers below(at least for me), i've got a
> small script "errorstats" which parses xferlog for errors grouping by type.
> 
> Download from http://www.trump.net.au/~rjc/hylafax/
> 
> For some numbers - we are using USR Couriers for sending and receiving.
> From the statistics gathered by my errorstats script no particular modem
> error is occurring more than 20 times per thousand faxes sent.
> Roughly:
> 	Transmit Phase B ~ 13/1000 (The modems are operating in Class 2.0)
> 	Unknown Modem Problem ~ 17/1000
> 	T1 Timeout ~ 6/1000
> 	DCS Errors ~ 7/1000
> 	EOP Errors ~4/1000
> A number of other errors each less than 3 per 1000 fax sent
> 
> The negative retrain problem is occurring occasionally; it is worse than
> other fax errors in that you always get a complaint from the other end,
> whereas the other modem errors fail silently.
> 
> I've looked quickly into the source code and i think i can see where to
> change the code to stop the retransmit _but_ i am not very keen to do this.
> Firstly i am not sure this is the right thing to do, and i don't currently
> have a good test setup to check whether this is done correctly.
> 
> If i was in your situation i would 1) get some accurate data on how many
> calls are failing and why 2) double check your configuration 3) get the
> telephone company to check all your phone lines.
> 
> - Robert
> 
> At 07:53 2/04/98 +0530, Shuvam Misra wrote:
> >This is in continuation to earlier mails from myself and others on this
> >topic. I'm running a TPC cell using Linux, Hylafax 4.0pl2, and a ZyXEL
> >Omni O288S, as mentioned before. This is the statistical count of the
> >Hylafax transfer figures for the last day:
> >
> >     52	0
> >     39	2
> >      6	3
> >     13	4
> >      1	5
> >      1	6
> >      2	7
> >      1	9
> >      1	10
> >      1	11
> >      1	12
> >      1	13
> >      1	14
> 
> >
> >The first field is count or frequency, and the second is the number of
> >people who sent a total number of pages as shown. So "52 0" means that
> >52 people queued one or more jobs and could send only 0 pages. The
> >second line is "39 2" which means that 39 people sent a total of 2 pages
> >each. (This is the minimum length of most TPC jobs, because they have
> >a cover page, and a short message page after that. This means that 39
> >people sent one job each.) And so on.
> >
> >The total is 121 people, who have tried sending faxes out. Out of them
> >52 people couldn't send out anything at all, as we can see. This is
> >unacceptable by any standards, I would think. Harald Pollack has said that
> >any public fax broadcaster will encounter about 40% failures. Another
> >gentleman has been using Multitech modems and has reported 5 to 10%
> >failures only. I am reporting about 45% faxes failed, which means perhaps
> >60 to 80% of all calls made are failing. (This is because each fax fails
> >only after it has made several attempts to go through.)
> >
> >Another figure shows that 242 pages were transmitted successfully,
> >but 313 error-calls were recorded. These errors exclude the ones where
> >carrier couldn't be established. This means that each of these 313 calls
> >was completed at the Telco level, and so will be billed to me. Is this
> >typical? I am sure all TPC operators who use Hylafax will be getting
> >their daily report from "faxcron". Can they send me a five line email
> >just cutting out the day's "Total" line from yesterday's report, together
> >with a line giving which area they operate in?
> >
> >Most of us run TPC as a voluntary service. We need to cut wasteful
> >expenditure.  We use Hylafax, and we need the Hylafax community to
> >come together to figure out whether there is indeed any problem in
> >the software which might be increasing errors, as Harald's and others'
> >mails have hinted. I have not seen much discussion on the Hylafax
> >mailing lists about these fundamental aspects of its behaviour; most of
> >the discussion revolves around new setup problems, cover pages, and the
> >like, which are important but are not of use in this specific case.
> >
> >Is Sam Leffler still active on this list or with this software? Is Matt
> >Apitz or David Woolley or anyone else in a position to take up these
> >issues and bring down the errors, specially with ZyXEL modems? One set
> >of ZyXEL users who don't use Hylafax, swear by its fax implementation.
> >Should all Hylafax users then switch out of ZyXEL to Multitech? Or
> >should all TPC operators switch out of Hylafax?
> >
> >Hoping for more responses on this list,
> >
> >regards,
> >Shuvam
> > 
> 
> 


#!/usr/local/bin/perl
#
#  Adapted from the GAWK script....
#
#  -----

require 5.000;

use Text::ParseWords;
use Getopt::Std;

&getopts("t");

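# Each xferlog line is tab-separated.  As used here, field 3 is the modem,
# field 10 the number of pages sent, and field 13 the status/error text.
while(<>) {
    chomp;
    @array = quotewords("\t", 0, $_);
    $array[3] = "total" if $opt_t;	# -t: lump all modems together
    $total{$array[3]}++;
    if ( $array[10] == 0 ) {		# no pages sent: count the reason
	$array[13] =~ s/; too many .*//;
	$errcount->{$array[3]}->{$array[13]}++;
    }
}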

foreach $modem ( keys %total ) {
    print "Modem: $modem\n";
    printf "%-68.68s %6i\n\n", "Total faxes sent:", $total{$modem};
    printf "%-60.60s %5s %10s\n","Error", "No","% Fl";
    print "---------------------------------------\n";

    $tot = 0;
    foreach $error ( keys %{$errcount->{$modem}} ) {
	if ( $errcount->{$modem}->{$error} > 0 ) {
	    $rate = $errcount->{$modem}->{$error} * 100 / $total{$modem};
	    $tot += $errcount->{$modem}->{$error};
	    printf "%-60.60s %5i %8.1f\n", $error, $errcount->{$modem}->{$error}, $rate;
	}
    }

    print "---------------------------------------\n";
    printf "%-60.60s %5i %8.1f\n", "Total", $tot, $tot * 100 / $total{$modem};
    printf "\n";
}
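
For those who speak neither gawk nor Perl, here is a rough Python sketch of the same tally. It assumes the field positions the Perl script uses (modem in field 3, page count in field 10, status text in field 13) and that fields are quoted with double quotes only; the function names and the two sample records are made up for illustration and are not real log data. Unlike the Perl version, it skips lines that are too short to hold a status field.

```python
import csv
import re
from collections import Counter, defaultdict

# 0-based field positions, assumed from the Perl script above.
MODEM, NPAGES, REASON = 3, 10, 13


def to_int(text):
    """Loosely mimic Perl's numeric context: non-numeric text counts as 0."""
    try:
        return int(text)
    except ValueError:
        return 0


def tally(lines, merge_modems=False):
    """Return (jobs per modem, failure reasons per modem) for xferlog lines."""
    total = Counter()
    errcount = defaultdict(Counter)
    for fields in csv.reader(lines, delimiter="\t", quotechar='"'):
        if len(fields) <= REASON:
            continue  # too short to be a record this sketch understands
        modem = "total" if merge_modems else fields[MODEM]
        total[modem] += 1
        if to_int(fields[NPAGES]) == 0:  # no pages sent: count the reason
            reason = re.sub(r"; too many .*", "", fields[REASON])
            errcount[modem][reason] += 1
    return total, errcount


def report(total, errcount):
    """Print one table per modem, roughly in the layout of the Perl script."""
    for modem, sent in total.items():
        print(f"Modem: {modem}")
        print(f"{'Total faxes sent:':<68} {sent:6d}\n")
        print(f"{'Error':<60} {'No':>5} {'% Fl':>8}")
        print("-" * 75)
        failed = 0
        for reason, count in errcount[modem].items():
            failed += count
            print(f"{reason[:60]:<60} {count:5d} {100 * count / sent:8.1f}")
        print("-" * 75)
        print(f"{'Total':<60} {failed:5d} {100 * failed / sent:8.1f}\n")


def make_line(modem, pages, reason):
    """Build a hypothetical 14-field record with only the used fields set."""
    fields = [""] * 14
    fields[MODEM], fields[NPAGES], fields[REASON] = modem, str(pages), reason
    return "\t".join(fields)


# Two made-up records, for illustration only.
sample = [
    make_line("ttyS0", 0, "Busy signal detected; too many attempts to dial"),
    make_line("ttyS0", 2, ""),
]
total, errcount = tally(sample)
report(total, errcount)
```

To run it on a real log, pass an open file to `tally()` in place of `sample`, and use `merge_modems=True` for the equivalent of the `-t` switch.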
X-Mailer: exmh version 2.0.1 12/23/97
To: Matthias.Apitz@SOFTCON.de (Matthias Apitz)
cc: shuvam@spacenetindia.com, flexfax@sgi.com, harald.pollack@omv.co.at,
        tpc-oper@info.tpc.int
Subject: Re: TPC-OPER: Re: flexfax: Large number of errors with Hylafax and 
 ZyXEL Omni modem
Date: Thu, 02 Apr 1998 17:15:18 +0200
From: "Mr. Arlington Hewes" <tpcadmin@info.tpc.int>
Sender: owner-flexfax@celestial.com


>>>>> On Thu, 2 Apr 1998, "MA" == Matthias.Apitz@SOFTCON.de wrote:

  MA> I can't imagine that your problems mostly have to do with the HylaFAX
  MA> software. You should try to check the problems in the log files. I can't
  MA> say anything about the ZyXEL Omni O288S because I don't have such a
  MA> device. If someone can offer a device for testing I will run tests.

  MA> I run two modems in my company; one ZyXEL 1496EGP fw. 6.13 and one ZyXEL
  MA> Elite 2864D. I did a test last month for 172 destinations and 4 pages to
  MA> each destination: 165 destinations were successful and only 7 caused
  MA> problems.

Thanks for replying Matthias, but why won't anyone address the very technical points raised by Mr. Pollack?

We have someone stating in no uncertain terms that HylaFAX's response to 'RTN negative' is wrong. Most respondents on-list decline to address this problem, and those who do say anything about it do not feel they have the authority to judge.

This problem is particularly damaging since the recipient ends up with multiple copies of the same page of the fax (this should not happen!). Suppose your cell is set up to try sending 4 times (MaxDials: 4) . . . my experience is that it will send the first page, get RTN negative and resend the page, get it again and resend, and then give up after sending the page a third time and disconnect. It will then redial three more times, for a total of 12 cover pages to the recipient, and none of the subsequent pages.
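
The arithmetic behind that count, as a minimal Python sketch. It only models the behaviour described above (the page is sent three times per call when each send draws RTN, and the job is dialed MaxDials times); it is not HylaFAX code, and the function and parameter names are made up for illustration.

```python
def copies_received(max_dials, sends_per_call=3):
    """Copies of the first page delivered if every send is answered with RTN.

    Assumes, per the description above, that each call transmits the page
    sends_per_call times before disconnecting, and that the job is dialed
    max_dials times in total.
    """
    copies = 0
    for _call in range(max_dials):
        for _send in range(sends_per_call):
            copies += 1  # the far end has already received this copy
    return copies


print(copies_received(max_dials=4))  # 4 calls x 3 sends = 12
```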

I have seen just as many problems with RTN at EOP in Sunnyvale Calif - this is _not_ a line noise issue, it's fairly specific to the ZyXEL/HylaFAX combo. I generally recommend people ditch their ZyXELs and buy Multitechs, but perhaps the time has come to address the problem in a more productive manner?

It seems important to me that we seek an authority capable of saying how HylaFAX should behave. If HylaFAX is not ITU compliant, this should be fixed.

Perhaps someone could raise Sam on the blower? Or anyone with friends in the ITU?

-DPN



