Re: [hylafax-users] Document conversion failed

To: hylafax-users <hylafax-users@xxxxxxxxxxx>
Subject: Re: [hylafax-users] Document conversion failed
From: Lee Howard <faxguy@xxxxxxxxxxxxxxxx>
Date: Thu, 13 Sep 2007 22:25:47 -0700

Darren Nickerson wrote:

"Lee Howard" <faxguy@xxxxxxxxxxxxxxxx> wrote:

Darren Nickerson wrote:

Whereas HylaFAX used to bog down with faxq consuming 99% CPU and jobs submitting _VERY_ slowly, it should now be pretty zippy.

I can't say that I recall the last time I saw faxq consume 99% CPU (or anything close to even half of it)... and I routinely stuff between 500 and 1000 jobs into the outbound queue... this without the major refactoring.

I guess maybe 500 or 1000 jobs isn't sufficiently large to uncover the problem these days. 10,000 jobs is not a contrived scenario - we work with a lot of people who submit that many jobs to their servers in one go.

Of course, 10,000 jobs should be no sweat.

I first introduced the problem in 2002 because it was even worse back then:

http://bugs.hylafax.org/show_bug.cgi?id=344

Yes, it was very bad back then.

although there was some improvement from the work in bug 667, we really just moved the goalposts a bit.

In my estimation the collateral work done in 667 vastly improved performance. Yes, by no means was it the end-all solution to the problem, but it was significant enough for you at the time to close 344 because of it. Certainly I would not characterize post-667 performance as "bogged down" or "submitting _VERY_ slowly". Indeed, there have been further performance improvements in HylaFAX+ since then... and undoubtedly there are more yet to come.

There was still a fundamental problem queueing jobs with long queues and a more comprehensive solution was needed. It really was an algorithmic problem ... we were limited by the way faxq was written and unless things were restructured we'd only be able to achieve incremental improvements.

What, exactly, was that fundamental algorithmic problem?

(I have left the long quote below untrimmed because it provides context for my comments after it...)

Here's some rough numbers for the time taken to queue 10,000 jobs in one go:
HylaFAX+-5.1.9: 07:03:58
HylaFAX-4.3.5: 03:06:18
HylaFAX-4.4 CVS: 00:02:52
That's 7 hours for HylaFAX+, 3 hours for HylaFAX-4.3.x, and about 3 minutes for HylaFAX-4.4 CVS (commit 2d0b444c473ce4d7b0e5bd8da14725e2923f16e0).
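To put the three timings on a common scale, here is a small Python sketch that converts each one to an average submission rate. It assumes the figures are HH:MM:SS wall-clock durations for the same 10,000-job broadcast; the names `to_seconds` and `jobs_per_second` are illustrative and not part of HylaFAX.

```python
# Average submission rate implied by the quoted benchmark.
# Assumption: each figure is an HH:MM:SS wall-clock time for 10,000 jobs.
JOBS = 10_000

REPORTED = {
    "HylaFAX+-5.1.9": "07:03:58",
    "HylaFAX-4.3.5": "03:06:18",
    "HylaFAX-4.4 CVS": "00:02:52",
}


def to_seconds(duration: str) -> int:
    """Convert 'HH:MM:SS' (or 'MM:SS') into seconds."""
    seconds = 0
    for part in duration.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def jobs_per_second(duration: str, jobs: int = JOBS) -> float:
    """Average submission rate over the whole run."""
    return jobs / to_seconds(duration)


if __name__ == "__main__":
    for version, elapsed in REPORTED.items():
        secs = to_seconds(elapsed)
        print(f"{version:16} {secs:6d} s {jobs_per_second(elapsed):7.2f} jobs/s")
```

By this arithmetic the runs take 25,438 s (about 0.39 jobs/s), 11,178 s (about 0.89 jobs/s) and 172 s (about 58 jobs/s). That puts roughly a factor of 148 between the slowest and fastest results.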

The command we ran was:

sendfax -h test@localhost -n -t 1 -T 1 -z diallist-10000.txt somefile.pdf

and the diallist file contained 10,000 unique numbers.

Some settings from faxq config:
LogFacility:            local1
CountryCode:            1
AreaCode:               415
LongDistancePrefix:     1
InternationalPrefix:    011
DialStringRules:        etc/dialrules
SendFaxCmd:             /usr/sbin/faxsend
Use2D:                  yes
DestControls:           etc/destcontrols
PollLockWait:           1
MaxConcurrentCalls:     100
MaxBatchJobs:           2
MaxConcurrentCalls:     1
NotifyCmd:              /bin/true
ServerTracing:          0

I know that with benchmarks the devil is in the details, and so I'm the first to consider them suspect. These may be wrong. This was on a dual AMD Athlon box with a RAID 5 array (running in degraded mode because of a failed disk, so I/O performance was reduced). We can repeat the test and gather more details if anyone cares. I find the above numbers a little suspect myself, but we repeated the tests twice. I'm particularly concerned by the slowness of HylaFAX+ versus 4.3.5; that seems unlikely to me and should probably be reconfirmed. But we definitely did expect 4.4.x to sing compared to 4.3.x.

How about you run similar tests, show us the results you get, and if they're dramatically different we can dig into what we may have done wrong here on our side?

Well, I'm testing with a slightly less-beefy machine here. :-) It's a PII-400 with 224MB RAM and a rather slow QUANTUM BIGFOOT TS8.4A, ATA DISK drive. I'm running a pretty much stock, fully updated Fedora Core 2 on it, and I haven't done any hardware tuning at all.

The config file that I started with looked like this:

LogFacility:            daemon
CountryCode:            1
AreaCode:               360
LongDistancePrefix:     1
InternationalPrefix:    011
DialStringRules:        etc/dialrules
ServerTracing:          0
MaxBatchJobs:           2
NotifyCmd:              /bin/true

Also, in hfaxd.conf I likewise have ServerTracing: 0. And although the system has modems, I have disabled all of them (by stopping all faxgettys) because, as I'll discuss later, they would be a significant factor. Your config also has an obsolete DestControls entry and two conflicting MaxConcurrentCalls entries; I have ignored those.
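Conflicts like the repeated MaxConcurrentCalls are easy to miss by eye. A hypothetical Python helper that flags repeated keywords, plus any keywords you tell it are obsolete, might look like the sketch below. It assumes `#` starts a comment and that keywords should be compared case-insensitively. It only reads text, so it knows nothing about which HylaFAX release accepts which keyword; the obsolete list has to come from the caller.

```python
# Hypothetical helper (not a HylaFAX tool): report repeated keywords and
# caller-supplied obsolete keywords in a "Keyword: value" config file.
# Assumptions: '#' starts a comment; keywords compare case-insensitively.
from typing import Dict, Iterable, List, Tuple


def check_config(
    text: str, obsolete: Iterable[str] = ()
) -> Tuple[Dict[str, List[str]], List[str]]:
    obsolete_keys = {k.lower() for k in obsolete}
    values: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if ":" not in line:
            continue  # blank or malformed line
        key, value = line.split(":", 1)
        values.setdefault(key.strip().lower(), []).append(value.strip())
    duplicates = {k: v for k, v in values.items() if len(v) > 1}
    found_obsolete = sorted(k for k in values if k in obsolete_keys)
    return duplicates, found_obsolete


if __name__ == "__main__":
    sample = (
        "MaxConcurrentCalls: 100\n"
        "MaxBatchJobs:  2\n"
        "DestControls:  etc/destcontrols\n"
        "MaxConcurrentCalls: 1\n"
    )
    print(check_config(sample, obsolete=["DestControls"]))
    # → ({'maxconcurrentcalls': ['100', '1']}, ['destcontrols'])
```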

In my test I am using the command "echo test | sendfax -n -t 1 -T 1 -z /tmp/numbers > /dev/null". The /tmp/numbers file contains the numbers 1 through 10000 on separate lines. I'm running current HylaFAX+ CVS code (marked as 5.1.9), although nothing that would affect this has changed in HylaFAX+ for a long time.
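For anyone repeating the test, the dial list is just the integers 1 through 10000, one per line; on most Linux systems `seq 1 10000 > /tmp/numbers` produces the same file. A Python equivalent follows, using the hypothetical helper names `dial_list_text` and `write_dial_list`. The values are placeholders, not real fax numbers, which only makes sense with the modems disabled as described above.

```python
# Build a test dial list: "1" through str(count), one entry per line.
# These are placeholder values, not dialable fax numbers.
from pathlib import Path
from typing import Union


def dial_list_text(count: int) -> str:
    """Return count unique entries, each terminated by a newline."""
    return "".join(f"{n}\n" for n in range(1, count + 1))


def write_dial_list(path: Union[str, Path], count: int) -> None:
    """Write the dial list to path, overwriting any existing file."""
    Path(path).write_text(dial_list_text(count))
```

For example, `write_dial_list("/tmp/numbers", 10_000)` recreates the file used in the test.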

With MaxBatchJobs: 2 the command completes in 25:47. That's a little less than 26 minutes... on my more-than-antiquated, poorly endowed dev system.

With MaxBatchJobs: 1 (which is probably where it would be set for a 10,000 job broadcast anyway) the command completes in 17:07. That's roughly 10 jobs per second. Watching CPU usage of the faxq process during that time shows it usually around 10-20% CPU... although there are very brief peaks above that.

Furthermore, an astute user would probably not submit 10,000 jobs for immediate transmission. Why? Well, notice that when I use the command:

echo test | sendfax -a "00:00 Jan 1" -n -t 1 -T 1 -z /tmp/numbers > /dev/null

... the command completes in 11:20, or 33% faster: nearly 15 jobs per second on my lousy 9-year-old dev box. I'm just guessing, but if I had done the same on a modern server-grade system (64-bit, SMP, 3GHz CPU; 2GB RAM; SATA hard drives), I'd expect that time to be noticeably less than 2 minutes.
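The rates quoted above follow directly from the elapsed times. This Python sketch redoes the arithmetic. It assumes the `-a` run used MaxBatchJobs: 1, since the 17:07 run is the one the 33% figure is compared against.

```python
# Arithmetic behind the three 10,000-job runs described above.
JOBS = 10_000

RUNS = {
    "MaxBatchJobs: 2": "25:47",
    "MaxBatchJobs: 1": "17:07",
    "-a future time (assumed MaxBatchJobs: 1)": "11:20",
}


def mmss_to_seconds(mmss: str) -> int:
    """Convert 'MM:SS' into seconds."""
    minutes, seconds = (int(p) for p in mmss.split(":"))
    return minutes * 60 + seconds


def rate(mmss: str) -> float:
    """Average jobs submitted per second over the run."""
    return JOBS / mmss_to_seconds(mmss)


def time_saved(before: str, after: str) -> float:
    """Fraction of elapsed time saved going from `before` to `after`."""
    return 1 - mmss_to_seconds(after) / mmss_to_seconds(before)


if __name__ == "__main__":
    for label, elapsed in RUNS.items():
        print(f"{label}: {mmss_to_seconds(elapsed)} s, {rate(elapsed):.2f} jobs/s")
    print(f"time saved by -a: {time_saved('17:07', '11:20'):.1%}")
    print(f"throughput gain:  {rate('11:20') / rate('17:07') - 1:.1%}")
```

The 33% is the reduction in elapsed time (17:07 down to 11:20 saves about 33.8%). Measured as throughput, going from 9.74 to 14.71 jobs/s is a gain of about 51%.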

faxq is event-driven. So as you reduce the overall effect of any faxq-related event, you will speed up functions that repeatedly trigger events... such as submitting jobs. This is why a future time-to-send has an effect on performance: those jobs aren't actively triggering queue events while the job group is being submitted. Similarly, if faxgettys were running, a whole host of other things would become involved, including document conversions, modem initializations, faxsend calls and return handling, etc. And, depending on the priority we give those events, they will affect the performance of other events.

A lot of this comes down to how we respond to the client's instructions. If a client submits a job... do we address it immediately? If we do, then there is a cost that gets paid by that user or by any other user on the system at the time. If we don't address it immediately, then how long can we wait before we taint the perception of responsiveness? Certainly, I suppose one could code the system to inhale and swallow jobs just as fast as possible... at the cost of everything else... but I think that there is a certain balance that needs to be achieved between the various users and functions of the server.

... anyway... I'm not sure what it is that happened in your tests to cause the job submission to run so slowly, but I hope that I've given you some useful pointers.

And, in the end, I'm not really a big fan of how broadcast jobs are handled anyway. I mean, really, do we truly want 10,000+ identical files (or links) in the docq? Do we really, truly want to fire off 10,000+ job notifications to the user (or inhibit job notifications altogether)? I'm more and more of the opinion that for broadcast jobs there should be but one copy of the source document in the docq, but one sendq file, and but one notification message when the entire job group is finished. I think submitting the entire job group would then take maybe a second. :-)
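The single-record broadcast proposed above can be sketched as a data structure. This is a hypothetical Python model, not HylaFAX's sendq or docq format. It has one shared document path, a status entry per destination, and a callback that fires exactly once when no destination is still pending.

```python
# Hypothetical model of a broadcast as ONE record (not a HylaFAX format):
# a single shared document, per-destination status, one final notification.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BroadcastGroup:
    document: str                  # the single shared copy of the source document
    destinations: List[str]
    notify: Callable[[str], None]  # invoked once, when every destination is final
    status: Dict[str, str] = field(init=False)
    notified: bool = field(init=False, default=False)

    def __post_init__(self) -> None:
        # Every destination starts out pending.
        self.status = {number: "pending" for number in self.destinations}

    def finish(self, number: str, result: str) -> None:
        """Record a final result ("done" or "failed") for one destination."""
        if number not in self.status:
            raise KeyError(f"unknown destination {number!r}")
        self.status[number] = result
        if not self.notified and "pending" not in self.status.values():
            self.notified = True
            done = sum(1 for s in self.status.values() if s == "done")
            self.notify(f"{self.document}: {done}/{len(self.status)} delivered")


if __name__ == "__main__":
    messages: List[str] = []
    group = BroadcastGroup("doc.pdf", ["1", "2", "3"], messages.append)
    group.finish("1", "done")
    group.finish("2", "failed")
    group.finish("3", "done")
    print(messages)  # ['doc.pdf: 2/3 delivered']
```

Keeping all per-destination state in one record means the notification logic only has to check whether anything is still pending. It does not have to correlate thousands of separate job notifications.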

Thanks,

Lee.


