HylaFAX The world's most advanced open source fax server

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]

faxq maxing out CPU - let's crack this!!!!!




Folks,

As may of you know, the latest release of Sam Leffler's libtiff (now 
maintained by the great people at http://www.libtiff.org) broke HylaFAX(tm). 
Not too much of a problem for us until RedHat decided to adopt the new libtiff 
and release update RPMs for RedHat-6.1 (Cartman). Now we're in deep deep 
trouble, and it's essential that we resolve this issue - as far as I know it's 
the only MAJOR problem holding back a release of hylafax-4.1-beta3.

I sat down today and recreated the bug. It's easy to do . . . just install a 
build of the current HylaFAX CVS onto vanilla RedHat, then upgrade libtiff to:

[darren@ducent darren]$ rpm -qa | egrep tiff
libtiff-3.5.4-1
libtiff-devel-3.5.4-1

I set strace up on each of the relevant processes, and then submitted a fax, 
and watched. Sure enough, the fax was submitted and processing began (sendfax 
exited cleanly) and faxq ended up maxing out CPU, leaving hylafax completely 
hung and unresponsive, yet not making any syscalls. In fact, once the hang was 
initiated, the strace output did not grow at all.

Here's the last bit of each strace:

[darren@ducent darren]$ tail /tmp/*strace.out
==> /tmp/faxgetty_strace.out <==
18934 ioctl(8, SNDCTL_TMR_START, {B19200 -opost -isig -icanon -echo ...}) = 0
18934 ioctl(8, TCGETS, {B19200 -opost -isig -icanon -echo ...}) = 0
18934 flock(4, LOCK_EX)                 = 0
18934 _llseek(4, 0, [0], SEEK_SET)      = 0
18934 write(4, "Running and idle\n", 17) = 17
18934 _llseek(4, 0, [17], SEEK_CUR)     = 0
18934 ftruncate(4, 17)                  = 0
18934 flock(4, LOCK_UN)                 = 0
18934 write(6, "+ttyS0:RP3fd5e7ff:ff\0", 21) = 21
18934 select(9, [5 8], [], [8], NULL
==> /tmp/faxq_strace.out <==
18940 fstat(9, {st_mode=S_IFREG|0600, st_size=10910, ...}) = 0
18940 mmap(0, 10910, PROT_READ, MAP_SHARED, 9, 0) = 0x402fe000
18940 munmap(0x402fe000, 10910)         = 0
18940 close(9)                          = 0
18940 close(8)                          = 0
18940 open("docq/doc6.cover;31", O_RDONLY) = 8
18940 read(8, "II*\0\10\0\0\0", 8)      = 8
18940 fstat(8, {st_mode=S_IFREG|0600, st_size=25761, ...}) = 0
18940 mmap(0, 25761, PROT_READ, MAP_SHARED, 8, 0) = 0x402fe000
18940 brk(0x8088000)                    = 0x8088000

==> /tmp/hfaxd_strace.out <==
18936 close(3)                          = 0
18936 unlink("client/18936")            = 0
18936 _exit(1)                          = ?
18916 <... select resumed> )            = ? ERESTARTNOHAND (To be restarted)
18916 --- SIGCHLD (Child exited) ---
18916 wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 1], WNOHANG, NULL) = 18936
18916 wait4(-1, 0xbffff848, WNOHANG, NULL) = -1 ECHILD (No child processes)
18916 sigreturn()                       = ? (mask now [])
18916 rt_sigaction(SIGCHLD, {SIG_DFL}, NULL, 8) = 0
18916 select(5, [4], [], [], NULL
==> /tmp/sendfax_strace.out <==
18406 munmap(0x40018000, 4096)          = 0
18406 rt_sigaction(SIGHUP, {SIG_IGN}, {0x804b324, [], SA_RESTART|0x4000000}, 
8) = 0
18406 rt_sigaction(SIGINT, {SIG_IGN}, {0x804b324, [], SA_RESTART|0x4000000}, 
8) = 0
18406 rt_sigaction(SIGTERM, {SIG_IGN}, {0x804b324, [], SA_RESTART|0x4000000}, 
8) = 0
18406 unlink("/tmp//sndfaxxrGI1A")      = 0
18406 unlink("/tmp//sndfax5lhY76")      = 0
18406 unlink("/tmp//sndfaxmJvGjc")      = 0
18406 unlink("/tmp//sndfax5lhY76")      = -1 ENOENT (No such file or directory)
18406 munmap(0x40275000, 4096)          = 0
18406 _exit(0)                          = ?


Looks to me like faxq goes silent just after opening the TIFF file (filehandle 
8) and doing an mmap. I can provide the entire strace output for anyone who 
wants to look at this in greater detail . . . this is about as far as I can 
take this tonight. If you read this and went "ahhhhhhhhhhhhhh, I 
seeeeeeeeeeeee" then I'd love to hear from you ;-)

-Darren




Project hosted by iFAX Solutions