Re: [hylafax-users] IRQ Overload



Shane Eckert wrote:

We are using HylaFAX on two servers, each with an 8-port PCI modem card (a single-card IQ Express 8-port).


Things were going just fine until both servers were unable to initialize the modems. After a while, dmesg was flooded with the following messages:

Jun 23 17:41:42 s_sys@util10 FaxGetty[27504]: LOCKWAIT

Jun 23 17:41:42 s_sys@util10 kernel: serial8250: too much work for irq201

Jun 23 17:41:42 s_sys@util10 kernel: serial8250: too much work for irq201

Jun 23 17:41:42 s_sys@util10 kernel: serial8250: too much work for irq201

Jun 23 17:41:42 s_sys@util10 kernel: serial8250: too much work for irq201

Jun 23 17:41:42 s_sys@util10 kernel: serial8250: too much work for irq201

Jun 23 17:41:42 s_sys@util10 kernel: serial8250: too much work for irq201

Jun 23 17:41:42 s_sys@util10 kernel: serial8250: too much work for irq201

Jun 23 17:41:42 s_sys@util10 kernel: serial8250: too much work for irq201

Jun 23 17:41:42 s_sys@util10 kernel: serial8250: too much work for irq201

Jun 23 17:41:42 s_sys@util10 kernel: serial8250: too much work for irq201

And so on for several hundred lines.

Shutting down the server and reseating the PCI cards fixed the issue. Nothing else would do so.

I fear this is a temporary fix since the IRQs were overloaded.

Any ideas?


This is something that I've been researching and working on for a couple of months now, and unfortunately I don't currently have a definitive solution to give you (which is why I didn't reply yesterday: I was testing out some more things).


The "serial8250: too much work..." warning message is a bit of a red herring. You can possibly make that message go away by moving the IQ Express to a different PCIe slot (and thus changing the IRQ assignment). But the underlying kernel problem will persist.
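One way to check whether the warning simply follows the card to a new IRQ after a slot change is to tally the messages per IRQ number. The following is only a minimal sketch in Python, assuming syslog-style lines like the ones quoted above; the function name `count_irq_warnings` is illustrative and not part of HylaFAX or the kernel.

```python
import re
import sys
from collections import Counter

# Matches the kernel warning quoted above and captures the IRQ number.
IRQ_WARNING = re.compile(r"serial8250: too much work for irq(\d+)")


def count_irq_warnings(lines):
    """Return a Counter mapping IRQ number -> count of warnings seen."""
    counts = Counter()
    for line in lines:
        match = IRQ_WARNING.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Reads log text from standard input and prints one line per IRQ.
    for irq, n in count_irq_warnings(sys.stdin).most_common():
        print(f"irq {irq}: {n} warnings")
```

Saved under a name of your choosing (say, `count_irq.py`), it could be fed the output of `dmesg` or a syslog file on standard input. A count that moves to a different IRQ number after reseating or relocating the card would be consistent with the message being a symptom rather than the cause.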

Unfortunately, it's a difficult issue to debug because the serial driver in the kernel does not offer any debugging aids for this condition. So in order to debug it, one would have to become familiar with a bit of kernel hacking just to get started, and to continue one would need to know a fair bit about how the PCIe bus, interrupt handler, and serial drivers work. You can imagine it's a bit difficult.

My messages to the Linux kernel mailing list have generated wonderful tips on things to try, but in the end none of them have yielded any improvements, and all avenues have come to dead ends as far as I can tell.

Here is what I can tell you about the problem:

1) it doesn't happen on Windows

2) its severity varies quite noticeably with kernel version: it's much more pronounced on newer kernels (e.g. 2.6.25), and less so on older ones (e.g. a patched 2.6.19)

3) the crash seems to occur between calls, when the ioctls are being called to reset and recondition the modem - thus you *may* get some different behavior by changing settings, for example by setting "ModemDTRDropDelay: 200" in your modem config file.
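As an illustration of point 3, the setting would go in the per-modem configuration file. This is only a sketch: the file name below is a hypothetical example (the device name depends on how the IQ Express ports are enumerated on your system), and the value is understood to be in milliseconds.

```
# etc/config.ttyS4 in the HylaFAX spool directory (hypothetical device name)
# Lengthen the pause between dropping and raising DTR during modem reset.
ModemDTRDropDelay: 200
```

The faxgetty process for that port would need to be restarted for the change to take effect, and the same line would be repeated in the config file for each port on the card.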

So the problem does seem to be due to some kind of interaction between the IQ Express and the Linux kernel (especially the later versions). Mainpine is working on what I believe will be a solution (one that works around the problem entirely), but it will not be available for another month or two.

Please do keep in touch with me until that time.

Thanks,

Lee.

--
*Lee Howard*
*Mainpine, Inc. Support Manager*
Tel: +1 866 363 6680 ext 4 | Fax: +1 360 462 8160
lee.howard@xxxxxxxxxxxx | www.mainpine.com

