
Re: [hylafax-users] Fax confirmation: Umlauts broken



* Lee Howard <faxguy@xxxxxxxxxxxxxxxx> [070127 18:31]:

> I do not disagree with the correctness of your statement here.  However, 
> encoding the Subject header should not be necessary in most ISO-8859-1 
> cases.  The mail reader whose default character set is ISO-8859-1 should 
> be able to display 8-bit German umlauted characters just fine without 
> being RFC 2047-encoded.

No, it's not that straightforward.

The mail reader whose default character set is ISO-8859-1 should be able
to display the German umlauted characters from the ISO-8859-1
character set (and maybe some of the other ISO-8859 family).

Each character set uses *different* byte(s) to represent the different
characters.  The character set is what tells the program displaying
stuff what a given sequence of bytes means.  And each character set has
a different set of characters.
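
To make that concrete, here is a small Python sketch (Python is just my
choice for illustration, nothing HylaFAX-specific).  It shows one
character turning into different bytes under different character sets,
and one byte turning into different characters:

```python
# One character, different bytes depending on the character set.
u_umlaut = "\u00fc"                        # LATIN SMALL LETTER U WITH DIAERESIS
latin1_bytes = u_umlaut.encode("iso-8859-1")   # b'\xfc'      (one byte)
utf8_bytes = u_umlaut.encode("utf-8")          # b'\xc3\xbc'  (two bytes)

# One byte, different characters depending on the character set.
same_byte = b"\xb3"
as_latin1 = same_byte.decode("iso-8859-1")     # SUPERSCRIPT THREE (U+00B3)
as_latin2 = same_byte.decode("iso-8859-2")     # LATIN SMALL LETTER L WITH STROKE (U+0142)

print(latin1_bytes, utf8_bytes, as_latin1, as_latin2)
```

Without knowing which character set was intended, the receiving program
has no way to pick the right interpretation of 0xB3.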

One of the reasons things are starting to standardise on UNICODE is
because the UNICODE character set (of which UTF-8 is one way to encode
UNICODE code-points) is *supposed* to have every character...  UNICODE's
not perfect, or complete, but it's certainly the largest, and closest to
complete, coded character set currently available.

> Mail readers should be programmed such that they interpret unencoded 
> Subject headers using either the default character set or ISO-8859-1.  
> Any Subject header using a character set other than ISO-8859-1 should, 
> indeed, be RFC 2047-encoded.  But that's not the case here.

More correctly, any header that contains any character outside of the
7-bit US-ASCII characters should be RFC 2047 encoded.
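
As a rough illustration of that rule, the following Python sketch uses
the standard library's email.header module.  The function name
encode_subject and the choice of UTF-8 as the declared character set are
my assumptions for the example; this is not how the HylaFAX notification
scripts are written:

```python
from email.header import Header, decode_header


def encode_subject(subject: str, charset: str = "utf-8") -> str:
    """Return a header value that is safe to put on the wire.

    Pure US-ASCII text is passed through untouched; anything else is
    wrapped in RFC 2047 encoded-words that name the character set.
    """
    try:
        subject.encode("ascii")
        return subject
    except UnicodeEncodeError:
        return Header(subject, charset).encode()


def decode_subject(value: str) -> str:
    """Reverse of encode_subject, as a mail reader would do it."""
    parts = []
    for chunk, cs in decode_header(value):
        if isinstance(chunk, bytes):
            parts.append(chunk.decode(cs or "ascii"))
        else:
            parts.append(chunk)
    return "".join(parts)


wire_value = encode_subject("Fax f\u00fcr M\u00fcller")
print(wire_value)                  # 7-bit ASCII, with the charset named
print(decode_subject(wire_value))  # the original text again
```

The point is that the encoded-word carries the character set name with
it, so the mail reader no longer has to guess.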

> Granted, the omission of the RFC 2047-encoding in this case could 
> possibly result in difficulty when the mail reader's default character 
> set is not ISO-8859-1.  And thus encoding all Subject headers would be, 
> perhaps, the most RFC-compliant thing to do.  However, I'm not sure that 
> is a perfect real-world solution, either.

Actually, if his default character set was UTF-8, the characters *would*
have been seen correctly.  But *somewhere* along the line, the UTF-8
characters were being handled by an ISO-8859 program.
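
A quick Python sketch of what that kind of mix-up looks like.  The name
"Müller" is only an example value, not taken from Max's report:

```python
name = "M\u00fcller"

# The sender produces UTF-8 bytes...
wire = name.encode("utf-8")            # b'M\xc3\xbcller'

# ...and something downstream reads them as ISO-8859-1.
misread = wire.decode("iso-8859-1")    # each UTF-8 byte becomes its own character

# A reader that knows the bytes are UTF-8 gets the name back intact.
correct = wire.decode("utf-8")

print(misread)
print(correct)
```

The two-byte UTF-8 sequence for the umlaut shows up as two separate
ISO-8859-1 characters, which is the classic "broken umlaut" symptom.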

> Interesting, however, is that the "receiver" data that Max is describing 
> is coming from the remote fax machine - and that fax machine doesn't 
> encode the data, and there is no sure-fire way to know what character 
> set the receiver data should be displayed in.  Suppose that I sent a fax 
> to someone in Poland and that their "receiver" data used the ISO-8859-2 
> character set.  Even if the Subject line of the email were encoded 
> according to RFC 2047 for ISO-8859-1 it would still be wrong for the 
> "receiver" data which would need to be specially handled with 
> ISO-8859-2... and the remote fax machine does not (as far as I know) 
> present us with the kind of information to make that distinction.  So in 
> that kind of situation it would still be displayed incorrectly... and 
> there would be very little that we could do about it... and in the end 
> it may just be best to leave that part up to the mail reader and its own 
> default character set.

But that's not what happened here.

Thankfully, T.30 only allows a *small set* of characters (technically,
only the digits 0-9, +, and a space), which are mapped onto their
normal ASCII byte values.  Most fax machines (including HylaFAX) accept
pretty much any ASCII character here, but technically, anything outside
the digit range isn't guaranteed to work anywhere...
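
If you wanted to check a station ID against that strict set, a minimal
Python sketch could look like this.  The helper name is hypothetical,
and the allowed set is simply the one described above (digits, +,
space), not a quotation from the T.30 text:

```python
import re

# Allowed characters as described above: digits 0-9, '+', and space.
_STRICT_ID = re.compile(r"[0-9+ ]*")


def is_strict_fax_id(value: str) -> bool:
    """True if every character of value is in the strict set."""
    return _STRICT_ID.fullmatch(value) is not None


for candidate in ("+1 215 825 8700", "ACME Fax", "M\u00fcller"):
    print(repr(candidate), is_strict_fax_id(candidate))
```

Anything that fails this check may still work between many machines, but
nothing guarantees how the far end will render it.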

But in Max's case, the "receiver" is the field that the user submitting
the fax set (the TOUSER field, from the sendfax name@number
destination).  It appears to be UTF-8 characters (but remember, HylaFAX
has no way to know that it was intended to be UTF-8, or anything else).

This $receiver value was being used in the Subject line of the
notification email.

HylaFAX has already taken the approach that everything it produces is
intended to be UTF-8 (although custom templates can be made to use other
character sets).  But currently, subject lines are raw (not RFC 2047
encoded) - this is definitely something to try and address.
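
One possible shape for such a fix, sketched in Python rather than in the
shell/awk that the notification scripts actually use.  Everything here
is an assumption for illustration: the function names, the "Fax to"
subject text, and above all the guess that undeclared bytes are UTF-8
with ISO-8859-1 as the fallback:

```python
from email.message import EmailMessage


def guess_text(raw: bytes) -> str:
    """Heuristic only: try UTF-8 first, fall back to ISO-8859-1.

    ISO-8859-1 can decode any byte sequence, so the fallback never
    fails, but it may still be the wrong guess.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


def build_notification(receiver_raw: bytes) -> EmailMessage:
    """Build a message whose Subject gets RFC 2047 encoded on output."""
    msg = EmailMessage()
    msg["Subject"] = "Fax to " + guess_text(receiver_raw)
    msg.set_content("Your fax was sent.")
    return msg


msg = build_notification("M\u00fcller".encode("utf-8"))
wire = msg.as_bytes()    # serialised form, 7-bit clean
print(wire.decode("ascii"))
```

The important part is not the guessing, which can only ever be a
heuristic, but that whatever character set is chosen gets declared in
the header instead of being left for the mail reader to assume.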

a.

-- 
Aidan Van Dyk                                             aidan@xxxxxxxx
Senior Software Developer                          +1 215 825-8700 x8103
iFAX Solutions, Inc.                                http://www.ifax.com/




