[Rpm-metadata] updates

Joe Shaw joe at ximian.com
Mon Oct 27 17:45:02 UTC 2003


On Fri, 2003-10-24 at 17:38, Daniel Veillard wrote:
>   How does Python know that the string is in Latin 1? What happens if
> you actually used another encoding? I'm afraid I18N is a really weak
> point of Python (that and the choice of UTF-16 for "unicode" strings :-\)

It doesn't, but it does assume that strings are in a default encoding
(the one reported by sys.getdefaultencoding()) if they're not "unicode"
strings.
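
As a rough illustration, here is a minimal sketch of that behaviour. It
assumes a current Python 3 interpreter, where raw data is a separate
"bytes" type, rather than the Python 2 this thread is about; the sample
string is made up.

```python
import sys

# The interpreter-wide default encoding for implicit conversions.
# (On Python 3 this is "utf-8"; Python 2 historically used "ascii".)
default_encoding = sys.getdefaultencoding()

# Hypothetical header value: "café" stored as Latin-1 bytes.
raw = b"caf\xe9"

# Nothing in the bytes says which encoding they use; the caller must name one.
as_latin1 = raw.decode("latin-1")

# Decoding with an encoding that cannot represent byte 0xE9 fails loudly.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

The point is only that the encoding is supplied by the caller (or by a
default), never discovered from the data.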

>   The real problem is that there is no encoding defined for the
> textual strings in an RPM header. This is just surfacing one more
> time as we try to convert those data to I18N compliant software.
> In general assuming Iso Latin 1 is relatively safe, but there are
> distros using Latin 2, and I remember indexing a Turkish-translated
> version of Red Hat too.

Right.  There are packages in SuSE that use whatever encoding Czech is
in, both in RPM data and in filenames on the system.  (Which are
obviously garbage on my system with Latin-1 or UTF-8.)
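
To make the "garbage" concrete, a small sketch in Python 3. It assumes
the Czech text was written as ISO-8859-2 (Latin 2); that is a guess for
illustration, and the sample name is made up.

```python
# Assumption: the packager's system used ISO-8859-2 (Latin 2).
original = "Dvořák"
raw = original.encode("iso-8859-2")  # bytes as they would sit in the header

# Latin-1 maps every byte value 0-255 to a code point, so this never raises,
# but the result is silently wrong for data that is not Latin-1.
as_latin1 = raw.decode("latin-1")

# UTF-8 is stricter: byte sequences that are not valid UTF-8 raise an error.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# The Latin-1 misreading loses nothing: re-encoding gives back the same
# bytes, which can then be decoded properly once the real encoding is known.
recovered = as_latin1.encode("latin-1").decode("iso-8859-2")
```

Here as_latin1 comes out as "Dvoøák": readable, but wrong.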

>   Trying to guess encodings is quite hard and error prone, automatically
> assuming Iso Latin 1 at least has some consistency, if it fails it will
> do so in a deterministic way <grin/>

True.  Are there any encodings which don't use ASCII for the lower 7
bits?  Or, at least, any that are likely in practice to use values
below 32, which would cause problems?
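
One way to poke at that question, sketched in Python 3 using codecs from
its standard library. The helper names and the choice of codecs are
mine, picked for illustration only.

```python
def is_ascii_compatible(codec: str) -> bool:
    """True if the codec encodes printable ASCII as the same single bytes."""
    ascii_text = "".join(chr(c) for c in range(32, 127))
    return ascii_text.encode(codec) == ascii_text.encode("ascii")


def has_control_bytes(data: bytes) -> bool:
    """True if any byte in the data has a value below 32."""
    return any(b < 32 for b in data)


# The ISO-8859 family and UTF-8 keep ASCII as-is; EBCDIC (cp037) and
# UTF-16 do not.
compat = {
    codec: is_ascii_compatible(codec)
    for codec in ("latin-1", "iso-8859-2", "utf-8", "cp037", "utf-16-le")
}

# UTF-16 pads ASCII characters with NUL bytes.
utf16_has_nul = has_control_bytes("rpm".encode("utf-16-le"))

# ISO-2022-JP switches character sets with ESC (0x1B) sequences.
iso2022_has_esc = has_control_bytes("日本".encode("iso2022_jp"))
```

So the answer appears to be yes on both counts, although none of these
are encodings one would expect to find on a typical Linux distro.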

>   Jeff, is there any sane way to escape this? I'm afraid the answer is no.
> Is Latin 1 really the safest way out? Can this actually lead to a
> guideline for improvements at the distro level?

Ideally everything would be in UTF-8, or perhaps there could be a tag
in the header which indicates the encoding used for a given i18n tag?
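
A rough sketch of how a reader of the metadata might combine those
ideas, in Python 3. Nothing here is an existing RPM API: the function
name and the declared_encoding parameter are hypothetical, standing in
for an encoding tag that does not exist in the header today.

```python
from typing import Optional


def decode_header_string(raw: bytes, declared_encoding: Optional[str] = None) -> str:
    """Decode a header string, preferring explicit information over guesses."""
    # 1. If the (hypothetical) header tag names an encoding, trust it.
    if declared_encoding is not None:
        return raw.decode(declared_encoding)
    # 2. Otherwise try UTF-8, which rejects most non-UTF-8 byte sequences.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # 3. Fall back to Latin-1, which never fails and is deterministic,
        #    though it may be wrong for Latin 2, Turkish, and so on.
        return raw.decode("latin-1")


tagged = decode_header_string("Dvořák".encode("iso-8859-2"), "iso-8859-2")
utf8 = decode_header_string("café".encode("utf-8"))
fallback = decode_header_string(b"caf\xe9")
```

The fallback in step 3 is the "assume Latin 1" behaviour discussed
above; the tag in step 1 is what would make it unnecessary.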

Joe



