[Rpm-metadata] updates

Daniel Veillard veillard at redhat.com
Fri Oct 24 21:38:49 UTC 2003


On Fri, Oct 24, 2003 at 03:24:49PM -0400, Joe Shaw wrote:
> On Fri, 2003-10-24 at 11:40, Daniel Veillard wrote: 
> >   Hum, seems that libxml2 error message got inserted at the beginning
> > of the XML result:
> 
> Yeah, even removing that, though, I get this:
> 
> [joe at bacon joe]$ xmllint fedora-core-test2-metadata.xml
> fedora-core-test2-metadata.xml:10061: error: Input is not proper UTF-8,
> indicate encoding !
> 
> And it shows the registered mark and trademark lines (for some reason
> Evo won't paste them).
> 
> Anyway, they're the Latin-1 trademark symbols, not the UTF-8 ones. 
> (ASCII is a subset of UTF-8, but Latin 1 isn't)
> 
> In python (>= 2.1 anyway) you can encode string as UTF-8 by doing:
> 
> string = "blah blah ®" (some latin 1 encoded string)
> utf8_string = string.encode('utf-8')

  How does Python know that the string is in Latin 1? What happens if
you actually used another encoding? I'm afraid I18N is a really weak
point of Python (that and the choice of UTF-16 for "unicode" strings :-\)
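
  A minimal sketch of the point, assuming a current Python 3 where raw
bytes and text are separate types (the sample bytes are made up for
illustration, they are not taken from the metadata file). The source
encoding has to be named by the caller, and naming the wrong one does
not necessarily raise an error:

```python
# "blah blah ®" as ISO Latin 1 bytes, the way it might sit in an RPM header.
raw = b"blah blah \xae"

# The caller must state the source encoding; it cannot be read off the bytes.
text = raw.decode("latin-1")
utf8_bytes = text.encode("utf-8")

# Declaring a different single-byte encoding also "works", but the last
# character silently becomes something other than the registered mark.
wrong = raw.decode("iso-8859-2")

# Treating the same bytes as UTF-8 is what fails outright.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

  Only the UTF-8 attempt is rejected; the Latin 2 decode succeeds with
different text, which is why the encoding has to come from somewhere
other than the data itself.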

> and then write that out to the disk.

  The real problem is that no encoding is defined for the
textual strings in an RPM header. This is just surfacing one more
time as we try to feed that data to I18N-compliant software.
In general, assuming ISO Latin 1 is relatively safe, but there are
distros using Latin 2, and I remember indexing a Turkish-translated
version of Red Hat too, etc.
  Trying to guess encodings is quite hard and error-prone. Automatically
assuming ISO Latin 1 at least has some consistency: if it fails, it will
do so in a deterministic way <grin/>
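
  To illustrate the "deterministic" part, here is a small sketch, again
assuming Python 3. The helper name header_to_utf8 is hypothetical and
not part of any RPM or libxml2 API. Because Latin 1 assigns a character
to every byte value, the conversion can never raise, and the original
bytes stay recoverable even when the guess was wrong:

```python
def header_to_utf8(raw: bytes) -> bytes:
    """Re-encode header bytes as UTF-8, assuming they are ISO Latin 1."""
    return raw.decode("latin-1").encode("utf-8")

# Every possible byte value decodes under Latin 1, so this cannot fail.
all_bytes = bytes(range(256))
converted = header_to_utf8(all_bytes)

# The output is valid UTF-8, and reversing the steps restores the input.
recovered = converted.decode("utf-8").encode("latin-1")
```

  If the header was really Latin 2 or Turkish, the resulting characters
are wrong, but the output is still well-formed UTF-8 that an XML parser
will accept, and the mistake can be undone later.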

  Jeff, is there any sane way to escape this? I'm afraid the answer is no.
Is Latin 1 really the safest way out? Can this actually lead to a
guideline for improvements at the distro level?

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard at redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


