[Rpm-metadata] dumpMetadata utf-8 question
seth vidal
skvidal at linux.duke.edu
Fri Jan 5 04:05:30 UTC 2007
On Thu, 2007-01-04 at 13:45 -0500, Jay Soffian wrote:
> In dealing with some i18n RPMs recently I noticed that dumpMetadata
> can generate XML which is unparseable on the receiving end by yum due
> to feeding libxml2 non-utf8 encoded strings in some cases. The reason
> for this is two-fold:
>
> 1) The RPM in question (constructed for QA purposes) was encoded
> using the euc_jp encoding.
> 2) dumpMetadata does not pass all the strings it extracts from an RPM
> through utf8String. (In particular, the name of the RPM as well as
> the name portion of each of the PRCO entries.)
>
> I've modified dumpMetadata to: a) pass all strings through
> utf8String; and b) allow you to optionally specify the encoding that
> was in use when the RPM was constructed.
>
> This causes problems downstream with yum when attempting to install
> the RPM (because yum compares the utf-8 encoded RPM name to the name
> of the RPM as represented by the raw bytes from the downloaded RPM
> header and these are not equal, thus yum rejects the header), but at
> least the XML is valid allowing other RPMs in the repo to be installed.
>
> Anyway, I'm wondering whether it was an intentional design decision
> to not pass all the bits handed to libxml2 through utf8String first.
Mostly I think it was that:
1. We never encountered non-utf8 strings in package names, and I
_thought_ that at one point rpm used to bitch about them.
2. Package builds used to get upset when file names in %files contained
non-utf8 strings - I remember running into this error at one point.
-sv
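
For readers following along, here is a minimal sketch of the change Jay
describes: pass every string through a utf8String-style helper, with an
optional encoding that was in use when the RPM was built. This is not the
actual dumpMetadata code. The function name to_utf8, its source_encoding
parameter, and the latin-1 fallback are assumptions made for illustration,
and the package name is an invented example.

```python
def to_utf8(raw, source_encoding=None):
    """Return `raw` as UTF-8 encoded bytes (hypothetical helper).

    raw             -- bytes taken from an RPM header, or an already decoded str
    source_encoding -- optional encoding the RPM was built with, e.g. "euc_jp"
    """
    if isinstance(raw, str):
        return raw.encode("utf-8")

    # Try strict UTF-8 first, then the caller-supplied encoding.
    candidates = ["utf-8"]
    if source_encoding:
        candidates.append(source_encoding)

    for encoding in candidates:
        try:
            return raw.decode(encoding).encode("utf-8")
        except UnicodeDecodeError:
            continue

    # Assumed last resort: latin-1 maps every byte to a code point, so this
    # never raises, although the resulting text may be wrong.
    return raw.decode("latin-1").encode("utf-8")


# An invented package name, encoded the way a euc_jp-built RPM would store it.
raw_name = "\u65e5\u672c\u8a9e-pkg".encode("euc_jp")
xml_name = to_utf8(raw_name, source_encoding="euc_jp")

# The metadata name is now valid UTF-8, but it no longer matches the raw
# header bytes, which is the mismatch that makes yum reject the header.
print(xml_name == raw_name)  # False
```

Trying UTF-8 before the supplied encoding is a design choice in this sketch:
names that are already valid UTF-8 pass through unchanged, so one repository
can mix both kinds of package.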