[Rpm-metadata] dumpMetadata utf-8 question

seth vidal skvidal at linux.duke.edu
Fri Jan 5 04:05:30 UTC 2007

On Thu, 2007-01-04 at 13:45 -0500, Jay Soffian wrote:
> In dealing with some i18n RPMs recently I noticed that dumpMetadata  
> can generate XML which is unparseable on the receiving end by yum due  
> to feeding libxml2 non-utf8 encoded strings in some cases. The reason  
> for this is two-fold:
> 1) The RPM in question (constructed for QA purposes) was encoded  
> using the euc_jp encoding.
> 2) dumpMetadata does not pass all the strings it extracts from an RPM  
> through utf8String. (In particular, the name of the RPM as well as  
> the name portion of each of the PRCO entries.)
> I've modified dumpMetadata to: a) pass all strings through  
> utf8String; and b) allow you to optionally specify the encoding that  
> was in use when the RPM was constructed.
> This causes problems downstream with yum when attempting to install  
> the RPM (because yum compares the utf-8 encoded RPM name to the name  
> of the RPM as represented by the raw bytes from the downloaded RPM  
> header and these are not equal, thus yum rejects the header), but at  
> least the XML is valid allowing other RPMs in the repo to be installed.
> Anyway, I'm wondering whether it was an intentional design decision  
> to not pass all the bits handed to libxml2 through utf8String first.

Mostly I think it was that:
1. We never encountered non-utf8 strings in package names, and I
_thought_ that at one point in time rpm used to bitch about them.
2. Package builds used to get upset when non-utf8 strings were in
%files - I remember running into this error at one point.
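
For illustration, here is a minimal sketch of the approach described in
the quoted message: convert every string to UTF-8 before it reaches
libxml2, optionally using the encoding the RPM was built with. The helper
name to_utf8_bytes and its source_encoding parameter are hypothetical;
this is not the utf8String function from dumpMetadata, and the Latin-1
fallback is an assumption rather than documented behavior.

```python
# Hypothetical helper; NOT the utf8String function from dumpMetadata.
def to_utf8_bytes(raw, source_encoding=None):
    """Return the header bytes `raw` re-encoded as UTF-8.

    source_encoding is the encoding used when the RPM was built,
    if the caller knows it (for example "euc_jp").
    """
    if source_encoding is not None:
        return raw.decode(source_encoding).encode("utf-8")
    try:
        raw.decode("utf-8")  # already valid UTF-8: pass through unchanged
        return raw
    except UnicodeDecodeError:
        # Assumed fallback: Latin-1 maps every byte to a code point, so the
        # result is always valid UTF-8, although the text may be wrong.
        return raw.decode("iso-8859-1").encode("utf-8")


# Three katakana characters, encoded the way the QA package was (euc_jp).
header_name = "\u30c6\u30b9\u30c8".encode("euc_jp")
xml_name = to_utf8_bytes(header_name, "euc_jp")

# The metadata is now valid UTF-8 ...
assert xml_name.decode("utf-8") == "\u30c6\u30b9\u30c8"
# ... but it no longer matches the raw header bytes byte-for-byte.
assert xml_name != header_name
```

The last assertion shows the downstream mismatch mentioned above: a
byte-wise comparison between the name in the metadata and the name in the
raw header fails once the metadata has been re-encoded.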

