[Rpm-metadata] dumpMetadata utf-8 question

Jay Soffian jay-rpm at soffian.org
Thu Jan 4 18:45:37 UTC 2007

While working with some i18n RPMs recently, I noticed that dumpMetadata
can generate XML that yum cannot parse on the receiving end, because in
some cases it feeds libxml2 strings that are not UTF-8 encoded. There
are two reasons for this:

1) The RPM in question (constructed for QA purposes) was encoded  
using the euc_jp encoding.
2) dumpMetadata does not pass all the strings it extracts from an RPM  
through utf8String. (In particular, the name of the RPM as well as  
the name portion of each of the PRCO entries.)

I've modified dumpMetadata to: a) pass all strings through  
utf8String; and b) allow you to optionally specify the encoding that  
was in use when the RPM was constructed.
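
To illustrate the idea (this is not the actual patch), here is a minimal
sketch of a conversion helper that takes an optional source encoding. The
name to_utf8 and its source_encoding parameter are hypothetical and are
not part of dumpMetadata, and the ISO-8859-1 fallback is an assumption
about what a reasonable last resort might look like.

```python
def to_utf8(raw, source_encoding=None):
    """Return `raw` as UTF-8 encoded bytes (hypothetical helper).

    `source_encoding` is the encoding the packager is assumed to have
    used when building the RPM, e.g. "euc_jp".
    """
    if isinstance(raw, str):
        # Already decoded text: just encode it.
        return raw.encode("utf-8")

    # Try the declared encoding first, then UTF-8 itself.
    candidates = [enc for enc in (source_encoding, "utf-8") if enc]
    for enc in candidates:
        try:
            return raw.decode(enc).encode("utf-8")
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: move on to the next candidate.
            continue

    # Assumed last resort: ISO-8859-1 maps every byte to a code point,
    # so the result is always valid UTF-8, though possibly mojibake.
    return raw.decode("iso-8859-1").encode("utf-8")


# Example: a package name stored as EUC-JP bytes in the header.
euc_name = "\u65e5\u672c\u8a9e".encode("euc_jp")
utf8_name = to_utf8(euc_name, "euc_jp")
```

Every string handed to the XML writer (package name, PRCO names, and so
on) would go through such a helper, so the output stays parseable even
when the declared encoding is missing or wrong.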

The change does cause a problem downstream when yum attempts to install
the RPM: yum compares the utf-8 encoded RPM name from the metadata to
the name as represented by the raw bytes in the downloaded RPM header,
and because these are not equal, yum rejects the header. But at least
the XML is valid, which allows the other RPMs in the repo to be installed.
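
The mismatch can be reproduced in a few lines. This is only a sketch of
the byte-level comparison described above, not yum's actual code; the
package name and variable names are made up for the example.

```python
# Hypothetical package name containing Japanese characters.
name = "\u65e5\u672c\u8a9e-pkg"

# What the RPM header holds: the raw bytes as the packager encoded them.
raw_header_name = name.encode("euc_jp")

# What ends up in the repository XML after conversion to UTF-8.
metadata_name = raw_header_name.decode("euc_jp").encode("utf-8")

# A byte-for-byte comparison fails even though both spell the same name.
names_match = raw_header_name == metadata_name

# Decoding each side with its own encoding shows the text is identical.
same_text = raw_header_name.decode("euc_jp") == metadata_name.decode("utf-8")
```

Here names_match is False while same_text is True, which is the
situation in which the header gets rejected.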

Anyway, I'm wondering whether it was an intentional design decision  
to not pass all the bits handed to libxml2 through utf8String first.


