[Rpm-metadata] dumpMetadata utf-8 question
Jay Soffian
jay-rpm at soffian.org
Thu Jan 4 18:45:37 UTC 2007
In dealing with some i18n RPMs recently, I noticed that dumpMetadata
can generate XML that yum cannot parse on the receiving end, because
in some cases it feeds libxml2 strings that are not UTF-8 encoded.
The reason for this is twofold:
1) The RPM in question (constructed for QA purposes) was encoded
using the euc_jp encoding.
2) dumpMetadata does not pass all the strings it extracts from an RPM
through utf8String. (In particular, the name of the RPM as well as
the name portion of each of the PRCO entries.)
I've modified dumpMetadata to: a) pass all strings through
utf8String; and b) allow you to optionally specify the encoding that
was in use when the RPM was constructed.
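A minimal sketch of the idea behind (a) and (b), written for Python 3
and not taken from dumpMetadata itself. The function name to_utf8 and
its source_encoding parameter are hypothetical; the real utf8String
helper may behave differently.

```python
def to_utf8(raw, source_encoding=None):
    """Return `raw` as UTF-8 encoded bytes.

    Hypothetical helper: `source_encoding` is the encoding the RPM was
    built with (for example "euc_jp"), if the caller knows it.
    """
    if isinstance(raw, str):
        # Already decoded text, so it only needs to be encoded.
        return raw.encode("utf-8")

    # Try the caller-supplied encoding first, then plain UTF-8.
    for enc in (source_encoding, "utf-8"):
        if not enc:
            continue
        try:
            return raw.decode(enc).encode("utf-8")
        except UnicodeDecodeError:
            continue

    # Last resort: ISO-8859-1 maps every byte to a code point, so this
    # never fails. The text may be wrong, but the output is valid UTF-8.
    return raw.decode("iso-8859-1").encode("utf-8")
```

With the encoding supplied, an euc_jp name round-trips to the correct
UTF-8 bytes. Without it, the fallback still yields something libxml2
can accept, although the characters will be wrong.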
This change causes problems downstream when yum attempts to install
the RPM: yum compares the UTF-8 encoded RPM name from the metadata to
the name as represented by the raw bytes in the downloaded RPM header,
finds that they are not equal, and rejects the header. But at least
the XML is valid, which allows the other RPMs in the repo to be
installed.
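The mismatch can be illustrated with a few lines of Python 3. This is
only a sketch of the byte-level difference, not yum's actual
comparison code, and the package name used here is made up.

```python
# Hypothetical package name, encoded the way the QA RPM was built.
name = "日本語-pkg"
raw_header_name = name.encode("euc_jp")

# What the repository metadata would carry after conversion to UTF-8.
metadata_name = raw_header_name.decode("euc_jp").encode("utf-8")

# The two byte strings differ, so a byte-for-byte comparison fails...
names_match_as_bytes = raw_header_name == metadata_name

# ...even though they represent the same text once each is decoded
# with its own encoding.
names_match_as_text = (
    raw_header_name.decode("euc_jp") == metadata_name.decode("utf-8")
)
```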
Anyway, I'm wondering whether it was an intentional design decision
to not pass all the bits handed to libxml2 through utf8String first.
Thanks,
j.