[Rpm-metadata] dumpMetadata utf-8 question
Jay Soffian
jay-rpm at soffian.org
Fri Jan 5 05:06:55 UTC 2007
On Jan 4, 2007, at 11:06 PM, seth vidal wrote:
> Mostly I think it was that:
> 1. We never encountered non-utf8 strings in package names and I
> _thought_ that at one point in time rpm used to bitch about them
>
> 2. file names used to get upset in package builds when non-utf8 strings
> were in %files - I remember running into this error at one point.
At least on my RHEL4 system /bin/rpm appears happy to generate and
consume files with euc_* encodings.
In any case, it seems that dumpMetadata should either:
1) coerce all strings to utf-8 (per the patch I sent previously), or
2) ensure that the strings which it doesn't coerce are already valid
utf-8 with something like:
    def isUtf8(string):
        try:
            x = unicode(string, 'utf-8')
        except UnicodeError:
            return False
        else:
            return x.encode('utf-8') == string
The problem with (1) is that yum compares the name of the RPM in the
downloaded header to the coerced string and, if they don't match, it
rejects the header. This is simple to fix in yum, though, and I'd be
happy to contribute a patch. (There might be additional problems, but
that one is a showstopper.)
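To illustrate the mismatch, here is a minimal sketch in Python 3. It is
not the patch itself: coerce_to_utf8 and the euc_jp fallback are
hypothetical stand-ins, and the comparison only approximates the kind
of check yum performs.

```python
def coerce_to_utf8(raw, fallback="euc_jp"):
    """Return raw bytes re-encoded as UTF-8.

    Hypothetical helper: the fallback encoding is an assumption made
    for this example, not something dumpMetadata is known to use.
    """
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, leave untouched
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace").encode("utf-8")


# Name as it might be stored in an RPM header built with EUC-JP.
header_name = "\u65e5\u672c\u8a9e".encode("euc_jp")

# Name as it would be written to the metadata after coercion.
metadata_name = coerce_to_utf8(header_name)

# A byte-for-byte comparison of the two now fails.
names_match = (header_name == metadata_name)
```

Here names_match is False because the coerced name no longer equals the
raw bytes in the header, which is the situation in which the header
gets rejected.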
In lieu of (1), I can submit a patch for (2). This will at least
prevent createrepo from generating XML that cannot be consumed
downstream by yum.
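A rough sketch of what (2) could look like, written for Python 3 where
the header values are bytes. The names is_utf8 and find_bad_fields and
the sample package dict are made up for illustration; what createrepo
should do with an offending package (skip it, warn, abort) is left
open here.

```python
def is_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_bad_fields(fields):
    """Return the sorted names of fields whose values are not valid UTF-8."""
    return sorted(name for name, value in fields.items() if not is_utf8(value))


# Hypothetical package record: one clean field, one EUC-JP encoded field.
package = {
    "name": b"foo",
    "summary": "\u65e5\u672c\u8a9e".encode("euc_jp"),
}

bad = find_bad_fields(package)
```

With this input, bad contains only "summary", so the check could run
before any XML is written for the package.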
j.