[Rpm-metadata] dumpMetadata utf-8 question

Jay Soffian jay-rpm at soffian.org
Fri Jan 5 05:06:55 UTC 2007


On Jan 4, 2007, at 11:06 PM, seth vidal wrote:

> Mostly I think it was that:
> 1. We never encountered non-utf8 strings in package names and I
> _thought_ that at one point in time rpm used to bitch about them
>
> 2. file names used to get upset in package builds when non-utf8 strings
> were in %files - I remember running into this error at one point.

At least on my RHEL4 system /bin/rpm appears happy to generate and  
consume files with euc_* encodings.

In any case, it seems that dumpMetadata should either:

1) coerce all strings to utf-8 (per the patch I sent previously), or
2) ensure that the strings which it doesn't coerce are already valid  
utf-8 with something like:

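def isUtf8(string):
     # 'string' is a byte string (Python 2 str); decoding raises on invalid UTF-8
     try:
         x = unicode(string, 'utf-8')
     except UnicodeError:
         return False
     else:
         # round-trip check: re-encoding must reproduce the original bytes
         return x.encode('utf-8') == string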
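
As a rough sketch of how a check like (2) might be applied before any XML is written (the find_non_utf8 helper and the field names are hypothetical, not dumpMetadata's actual structures, and the check is restated on byte strings so the example is self-contained):

```python
def is_utf8(data):
    """Return True if the byte string `data` decodes cleanly as UTF-8."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8(fields):
    """Return the sorted names of fields whose byte values are not valid UTF-8."""
    return sorted(name for name, value in fields.items() if not is_utf8(value))


# Hypothetical package fields; the summary holds EUC-JP encoded bytes.
fields = {
    'name': b'foo',
    'summary': b'\xc6\xfc\xcb\xdc',
}

bad = find_non_utf8(fields)
if bad:
    print('not valid utf-8: ' + ', '.join(bad))
```

A caller could then skip or reject the package rather than emit those bytes into the metadata.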

The problem with (1) is that yum compares the name of the RPM in the  
downloaded header to the coerced string and, if they don't match,  
rejects the header. This is simple to fix in yum, though, and I'd be  
happy to contribute a patch. (There might be additional problems, but  
that one is a showstopper.)
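
To illustrate the mismatch, here is a sketch under assumptions: to_utf8 is a hypothetical coercion function (not the patch mentioned above, and not yum's code), and the two comparisons only stand in for whatever check yum actually performs.

```python
def to_utf8(raw, guesses=('utf-8', 'euc_jp')):
    """Hypothetical coercion: decode with the first codec that works,
    then re-encode as UTF-8. Undecodable input falls back to replacement."""
    for codec in guesses:
        try:
            return raw.decode(codec).encode('utf-8')
        except UnicodeDecodeError:
            continue
    return raw.decode('utf-8', 'replace').encode('utf-8')


# Name as stored in the RPM header (EUC-JP bytes in this example).
header_name = b'\xc6\xfc\xcb\xdc'

# Name as it would appear in the metadata after coercion.
metadata_name = to_utf8(header_name)

# Comparing raw header bytes to the coerced string fails ...
strict_match = header_name == metadata_name

# ... while coercing the header name the same way first makes them agree.
lenient_match = to_utf8(header_name) == metadata_name
```

The fix on the yum side would amount to applying the same coercion to the header value before comparing, assuming both sides agree on the coercion rules.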

In lieu of (1), I can submit a patch for (2). This will at least  
prevent createrepo from generating XML that cannot be consumed  
downstream by yum.

j.


