[Rpm-metadata] updates

Malcolm Tredinnick malcolm at commsecure.com.au
Fri Oct 24 03:58:54 UTC 2003


On Fri, 2003-10-24 at 13:35, seth vidal wrote:
> On Thu, 2003-10-23 at 12:13, Joe Shaw wrote:
> > On Thu, 2003-10-23 at 03:02, seth vidal wrote:
> > > 2. produced a sample of the format indexing all of fedora-core-test2 
> > > (this is a big 4.0M file - be careful if you're on a slow connection)
> > >  http://linux.duke.edu/~skvidal/metadata/fedora-core-test2-metadata.xml
> > 
> > This file isn't valid UTF-8. :)  Looks like a couple package
> > descriptions or summaries are encoded in a non-specified encoding. :)
> > 
> 
> <grumble> I'll work on it see what I can do to solve it. thanks.

When I looked at this, I couldn't see what Joe was talking about (apart
from the warning on the first line). There were only four characters
that had values > 127 and they were all the second of two bytes in a
valid UTF-8 character (registered symbol in three cases, copyright
symbol in one).

My test was (in python):

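# Read the file as raw bytes so that no decoding happens behind our back.
d = open('fedora-core-test2-metadata.xml', 'rb').read()
# List the offset of every byte whose value is above 127 (i.e. non-ASCII).
[i for i, b in enumerate(bytearray(d)) if b > 127]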

(not blindingly fast on a 4MB file, but it gets the job done).
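
A stricter check, sketched here as an illustration rather than something
I actually ran, would be to try decoding the data as UTF-8 and collect
the offsets that fail. In UTF-8 every byte of a multi-byte sequence has a
value above 127, so a high-byte scan on its own cannot tell a valid
sequence from a stray single byte in some other encoding. The helper name
find_invalid_utf8 and the sample strings are made up for the example.

```python
def find_invalid_utf8(data):
    """Return (offset, byte value) for each byte that breaks UTF-8 decoding."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            # Decode the remainder; success means everything left is valid.
            data[pos:].decode('utf-8')
            break
        except UnicodeDecodeError as err:
            # err.start is relative to the slice, so shift it by pos.
            offset = pos + err.start
            bad.append((offset, data[offset]))
            # Resume just past the offending byte.
            pos = offset + 1
    return bad


# Hypothetical samples: the same text encoded as UTF-8 and as Latin-1.
sample_utf8 = u'Registered \u00ae'.encode('utf-8')
sample_latin1 = u'Registered \u00ae'.encode('latin-1')

print(find_invalid_utf8(sample_utf8))
print(find_invalid_utf8(sample_latin1))
```

To run it against the real file, read it in binary mode, e.g.
find_invalid_utf8(open('fedora-core-test2-metadata.xml', 'rb').read()).
Re-slicing on every error is wasteful, but should be fine when there are
only a handful of bad bytes.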

Okay, I was looking at the bzip2 version, rather than the raw version,
but I assume the former was generated from the latter.

(By the way, the bzip2 version being only 320K in size is extremely cool!)

Cheers,
Malcolm



