skvidal at phy.duke.edu
Fri Oct 24 05:42:01 UTC 2003
> When I looked at this, I couldn't see what Joe was talking about (apart
> from the warning on the first line). There were only four characters
> that had values > 127 and they were all the second of two bytes in a
> valid UTF-8 character (registered symbol in three cases, copyright
> symbol in one).
> My test was (in python):
> d = open('fedora-core-test2-metadata.xml').read()
> [i for i in range(len(d)) if ord(d[i]) > 127]
> (not blindingly fast on a 4MB file, but it gets the job done).
> Okay, I was looking at the bzip2 version, rather than the raw version,
> but I assume the former was generated from the latter.
Any suggestions on normalizing text to UTF-8 are appreciated. :)
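
For what it's worth, here is a minimal sketch of one way the check quoted above and a normalizing pass could look in current Python. Nothing here comes from the metadata tooling itself: the function names are hypothetical, the latin-1 fallback is only an assumption about what stray non-UTF-8 bytes are most likely to be, and NFC is just one reasonable choice of normal form.

```python
import unicodedata


def non_ascii_offsets(data):
    """Return the offset of every byte in `data` with a value above 127."""
    return [i for i, b in enumerate(data) if b > 127]


def to_normalized_utf8(data, fallback="latin-1"):
    """Return `data` re-encoded as NFC-normalized UTF-8 bytes.

    Input that is already valid UTF-8 is decoded as such. Anything else
    is decoded with `fallback` (an assumption, not something the data
    tells us), so the result is a best guess for mixed-encoding input.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = data.decode(fallback)
    return unicodedata.normalize("NFC", text).encode("utf-8")
```

Reading the file with open(path, 'rb').read() and passing the bytes to non_ascii_offsets() reproduces the quoted test; to_normalized_utf8() leaves valid UTF-8 such as the registered and copyright symbols untouched and only rewrites input that fails to decode.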