[Yum-devel] yum on an olpc machine (slooooooooooow)

Panu Matilainen pmatilai at laiskiainen.org
Thu Dec 21 07:44:27 UTC 2006


On Thu, 21 Dec 2006, seth vidal wrote:
> On Mon, 2006-12-18 at 09:33 +0200, Panu Matilainen wrote:
>
>> The overall repodata size could be cut down somewhat in at least a couple
>> of ways:
>> - Drop the filenames redundancy from primary.xml. It's going to require
>>    of course the full filelists file to be downloaded at all times (diffs
>>    would help a lot of course), but that's what apt and smart need to do
>>    anyway (because both calculate full dependency tree at all times). Only
>>    yum benefits from the primary.xml stuff to some extent, and sooner or
>>    later it needs the full filelists too.
>
> We're punishing low bandwidth clients more, then, by requiring they
> download all of filelists to do anything.

For yum users, yes. OTOH Smart and apt need the full filelists anyway, so 
their clients end up downloading quite a bit of redundant data 
because of the partial filelists in primary.xml.
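A rough Python sketch of that redundancy. It assumes a rule for which paths get copied into primary.xml (/etc, any bin/ directory, /usr/lib/sendmail). That rule is an illustration, not something checked against createrepo. The helper names and sample packages are also hypothetical. The sketch counts the path bytes that a client which always fetches the full filelists (as apt and Smart do) downloads a second time.

```python
import re

# ASSUMPTION: the rule deciding which file paths are copied into
# primary.xml (config files under /etc, anything in a bin/ directory,
# and /usr/lib/sendmail). Illustrative only, not taken from createrepo.
PRIMARY_FILE_RE = re.compile(r"^/etc/|bin/|^/usr/lib/sendmail$")

# Hypothetical sample: package name -> full list of file paths.
SAMPLE_PACKAGES = {
    "bash": ["/bin/bash", "/etc/skel/.bashrc", "/usr/share/doc/bash/README"],
    "coreutils": ["/bin/ls", "/usr/share/man/man1/ls.1.gz"],
}


def primary_subset(files):
    """Return the paths that would also appear in primary.xml."""
    return [f for f in files if PRIMARY_FILE_RE.search(f)]


def redundant_bytes(packages):
    """Bytes of path text downloaded twice by a full-filelists client.

    A client that always fetches filelists.xml gets every path that
    primary.xml already carried a second time.
    """
    return sum(
        len(path.encode("utf-8"))
        for files in packages.values()
        for path in primary_subset(files)
    )


if __name__ == "__main__":
    print(redundant_bytes(SAMPLE_PACKAGES))  # 33
```

Only the paths matching the pattern count: /bin/bash, /etc/skel/.bashrc and /bin/ls. Summed over a whole distribution, that duplicated text is what dropping the partial filelists from primary.xml would save.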

>> - other.xml is not typically loaded, but it could be made quite a bit
>>    smaller by storing the changelogs just once by source rpm. The
>>    difference is *huge*, e.g. FC6 SRPMS/repodata/other.xml.gz is roughly 2M,
>>    but ~6M for i386 and ~8M for x86_64. With that kind of size savings
>>    somebody might even want to use it for something :)
>
> I see what you mean here, but I'm not sure how that's possible w/o a lot
> of substantial changes in how we look up changelogs. Not impossible,
> just invasive, I think.

# current schema: changelog rows keyed by each binary package's pkgKey
cur.execute("select changelog.date as date, "
             "changelog.author as author, "
             "changelog.changelog as changelog "
             "from packages,changelog where packages.pkgId = %s "
             "and packages.pkgKey = changelog.pkgKey", self.pkgId)

becomes something like

# proposed: changelog rows keyed by source rpm, shared by its binary packages
cur.execute("select changelog.date as date, "
             "changelog.author as author, "
             "changelog.changelog as changelog "
             "from packages,changelog where packages.pkgId = %s "
             "and packages.rpm_sourcerpm = changelog.rpm_sourcerpm", self.pkgId)

Yes, it needs changing the other database schema a bit, so it might not be 
something you'll want to deal with in, say, yum-3.0.x, but if we're 
looking at the scale of "next gen repodata" things like this *should* be 
dealt with IMHO.
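A self-contained check of the proposed query against an in-memory SQLite database. It uses Python's standard sqlite3 module, which takes `?` placeholders and a parameter tuple rather than the `%s` style above. The table layout is a hypothetical reduction of the other database to just the columns the query touches. The package IDs, source rpm name and changelog entries are made up. Three binary packages built from one source rpm share a single copy of the changelog.

```python
import sqlite3


def build_demo_db():
    """Build an in-memory DB using a hypothetical changelog-by-srpm schema."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("create table packages (pkgKey integer primary key, "
                "pkgId text, rpm_sourcerpm text)")
    # Hypothetical: changelog keyed by source rpm instead of pkgKey.
    cur.execute("create table changelog (rpm_sourcerpm text, date integer, "
                "author text, changelog text)")
    srpm = "foo-1.0-1.src.rpm"
    for pkg_id in ("sha-foo", "sha-foo-devel", "sha-foo-libs"):
        cur.execute("insert into packages (pkgId, rpm_sourcerpm) "
                    "values (?, ?)", (pkg_id, srpm))
    # The changelog is stored once for all three binary packages.
    entries = [
        (srpm, 1166659200, "Jane Doe <jane@example.com> - 1.0-1",
         "- Update to 1.0"),
        (srpm, 1164067200, "Jane Doe <jane@example.com> - 0.9-2",
         "- Fix build"),
    ]
    cur.executemany("insert into changelog values (?, ?, ?, ?)", entries)
    conn.commit()
    return conn


def changelog_for(conn, pkg_id):
    """Return (date, author, changelog) tuples for one binary package."""
    cur = conn.cursor()
    cur.execute("select changelog.date, changelog.author, "
                "changelog.changelog "
                "from packages, changelog where packages.pkgId = ? "
                "and packages.rpm_sourcerpm = changelog.rpm_sourcerpm "
                "order by changelog.date desc", (pkg_id,))
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(len(changelog_for(conn, "sha-foo-devel")))  # 2
    print(conn.execute("select count(*) from changelog").fetchone()[0])  # 2
```

Under the current per-pkgKey layout the same two entries would be stored once per binary package. That means 6 rows instead of 2, which is the kind of duplication behind the SRPMS vs i386/x86_64 other.xml.gz sizes quoted earlier.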

We really ought to take this to the metadata list though :)

 	- Panu -


