[Yum-devel] Re: [Yum] metadata parser in C

Paul Nasrat pnasrat at redhat.com
Wed May 10 14:11:44 UTC 2006

On Wed, 2006-05-10 at 14:03 +0300, Tambet Ingo wrote:
> Hey,
> Attached is a yum metadata parser written in C. It should produce
> identical results with the sqlitecache implementation in yum-2.6.1 but
> should be quite a bit faster. Here are the numbers parsing FC5 core
> metadata (2207 packages):

This is really cool.  I think it's a really good start, you're aware of
all the caveats which need to be worked around for full inclusion.  It
might be interesting to see how python-lxml deals with it also for speed
comparisons.  I may try and whip something up for that.

> To test it, unpack the attached tarball and build it with standard:
> python setup.py build
> sudo python setup.py install

I just dumped .py and .so in yum/ directory of HEAD checkout and used
python yummain.py ...

makecache with 4 (3118 + 296 + 978 + 2207) repos from cold with your
implementation:

real    1m48.291s
user    0m28.662s
sys     0m2.320s

Old implementation:

real    4m39.603s
user    2m33.674s
sys     0m4.760s

> Some notes:
> I updated the dbversion because I modified the SQL schema slightly: When
> deleting a package from 'packages' table, there are now SQL triggers to
> delete related rows from other tables (files, 'prco', filelists,
> changelog).
> It doesn't work for regular users: 

We really can't have this for the final implementation, as we want
repoquery and friends to work as non root.  
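
For reference, the delete triggers described in the notes above could
look something like the sketch below. The table and column names
(packages, files, changelog, pkgKey) are assumptions for illustration
and are not taken from the attached tarball; only the idea of "delete
related rows when a package row goes away" comes from the description.

```python
import sqlite3

def create_schema(conn):
    """Create a toy schema with a trigger that cleans up related rows.

    Table and column names here are illustrative assumptions only.
    """
    conn.executescript("""
        CREATE TABLE packages (pkgKey INTEGER PRIMARY KEY, pkgId TEXT, name TEXT);
        CREATE TABLE files (pkgKey INTEGER, name TEXT);
        CREATE TABLE changelog (pkgKey INTEGER, author TEXT, changelog TEXT);

        -- When a package row is deleted, drop its rows in the other tables.
        CREATE TRIGGER remove_package AFTER DELETE ON packages
        BEGIN
            DELETE FROM files WHERE pkgKey = old.pkgKey;
            DELETE FROM changelog WHERE pkgKey = old.pkgKey;
        END;
    """)
```

With this in place a single DELETE on packages is enough; the caller
does not have to remember every dependent table.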

One option would be for sqlitecache.py to have different implementers,
such as pure python and the .so, and a factory that we can select from.
This would mean we can put this in as a separate package, so as not to
make yum arch specific.  I think this would be a good start on
implementation, as with multiple repo type support we may have different
frontend->cache parsers anyway.
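
A minimal sketch of the factory idea, assuming the C parser is shipped
as an optional extension module. The module name "sqlitecachec", the
PurePythonCache class and get_cache_backend() are all hypothetical
names made up for this example; nothing here is the actual yum code.

```python
import importlib

class PurePythonCache:
    """Stand-in for the existing pure-Python sqlitecache implementation."""
    name = "python"

def _load_c_backend(module_name):
    """Try to import the optional C extension; return None if it is absent."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None

def get_cache_backend(prefer_c=True, module_name="sqlitecachec"):
    """Return the C backend when available, else the pure-Python one.

    `module_name` is a hypothetical name for the separately packaged .so.
    """
    if prefer_c:
        backend = _load_c_backend(module_name)
        if backend is not None:
            return backend
    return PurePythonCache()
```

Because the import failure is swallowed, yum itself stays noarch and
simply falls back to the slower path when the extension package is not
installed.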

> The current implementation uses in
> memory database for that case, which is not possible with this
> implementation. There's no way to return the sqlite db handle from C to
> python so the parser currently closes the db and returns the file name
> and python part re-opens it.

>  I'm thinking about adding per-user sqlite
> caches somewhere in users' home directories for that. Something like
> ~/.yum/$reponame/$md_filename.xml.gz.sqlite.

That would work, or you could even put them in 
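
For the ~/.yum/ layout quoted above, the path handling could be as
simple as the sketch below. The function names and the 0700 directory
mode are assumptions for illustration, not part of the proposal.

```python
import os

def user_cache_path(reponame, md_name, home=None):
    """Build <home>/.yum/<reponame>/<md_name>.xml.gz.sqlite.

    `md_name` is the metadata base name, e.g. "primary" (an assumed example).
    """
    if home is None:
        home = os.path.expanduser("~")
    return os.path.join(home, ".yum", reponame, md_name + ".xml.gz.sqlite")

def ensure_cache_dir(path):
    """Create the parent directory of `path` if it does not exist yet."""
    cache_dir = os.path.dirname(path)
    if not os.path.isdir(cache_dir):
        os.makedirs(cache_dir, 0o700)  # private to the user (assumed mode)
    return cache_dir
```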

> The yum logging isn't used. All the output is printed to stdout and
> stderr. It's quite easy to fix but ...

Yeah I see a few of these building from a clean cache:

** (process:4084): WARNING **: Incomplete package lost

Particularly as we use yum through graphical front ends we don't really
want to be using stdout/stderr.
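
One way this could be handled, sketched under the assumption that the C
parser can be given a Python callback for its messages instead of
printing them. The callback signature, the level constants and the
logger name "yum.mdparser" are hypothetical.

```python
import logging

# Hypothetical logger name; a front end can attach its own handlers to it.
logger = logging.getLogger("yum.mdparser")

# Hypothetical numeric levels the C side might pass to the callback.
MSG_WARNING = 1
MSG_ERROR = 2

def parser_message(level, message):
    """Route a message from the parser into Python logging."""
    if level == MSG_ERROR:
        logger.error(message)
    elif level == MSG_WARNING:
        logger.warning(message)
    else:
        logger.debug(message)
```

A graphical front end would then decide where such messages go by
configuring handlers, rather than having them land on the terminal.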

