[Yum-devel] Importing filelists.xml (and SAX)

Menno Smits menno-yum at freshfoo.com
Sat Jan 29 06:06:51 UTC 2005


Hi all,

I've been playing around with trying to speed up import of filelist data 
into sqlite. See the attached standalone POC script for details. I've 
used libxml's push parser (SAX) interface.

Here are my findings:

- Using the SAX parser greatly reduces memory usage and is quite fast.
   On my machine this script can parse a 39M filelists.xml in 7 secs and
   uses only ~8MB of memory if it's not writing to the database or
   otherwise storing the data. Has using SAX for the various XML
   metadata been considered for yum?  It could greatly reduce yum's
   memory footprint.

- The purpose of this script was to go straight from XML into the
   sqlite database to see how fast the data could be imported. I can't
   think how the import could go much faster. Even so, the import of
   this 39M filelists.xml still takes around 61s on my machine, and
   this is for just _1_ repository.

   Is this really acceptable especially when metadata could change
   frequently?

   Gijs has already done a lot of good work with sqlite but I think we
   should think about this some more before committing to it. I realise
   that filelist data is typically used less often but this wait is
   still fairly excessive.  Should we be investigating other options
   such as dbm style databases?

- The sqlite documentation mentions that using manual commits may
   improve performance for large numbers of INSERTs. I've tried this
   and found that for this case manual commits actually _increase_ the
   import time by a few seconds.

- I considered using the COPY command which is typically used to
   insert bulk data but support for this is removed in later versions
   of sqlite so it would be unwise to rely on this.
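
For readers without the attachment, here is a minimal sketch of the
general SAX-to-sqlite approach discussed above. It is not the attached
script: it uses the xml.sax and sqlite3 modules from the Python standard
library rather than libxml's push parser, and the table layout
(files(pkgid, path)), the FilelistHandler class, the import_filelists
function and the batch size are all assumptions made for illustration.
It assumes the usual filelists.xml layout of <package pkgid="..."> elements
containing <file> children. All INSERTs go into a single transaction that
is committed once at the end.

```python
import sqlite3
import xml.sax


class FilelistHandler(xml.sax.ContentHandler):
    """Streams <file> entries into sqlite without building a DOM.

    Hypothetical helper for illustration; not taken from saximport.py.
    """

    def __init__(self, conn, batch_size=10000):
        super().__init__()
        self.conn = conn
        self.batch_size = batch_size  # assumed value, tune as needed
        self.rows = []                # pending (pkgid, path) tuples
        self.pkgid = None             # pkgid of the current <package>
        self.text = None              # text chunks while inside <file>

    def startElement(self, name, attrs):
        if name == "package":
            self.pkgid = attrs.get("pkgid")
        elif name == "file":
            self.text = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks.
        if self.text is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "file":
            self.rows.append((self.pkgid, "".join(self.text)))
            self.text = None
            if len(self.rows) >= self.batch_size:
                self.flush()

    def flush(self):
        # Only a bounded batch of rows is ever held in memory.
        self.conn.executemany(
            "INSERT INTO files (pkgid, path) VALUES (?, ?)", self.rows
        )
        self.rows = []


def import_filelists(source, conn):
    """Parse a filelists.xml (filename or file object) into sqlite."""
    conn.execute("CREATE TABLE IF NOT EXISTS files (pkgid TEXT, path TEXT)")
    handler = FilelistHandler(conn)
    xml.sax.parse(source, handler)
    handler.flush()   # write any rows left in the last partial batch
    conn.commit()     # one commit for the whole import
```

Usage would be something like import_filelists("filelists.xml",
sqlite3.connect("filelists.sqlite")). Memory use stays roughly constant
because only the current batch of rows is held at any time. Timings will
of course differ from the figures quoted above.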

Menno


-------------- next part --------------
A non-text attachment was scrubbed...
Name: saximport.py
Type: application/x-python
Size: 2169 bytes
Desc: not available
Url : http://lists.baseurl.org/pipermail/yum-devel/attachments/20050129/91bbaff5/attachment.bin 

