[Yum-devel] Importing filelists.xml (and SAX)
Menno Smits
menno-yum at freshfoo.com
Sat Jan 29 06:06:51 UTC 2005
Hi all,
I've been playing around with trying to speed up import of filelist data
into sqlite. See the attached standalone POC script for details. I've
used libxml's push parser (SAX) interface.
Here are my findings:
- Using the SAX parser greatly reduces memory usage and is quite fast.
On my machine this script can parse a 39M filelists.xml in 7 secs and
uses only ~8MB of memory if it's not writing to the database or
otherwise storing the data. Has using SAX for the various XML
metadata been considered for yum? It could greatly reduce yum's
memory footprint.
- The purpose of this script was to go straight from XML into the
sqlite database to see how fast the data could be imported. I can't
think how the import could go much faster. Even so, the import of
this 39M filelists.xml still takes around 61s on my machine, and
this is for just _1_ repository.
Is this really acceptable, especially when metadata could change
frequently?
Gijs has already done a lot of good work with sqlite but I think we
should think about this some more before committing to it. I realise
that filelist data is typically used less often but this wait is
still fairly excessive. Should we be investigating other options
such as dbm style databases?
- The sqlite documentation mentions that using manual commits may
improve performance for large numbers of INSERTs. I've tried this
and found that for this case manual commits actually _increase_ the
import time by a few seconds.
- I considered using the COPY command, which is typically used to
insert bulk data, but support for it has been removed in later
versions of sqlite, so it would be unwise to rely on it.
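
For readers without access to the attachment, the approach described in the
findings above can be sketched roughly as follows. This is not the attached
saximport.py: it uses Python's standard xml.sax incremental parser in place
of the libxml push parser, and the standard sqlite3 module. The table name
`files`, its two columns, the `FilelistHandler` class and the
`import_filelists` function are all hypothetical names chosen for
illustration. It assumes the usual filelists.xml layout of <package pkgid=...>
elements containing <file> children, and a stream opened in binary mode.

```python
import sqlite3
import xml.sax


class FilelistHandler(xml.sax.ContentHandler):
    """Turn <package pkgid=...><file>path</file></package> into rows."""

    def __init__(self, on_row):
        super().__init__()
        self.on_row = on_row    # callback receiving (pkgid, path) tuples
        self.pkgid = None       # pkgid of the <package> currently open
        self.chunks = None      # text pieces collected while inside <file>

    def startElement(self, name, attrs):
        if name == "package":
            self.pkgid = attrs.get("pkgid")
        elif name == "file":
            self.chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect and join later.
        if self.chunks is not None:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "file":
            self.on_row((self.pkgid, "".join(self.chunks)))
            self.chunks = None
        elif name == "package":
            self.pkgid = None


def import_filelists(stream, conn, chunk_size=64 * 1024):
    """Feed a binary XML stream to the parser chunk by chunk and insert
    rows as they appear. Returns the number of rows inserted."""
    # Hypothetical schema, for illustration only.
    conn.execute("CREATE TABLE IF NOT EXISTS files (pkgid TEXT, path TEXT)")
    rows = []
    parser = xml.sax.make_parser()
    parser.setContentHandler(FilelistHandler(rows.append))
    total = 0

    def flush():
        nonlocal total
        if rows:
            conn.executemany(
                "INSERT INTO files (pkgid, path) VALUES (?, ?)", rows)
            total += len(rows)
            rows.clear()

    # "with conn" wraps all INSERTs in one transaction, committed at the end.
    with conn:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            parser.feed(chunk)  # push interface: only one chunk in memory
            flush()
        parser.close()
        flush()
    return total
```

Because rows are flushed after every chunk, memory use is bounded by the
chunk size rather than the size of the XML file. Whether a single
transaction helps or hurts the import time is exactly the kind of thing
that would need measuring on real repository data.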
Menno
-------------- next part --------------
A non-text attachment was scrubbed...
Name: saximport.py
Type: application/x-python
Size: 2169 bytes
Desc: not available
Url : http://lists.baseurl.org/pipermail/yum-devel/attachments/20050129/91bbaff5/attachment.bin