[Yum-devel] Importing filelists.xml (and SAX)
skvidal at phy.duke.edu
Sat Jan 29 08:22:41 UTC 2005
> - The purpose of this script was to go straight from XML into the
> sqlite database to see how fast the data could be imported. I can't
> think how the import could go much faster. Even so, the import of
> this 39M filelists.xml still takes around 61s on my machine, and
> this is for just _1_ repository.
The other problem is that maybe the format is just not right. I've
thought about it a fair bit and I can't come up with another arrangement
of the file metadata that is much less bulky.
Even if we tracked dirs and references to decrease the data size of the
file we'd still end up with the same number of nodes to traverse. So,
how do we get to a place where this isn't so bulky to read in?
Right now a 1652 pkg repository - fc3 base w/o the srpms or debuginfo
stuff - has a 363K line filelists.xml and a 92K line primary.xml
I don't think the speed issue is the number of bytes in the file - I
think it has to do with the number of entries the xml parser has to
process.
I tested your parser just writing out to a flat file - it goes through
363K lines in about 9.8s - which isn't bad. Then I read the file back in
and printed out each line into a new file. Just a simple loop. It took
2.4s to read in and write back out once it was a simple text file.
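To separate parser overhead from plain I/O, a comparison along these lines can be rerun on any machine. This is only a minimal sketch, not the script discussed above: the FileCounter handler, the make_sample helper and the element layout it generates are made up for illustration, and the sample is built in memory rather than read from a real filelists.xml.

```python
import io
import time
import xml.sax


class FileCounter(xml.sax.ContentHandler):
    """Counts <file> elements and does no other per-node work."""

    def __init__(self):
        super().__init__()
        self.files = 0

    def startElement(self, name, attrs):
        if name == "file":
            self.files += 1


def make_sample(packages=200, files_per_package=50):
    """Build a synthetic filelists-style document (hypothetical layout)."""
    out = ["<filelists>"]
    for p in range(packages):
        out.append('<package name="pkg%d">' % p)
        for f in range(files_per_package):
            out.append("<file>/usr/share/pkg%d/file%d</file>" % (p, f))
        out.append("</package>")
    out.append("</filelists>")
    return "\n".join(out) + "\n"


def time_sax(text):
    """Parse the document with SAX; return (file node count, seconds)."""
    handler = FileCounter()
    start = time.perf_counter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.files, time.perf_counter() - start


def time_plain(text):
    """Copy the same text line by line; return (line count, seconds)."""
    sink = io.StringIO()
    lines = 0
    start = time.perf_counter()
    for line in io.StringIO(text):
        sink.write(line)
        lines += 1
    return lines, time.perf_counter() - start


sample = make_sample()
sax_files, sax_seconds = time_sax(sample)
plain_lines, plain_seconds = time_plain(sample)
print("sax:   %d file nodes in %.3fs" % (sax_files, sax_seconds))
print("plain: %d lines in %.3fs" % (plain_lines, plain_seconds))
```

Both passes see the same bytes, so any gap between the two timings comes from per-node parser work rather than from file size.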
What's the fastest input sqlite can take? Does it have a benchmark for
dumping data into a db?
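One rough way to get an upper bound locally is to time a bulk insert with no XML parsing in front of it. The sketch below assumes Python's sqlite3 module and an in-memory database. The files table, its two columns and the generated rows are hypothetical stand-ins, not the schema from the script being discussed.

```python
import sqlite3
import time


def bulk_insert(rows):
    """Insert (pkgid, path) rows in one transaction; return (count, seconds)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (pkgid TEXT, path TEXT)")
    start = time.perf_counter()
    with conn:  # a single transaction, committed when the block exits
        conn.executemany("INSERT INTO files (pkgid, path) VALUES (?, ?)", rows)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
    conn.close()
    return count, elapsed


# Synthetic rows; the 1652 only mirrors the package count mentioned earlier.
rows = [("pkg%d" % (i % 1652), "/usr/share/doc/file%d" % i) for i in range(100000)]
inserted, seconds = bulk_insert(rows)
print("%d rows in %.3fs" % (inserted, seconds))
```

Keeping all the inserts inside one transaction matters here, since committing after every row is usually far slower. A file-backed database would add disk sync cost on top of whatever this reports.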