[Yum-devel] Is providing sql text files instead of binary sqlite db files feasible?

James Antill james at fedoraproject.org
Mon Nov 9 05:01:21 UTC 2009


On Mon, 2009-11-09 at 01:25 +0330, Hedayat Vatankhah wrote:
> 
> On 09/11/08 05:43, Seth Vidal wrote:
> >
> >
> > On Sat, 7 Nov 2009, Hedayat Vatankhah wrote:
> >
> >> Hi,
> >> Today I was testing different configurations for yum metadata files 
> >> (using different compression formats such as xz, with and without 
> >> requires table); I noticed something interesting:
> >> Dumping Fedora 11's primary.sqlite file (using .dump command of 
> >> sqlite3) in a text file and compressing that with bzip2 results in a 
> >> 4.4MB (4.1MB using xz) file, comparing with compressing that db 
> >> directly using bzip2 which results in a 11MB file (8.7MB using xz).
> >> For Fedora 11 Updates primary database using bzip2, the size reduces 
> >> from 5.6MB to 2.1MB.
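
The comparison described above can be sketched with Python's sqlite3 module, whose iterdump() produces the same kind of SQL text as the sqlite3 CLI's .dump command; the table name and row contents here are made up for illustration, not taken from a real primary.sqlite:

```python
import bz2
import sqlite3

# Build a small throwaway database in memory (a stand-in for primary.sqlite).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE packages (name TEXT, version TEXT)")
db.executemany("INSERT INTO packages VALUES (?, ?)",
               [("pkg%d" % i, "1.%d-1.fc11" % i) for i in range(1000)])
db.commit()

# Text dump, equivalent to `sqlite3 primary.sqlite .dump`.
dump_text = "\n".join(db.iterdump())

# Compress the dump text; a binary .sqlite file would be compressed the
# same way, and is what the sizes above are being compared against.
compressed = bz2.compress(dump_text.encode("utf-8"))
print(len(dump_text), len(compressed))
```

The text dump compresses better than the binary file largely because the SQL text is highly repetitive and carries none of sqlite's page padding.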

 How big are the .xml files if you xz them?

> >> These savings are higher than the values I get when I drop requires 
> >> table from the databases and create new databases from them. Also, 
> >> they need the least changes in the code.
> >>
> >> Just thought that it might be interesting for you!
> >
> > Now - how long does it take to go from the .dump file back into a 
> > sqlite db you can use? I suspect that populating the filelists in 
> > particular is pretty expensive in terms of time.
>
> On my system (a 1.2GHz Core 2 Duo Centrino), it takes about 10 seconds 
> to read the Fedora 11 repository's filelist data. I don't know how much 
> time counts as "expensive"; it depends on one's bandwidth. BTW, considering 
> that such metadata is not downloaded often, this time might be reasonable.
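
The reverse step Seth asks about (dump text back into a usable database) can be sketched the same way with Python's sqlite3 module; the table and rows below are a tiny made-up stand-in for a decompressed primary dump, so the timing is only illustrative:

```python
import sqlite3
import time

# A tiny stand-in for a decompressed primary .dump file.
dump_text = "\n".join(
    ["BEGIN TRANSACTION;",
     "CREATE TABLE filelist (pkgKey INTEGER, dirname TEXT, filenames TEXT);"]
    + ["INSERT INTO filelist VALUES (%d, '/usr/bin', 'tool%d');" % (i, i)
       for i in range(5000)]
    + ["COMMIT;"])

start = time.time()
db = sqlite3.connect(":memory:")   # a real import would target an on-disk file
db.executescript(dump_text)        # replays the SQL dump into a usable database
elapsed = time.time() - start
print("imported %d rows in %.3fs" %
      (db.execute("SELECT COUNT(*) FROM filelist").fetchone()[0], elapsed))
```

Running this against a real, multi-million-row filelist dump on the target hardware would give the actual cost being debated here.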

 By the same rationale, if the metadata isn't updated often, then the
bandwidth savings aren't important either.
 The problem with downloading the .xml and then converting it to .sqlite
is not _just_ that you have to convert it on each machine, but that
within yum, each time you use a repo, instead of managing a single file
"blah.sqlite" you have to manage a pair of files, "blah.xml" and
"blah.sqlite".
 So startup is always slower, at the moment. And I'm pretty sure
_preload_md_from_system_cache() doesn't work as well (I think it just
gets the .xml).

 Of course we might end up stuck going back to .xml files, so we can get
metadata deltas ... but it might also be easy to get a usable delta
scheme for the .sqlite files. And there are other problems, out of our
control, with metadata deltas anyway.

> But if spending that much time is considered unacceptable, yum might 
> provide two operating modes: low-bandwidth and high-bandwidth. In 
> low-bandwidth mode, it would download the compressed dump file and 
> import it into the sqlite database. In high-bandwidth mode it would 
> download compressed sqlite db files. The mode could be configured in 
> yum config files, or yum could even determine the appropriate mode of 
> operation from its observed download speed.

 Having a config option to choose between downloading .sqlite or .xml
is certainly possible, and the quick fix should only be a few lines of
code ... doing it well shouldn't be _much_ more work; you just need to
store the choice that was made at repomd.xml time (so that if the user
changes it, we don't re-download when we don't need to, etc.)
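
 A minimal sketch of what such a choice might look like; the option name
metadata_format and the file names below are hypothetical, not actual yum
code or real repodata naming:

```python
# Hypothetical sketch: pick which primary-metadata file to fetch based on
# a config option. None of these names come from actual yum code.
def metadata_filename(metadata_format):
    """Return the primary-metadata file for the configured mode."""
    if metadata_format == "xml":        # low-bandwidth: fetch text, convert locally
        return "primary.xml.bz2"
    elif metadata_format == "sqlite":   # high-bandwidth: fetch the ready-made db
        return "primary.sqlite.bz2"
    raise ValueError("unknown metadata_format: %r" % metadata_format)

print(metadata_filename("xml"))
print(metadata_filename("sqlite"))
```

 Doing it "well", as noted above, would also mean recording which choice
was in effect when repomd.xml was fetched, so a later config change
doesn't trigger a needless re-download.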



More information about the Yum-devel mailing list