[Yum-devel] Sqlite performance

Florian Festi ffesti at redhat.com
Tue Apr 24 16:38:36 UTC 2007


Hi!

Sorry for my long response time - I was on vacation.

seth vidal wrote:
> On Fri, 2007-03-23 at 15:51 +0100, Florian Festi wrote:
> 
>> I see absolutely no reason why this shouldn't be possible in the search 
>> methods. Of course you would have to move the .inPrcoRange() code into the 
>> package sack. In fact the rpmsack supports just that (whatProvides(), 
>> whatRequires()).
> 
> It is possible in the search method, except it'll mean doing the same
> operation inside the sack anyway. Putting it per-package object means
> that no matter the type of sack we're using it'll work. I'll be happy to
> give it a try if it'll help speed things up further.
> 
> 
>> If the sqlitesack and the transaction (as virtual sack representing the 
>> future rpmdb) provide this interface, too, the resolver will look much 
>> cleaner and changes on 'who searches where' will be much easier.
> 
> A sack object from the txmbr's in the tsInfo would be useful. I've
> thought of that, too. However, it seems like we would need to be able to
> get back a ListPackageSack based on:
>   - ts state
>   - pkg nevra
>   - all
> 
> I might be able to whip up a simple method to let us interact with it as
> a PackageSack so we can use the same methods. But that's even more
> reason the search method needs to be independent of the type of sack it
> is.

I don't know if I understand that last sentence. The search implementation 
should make use of the internal data structures of the sack (db4 or sqlite 
indices or whatever). So how can a search implementation be independent of 
the type of sack?


Anyway. For our own documentation I wrote down how our "databases" - which 
are inspired by the yum rpmsacks - work. We have had very good experience 
with having the same interface for everything, as it allowed us to change 
what really happens fundamentally several times without modifying most other 
parts of our code. This is what we finally/currently use.

This is in no way a suggestion for an implementation within yum, but maybe 
it can give an idea of what is possible.

The last use case is what would correspond to a tsInfo based rpmsack.

Database Interface
------------------

A database within pyrpm is a set of rpms. Basic operations supported
by databases are:

  * open, close, read, clear - NOPs for some classes
  * clearPkgs - remove tags from rpms to reduce memory usage
  * isFilelistImported, importFilelist - NOPs for non repo dbs
  * reloadDependencies - needed after loading filelist
  * adding and removing rpms - some do that in memory others directly
                               write to disk
  * in operator
  * getMemoryCopy - a copy of the database that can be modified in memory
  * iterate over Provides, Requires, Conflicts, Obsoletes, Triggers and Files
    (PRCOTFs)
  * search for name and PRCOTFs
  * getFileRequires, getPkgsFileRequires
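
To make the list above a bit more concrete, here is a minimal sketch of what
such a common interface could look like. It is not the pyrpm source: the Pkg
class, the names addPkg/removePkg/iterProvides/searchName/searchProvides and
the in-memory list are assumptions made for illustration, and only a few of
the operations are shown.

```python
# Illustrative sketch only, not the actual pyrpm code.

class Pkg:
    """Hypothetical stand-in for an rpm package."""

    def __init__(self, name, provides=(), requires=()):
        self.name = name
        self.provides = list(provides)
        self.requires = list(requires)


class Database:
    """A database is a set of rpms behind one common interface."""

    def __init__(self):
        self.pkgs = []

    # open/close/read are NOPs for a purely in-memory db.
    def open(self):
        pass

    def close(self):
        pass

    def read(self):
        pass

    def clear(self):
        # Assumption: for this sketch, clear() simply forgets all rpms.
        self.pkgs = []

    def addPkg(self, pkg):
        self.pkgs.append(pkg)

    def removePkg(self, pkg):
        self.pkgs.remove(pkg)

    def __contains__(self, pkg):
        # Supports the "in" operator.
        return pkg in self.pkgs

    def getMemoryCopy(self):
        # The copy can be modified without touching the original.
        copy = Database()
        copy.pkgs = list(self.pkgs)
        return copy

    def iterProvides(self):
        for pkg in self.pkgs:
            for prov in pkg.provides:
                yield prov, pkg

    def searchName(self, name):
        return [pkg for pkg in self.pkgs if pkg.name == name]

    def searchProvides(self, name):
        return [pkg for prov, pkg in self.iterProvides() if prov == name]
```

The linear searches are only there to keep the sketch short. Code that talks
to a database only through methods like these does not need to know how a
particular class answers them.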

Database Classes
----------------

Most features are implemented in separate classes. Those features are 
brought together either by inheritance or by using instances of other classes.

  RpmDatabase - abstract super class
    RpmDB - The on disk rpm db(4)
      RpmDiskShadowDB - allows virtual removals from the db that are not
                        written to disk but instead are just filtered from
                        all results
    RpmMemoryDB - in memory db that builds hashes for searching, works with
                  all kinds of rpms
      RpmRepoDB - Yum repository, reads data into memory
        SqliteRepoDB - uses the yum sqlite db
          RhnChannelRepoDB - deals with RHN channels which are very similar
                             to Yum repositories
      RpmExternalSearchDB - uses another db (sqlite) for searching while
                            maintaining its own list of rpms. All rpms must
                            be contained in the external db!
    JointDB - treat several dbs as one
      RhnRepoDB - RHN Repository. Work is done by RhnChannelRepoDB instances
      RpmShadowDB - current state during resolving - see RpmYum.pydb
                    use case below

Use Cases
---------

Although databases are used in more or less every script, there are two
use cases within pyrpmyum that cover all database classes.

"->" means holding a pointer to an/several instance(s) of another class

RpmYum.repos
~~~~~~~~~~~~

Database containing all rpms that are used to resolve
dependencies. After creation this database is read only.

  JointDB
   -> SqliteRepoDB - one per repository
   -> RhnRepoDB - optional
    -> RhnChannelRepoDB - one per channel
   -> RpmMemoryDB - containing rpms given at the command line (optional)
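
The "treat several dbs as one" idea can be sketched roughly like this. The
SimpleDB class, the addDB method and the use of plain strings as packages are
assumptions for illustration, not the pyrpm implementation.

```python
# Illustrative sketch only, not the actual pyrpm code.

class SimpleDB:
    """Hypothetical member db: maps a provide name to a list of packages."""

    def __init__(self, provides):
        self.provides = provides

    def searchProvides(self, name):
        return list(self.provides.get(name, []))


class JointDB:
    """Holds pointers to several dbs and answers searches for all of them."""

    def __init__(self):
        self.dbs = []

    def addDB(self, db):
        self.dbs.append(db)

    def searchProvides(self, name):
        # Each member db searches with its own indices; the joint db only
        # concatenates the results.
        result = []
        for db in self.dbs:
            result.extend(db.searchProvides(name))
        return result


repos = JointDB()
repos.addDB(SimpleDB({"webserver": ["httpd"]}))      # e.g. one repository
repos.addDB(SimpleDB({"webserver": ["lighttpd"]}))   # e.g. command line rpms
found = repos.searchProvides("webserver")
```

Because the joint db offers the same search method as its members, the caller
does not have to care how many repositories are behind it.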

RpmYum.pydb
~~~~~~~~~~~

Database used for resolving. Rpms are added and removed to/from that
db and the searches for resolving dependencies are performed on
it. All modifications are kept in memory. It uses the RpmYum.repos and
the RpmDB for searching and filters the results to the rpms that have
not yet been removed or have been added. That way neither linear search nor
building additional hashes is needed.

  RpmShadowDB
   -> RpmExternalSearchDB - keeps track of rpms installed from the repos
    -> RpmYum.repos - used for searches. See above for details
   -> RpmDiskShadowDB - keeps track of the rpms deleted in the RpmDB
    -> RpmDB - used for searches
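
A rough sketch of the filtering described above, assuming plain strings as
packages and a hypothetical BackingDB class standing in for both the RpmDB
and RpmYum.repos. The class and method names are made up for illustration and
the real classes do considerably more.

```python
# Illustrative sketch only, not the actual pyrpm code.

class BackingDB:
    """Hypothetical searchable db (stands in for RpmDB or RpmYum.repos)."""

    def __init__(self, provides):
        self.provides = provides

    def searchProvides(self, name):
        return list(self.provides.get(name, []))


class DiskShadowDB:
    """Removals are only remembered in memory and filtered from results."""

    def __init__(self, rpmdb):
        self.rpmdb = rpmdb
        self.removed = set()

    def removePkg(self, pkg):
        self.removed.add(pkg)

    def searchProvides(self, name):
        hits = self.rpmdb.searchProvides(name)
        return [pkg for pkg in hits if pkg not in self.removed]


class ExternalSearchDB:
    """Keeps its own set of rpms but lets the external db do the search."""

    def __init__(self, external):
        self.external = external
        self.pkgs = set()

    def addPkg(self, pkg):
        self.pkgs.add(pkg)

    def removePkg(self, pkg):
        self.pkgs.discard(pkg)

    def searchProvides(self, name):
        hits = self.external.searchProvides(name)
        return [pkg for pkg in hits if pkg in self.pkgs]


class ShadowDB:
    """Current state during resolving: installed minus removed plus added."""

    def __init__(self, rpmdb, repos):
        self.installed = DiskShadowDB(rpmdb)
        self.added = ExternalSearchDB(repos)

    def addPkg(self, pkg):
        self.added.addPkg(pkg)

    def removePkg(self, pkg):
        if pkg in self.added.pkgs:
            self.added.removePkg(pkg)
        else:
            self.installed.removePkg(pkg)

    def searchProvides(self, name):
        return (self.installed.searchProvides(name)
                + self.added.searchProvides(name))
```

The searches themselves always run on the underlying dbs and their existing
indices. The shadow classes only keep small sets of added and removed rpms
and filter the results with them.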



