ReiserFS

diff -puN /dev/null fs/reiser4/znode.c
--- /dev/null Thu Apr 11 07:25:15 2002
+++ 25-akpm/fs/reiser4/znode.c Wed Mar 30 14:55:08 2005
@@ -0,0 +1,1141 @@
/* Copyright 2001, 2002, 2003 by Hans Reiser, licensing governed by
 * reiser4/README */
/* Znode manipulation functions. */
/* Znode is the in-memory header for a tree node. It is stored
      separately from the node itself so that it does not get written to
      disk. In this respect znode is like buffer head or page head. We
      also use znodes for additional reiser4 specific purposes:

        . they are organized into tree structure which is a part of whole
            reiser4 tree.
        . they are used to implement node-grained locking
        . they are used to keep additional state associated with a
            node
        . they contain links to lists used by the transaction manager

      Znode is attached to some variable "block number" which is instance of
      fs/reiser4/tree.h:reiser4_block_nr type. Znode can exist without
      appropriate node being actually loaded in memory. Existence of znode itself
      is regulated by reference count (->x_count) in it. Each time thread
      acquires reference to znode through call to zget(), ->x_count is
      incremented; it is decremented on call to zput(). Data (content of
      node) are brought in memory through call to zload(), which also
      increments ->d_count reference counter. zload() can block waiting on
      IO. Call to zrelse() decreases this counter. Also, ->c_count keeps
      track of number of child znodes and prevents parent znode from being
      recycled until all of its children are. ->c_count is decremented
      whenever child goes out of existence (being actually recycled in
      zdestroy()), which can be some time after last reference to this
      child dies if we support some form of LRU cache for znodes.

*/
/* EVERY ZNODE'S STORY

      1. His infancy.

      Once upon a time, the znode was born deep inside of zget() by call to
      zalloc(). At the return from zget() znode had:

        . reference counter (x_count) of 1
        . assigned block number, marked as used in bitmap
        . pointer to parent znode. Root znode parent pointer points
            to its father: "fake" znode. This, in turn, has NULL parent pointer.
        . hash table linkage
        . no data loaded from disk
        . no node plugin
        . no sibling linkage

      2. His childhood.

      Each node is either brought into memory as a result of tree traversal, or
      created afresh, creation of the root being a special case of the latter. In
      either case it's inserted into sibling list. This will typically require
      some ancillary tree traversing, but ultimately both sibling pointers will
      exist and JNODE_LEFT_CONNECTED and JNODE_RIGHT_CONNECTED will be true in
      zjnode.state.

      3. His youth.

      If znode is bound to already existing node in a tree, its content is read
      from the disk by call to zload(). At that moment, JNODE_LOADED bit is set
      in zjnode.state and zdata() function starts to return non null for this
      znode. zload() further calls zparse() that determines which node layout
      this node is rendered in, and sets ->nplug on success.

      If znode is for new node just created, memory for it is allocated and
      zinit_new() function is called to initialise data, according to selected
      node layout.

      4. His maturity.

      After this point, znode lingers in memory for some time. Threads can
      acquire references to znode either by blocknr through call to zget(), or by
      following a pointer to unallocated znode from internal item. Each time
      reference to znode is obtained, x_count is increased. Thread can read/write
      lock znode. Znode data can be loaded through calls to zload(), d_count will
      be increased appropriately. If all references to znode are released
      (x_count drops to 0), znode is not recycled immediately. Rather, it is
      still cached in the hash table in the hope that it will be accessed
      shortly.

      There are two ways in which znode existence can be terminated:

        . sudden death: node bound to this znode is removed from the tree
        . overpopulation: znode is purged out of memory due to memory pressure

      5. His death.

      Death is a complex process.

      When we irrevocably commit ourselves to decision to remove node from the
      tree, JNODE_HEARD_BANSHEE bit is set in zjnode.state of corresponding
      znode. This is done either in ->kill_hook() of internal item or in
      kill_root() function when tree root is removed.

      At this moment znode still has:

        . locks held on it, necessarily write ones
        . references to it
        . disk block assigned to it
        . data loaded from the disk
        . pending requests for lock

      But once JNODE_HEARD_BANSHEE bit is set, last call to unlock_znode()
      does node deletion. Node deletion includes two phases. First, all ways
      to get references to that znode (sibling and parent links and hash
      lookup using block number stored in parent node) should be deleted.
      This is done through sibling_list_remove(); we also assume that nobody
      uses down link from parent node, due to its nonexistence or proper
      parent node locking, and that nobody uses parent pointers from
      children, due to absence of them. Second, we invalidate all pending
      lock requests which are still on znode's lock request queue; this is
      done by invalidate_lock(). Another znode status bit, JNODE_IS_DYING,
      is used to invalidate pending lock requests. Once it is set, all
      requesters are forced to return -EINVAL from longterm_lock_znode().
      Future locking attempts are not possible because all ways to get
      references to that znode are removed already. Last, node is
      uncaptured from transaction.

      When last reference to the dying znode is just about to be released,
      block number for this node is released and znode is removed from the
      hash table.

      Now znode can be recycled.

      [it's possible to free bitmap block and remove znode from the hash
      table when last lock is released. This will result in having
      referenced but completely orphaned znode]

      6. Limbo

      As has been mentioned above, znodes with reference counter 0 are
      still cached in a hash table. Once memory pressure increases they are
      purged out of there [this requires something like LRU list for
      efficient implementation. LRU list would also greatly simplify
      implementation of coord cache that would in this case morph to just
      scanning some initial segment of LRU list]. Data loaded into
      unreferenced znode are flushed back to the durable storage if
      necessary and memory is freed. Znodes themselves can be recycled at
      this point too.

*/
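
The counters described in the comment above can be sketched as a small user-space toy model. This is not the reiser4 code: the field names x_count, d_count and c_count come from the comment, but the toy_ prefixed functions, the flag value, and the absence of locking, hashing and I/O are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the JNODE_LOADED state bit. */
enum { TOY_LOADED = 1u << 0 };

/* Toy znode: only the three counters and a state word are modelled. */
struct toy_znode {
    struct toy_znode *parent; /* NULL plays the role of "no parent" */
    int x_count;              /* existence references: zget()/zput() */
    int d_count;              /* data references: zload()/zrelse() */
    int c_count;              /* number of child znodes still in existence */
    unsigned state;
};

/* Infancy: allocate a znode with x_count == 1 and account for it in the
 * parent's c_count. Returns NULL if allocation fails. */
static struct toy_znode *toy_znew(struct toy_znode *parent)
{
    struct toy_znode *z = calloc(1, sizeof(*z));
    if (z == NULL)
        return NULL;
    z->parent = parent;
    z->x_count = 1;
    if (parent != NULL)
        parent->c_count++;
    return z;
}

/* Maturity: every further reference bumps x_count. */
static void toy_zget(struct toy_znode *z)
{
    z->x_count++;
}

/* Dropping the last reference does not free the znode; it stays cached. */
static void toy_zput(struct toy_znode *z)
{
    assert(z->x_count > 0);
    z->x_count--;
}

/* Youth: loading data pins it with d_count and marks the znode loaded.
 * Real I/O, which may block, is omitted here. */
static void toy_zload(struct toy_znode *z)
{
    z->d_count++;
    z->state |= TOY_LOADED;
}

static void toy_zrelse(struct toy_znode *z)
{
    assert(z->d_count > 0);
    z->d_count--;
}

/* Limbo: recycle only a znode with no references and no children.
 * Returns false and leaves the znode alone if it is still in use. */
static bool toy_zdestroy(struct toy_znode *z)
{
    if (z->x_count != 0 || z->d_count != 0 || z->c_count != 0)
        return false;
    if (z->parent != NULL)
        z->parent->c_count--; /* the child has gone out of existence */
    free(z);
    return true;
}
```

In this sketch a parent whose x_count has dropped to 0 still cannot be destroyed while a child exists, which mirrors the role the comment gives to c_count. Destroying the child first decrements the parent's c_count and makes the parent recyclable.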