If maximum performance is not needed, a much simpler way of making a data transformation is to implement it in userland via the ggate (GEOM gate) facility. Unfortunately, there is no easy way to convert between, or even share code between the two approaches.
GEOM classes are transformations on the data. These transformations can be combined in a tree-like fashion. Instances of GEOM classes are called geoms.
Each GEOM class has several “class methods” that get called when there is no geom instance available (or they are simply not bound to a single instance):
.init: called when GEOM becomes aware of a GEOM class (when the kernel module gets loaded).
.fini: called when GEOM abandons the class (when the module gets unloaded).
.taste: called next, once for each provider the system has available. If applicable, this function will usually create and start a geom instance.
.destroy_geom: called when the geom should be disbanded.
.ctlreq: called when the user requests reconfiguration of an existing geom.
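The order in which the class methods fire can be illustrated with a small userland model. This is only a sketch: `struct model_class`, `struct model_provider` and `model_load_class()` are hypothetical names invented for the illustration and are not the kernel's definitions. The model assumes a provider is "ours" when its stored signature equals the class name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userland model only: mimics the shape of a class method table. */
struct model_provider {
	const char *name;
	const char *signature;	/* what is "on disk", NULL if nothing */
};

struct model_class {
	const char *name;
	void (*init)(struct model_class *);
	void (*fini)(struct model_class *);
	/* Returns 1 if a geom was created on this provider, else 0. */
	int (*taste)(struct model_class *, const struct model_provider *);
	int ngeoms;		/* stands in for the class's list of geoms */
};

static void
example_init(struct model_class *mp)
{
	mp->ngeoms = 0;
}

static int
example_taste(struct model_class *mp, const struct model_provider *pp)
{
	if (pp->signature == NULL || strcmp(pp->signature, mp->name) != 0)
		return (0);	/* not ours: leave the provider alone */
	mp->ngeoms++;		/* "create and start a geom instance" */
	return (1);
}

/* Models module load: .init first, then .taste once per provider. */
static int
model_load_class(struct model_class *mp, const struct model_provider *pps,
    size_t n)
{
	size_t i;

	assert(mp->taste != NULL);
	if (mp->init != NULL)
		mp->init(mp);
	for (i = 0; i < n; i++)
		mp->taste(mp, &pps[i]);
	return (mp->ngeoms);
}
```

Calling `model_load_class()` with three providers, only one of which carries the class's signature, leaves the class with exactly one geom.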
Also defined are the GEOM event functions, which will get copied to the geom instance.
The .geom field in the g_class structure is a LIST of geoms instantiated from the class.
The class methods listed above are called from the g_event kernel thread.
The name “softc” is a legacy term for “driver private data”. The name most probably comes from the archaic term “software control block”. In GEOM, it is a structure (more precisely, a pointer to a structure) that can be attached to a geom instance to hold whatever data is private to that instance. Most GEOM classes have the following members in their softc:
struct g_provider *provider: The “provider” this geom instantiates.
uint16_t n_disks: Number of consumers this geom consumes.
struct g_consumer **disks: Array of struct g_consumer *. (It is not possible to use just a single indirection because the struct g_consumer instances are created on our behalf by GEOM.)
The softc structure contains all the state of a geom instance. Every geom instance has its own softc.
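A softc with the members listed above might be laid out as in the following sketch. The name `example_softc` and both helper functions are hypothetical. The code is a userland model, so `calloc()` stands in for the kernel allocator and the GEOM types are only forward-declared.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Opaque here; in the kernel these are defined by the GEOM headers. */
struct g_provider;
struct g_consumer;

/* Hypothetical softc for an illustrative class. */
struct example_softc {
	struct g_provider  *provider;	/* the provider this geom exposes */
	uint16_t	    n_disks;	/* number of consumers */
	struct g_consumer **disks;	/* array of pointers to consumers */
};

static struct example_softc *
example_softc_alloc(uint16_t n_disks)
{
	struct example_softc *sc;

	assert(n_disks > 0);
	sc = calloc(1, sizeof(*sc));
	if (sc == NULL)
		return (NULL);
	/* Only the pointer array is ours; GEOM creates the consumers. */
	sc->disks = calloc(n_disks, sizeof(*sc->disks));
	if (sc->disks == NULL) {
		free(sc);
		return (NULL);
	}
	sc->n_disks = n_disks;
	return (sc);
}

static void
example_softc_free(struct example_softc *sc)
{
	free(sc->disks);	/* frees the array, not the consumers */
	free(sc);
}
```

The double indirection shows up in the allocation: the softc owns an array of `n_disks` pointers, and each slot is filled in later with a consumer that GEOM has created.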
Format of metadata is more-or-less class-dependent, but MUST start with:
16 byte buffer for null-terminated signature (usually the class name)
uint32 version ID
It is assumed that geom classes know how to handle metadata with version IDs lower than their own.
Metadata is located in the last sector of the provider (and thus must fit in it).
(All this is implementation-dependent but all existing code works like that, and it is supported by libraries.)
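The mandatory header and its placement can be sketched as follows. The structure name, the `EXAMPLE_MAGIC` signature and the helper functions are hypothetical. Storing the version in little-endian byte order is an assumption made for the illustration, since the text above does not specify a byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_MAGIC	"GEOM::EXAMPLE"	/* hypothetical signature */
#define EXAMPLE_VERSION	1u

struct example_metadata {
	char	 md_magic[16];	/* null-terminated signature */
	uint32_t md_version;	/* version ID */
	/* class-specific fields would follow here */
};

/* Byte offset of the last sector, where the metadata lives. */
static uint64_t
example_md_offset(uint64_t mediasize, uint32_t sectorsize)
{
	assert(sectorsize >= 20 && mediasize >= sectorsize);
	return (mediasize - sectorsize);
}

/* Serialize the header into a sector buffer (little-endian assumed). */
static void
example_md_encode(const struct example_metadata *md, unsigned char *sector)
{
	memcpy(sector, md->md_magic, 16);
	sector[16] = (unsigned char)(md->md_version & 0xff);
	sector[17] = (unsigned char)((md->md_version >> 8) & 0xff);
	sector[18] = (unsigned char)((md->md_version >> 16) & 0xff);
	sector[19] = (unsigned char)((md->md_version >> 24) & 0xff);
}

static void
example_md_decode(const unsigned char *sector, struct example_metadata *md)
{
	memcpy(md->md_magic, sector, 16);
	md->md_magic[15] = '\0';	/* never trust on-disk termination */
	md->md_version = (uint32_t)sector[16] |
	    ((uint32_t)sector[17] << 8) |
	    ((uint32_t)sector[18] << 16) |
	    ((uint32_t)sector[19] << 24);
}
```

Encoding field by field, rather than copying the whole structure, keeps the on-disk layout independent of compiler padding and host byte order.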
The sequence of events is:
In the case of creating/labeling a new geom, this is what happens:
geom(8) looks in the command-line argument for the command (usually label), and calls a helper function.
The helper function checks parameters and gathers metadata, which it proceeds to write to all concerned providers.
This “spoils” existing geoms (if any) and initializes a new round of “tasting” of the providers. The intended geom class recognizes the metadata and brings the geom up.
(The above sequence of events is implementation-dependent but all existing code works like that, and it is supported by libraries.)
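The recognition step during tasting can be sketched as a check on the contents of the provider's last sector. Everything here is hypothetical: the function name, the result codes, and the little-endian version field at byte 16. Rejecting metadata newer than the class's own version is an assumption; the text only states that older versions must be handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum taste_result { TASTE_NOT_OURS, TASTE_TOO_NEW, TASTE_ACCEPT };

/*
 * Decide whether the last sector of a provider carries metadata that
 * a class with signature `magic` and version `my_version` can use.
 */
static enum taste_result
example_taste_check(const unsigned char *last_sector, size_t sectorsize,
    const char *magic, uint32_t my_version)
{
	size_t len = strlen(magic);
	uint32_t v;

	assert(len < 16);	/* signature must fit with its NUL */
	if (sectorsize < 20)
		return (TASTE_NOT_OURS);
	/* Compare the signature including its terminating NUL. */
	if (memcmp(last_sector, magic, len + 1) != 0)
		return (TASTE_NOT_OURS);
	v = (uint32_t)last_sector[16] |
	    ((uint32_t)last_sector[17] << 8) |
	    ((uint32_t)last_sector[18] << 16) |
	    ((uint32_t)last_sector[19] << 24);
	if (v > my_version)
		return (TASTE_TOO_NEW);	/* assumption: refuse newer metadata */
	return (TASTE_ACCEPT);
}
```

A class would bring the geom up only on `TASTE_ACCEPT` and leave the provider untouched otherwise, so that other classes get their chance to taste it.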
The helper geom_CLASSNAME.so library exports the class_commands structure, which is an array of struct g_command elements.
Commands are of uniform format and look like:
verb [-options] geomname [other]
Common verbs are:
label — to write metadata to devices so they can be recognized at tasting and brought up in geoms
destroy — to destroy metadata, so the geoms get destroyed
Common options are:
-v: be verbose
-f: force
Many actions, such as labeling and destroying metadata, can be performed in userland. For this, struct g_command provides the gc_func field, which can be set to a function (in the same .so) that will be called to process a verb. If gc_func is NULL, the command is passed to the kernel module, to the .ctlreq function of the geom class.
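The dispatch rule for gc_func can be modelled in a few lines. This is a simplified sketch, not the real interface: `struct model_command` keeps only the two fields discussed here, the handler signature is invented for the illustration, and the `reconfigure` verb is a hypothetical example of a command left to the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct g_command. */
struct model_command {
	const char *gc_name;
	int (*gc_func)(int argc, char **argv);	/* hypothetical signature */
};

enum dispatch { DISPATCH_USERLAND, DISPATCH_KERNEL, DISPATCH_UNKNOWN };

/* Userland handler: would gather metadata and write it to providers. */
static int
example_label(int argc, char **argv)
{
	(void)argv;
	return (argc >= 2 ? 0 : -1);	/* geomname plus one provider */
}

static const struct model_command class_commands[] = {
	{ "label",	 example_label },
	{ "reconfigure", NULL },	/* NULL: handled by the kernel */
	{ NULL,		 NULL }
};

/* Find the verb and decide where it gets processed. */
static enum dispatch
model_dispatch(const struct model_command *cmds, const char *verb,
    int argc, char **argv, int *error)
{
	const struct model_command *cp;

	assert(cmds != NULL && verb != NULL && error != NULL);
	for (cp = cmds; cp->gc_name != NULL; cp++) {
		if (strcmp(cp->gc_name, verb) != 0)
			continue;
		if (cp->gc_func == NULL)
			return (DISPATCH_KERNEL);
		*error = cp->gc_func(argc, argv);
		return (DISPATCH_USERLAND);
	}
	return (DISPATCH_UNKNOWN);
}
```

With this table, `label` runs entirely in the helper library, while `reconfigure` falls through to the kernel module because its handler is NULL.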
Geoms are instances of GEOM classes. They have internal data (a softc structure) and some functions with which they respond to external events.
The event functions are:
.access: calculates permissions (read/write/exclusive)
.dumpconf: returns XML-formatted information about the geom
.orphan: called when some underlying provider gets disconnected
.spoiled: called when some underlying provider gets written to
.start: handles I/O
These functions are called from the g_down kernel thread and there can be no sleeping in this context (see the definition of sleeping elsewhere). This limits what can be done quite a bit, but forces the handling to be fast.
Of these, the most important function for doing actual useful work is the .start() function, which is called when a BIO request arrives for a provider managed by an instance of the geom class.
There are three kernel threads created and run by the GEOM framework:
g_down: Handles requests coming from high-level entities (such as a userland request) on the way to physical devices
g_up: Handles responses from device drivers to requests made by higher-level entities
g_event: Handles all other cases: creation of geom instances, access counting, “spoil” events, etc.
When a user process issues a “read data X at offset Y of a file” request, this is what happens:
The filesystem converts the request into a struct bio instance and passes it to the GEOM subsystem. It knows what geom instance should handle it because filesystems are hosted directly on a geom instance.
The request ends up as a call to the .start() function made on the g_down thread and reaches the top-level geom instance.
This top-level geom instance (for example the partition slicer) determines that the request should be routed to a lower-level instance (for example the disk driver). It makes a copy of the bio request (bio requests ALWAYS need to be copied between instances, with g_clone_bio()!), modifies the data offset and target provider fields, and executes the copy with g_io_request().
The disk driver also gets the bio request as a call to .start() on the g_down thread. It talks to the hardware, gets the data back, and calls g_io_deliver() on the bio.
Now, the notification of bio completion “bubbles up” in the g_up thread. First the partition slicer gets .done() called in the g_up thread. It uses information stored in the bio to free the cloned bio structure (with g_destroy_bio()) and calls g_io_deliver() on the original request.
The filesystem gets the data and transfers it to userland.
See the g_bio(9) man page for information on how the data is passed back and forth in the bio structure (note in particular the bio_parent and bio_children fields and how they are handled).
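The clone, translate and complete cycle for a partition slicer can be traced with a userland model. This is only a sketch: `struct model_bio` carries just a handful of fields, the function names are hypothetical, and comments mark where real code would call g_clone_bio(), g_io_request(), g_destroy_bio() and g_io_deliver().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userland model of the few bio fields involved. */
struct model_bio {
	int64_t		  bio_offset;
	int64_t		  bio_length;
	int		  bio_error;
	int		  bio_done_called;	/* model-only flag */
	struct model_bio *bio_parent;
	unsigned	  bio_children;
};

/* Models g_clone_bio(): returns NULL on allocation failure. */
static struct model_bio *
model_clone_bio(struct model_bio *bp)
{
	struct model_bio *cbp = calloc(1, sizeof(*cbp));

	if (cbp == NULL)
		return (NULL);
	cbp->bio_offset = bp->bio_offset;
	cbp->bio_length = bp->bio_length;
	cbp->bio_parent = bp;
	bp->bio_children++;
	return (cbp);
}

/* "start" of a slicer whose slice begins at byte slice_start. */
static struct model_bio *
slicer_start(struct model_bio *bp, int64_t slice_start)
{
	struct model_bio *cbp = model_clone_bio(bp);

	if (cbp == NULL) {
		bp->bio_error = ENOMEM;	/* real code: g_io_deliver() */
		bp->bio_done_called = 1;
		return (NULL);
	}
	cbp->bio_offset += slice_start;	/* translate into the lower layer */
	return (cbp);			/* real code: g_io_request() */
}

/* "done" of the slicer: pass status up, free the clone. */
static void
slicer_done(struct model_bio *cbp)
{
	struct model_bio *pbp = cbp->bio_parent;

	assert(pbp != NULL);
	if (pbp->bio_error == 0)
		pbp->bio_error = cbp->bio_error;
	free(cbp);			/* real code: g_destroy_bio() */
	pbp->bio_done_called = 1;	/* real code: g_io_deliver() */
}
```

The original request is never modified on the way down; only the clone has its offset shifted. That is why the parent can be completed unchanged once the clone comes back.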
One important feature is: THERE CAN BE NO SLEEPING IN G_UP AND G_DOWN THREADS. This means that none of the following things can be done in those threads (the list is of course not complete, but only informative):
Calls to msleep() and tsleep(), obviously.
Calls to g_write_data() and g_read_data(), because these sleep between passing the data to consumers and returning.
Waiting for I/O.
Calls to malloc(9) and uma_zalloc() with the M_WAITOK flag set.
sx and other sleepable locks.
This restriction is here to stop GEOM code from clogging the I/O request path, since sleeping is usually not time-bound and there can be no guarantee on how long it will take (there are also some other, more technical reasons). It also means that there is not much that can be done in those threads; for example, almost any complex operation requires memory allocation. Fortunately, there is a way out: creating additional kernel threads.
Kernel threads are created with the kthread_create(9) function. They are broadly similar to userland threads in behaviour, except that they cannot return to the caller to signify termination and must call kthread_exit(9) instead.
In GEOM code, the usual use of threads is to offload processing of requests from the g_down thread (the .start() function). These threads look like “event handlers”: they have a linked list of events associated with them (on which events can be posted by various functions in various threads, so it must be protected by a mutex), take the events from the list one by one, and process them in a big switch() statement.
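The event-handler pattern described above can be sketched in userland with a mutex-protected linked list and a switch. All names are hypothetical, and a pthread mutex stands in for a kernel mutex. No thread is actually created: `worker_drain()` is what the body of the worker loop would do each time it wakes up, and a real worker would sleep while the list is empty.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum ev_type { EV_IO_START, EV_IO_DONE, EV_STOP };

struct model_event {
	enum ev_type	    type;
	struct model_event *next;
};

struct model_worker {
	pthread_mutex_t	    lock;	/* protects head and tail */
	struct model_event *head, *tail;
	int started, done, stopping;	/* touched only by the worker */
};

static void
worker_init(struct model_worker *w)
{
	pthread_mutex_init(&w->lock, NULL);
	w->head = w->tail = NULL;
	w->started = w->done = w->stopping = 0;
}

/* May be called from any thread, for example from a .start() handler. */
static void
worker_post(struct model_worker *w, struct model_event *ev)
{
	assert(ev != NULL);
	ev->next = NULL;
	pthread_mutex_lock(&w->lock);
	if (w->tail != NULL)
		w->tail->next = ev;
	else
		w->head = ev;
	w->tail = ev;
	pthread_mutex_unlock(&w->lock);
}

/* Take events off the list one by one and dispatch them. */
static int
worker_drain(struct model_worker *w)
{
	struct model_event *ev;
	int handled = 0;

	for (;;) {
		pthread_mutex_lock(&w->lock);
		ev = w->head;
		if (ev != NULL) {
			w->head = ev->next;
			if (w->head == NULL)
				w->tail = NULL;
		}
		pthread_mutex_unlock(&w->lock);
		if (ev == NULL)
			break;
		/* The lock is dropped here, so handlers may block. */
		switch (ev->type) {
		case EV_IO_START: w->started++; break;
		case EV_IO_DONE:  w->done++; break;
		case EV_STOP:	  w->stopping = 1; break;
		}
		handled++;
	}
	return (handled);
}
```

The mutex is held only while the list is manipulated and is released before the event is processed, so the handler is free to do work that would not be allowed while holding it.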
The main benefit of using a thread to handle I/O requests is that it can sleep when needed. This sounds good, but should be carefully thought out. Sleeping is convenient, but it can very effectively destroy the performance of the geom transformation. Extremely performance-sensitive classes should probably do all the work in the .start() function call, taking great care to handle out-of-memory and similar errors.
The other benefit of having an event-handler thread like that is to serialize all the requests and responses coming from different geom threads into one thread. This is also very convenient but can be slow. In most cases, handling of .done() requests can be left to the g_up thread.
Mutexes in the FreeBSD kernel (see mutex(9)) differ from their more common userland cousins in one respect: code cannot sleep while holding a mutex. If the code needs to sleep a lot, sx(9) locks may be more appropriate. On the other hand, if you do almost everything in a single thread, you may get away with no mutexes at all.