In Section 31.6.4, we mentioned that factoring the
_names set and its mutex into a separate class instance does not really solve the cyclic dependency problem, at least not in general. To see why, suppose that we want to extend our factory with a new
getDetails operation:
// ...
struct Details {
    PhoneEntry* proxy;
    string name;
    string phNum;
};

sequence<Details> DetailsSeq;

interface PhoneEntryFactory {
    // ...
    DetailsSeq getDetails();
};
This type of operation is common in collection managers: instead of returning a simple list of proxies,
getDetails returns a sequence of structures, each of which contains not only the object’s proxy, but also some of the state of the corresponding object. The motivation for this is performance: with a plain list of proxies, the client, once it has obtained the list, is likely to immediately follow up with one or more remote calls for each object in the list in order to retrieve their state (for example, to display the list of objects to the user). Making all these additional remote procedure calls is inefficient, and an operation such as
getDetails gets the job done with a single RPC instead.
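As an illustration of the difference, consider a client that wants to display every entry. The following client-side sketch is hypothetical (it assumes a factory proxy obtained elsewhere and the list operation from the original factory interface); with list, the client pays one remote call per entry, whereas getDetails needs just one call in total:
// With a plain proxy list: one RPC for list() plus one RPC per entry.
PhoneEntries entries = factory->list();
for (PhoneEntries::const_iterator p = entries.begin(); p != entries.end(); ++p) {
    cout << (*p)->getNumber() << endl; // one remote call for each entry
}

// With getDetails(): a single RPC returns everything needed for display.
DetailsSeq details = factory->getDetails();
for (DetailsSeq::const_iterator p = details.begin(); p != details.end(); ++p) {
    cout << p->name << ": " << p->phNum << endl; // no further remote calls
}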
To implement getDetails in the factory, we need to iterate over the set of entries and invoke the
getNumber operation on each object. (These calls are collocated and therefore very efficient, so they do not suffer the performance problem that a client calling the same operations would suffer.) However, this is potentially dangerous because the following sequence of events is possible:
•	Client A invokes getDetails. The implementation of getDetails locks _lcMutex to prevent concurrent modification of the _names set during its iteration.
•	Concurrently, client B invokes destroy on one of the entries. The implementation of destroy locks the entry's mutex _m, sets the _destroyed flag, and then calls remove, which attempts to lock _lcMutex. However, _lcMutex is already locked by getDetails, so remove blocks until _lcMutex is unlocked again.
•	getDetails, while iterating over its set of entries, happens to call getNumber on the entry that is currently being destroyed by client B. getNumber, in turn, tries to lock the entry's mutex _m, which is already locked by destroy.
At this point, the server deadlocks: getDetails holds a lock on
_lcMutex and waits for
_m to become available, and
destroy holds a lock on
_m and waits for
_lcMutex to become available, so neither thread can make progress.
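To make the scenario concrete, the dangerous getDetails implementation might look roughly like the sketch below. (This is illustrative only: it assumes the factory keeps a map from names to servants, here called _servants, alongside the _names set, and that getNumber has the usual Slice-generated signature.)
// Hypothetical sketch of a deadlock-prone getDetails.
DetailsSeq
PhoneEntryFactoryI::getDetails(const Current& c)
{
    Mutex::Lock lock(_lcMutex); // held for the entire iteration

    CommunicatorPtr comm = c.adapter->getCommunicator();
    DetailsSeq result;
    for (map<string, PhoneEntryIPtr>::const_iterator i = _servants.begin();
         i != _servants.end(); ++i) {
        Details d;
        d.proxy = PhoneEntryPrx::uncheckedCast(c.adapter->createProxy(
            comm->stringToIdentity(i->first)));
        d.name = i->first;
        d.phNum = i->second->getNumber(c); // locks the servant's _m: this is where the deadlock can occur
        result.push_back(d);
    }
    return result;
}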
For our example, it is fairly easy to avoid the deadlock: instead of holding the lock for the duration of
destroy, we set the
_destroyed flag under protection of the lock and unlock
_m again before calling
remove on the factory:
void
PhoneEntryI::destroy(const Current& c)
{
    {
        Mutex::Lock lock(_m);
        if (_destroyed)
            throw ObjectNotExistException(__FILE__, __LINE__);
        _destroyed = true;
    } // _m is unlocked here.
    _factory->remove(_name, c.adapter);
}
Now deadlock is impossible because no function holds more than one lock, and no function calls another function while it holds a lock. However, rearranging locks in this fashion can be quite difficult for complex applications. In particular, if an application uses callbacks that do complex things involving several objects, it can be next to impossible to prove that the code is free of deadlocks. The same is true for applications that use condition variables and suspend threads until a condition becomes true.
At the core of the problem is that concurrency can create circular locking dependencies: an operation on the factory (such as
getDetails) can require the same locks as a concurrent call to
destroy. This is one reason why threaded code is harder to write than sequential code—the interactions among operations require locks, but dependencies among these locks are not obvious. In effect, locks set up an entirely separate and largely invisible set of dependencies. For example, it was easy to spot the mutual dependency between the factory and the servants due to the presence of
remove; in contrast, it was much harder to spot the lurking deadlock in
destroy. Worse, deadlocks may not be found during testing and discovered only after deployment, when it is much more expensive to rectify the problem.
Instead of trying to arrange code such that it is deadlock-free in the presence of callbacks, it is often easier to change the code to avoid the callbacks entirely and to use an approach known as
reaping:
•	destroy marks the servant as destroyed and removes the ASM entry as usual, but it does not call back into the factory to update the _names set.
•	The operations on the factory that iterate over the entries (such as create, list, getDetails, and find) check each servant as they go and, if they find a destroyed servant (a zombie), remove its entry from the factory's map ("reap" it).
To implement reaping, we need to change our PhoneEntryI definition a little. It no longer has a static
_factory smart pointer back to the factory (because it no longer calls
remove). Instead, the servant now provides a member function
_isZombie that the factory calls to check whether the servant was destroyed some time in the past:
class PhoneEntryI : public PhoneEntry {
public:
    // ...
    bool _isZombie() const;

private:
    const string _name;
    string _phNum;
    bool _destroyed;
    Mutex _m;
};
The implementation of _isZombie is trivial: it returns the
_destroyed flag under protection of the lock:
bool
PhoneEntryI::_isZombie() const
{
    Mutex::Lock lock(_m);
    return _destroyed;
}
The destroy operation no longer calls back into the factory to update the
_names set; instead, it simply sets the
_destroyed flag and removes the ASM entry:
void
PhoneEntryI::destroy(const Current& c)
{
    Mutex::Lock lock(_m);
    if (_destroyed)
        throw ObjectNotExistException(__FILE__, __LINE__);
    _destroyed = true;
    c.adapter->remove(c.id);
}
On the factory side, we now keep a map from each entry's name to its servant (instead of just the _names set), so the factory can check each servant's zombie status during its own operations:
class PhoneEntryFactoryI : public PhoneEntryFactory {
public:
    // Constructor and Slice operations here...

private:
    typedef map<string, PhoneEntryIPtr> PMap;
    PMap _entries;
    Mutex _lcMutex;
};
During create (and other operations, such as
list,
getDetails, and
find), we scan for zombie servants and remove them from the
_entries map:
PhoneEntryPrx
PhoneEntryFactoryI::create(const string& name,
                           const string& phNum,
                           const Current& c)
{
    Mutex::Lock lock(_lcMutex);

    PhoneEntryPrx pe;
    PhoneEntryIPtr servant = new PhoneEntryI(name, phNum);

    // Try to create new object.
    //
    try {
        CommunicatorPtr comm = c.adapter->getCommunicator();
        pe = PhoneEntryPrx::uncheckedCast(c.adapter->add(
            servant, comm->stringToIdentity(name)));
    } catch (const Ice::AlreadyRegisteredException&) {
        throw PhoneEntryExists(name, phNum);
    }

    // Scan for zombies.
    //
    PMap::iterator i = _entries.begin();
    while (i != _entries.end()) {
        if (i->second->_isZombie())
            _entries.erase(i++);
        else
            ++i;
    }

    _entries[name] = servant;
    return pe;
}
The implementations of list,
getDetails, and
find scan for zombies as well. Because they need to iterate over the existing entries anyway, reaping incurs essentially no extra cost:
PhoneEntries
PhoneEntryFactoryI::list(const Current& c)
{
    Mutex::Lock lock(_lcMutex);

    CommunicatorPtr comm = c.adapter->getCommunicator();
    PhoneEntries pe;
    PMap::iterator i = _entries.begin();
    while (i != _entries.end()) {
        if (i->second->_isZombie()) {
            _entries.erase(i++);
        } else {
            ObjectPrx o = c.adapter->createProxy(
                comm->stringToIdentity(i->first));
            pe.push_back(PhoneEntryPrx::uncheckedCast(o));
            ++i;
        }
    }
    return pe;
}
// Similar for getDetails and find...
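The original does not show getDetails, but a rough sketch might look as follows. It reaps in the same way and calls getNumber directly on each live servant to fill in the phone number; the getNumber signature, the use of the factory's Current for the collocated call, and the handling of a concurrent destroy are assumptions here:
// Sketch only.
DetailsSeq
PhoneEntryFactoryI::getDetails(const Current& c)
{
    Mutex::Lock lock(_lcMutex);

    CommunicatorPtr comm = c.adapter->getCommunicator();
    DetailsSeq ds;
    PMap::iterator i = _entries.begin();
    while (i != _entries.end()) {
        if (i->second->_isZombie()) {
            _entries.erase(i++); // reap zombie entries as we go
        } else {
            try {
                Details d;
                d.name = i->first;
                d.phNum = i->second->getNumber(c); // direct (collocated) call; locks the servant's _m
                d.proxy = PhoneEntryPrx::uncheckedCast(c.adapter->createProxy(
                    comm->stringToIdentity(i->first)));
                ds.push_back(d);
            } catch (const ObjectNotExistException&) {
                // The entry was destroyed between the zombie check and the call; skip it.
            }
            ++i;
        }
    }
    return ds;
}
Note that this cannot deadlock: destroy no longer acquires _lcMutex, so holding _lcMutex while briefly locking a servant's _m creates no cycle.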
This is a much cleaner design: there is no cyclic dependency between the factory and the servants, either implicit (in the type system) or explicit (as a locking dependency). Moreover, the implementation is easier to understand once you get used to the idea of reaping: there is no need to follow complex callbacks and to carefully analyze the order of lock acquisition. (Note that, depending on how state is maintained for servants, you may also need to reap during start-up and shutdown.)
In general, we recommend that you use a reaping approach in preference to callbacks for all but the most trivial applications: it simply is a better approach that is easier to maintain and understand.
You may be concerned that reaping increases the cost of create from O(log n) to O(n) because
create now iterates over all existing entries and locks and unlocks a mutex in each servant (whereas, previously, it simply added each new servant to the
_names set). Often, this is not an issue because life cycle operations are called infrequently compared to normal operations. However, you will notice the additional cost if you have a large number of servants (in the thousands or more) and life cycle operations are called frequently.
If you find that create is a bottleneck (by profiling, not by guessing!), you can change to a more efficient implementation by adding zombie servants to a separate zombie list. Reaping then iterates over the zombie list and removes each servant in the zombie list from the
_entries map before clearing the zombie list. This reduces the cost of reaping to be proportional to the number of
zombie servants instead of the
total number of servants. In addition, it allows us to remove the
_isZombie member function and to lock and unlock
_lcMutex only once instead of locking a mutex in each servant as part of
_isZombie. We will see such an implementation in
Section 31.10.
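Purely to sketch the general idea (the names _zombies and _zombiesMutex below are invented, and this is not necessarily the design used in Section 31.10), the zombie list can be a class-wide container that destroy appends to, so reaping never needs to lock individual servants:
// Illustrative sketch only.
class PhoneEntryI : public PhoneEntry {
public:
    // ...
    static vector<string> _zombies; // names of destroyed servants
    static Mutex _zombiesMutex;     // protects _zombies (a "leaf" lock)
    // ... (static member definitions omitted)
};

void
PhoneEntryI::destroy(const Current& c)
{
    Mutex::Lock lock(_m);
    if (_destroyed)
        throw ObjectNotExistException(__FILE__, __LINE__);
    _destroyed = true;
    c.adapter->remove(c.id);

    // Record this servant for later reaping. _zombiesMutex is never held
    // while acquiring _m or _lcMutex, so no new locking cycle is created.
    Mutex::Lock zlock(_zombiesMutex);
    _zombies.push_back(_name);
}

// Reaping inside create (and list, and so on), with _lcMutex already held:
{
    Mutex::Lock zlock(PhoneEntryI::_zombiesMutex);
    for (vector<string>::const_iterator p = PhoneEntryI::_zombies.begin();
         p != PhoneEntryI::_zombies.end(); ++p) {
        _entries.erase(*p);
    }
    PhoneEntryI::_zombies.clear();
}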
You may also be concerned about the number of zombie servants that can accumulate in the server if
create is not called for some time. For most applications, this is not a problem: the servants occupy memory, but no other resources, because
destroy can clean up scarce resources, such as file descriptors or network connections, before it turns the servant into a zombie. If you really need to prevent accumulation of zombie servants, you can reap from a background thread that runs periodically, or you can count the number of zombies and trigger a reaping pass once that number exceeds some threshold.
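If you choose the background-thread option, the reaper can be a simple periodic thread. The following is a minimal sketch using IceUtil::Thread; the ReapThread class, the PhoneEntryFactoryIPtr typedef, the reap() operation on the factory (which must lock _lcMutex itself), and the five-second interval are all assumptions for illustration:
// Minimal sketch of a periodic reaper thread.
class ReapThread : public IceUtil::Thread {
public:
    ReapThread(const PhoneEntryFactoryIPtr& factory) // typedef assumed
        : _factory(factory), _done(false)
    {
    }

    virtual void run()
    {
        IceUtil::Monitor<IceUtil::Mutex>::Lock lock(_monitor);
        while (!_done) {
            _monitor.timedWait(IceUtil::Time::seconds(5)); // wake up periodically
            if (!_done)
                _factory->reap(); // hypothetical: removes zombie entries under _lcMutex
        }
    }

    void terminate()
    {
        IceUtil::Monitor<IceUtil::Mutex>::Lock lock(_monitor);
        _done = true;
        _monitor.notify();
    }

private:
    const PhoneEntryFactoryIPtr _factory;
    bool _done;
    IceUtil::Monitor<IceUtil::Mutex> _monitor;
};
The thread would be started during server initialization (for example, IceUtil::ThreadPtr t = new ReapThread(factory); t->start();) and terminated and joined before the factory is destroyed.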