We can add a locking mechanism (e.g., a Mutex) to the single-threaded memory pool to get a multi-threaded one, but that version took 1,400 ms to run. The question is: do we really need all of the Mutex's functionality? The answer is no. When a thread asks a Mutex to lock a section, the Mutex object checks whether that thread has already taken the lock. Likewise, when unlocking, the Mutex object checks that the caller is the thread that actually locked it. And depending on the implementation, if the Mutex is serviced in kernel mode, each call adds roughly 200 CPU cycles just to switch from user mode to kernel mode. Do we actually need all of that?
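To make the overhead concrete, here is a minimal sketch of the naive approach the paragraph describes: wrapping a single-threaded pool's free list with a `std::mutex`. The class name `LockedPool` and its layout are hypothetical illustrations, not the original implementation; the point is that every `allocate`/`deallocate` pays for the mutex's full generality.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical sketch: a fixed-size-block pool made thread-safe the
// straightforward way, by taking a full std::mutex on every operation.
// Correct, but each call pays the mutex's ownership bookkeeping and a
// possible user-to-kernel transition.
class LockedPool {
public:
    LockedPool(std::size_t blockSize, std::size_t blockCount)
        : storage(blockSize * blockCount) {
        // Pre-carve the storage into free blocks.
        for (std::size_t i = 0; i < blockCount; ++i)
            freeList.push_back(&storage[i * blockSize]);
    }

    void* allocate() {
        std::lock_guard<std::mutex> guard(lock);  // full mutex per call
        if (freeList.empty()) return nullptr;
        void* p = freeList.back();
        freeList.pop_back();
        return p;
    }

    void deallocate(void* p) {
        std::lock_guard<std::mutex> guard(lock);  // full mutex per call
        freeList.push_back(p);
    }

private:
    std::mutex lock;
    std::vector<char> storage;     // backing memory for all blocks
    std::vector<void*> freeList;   // blocks currently available
};
```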
Suppose we don’t really need all this locking flexibility: our application’s use of the locking services is so simple that we can guarantee the locking thread does not already hold the lock, and that the unlocking thread is the one that locked it in the first place. Now we can get away with a locking scheme that’s far less sophisticated than the one the Mutex object provides. What we ought to do is implement a new lock class from faster, more primitive building blocks, trading portability for speed along the way. A spin lock is one such candidate.
After reimplementing the memory pool with the spin lock, the run time dropped to 900 ms, down from 1,400 ms with the Mutex.