Arguments and Smart Pointers
For efficiency reasons, C++ has a myriad of ways to pass data around. A function can take arguments in several forms:
void foo(T);
void foo(T*);
void foo(T&);
void foo(T**);
void foo(T&&);
Not to mention adding const and smart pointers into the mix.
And speaking of pointers, we have raw pointers, unique_ptr, shared_ptr,
CComPtr/ComPtr (for Windows COM objects), and, if you are working in
older codebases, auto_ptr, and maybe even some homebrewed refcounted
pointers.
All of this might seem a bit daunting and make C++ seem more complicated
than it really is. One way to think about it is from the ownership
perspective: objects are resources, and the key question is who owns
this resource. The answer should dictate both how a particular resource is
allocated (and released) and the shape function arguments should take.
Lifetime is also an important consideration - a resource shouldn't be
released while other components expect it to be available, but it should
be released as soon as it is no longer needed.
In this post I will try to cover the various ways in which resources can be allocated, owned, and passed around.
Stack Objects and Passing Arguments
The simplest, clearest thing to do is allocate objects on the stack. A stack object doesn't involve any pointers:
void foo()
{
    Bar bar;
}
The variable bar of type Bar is created on the stack. Once the stack
frame is popped (once the function is done executing, either through a
normal return or due to an exception) the object goes away. This is the
easiest, safest thing to do. The only reasons we wouldn't always do
this are time and space requirements: lifetime-wise, we might want to
somehow use bar after foo() returns - for example, we might want to
pass it to some other object that wants to use it at a later time;
in terms of space, stack memory is more limited than the heap, so large
objects are better kept on the heap to avoid overflow.
Pass by value
One way to get around the lifetime requirement is to pass the object by value:
void do_stuff(Bar bar); // hands bar off to some other object
void foo()
{
    Bar bar;
    do_stuff(bar);
}
Let's assume for this example that do_stuff takes the argument and
sticks it into some global object which will use it at some future time.
The above code will simply create a copy of the object, so whatever
do_stuff gets won't be the original resource, which gets freed once
foo returns, but rather a copy of it. Copying an object costs both run
time and space, but if neither is a big concern, this is a great, safe
way of ensuring resources don't get released before we're done with
them.
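A minimal sketch of the copy behavior described above. The Bar member, the g_stored global (standing in for "some global object") and the int return value are assumptions I made up for illustration:

```cpp
#include <cassert>

// Hypothetical resource type, used only for illustration.
struct Bar {
    int value = 0;
};

// Hypothetical global that outlives any single function call.
Bar g_stored;

// Takes its argument by value: the parameter is a copy of the caller's object.
void do_stuff(Bar bar) {
    bar.value += 1;  // modifies the copy only
    g_stored = bar;  // keeps the copy around after the call returns
}

// Returns the caller's value after the call, to show it is unchanged.
int foo() {
    Bar bar;
    bar.value = 41;
    do_stuff(bar);
    assert(g_stored.value == 42);  // the stored copy was modified
    return bar.value;              // the original was not
}
```

The object stored in g_stored is independent of the bar local to foo, so it stays valid after foo returns.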
Move
C++11 introduces a cheaper way of achieving this, through move semantics:
void do_stuff(Bar&& bar); // hands bar off to some other object
void foo()
{
    Bar bar;
    do_stuff(std::move(bar));
}
With move semantics, the resource is moved into the do_stuff function
rather than copied. Once this happens, the original object is left in a
valid but unspecified state and shouldn't be used anymore (other than to
assign to it or destroy it). This approach is usually employed when we
have a sink argument - we sink bar to its final resting place somewhere
and foo() no longer cares about it after passing it down to do_stuff.
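One thing to keep in mind is that std::move is not magic: it only casts
its argument to an rvalue, so Bar needs a move constructor (declared
explicitly or generated implicitly by the compiler) in order for this to
do what we expect it to do. If Bar doesn't have a move constructor, for
example because it declares a copy constructor or a destructor but no
move operations, the above becomes a simple copy.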
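Here is a small sketch of a sink that actually performs the move. The Bar layout, the g_sink vector and the bool return value are hypothetical; I also make the move constructor clear the source explicitly so the moved-from state is well defined for the purpose of the demo:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type with a user-declared move constructor.
struct Bar {
    std::string data;
    explicit Bar(std::string d) : data(std::move(d)) {}
    Bar(const Bar&) = default;
    // Steals the string and leaves the source explicitly empty.
    Bar(Bar&& other) noexcept : data(std::move(other.data)) { other.data.clear(); }
};

// Hypothetical "final resting place" for sunk objects.
std::vector<Bar> g_sink;

void do_stuff(Bar&& bar) {
    // std::move is only a cast; the move constructor runs inside push_back.
    g_sink.push_back(std::move(bar));
}

// Returns true if the local object was emptied by the move.
bool foo() {
    Bar bar("payload");
    do_stuff(std::move(bar));
    return bar.data.empty();
}
```

If the move constructor were removed from this Bar, the same call would still compile, but push_back would fall back to the copy constructor and bar.data would keep its contents.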
Pass by reference
On the flip side, when we care about the size, so that we don't want to
create a copy of the object, but we aren't worried about the lifetime
(meaning the object we pass to do_stuff won't have to outlive the
function call), we can pass by reference:
void do_stuff(const Bar& bar); // bar is only used within do_stuff
void foo()
{
    Bar bar;
    do_stuff(bar);
}
Note the const above - this means do_stuff will use bar but won't
modify it. By default, arguments should be marked as const unless the
function does indeed need to alter the object. Regardless of constness,
in this case we pass a reference to bar as an argument, which is very
cheap (a reference is typically implemented as a pointer under the
hood). The only caveat is that do_stuff should not pass the reference to
some other object that outlives the function call (e.g. a global object
which tries to use it later), because as soon as foo returns, the
reference becomes invalid.
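To make the const and non-const cases concrete, here is a small sketch. The function names read_only and increment and the Bar member are made up for this example:

```cpp
#include <cassert>

// Hypothetical resource type.
struct Bar {
    int value = 0;
};

// const&: can read the caller's object but cannot modify it.
int read_only(const Bar& bar) {
    return bar.value * 2;
}

// Non-const &: modifies the caller's object in place, no copy involved.
void increment(Bar& bar) {
    ++bar.value;
}

int foo() {
    Bar bar;
    bar.value = 10;
    increment(bar);         // bar.value is now 11
    return read_only(bar);  // 22
}
```

Neither function keeps the reference around after it returns, which is what makes passing a stack object safe here.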
Pass by pointer
A pointer argument would look like this:
void do_stuff(const Bar* bar);
void foo()
{
    Bar bar;
    do_stuff(&bar);
}
A good rule of thumb is to not do this. The difference between passing by reference and by pointer in this case is that a pointer can be null, while a reference can't. So passing by pointer here automatically brings the need to perform null checks to ensure bad things don't happen. You would need to make a very good argument to convince me during code review that using a pointer instead of a reference is appropriate. Unless working against a legacy API which can't be changed, I highly discourage use of raw pointers.
Summary
In summary, when designing an API:
- Take argument by value if copying it is not a concern
- Take argument by const& if it's not a sink argument, meaning we don't need to refer to it past the function call
- Take argument by reference (&) if it's not a sink argument but the API needs to modify it
- Take argument by && if it's a sink argument, the type has a move constructor, and copying it is expensive
- Don't pass raw pointers around
Heap Objects and Smart Pointers
In all of the examples above, bar was an object created on the stack.
This works great in some cases, but some objects are simply too big to
fit on the stack, or it doesn't make sense for them to do so (if, for
example, we want to vary their size at runtime). In this case, we
allocate the object on the heap and keep a pointer to it.
Once we start working with heap objects, ownership becomes even more important: unlike stack objects, which get automatically destroyed when their stack frame gets popped, heap objects need to be explicitly deleted. This responsibility should be with the owner of the object.
Unique pointer
A unique pointer (std::unique_ptr) is a wrapper around a raw pointer
which will automatically delete the heap object when it goes out of
scope itself:
void foo()
{
    auto ptrBar = std::make_unique<Bar>();
} // ptrBar goes out of scope => heap object gets deleted
The above call to make_unique allocates an instance of Bar on the heap
and wraps the pointer to it into the unique pointer ptrBar. Now ptrBar
owns the object and as soon as ptrBar goes out of scope, the heap object
is also deleted.
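One way to watch this happen is to count live instances. In the sketch below, the g_live counter and the counting constructor/destructor are assumptions added purely for illustration:

```cpp
#include <cassert>
#include <memory>

// Hypothetical counter of currently alive Bar instances.
int g_live = 0;

struct Bar {
    Bar() { ++g_live; }
    ~Bar() { --g_live; }
};

// Returns the number of live Bar objects while ptrBar is still in scope.
int foo() {
    auto ptrBar = std::make_unique<Bar>();
    return g_live;  // the return value is computed before ptrBar is destroyed
}  // ptrBar goes out of scope here => ~Bar() runs, g_live drops back to 0
```

After foo returns, g_live is back to zero without any explicit delete in the code.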
Unique pointers cannot be copied, so we can never accidentally have more
than one single unique_ptr pointing to the same heap object:
auto ptrBar = std::make_unique<Bar>();
...
std::unique_ptr<Bar> ptrBar2 = ptrBar; // Won't compile
Of course, if we really want to, we can get the raw pointer out of
ptrBar using get() and we can initialize a unique_ptr from a raw
pointer -
// Please don't do this
std::unique_ptr<Bar> ptrBar2(ptrBar.get());
but this is very bad - now both pointers think they have sole ownership
of the resource, and as soon as one goes out of scope, using the other
one leads to undefined behavior (and the object eventually gets deleted
twice). In general, the same way there are very few good reasons to use
raw pointers, there are very few good reasons to call get() on a smart
pointer.
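If the goal is to hand the object over to another unique_ptr, the supported way is to move it. A minimal sketch, with a made-up Bar member and function name:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical resource type.
struct Bar {
    int value = 7;
};

// Returns true if ownership ended up solely in the second pointer.
bool transfer() {
    auto ptrBar = std::make_unique<Bar>();
    // Ownership moves to ptrBar2; a moved-from unique_ptr is guaranteed to be null.
    std::unique_ptr<Bar> ptrBar2 = std::move(ptrBar);
    return ptrBar == nullptr && ptrBar2 != nullptr && ptrBar2->value == 7;
}
```

At every point there is exactly one owner, so the object is deleted exactly once.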
Shared pointer
Sometimes, we do need to have several pointers pointing to the same heap object. In this case, we can use shared pointers. Shared pointers pointing to the same heap object keep a common reference count. Whenever a new shared pointer is created for that particular heap object, the reference count is incremented. Whenever a shared pointer for that heap object goes out of scope, the reference count is decremented. Once the last shared pointer goes out of scope, the heap object is deleted.
void foo()
{
    auto ptrBar1 = std::make_shared<Bar>();
    // one pointer to a Bar object on the heap (ref count = 1)
    {
        auto ptrBar2(ptrBar1);
        // second shared pointer (ref count = 2)
    }
    // ptrBar2 goes out of scope (ref count = 1)
}
// ptrBar1 goes out of scope (ref count = 0) => heap object is deleted
Shared pointers incur a bit more overhead than unique pointers - reference counting needs to be atomic to account for multi-threaded environments, which comes with a runtime cost. The reference count itself also needs to be stored somewhere, which is a small space cost. Unique pointers don't have these time and space costs since they don't need to count references - there is always only one pointer to the object.
Costs aside, shared pointers also don't make the ownership clear -
there are several instances owning the heap resource at the same time,
which can potentially alter it and step on each other's toes. In
general, prefer unique pointers to shared pointers whenever possible.
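The reference counting described in this section can be observed with use_count(). This is a single-threaded sketch; the function name and the empty Bar are placeholders:

```cpp
#include <cassert>
#include <memory>

// Hypothetical resource type.
struct Bar {};

// Returns the reference count observed while two shared_ptrs are alive.
long count_inside_scope() {
    auto ptrBar1 = std::make_shared<Bar>();
    assert(ptrBar1.use_count() == 1);
    long inner = 0;
    {
        auto ptrBar2 = ptrBar1;       // copy => count goes up
        inner = ptrBar1.use_count();  // 2 while ptrBar2 is alive
    }
    assert(ptrBar1.use_count() == 1); // ptrBar2 is gone => count goes down
    return inner;
}
```

use_count() is handy for demos and debugging, but in multi-threaded code the value can be stale by the time it is read, so I wouldn't base program logic on it.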
Raw pointer
Avoid using raw pointers. Raw pointers don't express ownership, so they don't offer the same guarantees that a) the resource pointed to gets properly cleaned up and b) the resource pointed to is still valid at a given time. This leads to dereferencing invalid memory and double-deletes (trying to free the same heap object multiple times), which means undefined behavior. Also, don't mix smart and raw pointers - the smart pointers will keep doing their job happily, with the potential of making the raw pointers invalid.
COM pointers
On Windows, COM uses a different reference counting mechanism: the base
IUnknown interface declares AddRef and Release methods, which
implementations are expected to use to keep track of the reference
count. CComPtr (in ATL) and ComPtr (in WRL) are the COM smart pointers.
They call AddRef and Release on the owned object, and the owned object
is supposed to delete itself once its reference count drops to 0. Note
that COM uses a slightly different mechanism than the standard library
shared pointers - instead of the smart pointer keeping track of the
reference count in the control block and deleting the object once the
last reference goes away, COM objects are expected to keep track of
their reference count themselves through the AddRef and Release methods
and self-delete when the last reference goes away (through a Release
call). The COM smart pointers only need to call Release when they go out
of scope.
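To illustrate the difference without pulling in Windows headers, here is a portable sketch of the intrusive counting idea. This is not real COM: Counted and CountedPtr are hypothetical stand-ins for an IUnknown implementation and for CComPtr/ComPtr, the counter is not thread-safe, and unlike typical COM objects the count starts at 0 and the first smart pointer adds the first reference:

```cpp
#include <cassert>

// NOT real COM: a hypothetical object that counts its own references.
class Counted {
public:
    static int live;  // number of Counted objects currently alive
    Counted() { ++live; }
    void AddRef() { ++refs_; }
    void Release() {
        if (--refs_ == 0) delete this;  // the object deletes itself
    }
private:
    ~Counted() { --live; }  // private: destruction only via Release()
    int refs_ = 0;
};
int Counted::live = 0;

// Hypothetical minimal smart pointer: it only forwards to AddRef/Release.
class CountedPtr {
public:
    explicit CountedPtr(Counted* p) : p_(p) { if (p_) p_->AddRef(); }
    CountedPtr(const CountedPtr& other) : p_(other.p_) { if (p_) p_->AddRef(); }
    CountedPtr& operator=(const CountedPtr&) = delete;
    ~CountedPtr() { if (p_) p_->Release(); }
private:
    Counted* p_;
};

// Returns the number of live objects while one smart pointer remains.
int demo() {
    CountedPtr a(new Counted());
    {
        CountedPtr b(a);  // second reference to the same object
    }                     // b releases; the object survives because a still holds it
    return Counted::live;
}
```

The count lives inside the object rather than in a separate control block, which is why wrapping the same object in a shared_ptr as well would give it two unrelated notions of "last reference".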
It's not a good idea to have both standard library and COM pointers
point to the same object, as each might decide to delete the object at
different times - shared_ptr looks at the shared_ptr refcount while COM
objects look at their internal reference count. So a shared_ptr might
decide to delete an object while a ComPtr still expects it to be valid
or vice-versa. In general, when working with COM objects, use COM smart
pointers.
auto_ptr
auto_ptr is a deprecated smart pointer: it was deprecated in C++11 and
removed from the standard library in C++17. Unless working with an old
compiler and standard library, use unique_ptr or shared_ptr instead.
Other smart pointers
Old code bases might have custom smart pointer implementations, for the simple fact that automatic memory management is always a good idea, and there is C++ code that predates the introduction of smart pointers into the standard library. When interoperating with legacy code, use whatever works, but when writing new code, do prefer standard library smart pointers to homebrewed ones.
Summary
In summary, when creating objects:
- Create them on the stack if feasible (note that standard library types like std::vector and std::string internally keep their data on the heap, but they fit perfectly well on the stack, so you don't need to create an std::vector on the heap just because you are planning to store a lot of elements in it - the vector manages a heap array internally already).
- Use a unique_ptr when creating them on the heap, to make ownership obvious.
- Use a shared_ptr only when unique_ptr isn't sufficient (review your design first, might be a design issue).
- Use COM smart pointers like CComPtr when dealing with COM.
- Don't use auto_ptr or other old constructs unless working with legacy code/compiler.
- Don't use raw pointers.
Passing Smart Pointers as Arguments
We covered passing arguments and smart pointers. Now combining the two, how do we pass heap objects as arguments? Turns out Herb Sutter has a great post on this exact topic on his blog. I can't hope to explain better than him, so go read his post. I will try to summarize:
Pass by reference the pointed-to type
Rather than forcing callers to use unique_ptr or shared_ptr by
specifying the smart pointer type (which makes assumptions about
ownership), just ask for a reference to the pointed-to type:
void do_stuff(const Bar& bar);
void foo()
{
    auto ptrBar = std::make_unique<Bar>();
    do_stuff(*ptrBar);
}
Herb also mentions raw pointer to the underlying type if the argument can be null, but as I mentioned above, I'd rather stick to references and discourage use of raw pointers as a general rule of thumb.
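A quick sketch of why this is the most flexible signature: the same function works regardless of how the caller owns the object. The Bar member and the int return value are made up for the example:

```cpp
#include <cassert>
#include <memory>

// Hypothetical resource type.
struct Bar {
    int value = 3;
};

// Only cares about the object, not about who owns it.
int do_stuff(const Bar& bar) {
    return bar.value;
}

int foo() {
    Bar onStack;
    auto unique = std::make_unique<Bar>();
    auto shared = std::make_shared<Bar>();
    // One signature serves stack objects, unique_ptr and shared_ptr callers.
    return do_stuff(onStack) + do_stuff(*unique) + do_stuff(*shared);
}
```

Had do_stuff asked for a shared_ptr<Bar>, the first two callers would have been forced to change how they allocate.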
Pass smart pointer by value
Passing a unique_ptr by value implies a sink argument - since a
unique_ptr cannot be copied, it has to be std::move'd in.
Interestingly, Scott Meyers has a post on his blog where he disagrees
with this and argues that arguments of move-only types should be
specified as &&:
void do_stuff(std::unique_ptr<Bar>&& ptrBar); // sink
void foo()
{
    auto ptrBar = std::make_unique<Bar>();
    do_stuff(std::move(ptrBar));
}
Passing a shared_ptr by value implies the function wants to partake in
the ownership - in other words, it will keep a reference to the object
somewhere after the function returns, but unlike the above unique_ptr
example, it won't have exclusive ownership of the resource:
void do_stuff(std::shared_ptr<Bar> ptrBar);
void foo()
{
    auto ptrBar = std::make_shared<Bar>();
    do_stuff(ptrBar); // copy-constructs another shared_ptr which shares ownership of the heap object
}
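A sketch of what "partaking in ownership" can look like on the callee side. The g_keeper global is a hypothetical place where do_stuff stashes its share, and the function returns the reference count only so the effect is visible:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical resource type.
struct Bar {};

// Hypothetical long-lived owner that keeps the object alive after the call.
std::shared_ptr<Bar> g_keeper;

// By-value parameter: the caller's copy bumps the count, and we move it into place.
void do_stuff(std::shared_ptr<Bar> ptrBar) {
    g_keeper = std::move(ptrBar);
}

// Returns the reference count seen by the caller after sharing ownership.
long foo() {
    auto ptrBar = std::make_shared<Bar>();
    do_stuff(ptrBar);
    return ptrBar.use_count();  // 2: ptrBar and g_keeper
}
```

Once foo returns, g_keeper is the only remaining owner and the object stays alive.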
Pass smart pointer by reference
Only expect a smart pointer by non-const reference if the function is going to modify the smart pointer itself (e.g. by making it point to a different object). In my experience, this is a rare occurrence.
// Implies this function modifies the pointer itself.
void do_stuff(std::shared_ptr<Bar>& ptrBar);
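For example, a function that reseats the caller's pointer might look like the following sketch (the Bar constructor and the values 1 and 2 are arbitrary):

```cpp
#include <cassert>
#include <memory>

// Hypothetical resource type.
struct Bar {
    int value;
    explicit Bar(int v) : value(v) {}
};

// Non-const reference: replaces the object the caller's pointer refers to.
void do_stuff(std::shared_ptr<Bar>& ptrBar) {
    ptrBar = std::make_shared<Bar>(2);
}

int foo() {
    auto ptrBar = std::make_shared<Bar>(1);
    do_stuff(ptrBar);      // ptrBar now points to a different Bar
    return ptrBar->value;  // 2; the original Bar(1) has been released
}
```

The signature alone tells the caller that the pointer they pass in may not point to the same object afterwards.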
There is no good reason to expect a const& to a unique_ptr, just
reference the underlying type:
// void do_stuff(const std::unique_ptr<Bar>& ptrBar);
// No reason to use the above as opposed to
void do_stuff(const Bar& bar);
Expect const& to shared_ptr only if the function might create a copy of
the smart pointer. If the function would never create a copy of the
pointer, simply use & to underlying type. If the function would always
copy the pointer, expect shared_ptr by value.
// Might or might not share ownership
void do_stuff(const std::shared_ptr<Bar>& ptrBar);
// Will never share ownership
void do_stuff(const Bar& bar);
// Will always share ownership
void do_stuff(std::shared_ptr<Bar> ptrBar);
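The "might or might not" case can be sketched like this. The keep flag and the g_cache global are invented for the example; in real code the decision would depend on whatever the function is doing:

```cpp
#include <cassert>
#include <memory>

// Hypothetical resource type.
struct Bar {};

// Hypothetical cache that sometimes takes a share of ownership.
std::shared_ptr<Bar> g_cache;

// const&: no reference count change unless the function decides to copy.
void do_stuff(const std::shared_ptr<Bar>& ptrBar, bool keep) {
    if (keep) {
        g_cache = ptrBar;  // copies the pointer only on this path
    }
}

// Returns the caller-side reference count after the call.
long count_after(bool keep) {
    g_cache.reset();
    auto ptrBar = std::make_shared<Bar>();
    do_stuff(ptrBar, keep);
    return ptrBar.use_count();
}
```

When keep is false, the call costs no atomic increment at all, which is the reason to prefer const& over by-value in this situation.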
Summary
A summary of the summary:
- Take argument by & to underlying type if you only care about the heap object, not about the pointer.
- Take unique_ptr by && to transfer ownership.
- Take shared_ptr argument by value to partake in ownership.
- Take smart pointer by (non-const) reference only if you are going to modify the smart pointer itself.
  - No need for const& to unique_ptr (just take & to underlying type)
  - Take const& to shared_ptr only if unknown whether function wants ownership (take by & to underlying type if function never wants ownership, shared_ptr by value if function always wants ownership).