
Unions are almost always a bad idea.


Using a classic C union is almost always a bad idea in C++; you should use std::variant instead. I will show you why. I will also show you the big(!) trap I ran into myself with a union.

We start with a simple union for illustration:

union Num {    // this union can either be an int or a float 
    int    i;
    float  f;
};

constexpr auto s1 = sizeof( Num );  // size of Num is typically 4 bytes (a 32-bit int and a 32-bit float share the same storage).

The good thing about this union is that it uses only 32 bits (4 bytes) of memory and can hold either an int or a float.
So, why should I pay the overhead which comes along with a std::variant?
And… what even is that overhead?

Well, when using a bare metal union, there are very important rules that must not be violated, or you are in the land of UB (undefined behavior), where nothing about your program's behavior is guaranteed anymore.

One important rule for unions is that at most one of the members can be active at any given time.

The union Num above can be either an int or a float, but never both!

Num num;      // no member active yet.
num.i = 123;  // now only the int i is active and can be used.

int x = num.i + 7;               // OK, using active i
std::cout << num.i << std::endl; // OK as well.

// Cannot use f because i is active!
float fi = num.f; // DON’T DO THIS!!! This is undefined behavior (UB)!

// switch the active member.
num.f = 2.1f;  // now the float f is the active member and i not anymore.

// now you can do this:
float f2 = num.f;    // OK, f is the active member.

//BUT:
int i2 = num.i;  // DON’T DO THIS!!! This is now undefined behavior (UB)! 

You switch the active member of a union by assigning to it. After that, only this member may be read. Reading any other member is undefined behavior!

Because of that, you almost always need a second variable that remembers which member is active. Usually an enum is used for this.

enum eNum   // enum to remember whether the union holds an int or a float
{
    Int,
    Float
};

Num num;
num.i = 123; // i is active.

eNum e{ Int }; // enum must be set separately to memorize that the int is active.

// dispatch the active member based on the enum
switch( e ) {
case Int: // the int is active.
    std::cout << num.i;
    break;
case Float: // the float is active.
    std::cout << num.f;
    break;
default:
    std::terminate(); // should not happen.
}

There are several possible traps when doing this.
The most dangerous one is that the value of the enum must be maintained by hand.
It is easy to forget to update it, or it could be set to the wrong value.
Also, the code that uses the enum together with the union can still be buggy (typos, etc.).
Nothing even enforces that the enum is checked when reading from or writing to the union.

Thus, you can still easily land in the world of undefined behavior.

A small improvement is to bundle the union and the enum together in one struct (a so-called tagged union).

struct Number {  // this is a tagged union
    eNum tag;
    Num  val;
};

Number  num;
num.tag   = Int;
num.val.i = 123;

switch( num.tag ) {
case Int:
    std::cout << num.val.i;
    break;
/* ...  */
}

But the same set of problems is still present.
Nothing enforces that the tag is used at all, or that it is used correctly.
It must be maintained by hand at every place where the union is modified.
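To see what "maintained by hand" means in practice, here is a minimal sketch that hides the tag behind member functions so it can no longer be forgotten. The class TaggedNum and its members are hypothetical names; note how much bookkeeping is needed even for two trivial types. This is roughly the work std::variant does for you.

```cpp
#include <stdexcept>

// Hypothetical hand-written tagged union: the tag is private and every
// setter updates it, so tag and active member cannot drift apart.
class TaggedNum {
public:
    enum class Kind { Int, Float };

    explicit TaggedNum(int i) noexcept : mKind(Kind::Int) { mVal.i = i; }
    explicit TaggedNum(float f) noexcept : mKind(Kind::Float) { mVal.f = f; }

    Kind kind() const noexcept { return mKind; }

    // Switching the active member and updating the tag always happen together.
    void set(int i) noexcept   { mVal.i = i; mKind = Kind::Int; }
    void set(float f) noexcept { mVal.f = f; mKind = Kind::Float; }

    // Every read checks the tag first instead of risking undefined behavior.
    int as_int() const
    {
        if (mKind != Kind::Int) throw std::logic_error("int is not the active member");
        return mVal.i;
    }
    float as_float() const
    {
        if (mKind != Kind::Float) throw std::logic_error("float is not the active member");
        return mVal.f;
    }

private:
    union { int i; float f; } mVal;  // the bare union, only reachable through the checks above
    Kind mKind;                      // the tag
};
```

Usage: TaggedNum n(123); n.set(2.5f); n.as_float(); a call to n.as_int() now throws instead of reading an inactive member.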

If you use a std::variant instead, it does all of that for you automatically!
Interestingly, when you compare the sizes of std::variant and the tagged union, you will see that they are the same on common implementations. A std::variant is essentially a smart tagged union that does the bookkeeping for you.

// The tagged union Number from above as std::variant.
std::variant< int, float > varnum{ 123 };

constexpr auto s1 = sizeof( Num );     // bare metal union: 4 bytes (typical)
constexpr auto s2 = sizeof( Number );  // tagged union    : 8 bytes (typical)
constexpr auto s3 = sizeof( varnum );  // std::variant    : 8 bytes (typical)

// true on common implementations (not guaranteed by the standard):
// std::variant and the tagged union have the same size.
static_assert( sizeof( Number ) == sizeof( varnum ) );

Because you almost always need an extra variable to track which member is active anyway, you can just as well use a std::variant instead.
With it, the tag is maintained automatically and is guaranteed to match the active member.
Also, accessing the wrong (inactive) member is detected (std::get throws std::bad_variant_access) instead of silently causing undefined behavior.
The std::variant does all of this for you.

// The tagged union Number from above as std::variant.
std::variant< int, float > num{ 123 };

switch( num.index() ) {
case 0: // the int is active.
    std::cout << std::get<0>( num ); // will throw if int is not the active member.
    break;
case 1: // the float is active.
    std::cout << std::get<1>( num ); // will throw if float is not the active member.
    break;
case std::variant_npos:
default:
    std::terminate(); // should not happen.
}

// there are better alternatives than switching over the index,
// but I just wanted to modify the original example with a minimal change.
// see std::visit and search for the "Overload Pattern" for a better way.
// (or read here: https://www.cppstories.com/2019/02/2lines3featuresoverload.html/ )

num = 2.1f; // now float is active (a plain 2.1 is a double, which matches neither alternative and would not compile).
num.index();  // returns 1 now.
std::cout << std::get<float>( num );  // works, prints 2.1
std::cout << std::get<int>( num );    // will throw std::bad_variant_access
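The comments above point to std::visit and the "overload pattern". A minimal sketch of that approach for std::variant<int, float>, assuming C++17 or later. The helper template overloaded is the conventional name used in the linked article and on cppreference, not a standard library facility; describe is a hypothetical function:

```cpp
#include <string>
#include <variant>

// Overload pattern: a struct that inherits the call operators of all given lambdas.
template <class... Ts>
struct overloaded : Ts... { using Ts::operator()...; };
// Deduction guide: required in C++17, no longer needed in C++20.
template <class... Ts>
overloaded(Ts...) -> overloaded<Ts...>;

// Hypothetical helper: std::visit calls the lambda matching the active alternative,
// so there is no index to switch over and no std::get that could throw.
std::string describe(std::variant<int, float> const& v)
{
    return std::visit(overloaded{
        [](int i)   { return "int: " + std::to_string(i); },
        [](float f) { return "float: " + std::to_string(f); }
    }, v);
}
```

For example, describe(123) yields "int: 123" and describe(2.5f) yields "float: 2.500000" (std::to_string formats floating-point values like "%f").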

As you can see in the example, std::variant is safer and also easier to use than a union + enum.
You don’t pay more than for a tagged union, and you almost always need to know which member is active anyway.
In fact, with std::variant you pay even less, because you save a lot of code that you would otherwise have to write yourself.
Depending on the usage, you might pay for one extra check inside the std::variant that verifies that the correct alternative is being accessed. But hey, the alternative would be to enter the world of undefined behavior! So this possible extra check can easily be neglected.
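If you would rather make that check explicit and avoid exceptions, std::get_if and std::holds_alternative let you query the variant first. A small sketch, assuming C++17; the function names int_or and holds_float are hypothetical:

```cpp
#include <variant>

// Hypothetical helper: returns the int if it is the active alternative, otherwise fallback.
// std::get_if returns nullptr instead of throwing when the requested alternative is not active.
int int_or(std::variant<int, float> const& v, int fallback) noexcept
{
    if (int const* p = std::get_if<int>(&v)) {
        return *p;
    }
    return fallback;
}

// std::holds_alternative only answers "is float the active one?" without touching the value.
bool holds_float(std::variant<int, float> const& v) noexcept
{
    return std::holds_alternative<float>(v);
}
```

Either way the check is done by the variant itself, so it cannot drift out of sync with the stored value the way a hand-maintained enum can.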

So, is this the complete story already?
No!

There is one more important thing to know about bare metal unions. It is the big(!) trap I recently ran into myself.

I will demonstrate this big trap with the next example.

So, assume you have objects of some class Foo, some of which are allocated statically and some dynamically. The statically allocated ones are destroyed automatically at program end. The dynamically allocated ones are managed via a smart pointer.

class Foo {
    std::string const mName;
public:
    explicit Foo( std::string const &rName )
        : mName( rName )
    {}
};

// some statically allocated Foos, will be cleaned up automatically at program end.
static Foo const foo_abc( "abc" );
static Foo const foo_xyz( "xyz" );

Then I had another class Bar which needs an instance of Foo. It gets either a pointer to one of the statically allocated Foos or a new, dynamically allocated instance. My idea was to create a union that can handle both cases. Because the active member of the union never changes after it is initially created, and I know from elsewhere which member is active, I thought I could save myself the std::variant in this case.

I came up with code similar to this simplified example:

class Bar
{
    union FooPtr       // union can either be a raw pointer to Foo or a std::unique_ptr
    {
        Foo const *ptr;
        std::unique_ptr<Foo const> uptr;
    };
    FooPtr  mFooPtr;
public:
    Bar( Foo const *raw_foo )    // this constructor uses a raw pointer without automatic deletion (b/c it is statically allocated we must not delete it here!)
        : mFooPtr( raw_foo )
    { }

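    Bar( std::unique_ptr<Foo const> smart_foo ) // this constructor uses a smart pointer.
        : mFooPtr( nullptr )
    {
        // uptr is not alive yet, so construct it in place (placement new, needs <new>);
        // assigning to a not-yet-constructed std::unique_ptr would be undefined behavior.
        new ( &mFooPtr.uptr ) std::unique_ptr<Foo const>( std::move( smart_foo ) );
    }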
};

// with that you can use it like this:
Bar bar1( &foo_abc );                        // use statically allocated Foo.
Bar bar2( std::make_unique<Foo>( "uvw" ) );  // use dynamically allocated Foo.

When you try to compile this, you will notice that it does not compile!
With Visual Studio you will get this message: warning C4624: 'Bar::FooPtr': destructor was implicitly defined as deleted

And then:
error C2280: 'Bar::FooPtr::~FooPtr(void)': attempting to reference a deleted function

What the heck?

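Why is the destructor deleted? What is going on there? The reason: std::unique_ptr has a non-trivial destructor, and if any member of a union has a non-trivial destructor, the union's implicitly declared destructor is defined as deleted. The compiler cannot know which member is active, so it cannot know which destructor to call.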
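You can check this rule directly with type traits. A minimal sketch, assuming C++17; the union names are hypothetical and mirror FooPtr, with int as the pointee type to keep it self-contained:

```cpp
#include <memory>
#include <type_traits>

// Same shape as FooPtr, without a user-provided destructor.
union RawOrUnique {
    int const* ptr;
    std::unique_ptr<int const> uptr;
};

// Same union with an empty user-provided destructor, such as ~FooPtr(){}.
union RawOrUniqueWithDtor {
    int const* ptr;
    std::unique_ptr<int const> uptr;
    ~RawOrUniqueWithDtor() {}
};

// std::unique_ptr has a non-trivial destructor ...
static_assert(!std::is_trivially_destructible_v<std::unique_ptr<int const>>);
// ... so the union's implicit destructor is defined as deleted ...
static_assert(!std::is_destructible_v<RawOrUnique>);
// ... and an empty user-provided destructor makes it destructible again,
// although that destructor does not destroy any member.
static_assert(std::is_destructible_v<RawOrUniqueWithDtor>);
```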

If you then naively add a destructor to the union FooPtr, like ~FooPtr(){}, it compiles fine. Are we done now? Is that all?
Nope!

There is a huge problem now!

You can easily discover the problem yourself by adding a destructor to Foo that prints a message to std::cout, like this:

~Foo()
{
    std::cout << "Destructor Foo: " << mName << std::endl;
}


// with these Foos
static Foo const foo_abc( "abc" );
static Foo const foo_xyz( "xyz" );

// ... and these Bars
Bar bar1( &foo_abc );
Bar bar2( std::make_unique<Foo>( "uvw" ) );

Guess which messages from the Foo destructor you will see.
Or better: which message will you not see?

You can try this code also on Godbolt: https://godbolt.org/z/3Gs3MfPMP

Yes, the destructor of the “uvw” Foo will never be called! There is a huge memory/resource leak in the code even though we are using a smart pointer!
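If you prefer a check that does not rely on reading console output, here is a minimal, self-contained reproduction that counts destructor calls. Tracked, Holder and destructor_calls_with_union are hypothetical names; Holder has the same shape as FooPtr with the naive empty destructor, plus a constructor that starts with the raw pointer active:

```cpp
#include <memory>
#include <new>

// Counts how often its destructor runs.
struct Tracked {
    static inline int destroyed = 0;   // C++17 inline static member
    ~Tracked() { ++destroyed; }
};

// Same shape as FooPtr plus the "naive" empty destructor.
union Holder {
    Tracked const* ptr;
    std::unique_ptr<Tracked const> uptr;
    Holder() : ptr(nullptr) {}
    ~Holder() {}   // compiles, but never destroys uptr
};

// Returns how many Tracked objects were destroyed when the union went out of scope.
int destructor_calls_with_union()
{
    Tracked::destroyed = 0;
    Tracked const* survivor = nullptr;
    {
        Holder h;
        // Make uptr the active member by constructing it in place.
        new (&h.uptr) std::unique_ptr<Tracked const>(std::make_unique<Tracked>());
        survivor = h.uptr.get();
    }   // ~Holder() runs here, ~unique_ptr() does not
    int const calls = Tracked::destroyed;   // 0: the Tracked object is still alive (leaked)
    delete survivor;   // manual cleanup, only so that this demo itself does not leak
    return calls;
}
```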

That was the big trap I ran into myself: although I used a smart pointer, the smart pointer was not smart anymore! Its destructor is not called inside a union!

This is true for all class-type members of a union: the union's destructor, even a user-provided one, never destroys any of its members.

You must call the destructor manually, e.g., like this: mFooPtr.uptr.~unique_ptr(); and only when uptr is the active member.
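For completeness, a sketch of what calling the destructor manually looks like when done carefully: it needs a tag, placement new in the constructor, and a destructor that destroys uptr only when it is the active member. ManualBar, Storage, mOwns and Tracked are hypothetical names, and copying and moving are simply disabled to keep it short:

```cpp
#include <memory>
#include <new>
#include <utility>

// Counts how often its destructor runs.
struct Tracked {
    static inline int destroyed = 0;
    ~Tracked() { ++destroyed; }
};

// Hypothetical Bar-like class that manages a raw-or-owning union by hand.
class ManualBar {
public:
    explicit ManualBar(Tracked const* raw) noexcept : mOwns(false) { mPtr.ptr = raw; }

    explicit ManualBar(std::unique_ptr<Tracked const> owned) noexcept : mOwns(true)
    {
        // Start the lifetime of uptr explicitly; it becomes the active member.
        new (&mPtr.uptr) std::unique_ptr<Tracked const>(std::move(owned));
    }

    ~ManualBar()
    {
        if (mOwns) {
            mPtr.uptr.~unique_ptr();   // the call the union never makes for us
        }
    }

    // Copying or moving would need the same tag-aware care, so both are disabled here.
    ManualBar(ManualBar const&) = delete;
    ManualBar& operator=(ManualBar const&) = delete;

    Tracked const* get() const noexcept { return mOwns ? mPtr.uptr.get() : mPtr.ptr; }

private:
    union Storage {
        Tracked const* ptr;
        std::unique_ptr<Tracked const> uptr;
        Storage() : ptr(nullptr) {}
        ~Storage() {}   // intentionally empty: ManualBar decides what to destroy
    } mPtr;
    bool mOwns;   // the hand-maintained tag
};
```

It works, but every line of this bookkeeping is something you can get wrong.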
But a smart pointer whose destructor you must call manually is not smart; you could just as well call delete manually. The smart pointer is completely useless then.

Given this rule, using a union with members other than primitive types (more precisely: types that are not trivially destructible, see https://en.cppreference.com/w/cpp/language/destructor#Trivial_destructor ) is strongly discouraged.

Simply use a std::variant, and the destructor of the active member will be called automatically, as everyone expects!
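A minimal sketch of Bar rewritten with std::variant, assuming C++17. To make the effect testable without console output, this Foo counts its destructor calls in a hypothetical static member destroyed instead of printing; get() and name() are likewise hypothetical helpers, not part of the original classes:

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

class Foo {
    std::string const mName;
public:
    static inline int destroyed = 0;   // hypothetical counter instead of printing
    explicit Foo(std::string const& rName) : mName(rName) {}
    ~Foo() { ++destroyed; }
    std::string const& name() const noexcept { return mName; }
};

// Bar stores either a non-owning raw pointer or an owning std::unique_ptr.
class Bar {
    std::variant<Foo const*, std::unique_ptr<Foo const>> mFoo;
public:
    explicit Bar(Foo const* raw_foo) : mFoo(raw_foo) {}                              // never deleted here
    explicit Bar(std::unique_ptr<Foo const> smart_foo) : mFoo(std::move(smart_foo)) {} // owned

    // Uniform access, whichever alternative is active.
    Foo const* get() const
    {
        if (auto const* owned = std::get_if<std::unique_ptr<Foo const>>(&mFoo)) {
            return owned->get();
        }
        return std::get<Foo const*>(mFoo);
    }
    // No user-written destructor: the variant destroys its active alternative,
    // so an owned Foo is deleted and a raw pointer is left alone.
};

// std::unique_ptr is move-only, so the variant and therefore Bar are move-only as well.
static_assert(!std::is_copy_constructible_v<Bar>);
static_assert(std::is_move_constructible_v<Bar>);
```

With Bar bar2( std::make_unique<Foo>( "uvw" ) ), the "uvw" Foo is now destroyed exactly once when bar2 goes out of scope.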

(A side note: std::unique_ptr works fine as a std::variant alternative. Because std::unique_ptr is move-only, the std::variant, and with it Bar, becomes move-only as well, but there is no need to switch to std::shared_ptr.)

I hope this article was of value to you and maybe you learned something new and useful. Feel free to leave a comment, also if you like it or disagree with something. Happy coding! 😊

4 replies on “Unions are almost always a bad idea.”

With a title like “Unions are almost always a bad idea.”, I’d have liked to have seen you cover when using them (directly) is not a bad idea, compared to using std::variant. One obvious case: it’s more space efficient to have one type-discriminating value track the active member of a large number of unions (you could instantiate something like std::variant<std::array, std::array>, but you might want them spread around amongst other relevant data for reasons related to OO design, lifetimes, and cache utilisation.

Thank you for your comment and an example use case where plain unions might be a good choice (or a better choice compared to std::variant). It is highly welcome and I appreciate it.
I was not thinking about a class architecture where several instances of the same union are floating around, but all have the same member active… Do you have a real use case example for this?

I will take it into account for a possible follow up blog post about unions.
Besides yours, there are a few other use cases.

Cheers! 🙂

One potential use case for “tagged” union where the tag is outside is a table-like struct, where you have the column definition (and thus type) for that column. So you know all the values in that column will have the same type, tagging each such element is not very efficient.

All is well if you use POD types, however if you happen to need some kind of memory-management (ie strings), you have to do the trick of calling the destructor manually. In that case I would still use a smart pointer just for the sake of not needing to manually call delete[] on the dtor, also naked new is BAD ™

Thank you for your comment and for the great example of a use case for a union without a directly bound tag!

Also, I totally agree that naked new and delete are BAD ™ 😉
