C# Value and Reference Types

It is always good to recap on "basics" - so this post is about C# value and reference types. To understand these, there are a few other things to cover first.

Stack and Heap

The stack and the heap are the two places where the .NET runtime stores items in memory as code executes. Both reside in the memory (RAM) of the running process.

A way to understand them is to list their properties and differences - there are many, and some are below.

The stack:

  • is a last in first out (LIFO) data structure
  • keeps track of what's executing in code (i.e. what's been "called")
  • is self-maintaining, i.e. takes care of its own memory management
  • is faster to access than the heap, because allocating and releasing only ever happens at the top of the stack due to the LIFO data structure
  • holds objects that are available only inside a stack frame (the execution of a method), whereas objects allocated on the heap can be accessed from anywhere that holds a reference to them.

The heap:

  • keeps track of objects
  • is managed by the garbage collector
  • is slower to allocate on and access than the stack

There are four main types of things that are put in the stack and heap as code executes:

  • value types
  • reference types
  • pointers
  • instructions

We will get to value and reference types - but first, some definitions.

Pointers

A reference is often referred to as a pointer. C# supports true pointers only to a limited extent (in unsafe code) - references, by contrast, are managed by the Common Language Runtime (CLR).

A pointer/reference differs from a reference type: a reference type is a kind of type, whereas a reference is the means through which an instance of a reference type is accessed.

A pointer points to another space in memory - its value is either a memory address or null.

Type

What is a type? Well, “any value described by a type is called an instance of that type”.

Then, from the spec:

  • “type, value: A type such that an instance of it directly contains all its data. (…) The values described by a value type are self-contained.”
  • “type, reference: A type such that an instance of it contains a reference to its data. (…) A value described by a reference type denotes the location of another value.”

Heap and Stack and Value and Reference

Note that:

  • An instance of a reference type always goes on the heap
  • value types and pointers always go where they were declared.  As for where things are declared - if a value type is declared outside of a method, but inside a reference type, it will be placed within the reference type on the heap.
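
As a rough sketch of the two rules above (Point and Shape are hypothetical types invented for this illustration, and exactly where a local ends up is a runtime detail), a value type declared as a field of a class is stored inside that class's heap object, while a local value type belongs to the method that declares it:

```csharp
using System;

// Hypothetical types, for illustration only.
struct Point            // value type
{
    public int X;
    public int Y;
}

class Shape             // reference type
{
    // Stored inline inside the Shape object, so it lives
    // wherever the Shape instance lives: on the heap.
    public Point Origin;
}

static class StorageDemo
{
    public static int Run()
    {
        // A local value type: typically held in this method's stack frame.
        Point local = new Point { X = 1, Y = 2 };

        // 'shape' is a reference held locally; the Shape object it
        // refers to is allocated on the heap.
        Shape shape = new Shape();
        shape.Origin = local;   // copies the Point's data into the heap object

        local.X = 100;          // changes only the local copy
        return shape.Origin.X;  // still 1
    }

    static void Main() => Console.WriteLine(Run());   // prints 1
}
```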

So what differences are there between value and reference types? There are differences such as when sharing the data:

  • passing by value vs passing by reference: by default, C# passes every argument by value, and the ref, out and in keywords pass by reference instead
  • for a reference type, the value that gets passed is the reference, so the caller and the method end up referring to the same instance
  • for a value type, the value that gets passed is a copy of the instance itself
  • changes to an instance of a reference type affect all references pointing to the instance.
  • value type instances are copied when they are passed by value. When an instance of a value type is changed, it of course does not affect any of its copies. Because the copies are not created explicitly by the user but are implicitly created when arguments are passed or return values are returned, value types that can be changed can be confusing to many users. So, value types should be immutable.
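
The sharing behaviour can be sketched as follows. Counter, Holder and the method names are made up for this example, and the mutable struct is used only to make the copying visible:

```csharp
using System;

struct Counter { public int Value; }   // value type (hypothetical)
class Holder { public int Value; }     // reference type (hypothetical)

static class PassingDemo
{
    // Receives a copy of the struct: the caller's instance is untouched.
    static void Increment(Counter c) { c.Value++; }

    // Receives a copy of the reference: both refer to the same object.
    static void Increment(Holder h) { h.Value++; }

    // Rebinding the parameter changes only the method's copy of the reference.
    static void Replace(Holder h) { h = new Holder { Value = 99 }; }

    // 'ref' passes the caller's variable itself, so the change is visible.
    static void IncrementByRef(ref Counter c) { c.Value++; }

    public static (int counterAfterCopy, int holderAfterCall, int holderAfterReplace, int counterAfterRef) Run()
    {
        var counter = new Counter();
        var holder = new Holder();

        Increment(counter);
        int counterAfterCopy = counter.Value;      // 0

        Increment(holder);
        int holderAfterCall = holder.Value;        // 1

        Replace(holder);
        int holderAfterReplace = holder.Value;     // still 1

        IncrementByRef(ref counter);
        int counterAfterRef = counter.Value;       // 1

        return (counterAfterCopy, holderAfterCall, holderAfterReplace, counterAfterRef);
    }

    static void Main()
    {
        var r = Run();
        Console.WriteLine($"{r.counterAfterCopy} {r.holderAfterCall} {r.holderAfterReplace} {r.counterAfterRef}");   // prints 0 1 1 1
    }
}
```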

Also when comparing the identity of two objects:

  • value types are identical if and only if the bit sequences of their data are the same.
  • reference types are identical if and only if their locations are the same.
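
A small sketch of the difference, using two hypothetical types. It relies on the default ValueType.Equals, which compares field values, and on object.ReferenceEquals, which compares locations:

```csharp
using System;

struct Money { public int Cents; }      // value type (hypothetical)
class Account { public int Cents; }     // reference type (hypothetical)

static class IdentityDemo
{
    // Two separately created structs holding the same data compare equal.
    public static bool SameData()
    {
        var a = new Money { Cents = 500 };
        var b = new Money { Cents = 500 };
        return a.Equals(b);                     // true
    }

    // Two separately created objects are different instances,
    // even when their data matches.
    public static bool SeparateObjects()
    {
        var x = new Account { Cents = 500 };
        var y = new Account { Cents = 500 };
        return object.ReferenceEquals(x, y);    // false
    }

    // Two references to one object are identical.
    public static bool SharedObject()
    {
        var x = new Account { Cents = 500 };
        var z = x;
        return object.ReferenceEquals(x, z);    // true
    }

    static void Main()
    {
        Console.WriteLine(SameData());          // True
        Console.WriteLine(SeparateObjects());   // False
        Console.WriteLine(SharedObject());      // True
    }
}
```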

As for null:

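  • value types cannot store null, whereas reference types can. A nullable value type such as int? (Nullable<T>) wraps a value type so that it can also represent null. C# 8.0 introduced nullable reference types: once the feature is enabled, reference types have to be explicitly marked as nullable (for example string?), with all others being treated as non-nullable. This is checked through compiler warnings rather than enforced at run time.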
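
To illustrate, here is a minimal sketch assuming a C# 8.0 or later compiler. The helper names are invented for the example, and the #nullable directive switches the nullable reference type checks on for this file:

```csharp
#nullable enable
using System;

static class NullDemo
{
    // 'string?' declares that null is an expected value for this parameter.
    public static int LengthOrZero(string? text) => text?.Length ?? 0;

    // 'int?' is shorthand for Nullable<int>, a value type that can also be null.
    public static int ValueOrFallback(int? maybe) => maybe ?? -1;

    static void Main()
    {
        int? missing = null;
        int? present = 42;
        string? name = null;

        Console.WriteLine(missing.HasValue);          // False
        Console.WriteLine(ValueOrFallback(missing));  // -1
        Console.WriteLine(ValueOrFallback(present));  // 42
        Console.WriteLine(LengthOrZero(name));        // 0
        Console.WriteLine(LengthOrZero("abc"));       // 3
    }
}
```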

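As for inheritance:

  • reference types support inheritance, but value types do not: a struct is implicitly sealed and cannot derive from another struct or class, although it can implement interfaces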

As for storage:

  • reference types are allocated on the heap and garbage-collected. Value types are allocated either on the stack or inline in containing types and deallocated when the stack unwinds or when their containing type gets deallocated. Value type allocations and deallocations are in general cheaper than reference type allocations and deallocations.

As for arrays:

  • arrays of reference types are allocated out-of-line, meaning the array elements are just references to instances of the reference type residing on the heap
  • Value type arrays are allocated inline, meaning that the array elements are the actual instances of the value type. Therefore, allocations and deallocations of value type arrays are much cheaper than allocations and deallocations of reference type arrays. In addition, in a majority of cases value type arrays exhibit much better locality of reference.
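
The difference shows up as soon as an array is created. In this sketch (PointStruct and PointClass are hypothetical), the struct array already contains usable zero-initialised elements, while the class array holds only null references until objects are allocated separately:

```csharp
using System;

struct PointStruct { public int X; }   // value type (hypothetical)
class PointClass { public int X; }     // reference type (hypothetical)

static class ArrayDemo
{
    public static int StructArray()
    {
        // One allocation: three PointStruct instances stored inline.
        var values = new PointStruct[3];
        values[0].X = 7;                     // modifies the element in place
        return values[0].X + values[1].X;    // 7 + 0
    }

    public static bool ClassArrayStartsNull()
    {
        // One allocation for the array of references; no PointClass objects yet.
        var refs = new PointClass[3];
        bool startedNull = refs[0] is null;
        refs[0] = new PointClass { X = 7 };  // a separate heap allocation
        return startedNull;
    }

    static void Main()
    {
        Console.WriteLine(StructArray());            // 7
        Console.WriteLine(ClassArrayStartsNull());   // True
    }
}
```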

As for boxing:

  • boxing does not occur for reference types: casting a reference type to object or to an interface only converts the reference, and no new object is allocated
  • value types get boxed when cast to a reference type or one of the interfaces they implement. They get unboxed when cast back to the value type. Because boxes are objects that are allocated on the heap and are garbage-collected, too much boxing and unboxing can have a negative impact on the heap, the garbage collector, and ultimately the performance of the application.
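
A short sketch of boxing and unboxing with int and object. The box is a separate copy on the heap, so changing the original variable afterwards does not change the boxed value:

```csharp
using System;

static class BoxingDemo
{
    public static (int current, int unboxed) BoxAndUnbox()
    {
        int number = 10;
        object boxed = number;      // boxing: the value is copied into a new heap object
        number = 20;                // the box still holds 10
        int unboxed = (int)boxed;   // unboxing: the value is copied back out
        return (number, unboxed);   // (20, 10)
    }

    public static bool ReferenceCastKeepsSameObject()
    {
        string text = "hello";
        object asObject = text;     // reference conversion only, no boxing
        return object.ReferenceEquals(text, asObject);   // true
    }

    static void Main()
    {
        var result = BoxAndUnbox();
        Console.WriteLine($"{result.current} {result.unboxed}");   // prints 20 10
        Console.WriteLine(ReferenceCastKeepsSameObject());         // True
    }
}
```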

As for copies:

  • assignments of large reference types are cheaper than assignments of large value types, because only the reference (the memory location) is copied, whereas assigning a value type copies the entire instance.

Structs vs Classes

So then, with all this knowledge of types, what are their applications? Choosing between structs and classes is one: in C#, a struct is a value type and a class is a reference type.

You would consider defining a struct instead of a class if instances of the type are small and commonly short-lived or are commonly embedded in other objects.

You would avoid defining a struct unless the type has all of the following characteristics:

  • It logically represents a single value, similar to primitive types (int, double, etc.).
  • It has an instance size under 16 bytes.
  • It is immutable.
  • It will not have to be boxed frequently.

In all other cases, you should define your types as classes.
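
As an illustration, a type that would plausibly meet these criteria might look like the sketch below. Celsius is a made-up example and readonly struct assumes C# 7.2 or later. It wraps a single 8-byte double, exposes no setters, and returns a new instance instead of changing itself:

```csharp
using System;

// Hypothetical example: a small, immutable struct representing a single value.
public readonly struct Celsius
{
    public double Degrees { get; }

    public Celsius(double degrees) => Degrees = degrees;

    // "Modifying" operations return a new value rather than mutating this one.
    public Celsius Add(double delta) => new Celsius(Degrees + delta);
}

static class StructDemo
{
    public static (double morning, double noon) Run()
    {
        var morning = new Celsius(15.0);
        var noon = morning.Add(5.5);             // morning is unchanged
        return (morning.Degrees, noon.Degrees);  // (15, 20.5)
    }

    static void Main()
    {
        var r = Run();
        Console.WriteLine(r.morning < r.noon);   // True
    }
}
```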

Managed Resources

We did mention garbage collection earlier. But what are managed resources? What are unmanaged resources?

Managed resources are "managed memory" i.e. managed by the garbage collector. When you no longer have any references to a managed object then the garbage collector will, in time, release that memory for you.

So unmanaged resources are things that the garbage collector does not know about. For example open files, open network connections etc.

Usually you want to release those unmanaged resources before you lose all the references you have to the object managing them. You do this by calling Dispose on that object, or (in C#) using the using statement which will handle calling Dispose for you.

If you neglect to Dispose of your unmanaged resources correctly, the garbage collector will in time run the object's finalizer, if it has one - but because it does not know about the unmanaged resources, it does not know how important it is to release them. The resource can stay held for an unpredictable length of time, so it is possible for things to perform poorly when this happens.

If you implement a class yourself that handles unmanaged resources, it is up to you to implement Dispose and Finalize correctly.

The .NET garbage collector does not allocate or release unmanaged memory. The pattern for disposing an object, referred to as the dispose pattern, imposes order on the lifetime of an object. The dispose pattern is used for objects that implement the IDisposable interface, and is common when interacting with file and pipe handles, registry handles, wait handles, or pointers to blocks of unmanaged memory. This is because the garbage collector is unable to reclaim unmanaged objects.
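
A deliberately simplified sketch of Dispose and the using statement follows. ResourceHolder is a hypothetical class that only records whether it has been disposed; a real implementation would release its handle at the marked point, and a full dispose pattern with a finalizer or a SafeHandle is left out here:

```csharp
using System;

// Hypothetical wrapper standing in for a class that owns an unmanaged resource.
sealed class ResourceHolder : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        if (IsDisposed) return;   // Dispose should be safe to call more than once

        // A real class would release its file handle, socket, etc. here.
        IsDisposed = true;
    }
}

static class DisposeDemo
{
    public static bool Run()
    {
        var holder = new ResourceHolder();
        using (holder)
        {
            // Work with the resource; Dispose runs when this block exits,
            // even if an exception is thrown inside it.
        }
        return holder.IsDisposed;   // true
    }

    static void Main() => Console.WriteLine(Run());   // prints True
}
```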

Mutability

Both value and reference types can be immutable or mutable.

Of note in .NET is that String is immutable but StringBuilder is mutable. So while string is a reference type, it is immutable: its value cannot be changed once created, and operations that appear to modify a string return a new string instead.
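
The contrast can be sketched with two aliases of the same object. With string, the "change" produces a new string and the alias keeps the old one; with StringBuilder, the change is visible through every reference:

```csharp
using System;
using System.Text;

static class MutabilityDemo
{
    public static (string updated, string alias) Strings()
    {
        string original = "abc";
        string alias = original;                  // both refer to the same string
        original = original.ToUpperInvariant();   // creates a new string; "abc" itself is untouched
        return (original, alias);                 // ("ABC", "abc")
    }

    public static string Builders()
    {
        var builder = new StringBuilder("abc");
        StringBuilder alias = builder;            // both refer to the same builder
        builder.Append("def");                    // mutates the shared object
        return alias.ToString();                  // "abcdef"
    }

    static void Main()
    {
        var s = Strings();
        Console.WriteLine($"{s.updated} {s.alias}");   // prints ABC abc
        Console.WriteLine(Builders());                 // prints abcdef
    }
}
```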