HLSL Constant Buffer Packing Rules & Layout Visualizer

0. Introduction

Constant Buffers in HLSL have rather peculiar rules for how their individual members are laid out in memory. These rules are very different from those for structures in C and C++, and they range from very hard to practically impossible to replicate in those languages' type systems. This often leads to a lot of confusion and frustration when the corresponding struct layouts in the CPU and GPU code mismatch and data ends up in the wrong places as a result. This article will teach you all the packing rules for constant buffers in HLSL for Direct3D 10 / Shader Model 4.0 or newer.

Note that these rules are not the same as for HLSL Structured Buffers (which are pretty much equal to C) or GLSL uniform buffers (called "std140 layout"). I will compare and contrast with those where appropriate, but explaining all the rules for different languages/buffer types from scratch is beyond the scope of this article. If you want to play around with your own constant buffers and confirm their memory layout, use the interactive layout visualizer that accompanies this article (its code editor is not supported on mobile browsers). There is a similar visualization available for C++ in Visual Studio 2022 version 17.9 (currently in preview as of 2024-02-05).

Update (2024-02-05): Structured Buffers are now supported in the visualizer and I have added a short explanation of the rules to the end of the article, including how they can differ from the rules for C structs depending on the target platform.

Let's start by defining what alignment means. We say a memory address is "aligned to N" if it is evenly divisible by N, or in other words if the address is an integer multiple of N. If a type "has an alignment [requirement] of N" then an instance of it must always start at a memory address aligned to N. The alignment N is always a power of two.

In HLSL, we do not have access to memory addresses / pointers, so for our purpose of defining the layout of a constant buffer, we only need to care about the offset in bytes from the start of the buffer that is assigned to each member. The definitions of alignment work equivalently for that offset. In order to maintain proper alignment for all members of a struct/buffer, it may be necessary to add invisible padding bytes in between them.

1. Type Alignment

Rule #1: Basic scalar types such as float, uint, uint16_t, etc. have a "natural" alignment requirement equal to their size. This is also called "self-alignment".
Note: bool in HLSL is 4 bytes large!

cbuffer ExampleScalarAlignment { uint16_t smol; float f1; float f2; double lorg; bool four_bytes; };

In the example above, there are 2 bytes of padding added after smol in order to align f1 properly to a multiple of 4. Similarly, lorg cannot start at an offset of 12 since that would not be an integer multiple of its alignment 8, resulting in an additional 4 bytes of padding in between f2 and lorg.
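The offsets above can be reproduced by walking a running cursor through the members and rounding it up to each member's size. Here is a minimal C++ sketch of Rule #1 under that assumption; `align_up`, `kSizes` and `scalar_offsets` are illustrative names, not part of any API, and the bit trick in `align_up` relies on alignments being powers of two. None of these members happen to be affected by the constant-buffer-specific rules that come later.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: round `offset` up to a multiple of `alignment`,
// which must be a power of two (true for every HLSL alignment).
constexpr std::uint32_t align_up(std::uint32_t offset, std::uint32_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Member sizes of ExampleScalarAlignment in declaration order. Under Rule #1
// each scalar's alignment equals its size.
constexpr std::uint32_t kSizes[] = {
    2,  // uint16_t smol
    4,  // float f1
    4,  // float f2
    8,  // double lorg
    4,  // bool four_bytes (4 bytes in HLSL)
};
constexpr std::size_t kCount = sizeof(kSizes) / sizeof(kSizes[0]);

struct Offsets {
    std::uint32_t value[kCount];  // starting offset of each member
    std::uint32_t end;            // first byte after the last member
};

// Place each member at the first offset that is a multiple of its size.
constexpr Offsets scalar_offsets() {
    Offsets out{};
    std::uint32_t cursor = 0;
    for (std::size_t i = 0; i < kCount; ++i) {
        cursor = align_up(cursor, kSizes[i]);
        out.value[i] = cursor;
        cursor += kSizes[i];
    }
    out.end = cursor;
    return out;
}

constexpr Offsets kOffsets = scalar_offsets();
static_assert(kOffsets.value[1] == 4, "f1: 2 bytes of padding after smol");
static_assert(kOffsets.value[2] == 8, "f2 follows f1 directly");
static_assert(kOffsets.value[3] == 16, "lorg: 12 is not a multiple of 8");
static_assert(kOffsets.value[4] == 24, "four_bytes follows lorg directly");
```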

It's pretty rare to use the 2 or 8 byte large scalar types in HLSL, but this is the most fundamental alignment rule. It is common across most programming languages and target platforms because many processor architectures require the memory address of a load or store instruction to be aligned to the size of the load/store (x86 being one of the biggest exceptions). Misaligned addresses may result in a CPU exception, unexpected results (e.g. the processor may round the address down to the nearest aligned one and thus load different data) or, in some cases, slower execution. Type self-alignment therefore matters for both correctness and performance, and since it is practically universal, it makes a natural first rule.

Tip: If you need to supply a bool to HLSL in any buffer type, use an actual 4 byte type on the CPU side (e.g. windows.h's BOOL, uint32_t or your own type for C/C++). Do not use alignas(4) bool or manual padding (e.g. bool b; char pad[3];): the padding bytes are not always guaranteed to be zeroed out, yet the shader reads all 4 bytes regardless, so your boolean could evaluate to true when it shouldn't!
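For illustration, a CPU-side C++ mirror of ExampleScalarAlignment that follows this tip might look like the sketch below. The struct name and the `to_hlsl_bool` helper are hypothetical, and the offsets assume a target where `double` is 8-byte aligned inside structs (true for typical x86-64 and ARM64 targets, but not for 32-bit x86 Linux).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical CPU-side mirror of ExampleScalarAlignment. The HLSL bool is
// stored as a full 4-byte integer, so every byte the shader reads is written.
struct ExampleScalarAlignmentCPU {
    std::uint16_t smol;        // HLSL uint16_t (needs -enable-16bit-types)
    float         f1;
    float         f2;
    double        lorg;
    std::uint32_t four_bytes;  // HLSL bool: 0 = false, nonzero = true
};

// Hypothetical helper: convert a C++ bool into the 4-byte value HLSL reads.
constexpr std::uint32_t to_hlsl_bool(bool b) { return b ? 1u : 0u; }

// Assumes double is 8-byte aligned inside structs (e.g. x86-64, ARM64).
static_assert(offsetof(ExampleScalarAlignmentCPU, f1) == 4, "");
static_assert(offsetof(ExampleScalarAlignmentCPU, f2) == 8, "");
static_assert(offsetof(ExampleScalarAlignmentCPU, lorg) == 16, "");
static_assert(offsetof(ExampleScalarAlignmentCPU, four_bytes) == 24, "");
```

Assigning `cb.four_bytes = to_hlsl_bool(flag);` (with `cb` and `flag` standing in for your own variables) then defines all four bytes the shader reads, unlike a 1-byte `bool` followed by padding.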

For the sake of completeness, the minimum-precision types (min10/16*) and half have a storage size (and thus alignment) in memory of 4 bytes by default, even though the driver compiler is allowed to emit lower precision ALU instructions for the min-precision types (not half, which is directly mapped to float). When you enable native 16-bit type support via the -enable-16bit-types command line argument to DXC, all of those types including half turn into actual 16-bit types like (u)int16_t or float16_t and have a corresponding size and alignment of 2 bytes, as in the examples. There are no types smaller than 4 bytes in HLSL without using DXC and this command line flag.

Rule #1a: Vector types are aligned according to their scalar component type.

cbuffer ExampleVectorAlignment { float f1; float3 vec; uint3 other_vec; float f2; float16_t smol; int2 intvec; };

This is the first rule that might bring some surprise, especially if you are familiar with GLSL uniform buffers / std140, where a vec3 (equivalent to float3) would have an alignment of 16. This is not the case in HLSL. The first four members in this example all have an alignment requirement of 4, so they can all be neatly packed together; it also doesn't matter that some of them are floats and some are uints. Just like in the first example, we do incur some padding when a smaller type is followed by a larger one, but while the size of intvec is 8, its alignment requirement is only 4 (based on its scalar type), so only 2 bytes of padding are needed.
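One way to see that Rule #1a matches ordinary C-style packing is to mirror the example with plain aggregate vector types. In this sketch, `float3`, `uint3` and `int2` are hypothetical stand-ins (a math library's vector types may carry stricter alignment, which would change the offsets), and the HLSL `float16_t` is carried as its raw 16 bits in a `uint16_t`. This particular example does not trigger any of the constant-buffer-specific rules from the following chapters, so a C++ compiler arrives at the same offsets.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical CPU-side vector types: plain aggregates of their scalar type,
// so their alignment is that of the scalar (Rule #1a), not their size.
struct float3 { float x, y, z; };
struct uint3  { std::uint32_t x, y, z; };
struct int2   { std::int32_t x, y; };

struct ExampleVectorAlignmentCPU {
    float         f1;         // offset 0
    float3        vec;        // offset 4 (std140 would put a vec3 at 16)
    uint3         other_vec;  // offset 16
    float         f2;         // offset 28
    std::uint16_t smol;       // offset 32, raw bits of the HLSL float16_t
    int2          intvec;     // offset 36: 4-byte aligned, 2 bytes of padding
};

static_assert(alignof(float3) == 4, "vector alignment follows the scalar");
static_assert(offsetof(ExampleVectorAlignmentCPU, vec) == 4, "");
static_assert(offsetof(ExampleVectorAlignmentCPU, other_vec) == 16, "");
static_assert(offsetof(ExampleVectorAlignmentCPU, f2) == 28, "");
static_assert(offsetof(ExampleVectorAlignmentCPU, smol) == 32, "");
static_assert(offsetof(ExampleVectorAlignmentCPU, intvec) == 36, "");
```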

The rules we've covered so far are the only ones that deal with type alignment in constant buffers. All of the following rules are entirely unique to constant buffers and only involve variable alignment. I'll explain the difference between these two terms as we cover more rules and examples.

2. Buffer Rows

Rule #2: A Constant Buffer is arranged conceptually like an array of 16-byte rows. The members of the buffer are packed into these rows and if an individual member would cross the boundary between two rows, its starting offset is forcibly aligned to 16 bytes, pushing it to the start of the next row if it isn't already aligned.

cbuffer ExampleBufferRows { float3 position; float3 normal; uint index; double4 d; };

This rule explains why I chose the visual memory layout diagrams to be in rows of 16 bytes.

In this example, two float3 members next to each other incur 4 bytes of padding in between, but this is not because of the inherent alignment requirement of their type (which is still 4). The padding results purely from crossing a 16-byte row boundary within the buffer. It is only dependent on the size of a member and its initially desired starting offset, or effectively the sequence of member types in the buffer. This rule does not confer alignment to a type, but to a specific individual member variable and is thus no longer a type alignment rule. normal has a desired offset of 12, right at the end of position, but it is forcibly pushed to start at offset 16 because it does not fit entirely into the rest of the first 16-byte row.

The member index is packed directly after normal with no padding and is here to demonstrate again that float3 does not actually have a type alignment of 16, otherwise the size of the type itself would have to be padded to 16 as well. In C-like struct packing, the size of a type must always be an integer multiple of its alignment. Otherwise, an array of that type would misalign its elements and a struct containing an overaligned type would have its layout depend on its starting address. This common rule never applies to types inside constant buffers because those specific problems with arrays and structs are already solved in other ways, which we will see further below in rules #3 and #4, but let's not get ahead of ourselves. The double4 variable is too large for one row even just by itself, but its starting offset is already aligned to 16 bytes without any modification, so it stays where it is.
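Rules #1/#1a and #2 can be combined into a single placement step: honor the type alignment first, then move the member to the next row only if it would straddle a 16-byte boundary. A minimal C++ sketch, with `align_up` and `place_in_cbuffer` as hypothetical helper names, applied to ExampleBufferRows (sizes: float3 = 12, uint = 4, double4 = 32):

```cpp
#include <cstdint>

// Hypothetical helper: round up to a power-of-two alignment.
constexpr std::uint32_t align_up(std::uint32_t offset, std::uint32_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: first apply the type alignment (Rules #1/#1a), then,
// if the member would cross a 16-byte row boundary, push it to the next row
// (Rule #2). `size` must be nonzero.
constexpr std::uint32_t place_in_cbuffer(std::uint32_t cursor,
                                         std::uint32_t size,
                                         std::uint32_t type_alignment) {
    const std::uint32_t offset = align_up(cursor, type_alignment);
    const bool crosses_row = offset / 16 != (offset + size - 1) / 16;
    return crosses_row ? align_up(offset, 16) : offset;
}

// ExampleBufferRows: float3 position; float3 normal; uint index; double4 d;
constexpr std::uint32_t kPosition = place_in_cbuffer(0, 12, 4);
constexpr std::uint32_t kNormal   = place_in_cbuffer(kPosition + 12, 12, 4);
constexpr std::uint32_t kIndex    = place_in_cbuffer(kNormal + 12, 4, 4);
constexpr std::uint32_t kD        = place_in_cbuffer(kIndex + 4, 32, 8);

static_assert(kPosition == 0, "");
static_assert(kNormal == 16, "desired offset 12 would cross into the next row");
static_assert(kIndex == 28, "no padding: float3 has no type alignment of 16");
static_assert(kD == 32, "already row-aligned, so it may span two rows");
```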

The row-packing rule is probably the most confusing of them all, because only looking at the offsets may imply an increased type alignment where there is none. In other languages, aligning individual variables is usually only relevant for allocating entire buffers when you do SIMD work or want cache or page alignment for other reasons. HLSL is very much the odd one out with its "array of 16-byte aligned rows" design for constant buffers that affects individual struct members, so it takes some time to get used to it.

I usually recommend manually padding out structs in C++ and equivalently in HLSL for consistency, instead of trying to emulate this behavior with alignas(16) (which can do either variable or type alignment depending on where you put it). For example, you could put uint32_t pad0; after float3 position;. Since the 4 byte scalar types and their vectors are a large majority of constant buffer usage (apart from matrices, which we'll get to later), it can help to conceptually think of your buffer as an array of float4/uint4 elements and then assign "slots" for individual variables within those elements, leaving anything that isn't assigned as padding.
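A sketch of that manual-padding approach for ExampleBufferRows, assuming hypothetical plain-aggregate `float3`/`double4` CPU types. The matching HLSL declaration would get the same `uint pad0;` after `position`; this does not change the HLSL layout, it just makes both sides read identically.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical CPU-side vector types (alignment of their scalar type).
struct float3  { float x, y, z; };
struct double4 { double x, y, z, w; };

// ExampleBufferRows mirrored with explicit padding instead of alignas(16).
// Each 16-byte row is treated as four 4-byte "slots"; unassigned slots are
// filled with padding members.
struct ExampleBufferRowsCPU {
    float3        position;  // row 0, slots x..z
    std::uint32_t pad0;      // row 0, slot w (padding)
    float3        normal;    // row 1, slots x..z
    std::uint32_t index;     // row 1, slot w
    double4       d;         // rows 2 and 3
};

static_assert(offsetof(ExampleBufferRowsCPU, normal) == 16, "");
static_assert(offsetof(ExampleBufferRowsCPU, index) == 28, "");
static_assert(offsetof(ExampleBufferRowsCPU, d) == 32, "");
static_assert(sizeof(ExampleBufferRowsCPU) == 64, "four full 16-byte rows");
```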

Minimum-precision types (min10/16*) have weird interactions with this rule when native 16-bit types are not enabled. In that case, min-precision types cannot be packed together with regular fixed-precision types in the same row. My opinion: just don't use them.

3. Arrays and Matrices

Rule #3: Each element of an array starts a new 16-byte row, i.e. its starting offset is forcibly aligned to the next 16 byte boundary.

cbuffer ExampleArrays { float2 before; float array[2]; float2 after; };

This is one of the big pain points of constant buffers. Arrays are not packed tightly and instead must start a new 16-byte row for each element so that array indexing happens on a per-row basis, leading to potentially huge gaps in between array elements. This can be worked around by using larger element types (e.g. float4 arr[4] instead of float arr[16]), but then you have to use 2D indexing with division and modulo (e.g. arr[i / 4][i % 4] or, for unsigned i, equivalently arr[i >> 2][i & 3]) and hope that the driver compiler optimizes the extra indexing math away (unlikely when the index is only known at runtime). Note that, as before, we are not increasing the elements' type alignment and not increasing their size to full rows; we are just aligning their starting offsets as if they were individual variables. So the total size of array in this example is 20 bytes, and the member after can be packed right at the end of the array with no additional padding.
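The array footprint and the 2D indexing trick come down to a few lines of arithmetic. A minimal C++ sketch, assuming `count >= 1` and using hypothetical helper names; the element stride is the element size rounded up to whole 16-byte rows, but the last element is not padded out.

```cpp
#include <cstdint>

// Hypothetical helper: bytes from the start of a cbuffer array to the end of
// its last element (count >= 1). Every element starts a new 16-byte row
// (Rule #3), so the stride is the element size rounded up to whole rows.
constexpr std::uint32_t cbuffer_array_size(std::uint32_t count,
                                           std::uint32_t element_size) {
    return (count - 1) * ((element_size + 15) / 16 * 16) + element_size;
}

// Hypothetical helpers for reading `float arr[16]` stored as `float4 arr[4]`:
// arr[i / 4][i % 4], written with shifts and masks (valid for unsigned i).
constexpr std::uint32_t packed_row(std::uint32_t i) { return i >> 2; }
constexpr std::uint32_t packed_component(std::uint32_t i) { return i & 3u; }

// ExampleArrays: float2 before; float array[2]; float2 after;
constexpr std::uint32_t kArrayOffset = 16;  // before ends at 8; new row
constexpr std::uint32_t kArraySize = cbuffer_array_size(2, 4);
constexpr std::uint32_t kAfterOffset = kArrayOffset + kArraySize;

static_assert(kArraySize == 20, "one 16-byte stride + one 4-byte float");
static_assert(kAfterOffset == 36, "after packs right behind array[1]");
static_assert(cbuffer_array_size(16, 4) == 244, "float arr[16]");
static_assert(cbuffer_array_size(4, 16) == 64, "float4 arr[4]: no gaps");
static_assert(packed_row(13) == 13 / 4 && packed_component(13) == 13 % 4, "");
```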

This is annoying to deal with in C++, because if the elements do not have a size divisible by 16, you would have to do away with the array and put them as individual variables with either added variable alignment via alignas(16) or manual padding before and in between (but not at the end!). You could also increase the size/alignment of the element type (e.g. making an array of padded structs), but then you have to make sure to add padding after the array on the HLSL side because it only adds it in between the array elements. None of these options are particularly great.
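As a sketch of the first workaround (individual variables with `alignas(16)`), ExampleArrays could be mirrored like this; the struct and member names are hypothetical. Only the former array elements get `alignas(16)`, so `after` still packs directly behind `array1`.

```cpp
#include <cstddef>

// Hypothetical CPU-side mirror of ExampleArrays without a C++ array: each
// former element is a separate variable with alignas(16) (variable alignment).
struct ExampleArraysCPU {
    float             before[2];  // float2 before, offset 0
    alignas(16) float array0;     // HLSL array[0], offset 16
    alignas(16) float array1;     // HLSL array[1], offset 32
    float             after[2];   // float2 after, offset 36 (no alignas here)
};

static_assert(offsetof(ExampleArraysCPU, array0) == 16, "");
static_assert(offsetof(ExampleArraysCPU, array1) == 32, "");
static_assert(offsetof(ExampleArraysCPU, after) == 36, "");
```

As a side effect, the C++ struct's own alignment becomes 16 and its `sizeof` rounds up to 48; that tail padding does not move any member.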

Rule #3a: Matrices are laid out equivalently to arrays of column vectors (row vectors for row-major matrices). Hence, the array size is the number of columns (rows for row-major) and the vector size is the number of rows (columns).
Exception: Matrices with only one column (row for row-major) are layout-equivalent to a simple vector, not an array of one vector (i.e. they do not incur the extra 16 byte row alignment).
Note: The HLSL type is always typeRxC with R being the number of rows and C being the number of columns (matches math notation, but reverse order of GLSL/GLM's matCxR) and the default storage order is column-major.

cbuffer ExampleMatrices { float2 before; float2x3 mat; float2x1 smol_mat; };

Thankfully, matrices just collapse down to the types and rules we've already learned about, with the only slightly tricky part being one-column matrices (note how smol_mat does not need to be 16-byte aligned, though those matrix dimensions do not seem particularly useful). The matrix mat turns into an array of three float2 vectors and we incur the excessive padding cost of arrays just as in the previous example. It would be worth it to split this into three individual float2 colN variables to save 24 bytes of padding, though I should note that when later reconstructing the matrix in the shader, HLSL always expects rows in the constructor and not columns, no matter which storage order you chose.
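The matrix rules reduce to array arithmetic, which also confirms the 24 bytes of savings. A minimal C++ sketch for ExampleMatrices; `MatrixLayout`, `matrix_layout`, `matrix_size` and `place_matrix` are hypothetical helpers, and `place_matrix` assumes the scalar alignment equals `scalar_size`.

```cpp
#include <cstdint>

// Hypothetical helper: round up to a power-of-two alignment.
constexpr std::uint32_t align_up(std::uint32_t offset, std::uint32_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Rule #3a: an RxC matrix is laid out like an array of vectors. Column-major
// (the default) means C column vectors of R scalars; row_major means R row
// vectors of C scalars.
struct MatrixLayout { std::uint32_t vector_count; std::uint32_t vector_size; };

constexpr MatrixLayout matrix_layout(std::uint32_t rows, std::uint32_t cols,
                                     std::uint32_t scalar_size, bool row_major) {
    return row_major ? MatrixLayout{rows, cols * scalar_size}
                     : MatrixLayout{cols, rows * scalar_size};
}

// Bytes from the first vector to the end of the last one (Rule #3 spacing).
constexpr std::uint32_t matrix_size(MatrixLayout m) {
    return (m.vector_count - 1) * align_up(m.vector_size, 16) + m.vector_size;
}

// A one-vector matrix is placed like a plain vector (Rules #1a and #2);
// anything else starts a new 16-byte row like an array.
constexpr std::uint32_t place_matrix(std::uint32_t cursor, MatrixLayout m,
                                     std::uint32_t scalar_size) {
    if (m.vector_count > 1) return align_up(cursor, 16);
    const std::uint32_t offset = align_up(cursor, scalar_size);
    const bool crosses_row = offset / 16 != (offset + m.vector_size - 1) / 16;
    return crosses_row ? align_up(offset, 16) : offset;
}

// ExampleMatrices: float2 before; float2x3 mat; float2x1 smol_mat;
constexpr MatrixLayout kMat  = matrix_layout(2, 3, 4, false);  // 3 x float2
constexpr MatrixLayout kSmol = matrix_layout(2, 1, 4, false);  // 1 x float2
constexpr std::uint32_t kMatOffset  = place_matrix(8, kMat, 4);
constexpr std::uint32_t kSmolOffset =
    place_matrix(kMatOffset + matrix_size(kMat), kSmol, 4);
constexpr std::uint32_t kEnd = kSmolOffset + matrix_size(kSmol);

// Split version: three float2 columns at 8, 16, 24, then smol_mat at 32.
constexpr std::uint32_t kEndSplit = 8 + 3 * 8 + 8;

static_assert(kMatOffset == 16 && matrix_size(kMat) == 40, "");
static_assert(kSmolOffset == 56, "no forced row alignment for one column");
static_assert(kEnd == 64 && kEnd - kEndSplit == 24, "24 bytes saved");
```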

I realize that the term "row" from Rule #2 concerning the 16-byte boundaries within the buffer is now overloaded with matrix rows/columns, but what can you do.

4. Inner Structs

Rule #4: Inner/nested structs must start at a 16-byte aligned offset.

cbuffer ExampleInnerStructs { float2 before; struct Inner_t { float start; double d; float end; } inner; float3 after; };

Similar to array elements, inner struct variables are always forcibly aligned to start a new buffer row. As a result, we have some padding before the start of the struct in this example. All the previous rules still apply within the inner struct, in this case the self-alignment requirement for d dictates 4 bytes of padding so it can start at an offset divisible by 8.

Of particular note here is that in C (and structured buffers), a struct has a type alignment equivalent to the largest alignment requirement of any of its members. This is to ensure all members are automatically aligned properly when the struct is constructed at an address conforming to the struct's type alignment (also including arrays of structs). If we implemented this example in C, Inner_t would have an alignment of 8 because of the double sub-member and thus a size of 24 (the size must be a multiple of its type alignment, leading to 4 bytes of padding at the end inside the struct).

This is not the case in HLSL constant buffers. Since the start of any struct is always forcibly aligned to 16 bytes and 16 is larger than any possible type alignment requirement in HLSL, the concern of having members misaligned is nonexistent, and the struct type itself never gets padding at its end to account for some inherent type alignment (structs effectively have no type alignment of their own; it's all handled by the forced row alignment). In fact, no type inside a constant buffer can have padding at its start or end that would count towards its size; any padding is always in between variables or array elements. Thus, Inner_t has a size of 20 and after can fit snugly in the remaining 12 bytes of the last buffer row (remember that if it crossed a 16-byte row boundary, it would be pushed to start at the next row in accordance with Rule #2).
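The numbers in this example can be checked with simple round-up arithmetic, with Rule #4 forcing the struct onto a new row and no tail padding inside it. A minimal C++ sketch with hypothetical constant names; `kCSize` shows what a C compiler would use as the size of Inner_t on a target where `double` is 8-byte aligned.

```cpp
#include <cstdint>

// Hypothetical helper: round up to a power-of-two alignment.
constexpr std::uint32_t align_up(std::uint32_t offset, std::uint32_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Inner_t relative to its own start: float start; double d; float end;
constexpr std::uint32_t kDOffset   = align_up(4, 8);          // 8-aligned
constexpr std::uint32_t kEndOffset = kDOffset + 8;            // right after d
constexpr std::uint32_t kInnerSize = kEndOffset + 4;          // no tail padding
constexpr std::uint32_t kCSize     = align_up(kInnerSize, 8); // C-style size

// ExampleInnerStructs: float2 before; Inner_t inner; float3 after;
constexpr std::uint32_t kInnerOffset = align_up(8, 16);           // Rule #4
constexpr std::uint32_t kAfterOffset = kInnerOffset + kInnerSize;

static_assert(kDOffset == 8 && kEndOffset == 16, "");
static_assert(kInnerSize == 20 && kCSize == 24, "");
static_assert(kInnerOffset == 16, "before ends at 8, inner starts a new row");
// float3 after occupies bytes 36..47, all inside the row starting at 32,
// so Rule #2 does not push it any further.
static_assert(kAfterOffset / 16 == (kAfterOffset + 12 - 1) / 16, "");
```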

This also means that this particular struct layout is very hard to implement in the C or C++ type system. You would first have to align the struct variable (not the type!) and then use your compiler's equivalent of __attribute__((packed)) to forcibly remove the padding at the end of the struct, but then also add back any padding in between the sub-members. This ends up very brittle and confusing to read. It might be better to massage the HLSL side into something that is easier to translate to C (I will note again that the use of anything other than 4 byte types is rare in HLSL, but I do need to explain the rules regardless).
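A sketch of that brittle C++ approach for ExampleInnerStructs, assuming a compiler that supports `#pragma pack(push, 1)` / `#pragma pack(pop)` (MSVC, GCC and Clang do). Packing removes Inner_t's tail padding but also its natural padding, so the gap before `d` is re-added by hand as a hypothetical `pad0` member, and the outer member gets `alignas(16)`.

```cpp
#include <cstddef>
#include <cstdint>

// Inner_t without C's tail padding. Packing also removes the padding before
// d, so it is re-added manually.
#pragma pack(push, 1)
struct Inner_t {
    float         start;  // 0
    std::uint32_t pad0;   // 4: hypothetical, re-added padding before d
    double        d;      // 8
    float         end;    // 16
};                        // size 20, type alignment 1
#pragma pack(pop)

// Hypothetical CPU-side mirror: the variable (not the type) gets alignas(16).
struct ExampleInnerStructsCPU {
    float               before[2];  // float2 before, offset 0
    alignas(16) Inner_t inner;      // offset 16
    float               after[3];   // float3 after, offset 36
};

static_assert(sizeof(Inner_t) == 20, "");
static_assert(offsetof(Inner_t, d) == 8, "");
static_assert(offsetof(ExampleInnerStructsCPU, inner) == 16, "");
static_assert(offsetof(ExampleInnerStructsCPU, after) == 36, "");
```

The `double` inside is only 8-byte aligned because the enclosing member is 16-byte aligned; an Inner_t placed anywhere else is only guaranteed an alignment of 1, which is exactly the kind of fragility described above.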

Another consequence of this rule is that the memory layout within the inner struct is effectively independent of where it is in the buffer, because all the constant buffer rules work in alignments of 16 or smaller, so the struct layout is the same whether the start offset is 0 or any other integer multiple of 16. The mentioned "struct alignment equals the largest member alignment" rule for C similarly ensures the same layout independence of starting location, but in a more "fine-grained" way.

5. Sources and Methodology

I based the rules in this article on the following original sources:

In addition, I checked all the rules and possibly ambiguous cases using DXC and sometimes FXC. DXC is easily accessible via Compiler Explorer. To check a buffer's layout on Compiler Explorer, you must first use it in the shader function in some way (any contrived usage will do, it just needs to not be optimized out) and then disable filtering comments on the compiler output tab. Those comments show buffer layouts, resource bindings and a few other things. Also useful: you can check the actual GPU instructions emitted for AMD GPUs by switching to the RGA (Radeon GPU Analyzer) compiler (this might not be 100% accurate to live testing due to differences in driver versions and because Compiler Explorer passes the shader to RGA via the SPIR-V backend of DXC).

6. Addendum (2024-02-05): Structured Buffers and C Structs

I was originally not going to write about Structured Buffers, but I ended up adding support to the visualizer anyway, and all the relevant rules have already been briefly mentioned in the chapters above.

Rules for HLSL Structured Buffers:

  1. All scalar types are self-aligned, i.e. their type alignment requirement is equal to their size.
    Note: bool has a size of 4 bytes in HLSL.
  2. Vectors and matrices are aligned according to their scalar component type.
  3. Structures have a type alignment equal to the largest alignment among all of their members.
  4. The size of a type must be an integer multiple of its type alignment. Types not naturally conforming to this rule will be padded at the end to enforce it. You could say this is the "stride" of a type in an array.
    Note: There are no overaligned types (alignment > size) in HLSL and no way to create them (no alignas), so this only ever applies to structs.
  5. Array elements are instances of the element type and must therefore obey the above rules for their type alignment. Beyond that, arrays are packed tightly and do not impose any additional alignment rules or padding on top.

Overall it's quite simple compared to Constant Buffers. Arrays, vectors and matrices really just fall back to the scalar and struct alignment rules, the extra rules for them are not technically needed and are just here for clarity. Padding can only happen if a larger type follows a smaller type or at the end of a struct to pad its size to a multiple of its alignment.
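These rules translate almost directly into a small layout calculator. A minimal C++ sketch, where `Layout`, `struct_layout` and `array_layout` are hypothetical names and the example element `{ float3 v; uint16_t h; }` is made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: round up to a power-of-two alignment.
constexpr std::uint32_t align_up(std::uint32_t offset, std::uint32_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Size and type alignment of one member or of a whole struct.
struct Layout { std::uint32_t size; std::uint32_t alignment; };

// Rules 1-4: each member at the next multiple of its alignment, struct
// alignment = largest member alignment, size rounded up to that alignment.
template <std::size_t N>
constexpr Layout struct_layout(const Layout (&members)[N]) {
    std::uint32_t cursor = 0;
    std::uint32_t max_alignment = 1;
    for (std::size_t i = 0; i < N; ++i) {
        cursor = align_up(cursor, members[i].alignment) + members[i].size;
        if (members[i].alignment > max_alignment) {
            max_alignment = members[i].alignment;
        }
    }
    return Layout{align_up(cursor, max_alignment), max_alignment};
}

// Rule 5: arrays are packed tightly; the element size already is the stride.
constexpr Layout array_layout(Layout element, std::uint32_t count) {
    return Layout{element.size * count, element.alignment};
}

// Hypothetical element { float3 v; uint16_t h; } (with -enable-16bit-types).
constexpr Layout kFloat3{12, 4};
constexpr Layout kUint16{2, 2};
constexpr Layout kElementMembers[] = {kFloat3, kUint16};
constexpr Layout kElement = struct_layout(kElementMembers);

static_assert(kElement.size == 16 && kElement.alignment == 4,
              "14 bytes of members + 2 bytes of tail padding");
static_assert(array_layout(kElement, 3).size == 48, "no gaps between elements");
```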

These same rules generally also apply to structs in C, C++ and other languages with C-like packing (apart from the HLSL-specific notes). However, the size of some types may differ from platform to platform (e.g. int, float or bool) and sometimes types are not even self-aligned. The most prominent example for non-self-alignment would be 32-bit x86 Linux-based targets, where double has a size of 8, but only an alignment of 4. There are many other architectures with uncommon type sizes, formats or alignments, but they're usually embedded systems, 8- or 16-bit and/or DSPs, so not particularly relevant to look at for HLSL programming. Apart from potentially 32-bit x86 Linux, there are probably no platforms anyone would write HLSL code for where equivalently defined C structs and Structured Buffers differ in their memory layout, outside of the bool type being a different size. I will not discuss bitfields here because the rules differ by compiler, platform and language standard, and I'm not even sure how they match up with HLSL 2021; use them at your own risk.

Let's do one example covering the rules above:

struct ExampleSB { float before; struct Inner_t { float start; double lorg; uint16_t u[2]; } inner; float after; }; StructuredBuffer<ExampleSB> ex;

The double member of the inner struct in this example not only needs to be aligned to a multiple of 8 itself, it is also the member with the largest alignment within the struct. If we took the members out of the inner struct, start could be at offset 4, lorg could be at offset 8, etc. and we would in fact have no padding at all inside the entire buffer. However, as a consequence of the third rule for structured buffers, the inner struct is also assigned a type alignment of 8 and it must start at offset 8 here, despite its first member not needing that strong of an alignment. Thus, we need 4 bytes of padding before the struct and another 4 bytes before lorg to align them both correctly.

Arrays are thankfully tightly packed in this type of buffer, so we don't directly provoke any extra padding for u. Still, we need to pad out the inner struct to a multiple of its alignment, hence we do end up with another extra 4 bytes after the array. The "largest member alignment" rule also goes for the outer struct, so it too is allotted an alignment of 8 (from the inner member, which itself inherited it from lorg) and we need to pad out the very end of the struct. This is important because structured buffers are conceptually arrays of the type given in the template, so that padding at the end actually matters for the access stride.
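Since C++ uses the same rules on common targets, a C++ mirror of ExampleSB with `static_assert`s is a cheap way to double-check this analysis. A sketch, assuming a target where `double` is 8-byte aligned inside structs (so not 32-bit x86 Linux) and carrying the HLSL `uint16_t` array as `std::uint16_t`.

```cpp
#include <cstddef>
#include <cstdint>

// C++ mirror of the HLSL ExampleSB; comments give offsets and padding.
struct ExampleSB {
    float before;               // 0, then 4 bytes of padding
    struct Inner_t {
        float         start;    // 0 (8 within ExampleSB), then 4 bytes padding
        double        lorg;     // 8
        std::uint16_t u[2];     // 16, then 4 bytes of tail padding
    } inner;                    // 8, size 24, alignment 8
    float after;                // 32, then 4 bytes of tail padding
};

static_assert(alignof(ExampleSB::Inner_t) == 8, "assumes alignof(double) == 8");
static_assert(offsetof(ExampleSB, inner) == 8, "");
static_assert(offsetof(ExampleSB::Inner_t, lorg) == 8, "");
static_assert(sizeof(ExampleSB::Inner_t) == 24, "");
static_assert(offsetof(ExampleSB, after) == 32, "");
static_assert(sizeof(ExampleSB) == 40, "element stride of StructuredBuffer<ExampleSB>");
```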

This example is definitely an extreme case, usually there is very little if any padding at all in structured buffers given the common use cases and types used in HLSL, but since the rules for C/C++ are mostly the same, this type of analysis can still be useful. As always, the examples are for illustrative purposes.