HLSL buffer step and threads - what happens here?

Question

I am really new to DirectCompute and am trying to learn it from the documentation on the MSDN website, which is dense, to say the least.

I would like to create a basic HLSL file that takes a 4x4 matrix and a 4xN matrix and returns the multiplied result. But after spending some time playing with the code, I found some strange things that I don't understand, mainly with the way the threads process the buffers and the output.

In all of these examples, I pass in two buffers of 16 floats each, read back one buffer of 16 floats, and dispatch with a 4x1x1 grouping. I can show you the code, but I honestly don't know which part would help. Let me know if there is a section of my C++ code that you want to see.

With the following code:

    StructuredBuffer<float4x4> base_matrix : register(t0);     // byteWidth = 64
    StructuredBuffer<float4> extended_matrix : register(t1);   // byteWidth = 64
    RWStructuredBuffer<float4> BufferOut : register(u0);       // byteWidth = 64, zeroed out before reading from the GPU

    [numthreads(1, 1, 1)]
    void CSMain( uint3 DTid : SV_DispatchThreadID )
    {
        BufferOut[DTid.x].x = 1;
    }

I get the following values:

 1.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000

This makes sense to me: the dispatch runs 4 thread groups of one thread each, and each thread writes to the x component of one float4 in the output buffer.
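The mapping from a dispatch to buffer indices can be checked on the CPU. The sketch below is only an illustration under stated assumptions: `Float4` and `simulateDispatchX` are hypothetical names, not D3D11 types or calls, and the out-of-range handling is simplified to skipping the write. It emulates a dispatch of 4x1x1 groups with `[numthreads(1, 1, 1)]`, where `SV_DispatchThreadID.x` equals the group index times the group size plus the thread index inside the group, and runs the body `BufferOut[DTid.x].x = 1`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU stand-in for an HLSL float4 (hypothetical helper, not a D3D type).
struct Float4 { float x, y, z, w; };

// Emulates Dispatch(groupsX, 1, 1) with [numthreads(threadsX, 1, 1)].
// Each simulated thread runs the body of the first shader: BufferOut[DTid.x].x = 1.
std::vector<Float4> simulateDispatchX(unsigned groupsX, unsigned threadsX,
                                      std::size_t elementCount)
{
    assert(threadsX > 0 && "numthreads dimensions must be at least 1");

    // The result buffer starts zeroed, as in the question.
    std::vector<Float4> bufferOut(elementCount, Float4{0.0f, 0.0f, 0.0f, 0.0f});

    for (unsigned group = 0; group < groupsX; ++group) {
        for (unsigned thread = 0; thread < threadsX; ++thread) {
            // SV_DispatchThreadID.x = group index * group size + index in group.
            const unsigned dtidX = group * threadsX + thread;
            if (dtidX >= bufferOut.size())
                continue;  // simplification: skip writes past the end of the buffer
            bufferOut[dtidX].x = 1.0f;
        }
    }
    return bufferOut;
}
```

Calling `simulateDispatchX(4, 1, 4)` yields four elements, each with x set to 1 and the other components left at 0, which is the 1 0 0 0 pattern repeated four times in the output above.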

With the following code:

    StructuredBuffer<float4x4> base_matrix : register(t0);     // byteWidth = 64
    StructuredBuffer<float4> extended_matrix : register(t1);   // byteWidth = 64
    RWStructuredBuffer<float4> BufferOut : register(u0);       // byteWidth = 64, zeroed out before reading from the GPU

    [numthreads(1, 1, 1)]
    void CSMain( uint3 DTid : SV_DispatchThreadID )
    {
        BufferOut[DTid.x].x = 1;
        BufferOut[DTid.x].y = 2;
        BufferOut[DTid.x].z = 3;
        BufferOut[DTid.x].w = 4;
    }

I get the following values:

 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000

And with the actual code I want to run:

    StructuredBuffer<float4x4> base_matrix : register(t0);
    StructuredBuffer<float4> extended_matrix : register(t1);
    RWStructuredBuffer<float4> BufferOut : register(u0);

    [numthreads(1, 1, 1)]
    void CSMain( uint3 DTid : SV_DispatchThreadID )
    {
        // One output column per thread: the 4x4 base matrix times column DTid.x.
        BufferOut[DTid.x] = mul(base_matrix[0], extended_matrix[DTid.x]);
    }

I get the following values:

 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
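One way to tell a math problem apart from a binding or readback problem is to compute the expected values on the CPU and compare them with what comes back from the GPU. The following is a minimal sketch, assuming the matrix is indexed as `m[row][col]` and the vector is treated as a column vector, which is how HLSL `mul(matrix, vector)` behaves. `Vec4`, `Mat4`, `mulMatVec`, and `referenceOutput` are hypothetical names introduced for illustration. How the 16 floats uploaded from C++ map onto rows and columns depends on the matrix packing in the shader (column-major by default in HLSL, as far as I know), so that mapping is an assumption to verify against your own data.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // indexed as m[row][col] (assumption)

// Mirrors HLSL mul(matrix, vector): result[row] = dot(row of m, v).
Vec4 mulMatVec(const Mat4& m, const Vec4& v)
{
    Vec4 result{};  // zero-initialized accumulator
    for (std::size_t row = 0; row < 4; ++row)
        for (std::size_t col = 0; col < 4; ++col)
            result[row] += m[row][col] * v[col];
    return result;
}

// One output element per input column, like one thread per DTid.x.
std::vector<Vec4> referenceOutput(const Mat4& base, const std::vector<Vec4>& extended)
{
    std::vector<Vec4> out;
    out.reserve(extended.size());
    for (const Vec4& column : extended)
        out.push_back(mulMatVec(base, column));
    return out;
}
```

If the reference output is non-zero while the GPU buffer still reads back as all zeros, the multiplication itself is unlikely to be the culprit, and the shader compilation, resource views, or readback path would be the next things to inspect.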

I can tell that I'm missing something critical here, but for the life of me I can't find the relevant documentation telling me how this works. Can someone help me understand what is going on in this code?

Thank you for your time,

Zach

As a further note, this code was pared down from the Microsoft DirectX SDK (June 2010) sample at \Samples\C++\Direct3D11\BasicCompute11. If I'm doing something terribly wrong, feel free to let me know. I am REALLY new to HLSL.

Edit: the code that creates my buffers.

    CreateStructuredBuffer( g_pDevice, sizeof(float)*16, 1, g_matrix, &g_pBuf0 );
    CreateStructuredBuffer( g_pDevice, sizeof(float)*4, NUM_ELEMENTS, g_extended_matrix, &g_pBuf1 );
    CreateStructuredBuffer( g_pDevice, sizeof(float)*4, NUM_ELEMENTS, NULL, &g_pBufResult );

    //--------------------------------------------------------------------------------------
    // Create Structured Buffer
    //--------------------------------------------------------------------------------------
    HRESULT CreateStructuredBuffer( ID3D11Device* pDevice, UINT uElementSize, UINT uCount,
                                    VOID* pInitData, ID3D11Buffer** ppBufOut )
    {
        *ppBufOut = NULL;

        D3D11_BUFFER_DESC desc;
        ZeroMemory( &desc, sizeof(desc) );
        desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
        desc.ByteWidth = uElementSize * uCount;
        desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
        desc.StructureByteStride = uElementSize;

        if ( pInitData )
        {
            D3D11_SUBRESOURCE_DATA InitData;
            InitData.pSysMem = pInitData;
            return pDevice->CreateBuffer( &desc, &InitData, ppBufOut );
        }
        else
            return pDevice->CreateBuffer( &desc, NULL, ppBufOut );
    }
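The sizes passed to `CreateStructuredBuffer` can be sanity-checked with plain arithmetic. This is a small sketch, assuming `NUM_ELEMENTS` is 4 (inferred from the "byteWidth = 64" comments in the shader, not stated in the post) and that `float` is 4 bytes. `BufferLayout`, `describe`, `isValidStride`, and `kNumElements` are hypothetical names. The stride rule in `isValidStride` (a non-zero multiple of 4, at most 2048 bytes) reflects my understanding of the D3D11 structured-buffer limits and should be confirmed against the documentation.

```cpp
#include <cstddef>

// Stride, element count, and total size of one structured buffer.
struct BufferLayout {
    std::size_t stride;     // maps to StructureByteStride
    std::size_t count;      // number of elements
    std::size_t byteWidth;  // maps to ByteWidth = stride * count
};

constexpr BufferLayout describe(std::size_t stride, std::size_t count)
{
    return BufferLayout{stride, count, stride * count};
}

// Assumed D3D11 limits for StructureByteStride (verify against the docs).
constexpr bool isValidStride(std::size_t stride)
{
    return stride > 0 && stride % 4 == 0 && stride <= 2048;
}

constexpr std::size_t kNumElements = 4;  // assumption: NUM_ELEMENTS in the question

constexpr BufferLayout kBaseMatrix     = describe(sizeof(float) * 16, 1);            // float4x4
constexpr BufferLayout kExtendedMatrix = describe(sizeof(float) * 4, kNumElements);  // float4 columns
constexpr BufferLayout kResult         = describe(sizeof(float) * 4, kNumElements);  // float4 outputs

// All three buffers come out at 64 bytes, matching the shader comments.
static_assert(kBaseMatrix.byteWidth == 64, "one float4x4 is 64 bytes");
static_assert(kExtendedMatrix.byteWidth == 64, "four float4 values are 64 bytes");
static_assert(kResult.byteWidth == 64, "four float4 values are 64 bytes");
static_assert(isValidStride(kBaseMatrix.stride) && isValidStride(kResult.stride),
              "strides must satisfy the assumed limits");
```

The stride has to match the element type declared in the shader: 64 bytes for `StructuredBuffer<float4x4>` and 16 bytes for `StructuredBuffer<float4>`, which is what the three calls above pass.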

An attempt with .1, .2, .3, .4:

    StructuredBuffer<float4x4> base_matrix : register(t0);
    StructuredBuffer<float4> extended_matrix : register(t1);
    StructuredBuffer<uint> loop_multiplier : register(t2);
    RWStructuredBuffer<float4> BufferOut : register(u0);

    [numthreads(1, 1, 1)]
    void CSMain( uint3 DTid : SV_DispatchThreadID )
    {
        BufferOut[DTid.x].x = .1;
        BufferOut[DTid.x].y = .2;
        BufferOut[DTid.x].z = .3;
        BufferOut[DTid.x].w = .4;
    }

I got this:

 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100

c++ directx directx-11 hlsl directcompute

Zach h 12 sept '12 at 20:51

1 answer

F.Eazism · Answer 1 · 2017-09-18T14:50:58+0000

I tried this my own way and got the correct result. I can't add a comment because my reputation is too low, so here is my code.

HLSL:

    RWStructuredBuffer<float4> Output : register(u0);

    [numthreads(1, 1, 1)]
    void main(uint3 DTid : SV_DispatchThreadID)
    {
        // Skip any thread whose index falls outside the 4-element buffer.
        if (DTid.x >= 4)
            return;

        Output[DTid.x].x = 1.f;
        Output[DTid.x].y = 2.f;
        Output[DTid.x].z = 3.f;
        Output[DTid.x].w = 4.f;
    }

C++:

    // Hardware, WinSystem, SharedComPtr, ShaderResourceView, CompilerShader and
    // BufferSystem are project-specific helper classes, not part of the D3D11 API.
    #define PathName L"C:\\Users\\e\\Desktop\\D3D_Reseach\\RenderPro\\x64\\Debug\\ComputeShader.cso"

    struct Buffer
    {
        XMFLOAT4 Test;
    };

    int APIENTRY wWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPTSTR lpCmdLine, int nCmdShow)
    {
        Hardware HardWare;
        WinSystem Win;
        Win.CreateWindows(HardWare, 400, 300);
        ShowWindow(Win.hwnd, SW_HIDE);

        // UAV: a structured buffer of 4 elements, one XMFLOAT4 each
        SharedComPtr<ID3D11UnorderedAccessView> Resource;
        SharedComPtr<ID3D11Buffer> _Buffer;
        ShaderResourceView::STRUCT_BUUFER_DESC Desc;
        Desc.ACCESS = 0;
        Desc.BIND = D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_UNORDERED_ACCESS;
        Desc.FORMAT = DXGI_FORMAT_UNKNOWN;
        Desc.HasScr = false;
        Desc.MISC = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
        Desc.USAGE = D3D11_USAGE_DEFAULT;
        Desc.ByteWidth = 4 * sizeof(Buffer);
        Desc.StructureByteStride = sizeof(Buffer);
        Desc.UAV_OR_SRV = ShaderResourceView::UAV;
        ShaderResourceView::CreateStructBuffer(HardWare.GetD3DDevice(), Desc, nullptr,
            Resource.GetTwoLevel(), _Buffer.GetTwoLevel(), true);

        // Load the precompiled compute shader
        SharedComPtr<ID3D11ComputeShader> ComputerSahder;
        SharedComPtr<ID3DBlob> Blob;
        WCHAR *Name = PathName;
        CompilerShader::CompileShaderFromBinary(ComputerSahder.GetTwoLevel(), Name,
            HardWare.GetD3DDevice(), Blob.GetTwoLevel(), CompilerShader::ShaderFlag::ComputeShader);

        // Bind the UAV and the shader, then run 4 groups of 1 thread each
        HardWare.GetDeviceContext()->CSSetUnorderedAccessViews(0, 1, Resource.GetTwoLevel(), 0);
        HardWare.GetDeviceContext()->CSSetShader(ComputerSahder.Get(), 0, 0);
        HardWare.GetDeviceContext()->Dispatch(4, 1, 1);

        // Copy the result into a staging buffer so the CPU can read it
        Buffer Hy[4];
        VOID *P = Hy;
        ID3D11Buffer* pBuffer;
        BufferSystem::CreateConstanceBuffer(HardWare.GetD3DDevice(), P, pBuffer,
            Desc.ByteWidth, D3D11_USAGE_STAGING);
        HardWare.GetDeviceContext()->CopyResource(pBuffer, _Buffer.Get());

        D3D11_MAPPED_SUBRESOURCE Data;
        HardWare.GetDeviceContext()->Map(pBuffer, 0, D3D11_MAP_READ, 0, &Data);
        Buffer *PP = reinterpret_cast<Buffer*>(Data.pData);

        // Step through each element in the debugger to inspect its components
        for (UINT i = 0; i < 4; ++i)
        {
            float a = PP[i].Test.x;
            a = PP[i].Test.y;
            a = PP[i].Test.z;
            a = PP[i].Test.w;
        }

        return 0;
    }
