How to calculate the frequency of CPU cores

I am trying to use RDTSC, but my approach to measuring the core speed seems to be wrong:

    #include "stdafx.h"
    #include <windows.h>
    #include <process.h>
    #include <intrin.h>     // __rdtsc
    #include <cmath>        // pow
    #include <iostream>

    using namespace std;

    struct Core
    {
        int CoreNumber;
    };

    static void startMonitoringCoreSpeeds(void *param)
    {
        Core core = *((Core *)param);
        // Pin this thread to a single core. Shift a DWORD_PTR so the mask
        // does not overflow a 32-bit int on machines with many cores.
        SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)1 << core.CoreNumber);
        while (true)
        {
            DWORD64 first = __rdtsc();
            Sleep(1000);
            DWORD64 second = __rdtsc();
            // TSC ticks counted over roughly one second, scaled to MHz.
            cout << "Core " << core.CoreNumber << " has frequency " << ((second - first)*pow(10, -6)) << " MHz" << endl;
        }
    }

    int GetNumberOfProcessorCores()
    {
        DWORD_PTR process, system;  // GetProcessAffinityMask expects PDWORD_PTR
        if (GetProcessAffinityMask(GetCurrentProcess(), &process, &system))
        {
            int count = 0;
            for (int i = 0; i < (int)(sizeof(DWORD_PTR) * 8); i++)
            {
                if (system & ((DWORD_PTR)1 << i))
                {
                    count++;
                }
            }
            return count;
        }
        SYSTEM_INFO sysinfo;
        GetSystemInfo(&sysinfo);
        return sysinfo.dwNumberOfProcessors;
    }

    int _tmain(int argc, _TCHAR* argv[])
    {
        for (int i = 0; i < GetNumberOfProcessorCores(); i++)
        {
            Core *core = new Core {0};
            core->CoreNumber = i;
            _beginthread(startMonitoringCoreSpeeds, 0, core);
        }
        cin.get();
    }

It always prints around 3.3 GHz, which cannot be right, because Turbo Boost kicks in from time to time and my cores definitely jump to 4.3 GHz. Let me cross-reference some of the articles behind this idea.

First ( http://users.utcluj.ro/~ancapop/labscs/SCS2.pdf ): "The TSCs on the processor's cores are not synchronized. So it is not sure that if a process migrates during execution from one core to another, the measurement will not be affected. To avoid this problem, the measured process's affinity has to be set to just one core, to prevent process migration." This tells me that RDTSC should return a different value for each core my thread runs on, given the affinity mask I have set, which is great.
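To make the affinity-mask idea concrete, here is a minimal, portable sketch of the bit arithmetic involved. The helper names `singleCoreMask` and `countCores` are made up for illustration and are not part of any Windows API; the sketch assumes a 64-bit mask, which matches `DWORD_PTR` on 64-bit Windows.

```cpp
#include <cstdint>

// Hypothetical helper: a mask with only the bit for `core` set.
// The shift is done on a 64-bit value so cores 31 and above do not overflow,
// which a plain `1 << core` on a 32-bit int would.
std::uint64_t singleCoreMask(unsigned core)
{
    return core < 64 ? (std::uint64_t{1} << core) : 0;
}

// Hypothetical helper: the number of set bits in an affinity mask,
// i.e. the number of logical processors the mask allows.
unsigned countCores(std::uint64_t mask)
{
    unsigned count = 0;
    while (mask != 0)
    {
        mask &= mask - 1;   // clear the lowest set bit
        ++count;
    }
    return count;
}
```

On Windows, the value from `singleCoreMask` would be what gets passed (as a `DWORD_PTR`) to `SetThreadAffinityMask`, and `countCores` would be applied to the system mask returned by `GetProcessAffinityMask`.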

Secondly, and please check this article ( http://randomascii.wordpress.com/2011/07/29/rdtsc-in-the-age-of-sandybridge/ ): "If you need a consistent timer that works across cores and can be used to measure time then this is good news. If you want to measure actual CPU clock cycles then you are out of luck. If you want consistency across a wide range of CPU families then it sucks to be you. Update: section 16.11 of the Intel System Programming Guide documents this behavior of the Time-Stamp Counter. Roughly speaking it says that on older processors the clock rate changes, but on newer processors it remains uniform. It finishes by saying, of Constant TSC, 'This is the architectural behavior moving forward.'" Okay, this tells me that RDTSC stays consistent, which makes my results above make sense, since my CPU cores are rated at a standard 3.3 GHz...

Which REALLY raises the question: how do applications such as Intel's Turbo Boost Technology Monitor, Piriform's Speccy and CPUID's CPU-Z measure a processor's clock speed in real time while it is under Turbo Boost?

c++ performance visual-c++ winapi rdtsc
1 answer

What follows is a complete solution. I adapted the IOCTL sample driver on MSDN to do this. Note that the IOCTL sample is the only relevant WDM skeleton driver I could find, and also the closest thing I could find to a WDM template: most kernel-mode templates that ship with the WDK are WDF-based (the WDM driver template is actually blank, with absolutely no source code), yet the only sample logic I have seen for this kind of input/output goes through a WDM-based driver.

Some fun facts I learned along the way: kernel drivers do not like floating-point arithmetic, and you cannot use "windows.h", which really limits you to "ntddk.h", a special kernel-mode header. This also means I cannot do all of my calculations in kernel mode, because I cannot call functions such as QueryPerformanceFrequency there. So I read the performance ratio at each of the two timestamps and return the values to user mode, where they are averaged and used in the rest of the calculation. (Without QueryPerformanceFrequency, the tick values you get from CPU registers, like the ones QueryPerformanceCounter uses, are useless because you do not know the step size. Maybe there is a workaround for this, but I opted to just use the average, since it works pretty well.)

As for the one-second sleep: I use it because otherwise you end up practically spinning on multiple threads, which really skews your calculations, since constantly polling QueryPerformanceCounter drives the frequency of each core up (the cores clock up as you do more computation). Not to mention that the result is a ratio, so the exact delta time is not that important: it is cycles per unit of time. You can always increase the delta, and it should still give you the same ratio relative to the step size.

This is as minimalistic as I could make it. Good luck making it much smaller or shorter than this.

Also, if you want to install the driver, you have two options unless you want to buy a code-signing certificate from a third party. Both suck, so pick one and live with it. Let's start with the driver:

driver.c

    //
    // Include files.
    //

    #include <ntddk.h>          // various NT definitions
    #include <string.h>
    #include <intrin.h>

    #include "driver.h"

    #define NT_DEVICE_NAME      L"\\Device\\KernelModeDriver"
    #define DOS_DEVICE_NAME     L"\\DosDevices\\KernelModeDriver"

    #if DBG
    #define DRIVER_PRINT(_x_) \
                    DbgPrint("KernelModeDriver.sys: ");\
                    DbgPrint _x_;
    #else
    #define DRIVER_PRINT(_x_)
    #endif

    //
    // Device driver routine declarations.
    //

    DRIVER_INITIALIZE DriverEntry;

    _Dispatch_type_(IRP_MJ_CREATE)
    _Dispatch_type_(IRP_MJ_CLOSE)
    DRIVER_DISPATCH DriverCreateClose;

    _Dispatch_type_(IRP_MJ_DEVICE_CONTROL)
    DRIVER_DISPATCH DriverDeviceControl;

    DRIVER_UNLOAD DriverUnloadDriver;

    VOID
    PrintIrpInfo(
        PIRP Irp
        );

    VOID
    PrintChars(
        _In_reads_(CountChars) PCHAR BufferAddress,
        _In_ size_t CountChars
        );

    #ifdef ALLOC_PRAGMA
    #pragma alloc_text( INIT, DriverEntry )
    #pragma alloc_text( PAGE, DriverCreateClose)
    #pragma alloc_text( PAGE, DriverDeviceControl)
    #pragma alloc_text( PAGE, DriverUnloadDriver)
    #pragma alloc_text( PAGE, PrintIrpInfo)
    #pragma alloc_text( PAGE, PrintChars)
    #endif // ALLOC_PRAGMA

    NTSTATUS
    DriverEntry(
        _In_ PDRIVER_OBJECT   DriverObject,
        _In_ PUNICODE_STRING  RegistryPath
        )
    /*++

    Routine Description:
        This routine is called by the Operating System to initialize the driver.
        It creates the device object, fills in the dispatch entry points and
        completes the initialization.

    Arguments:
        DriverObject - a pointer to the object that represents this device driver.
        RegistryPath - a pointer to our Services key in the registry.

    Return Value:
        STATUS_SUCCESS if initialized; an error otherwise.

    --*/
    {
        NTSTATUS        ntStatus;
        UNICODE_STRING  ntUnicodeString;    // NT Device Name "\Device\KernelModeDriver"
        UNICODE_STRING  ntWin32NameString;  // Win32 Name "\DosDevices\KernelModeDriver"
        PDEVICE_OBJECT  deviceObject = NULL;    // ptr to device object

        UNREFERENCED_PARAMETER(RegistryPath);

        RtlInitUnicodeString( &ntUnicodeString, NT_DEVICE_NAME );

        ntStatus = IoCreateDevice(
            DriverObject,                   // Our Driver Object
            0,                              // We don't use a device extension
            &ntUnicodeString,               // Device name "\Device\KernelModeDriver"
            FILE_DEVICE_UNKNOWN,            // Device type
            FILE_DEVICE_SECURE_OPEN,        // Device characteristics
            FALSE,                          // Not an exclusive device
            &deviceObject );                // Returned ptr to Device Object

        if ( !NT_SUCCESS( ntStatus ) )
        {
            DRIVER_PRINT(("Couldn't create the device object\n"));
            return ntStatus;
        }

        //
        // Initialize the driver object with this driver entry points.
        //

        DriverObject->MajorFunction[IRP_MJ_CREATE] = DriverCreateClose;
        DriverObject->MajorFunction[IRP_MJ_CLOSE] = DriverCreateClose;
        DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = DriverDeviceControl;
        DriverObject->DriverUnload = DriverUnloadDriver;

        //
        // Initialize a Unicode String containing the Win32 name
        // for our device.
        //

        RtlInitUnicodeString( &ntWin32NameString, DOS_DEVICE_NAME );

        //
        // Create a symbolic link between our device name and the Win32 name
        //

        ntStatus = IoCreateSymbolicLink( &ntWin32NameString, &ntUnicodeString );

        if ( !NT_SUCCESS( ntStatus ) )
        {
            //
            // Delete everything that this routine has allocated.
            //
            DRIVER_PRINT(("Couldn't create symbolic link\n"));
            IoDeleteDevice( deviceObject );
        }

        return ntStatus;
    }

    NTSTATUS
    DriverCreateClose(
        PDEVICE_OBJECT DeviceObject,
        PIRP Irp
        )
    /*++

    Routine Description:
        This routine is called by the I/O system when the KernelModeDriver is
        opened or closed. No action is performed other than completing the
        request successfully.

    Arguments:
        DeviceObject - a pointer to the object that represents the device
        that I/O is to be done on.
        Irp - a pointer to the I/O Request Packet for this request.

    Return Value:
        NT status code

    --*/
    {
        UNREFERENCED_PARAMETER(DeviceObject);

        PAGED_CODE();

        Irp->IoStatus.Status = STATUS_SUCCESS;
        Irp->IoStatus.Information = 0;

        IoCompleteRequest( Irp, IO_NO_INCREMENT );

        return STATUS_SUCCESS;
    }

    VOID
    DriverUnloadDriver(
        _In_ PDRIVER_OBJECT DriverObject
        )
    /*++

    Routine Description:
        This routine is called by the I/O system to unload the driver.
        Any resources previously allocated must be freed.

    Arguments:
        DriverObject - a pointer to the object that represents our driver.

    Return Value:
        None

    --*/
    {
        PDEVICE_OBJECT deviceObject = DriverObject->DeviceObject;
        UNICODE_STRING uniWin32NameString;

        PAGED_CODE();

        //
        // Create counted string version of our Win32 device name.
        //

        RtlInitUnicodeString( &uniWin32NameString, DOS_DEVICE_NAME );

        //
        // Delete the link from our device name to a name in the Win32 namespace.
        //

        IoDeleteSymbolicLink( &uniWin32NameString );

        if ( deviceObject != NULL )
        {
            IoDeleteDevice( deviceObject );
        }
    }

    NTSTATUS
    DriverDeviceControl(
        PDEVICE_OBJECT DeviceObject,
        PIRP Irp
        )
    /*++

    Routine Description:
        This routine is called by the I/O system to perform a device I/O
        control function.

    Arguments:
        DeviceObject - a pointer to the object that represents the device
        that I/O is to be done on.
        Irp - a pointer to the I/O Request Packet for this request.

    Return Value:
        NT status code

    --*/
    {
        PIO_STACK_LOCATION  irpSp;                      // Pointer to current stack location
        NTSTATUS            ntStatus = STATUS_SUCCESS;  // Assume success
        ULONG               inBufLength;                // Input buffer length
        ULONG               outBufLength;               // Output buffer length
        void                *inBuf;                     // pointer to input buffer
        unsigned __int64    *outBuf;                    // pointer to the output buffer

        UNREFERENCED_PARAMETER(DeviceObject);

        PAGED_CODE();

        irpSp = IoGetCurrentIrpStackLocation( Irp );
        inBufLength = irpSp->Parameters.DeviceIoControl.InputBufferLength;
        outBufLength = irpSp->Parameters.DeviceIoControl.OutputBufferLength;

        if (!inBufLength || !outBufLength || outBufLength != sizeof(unsigned __int64)*2)
        {
            ntStatus = STATUS_INVALID_PARAMETER;
            goto End;
        }

        //
        // Determine which I/O control code was specified.
        //

        switch ( irpSp->Parameters.DeviceIoControl.IoControlCode )
        {
        case IOCTL_SIOCTL_METHOD_BUFFERED:
        {
            //
            // In this method the I/O manager allocates a buffer large enough
            // to accommodate the larger of the user input buffer and output
            // buffer, assigns the address to Irp->AssociatedIrp.SystemBuffer,
            // and copies the content of the user input buffer into this
            // SystemBuffer.
            //

            // Two 64-bit values are returned: APERF, then MPERF.
            // (The original declared sizeof(unsigned __int64) * 2 elements.)
            unsigned __int64 data[2];

            DRIVER_PRINT(("Called IOCTL_SIOCTL_METHOD_BUFFERED\n"));

            PrintIrpInfo(Irp);

            //
            // Input buffer and output buffer is same in this case, read the
            // content of the buffer before writing to it
            //

            inBuf = (void *)Irp->AssociatedIrp.SystemBuffer;
            outBuf = (unsigned __int64 *)Irp->AssociatedIrp.SystemBuffer;

            //
            // Read the data from the buffer
            //

            DRIVER_PRINT(("\tData from User :"));

            //
            // We are using the following function to print characters instead
            // of DebugPrint with %s format because the string we get may or
            // may not be null terminated.
            //
            PrintChars(inBuf, inBufLength);

            //
            // Write to the buffer
            //

            data[0] = __readmsr(232);   // 0xE8: IA32_APERF
            data[1] = __readmsr(231);   // 0xE7: IA32_MPERF

            // The values are 64-bit, so print them with %I64u rather than %d.
            DRIVER_PRINT(("data[0]: %I64u\n", data[0]));
            DRIVER_PRINT(("data[1]: %I64u\n", data[1]));

            RtlCopyBytes(outBuf, data, outBufLength);

            //
            // Assign the length of the data copied to IoStatus.Information
            // of the Irp and complete the Irp.
            //

            Irp->IoStatus.Information = sizeof(unsigned __int64)*2;

            //
            // When the Irp is completed the content of the SystemBuffer
            // is copied to the User output buffer and the SystemBuffer is
            // freed.
            //

            break;
        }

        default:

            //
            // The specified I/O control code is unrecognized by this driver.
            //

            ntStatus = STATUS_INVALID_DEVICE_REQUEST;
            DRIVER_PRINT(("ERROR: unrecognized IOCTL %x\n",
                irpSp->Parameters.DeviceIoControl.IoControlCode));
            break;
        }

    End:
        //
        // Finish the I/O operation by simply completing the packet and returning
        // the same status as in the packet itself.
        //

        Irp->IoStatus.Status = ntStatus;

        IoCompleteRequest( Irp, IO_NO_INCREMENT );

        return ntStatus;
    }

    VOID
    PrintIrpInfo(
        PIRP Irp)
    {
        PIO_STACK_LOCATION  irpSp;
        irpSp = IoGetCurrentIrpStackLocation( Irp );

        PAGED_CODE();

        DRIVER_PRINT(("\tIrp->AssociatedIrp.SystemBuffer = 0x%p\n",
            Irp->AssociatedIrp.SystemBuffer));
        DRIVER_PRINT(("\tIrp->UserBuffer = 0x%p\n", Irp->UserBuffer));
        DRIVER_PRINT(("\tirpSp->Parameters.DeviceIoControl.Type3InputBuffer = 0x%p\n",
            irpSp->Parameters.DeviceIoControl.Type3InputBuffer));
        DRIVER_PRINT(("\tirpSp->Parameters.DeviceIoControl.InputBufferLength = %d\n",
            irpSp->Parameters.DeviceIoControl.InputBufferLength));
        DRIVER_PRINT(("\tirpSp->Parameters.DeviceIoControl.OutputBufferLength = %d\n",
            irpSp->Parameters.DeviceIoControl.OutputBufferLength ));
        return;
    }

    VOID
    PrintChars(
        _In_reads_(CountChars) PCHAR BufferAddress,
        _In_ size_t CountChars
        )
    {
        PAGED_CODE();

        if (CountChars)
        {
            while (CountChars--)
            {
                if (*BufferAddress > 31 && *BufferAddress != 127)
                {
                    KdPrint (( "%c", *BufferAddress) );
                }
                else
                {
                    KdPrint(( ".") );
                }
                BufferAddress++;
            }
            KdPrint (("\n"));
        }
        return;
    }

driver.h

    //
    // Device type           -- in the "User Defined" range."
    //
    #define SIOCTL_TYPE 40000

    //
    // The IOCTL function codes from 0x800 to 0xFFF are for customer use.
    //
    #define IOCTL_SIOCTL_METHOD_IN_DIRECT \
        CTL_CODE( SIOCTL_TYPE, 0x900, METHOD_IN_DIRECT, FILE_ANY_ACCESS )

    #define IOCTL_SIOCTL_METHOD_OUT_DIRECT \
        CTL_CODE( SIOCTL_TYPE, 0x901, METHOD_OUT_DIRECT , FILE_ANY_ACCESS )

    #define IOCTL_SIOCTL_METHOD_BUFFERED \
        CTL_CODE( SIOCTL_TYPE, 0x902, METHOD_BUFFERED, FILE_ANY_ACCESS )

    #define IOCTL_SIOCTL_METHOD_NEITHER \
        CTL_CODE( SIOCTL_TYPE, 0x903, METHOD_NEITHER , FILE_ANY_ACCESS )

    #define DRIVER_FUNC_INSTALL     0x01
    #define DRIVER_FUNC_REMOVE      0x02

    #define DRIVER_NAME "ReadMSRDriver"
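For readers wondering what the `CTL_CODE` macro actually produces, here is a small portable sketch of the bit packing it performs (device type in bits 16 and up, access in bits 14 to 15, function code in bits 2 to 13, transfer method in bits 0 to 1). `makeIoctl` and the `k...` constants are illustrative names only; the real macro and constants come from the Windows headers.

```cpp
#include <cstdint>

// Values mirrored from the Windows headers for illustration.
constexpr std::uint32_t kMethodBuffered = 0;    // METHOD_BUFFERED
constexpr std::uint32_t kMethodNeither  = 3;    // METHOD_NEITHER
constexpr std::uint32_t kFileAnyAccess  = 0;    // FILE_ANY_ACCESS

// Hypothetical stand-in for CTL_CODE: packs the four fields into one 32-bit code.
constexpr std::uint32_t makeIoctl(std::uint32_t deviceType, std::uint32_t function,
                                  std::uint32_t method, std::uint32_t access)
{
    return (deviceType << 16) | (access << 14) | (function << 2) | method;
}

// The control code used by the driver: SIOCTL_TYPE 40000, function 0x902.
constexpr std::uint32_t kBufferedIoctl =
    makeIoctl(40000, 0x902, kMethodBuffered, kFileAnyAccess);

static_assert(kBufferedIoctl == 0x9C402408u, "unexpected IOCTL encoding");
```

Because the driver and the console application both include driver.h, they agree on this value; the driver's `switch` compares the incoming `IoControlCode` against it.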

Now here is the application that loads and uses the driver (Win32 console application):

FrequencyCalculator.cpp

    #include "stdafx.h"
    #include <iostream>
    #include <windows.h>
    #include <winioctl.h>
    #include <tchar.h>
    #include <intrin.h>     // __rdtsc
    #include <math.h>       // pow
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <strsafe.h>
    #include <process.h>
    #include "..\KernelModeDriver\driver.h"

    using namespace std;

    BOOLEAN
    ManageDriver(
        _In_ LPCTSTR  DriverName,
        _In_ LPCTSTR  ServiceName,
        _In_ USHORT   Function
        );

    HANDLE hDevice;
    TCHAR driverLocation[MAX_PATH];

    void InstallDriver()
    {
        DWORD errNum = 0;
        GetCurrentDirectory(MAX_PATH, driverLocation);
        _tcscat_s(driverLocation, _T("\\KernelModeDriver.sys"));

        std::wcout << "Trying to install driver at " << driverLocation << std::endl;

        //
        // open the device
        //

        if ((hDevice = CreateFile(_T("\\\\.\\KernelModeDriver"),
            GENERIC_READ | GENERIC_WRITE,
            0,
            NULL,
            CREATE_ALWAYS,
            FILE_ATTRIBUTE_NORMAL,
            NULL)) == INVALID_HANDLE_VALUE)
        {
            errNum = GetLastError();

            if (errNum != ERROR_FILE_NOT_FOUND)
            {
                // Any error other than "not found" is unexpected, so report it and stop.
                printf("CreateFile failed! Error = %d\n", errNum);
                return;
            }

            //
            // The driver is not started yet so let us install the driver.
            // First setup full path to driver name.
            //

            if (!ManageDriver(_T(DRIVER_NAME),
                driverLocation,
                DRIVER_FUNC_INSTALL
                ))
            {
                printf("Unable to install driver. \n");

                //
                // Error - remove driver.
                //

                ManageDriver(_T(DRIVER_NAME),
                    driverLocation,
                    DRIVER_FUNC_REMOVE
                    );

                return;
            }

            hDevice = CreateFile(_T("\\\\.\\KernelModeDriver"),
                GENERIC_READ | GENERIC_WRITE,
                0,
                NULL,
                CREATE_ALWAYS,
                FILE_ATTRIBUTE_NORMAL,
                NULL);

            if (hDevice == INVALID_HANDLE_VALUE)
            {
                printf("Error: CreateFile Failed : %d\n", GetLastError());
                return;
            }
        }
    }

    void UninstallDriver()
    {
        //
        // close the handle to the device.
        //

        CloseHandle(hDevice);

        //
        // Unload the driver.  Ignore any errors.
        //
        ManageDriver(_T(DRIVER_NAME),
            driverLocation,
            DRIVER_FUNC_REMOVE
            );
    }

    double GetPerformanceRatio()
    {
        BOOL bRc;
        ULONG bytesReturned;

        int input = 0;
        unsigned __int64 output[2];
        memset(output, 0, sizeof(unsigned __int64) * 2);

        //printf("InputBuffer Pointer = %p, BufLength = %d\n", &input, sizeof(&input));
        //printf("OutputBuffer Pointer = %p BufLength = %d\n", &output, sizeof(&output));

        //
        // Performing METHOD_BUFFERED
        //

        //printf("\nCalling DeviceIoControl METHOD_BUFFERED:\n");

        bRc = DeviceIoControl(hDevice,
            (DWORD)IOCTL_SIOCTL_METHOD_BUFFERED,
            &input,
            sizeof(input),      // size of the input value, not of a pointer to it
            output,
            sizeof(unsigned __int64)*2,
            &bytesReturned,
            NULL
            );

        if (!bRc)
        {
            //printf("Error in DeviceIoControl : %d", GetLastError());
            return 0;
        }
        //printf("    OutBuffer (%d): %d\n", bytesReturned, output);

        if (output[1] == 0)
        {
            return 0;
        }
        else
        {
            // output[0] is APERF and output[1] is MPERF; divide in double precision.
            return (double)output[0] / (double)output[1];
        }
    }

    struct Core
    {
        int CoreNumber;
    };

    int GetNumberOfProcessorCores()
    {
        SYSTEM_INFO sysinfo;
        GetSystemInfo(&sysinfo);
        return sysinfo.dwNumberOfProcessors;
    }

    float GetCoreFrequency()
    {
        // __rdtsc: Returns the processor time stamp which records the number of clock cycles since the last reset.
        // QueryPerformanceCounter: Returns a high resolution time stamp that can be used for time-interval measurements.
        // Get the frequency which defines the step size of the QueryPerformanceCounter method.
        LARGE_INTEGER frequency;
        QueryPerformanceFrequency(&frequency);
        // Get the number of cycles before we start. The TSC is 64 bits wide, so do not truncate it to ULONG.
        unsigned __int64 cyclesBefore = __rdtsc();
        // Get the Intel performance ratio at the start.
        double ratioBefore = GetPerformanceRatio();
        // Get the start time.
        LARGE_INTEGER startTime;
        QueryPerformanceCounter(&startTime);
        // Give the CPU cores enough time to repopulate their __rdtsc and QueryPerformanceCounter registers.
        Sleep(1000);
        unsigned __int64 cyclesAfter = __rdtsc();
        // Get the Intel performance ratio at the end.
        double ratioAfter = GetPerformanceRatio();
        // Get the end time.
        LARGE_INTEGER endTime;
        QueryPerformanceCounter(&endTime);
        // Elapsed time in seconds, computed in floating point so the division does not truncate.
        double elapsedSeconds = (double)(endTime.QuadPart - startTime.QuadPart) / (double)frequency.QuadPart;
        // Return the number of MHz. Multiply the core frequency by the mean MSR (model-specific register) ratio
        // (the APERF register value divided by the MPERF register value) between the two timestamps.
        return (float)(((ratioAfter + ratioBefore) / 2)*(cyclesAfter - cyclesBefore)*pow(10, -6) / elapsedSeconds);
    }

    struct CoreResults
    {
        int CoreNumber;
        float CoreFrequency;
    };

    CRITICAL_SECTION printLock;

    static void printResult(void *param)
    {
        EnterCriticalSection(&printLock);
        CoreResults coreResults = *((CoreResults *)param);
        std::cout << "Core " << coreResults.CoreNumber << " has a speed of " << coreResults.CoreFrequency << " MHz" << std::endl;
        // Deleting through a void* is undefined behavior, so cast back first.
        delete (CoreResults *)param;
        LeaveCriticalSection(&printLock);
    }

    // Written by the main thread, read by the monitoring threads.
    volatile bool closed = false;

    static void startMonitoringCoreSpeeds(void *param)
    {
        Core core = *((Core *)param);
        // Pin the thread so the driver reads the MSRs of this specific core.
        SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)1 << core.CoreNumber);
        while (!closed)
        {
            CoreResults *coreResults = new CoreResults();
            coreResults->CoreNumber = core.CoreNumber;
            coreResults->CoreFrequency = GetCoreFrequency();
            _beginthread(printResult, 0, coreResults);
            Sleep(1000);
        }
        delete (Core *)param;
    }

    int _tmain(int argc, _TCHAR* argv[])
    {
        InitializeCriticalSection(&printLock);
        InstallDriver();
        for (int i = 0; i < GetNumberOfProcessorCores(); i++)
        {
            Core *core = new Core{ 0 };
            core->CoreNumber = i;
            _beginthread(startMonitoringCoreSpeeds, 0, core);
        }
        std::cin.get();
        closed = true;
        UninstallDriver();
        DeleteCriticalSection(&printLock);
    }

It uses install.cpp, which you can get from the IOCTL sample. I will post a complete, working, turnkey solution (with code, obviously) on my blog within the next few days, if not today.

Edit: Blogged it at http://www.dima.to/blog/?p=101 (the full source code is available there)...
