Parsing nvprof Profiler Output with SQLite3
Background
nvprof is a command-line profiler included in the CUDA Toolkit for profiling processes that use CUDA. With it, you can view “kernel execution, memory transfers, memory set and CUDA API calls and events or metrics for CUDA kernels” (source).
Normally, I would use the visual profiler (nvvp) to view the results in a timeline. Often, though, I just want to collect a few specific metrics (e.g. the number of kernels executed, or the total running time of the session), and for that, opening the output file in nvvp feels cumbersome.
Our Sample Program
We test this with the following CUDA program. It adds two arrays, doubling the array size on each run (from 2^15 to 2^25 elements), and prints how long CUDA takes to execute the kernel each time.
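#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>
// A single thread walks all n elements, so the running time grows linearly with n
__global__
void add(int n, int *a, int *b, int *c) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
void test(int N) {
    // Allocate arrays A, B, C each with size N on the device
    size_t bytes = N * sizeof(int);
    int *A, *B, *C;
    cudaMalloc(&A, bytes);
    cudaMalloc(&B, bytes);
    cudaMalloc(&C, bytes);
    // Populate A and B with numbers by copying from a host buffer
    std::vector<int> h(N, 1);
    cudaMemcpy(A, h.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(B, h.data(), bytes, cudaMemcpyHostToDevice);
    // Kernel launches are asynchronous, so synchronize before reading the clock
    auto t0 = std::chrono::steady_clock::now();
    add<<<1, 1>>>(N, A, B, C);
    cudaDeviceSynchronize();
    auto t1 = std::chrono::steady_clock::now();
    std::cout << "N = " << N << ": "
              << std::chrono::duration<double, std::milli>(t1 - t0).count() << " ms\n";
    // Free arrays A, B, and C
    cudaFree(A);
    cudaFree(B);
    cudaFree(C);
}
int main() {
    cudaProfilerStart();
    for (int i = 15; i <= 25; ++i) test(1 << i);
    cudaProfilerStop();
}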
Profiling
After compiling our test program with nvcc, we can profile it with the nvprof command:
nvprof --export-profile profile.nvvp ./a.out
We can open the profile.nvvp file in nvvp to view the session as a timeline. Note that, at the bottom, we can see the progressively longer calls to add().
Taking a Closer Look
Now, if we run binwalk on the profiler output, it gives us some clues about what the .nvvp file contains.
[/tmp]$ binwalk profile.nvvp
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
0 0x0 SQLite 3.x database,
352012 0x55F0C Ubiquiti firmware header, third party, ~CRC32: 0x0, version: "MP^CREATE TABLE CUPTI_ACTIVITY_KIND_OPENMP(_id_ INTEGER PRIMARY KEY AUTOINCREMENT, eventKind INT NOT NULL, version INT NOT NULL,"
352052 0x55F34 Ubiquiti firmware header, third party, ~CRC32: 0x0, version: "MP(_id_ INTEGER PRIMARY KEY AUTOINCREMENT, eventKind INT NOT NULL, version INT NOT NULL, threadId INT NOT NULL, start INT NOT NU"
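Hmm, interesting. The first entry says the file is a SQLite 3.x database; the “Ubiquiti firmware header” matches are false positives, since the strings binwalk matched are CREATE TABLE statements from the database schema. So what happens if we open the file with a SQLite viewer? We get tables!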
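binwalk recognizes the database from its leading bytes, and the same check is easy to do by hand before handing a file to SQLite: every SQLite 3 database file starts with the 16-byte header "SQLite format 3" followed by a NUL byte. Below is a minimal sketch in standard C++ that relies only on that header; the helper name looks_like_sqlite3 is hypothetical, and no SQLite library is needed.

```cpp
#include <array>
#include <cassert>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper: returns true if the file at `path` starts with the
// 16-byte magic header that every SQLite 3 database file begins with.
bool looks_like_sqlite3(const std::string& path) {
    // "SQLite format 3" (15 chars) plus the terminating NUL is exactly 16 bytes.
    static const char kMagic[16] = "SQLite format 3";
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;  // missing or unreadable file
    std::array<char, 16> buf{};
    if (!in.read(buf.data(), buf.size())) return false;  // shorter than 16 bytes
    return std::memcmp(buf.data(), kMagic, sizeof(kMagic)) == 0;
}
```

For the profile above, looks_like_sqlite3("profile.nvvp") should return true, which means any SQLite client can open it.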
We can see that the executed kernels are stored in the table CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL. We can then write a query to view the duration of each kernel that ran:
SELECT _id_, start, end, (end-start) AS duration FROM CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL;
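In the resulting table, duration increases roughly in powers of 2: each kernel takes about twice as long as the one before it, which matches a single thread looping over an array that doubles in size on every run.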
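To check the doubling numerically rather than by eye, we can divide each kernel's duration by the previous one. The sketch below assumes the rows returned by the query above have already been read into memory (for example through the SQLite C API, not shown here) as a hypothetical KernelRecord struct, ordered by _id_; adding ORDER BY _id_ to the query would guarantee that order. CUPTI records these start and end timestamps in nanoseconds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory copy of one row of CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL.
struct KernelRecord {
    std::int64_t id;     // _id_ column
    std::int64_t start;  // start timestamp (ns)
    std::int64_t end;    // end timestamp (ns)
};

// Ratio of each kernel's duration to the previous kernel's duration.
// Pairs whose previous duration is not positive are skipped.
std::vector<double> duration_ratios(const std::vector<KernelRecord>& rows) {
    std::vector<double> ratios;
    for (std::size_t i = 1; i < rows.size(); ++i) {
        std::int64_t prev = rows[i - 1].end - rows[i - 1].start;
        std::int64_t cur = rows[i].end - rows[i].start;
        if (prev > 0) ratios.push_back(static_cast<double>(cur) / prev);
    }
    return ratios;
}
```

For the sample program, every ratio should come out close to 2.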
Looking at the other tables, we can get similar information about memory copies, runtime API calls, driver API calls and so on.
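The two metrics mentioned at the start (the number of kernels executed and the total running time of the session) fall out of the same start and end columns. The sketch below assumes rows from several activity tables (for example the kernel table plus a memory-copy table; we have not verified its exact name) have been loaded into one list of a hypothetical Interval struct. It counts kernels, sums their busy time, and takes the session span as the latest end minus the earliest start, which covers only the recorded activity and may differ from the process's wall-clock runtime.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical interval taken from any activity table that has start/end columns.
struct Interval {
    bool is_kernel;      // true for rows from the kernel table
    std::int64_t start;  // ns
    std::int64_t end;    // ns
};

struct SessionSummary {
    std::size_t kernel_count = 0;
    std::int64_t kernel_time = 0;   // sum of kernel durations (ns)
    std::int64_t session_span = 0;  // latest end minus earliest start (ns)
};

SessionSummary summarize(const std::vector<Interval>& rows) {
    SessionSummary s;
    if (rows.empty()) return s;
    std::int64_t first = std::numeric_limits<std::int64_t>::max();
    std::int64_t last = std::numeric_limits<std::int64_t>::min();
    for (const Interval& r : rows) {
        if (r.is_kernel) {
            ++s.kernel_count;
            s.kernel_time += r.end - r.start;
        }
        // Every row, kernel or not, widens the session window.
        first = std::min(first, r.start);
        last = std::max(last, r.end);
    }
    s.session_span = last - first;
    return s;
}
```

The same numbers could also be computed directly in SQL with COUNT, SUM, MIN and MAX over each table; doing it in C++ simply makes it easier to combine rows from several tables.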
I find this pretty useful, since now I can programmatically extract kernel information from nvvp files. However, since nvprof might soon be deprecated, I wonder if I can do similar things with Nsight Compute. Perhaps I will take a look later.