nvprof is a tool in the CUDA Toolkit that allows profiling of processes that use CUDA. With it, you can view “kernel execution, memory transfers, memory set and CUDA API calls and events or metrics for CUDA kernels” (source).
Normally, I would use the visual profiler (nvvp) to view the results in a timeline, but often I just want to collect a few specific metrics (e.g. the number of kernels executed, or the total running time of the session), and opening the output file in nvvp feels cumbersome.
We test this with the following CUDA program. It’s a simple program that adds two arrays, where the size of the arrays doubles every iteration, and we use the profiler to see how long CUDA takes to execute each kernel launch.
#include <iostream>
#include <algorithm>
#include <chrono>
#include <cuda.h>
#include <cuda_profiler_api.h>

__global__
void add(int n, int *a, int *b, int *c) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

void test(int N) {
    // Allocate arrays A, B, C each with size N
    // (unified/managed memory used here for simplicity)
    int *A, *B, *C;
    cudaMallocManaged(&A, N * sizeof(int));
    cudaMallocManaged(&B, N * sizeof(int));
    cudaMallocManaged(&C, N * sizeof(int));

    // Populate A and B with numbers
    for (int i = 0; i < N; i++) {
        A[i] = i;
        B[i] = 2 * i;
    }

    add<<<1, 1>>>(N, A, B, C);
    cudaDeviceSynchronize();

    // Free arrays A, B, and C
    cudaFree(A);
    cudaFree(B);
    cudaFree(C);
}

int main() {
    cudaProfilerStart();
    for (int i = 15; i <= 25; ++i) test(1 << i);
    cudaProfilerStop();
    return 0;
}
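Note what the loop in main() does: it calls test() with sizes \(2^{15}\) through \(2^{25}\), so each launch processes twice as many elements as the previous one. Since the kernel runs on a single thread, we should expect the kernel duration to roughly double each time as well. A quick, purely illustrative Python sketch of the sizes involved:

```python
# Array sizes passed to test() by the loop in main():
# 1 << 15 through 1 << 25, doubling each iteration.
sizes = [1 << i for i in range(15, 26)]
assert all(b == 2 * a for a, b in zip(sizes, sizes[1:]))
print(len(sizes), min(sizes), max(sizes))  # 11 32768 33554432
```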
After compiling our test program with nvcc, we can profile it with the nvprof command:
nvprof --export-profile profile.nvvp ./a.out
We can try to view the profile.nvvp file, and we would see something like the following:
Note that, on the bottom, we can see the progressively larger calls to add().
Now, if we run binwalk on the profiler output, it gives us some clues about what the .nvvp file contains.
[/tmp]$ binwalk profile.nvvp
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
0 0x0 SQLite 3.x database,
352012 0x55F0C Ubiquiti firmware header, third party, ~CRC32: 0x0, version: "MP^CREATE TABLE CUPTI_ACTIVITY_KIND_OPENMP(_id_ INTEGER PRIMARY KEY AUTOINCREMENT, eventKind INT NOT NULL, version INT NOT NULL,"
352052 0x55F34 Ubiquiti firmware header, third party, ~CRC32: 0x0, version: "MP(_id_ INTEGER PRIMARY KEY AUTOINCREMENT, eventKind INT NOT NULL, version INT NOT NULL, threadId INT NOT NULL, start INT NOT NU"
Hmm, interesting. What happens if we open this with a SQLite viewer? We get tables!
We can see that the executed kernels are recorded in the table CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL. We can then write a query to view the duration of each kernel that ran:
SELECT _id_, start, end, (end-start) AS duration FROM CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL;
We get the following table (here we can see that duration is increasing roughly in powers of 2):
Looking at the other tables, we can get similar information about memory copies, runtime API calls, driver API calls and so on.
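Since the profile is just a SQLite database, this extraction can be scripted. Below is a minimal Python sketch; to keep it self-contained it builds a mock table with made-up timestamps (the real CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL table has many more columns), but against a real profile you would simply connect to profile.nvvp and run the same query:

```python
import sqlite3

# Connect to the profiler output; an in-memory mock is used here so the
# snippet is self-contained (replace ":memory:" with "profile.nvvp").
conn = sqlite3.connect(":memory:")

# Simplified stand-in for the real table, which has many more columns.
conn.execute("""CREATE TABLE CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL
                (_id_ INTEGER PRIMARY KEY, start INT, end INT)""")
conn.executemany(
    "INSERT INTO CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL VALUES (?, ?, ?)",
    [(1, 100, 350), (2, 400, 900), (3, 1000, 2000)])

# The same query as above: per-kernel duration.
rows = conn.execute(
    """SELECT _id_, start, end, (end - start) AS duration
       FROM CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL""").fetchall()
for _id_, start, end, duration in rows:
    print(_id_, duration)
```

From here, the rows can be aggregated however you like (total kernel time, kernel counts, and so on).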
I find this pretty useful, since I can now programmatically extract kernel information from nvvp files. However, since nvprof might soon be deprecated, I wonder if I can do similar things with Nsight Compute. Perhaps I will take a look later.
I was recently experimenting with some models implemented in TensorFlow 1.x. However, when I tried to run them on a machine with CUDA 10.1, TensorFlow had trouble locating the libcu*.so.10.0 files.
2020-05-09 00:33:15.100129: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-05-09 00:33:15.100216: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-05-09 00:33:15.100317: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-05-09 00:33:15.100408: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-05-09 00:33:15.100470: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-05-09 00:33:15.100541: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-05-09 00:33:15.120235: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-09 00:33:15.120269: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
This article provides some information on compatibility; the table below is taken from that page. From it, we can reasonably infer that the prebuilt TF 1.15 packages are not going to work with CUDA 10.1. Hence, we will have to build TF 1.15 ourselves.
To build TensorFlow 1.15, we begin with the development Docker image nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04, which we can run with the following command:
docker run --gpus all \
-v tensorflow_build:/mnt \
-v tmp:/root \
--shm-size=8G \
-it nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04 \
bash
We mount two directories: one for storing the build files, and one to which bazel cache files will be written. Under the default configuration, the container had 10 GB allocated, and the directory /root/.cache/bazel can take upwards of 6 GB, which could easily exceed the container's limit and cause the build to fail.
Inside the container, we first install all the required packages:
apt update
apt install -y python3 python3-pip python3-dev git unzip
We can then clone the TensorFlow repository into the directory we mounted, and check out the r1.15 branch.
cd /mnt
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout r1.15
Then, we download and install bazel.
BAZEL_VERSION="0.26.0"
wget https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh
chmod +x bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh
./bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh
Part of the build process requires a python binary, but installing python3 doesn’t provide one. To solve this, we symlink the binary as follows:
ln -s /usr/bin/python3 /usr/bin/python
Now, we can start the actual build. We first run ./configure inside the tensorflow directory. It will ask whether we want CUDA support, to which we answer yes; we can leave all the other options at their defaults.
root@2be0159ae22a:/mnt/tensorflow# ./configure
....
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
At this point, the configuration script should find the paths to the CUDA libraries:
Found CUDA 10.1 in:
/usr/local/cuda/lib64
/usr/local/cuda/include
Found cuDNN 7 in:
/usr/lib/x86_64-linux-gnu
/usr/include
Now, to begin the actual compilation, we run the following command:
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
The build should begin now. For me, on a 16-core EPYC machine, it took around 6 hours; the duration may vary greatly depending on your hardware configuration.
When the compilation completes, you should see something like this:
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 21506.854s, Critical Path: 1549.28s
INFO: 23749 processes: 23749 local.
INFO: Build completed successfully, 30612 total actions
According to the official documentation, bazel creates a binary called build_pip_package, which we now run to generate the whl file.
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Our whl file should be created inside /tmp/tensorflow_pkg. We can now copy this file elsewhere or install it on our system.
At this point, we are done. You can install the whl file with:
pip install tensorflow*.whl
Afterwards, you can test GPU functionality with the tf.test.is_gpu_available function in TensorFlow.
What makes things worse is my expired warranty: Lenovo support refused to ship me a replacement even after I offered to pay for it. This isn’t a big issue, though, as ordering a $50 replacement on Amazon would be perfectly fine. The only downside is that the Amazon replacement ships from China, and the already slow delivery time was worsened by the COVID-19 pandemic. The package took a whole month and 13 days to arrive.
The replacement process is simple. Removing the back panel reveals three screw holes, conveniently labelled with a keyboard logo. After undoing those, we can remove the keyboard from the front side.
Then, after unplugging the ribbon cables for the keyboard and the trackpad, the new keyboard can be plugged in and reinstalled into the chassis.
After rebooting the computer, we can test all the keys. They are indeed all working.
The whole keyboard replacement took less than 15 minutes. The user serviceability of the ThinkPad line is truly commendable; I hope we see the same on more devices, especially consumer ones.
In first year, I decided to take notes in \(\LaTeX\) for some of my introductory CS courses. I took these notes both as an effort to kill time during lecture and to submit them to the UofT note-takers service for students unable to attend class.
I previously set up a site at notes.jimgao.tk, but I have since taken it down because hosting is expensive. The next best option is to host the notes here and hope that Google eventually indexes this page :P.
Below are the notes, categorized by course. I might also add my notes for CSC265 here in a few days (or weeks). (CSC265 notes added on Apr 8th, 2022)
Links: FAS Calendar
Lecture # | Topics | Link |
---|---|---|
1 | Worst case analysis of running time; Abstract data types; Binary heaps | |
2 | Binomial heaps; Dictionary ADT | |
3 | AVL trees | |
4 | Augmented AVL trees; Order statistics | |
4 (tutorial) | Interval trees (augmented AVL) | |
5 | Average case analysis; Randomized Algorithms; Universal hashing | |
5 (tutorial) | Height of random BSTs | |
6 | Randomized quicksort; Principle of deferred decision; Amortized analysis | |
6 (tutorial) | Randomized Maj3; Reservoir sampling; Biased coins | |
7 | Potential function method for amortized analysis | |
8 | Disjoint sets and its optimizations | |
10 | Disjoint set log* analysis | |
11 | Graph traversals (BFS,DFS); Parenthesis property; White path theorem | |
11 (tutorial) | Strongly connected components | |
12 | Minimum spanning trees; Kruskal’s algorithm |
Links: FAS Calendar
This is a course taken by essentially all first-year CS students, covering the basics of predicate logic, number theory, and some concepts in graph theory. Much of the course emphasizes stating problems rigorously in formal logic, as well as proving claims with logical rigor.
Lecture # | Topics | Link |
---|---|---|
1 | Mathematical preliminaries, sets, functions | |
2 | Functions, sigma/pi notation, propositional logic, truth tables | |
3 | Implications, biconditions, reordering quantifiers | |
4 | Negating quantified statements | |
5 | Precedence rule, proving existential and universal statements | |
6 | * Missing * | |
7 | Example: proof with primes | |
8 | Simple induction, examples of induction | |
9 | More on simple induction | |
10 | Strong induction, representation of natural numbers | |
11 | Representation in binary, analyzing running time | |
12 | More on binary numbers, introduction of \(\mathcal{O}\) notation | |
13 | Patterns of upper bound | |
14 | Examples of runtime analysis on code snippets | |
15 | More examples on runtime analysis | |
16 | Even more examples, Collatz sequence |
Links: FAS Calendar
This course is effectively an enriched version of CSC165+CSC236. It covers much of the same formal logic material, but also structural induction, proofs of time complexity and correctness, formal languages, DFAs, and NFAs. It was taught by Professor Faith Ellen, and it is definitely one of the more fun courses I have taken.
Lecture # | Topics | Link |
---|---|---|
1 | Induction, \(2^n\times 2^n\) chessboard problem | |
2 | Examples of simple induction, AM/GM inequality, strong (complete) induction | |
3 | More strong induction, recursively defined sets, structural induction | |
4 | Functions on recursively defined sets, example about leaves on binary trees | |
5 | Justification of structural induction by strong induction, partial orders, and the well-ordering principle | |
6 | Countable and uncountable sets, proof by diagonalization, the halting problem | |
7 | Introduction of asymptotic notation and their properties. Tips on solving recurrences | |
8 | More on solving recurrences, “plug-and-chug”, Master theorem, characteristic polynomials | |
9 | Runtime analysis, example about merge sort | |
10 | Analyzing running time for recursive binary search. Correctness of algorithms | |
11 | Examples on correctness proofs: pre/post conditions. Examples: merge sort and quick sort | |
12 | Language theory, regular expressions, definition of DFAs | |
13 | Example of DFAs. Non-deterministic Finite Automaton (NFAs) | |
14 | Pumping lemma and its applications | |
15 | Closure of operations on regular languages and their proofs |
Links: FAS Calendar
This is an introductory course on data science. It covers both programming in R, as well as some basic statistical concepts such as linear regression and hypothesis testing.
Hopefully I will post more stuff here in the future.