September 5, 2023
- On nb-dual, my Ubuntu 18.04 no longer seems to work after the update.
- Downloaded Ubuntu 22.04 LTS from the App Store.
- Did a sudo apt upgrade, which worked.
May 24, 2023
- Looked at Deep Learning GPU Benchmarks. For ResNet, the 4090 clearly outperforms the 3090; there are no measurements for the 4080 yet (only for tools like Blender).
- The Aurora R15 with a 4080 can now be bought at a 25% discount.
April 14, 2023
- Couldn't reach ws5 from home because the wired connection was loose and the system was using the wifi connection. Once connected by wire (wifi off), I had to use Pulse to start a UvA VPN to reach the staff webserver.
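When a machine has both a wired and a wifi interface, the default route decides which one carries the traffic. The minimal sketch below assumes a Linux host that exposes its IPv4 routing table in /proc/net/route. It picks the interface of the lowest-metric default route. The interface names used in the check (enp0s31f6, wlp2s0) are hypothetical.

```python
from typing import Optional


def default_route_interface(route_table: str) -> Optional[str]:
    """Return the interface of the lowest-metric default IPv4 route, or None.

    `route_table` is the text of /proc/net/route: a header line followed by
    one route per line, with Destination, Flags and Mask written in hex.
    """
    best = None  # (metric, interface) of the best default route seen so far
    for line in route_table.strip().splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, destination, flags = fields[0], fields[1], int(fields[3], 16)
        metric, mask = int(fields[6]), fields[7]
        # RTF_UP (0x1) marks a usable route; a default route is 0.0.0.0/0.
        if flags & 0x1 and destination == "00000000" and mask == "00000000":
            if best is None or metric < best[0]:
                best = (metric, iface)
    return best[1] if best else None
```

On the machine itself, the sketch would be called as `default_route_interface(open("/proc/net/route").read())`. If it names the wifi interface while a cable is plugged in, the wired link is not up.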
April 5, 2023
- Had to update my git credentials again. Installed the GitHub CLI (gh) on ws7.
- Continued with gh auth login. Choosing to authenticate GitHub CLI tries to open a browser, which fails on a failing snap in gnome-terminal.
- However, I tried this as admin; as a regular user the authentication works (forgot to check for how long).
- Pulse is not working on ws7: it waits for user input, but no pop-up appears.
- Installed Pulse on ws5, which works (as can be seen from this entry).
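This sketch assumes the git credentials mentioned above live in ~/.git-credentials, written by git's "store" credential helper. That file holds one URL per line, with the secret in plain text. Building a line by hand then looks roughly like the code below. The user name and token are placeholders, not real values.

```python
from urllib.parse import quote


def credential_store_line(username: str, token: str, host: str = "github.com") -> str:
    """Build one ~/.git-credentials line for git's `store` credential helper.

    Username and token are percent-encoded so that characters such as '@'
    or ':' cannot break the URL.
    """
    user = quote(username, safe="")
    secret = quote(token, safe="")
    return f"https://{user}:{secret}@{host}"
```

The new line would replace the old github.com entry in that file. It only takes effect when credential.helper is set to store.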
March 21, 2023
- Installed the Pulse Secure VPN on ws7. It seems that Ubuntu 22.04 does not accept ssh-rsa by default, so I used the Remote Server Workaround as suggested here. That worked (otherwise I couldn't have made this post).
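The Remote Server Workaround itself is not reproduced here. As a hedged alternative, assuming ws7 is the client connecting to an older server, the ssh-rsa exception can be limited to a single host in ~/.ssh/config. OpenSSH 8.8 and later disable ssh-rsa (RSA with SHA-1) signatures by default, and Ubuntu 22.04 ships OpenSSH 8.9. The alias, host name and user below are hypothetical placeholders.

```python
def ssh_rsa_config_block(alias: str, hostname: str, user: str) -> str:
    """Return a ~/.ssh/config stanza that re-enables ssh-rsa for one host only.

    The "+" appends ssh-rsa to OpenSSH's default algorithm list instead of
    replacing the list.
    """
    lines = [
        f"Host {alias}",
        f"    HostName {hostname}",
        f"    User {user}",
        "    HostKeyAlgorithms +ssh-rsa",         # accept the server's RSA host key
        "    PubkeyAcceptedAlgorithms +ssh-rsa",  # allow signing with an RSA user key
    ]
    return "\n".join(lines) + "\n"
```

Scoping the exception to one Host entry keeps the stricter default for every other server.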
March 20, 2023
- My XPS workstation hangs again at boot. Luckily, the recovery option works.
- Tried the Firefox removal trick, as done previously.
- Followed the steps from here, yet the purge is incomplete because host\x2dhunspell is still mounted.
- Used the trick from this forum:
sudo snap disable firefox
sudo systemctl stop var-snap-firefox-common-host\\x2dhunspell.mount
sudo systemctl disable var-snap-firefox-common-host\\x2dhunspell.mount
sudo snap remove --purge firefox
- Still, the boot is hanging after Started File System Check Daemon.
- Did a sudo apt reinstall ubuntu-desktop and saw that 31 packages were held back.
- Used the trick from this post and ran sudo apt install --only-upgrade --dry-run for each of the packages. Twice a new boot record was made (the last one based on linux-nvidia-*-common). Now the XPS system boots Ubuntu 22.04 again!
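The \x2d in the mount unit name is systemd's escape for the "-" in the mounted path. The doubled backslash in the forum commands is only there because the shell turns \\ into a single \. The sketch below is a simplified reimplementation of the rule used by systemd-escape --path, not systemd's own code.

```python
# Bytes systemd leaves unescaped in unit names (besides "/" which becomes "-").
_ALLOWED = set(b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789:_.")


def systemd_escape_path(path: str, suffix: str = "") -> str:
    """Mimic `systemd-escape --path`: map a filesystem path to a unit name.

    Duplicate, leading and trailing slashes are dropped, the remaining "/"
    become "-", and every other byte outside [A-Za-z0-9:_.] (including "-"
    itself, and a leading ".") is written as a C-style \\xNN escape.
    """
    stripped = "/".join(part for part in path.split("/") if part)
    if not stripped:
        return "-" + suffix  # the root directory maps to "-"
    out = []
    for i, byte in enumerate(stripped.encode("utf-8")):
        if byte == ord("/"):
            out.append("-")
        elif byte in _ALLOWED and not (i == 0 and byte == ord(".")):
            out.append(chr(byte))
        else:
            out.append(f"\\x{byte:02x}")
    return "".join(out) + suffix
```

The real tool gives the same name with systemd-escape --path --suffix=mount /var/snap/firefox/common/host-hunspell. Putting the unit name in single quotes avoids the doubled backslash.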
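Running one --only-upgrade command per held-back package can be scripted. The sketch below assumes the package names are taken from the "kept back" section that apt prints, for example in the output of apt upgrade. The package names in the check are made up.

```python
from typing import List

KEPT_BACK_HEADER = "The following packages have been kept back:"


def kept_back_packages(apt_output: str) -> List[str]:
    """Return the package names apt lists under its 'kept back' header."""
    packages: List[str] = []
    collecting = False
    for line in apt_output.splitlines():
        if line.startswith(KEPT_BACK_HEADER):
            collecting = True
        elif collecting:
            if line[:1] in (" ", "\t"):  # the package list is indented
                packages.extend(line.split())
            else:
                break  # the first unindented line ends the list
    return packages


def only_upgrade_commands(packages: List[str]) -> List[List[str]]:
    """One dry-run command per package, as argument lists for subprocess.run."""
    return [["sudo", "apt", "install", "--only-upgrade", "--dry-run", pkg] for pkg in packages]
```

Dropping --dry-run from the generated commands would perform the actual upgrades.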
March 17, 2023
- Did a quick scan of which operating system runs on which workstation:
- ws1: Ubuntu 20.04.5
- ws2: Ubuntu 18.04.4
- ws3: Ubuntu 22.04
- ws4: Ubuntu 18.04.6
- ws5: Ubuntu 18.04.3
- ws6: Ubuntu 18.04.6
- ws7: Ubuntu 22.04
- ws8: Ubuntu 20.04.5
- ws10: Ubuntu 22.04
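One way to run such a scan is to read /etc/os-release on each workstation, for example over ssh. The sketch below parses that file and groups the versions recorded above by release. The os-release text used in the check is illustrative, not copied from a workstation.

```python
import shlex
from collections import defaultdict
from typing import Dict, List


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse /etc/os-release: KEY=value lines, values optionally shell-quoted."""
    info: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        words = shlex.split(value)  # removes the quotes around e.g. "18.04.6 LTS"
        info[key] = words[0] if words else ""
    return info


def group_by_release(scan: Dict[str, str]) -> Dict[str, List[str]]:
    """Group hosts by release, e.g. '18.04.6' and '18.04.3' both under '18.04'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for host, version in scan.items():
        groups[".".join(version.split(".")[:2])].append(host)
    return dict(groups)


# The scan results recorded above.
SCAN = {
    "ws1": "20.04.5", "ws2": "18.04.4", "ws3": "22.04", "ws4": "18.04.6", "ws5": "18.04.3",
    "ws6": "18.04.6", "ws7": "22.04", "ws8": "20.04.5", "ws10": "22.04",
}
```

On Ubuntu, the first word of the VERSION field gives the point release, such as 18.04.6.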
February 15, 2023
- Looked into the GitHub documentation on Token expiration and revocation.
- The next step seems to be creating a personal access token.
- The most difficult part is that the token is fine-grained, so you have to select every permission detail yourself (the default is no access).
- I gave 7 repository permissions and 3 account permissions, inspired by the previous classic token, which expired in October 2022. Unfortunately, this token will already expire on March 12, 2023.
- However, it seems that I need an SSH key.
- Back to basics: Authenticating with the command line.
- Tried gh auth login, selecting GitHub.com, HTTPS and token, but that gives the error error validating token: HTTP 401: Bad credentials. My hypothesis is that I didn't select CLI in the fine-grained selection.
- Selecting login via the web browser in gh auth login seems to work. At least git push in zedm_capture worked.
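To see whether a 401 comes from the token itself rather than from the gh login flow, the token can be tried directly against the GitHub REST API endpoint GET https://api.github.com/user. GitHub answers Bad credentials for a token it rejects. The sketch below builds the request separately, so it can be inspected without network access. The token string is a placeholder.

```python
import urllib.error
import urllib.request

API_USER_URL = "https://api.github.com/user"


def build_token_check(token: str) -> urllib.request.Request:
    """Build a GET /user request authenticated with the given token."""
    return urllib.request.Request(
        API_USER_URL,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def check_token(token: str) -> int:
    """Send the request and return the HTTP status code (needs network access)."""
    try:
        with urllib.request.urlopen(build_token_check(token), timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 401 when the token is rejected
```

If check_token also returns 401, the token itself is rejected (mistyped, expired or revoked). Otherwise the problem is more likely in how gh was given the token.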
February 7, 2023
- Connected the Core V2 to my nb-dual. Following the instructions, checked with lspci | grep -i nvidia whether the eGPU is accessible. Looks good:
03:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX TITAN X] (rev a1)
- Checked dmesg | grep -i lockdown, which returns no matches.
- Checked nvidia-smi, which also looks good:
Tue Feb  7 13:48:01 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:03:00.0 Off |                  N/A |
| 22%   26C    P8    14W / 250W |      0MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
- Ran /usr/local/zed/tools/ZED_Diagnostic, which now shows green signs. In the terminal, the tool gives one warning: libpng warning: iCCP: known incorrect sRGB profile. Later I also got the warning QPixmap::scaled: Pixmap is a null pixmap. Even on USB-B, the USB bandwidth is OK.
- Trying to modify zed_wrapper_nodelet.cpp (see the rescue labbook).
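The two outputs above name the same device in different forms. lspci prints the slot without the PCI domain (03:00.0), while nvidia-smi shows the full bus id (00000000:03:00.0). The sketch below converts one form into the other, so that a GPU found by lspci can be matched against nvidia-smi's query output. It assumes nvidia-smi is called with --query-gpu=pci.bus_id,name,memory.total --format=csv,noheader. The nvidia-smi line used in the check is illustrative, not captured output.

```python
import csv
import io
from typing import Dict, Tuple


def lspci_slot_to_smi_bus_id(lspci_line: str) -> str:
    """Convert the slot at the start of an lspci line to nvidia-smi's bus id form."""
    slot = lspci_line.split()[0]  # "03:00.0", or "0000:03:00.0" with lspci -D
    parts = slot.split(":")
    domain = parts[0] if len(parts) == 3 else "0"  # lspci omits domain 0000
    bus, dev_fn = parts[-2], parts[-1]
    return f"{int(domain, 16):08X}:{bus.upper()}:{dev_fn.upper()}"


def parse_smi_query(csv_text: str) -> Dict[str, Tuple[str, str]]:
    """Map bus id to (name, memory.total) from nvidia-smi's CSV query output."""
    gpus: Dict[str, Tuple[str, str]] = {}
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) == 3:
            bus_id, name, memory = (field.strip() for field in row)
            gpus[bus_id] = (name, memory)
    return gpus
```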
February 3, 2023
- Updated my password as requested. Webserver access still works!
- Printers should be reconfigured by removing and reinstalling them. Unfortunately, this fails with the CUPS server error 'client-error-not-possible'.
- Was redirected to the new Linux support community page.
- There is a good remark there: the PPD uses MacFollow-Me as queue, while the instructions say LinuxFollow-Me. The PPD link on the community page is broken, so I had to edit the PPD with a text editor. Could print a test page.
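The text-editor fix can also be scripted. The sketch below assumes the queue name appears literally as MacFollow-Me in the PPD; where exactly it appears is not recorded here. It keeps a .bak copy of the original, and the file name used in the check is hypothetical.

```python
from pathlib import Path


def rename_ppd_queue(ppd_path: Path, old: str = "MacFollow-Me", new: str = "LinuxFollow-Me") -> int:
    """Replace `old` by `new` in a PPD file and return the number of replacements.

    The file is handled as bytes decoded with latin-1, which maps every byte
    to one character, so line endings and other bytes are kept unchanged.
    """
    text = ppd_path.read_bytes().decode("latin-1")
    count = text.count(old)
    if count:
        backup = ppd_path.with_name(ppd_path.name + ".bak")
        backup.write_bytes(text.encode("latin-1"))  # keep the original
        ppd_path.write_bytes(text.replace(old, new).encode("latin-1"))
    return count
```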
January 30, 2023
- Installed the keys on the Ubuntu partition of nb-dual.
- Used jumpbot to access ws1.
- Tried to log in as a regular user on ws7 and ws8, but in the end used the admin account.
- Both were in use by students, so I switched to ws10. There I could use my regular user account.
- Didn't use port forwarding, so I didn't have the graphical interface of the Jupyter notebook. Instead, I copied the first cell with the imports.
- Importing tensorflow fails on ImportError: /lib/x86_64-linux-gnu/libnccl.so.2: undefined symbol: cudaGraphAddEventWaitNode, version libcudart.so.11.0.
- I have a conda environment with tf1.14, but ~/anaconda3/bin/conda activate tf1.14 fails with CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'. Running conda init bash doesn't help.
- In my bash configuration, the PATH and LD_LIBRARY_PATH definitions for both CUDA 10.2 and CUDA 11.0 were active. Switched CUDA 11.0 off, but now import tensorflow failed. After reloading the shell, the undefined symbol error (version libcudart.so.11.0) is back.
- However, conda activate tf1.14 now works! Inside this conda environment, python circle_detection_and_localization.py also works (the first cell).
- Created the simple model:
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 128, 128, 3)]     0
_________________________________________________________________
conv2d (Conv2D)              (None, 128, 128, 16)      1216
_________________________________________________________________
batch_normalization (BatchNo (None, 128, 128, 16)      64
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 128, 128, 32)      12832
_________________________________________________________________
batch_normalization_1 (Batch (None, 128, 128, 32)      128
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 128, 128, 1)       801
=================================================================
Total params: 15,041
Trainable params: 14,945
Non-trainable params: 96
_________________________________________________________________
- Started the training. Got some warnings:
(1000, 128, 128, 3)
Epoch 1/100
2023-01-30 17:23:22.305349: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2023-01-30 17:23:22.440027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:01:00.0
2023-01-30 17:23:22.442261: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2023-01-30 17:23:22.467410: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
2023-01-30 17:23:22.482567: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10
2023-01-30 17:23:22.486446: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10
2023-01-30 17:23:22.513839: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10
2023-01-30 17:23:22.518528: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10
2023-01-30 17:23:22.568653: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2023-01-30 17:23:22.571769: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22647 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
- Have not seen the output of epoch 2/100 yet.
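Mixing two CUDA toolkits in PATH and LD_LIBRARY_PATH, as happened here, is easy to overlook. The sketch below lists which CUDA versions a colon-separated path variable points to. It assumes the toolkits are installed in versioned directories such as /usr/local/cuda-10.2, a common layout. An unversioned /usr/local/cuda symlink is not detected. The paths in the check are illustrative.

```python
import re
from typing import Dict, List

# Matches versioned toolkit directories such as /usr/local/cuda-10.2/lib64.
CUDA_DIR = re.compile(r"cuda-(\d+\.\d+)")


def cuda_versions_on_path(value: str) -> Dict[str, List[str]]:
    """Group the entries of a colon-separated path variable by CUDA version."""
    found: Dict[str, List[str]] = {}
    for entry in value.split(":"):
        match = CUDA_DIR.search(entry)
        if match:
            found.setdefault(match.group(1), []).append(entry)
    return found
```

Calling it on os.environ.get("LD_LIBRARY_PATH", "") and on PATH should return a single key. More than one key means two toolkits are active at the same time.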
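The parameter counts in the model summary above can be checked by hand. The sketch below assumes 5x5 kernels with one bias per filter; this is inferred from the counts, not taken from the notebook. BatchNormalization keeps four values per channel. Gamma and beta are trainable, while the moving mean and variance are not, which gives the 96 non-trainable parameters.

```python
from typing import Tuple


def conv2d_params(kernel: int, in_channels: int, filters: int) -> int:
    """Weights (kernel x kernel x in_channels x filters) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def batchnorm_params(channels: int) -> Tuple[int, int]:
    """(trainable, non-trainable): gamma/beta are trained, moving mean/variance are not."""
    return 2 * channels, 2 * channels


# Assumption: 5x5 kernels, inferred from the counts in the summary.
layers = [
    ("conv2d", conv2d_params(5, 3, 16), 0),
    ("batch_normalization", *batchnorm_params(16)),
    ("conv2d_1", conv2d_params(5, 16, 32), 0),
    ("batch_normalization_1", *batchnorm_params(32)),
    ("conv2d_2", conv2d_params(5, 32, 1), 0),
]
trainable = sum(t for _, t, _ in layers)
non_trainable = sum(n for _, _, n in layers)
```

Both totals match the summary: 14,945 trainable and 96 non-trainable parameters, 15,041 in total.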
January 27, 2023
- Finally installed the Follow-Me printer on nb-dual, following these instructions.
- Initial attempts failed because I used a space in the description.
January 17, 2023
- ws4 has a PCI-bus error, which seems related to its NVIDIA drivers.
- Could still log in via the ssh-server in the lab.
- Did a sudo apt upgrade (199 packages). Note that ws4 uses repositories from lambdalab.com.
- The update contains a kernel update (to 5.4.0-136). It also contains an update of tensorboard to v1.15.0.
- The upgrade solved the PCI-bus error.
- Note that ws5 also has problems with its fans.
Previous Labbooks