INSTALLING NVIDIA + CUDA 4.0 ON UBUNTU 10.10, 32-BIT
------------------------------------------------------

from http://www.r-tutor.com/gpu-computing/cuda-installation/cuda4.0-ubuntu
-------------------------------------------------------------------------

Installing CUDA Toolkit 4.0 on Ubuntu 10.10 Linux
----------------------------------------------------

The following explains how to install CUDA Toolkit 4.0 on 64-bit Ubuntu 10.10 Linux. I have tested it on a self-assembled desktop with an AMD Phenom II X4 CPU, 4GB RAM, a 500GB hard drive, a 550W power supply and an NVIDIA GeForce GTX 460 graphics card. The instructions assume you have CUDA-capable hardware. Depending on your system configuration, your mileage may vary. (These notes follow the guide on a 32-bit system, so the installer file names below are the 32-bit ones.)

Basic Video Driver
-------------------

First, you have to reconfigure Ubuntu with the basic video driver. Enter the following in the terminal.

$ sudo apt-get --purge remove 'nvidia*'

Then create a new file in /etc/modprobe.d with the following content in order to blacklist the built-in nouveau driver, which conflicts with the CUDA developer video driver that you will install later.

# /etc/modprobe.d/blacklist-nouveau.conf
blacklist nvidiafb
blacklist nouveau
blacklist rivafb
blacklist rivatv
blacklist vga16fb
options nouveau modeset=0

Then regenerate the initramfs (the initial RAM disk loaded at boot) so that the blacklist also applies early in the boot process:

$ sudo update-initramfs -u

Now reboot the system for the change to take effect.
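The two steps above (writing the blacklist file and regenerating the initramfs) can also be scripted. A minimal sketch, not part of the original guide: the hypothetical helper `nouveau_blacklist` prints the same directives, and `tee` is used to write them as root, because in `sudo echo ... > file` the redirection would run as the unprivileged user.

```shell
#!/bin/bash
# Hypothetical helper: print the nouveau blacklist shown above.
# The heredoc delimiter is quoted ('EOF'), so nothing inside is expanded.
nouveau_blacklist() {
cat <<'EOF'
# /etc/modprobe.d/blacklist-nouveau.conf
blacklist nvidiafb
blacklist nouveau
blacklist rivafb
blacklist rivatv
blacklist vga16fb
options nouveau modeset=0
EOF
}
```

Usage:
$ nouveau_blacklist | sudo tee /etc/modprobe.d/blacklist-nouveau.conf > /dev/null
$ sudo update-initramfs -u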

ALTERNATIVELY:
---------------

To install the NVIDIA driver you first have to disable the nouveau kernel module (the free graphics driver).
Add these entries at the bottom of /etc/modprobe.d/blacklist.conf:
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv

Then reboot with: sudo shutdown -r now

-----------------------------

Linux Development Tools
-------------------------

After you have successfully configured Ubuntu Linux with the basic video driver, you can install the Linux development tools.
$ sudo apt-get update
$ sudo apt-get install build-essential

OpenGL Developer Driver
-------------------------

To prepare for compiling the OpenGL code samples in the CUDA SDK, you will have to install the OpenGL developer environment as well.

$ sudo apt-get install freeglut3-dev libxi-dev libxmu-dev

CUDA Developer Driver
------------------------

Download the CUDA developer driver from the CUDA download site. The graphical display manager must not be running while the CUDA video driver is being installed, so log out of your Linux desktop and switch to a text console with the Alt+Ctrl+F2 keystroke. Then log in on the text console and stop the graphical display manager:

$ sudo service gdm stop

You may have to enter the same Alt+Ctrl+F2 keystroke again to resume the text console.

ALTERNATIVELY, to be really sure that the X server does not restart by accident:
----------------------------------------------------------------------------------

To boot into runlevel 3, used here as the non-graphical level (the normal graphical runlevel is 2):
The default runlevel is now set in the file /etc/init/rc-sysinit.conf.
The default runlevel is "2", which is the normal graphical mode.
If you have used other Linux distributions, you may be used to runlevel 5 being the graphical level, but Ubuntu does this differently.

To change the default runlevel, proceed as follows:

1. Open a terminal.
2. Run the command "sudo gedit /etc/init/rc-sysinit.conf".

3. In the file that opens, search for the line
env DEFAULT_RUNLEVEL=
4. After "=", enter the runlevel number you want and save the file.
5. For example, to boot into text mode every time, put "3" after the "=" and save the file.
6. To actually start in text mode, gdm must also be prevented from loading. Open gdm.conf with
"sudo gedit /etc/init/gdm.conf"
7. You will see the following set of lines

start on (filesystem
and started dbus
and (drm-device-added card0 PRIMARY_DEVICE_FOR_DISPLAY=1
or stopped udevtrigger))
stop on runlevel [016]

Change it to
start on (filesystem
and started dbus
and (drm-device-added card0 PRIMARY_DEVICE_FOR_DISPLAY=1
or stopped udevtrigger))
and runlevel [!3]
stop on runlevel [016]

The line "and runlevel [!3]" is the one that was added.

Save the edited file and reboot; the system should now boot in text mode.
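Both edits can also be made non-interactively. A minimal sketch, assuming GNU sed and the file layouts shown above; the function names are hypothetical. Each function first saves an untouched copy as FILE.orig (only once, so repeated runs never overwrite it), which makes it easy to restore the files before the final reboot.

```shell
#!/bin/bash
# Hypothetical helper: keep a pristine copy of FILE as FILE.orig,
# only the first time it is called for that file.
backup_once() {
    [ -e "$1.orig" ] || cp -p "$1" "$1.orig"
}

# Set "env DEFAULT_RUNLEVEL=LEVEL" in rc-sysinit.conf (args: FILE LEVEL).
set_default_runlevel() {
    local file=$1 level=$2
    backup_once "$file"
    sed -i "s/^env DEFAULT_RUNLEVEL=.*/env DEFAULT_RUNLEVEL=${level}/" "$file"
}

# Insert "and runlevel [!3]" just before the "stop on runlevel [016]"
# line of gdm.conf, unless the guard line is already present.
add_gdm_runlevel_guard() {
    local file=$1
    grep -q '^and runlevel \[!3\]' "$file" && return 0
    backup_once "$file"
    sed -i '/^stop on runlevel \[016\]/i and runlevel [!3]' "$file"
}
```

Run the functions from a root shell (for example after `sudo -s`): `set_default_runlevel /etc/init/rc-sysinit.conf 3` and `add_gdm_runlevel_guard /etc/init/gdm.conf`. To undo, move the .orig copies back, e.g. `mv /etc/init/gdm.conf.orig /etc/init/gdm.conf`.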

------------------------------

Then install the NVIDIA driver with:
sudo sh NVIDIAXXXXXX.run

OR, BETTER STILL, THE DEVELOPER DRIVER:
---------------------------------------
Now install the CUDA developer video driver:

$ sudo sh devdriver_4.0_linux_32_270.41.19.run

--------------------------------------------------------------------------
N.B.: WITH MY CONFIGURATION I COULD NOT LET THE NVIDIA
INSTALLER RECONFIGURE THE X SERVER BY ITSELF, BECAUSE IT
MADE A MESS AND X WOULD NOT START AGAIN. SO IT IS BETTER
TO ANSWER NO TO THE QUESTION:
Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X?
NO
AND THEN UPDATE THE X CONFIGURATION BY HAND WITH nvidia-xconfig (see below)
-------------------------------------------------------------------------

And reboot afterward:
$ sudo reboot

NB: if you also modified
/etc/init/rc-sysinit.conf and /etc/init/gdm.conf,
restore them to their original state before rebooting.

After the reboot YOU MUST CONFIGURE THE X SERVER:
--------------------------------------------------
sudo nvidia-xconfig

But I got something rather ugly, like:

Using X configuration file: "/etc/X11/xorg.conf".
NVIDIA: could not open the device file /dev/nvidia0 (Input/output error).
Backed up file '/etc/X11/xorg.conf' as '/etc/X11/xorg.conf.backup'
New X configuration file written to '/etc/X11/xorg.conf'

Moreover, running

ls -l /dev/nvidia*

showed only one of the two NVIDIA cards:

crw-rw-rw- 1 root root 195, 0 2011-09-13 15:45 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 2011-09-13 15:45 /dev/nvidiactl

So I thought the X server was seeing only one NVIDIA card.
Yet both cards were physically present, because

$ lspci | grep nVidia

03:00.0 VGA compatible controller: nVidia Corporation Device 0df8 (rev a1)
03:00.1 Audio device: nVidia Corporation Device 0bea (rev a1)
04:00.0 VGA compatible controller: nVidia Corporation GF100 [Tesla C2050] (rev a3)
04:00.1 Audio device: nVidia Corporation GF100 High Definition Audio Controller (rev a1)
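The mismatch described above (two NVIDIA cards on the PCI bus, but only one /dev/nvidiaN node) can be checked quickly. A rough sketch with hypothetical helper names; it assumes the lspci class strings shown above ("VGA compatible controller", or "3D controller", as some Tesla boards report themselves) and that the driver creates one /dev/nvidia0, /dev/nvidia1, ... node per GPU.

```shell
#!/bin/bash
# Count NVIDIA display/compute controllers in lspci output read from stdin.
# The audio functions of the same cards are deliberately not counted.
count_pci_gpus() {
    grep -Eci '(VGA compatible|3D) controller.*nvidia'
}

# Count /dev/nvidiaN nodes in DIR (default /dev); nvidiactl is not counted.
count_dev_nodes() {
    find "${1:-/dev}" -maxdepth 1 -name 'nvidia[0-9]*' | wc -l
}

# Print both numbers; return non-zero if they differ.
check_nvidia_nodes() {
    local pci dev
    pci=$(lspci | count_pci_gpus)
    dev=$(count_dev_nodes)
    echo "PCI GPUs: $pci, /dev/nvidiaN nodes: $dev"
    [ "$pci" -eq "$dev" ]
}
```

Usage: `check_nvidia_nodes || echo "some GPUs have no device node"`. On the machine above it would report 2 PCI GPUs but only 1 node.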

SO THE X SERVER MUST BE FORCED TO SEE BOTH CARDS. BUT

$ sudo nvidia-xconfig --enable-all-gpus

replied:
Using X configuration file: "/etc/X11/xorg.conf".
NVIDIA: could not open the device file /dev/nvidia0 (Input/output error).

WARNING: Unable to use the nvidia-cfg library to query NVIDIA hardware.

ERROR: Unable to determine number of GPUs in system; cannot honor
'--enable-all-gpus' option.

Backed up file '/etc/X11/xorg.conf' as '/etc/X11/xorg.conf.backup'
New X configuration file written to '/etc/X11/xorg.conf'

and also

$ sudo nvidia-xconfig --query-gpu-info

REPLIED:
WARNING: Unable to use the nvidia-cfg library to query NVIDIA hardware.

THE MIRACLE HAPPENED WHEN I FOUND:
http://ubuntuforums.org/showthread.php?p=10454466

Ubuntu 10.10 & SLI GTX470
Hi guys!
I'm a new user and I'm trying to use a SLI configuration on Ubuntu 10.10. I've installed a SLI of two Zotac gtx 470 on windows and works perfectly, but on the same machine with ubuntu installed I think that option is not well configured.
Let me explain. On Ubuntu I've uninstalled default nVidia drivers.
[...]
I've found the solution!!!!!!!
YEAH!
ok let me explain.
1)
Code:

sudo pico /etc/default/grub

use gedit, nano instead of pico if you want.
and modify this line
Code:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

in
Code:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash vmalloc=256M"

(if 256M not works, try 192M).

2)on console type:
Code:

sudo update-grub

and reboot

3)Now if you try
Code:

sudo nvidia-xconfig --query-gpu-info

you will get something like this
Code:

Number of GPUs: 2

GPU #0:
Name : GeForce GTX 470
PCI BusID : PCI:3:0:0

Number of Display Devices: 1

Display Device 0 (DFP-0):
EDID Name : Ancor Communications Inc VE248
Minimum HorizSync : 30.000 kHz
Maximum HorizSync : 83.000 kHz
Minimum VertRefresh : 50 Hz
Maximum VertRefresh : 76 Hz
Maximum PixelClock : 170.000 MHz
Maximum Width : 1920 pixels
Maximum Height : 1080 pixels
Preferred Width : 1920 pixels
Preferred Height : 1080 pixels
Preferred VertRefresh : 60 Hz
Physical Width : 530 mm
Physical Height : 300 mm

GPU #1:
Name : GeForce GTX 470
PCI BusID : PCI:4:0:0

Number of Display Devices: 1

Display Device 0 (DFP-0):
EDID Name : Ancor Communications Inc VE248
Minimum HorizSync : 30.000 kHz
Maximum HorizSync : 83.000 kHz
Minimum VertRefresh : 50 Hz
Maximum VertRefresh : 76 Hz
Maximum PixelClock : 170.000 MHz
Maximum Width : 1920 pixels
Maximum Height : 1080 pixels
Preferred Width : 1920 pixels
Preferred Height : 1080 pixels
Preferred VertRefresh : 60 Hz
Physical Width : 530 mm
Physical Height : 300 mm

4) It's time for:
Code:

sudo nvidia-xconfig --sli=On

5) Reboot.
You will enjoy, finally, SLI on ubuntu.

So now a little issue..eh eh eh GPU 0 with sli enabled has an higher temperature compared to sli "off"!! :S It's normal with sli on?! ( the other video card it's around 10C lower :S

THANKS TO THE GRUB CHANGE I GOT:
---------------------------------------------------
sudo nvidia-xconfig --query-gpu-info
Number of GPUs: 2

GPU #0:
Name : Quadro 600
PCI BusID : PCI:3:0:0

Number of Display Devices: 1

Display Device 0 (CRT-0):
EDID Name : DELL E2210
Minimum HorizSync : 30.000 kHz
Maximum HorizSync : 83.000 kHz
Minimum VertRefresh : 56 Hz
Maximum VertRefresh : 75 Hz
Maximum PixelClock : 160.000 MHz
Maximum Width : 1680 pixels
Maximum Height : 1050 pixels
Preferred Width : 1680 pixels
Preferred Height : 1050 pixels
Preferred VertRefresh : 60 Hz
Physical Width : 470 mm
Physical Height : 300 mm

GPU #1:
Name : Tesla C2050 / C2070
PCI BusID : PCI:4:0:0

Number of Display Devices: 0
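To check this result from a script, the GPU count can be extracted from the `nvidia-xconfig --query-gpu-info` output. A small sketch, assuming the "Number of GPUs: N" line format shown above (the helper name is hypothetical). When the nvidia-cfg library cannot be used, as in the earlier error, that line is missing and the helper prints nothing.

```shell
#!/bin/bash
# Print N from the "Number of GPUs: N" line of
# `nvidia-xconfig --query-gpu-info` output read from stdin.
xconfig_gpu_count() {
    sed -n 's/^[[:space:]]*Number of GPUs: *\([0-9][0-9]*\).*/\1/p'
}
```

Usage:
$ sudo nvidia-xconfig --query-gpu-info | xconfig_gpu_count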

NB: it is also handy to install glxinfo (package mesa-utils) to check the OpenGL renderer and version reported for the GPU.

NB2: also check what nvidia-settings reports.

Now that Xorg recognizes both cards, CUDA can be installed.

Cuda Toolkit
--------------

To install CUDA I again followed:
http://www.r-tutor.com/gpu-computing/cuda-installation/cuda4.0-ubuntu
(and not http://ubuntuforums.org/showthread.php?t=1625433)

Download and install the CUDA Toolkit from the CUDA download site and run the following:
$ sudo sh cudatoolkit_4.0.17_linux_32_ubuntu10.10.run

Assuming you accepted the default install location /usr/local/cuda, add the following to the .bashrc file in your home folder:
export CUDA_HOME="/usr/local/cuda"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${CUDA_HOME}/lib"
export PATH=${CUDA_HOME}/bin:${PATH}

$ sudo reboot
so that the new PATH is read (opening a new terminal or running "source ~/.bashrc" should also be enough).
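The .bashrc lines above can be added idempotently. A minimal sketch with a hypothetical helper name; it differs from the r-tutor lines in one detail, which is my suggestion rather than part of the guide: the `${LD_LIBRARY_PATH:+...}` expansion avoids a leading empty entry (which the dynamic loader treats as the current directory) when LD_LIBRARY_PATH was previously unset.

```shell
#!/bin/bash
# Append the CUDA environment settings to FILE (default ~/.bashrc),
# unless a CUDA_HOME export is already present there.
append_cuda_env() {
    local rc=${1:-$HOME/.bashrc}
    grep -q '^export CUDA_HOME=' "$rc" 2>/dev/null && return 0
    # Quoted 'EOF': the ${...} expansions are written literally and
    # evaluated later, each time .bashrc is sourced.
    cat >> "$rc" <<'EOF'
export CUDA_HOME="/usr/local/cuda"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${CUDA_HOME}/lib"
export PATH="${CUDA_HOME}/bin:${PATH}"
EOF
}
```

Usage: `append_cuda_env` (no sudo needed, it edits your own .bashrc), then `source ~/.bashrc`.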

CUDA SDK Samples
-------------------
Download and install the CUDA SDK from the CUDA download site.
$ sh gpucomputingsdk_4.0.17_linux.run

With the default install location NVIDIA_GPU_Computing_SDK in your home folder, you can now build the SDK samples.
$ cd ~/NVIDIA_GPU_Computing_SDK/
$ make

If everything goes well, you should be able to verify your CUDA installation by running the deviceQuery sample (and its driver-API variant deviceQueryDrv) in the NVIDIA_GPU_Computing_SDK/C/bin/linux/release folder of your home directory. You should see output similar to the following:

./deviceQueryDrv
[deviceQueryDrv] starting...
CUDA Device Query (Driver API) statically linked version
There are 2 devices supporting CUDA

Device 0: "Tesla C2050 / C2070"
CUDA Driver Version: 4.0
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 2687 MBytes (2817982464 bytes)
(14) Multiprocessors x (32) CUDA Cores/MP: 448 CUDA Cores
GPU Clock rate: 1.15 GHz
Memory Clock rate: 1500.00 Mhz
Memory Bus Width: 384-bit
L2 Cache Size: 786432 bytes
Max Texture Dimension Sizes 1D=(65536) 2D=(65536,65535) 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Texture alignment: 512 bytes
Maximum memory pitch: 2147483647 bytes
Concurrent copy and execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: Yes
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 4 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "Quadro 600"
CUDA Driver Version: 4.0
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1023 MBytes (1072889856 bytes)
( 2) Multiprocessors x (48) CUDA Cores/MP: 96 CUDA Cores
GPU Clock rate: 1.28 GHz
Memory Clock rate: 800.00 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 131072 bytes
Max Texture Dimension Sizes 1D=(65536) 2D=(65536,65535) 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Texture alignment: 512 bytes
Maximum memory pitch: 2147483647 bytes
Concurrent copy and execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 3 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
[deviceQueryDrv] test results...
PASSED

Press ENTER to exit...

./deviceQuery
[deviceQuery] starting...
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Found 2 CUDA Capable device(s)

Device 0: "Tesla C2050 / C2070"
CUDA Driver Version / Runtime Version 4.0 / 4.0
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 2687 MBytes (2817982464 bytes)
(14) Multiprocessors x (32) CUDA Cores/MP: 448 CUDA Cores
GPU Clock Speed: 1.15 GHz
Memory Clock rate: 1500.00 Mhz
Memory Bus Width: 384-bit
L2 Cache Size: 786432 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: Yes
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 4 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "Quadro 600"
CUDA Driver Version / Runtime Version 4.0 / 4.0
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1023 MBytes (1072889856 bytes)
( 2) Multiprocessors x (48) CUDA Cores/MP: 96 CUDA Cores
GPU Clock Speed: 1.28 GHz
Memory Clock rate: 800.00 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 131072 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 3 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.0, CUDA Runtime Version = 4.0, NumDevs = 2, Device = Tesla C2050 / C2070, Device = Quadro 600
[deviceQuery] test results...
PASSED

Press ENTER to exit...
-------------------------------------------------------------------------
-------------------------------------------------------------------------
MISCELLANEOUS NOTES:
---------------------------------------------

**Note**: gcc and g++ 4.5 are not yet supported by CUDA 4.0,
so add:
sudo apt-get install gcc-4.4
sudo ln -s -f /usr/bin/gcc-4.4 /usr/bin/gcc

sudo apt-get install g++-4.4
sudo ln -s -f /usr/bin/g++-4.4 /usr/bin/g++
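Overwriting the /usr/bin/gcc and /usr/bin/g++ symlinks changes the compiler for the whole system. A less invasive alternative, offered as a suggestion rather than what I did: put gcc/g++ symlinks to the 4.4 compilers in a private directory and point nvcc at it with its `--compiler-bindir` (`-ccbin`) option. The directory name and helper are hypothetical; check `nvcc --help` on your toolkit before relying on the option, and note that the SDK makefiles would need to pass the same option.

```shell
#!/bin/bash
# Create DIR (default ~/gcc44) holding gcc -> gcc-4.4 and g++ -> g++-4.4
# symlinks, leaving the system-wide /usr/bin/gcc and /usr/bin/g++ untouched.
make_gcc44_dir() {
    local dir=${1:-$HOME/gcc44}
    mkdir -p "$dir"
    ln -sf /usr/bin/gcc-4.4 "$dir/gcc"
    ln -sf /usr/bin/g++-4.4 "$dir/g++"
}
```

Usage (mykernel.cu is a placeholder):
$ make_gcc44_dir ~/gcc44
$ nvcc --compiler-bindir ~/gcc44 -o mykernel mykernel.cu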

---------------------------------------------

TO CHECK WHETHER THE GRAPHICS CARD IS PRESENT: lspci

---------------------------------------------

To change the IP address, edit the eth0 connection directly in the Network Connections dialog,
NOT in /etc/network/interfaces.

-----------------------------
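To change the ssh port:

if OpenSSH is not installed, install it first:
sudo apt-get install openssh-server

change the Port line in the client configuration (the default port the ssh command connects to)

sudo emacs -nw /etc/ssh/ssh_config

and in the server configuration (the port sshd listens on)

sudo emacs -nw /etc/ssh/sshd_config

Finish with:

sudo /etc/init.d/ssh restart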
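The Port change can be applied to either file with sed instead of an editor. A minimal sketch, assuming GNU sed; the helper name is hypothetical. It replaces an existing Port line whether or not it is commented out (in the stock ssh_config it is typically a commented `#   Port 22` under `Host *`), keeps the first backup as FILE.orig, and returns non-zero if no Port line ended up set.

```shell
#!/bin/bash
# Set "Port PORT" in an ssh_config or sshd_config FILE (args: FILE PORT),
# replacing any existing, possibly commented-out, Port line.
set_ssh_port() {
    local file=$1 port=$2
    # Accept only 1 to 5 digits, then check the TCP port range.
    printf '%s\n' "$port" | grep -qx '[0-9]\{1,5\}' || { echo "invalid port: $port" >&2; return 1; }
    [ "$port" -ge 1 ] && [ "$port" -le 65535 ] || { echo "invalid port: $port" >&2; return 1; }
    [ -e "$file.orig" ] || cp -p "$file" "$file.orig"
    sed -i "s/^[[:space:]]*#\{0,1\}[[:space:]]*Port[[:space:]].*/Port ${port}/" "$file"
    # Succeed only if the new Port line is now present.
    grep -qx "Port ${port}" "$file"
}
```

Usage, from a root shell: `set_ssh_port /etc/ssh/sshd_config 2222`, then restart ssh as above.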

-----------------------------------------------

To connect with ssh (NEWPORT, user and host are placeholders):
ssh -p NEWPORT user@host
rsync -a --rsh='ssh -p NEWPORT' source/ user@host:destination/
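To avoid typing the port every time, the remote machine can be given an alias in the per-user ~/.ssh/config, which both ssh and rsync (through ssh) read. A sketch; the alias, host name and port are placeholders, and `ssh_host_entry` is a hypothetical helper that only prints the stanza.

```shell
#!/bin/bash
# Print a ~/.ssh/config stanza (args: ALIAS HOSTNAME PORT [USER]).
ssh_host_entry() {
    printf 'Host %s\n    HostName %s\n    Port %s\n' "$1" "$2" "$3"
    # The User line is optional.
    [ -n "${4:-}" ] && printf '    User %s\n' "$4"
    return 0
}
```

Usage:
$ ssh_host_entry gpubox gpubox.example.org 2222 >> ~/.ssh/config
$ ssh gpubox
$ rsync -a source/ gpubox:destination/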

---------------------------------------------

To stop/start the X server:
from a text console (Ctrl+Alt+F1), before logging in:
sudo service gdm stop
sudo service gdm start
---------------------------------------------
To recap, the sequence of magic commands for Xorg:

sudo nvidia-xconfig

sudo pico /etc/default/grub

change
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

to

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash vmalloc=256M"

sudo update-grub

sudo reboot

sudo nvidia-xconfig --query-gpu-info

sudo nvidia-xconfig --sli=On

sudo reboot
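The GRUB edit in this sequence can also be done with sed. A minimal sketch, assuming GNU sed and a GRUB_CMDLINE_LINUX_DEFAULT value in double quotes as in the stock file; the helper name is hypothetical. It appends vmalloc=SIZE only if no vmalloc= option is present yet and keeps the first backup as FILE.orig; `sudo update-grub` and a reboot are still required afterwards.

```shell
#!/bin/bash
# Append "vmalloc=SIZE" (default 256M) to GRUB_CMDLINE_LINUX_DEFAULT in
# FILE (default /etc/default/grub), unless a vmalloc= option is already set.
add_vmalloc() {
    local file=${1:-/etc/default/grub} size=${2:-256M}
    [ -e "$file.orig" ] || cp -p "$file" "$file.orig"
    # Single quotes keep "!" away from interactive history expansion;
    # $size is spliced in between two single-quoted parts.
    sed -i '/^GRUB_CMDLINE_LINUX_DEFAULT=/{ /vmalloc=/!s/"$/ vmalloc='"$size"'"/; }' "$file"
}
```

Usage, from a root shell: `add_vmalloc /etc/default/grub 256M` (or 192M, as the forum post suggests if 256M does not work), then `update-grub` and reboot.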

-----------------------------
I have 2 GPUs: the Tesla only for computation and a Quadro for graphics.
By default CUDA exposes every GPU it finds (the Quadro included), so a CUDA program may also run its computations on the Quadro.
However, the Quadro is much more delicate than the Tesla (for example, it is designed for lower maximum temperatures).

Today, probably because the job was particularly heavy, the Quadro was overheating while running starlab.
So I found an interesting way to force CUDA to use only one of the 2 GPUs:
* first I used the SDK's deviceQuery command to see how the 2 GPUs are numbered

./deviceQuery -noprompt | egrep "^Device"
output:
Device 0: "Tesla C2050 / C2070"
Device 1: "Quadro 600"

* then I set a dedicated environment variable to force CUDA to use only device 0:

export CUDA_VISIBLE_DEVICES="0"

* then I launched starlab from the same shell in which I had set CUDA_VISIBLE_DEVICES;
this way starlab runs only on the Tesla and does not roast my Quadro!

PS: to check the temperature of the cards, use the nvidia-settings command.
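The deviceQuery lookup and the CUDA_VISIBLE_DEVICES setting can be wrapped in two small helpers, so the Tesla is picked by name rather than by a hard-coded index. A sketch under assumptions: the helper names are hypothetical, the device lines look like the deviceQuery -noprompt output above, and CUDA_VISIBLE_DEVICES uses the same numbering that deviceQuery prints (here the Tesla is device 0 even though it sits on PCI bus 4). Setting the variable only on the command line, as `run_on_gpu` does, limits it to that one program instead of the whole shell.

```shell
#!/bin/bash
# Read deviceQuery output on stdin and print the index of the first
# device whose name contains PATTERN (plain substring, not a regex).
cuda_index_of() {
    awk -v pat="$1" '/^Device [0-9]+:/ && index($0, pat) {
        sub(/^Device /, ""); sub(/:.*/, ""); print; exit
    }'
}

# Run COMMAND [ARGS...] with only GPU INDEX visible to CUDA.
run_on_gpu() {
    local idx=$1
    shift
    CUDA_VISIBLE_DEVICES=$idx "$@"
}
```

Usage (./my_starlab_run is a placeholder for the actual starlab command):
$ tesla=$(./deviceQuery -noprompt | cuda_index_of Tesla)
$ run_on_gpu "$tesla" ./my_starlab_run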

-----------------------