Run the sample in parallel. For example, with 4 devices of 4 processors each, the process count is 4*4=16:
mpirun -np 16 --hostfile host_file mpi_hello_world
You should see something like:
Hello world from processor tegra-server, rank 0 out of 16 processors
Hello world from processor tegra-server, rank 1 out of 16 processors
Hello world from processor tegra-server, rank 2 out of 16 processors
Hello world from processor tegra-server, rank 3 out of 16 processors
Hello world from processor tegra-1, rank 4 out of 16 processors
Hello world from processor tegra-1, rank 5 out of 16 processors
Hello world from processor tegra-1, rank 6 out of 16 processors
Hello world from processor tegra-1, rank 7 out of 16 processors
Hello world from processor tegra-2, rank 8 out of 16 processors
Hello world from processor tegra-2, rank 9 out of 16 processors
Hello world from processor tegra-2, rank 10 out of 16 processors
Hello world from processor tegra-2, rank 11 out of 16 processors
Hello world from processor tegra-3, rank 12 out of 16 processors
Hello world from processor tegra-3, rank 13 out of 16 processors
Hello world from processor tegra-3, rank 14 out of 16 processors
Hello world from processor tegra-3, rank 15 out of 16 processors
The actual order of the lines may differ, and messages from different ranks may interleave; the output above is an idealized ordering for simplicity.
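For reference, the grouping of ranks by host in the sample output is consistent with a host_file along the following lines. This is only a sketch: it assumes Open MPI's hostfile syntax (the `slots=` keyword), takes the host names from the sample output above, and takes the slot counts from the 4 processors per device. Other MPI implementations use a different syntax (MPICH, for instance, writes `hostname:4`), so adjust it to your installation.

```
# host_file (assumed Open MPI syntax; one line per device)
# Host names come from the sample output; slots=4 matches 4 processors per device.
tegra-server slots=4
tegra-1 slots=4
tegra-2 slots=4
tegra-3 slots=4
```

With 4 hosts of 4 slots each, -np 16 fills every slot exactly once. Under this assumption the order of the lines decides which ranks land on which host, which is why ranks 0 to 3 appear on tegra-server in the sample output.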