Started Labbook 2010.
March 4, 2009
Tested perflab-rotate on ow125 with -O2 option:
Rotate: Version = naive_rotate: baseline implementation:
Dim 64 128 256 512 1024 Mean
Your CPEs 8.4 9.5 17.2 31.0 105.1
Baseline CPEs 8.7 9.5 17.2 31.3 103.6
Speedup 1.0 1.0 1.0 1.0 1.0 1.0
Rotate: Version = rotate_b64: average optimalization using 64x64 blocking:
Dim 64 128 256 512 1024 Mean
Your CPEs 10.6 10.3 12.0 20.4 22.9
Baseline CPEs 8.7 9.5 17.2 31.3 103.6
Speedup 0.8 0.9 1.4 1.5 4.5 1.5
Rotate: Version = rotate_b128: average optimalization using 128x128 blocking:
Dim 64 128 256 512 1024 Mean
Your CPEs 10.5 10.4 17.9 20.6 25.1
Baseline CPEs 8.7 9.5 17.2 31.3 103.6
Speedup 0.8 0.9 1.0 1.5 4.1 1.4
Rotate: Version = rotate_block: maximum optimalization 112x112 block2 (nxn):
Dim 128 256 512 1024 2048 Mean
Your CPEs 10.0 17.7 23.2 26.0 58.4
Baseline CPEs 9.5 17.2 31.3 103.6 107.4
Speedup 0.9 1.0 1.4 4.0 1.8 1.6
Rotate: Version = rotate_block: maximum optimalization nxn block(n=128):
Dim 64 128 256 512 1024 Mean
Your CPEs 7.9 8.0 12.5 16.6 23.3
Baseline CPEs 8.7 9.5 17.2 31.3 103.6
Speedup 1.1 1.2 1.4 1.9 4.5 1.7
Rotate: Version = rotate_block: maximum optimalization nxn block2(n=128):
Dim 128 256 512 1024 2048 Mean
Your CPEs 8.1 12.2 16.6 23.1 62.4
Baseline CPEs 9.5 17.2 31.3 103.6 107.4
Speedup 1.2 1.4 1.9 4.5 1.72 1.7
Rotate: Version = rotate_block: maximum optimalization 136x136 block (nxn):
Dim 128 256 512 1024 2048 Mean
Your CPEs 8.1 12.7 17.4 23.1 65.8
Baseline CPEs 9.5 17.2 31.3 103.6 107.4
Speedup 1.2 1.4 1.8 4.5 1.6 1.8
Rotate: Version = rotate_block: maximum optimalization 140x140 block:
Dim 128 256 512 1024 2048 Mean
Your CPEs 8.1 12.4 16.4 22.7 65.2
Baseline CPEs 9.5 17.2 31.3 103.6 107.4
Speedup 1.2 1.4 1.9 4.6 1.6 1.9
Rotate: Version = rotate_block: maximum optimalization 142x142 block:
Dim 256 512 1024 1536 2048 Mean
Your CPEs 12.7 16.5 25.1 18.7 66.9
Baseline CPEs 17.2 31.3 103.6 98.1 107.4
Speedup 1.4 1.9 4.1 5.6 1.6 2.5
Rotate: Version = rotate_block: maximum optimalization 144x144 block (nxn):
Dim 128 256 512 1024 2048 Mean
Your CPEs 8.0 13.0 16.5 21.6 65.3
Baseline CPEs 9.5 17.2 31.3 103.6 107.4
Speedup 1.2 1.3 1.9 4.8 1.6 1.9
Rotate: Version = rotate_block: maximum optimalization 152x152 block (nxn):
Dim 128 256 512 1024 2048 Mean
Your CPEs 8.0 12.5 16.4 22.9 70.7
Baseline CPEs 9.5 17.2 31.3 103.6 107.4
Speedup 1.2 1.4 1.9 4.5 1.5 1.8
Rotate: Version = rotate_block: maximum optimalization 160x160 block2 (nxn):
Dim 128 256 512 1024 2048 Mean
Your CPEs 10.0 18.5 23.1 28.2 76.8
Baseline CPEs 9.5 17.2 31.3 103.6 107.4
Speedup 1.0 0.9 1.4 3.7 1.4 1.4
Rotate: Version = rotate_b32_u2: Rotate using 32x32 blocking, 2x unrolling:
Dim 256 512 1024 1536 2048 Mean
Your CPEs 9.9 14.3 15.8 14.7 17.9
Baseline CPEs 17.2 31.3 103.6 98.1 107.4
Speedup 1.7 2.2 6.6 6.7 6.0 4.0
Rotate: Version = rotate_hybrid: Hybrid of toggle32 and toggle16x2:
Dim 256 512 1024 1536 2048 Mean
Your CPEs 4.6 10.1 13.9 14.2 14.2
Baseline CPEs 17.2 31.3 103.6 98.1 107.4
Speedup 3.7 3.1 7.4 6.9 7.6 5.4
Can it go much faster? Can blocking help?
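The blocking idea behind the rotate versions above can be sketched as follows. This is a minimal illustration, not the code that produced the measurements: the pixel type, the RIDX macro and both function names are assumptions made for this sketch, and the rotation is assumed to map source element (i, j) to (dim-1-j, i). The blocked version visits the image tile by tile so that the rows being read and the columns being written are more likely to stay in cache.

```c
#include <stddef.h>

/* Hypothetical pixel type for this sketch; the lab's real type may differ. */
typedef struct {
    unsigned short red, green, blue;
} pixel;

/* Row-major index of element (i, j) in an n x n image (assumed layout). */
#define RIDX(i, j, n) ((size_t)(i) * (size_t)(n) + (size_t)(j))

/* Straightforward rotation: source element (i, j) lands at (dim-1-j, i). */
void rotate_naive_sketch(int dim, const pixel *src, pixel *dst)
{
    for (int i = 0; i < dim; i++)
        for (int j = 0; j < dim; j++)
            dst[RIDX(dim - 1 - j, i, dim)] = src[RIDX(i, j, dim)];
}

/* Same mapping, but walking the image in block x block tiles.
 * The tile bounds are clamped, so dim need not be a multiple of block. */
void rotate_blocked_sketch(int dim, int block, const pixel *src, pixel *dst)
{
    for (int ii = 0; ii < dim; ii += block) {
        int i_end = (ii + block < dim) ? ii + block : dim;
        for (int jj = 0; jj < dim; jj += block) {
            int j_end = (jj + block < dim) ? jj + block : dim;
            for (int i = ii; i < i_end; i++)
                for (int j = jj; j < j_end; j++)
                    dst[RIDX(dim - 1 - j, i, dim)] = src[RIDX(i, j, dim)];
        }
    }
}
```

Calling rotate_blocked_sketch(dim, 32, src, dst) would correspond to 32x32 blocking; which block size is fastest depends on the cache of the machine, as the tables above suggest.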
February 25, 2009
- Tested perflab on faro with -g option:
Dim 64 128 256 512 1024 Mean
Your CPEs 5302.9 10403.6 20070.4 53159.6 251782.8
Baseline CPEs 72.0 74.0 78.0 190.0 260.0
Speedup 0.0 0.0 0.0 0.0 0.0 0.0
- Tested perflab on faro with -O2 option:
Line: Version = line() function:
Dim 64 128 256 512 1024 Mean
Your CPEs 3879.0 7540.6 15569.7 41288.9 219640.0
Baseline CPEs 72.0 74.0 78.0 190.0 260.0
Speedup 0.0 0.0 0.0 0.0 0.0 0.0
- Tested perflab on ow125 with -O2 option:
Line: Version = line() function:
Dim 64 128 256 512 1024 Mean
Your CPEs 1495.7 3006.3 6746.9 27087.2 100104.9
Baseline CPEs 72.0 74.0 78.0 190.0 260.0
Speedup 0.0 0.0 0.0 0.0 0.0 0.0
- Tested perflab on ow125 with blackline
Line: Version = black_line: maximum-call out of the loop:
Dim 64 128 256 512 1024 Mean
Your CPEs 18.4 18.1 19.4 36.8 69.3
Baseline CPEs 72.0 74.0 78.0 190.0 260.0
Speedup 3.9 4.1 4.0 5.2 3.8 4.2
- Tested perflab on ow125 with naive -O10 -fomit-frame-pointer -fno-branch-count-reg -fstrength-reduce -fexpensive-optimizations -funroll-loops
Line: Version = naive_line: baseline implementation:
Dim 64 128 256 512 1024 Mean
Your CPEs 1101.6 2231.7 4781.8 21081.2 68834.4
Baseline CPEs 72.0 74.0 78.0 190.0 260.0
Speedup 0.1 0.0 0.0 0.0 0.0 0.0
- Tested perflab on ow125 with blackline with -g
Dim 64 128 256 512 1024 Mean
Your CPEs 24.2 24.0 26.5 53.2 99.9
Baseline CPEs 1102.0 2232.0 4782.0 21081.0 68834.0
Speedup 45.5 93.1 180.1 396.6 689.0 183.5
- Tested perflab on ow125 with blackline with -g
Dim 64 128 256 512 1024 Mean
Your CPEs 1494.1 3031.4 6745.2 27140.5 102481.8
Baseline CPEs 1102.0 2232.0 4782.0 21081.0 68834.0
Speedup 0.7 0.7 0.7 0.8 0.7 0.7
- Tested perflab on ow125 with best one with -g
Dim 64 128 256 512 1024 Mean
Your CPEs 12.5 12.1 11.9 12.1 12.1
Baseline CPEs 1102.0 2232.0 4782.0 21081.0 68834.0
Speedup 88.1 184.9 402.3 1740.6 5711.4 579.1
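For reading the tables above: each Speedup entry is the baseline CPE divided by the measured CPE, and the Mean column appears consistent with the geometric mean of the per-size speedups (checked by hand against a few rows, so treat this as an assumption about the driver, not a confirmed fact). A small sketch with hypothetical function names:

```c
#include <stddef.h>

/* Speedup for one image size: baseline CPE divided by measured CPE. */
double speedup(double baseline_cpe, double your_cpe)
{
    return baseline_cpe / your_cpe;
}

/* n-th root of x > 0 by bisection, so the sketch needs no libm;
 * pow(x, 1.0 / n) from <math.h> would do the same job. */
static double nth_root(double x, int n)
{
    double lo = 0.0;
    double hi = (x > 1.0) ? x : 1.0;
    for (int iter = 0; iter < 200; iter++) {
        double mid = 0.5 * (lo + hi);
        double p = 1.0;
        for (int k = 0; k < n; k++)
            p *= mid;
        if (p < x)
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}

/* Geometric mean of the n per-size speedups. */
double mean_speedup(const double *baseline, const double *yours, int n)
{
    double product = 1.0;
    for (int i = 0; i < n; i++)
        product *= speedup(baseline[i], yours[i]);
    return nth_root(product, n);
}
```

A geometric mean keeps one very large speedup at a single size from dominating the summary, which matters for rows where the 1024 case is several times faster than the rest.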
February 10, 2009
- Tested cpuinfo at pc-unreal in linux-mode.
- pc-unreal has no internet. eth2 is not active. Added /sbin to PATH in .bashrc to be able to call ifconfig.
- Downloaded softpkg-2.4 on pc-unreal. Configure and make install gave no problems, make install-pkg did (no default package descriptions).
- Defined a package description for softpkg. Commands man and desc work fine, using path has strange side-effects (no path at all). Unclear where path is set in SUSE (not in $HOME/.bashrc).
- Diagnosed the internet problem at pc-unreal by inspecting /var/log/messages and the output of dmesg. The configuration file with the MAC address of the 3-Com network card was missing. As superuser root, created a configuration for the 3-Com network card by copying the existing ifcfg-eth-id-00:15:f2:4e:0a:69 to a config file named after the MAC address of the 3-Com card. The MAC address of the 3-Com card was found by typing ifconfig eth1. Changed only the name of this configuration file. Gave as su the command /etc/init.d/network restart and we could ping!
- Started YaST as su, and did an update and added findutils-locate.
- Initialized the locate-database by typing 'sudo updatedb'.
January 28, 2009
- Downloaded and installed boomerang in ~/packages. Program only runs in its own directory.
- Program seems only to decompile executables (no object-files).
- Decompiled code/data/problem2_29.c
- Result:
arith(int param2, int param1) {
    int local2; // r26{30}
    int local5; // r26
    int local6; // r28
    local5 = param1;
    if (param1 < 0) {
        local5 = param1 + 3;
    }
    local2 = (local5 >> 2) + param2 * 15;
    return local2; /* WARNING: Also returning: (local5 >> 2) + param2 * 15, local2 */
}
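The decompiled body above reads as "param2 times 15 plus param1 divided by 4": adding 3 to a negative value before an arithmetic right shift by 2 is the usual way to make signed division by 4 round toward zero. The sketch below puts a literal transcription next to that reading. Both function names are hypothetical, and the source-level form is a guess based on the decompiler output, not the original problem2_29.c.

```c
/* Literal transcription of the decompiled body. Right-shifting a negative
 * int is implementation-defined in C; this assumes an arithmetic shift,
 * which is what gcc on x86 performs. */
int arith_decompiled(int param2, int param1)
{
    int biased = param1;
    if (param1 < 0)
        biased = param1 + 3;    /* bias so the shift rounds toward zero */
    return (biased >> 2) + param2 * 15;
}

/* Guessed source-level form: multiply by 15, divide by 4. */
int arith_guess(int x, int y)
{
    return x * 15 + y / 4;
}
```

For example, with x = 2 and y = -7 both forms give 30 + (-1) = 29, whereas a plain shift without the bias would give 30 + (-2) = 28.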
January 22, 2009
- Installed gcc-2.95.3 in edu:${HOME}/packages with a remote copy from faro.
- Added PACKAGEPATH to .cshrc
January 21, 2009
- Checked that make was GNU make, and configured faro as i686-redhat-linux-gnu.
- Performed 'make bootstrap-lean' (after making move-if-changed executable).
- Performed 'make -k check', but the tests mostly fail.
- Performed 'make install', which has created a 32-bit ~/packages/gcc-2.95.3
- Need 44 Mb to install packages/gcc-2.95.3 on edu (only 5 Mb available)
- Cleaned up 30 Mb in dot. Cleaned 30 Mb in onderwijs (69 Mb available)
- Remote-copied cpuinfo, and with the 32-bit ~/packages/gcc-2.95.3 the code works as expected (even on foobar1).
-
- Still, configure cannot determine the host on foobar1.
- Did a configure on foobar1 with host x86_64-redhat-linux-gnu.
- Performed 'make bootstrap-lean'
- Performed 'make install', which has created ~/x86_64/gcc-2.95.3. Unfortunately, still 32-bit.
- Performing 'make bootstrap-lean' with gcc-4.1.2 directly fails.
-
- Downloaded gcc-4.1.2 and unpacked it in ~/src/.
- Did a configure on foobar1 with host x86_64-redhat-linux-gnu.
- Performed 'make bootstrap-lean' with /usr/local/gcc (64-bit version 4.1.2) on scratch (obj-directory is huge).
- Had to make some scripts executable, and modify some old dependencies in the Makefiles, to build successfully.
- Performed 'make install', which has created ~/x86_64/gcc-4.1.2. Package is quite large (480 Mb). Still no support for other versions.
January 20, 2009
- Downloaded gcc-2.95.3 and unpacked it in ~/src/.
- Both on faro and foobar1 configure failed: ~/src/gcc-2.95.3/configure --prefix=/home/arnoud/packages/gcc-2.95.3
Config.guess failed to determine the host type. You need to specify one.
- Specifying i386 or i686 on faro works, until the error 'Configuration i686-pc-none not supported'
- Changed on faro to gcc-2.95.3. Looked at the specs, and found i686-pc-linux-gnu.
- Now, configure works.
- Same command fails for foobar1 (gcc 4.1.2). No younger versions can be found on foobar1.
- Plan: build gcc-2.95.3 at faro, install in packages, and try again at foobar1.
January 7, 2009
- Started with cpuinfo-assignment on ow130 (Intel(R) Pentium(R) Dual CPU E2180 @ 2.00GHz). Build failed for clock.c with gcc version 4.1.2.
- In /usr/libexec/gcc/x86_64-redhat-linux/ also version 3.4.6 is available.
- Could only find gcc version 4.2.0 in /usr/local/arch/gcc-4.2.0/libexec.
- Most promising looks /usr/bin/gcc34
- Finally got something working (changing registers from (eax) to (a)).
- Unfortunately, Intel processors seem to react differently to CPUID (shorter vendor-name, but a lot of information in register_c and _d). The model numbers are not that important, many processors with different names (e.g. E2180 and E2200) report the same CPUID model 06FD. Linux /proc/cpuinfo knows the E-name, and reports a cpuid level of 10. Anyway, CPUID is still valuable, for instance to ask for the number of cores.
- Strangely enough, the register values are still the same as in the example cpuid.c, so there seems to be something going wrong in the 64-bit conversion to the vendor-string. TBC.
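A sketch of the vendor-string conversion mentioned above, assuming the three register values from CPUID leaf 0 are already available. The function name is hypothetical and this is not the code from cpuid.c. CPUID leaf 0 returns the 12-character vendor string in the order EBX, EDX, ECX, four bytes per register with the first character in the lowest byte. Using fixed-width uint32_t and extracting bytes by shifting avoids depending on the width of long, which differs between 32-bit and 64-bit builds; whether that is the actual cause of the problem here is not confirmed.

```c
#include <stdint.h>

/* Build the NUL-terminated vendor string from the CPUID leaf 0 registers.
 * out must have room for 12 characters plus the terminator. */
void cpuid_vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };   /* order matters */
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[4 * r + b] = (char)((regs[r] >> (8 * b)) & 0xFFu);
    out[12] = '\0';
}
```

With ebx = 0x756e6547, edx = 0x49656e69 and ecx = 0x6c65746e the result is "GenuineIntel". On gcc the register values themselves can be obtained with __get_cpuid from <cpuid.h> instead of hand-written inline assembly.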