GNL
New Contributor
5 years ago
Hi again, it seems that ahead-of-time (AOT) compilation drastically reduced our run time on the device. On the other hand, it increased the host execution time.
This is the result for AOT compiling:
########################################################################
# Date: Tue Apr 14 05:18:06 PDT 2020
# Job ID: 573459.v-qsvr-1.aidevcloud
# User: u38134
# Resources: neednodes=1:gpu:ppn=2,nodes=1:gpu:ppn=2,walltime=06:00:00
########################################################################
:: setvars has already been run. Skipping any further invocation. To force its re-execution, pass --force
./vector-add
Vector Size: 100000
-------------------------------------------
Device: Intel(R) Gen9 HD Graphics NEO
Kernel exec: 0.088999 msec
Cmd Group submission: 3.9869 msec
Real exec: 9.85082 msec
VectorAddInDPCPP exec: 17.543 msec
Scalar exec: 1.785 msec
success
########################################################################
# End of output for job 573459.v-qsvr-1.aidevcloud
# Date: Tue Apr 14 05:18:12 PDT 2020
########################################################################
And this is the result for JIT compiling:
########################################################################
# Date: Tue Apr 14 05:36:10 PDT 2020
# Job ID: 573469.v-qsvr-1.aidevcloud
# User: u38134
# Resources: neednodes=1:gpu:ppn=2,nodes=1:gpu:ppn=2,walltime=06:00:00
########################################################################
:: setvars has already been run. Skipping any further invocation. To force its re-execution, pass --force
./vector-add
Vector Size: 100000
-------------------------------------------
Device: Intel(R) Gen9 HD Graphics NEO
Kernel exec: 0.149666 msec
Cmd Group submission: 3.13661 msec
Real exec: 175.897 msec
VectorAddInDPCPP exec: 243.523 msec
Scalar exec: 0.099 msec
success
########################################################################
# End of output for job 573469.v-qsvr-1.aidevcloud
# Date: Tue Apr 14 05:36:15 PDT 2020
########################################################################
And finally, this is my makefile for AOT:
CXX = dpcpp
#CXXFLAGS = -O2 -g
#LDFLAGS = -lOpenCL -lsycl
EXE_NAME = vector-add
SOURCES = src/vector-add.cpp
all: main
main:
	$(CXX) -fsycl-targets=spir64_gen-unknown-unknown-sycldevice -Xsycl-target-backend '-device skl' -o $(EXE_NAME) $(SOURCES)
run:
	./$(EXE_NAME)
clean:
	rm -rf $(EXE_NAME)
Did I skip something here? We improved the device kernel execution time, but why is the host performance (Scalar exec above) suffering now?
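Since each job above reports only one Scalar exec measurement, one way to check whether the host difference is real or just a one-time cost (cold caches, page faults) would be to time the host loop several times and compare the fastest run from each build. Below is a minimal sketch of that idea in plain C++. It assumes the scalar version is a simple loop over `std::vector<float>`. The names `scalar_add` and `best_scalar_ms` and the repetition count are hypothetical and not taken from vector-add.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for the scalar (host) version of the vector add.
void scalar_add(const std::vector<float>& a, const std::vector<float>& b,
                std::vector<float>& out) {
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = a[i] + b[i];
    }
}

// Runs scalar_add `reps` times on vectors of length n and returns the
// fastest run in milliseconds. Early runs tend to absorb one-time costs,
// so the minimum is less sensitive to them than a single measurement.
double best_scalar_ms(std::size_t n, int reps) {
    assert(reps > 0);
    std::vector<float> a(n, 1.0f), b(n, 2.0f), out(n, 0.0f);
    double best = std::numeric_limits<double>::max();
    for (int r = 0; r < reps; ++r) {
        auto start = std::chrono::steady_clock::now();
        scalar_add(a, b, out);
        auto stop = std::chrono::steady_clock::now();
        std::chrono::duration<double, std::milli> elapsed = stop - start;
        best = std::min(best, elapsed.count());
    }
    // Read one result so the computed values are actually used.
    if (!out.empty()) {
        volatile float sink = out[n / 2];
        (void)sink;
    }
    return best;
}
```

Calling `best_scalar_ms(100000, 10)` in both the AOT and the JIT build and comparing the two values would show whether the gap in Scalar exec survives repeated measurement.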