Forum Discussion

RubenPadial
Contributor
2 years ago

Intel FPGA AI Suite accuracy drop

Hello,

I'm using Intel FPGA AI Suite 2023.2 on an Ubuntu 20.04 host and trying to run inference on a custom CNN on an Intel Arria 10 SoC FPGA.

The CNN was trained with TensorFlow, and its accuracy is 98.89% on the test dataset.

After converting the model to an IR model with the OpenVINO Model Optimizer, the accuracy remains the same:

mo \
  --saved_model_dir "{path_savedModelPath}" \
  --input_shape "{lst_inputShape}" \
  --model_name "{str_modelName}" \
  --output_dir "{path_irTargetPath}" \
  --use_new_frontend
However, after running the model on the Intel FPGA AI Suite IP, the accuracy drops to 74.64% on the same test dataset. The architecture used is A10_FP16_Generic.arch, which has "arch_precision"=FP16. I have also tested with A10_FP16_Performance.arch and A10_Performance.arch.

dla_compiler \
  --march "{path_archPath}" \
  --network-file "{path_xmlPath}" \
  --o "{path_binPath}" \
  --foutput-format=open_vino_hetero \
  --fplugin "HETERO:FPGA,CPU" \
  --fanalyze-performance \
  --fdump-performance-report \
  --fanalyze-area \
  --fdump-area-report

I tried to optimize the model with the OpenVINO Model Optimizer "compress_to_fp16" option, but when compiling with dla_compiler I get this error:
"Layer (Name: Transpose_517_compressed, Type: Constant) is not supported:
Error occurred.
../compiler/aot_plugin/src/dla_executable_network.cpp:134 Graph is not supported on FPGA plugin due to existance of layer (Name: Transpose_517_compressed, Type: Constant)
in topology. Most likely you need to use heterogeneous plugin instead of FPGA plugin directly."
As you can see, the HETERO plugin option is already set to FPGA and CPU. I also tested with Intel FPGA AI Suite 2023.3 and OpenVINO 2022.3.1 and got the same error message.

The software accuracy with this FP16-compressed IR model is 98.91%, so on the FPGA the accuracy should be almost the same, yet there is an accuracy drop of about 24 points.
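For reference, the compressed IR was generated with the same mo command as above plus the compression flag (a sketch reusing the same placeholder paths; --compress_to_fp16 is a standard Model Optimizer option):

mo \
  --saved_model_dir "{path_savedModelPath}" \
  --input_shape "{lst_inputShape}" \
  --model_name "{str_modelName}" \
  --output_dir "{path_irTargetPath}" \
  --use_new_frontend \
  --compress_to_fp16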

Please find both IR model files attached.

What could be the root cause of this accuracy drop?
What can I do to improve the accuracy?

21 Replies

  • JohnT_Altera
    Regular Contributor

    Hi,


    I suspect that the provided architecture might not fully fit your needs. If you need better performance, customizing a new architecture will serve your requirements better, unless the DLA you plan to run is the same as what Intel has tested.


  • JohnT_Altera
    Regular Contributor

    Hi,


    May I know whether it is possible for you to implement a custom AI bitstream, or are you planning to run only the provided bitstream?


  • Hello @JohnT_Intel,

    At the moment, we only plan to use the provided example architectures and build the bitstreams from them.

  • JohnT_Altera
    Regular Contributor

    Hi,


    Since you are planning to use the provided bitstream, it will be hard for us to improve the performance of the AI workload. Please let me know if you have any other queries.


  • JohnT_Altera
    Regular Contributor

    We have not received any response from you to the previous question/reply/answer that I provided. This thread will be transitioned to community support. If you have a new question, feel free to open a new thread to get support from Intel experts. Otherwise, community users will continue to help you on this thread. Thank you.


  • One thing to check is your channel order. By default, dla_benchmark feeds input data as RGB, but the dla_benchmark option `-bgr` reverses the channels. This could be the culprit for an accuracy drop of this scale.
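    For example, a sketch of an invocation with the channel swap enabled (the flag names other than `-bgr` are assumptions based on the typical dla_benchmark demo flow and may vary by release, so check dla_benchmark --help; the "{...}" placeholders are illustrative):

    dla_benchmark \
      -m "{path_xmlPath}" \
      -d HETERO:FPGA,CPU \
      -i "{path_imagesDir}" \
      -bgr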

    • RubenPadial
      Contributor

      Hello @AndrewRooney,

      Thank you for your help, but that's not the problem. The same NN was compiled to run only on the ARM CPU by changing just the --fplugin flag to --fplugin "HETERO:CPU", and it worked. So the same NN yields 74.64% accuracy in HETERO:FPGA,CPU mode, while the accuracy goes back to about 98% with HETERO:CPU.

      After performing some tests, we found the problem was related to the fully connected layers. Due to some design constraints, the input shape of the FullyConnected layer was [1, 1, 512], with no Flatten layer before it. If a Flatten layer is inserted before the FC layer, the inference produces the correct results. So the problem was related to the FC layer input shape and the HETERO:FPGA,CPU compilation.
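      For illustration, a minimal sketch of the fix on the TensorFlow side (the [1, 1, 512] shape is from above; the output layer size and activation are hypothetical):

      import tensorflow as tf

      # The FC layer originally consumed the [1, 1, 512] tensor directly;
      # inserting a Flatten first hands it a plain 512-element vector,
      # which the HETERO:FPGA,CPU path handles correctly.
      model = tf.keras.Sequential([
          tf.keras.Input(shape=(1, 1, 512)),
          tf.keras.layers.Flatten(),                       # [1, 1, 512] -> [512]
          tf.keras.layers.Dense(10, activation="softmax"), # hypothetical output layer
      ])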