What precision are you looking at? Vector add on non-HBM FPGAs will become memory-bound long before you can use up the FPGA's compute resources. If your application is compute-heavy with little data reuse, there is no point in using FPGAs for it: it will always be faster on GPUs, simply because GPUs have considerably higher peak compute performance and memory bandwidth than same-generation FPGAs.
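To see why a plain vector add is memory-bound, count the bytes moved per FLOP. The sketch below assumes FP32 and no data reuse: each element of `a` and `b` is read once and each element of `c` is written once. The 34.1 GB/s figure is an assumed board bandwidth (for example, two DDR4-2133 banks) and not a number from this answer, so substitute your own board's value. The point it illustrates is that the memory-bound ceiling lands at a few GFLOP/s, orders of magnitude below what the DSPs could compute.

```python
# Back-of-the-envelope arithmetic intensity of an FP32 vector add, c[i] = a[i] + b[i].
FP32_BYTES = 4


def vector_add_intensity(bytes_per_elem: int = FP32_BYTES) -> float:
    """FLOP per byte of external memory traffic for c = a + b with no data reuse."""
    flops_per_elem = 1                         # one addition per element
    bytes_moved_per_elem = 3 * bytes_per_elem  # read a[i], read b[i], write c[i]
    return flops_per_elem / bytes_moved_per_elem


def bandwidth_bound_gflops(bandwidth_gb_s: float, intensity: float) -> float:
    """Highest GFLOP/s the memory system can sustain at this arithmetic intensity."""
    return bandwidth_gb_s * intensity


# ASSUMPTION: hypothetical board with 34.1 GB/s of external bandwidth (two DDR4-2133 banks).
ASSUMED_BANDWIDTH_GB_S = 34.1

ai = vector_add_intensity()
bound = bandwidth_bound_gflops(ASSUMED_BANDWIDTH_GB_S, ai)
print(f"intensity = {ai:.4f} FLOP/byte, memory-bound ceiling = {bound:.2f} GFLOP/s")
```

Because the intensity is fixed by the kernel (1 FLOP per 12 bytes), the only way to go faster is more bandwidth, which is exactly where GPUs have the advantage.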
For current-generation Intel FPGAs (Arria 10 and Stratix 10), each DSP can perform one FP32 MAC, i.e. 2 FLOP, per clock cycle. The biggest Arria 10 has 1518 DSPs and a peak DSP operating frequency of 480 MHz, so its theoretical peak FP32 performance is 1518 × 2 × 480 MHz ≈ 1.45 TFLOP/s. In reality, however, you will never be able to use all the DSPs and run at 480 MHz; in the best case you will run at ~350 MHz. Moreover, it is nearly impossible for an application to map entirely to MAC operations, and it is also impossible to map every application to exactly 1518 DSPs, so some DSPs will always go unused. The real-world peak FP32 performance of Arria 10 will be around 900 GFLOP/s.

Similarly, for the biggest Stratix 10 GX, do not expect a frequency above 450 MHz even with HyperFlex, which gives a real-world peak FP32 performance of ~4.5 TFLOP/s. Of course, these numbers are achievable only if your application has an abnormally high amount of data reuse that is exploited through on-chip memory to minimize external memory accesses. Otherwise, as mentioned above, your performance will be bound by the extremely low external memory bandwidth of these FPGAs long before you can fully use their compute potential.

There is also the Stratix 10 MX with HBM, which gives you a reasonable amount of memory bandwidth but about 30% lower peak compute performance than the Stratix 10 GX, because the HBM controller takes up a lot of FPGA area. As a result, the largest MX FPGA has far fewer DSPs, less logic, fewer BRAMs, etc., than the largest GX FPGA.
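The peak-throughput arithmetic above can be written as DSPs × utilization × 2 FLOP per MAC × clock frequency. The sketch below reproduces the Arria 10 figures from this answer. The utilization factors (0.85 and 0.87) are assumptions picked only to show how the ~900 GFLOP/s and ~4.5 TFLOP/s estimates can arise, and the 5760-DSP count for the largest Stratix 10 GX (GX 2800) is taken from Intel's product tables and should be double-checked there.

```python
# Back-of-the-envelope peak FP32 throughput for a DSP-based FPGA.
FLOP_PER_MAC = 2  # one multiply-accumulate counts as a multiply plus an add


def peak_fp32_gflops(dsps: int, fmax_mhz: float, dsp_utilization: float = 1.0) -> float:
    """GFLOP/s assuming each used DSP retires one FP32 MAC per clock cycle."""
    return dsps * dsp_utilization * FLOP_PER_MAC * fmax_mhz / 1e3


# Arria 10 figures from the text: 1518 DSPs, 480 MHz theoretical, ~350 MHz achievable.
theoretical_a10 = peak_fp32_gflops(1518, 480)
# ASSUMPTION: ~85% of DSPs doing useful MACs, to illustrate the ~900 GFLOP/s estimate.
realistic_a10 = peak_fp32_gflops(1518, 350, dsp_utilization=0.85)
# ASSUMPTION: 5760 DSPs for the largest Stratix 10 GX (GX 2800), 450 MHz from the text,
# and ~87% DSP utilization; verify the DSP count against Intel's product tables.
realistic_s10 = peak_fp32_gflops(5760, 450, dsp_utilization=0.87)

print(f"Arria 10 theoretical:    {theoretical_a10:.0f} GFLOP/s")
print(f"Arria 10 realistic:      {realistic_a10:.0f} GFLOP/s")
print(f"Stratix 10 GX realistic: {realistic_s10:.0f} GFLOP/s")
```

Keeping utilization as an explicit parameter makes it easy to plug in your own design's post-fit DSP usage and achieved fmax instead of the vendor maximums.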
Take a look at the roofline model if you are not already familiar with it, and use the numbers above together with the theoretical peak memory bandwidth of these FPGAs to estimate what performance ratios you can expect relative to CPUs and GPUs.
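A minimal roofline sketch: attainable performance is the lower of the compute roof and memory bandwidth × arithmetic intensity (FLOP/byte), and the ridge point is the intensity at which a kernel stops being memory-bound. The Arria 10 peak below is the ~900 GFLOP/s real-world estimate from this answer; its 34.1 GB/s bandwidth is an assumed board value, and the GPU entry is a hypothetical placeholder, not a real product. Replace both with datasheet numbers for the parts you are actually comparing.

```python
# Minimal roofline model: performance is capped by the compute roof or by
# memory bandwidth times arithmetic intensity, whichever is lower.


def roofline_gflops(peak_gflops: float, bandwidth_gb_s: float, intensity: float) -> float:
    """Attainable GFLOP/s for a kernel with the given FLOP/byte intensity."""
    return min(peak_gflops, bandwidth_gb_s * intensity)


def ridge_point(peak_gflops: float, bandwidth_gb_s: float) -> float:
    """FLOP/byte at which a kernel stops being memory-bound on this platform."""
    return peak_gflops / bandwidth_gb_s


# (peak GFLOP/s, external memory bandwidth in GB/s)
PLATFORMS = {
    # ~900 GFLOP/s real-world Arria 10 estimate from the text; 34.1 GB/s is an
    # ASSUMED bandwidth (e.g. two DDR4-2133 banks), check your board's datasheet.
    "arria10": (900.0, 34.1),
    # HYPOTHETICAL placeholder GPU numbers, not a real product.
    "gpu_placeholder": (10_000.0, 700.0),
}

if __name__ == "__main__":
    for name, (peak, bw) in PLATFORMS.items():
        print(f"{name}: ridge point = {ridge_point(peak, bw):.1f} FLOP/byte")
    # Compare the two platforms across a range of arithmetic intensities.
    for ai in (1 / 12, 1.0, 10.0, 100.0):
        fpga = roofline_gflops(*PLATFORMS["arria10"], ai)
        gpu = roofline_gflops(*PLATFORMS["gpu_placeholder"], ai)
        print(f"AI {ai:7.3f}: FPGA {fpga:8.1f}  GPU {gpu:8.1f}  GPU/FPGA {gpu / fpga:5.1f}x")
```

With these inputs, any kernel below the FPGA's ridge point is limited purely by the bandwidth ratio between the two devices, which is why low-reuse workloads favour the platform with more memory bandwidth.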