Upcoming Webinar: Computational Storage Acceleration Using Intel® Agilex™ FPGAs
Today’s computational workloads are larger, more complex, and more diverse than ever before. The explosion of applications such as high-performance computing (HPC), artificial intelligence (AI), machine vision, analytics, and other specialized tasks is driving exponential growth in data. At the same time, the trend toward virtualized servers, storage, and network connections means that workloads are growing in scale and complexity.

Traditionally, data is conveyed to a computational engine, such as a central processing unit (CPU), for processing. But transporting the data takes time, consumes power, and is increasingly proving to be a bottleneck in the overall process. The solution is computational storage, also known as in-storage processing (ISP): instead of bringing the data to the computational engine, computational storage brings the computational engine to the data, allowing the data to be processed and analyzed where it is generated and stored. (A rough back-of-envelope sketch of this tradeoff appears at the end of this post.)

To learn more about this concept, please join us for a webinar led by Sean Lundy from Eideticom, Craig Petrie from BittWare, and myself:

Computational Storage Using Intel® Agilex™ FPGAs: Bringing Acceleration Closer to Data

The webinar will take place on Thursday, November 11, 2021, starting at 11:00 a.m. EST.

Sean will introduce Eideticom’s NoLoad computational storage technology for data center storage and compute applications. NoLoad technology provides CPU offload for applications, resulting in dramatic acceleration of compute- and data-intensive tasks such as storage workloads, databases, AI inferencing, and data analytics. NoLoad’s NVMe-compliant interface simplifies the deployment of computational offload by making it straightforward to deploy in servers of all types and across all major operating systems.

Craig will introduce BittWare’s new IA-220-U2 FPGA-based Computational Storage Processor (CSP), which supports Eideticom’s NoLoad technology as an option. The IA-220-U2 CSP, powered by an Intel Agilex F-Series FPGA with 1.4M logic elements (LEs), features PCIe Gen 4 for twice the bandwidth of PCIe Gen 3 solutions. The CSP works alongside traditional flash SSDs, providing accelerated computational storage services (CSS) by performing compute-intensive tasks such as compression and encryption. This allows users to build out their storage using standard SSDs instead of being locked into a single vendor’s storage solutions.

BittWare’s IA-220-U2 accelerates NVMe flash SSDs by sitting alongside them as another U.2 module. (Image source: BittWare)

We will also discuss the Intel® Agilex™ FPGAs that power BittWare’s new CSP. Built on Intel’s 10nm SuperFin technology, these devices leverage heterogeneous 3D system-in-package (SiP) technology. Agilex I-Series FPGAs and SoC FPGAs are optimized for bandwidth-intensive applications that require high-performance processor interfaces, such as PCIe Gen 5 and Compute Express Link (CXL). Meanwhile, Agilex F-Series FPGAs and SoC FPGAs are optimized for applications in data center, networking, and edge computing. With transceiver support up to 58 Gbps, advanced DSP capabilities, and PCIe Gen 4 x16, the Agilex F-Series FPGAs that power BittWare’s new CSP provide the customized connectivity and acceleration required by compute- and data-intensive, power-sensitive applications such as HPC, AI, machine vision, and analytics.

This webinar will be of interest to anyone involved with these highly complex applications and environments.
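To make the data-movement argument concrete, here is a rough back-of-envelope sketch in Python. Every number in it is an illustrative assumption rather than a measured figure for any of the products above; the point is simply that when results are far smaller than inputs, processing data in place can win even with a slower compute engine.

```python
# Back-of-envelope sketch (not a benchmark) of when in-storage
# processing wins. Every figure below is an illustrative assumption,
# not a measured value for any particular product.

DATA_GB = 100                # data set to scan, in GB
LINK_GBPS = 7.0              # assumed usable NVMe link bandwidth, GB/s
HOST_COMPUTE_GBPS = 10.0     # assumed host CPU processing rate, GB/s
ISP_COMPUTE_GBPS = 5.0       # assumed in-storage engine processing rate, GB/s
RESULT_FRACTION = 0.01       # assume results are 1% the size of the input

# Conventional path: move all the data to the CPU, then process it.
host_time = DATA_GB / LINK_GBPS + DATA_GB / HOST_COMPUTE_GBPS

# Computational storage path: process where the data lives, so only
# the (much smaller) result crosses the bus.
isp_time = DATA_GB / ISP_COMPUTE_GBPS + DATA_GB * RESULT_FRACTION / LINK_GBPS

print(f"host path: {host_time:.1f} s, in-storage path: {isp_time:.1f} s")
```

With these assumed numbers the in-storage path is already faster despite its slower engine, and the gap widens as data sets grow or the bus becomes more contended.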
We hope to see you there, so Register Now before all the good virtual seats are taken.

Arrow’s Tech Snacks Festival features three delicious presentations to help you develop motion-control, video, and high-performance FPGA applications
This year, Arrow’s 10-day virtual Tech Snacks Festival (June 7-18) will include more than forty snack-sized sessions led by industry experts on topics ranging from AI/ML to Industry 4.0 and the future of automotive electronics. During the festival, you’ll have the opportunity to get expert help from Intel presenters covering three hot application areas: motion control, video, and high-performance FPGA applications. These are 15-minute sessions, so you won’t need to reschedule your entire day to attend.

Here are the details for the three Intel® FPGA Tech Snack presentations:

Wednesday, June 9: Ben Jeppesen and Diwakar Bansal will discuss real-time, deterministic, single- and multi-axis motion control for industrial drives and robotics.
Tuesday, June 15: Jean-Michel Vuillamy will discuss FPGA-based video processing, made easy.
Thursday, June 17: Graham Baker will discuss the breakthrough performance of 10nm Intel® Agilex® FPGAs. (See “Breakthrough FPGA News from Intel” for more details.)

Each 15-minute session is directly followed by an optional 45-minute deep dive on the topic, including Q&A, for those looking for a more in-depth experience.

In addition, there are a couple of hour-long panel discussions featuring Intel FPGA experts that you might want to attend:

Tuesday, June 8: “AI is predicted to reach human levels by 2029. Where are we right now and how will this impact designs of the future?” Dr. Mark Jervis from the Intel Programmable Solutions Group is participating on this panel.
Thursday, June 17: “Getting ready for 5G. Doing things better and faster and creating new services and products we can’t foresee yet.” Martin Wiseman from the Intel Programmable Solutions Group is participating on this panel.

For more information and to register, click here.

Can FPGAs outperform GPUs for some AI workloads? Answer: Yes
For many AI workloads, it can be challenging to achieve the full compute capacity reported by GPU vendors. Even for highly parallel computations such as general matrix multiplication (GEMM), GPUs achieve high utilization only at certain large matrix sizes.

FPGAs offer a different approach to AI-optimized hardware. Unlike GPUs, FPGAs offer fine-grained spatial reconfigurability, where the output of each function can be routed directly to the input of the function that needs it. This flexibility to accommodate specific AI algorithms and application characteristics enables improved utilization of the FPGA’s available compute capability and, therefore, improved performance. Specialized soft processors, also called overlays, allow the FPGA to be programmed in a fashion similar to a processor, purely via software toolchains. This programming approach abstracts away FPGA-specific hardware complexity.

A new White Paper titled “Real Performance of FPGAs Tops GPUs in the Race to Accelerate AI” presents the first performance evaluation of the new Intel® Stratix® 10 NX FPGA in comparison to the Nvidia T4 and V100 GPUs. The evaluation was conducted over a suite of real-time inference workloads, based on results published in a paper presented at the 2020 IEEE International Conference on Field Programmable Technology. The FPGA workloads were deployed using an implementation of a soft AI processor overlay called the Neural Processing Unit (NPU), with a toolchain that enables software-centric FPGA programming without invoking FPGA-specific hardware EDA tools. Results show that the Intel Stratix 10 NX FPGA achieves far better utilization and performance than the tested GPUs for these AI workloads.

Want the details? Click here to download the White Paper.
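The utilization effect is easy to observe on any hardware. The sketch below is a hedged NumPy experiment that runs on a CPU rather than a GPU, so it only demonstrates the general behavior the White Paper quantifies: effective GEMM throughput at small matrix sizes is typically a small fraction of the throughput at large ones.

```python
# Illustrative NumPy experiment: effective GEMM throughput depends
# strongly on matrix size. Run on a CPU here, so it only demonstrates
# the general effect; the White Paper quantifies it for GPUs.
import time
import numpy as np

def gemm_gflops(n, reps=10):
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a @ b                                # warm-up
    t0 = time.perf_counter()
    for _ in range(reps):
        a @ b
    dt = (time.perf_counter() - t0) / reps
    return 2.0 * n**3 / dt / 1e9         # a GEMM costs 2*N^3 flops

for n in (32, 256, 2048):
    print(f"{n}x{n} GEMM: {gemm_gflops(n):.1f} GFLOP/s")
```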
NextPlatform.com article describes Intel® oneAPI use at CERN for Large Hadron Collider (LHC) research

Independent consultant James Reinders has just published a comprehensive article on the NextPlatform.com Web site titled “CERN uses [Intel®] DL Boost, oneAPI to juice inference without accuracy loss,” which describes CERN’s use of deep learning and Intel® oneAPI to accelerate Monte Carlo simulations for Large Hadron Collider (LHC) research. Reinders writes that CERN researchers “have demonstrated success in accelerating inferencing nearly two-fold by using reduced precision without compromising accuracy at all.” The work is being carried out as part of Intel’s long-standing collaboration with CERN through CERN openlab.

If Reinders’ name looks familiar to you, that’s because he recently published a book about Data Parallel C++ (DPC++), the foundation compiler technology at the heart of Intel oneAPI. (See “Springer and Intel publish new book on DPC++ parallel programming, and you can get a free PDF copy!”)

CERN researchers found that about half of the computations in a specific neural network, a Generative Adversarial Network (GAN), could be switched from FP32 to INT8 numerical precision, which is directly supported by Intel® DL Boost, without loss of accuracy. GAN performance doubled as a result, while accuracy was not affected. Although this work was done using Intel® Xeon® Scalable Processors with direct INT8 support, Reinders’ article also makes the next logical jump: “INT8 has broad support thanks to Intel Xeon [Scalable Processors], and it is also supported in Intel® Xe GPUs. FPGAs can certainly support INT8 and other reduced precision formats.”

Further, writes Reinders: “The secret sauce underlying this work and making it even better: oneAPI makes Intel DL Boost and other acceleration easily available without locking in applications to a single vendor or device.”

“It is worth mentioning how oneAPI adds value to this type of work. Key parts of the tools used, including the acceleration tucked inside TensorFlow and Python, utilize libraries with oneAPI support. That means they are openly ready for heterogeneous systems instead of being specific to only one vendor or one product (e.g., GPU).

“oneAPI is a cross-industry, open, standards-based unified programming model that delivers a common developer experience across accelerator architectures. Intel helped create oneAPI, and supports it with a range of open source compilers, libraries, and other tools. By programming to use INT8 via oneAPI, the kind of work done at CERN described in this article could be carried out using Intel Xe GPUs, FPGAs, or any other device supporting INT8 or other numerical formats for which they may quantize.”

For additional information about Intel oneAPI, see “Release beta09 of Intel® oneAPI Products Now Live – with new programming tools for FPGA acceleration including Intel® VTune™ Profiler.” You may also be interested in an instructor-led class titled “Using Intel® oneAPI Toolkits with FPGAs (IONEAPI).”
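For readers curious what “switching from FP32 to INT8” means mechanically, here is a minimal NumPy sketch of symmetric INT8 quantization. It illustrates the general technique only; the scale factors, tensor shapes, and helper names are ours, not CERN’s code or the DL Boost API.

```python
# A minimal NumPy sketch of symmetric INT8 quantization, the general
# idea behind the CERN result. This is illustrative only: it is not
# CERN's pipeline and not the actual Intel DL Boost code path.
import numpy as np

def quantize_int8(x):
    scale = np.abs(x).max() / 127.0      # one scale factor per tensor
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(256, 256).astype(np.float32)   # FP32 weights
a = np.random.randn(256).astype(np.float32)        # FP32 activations
qw, sw = quantize_int8(w)
qa, sa = quantize_int8(a)

# INT8 products are accumulated in INT32 (as DL Boost's VNNI
# instructions do in hardware), then rescaled back to FP32.
y_int8 = (qw.astype(np.int32) @ qa.astype(np.int32)).astype(np.float32) * (sw * sa)
y_fp32 = w @ a
print("max abs error:", np.abs(y_int8 - y_fp32).max())
```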
Are FPGAs good for accelerating AI? VentureBeat takes a closer look
VentureBeat has just posted an article titled “FPGA chips are coming on fast in the race to accelerate AI” that takes an in-depth look at the use of FPGAs for Artificial Intelligence (AI) applications. The article cites five AI application challenges that FPGAs help to overcome:

Overcoming I/O bottlenecks
Providing acceleration for high-performance computing (HPC) clusters
Integrating AI into workloads
Enabling sensor fusion
Adding extra capabilities beyond AI

The article also discusses Microsoft’s integration of FPGA-based AI into Microsoft Azure and Project Brainwave, and ends with the following statement: “Today’s FPGAs offer a compelling combination of power, economy, and programmable flexibility for accelerating even the biggest, most complex, and hungriest models.”

If you are developing applications that incorporate AI, be sure to take a look at “FPGA chips are coming on fast in the race to accelerate AI.”

BittWare’s 520NX Accelerator Card harnesses AI-optimized power of the Intel® Stratix® 10 NX FPGA
BittWare has just announced the 520NX AI Accelerator PCIe card based on the AI-optimized Intel® Stratix® 10 NX FPGA, which incorporates specialized AI Tensor Blocks with a theoretical peak computational speed of 143 INT8 TOPS and 8 Gbytes of in-package, stacked high-bandwidth memory (HBM2).

In addition to the Intel Stratix 10 NX FPGA’s internal resources, the 520NX AI Accelerator card’s on-board resources include a PCIe Gen3 x16 host interface, four independently clocked QSFP28 card cages that support as many as four 100G optical transceiver modules, and two DIMM sockets that can accommodate as much as 256 Gbytes of memory.

The 520NX offers enterprise-class features and capabilities for application development and deployment, including:

HDL developer toolkit: API, PCIe drivers, application example designs, and diagnostic self-test
Passive, active, or liquid cooling options
Multiple OCuLink expansion ports for additional PCIe, storage, or network I/O

The BittWare 520NX AI Accelerator card based on the AI-optimized Intel Stratix 10 NX FPGA

The Intel Stratix 10 NX FPGA was introduced earlier this year. (See “Intel has just announced its first AI-optimized FPGA – the Intel® Stratix® 10 NX FPGA – to address the rapid increase in AI model complexity.”) More recently, the FPGA’s AI capabilities have been demonstrated by Myrtle.ai, running a WaveNet text-to-speech application that can synthesize 256 simultaneous streams of 16 kHz audio. (See “WaveNet Neural Network runs on Intel® Stratix® 10 NX FPGA, synthesizes 256 16 kHz audio streams in real time.”)

The new BittWare 520NX AI Accelerator card makes it much easier to develop applications based on the Intel Stratix 10 NX FPGA by providing the FPGA on a proven, ready-to-integrate PCIe card. For more information about the 520NX AI Accelerator card, please contact BittWare directly.

More details on the Intel® Stratix® 10 NX FPGA, the first AI-optimized Intel® FPGA, now available in a new White Paper
The increasing complexity of AI models and the explosive growth of AI model size are rapidly outpacing innovations in the compute resources and memory capacity available on a single device. AI model complexity now doubles every 3.5 months, or about 10X per year, driving rapidly increasing demand for AI computing capability. Memory requirements for AI models are also rising due to the increasing number of parameters, or weights, in a model.

The Intel® Stratix® 10 NX FPGA is Intel’s first AI-optimized FPGA, developed to enable customers to scale their designs with increasing AI complexity while continuing to deliver real-time results. The Intel Stratix 10 NX FPGA fabric includes a new type of AI-optimized tensor arithmetic block called the AI Tensor Block. These AI Tensor Blocks are tuned for the matrix-matrix and vector-matrix multiplications common in AI computation and contain dense arrays of the lower-precision multipliers typically used for AI model arithmetic. The smaller multipliers in these AI Tensor Blocks can also be aggregated to construct larger-precision multipliers.

The AI Tensor Block’s architecture contains three dot-product units, each with ten multipliers and ten accumulators, for a total of 30 multipliers and 30 accumulators per block. The AI Tensor Block multipliers’ base precisions are INT8 and INT4, along with a shared exponent to support the Block Floating Point 16 (Block FP16) and Block Floating Point 12 (Block FP12) numerical formats. Multiple AI Tensor Blocks can be cascaded together to support larger vector calculations. (A small sketch of the shared-exponent idea appears at the end of this post.)

A new White Paper titled “Pushing AI Boundaries with Scalable Compute-Focused FPGAs” covers the new features and performance capabilities of the Intel Stratix 10 NX FPGAs. Click here to download the White Paper.

If you’d like to see the Intel Stratix 10 NX FPGA in action, please check out the recent blog “WaveNet Neural Network runs on Intel® Stratix® 10 NX FPGA, synthesizes 256 16 kHz audio streams in real time.”
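To make the shared-exponent idea concrete, here is a small NumPy sketch of block floating point quantization. The mantissa width and packing below are illustrative assumptions, not the Intel Stratix 10 NX specification, but the mechanism, one exponent shared by a whole block of integer mantissas, is the same.

```python
# A minimal NumPy sketch of the shared-exponent (block floating point)
# idea described above. The block size of 30 echoes the 30 multipliers
# per AI Tensor Block; the format details are illustrative assumptions,
# not the device specification.
import numpy as np

MANT_BITS = 8   # signed mantissa width (assumed for illustration)

def to_block_fp(x):
    """Quantize a block to one shared exponent plus integer mantissas."""
    shared_exp = int(np.floor(np.log2(np.abs(x).max()))) + 1
    scale = 2.0 ** (shared_exp - (MANT_BITS - 1))
    limit = 2 ** (MANT_BITS - 1) - 1
    mant = np.clip(np.round(x / scale), -limit, limit).astype(np.int32)
    return mant, shared_exp

def from_block_fp(mant, shared_exp):
    return mant * 2.0 ** (shared_exp - (MANT_BITS - 1))

block = np.random.randn(30).astype(np.float32)     # one 30-element block
mant, exp = to_block_fp(block)
err = np.abs(from_block_fp(mant, exp) - block).max()
print(f"shared exponent: {exp}, max abs error: {err:.4f}")
```

Because the exponent is stored once per block rather than once per value, the multipliers themselves only need to handle integer mantissas, which is what lets a tensor block pack so many of them into a small area.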
WaveNet Neural Network runs on Intel® Stratix® 10 NX FPGA, synthesizes 256 16 kHz audio streams in real time

State-of-the-art text-to-speech (TTS) synthesis systems generally employ two neural network models that run sequentially to generate audio. The first model generates acoustic features, such as spectrograms, from input text; Tacotron 2 is often used for this stage. The second model, a vocoder, takes the intermediate features from the first model and produces speech.

A new White Paper from Myrtle.ai titled “Implementing WaveNet Using Intel® Stratix® 10 NX FPGA for Real-Time Speech Synthesis” focuses on the second model: a state-of-the-art vocoder based on a neural network model called WaveNet, which produces natural-sounding speech with near-human fidelity. The key to the WaveNet model’s high speech quality is an autoregressive loop, but this property also makes the network exceptionally challenging to implement for real-time applications; efforts to accelerate WaveNet models generally have not achieved real-time audio synthesis. (The toy sketch at the end of this post shows why the loop is so hard to parallelize.)

The Myrtle.ai White Paper describes the implementation of a WaveNet model on an Intel® Stratix® 10 NX FPGA. By using the Block Floating Point (BFP16) quantization that the Intel Stratix 10 NX FPGA supports, Myrtle.ai has been able to deploy a real-time WaveNet model that synthesizes 256 16 kHz audio streams in real time.

For more details and to download the White Paper, click here. To see a video demo of this system in action, click here.
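To see why the autoregressive loop is so hard to accelerate, consider the toy Python sketch below. The model inside the loop is a stand-in (a single dot product and a tanh, not a real WaveNet stack), but the serial dependency it exhibits is exactly the property that makes real-time synthesis difficult.

```python
# Toy sketch of why WaveNet's autoregressive loop resists acceleration:
# each output sample feeds back as input to the next step, forming a
# strict serial dependency chain. (A stand-in model, not Myrtle.ai's.)
import numpy as np

RECEPTIVE_FIELD = 1024
SAMPLE_RATE = 16000

def toy_step(history, w):
    """Stand-in for one WaveNet inference step over the receptive field."""
    return np.tanh(w @ history)

w = np.random.randn(RECEPTIVE_FIELD).astype(np.float32) * 0.01
audio = np.zeros(RECEPTIVE_FIELD + SAMPLE_RATE, dtype=np.float32)  # 1 second

for t in range(RECEPTIVE_FIELD, len(audio)):
    # Sample t depends on the previous samples, including t-1, so the
    # samples of one stream cannot be generated in parallel.
    audio[t] = toy_step(audio[t - RECEPTIVE_FIELD:t], w)

# Real-time 16 kHz synthesis allows only 62.5 microseconds per step;
# 256 concurrent streams means 256 such chains advancing together.
```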
New article discusses why FPGAs are a good choice for deep-learning applications and research

Many AI workloads, such as image recognition, rely heavily on parallelism to achieve good performance. For that reason, early AI researchers swiftly adopted GPUs, which provide a significant amount of computational parallelism. GPUs were originally designed to render video and graphics, so they excel at parallel processing and can perform a very large number of arithmetic operations in parallel. GPUs deliver substantial acceleration in cases where the same computations must be performed many times in rapid succession.

However, GPUs have their limits and can’t deliver as much performance as an AI-specific ASIC purpose-built for a given deep-learning application. ASICs are limited in a different way: they carry high non-recurring engineering (NRE) costs and long development cycles, which can run from 12 months to several years for development, verification, and fabrication.

FPGAs offer ASIC-like hardware customization and can be programmed to deliver performance similar to a GPU or an ASIC for AI workloads, but with speedy development cycles. The FPGA’s reprogrammable, reconfigurable nature makes it well suited to the rapidly evolving AI landscape. FPGAs allow designers to test algorithms quickly and get to market faster with a high-performance solution.

A new article titled “FPGA vs. GPU for Deep Learning” explores these topics in detail. The article discusses the unique advantages FPGAs enjoy for deep-learning applications. It also discusses the unique FPGA-related offerings from Intel, including:

Intel® FPGAs and the AI-optimized Intel® Stratix® 10 NX FPGA
Intel® Distribution of OpenVINO™ toolkit (a short usage sketch appears below)
Intel® FPGA Deep Learning Acceleration Suite
Intel® FPGA SDK for OpenCL™ software technology

Click here to read the article. For more information about the AI-optimized Intel Stratix 10 NX FPGA, see “Intel has just announced its first AI-optimized FPGA – the Intel® Stratix® 10 NX FPGA – to address the rapid increase in AI model complexity.”
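As promised in the list above, here is a minimal sketch of what FPGA deployment through the OpenVINO toolkit can look like. It assumes the pre-2022 Python API and an installation with an FPGA plugin configured; the model file names and device string below are placeholders, so treat this as the shape of the workflow rather than a copy-paste recipe.

```python
# A minimal sketch of deploying a model with the pre-2022 OpenVINO
# Python API, targeting an FPGA through the HETERO plugin with CPU
# fallback for unsupported layers. File names and the device string
# are assumptions; consult the toolkit docs for your installation.
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")  # IR from the Model Optimizer
exec_net = ie.load_network(network=net, device_name="HETERO:FPGA,CPU")

input_name = next(iter(net.input_info))
shape = net.input_info[input_name].input_data.shape            # e.g. [1, 3, 224, 224]
result = exec_net.infer({input_name: np.zeros(shape, dtype=np.float32)})
```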
Terasic DE10-Agilex Accelerator PCIe board combines Intel® Agilex™ F-Series FPGA with four DDR4 SO-DIMM SDRAM sockets and two QSFP-DD connectors

If you’re itching to get your hands on the innovative features built into the new family of Intel® Agilex™ FPGAs, such as the second-generation Intel® HyperFlex™ architecture or the improved DSP capabilities, including half-precision floating point (FP16) and BFLOAT16 computation (see the short illustration at the end of this post), then consider the new Terasic DE10-Agilex Accelerator board. This PCIe card combines an Intel Agilex F-Series FPGA with four independent DDR4 SO-DIMM SDRAM sockets and two QSFP-DD connectors on a three-quarter-length PCIe board.

The board’s host interface is a PCIe Gen 4.0 x16 port. Each SO-DIMM memory socket accommodates 8 or 16 Gbytes of DDR4 memory, for a maximum total SDRAM capacity of 64 Gbytes, and each QSFP-DD connector accommodates Ethernet transceiver modules up to 200G. The board is available with two cooling options: a 2-slot version with integrated fans or a single-slot, passively cooled version.

The Terasic DE10-Agilex Accelerator PCIe card combines an Intel® Agilex™ F-Series FPGA with four independent DDR4 SO-DIMM SDRAM sockets and two QSFP-DD connectors

The Terasic DE10-Agilex PCIe board supports the Intel® OpenVINO™ toolkit, OpenCL™ BSP, and Intel® oneAPI Toolkits used for developing code for myriad high-performance workloads, including computer vision and deep learning. The Intel Agilex FPGA family delivers up to 40% higher performance1 or up to 40% lower power1 for data center, NFV and networking, and edge compute applications.

For more technical information about the Terasic DE10-Agilex Accelerator Board or to order the product, please contact Terasic directly.

1 This comparison is based on the Intel® Agilex™ FPGA and SoC family vs. the Intel® Stratix® 10 FPGA using simulation results and is subject to change.
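As a closing illustration of the BFLOAT16 support called out above, the short NumPy sketch below shows what the format actually is: FP32 with the low 16 mantissa bits dropped. The truncating conversion is a simplification (hardware typically rounds), but it conveys why the format keeps FP32’s dynamic range at half the width.

```python
# A quick NumPy illustration of the BFLOAT16 format mentioned above:
# bfloat16 keeps FP32's sign bit and 8-bit exponent but only 7 mantissa
# bits, so (ignoring rounding) it is the top 16 bits of an FP32 value.
import numpy as np

def to_bfloat16_bits(x):
    """Truncate FP32 values to bfloat16, returned as uint16 bit patterns."""
    return (np.asarray(x, dtype=np.float32).view(np.uint32) >> 16).astype(np.uint16)

def from_bfloat16_bits(b):
    return (b.astype(np.uint32) << 16).view(np.float32)

x = np.array([3.14159, -0.001, 65504.0], dtype=np.float32)
print(from_bfloat16_bits(to_bfloat16_bits(x)))
# Same dynamic range as FP32, but only about 2 to 3 decimal digits of
# precision: a good fit for AI arithmetic, where range matters more.
```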