Building an FPGA Deep Learning Acceleration Ecosystem to Empower Artificial Intelligence

Undoubtedly, if you were to pick the ten hottest terms in the technology industry in 2016, artificial intelligence and deep learning would make the list. From AlphaGo's victory over world Go champion Lee Sedol at the start of the year, to the collective AI showcase by Baidu, Alibaba, and Tencent (BAT) at the World Internet Conference at its end, every signal keeps reminding us that the era of artificial intelligence has arrived; the future is already here, right beside us.

In 1956, the Dartmouth conference attended by John McCarthy and others marked the official birth of artificial intelligence. Over the 60 years since, the field has gone through three waves, and its development can fairly be described as a story of ups and downs.

The first wave: the Dartmouth conference coined the term "artificial intelligence," and the first perceptron-style neural networks, chat programs, and automated proofs of mathematical theorems appeared, sending researchers into their first AI carnival. People soon discovered, however, that these theories and models could solve only very simple problems; the difficulty of the field had been underestimated, and artificial intelligence promptly entered its first trough.

The second wave: in the 1980s, the Hopfield neural network and the backpropagation (BP) training algorithm were proposed and artificial intelligence re-emerged; early speech recognition and speech translation models rekindled hope for a second AI boom. But these ideas never made their way into people's daily lives, and the second wave was dashed as well.

The third wave: with the deep learning approach proposed by Hinton in 2006 and the breakthrough in image recognition at the 2012 ImageNet competition, artificial intelligence once again caught fire. In 2016 especially, the maturation of big data technology and leaps forward in high-performance computing and algorithms gave the development of artificial intelligence real momentum.

High-performance computing accelerates deep learning: GPUs and FPGAs each have their strengths

This AI boom differs essentially from the previous two, and high-performance computing for application acceleration has played a major part in it. According to Zhang Jun, product manager of the application acceleration department at Hengyang Data, deep learning models rest on massive data and powerful computing, and traditional computing architectures can no longer support the massively parallel computation deep learning demands. Accelerating the computation at the hardware level, combined with optimization of the deep learning algorithms themselves, is a key link in advancing the entire industrial chain of artificial intelligence.

There are currently three main hardware approaches to accelerating deep learning: GPU, FPGA, and NPU. The GPU was the first to be introduced into deep learning; the famous AlphaGo ran on a supercomputing platform built from 1,920 CPUs and 280 GPUs. As the first company to bet on deep learning acceleration, NVIDIA is the leader in this field. Through its CUDA platform it fully exploits the GPU's SIMD (single instruction, multiple data) architecture; with high parallelism and high computational throughput, the GPU suits large-scale, compute-intensive, highly parallel SIMD workloads, above all the training of deep learning models. For model training, the GPU already enjoys a pioneering application acceleration platform and a complete ecosystem covering CNNs, DNNs, RNNs, LSTMs, and reinforcement learning networks.

Hot as the GPU is, it also has certain technical limitations.

First, its energy efficiency ratio is poor. To achieve the same performance on a deep learning algorithm, a GPU consumes far more power than an FPGA; typically a GPU reaches only half, or less, of an FPGA's energy efficiency ratio.

Second, its parallel computing advantage cannot always be exploited. Deep learning involves two computational phases: offline training and online inference. The GPU is very efficient at training deep learning models, but during inference it often receives only one input at a time, so its parallel computing advantage goes unused.
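The contrast can be sketched with a toy example. This is a hypothetical single fully connected layer with illustrative sizes, not any vendor's benchmark: during training a whole batch goes through the weights as one big matrix multiplication, exactly the SIMD pattern GPUs excel at, while online inference often sees one sample at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fully connected layer: 256 inputs -> 64 outputs.
W = rng.standard_normal((256, 64))
batch = rng.standard_normal((32, 256))  # training: 32 samples at once

# Training-style batched compute: one large matrix multiply (SIMD-friendly).
batched_out = batch @ W

# Inference-style compute: samples arrive and are processed one at a time.
single_outs = np.stack([x @ W for x in batch])

# Same numerical results, but the per-sample loop cannot
# saturate wide parallel hardware the way the batched form can.
assert np.allclose(batched_out, single_outs)
```

The point is only about scheduling: the arithmetic is identical, but a device built for wide data parallelism sits mostly idle when fed single samples.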

Third, its hardware structure is fixed and not programmable. Deep learning algorithms are not yet fully mature and are still iterating. If an algorithm changes substantially, a GPU cannot, as an FPGA can, flexibly reconfigure its hardware structure and get to market quickly. FPGA acceleration can make up for these shortcomings.

It can be said that GPUs and FPGAs each have advantages in accelerating deep learning algorithms. According to Zhang Jun, in raw performance the FPGA is weaker than the GPU, but its performance per watt is remarkable. Hengyang's NSA series deep learning accelerator card reaches about 70 GFLOPS/W, more than twice that of NVIDIA's M4 and five times that of Intel's many-core chips, a clear advantage. This is critical for large data centers, where it can save substantially on server deployment, machine room construction, and electricity costs.
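As a back-of-the-envelope check, performance per watt is simply peak throughput divided by power draw. The comparison card's peak GFLOPS and TDP below are assumed figures for illustration, not vendor specifications:

```python
def perf_per_watt(gflops: float, watts: float) -> float:
    """Energy efficiency ratio in GFLOPS per watt."""
    return gflops / watts

# Figure quoted in the text: the NSA card at ~70 GFLOPS/W.
nsa_efficiency = 70.0

# Hypothetical comparison accelerator: ~2,200 GFLOPS peak at a ~75 W TDP.
gpu_efficiency = perf_per_watt(2200.0, 75.0)  # about 29.3 GFLOPS/W

# Roughly 2.4x, consistent with the "more than twice" claim above.
print(nsa_efficiency / gpu_efficiency)
```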

From the standpoint of computational pattern, model training is essentially SIMD: a single instruction stream processes a large amount of data in parallel, which plays straight to the GPU's strength in parallel computing. What is easy to overlook is that after the model is trained, deep learning still has to run the inference computation, which is closer to MISD: a single stream of data processed in parallel by multiple instructions, and this is exactly the kind of computation the FPGA is best at.
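An FPGA-style dataflow pipeline can be loosely modeled in software as chained per-sample stages. This is a toy sketch; the stage functions are invented for illustration, and in real hardware all stages operate concurrently on successive samples rather than sequentially:

```python
def scale(samples):
    """Stage 1: normalize each incoming sample to [0, 1]."""
    for x in samples:
        yield x / 255.0

def threshold(samples):
    """Stage 2: binarize each normalized sample."""
    for x in samples:
        yield 1 if x > 0.5 else 0

def count_ones(samples):
    """Stage 3: accumulate detections."""
    return sum(samples)

# Each sample flows through every stage, one after another,
# mirroring a pipelined, MISD-style datapath.
stream = [30, 200, 150, 90, 250]
result = count_ones(threshold(scale(stream)))
print(result)  # 3
```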

Steve Glaser, senior vice president of corporate strategy and marketing at Xilinx, believes FPGAs will become a mainstream application in hyperscale data centers, especially for deep learning. In this division of labor, GPUs are strong at training while FPGAs are strong at inference. Once trained models must be deployed at scale, efficiency has to improve dramatically; that calls for a new inference engine that raises efficiency, say three to four times, while minimizing loss of accuracy, which is precisely the FPGA's strength. He predicts that in the future at least 95% of machine learning computation will go to inference and less than 5% to model training; this is the difference between the two.

In addition, FPGAs of different architectures, because their underlying DSP blocks differ, show different utilization and computational efficiency in different fields.

In terms of flexibility, once a GPU is designed its hardware resources cannot be adjusted to the application, whereas an FPGA can be programmed at the hardware level for a specific application and re-optimized for any new application or algorithm. It is this unique reconfigurability and reprogrammability that lets an FPGA switch to a different design in under a second, optimizing its hardware for the next workload. That flexibility makes it, for hyperscale data center applications, arguably the most versatile and energy-efficient computing solution.

Building a closed-loop deep learning ecosystem to empower artificial intelligence

As a founding member of the second-generation distributed computing alliance and a member of the OpenPOWER Foundation, Hengyang Data cooperates deeply and closely with Xilinx and IBM in the field of distributed computing. According to Zhang Jun of Hengyang Data, since 2015 Hengyang has been watching the world's seven hyperscale data centers, and believes that particularly in machine learning and deep learning, the FPGA will become one of the mainstream applications. Baidu's announcement in October of this year of a Xilinx UltraScale FPGA pool design to accelerate machine learning inference, and the Amazon EC2 F1 instance unveiled at the AWS conference in November, have reinforced this thinking.

Take the Amazon EC2 F1 instance, for example, the first programmable-hardware cloud instance for FPGA application acceleration. Users can provision up to eight high-performance 16nm Xilinx UltraScale+ VU9P FPGAs in the cloud. In a remote-server model similar to Vivado HLS/SDSoC, users need only purchase the cloud service to obtain the corresponding development tools, including application development kits, function libraries, and configuration management tools that integrate the mainstream deep learning frameworks, for development, simulation, debugging, and compilation. They can thus customize FPGA hardware acceleration with significantly less development difficulty and time, making it more convenient for cloud users to accelerate workloads such as deep learning inference, genomic analysis, financial analysis, video processing, big data, and security.

At present, Hengyang is very optimistic about such applications and is working with several major data centers in China, committed to teaming up with Xilinx to deepen cooperation with hyperscale data centers and tap the FPGA's potential in deep learning.

If deep cooperation with data centers is the first heavy punch in Hengyang's deep learning ecosystem combination, then enriching its own algorithm library and integrating it with industry applications into a complete commercial closed loop is the second. By comparison, FPGA development is relatively difficult: it requires developers to be familiar with hardware description languages such as Verilog, and even development in the C-like OpenCL remains hard for software developers. It is therefore necessary to bundle rapid development tools, algorithm libraries, framework integration packages, and reference design templates, providing the fastest and most convenient development and deployment path for deep learning algorithm developers and platform designers. At present, Hengyang's key algorithm library covers the CNN convolutional neural network, compression algorithms, video image scaling, and more; next, Hengyang will progressively improve support for the mainstream Caffe and TensorFlow frameworks and further round out the system.
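As a reference point for what a CNN kernel in such a library computes, here is a minimal 2D convolution in plain NumPy. This is only a software sketch of the arithmetic an FPGA CNN accelerator would pipeline in hardware; the shapes and the "valid" boundary handling are illustrative choices:

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive 'valid' 2D convolution (strictly, cross-correlation, as in
    most deep learning frameworks): slide the kernel over the image and
    take elementwise-product sums."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# 3x3 box filter over a 4x4 ramp image.
img = np.arange(16, dtype=float).reshape(4, 4)
box = np.ones((3, 3))
print(conv2d_valid(img, box))
```

On an FPGA, the two inner loops become a fixed pipeline of multiply-accumulate units, which is where the hardware wins over this sequential form.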

Alongside perfecting the algorithms, tightly coupling the technology with industry and closing the commercial loop is another focus of Hengyang's strategy. The FPGA's characteristics give it unique advantages in certain industry applications. In the medical field, for example, medical images have richer texture, higher resolution, stronger correlation, and larger storage requirements than ordinary images, and to ensure clinical reliability they impose stricter demands on image preprocessing such as compression and segmentation, and on image analysis and understanding. These features play directly to the FPGA's strengths: FPGA-accelerated image compression can remove redundancy and raise the compression ratio while preserving diagnostic reliability. Gene sequencing is a similarly accelerated medical application.

In the financial field, for another example, the FPGA's pipelined logic architecture delivers the low latency and high reliability that data stream processing demands, a key requirement for financial trading and risk-modeling algorithms. There are many similar industries and fields, such as live video transcoding, content filtering, and video security.

In addition to building its deep learning ecosystem in the cloud, Hengyang has begun to explore terminal applications that do not suit the cloud, such as drones, autonomous driving, and robots. We believe that in the intelligent era the FPGA, as a key technological means of transforming the global economy, will play an increasingly important role, changing still further the way we learn, work, and play.
