29   7.8   0.12
A5   259   3.9   0.12
A6   246   4.1   0.13
A7   492   2.0   0.13
A8   140   7.1   0.

Future Internet 2021, 13

[Figure 9: plot of Number of Cores versus Frames per Second (FPS), with points labelled A1 (13,8), A2 (13,4), A3 (13,2), A4 (8,8), A5 (8,4), A6 (4,8), A7 (4,4), and A8 (13,4).]
Figure 9. Number of cores versus frames per second for each configuration of the architecture. The labels give each configuration as (number of rows of cores, number of columns of cores).

Table 9 presents the Tiny-YOLOv3 execution times on several platforms: an Intel i7-8700 @ 3.2 GHz CPU, an RTX 2080ti GPU, and the Jetson TX2 and Jetson Nano embedded GPUs. The CPU and GPU results were obtained with the original Tiny-YOLOv3 network [42] in floating-point representation. The CPU result corresponds to Tiny-YOLOv3 implemented in C. The GPU result was obtained by running Tiny-YOLOv3 in the PyTorch environment with the CUDA libraries.

Table 9. Tiny-YOLOv3 execution times on several platforms.
Platform                          Software version   CNN (ms)   FPS
CPU (Intel i7-8700 @ 3.2 GHz)     Floating-point     819.2      1.2
GPU (RTX 2080ti)                  Floating-point     7.5        65.0
eGPU (Jetson TX2) [43]            Floating-point     –          17
eGPU (Jetson Nano) [43]           Floating-point     –          1.2
ZYNQ7020                          Fixed-point-16     140        7.1
ZYNQ7020                          Fixed-point-8      68         14.7

Tiny-YOLOv3 is too slow on a desktop CPU, which reaches only 1.2 FPS. Inference on the RTX 2080ti GPU was 109× faster than on the desktop CPU. With the proposed accelerator on the ZYNQ7020, the inference times were 140 ms (16-bit) and 68 ms (8-bit). The low-cost FPGA was therefore 6× (16-bit) and 12× (8-bit) faster than the CPU, with a small accuracy drop of 1.4 and 2.1 points, respectively. Compared with the Jetson TX2 embedded GPU, the proposed architecture was about 15% slower. The advantage of using the FPGA is its power consumption.
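The speedups in this comparison are simple latency ratios against the CPU baseline. A minimal Python sketch, assuming frames are processed one after another so that FPS = 1000 / latency in ms; the constant and function names are illustrative, and the latencies are the Table 9 values:

```python
# Tiny-YOLOv3 latencies (ms) as listed in Table 9.
CPU_MS = 819.2  # Intel i7-8700, floating-point baseline
LATENCY_MS = {
    "GPU (RTX 2080ti, floating-point)": 7.5,
    "ZYNQ7020 (fixed-point-16)": 140.0,
    "ZYNQ7020 (fixed-point-8)": 68.0,
}


def fps(latency_ms: float) -> float:
    """Frames per second when frames are processed strictly one after another."""
    return 1000.0 / latency_ms


def speedup_vs_cpu(latency_ms: float) -> float:
    """How many times faster a platform is than the CPU baseline."""
    return CPU_MS / latency_ms


if __name__ == "__main__":
    for name, ms in LATENCY_MS.items():
        print(f"{name}: {speedup_vs_cpu(ms):.1f}x faster than the CPU")
```

Run as a script, it reports about 109× for the RTX 2080ti and about 6× and 12× for the 16-bit and 8-bit ZYNQ7020 versions; fps(140.0) and fps(68.0) give the 7.1 and 14.7 FPS listed for the FPGA.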
The Jetson TX2 draws close to 15 W, whereas the proposed accelerator draws around 0.5 W. The Nvidia Jetson Nano consumes at most 10 W but is around 12× slower than the proposed architecture.

5.3. Comparison with Other FPGA Implementations

The proposed implementation was compared with previous Tiny-YOLOv3 accelerators. We report the quantization, the operating frequency, the FPGA resource occupation (DSPs, LUTs, and BRAMs), and two performance metrics (execution time and frames per second). In addition, we considered three metrics that quantify how efficiently the hardware resources are used. Since different solutions usually have different amounts of resources, it is fair to normalize the results before comparing them. FPS/kLUT, FPS/DSP, and FPS/BRAM give the frame rate achieved per thousand LUTs, per DSP, and per BRAM, respectively. The larger these values, the more efficiently those resources are used (see Table 10).

Table 10. Performance comparison with other FPGA implementations.
Designs: [38] ZYNQ ZU9EG; [39] ZYNQ7020; [41] Virtex VX485T; [40] US XCKU040; Ours ZYNQ7020 (16-bit and 8-bit).
Datasets: Pedestrian signs; COCO.
Reported metrics: Quant., Freq. (MHz), DSPs, LUTs, BRAMs, Exec. (ms), FPS, FPS/kLUT, FPS/DSP, FPS/BRAM.
Values: 8 9.6 104 16 100 120 26 K 93 532.0 1.9 0.07 0.016 0.020 18 200 2304 49 K 70 16 143 832 139 K 384 24.4 32 0.23 0.038 0.16 100 208 27.5 K 120 140 7.1 0.26 0.034 0.8 100 208 33.4 K 120 68 14.7 0.44 0.068 0.

The implementation in [39] is the only previous implementation that uses a Zynq 7020 SoC FPGA. This device has considerably fewer resources than the devices used in the other works. Our architecture, implemented on the same device, was 3.7× and 7.4× faster, depend…
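The three normalized metrics are plain ratios of throughput to resource count. As a sketch, the Python snippet below recomputes them for two ZYNQ7020 designs. The input values are our reading of the flattened Table 10 (the [39] column and the 16-bit "Ours" column), chosen because these ratios reproduce the metrics printed there; the Design class and its field names are hypothetical, not from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """Throughput and resource usage of one accelerator (hypothetical container)."""
    name: str
    fps: float
    kluts: float  # LUTs, in thousands
    dsps: int
    brams: int

    def fps_per_klut(self) -> float:
        return self.fps / self.kluts

    def fps_per_dsp(self) -> float:
        return self.fps / self.dsps

    def fps_per_bram(self) -> float:
        return self.fps / self.brams


# Assumed reading of two ZYNQ7020 entries in Table 10.
PREVIOUS = Design("[39] ZYNQ7020", fps=1.9, kluts=26.0, dsps=120, brams=93)
OURS_16 = Design("Ours ZYNQ7020, 16-bit", fps=7.1, kluts=27.5, dsps=208, brams=120)


def efficiency(d: Design) -> tuple:
    """Return (FPS/kLUT, FPS/DSP, FPS/BRAM), rounded to the table's precision."""
    return (
        round(d.fps_per_klut(), 2),
        round(d.fps_per_dsp(), 3),
        round(d.fps_per_bram(), 3),
    )


if __name__ == "__main__":
    for d in (PREVIOUS, OURS_16):
        print(d.name, efficiency(d))
    # Frame-rate ratio between the two designs on the same device.
    print(f"Speed ratio: {OURS_16.fps / PREVIOUS.fps:.1f}x")
```

Under these assumptions the sketch reproduces all three [39] metrics (0.07, 0.016, 0.020), the first two 16-bit metrics (0.26 and 0.034), and the 3.7× frame-rate ratio quoted for the 16-bit version.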

Author: calcimimeticagent