Ambarella brings real-time, low-power computer vision to the edge by deploying optimized neural networks onto its unique CVFlow architecture. Quantization and pruning are two important optimization methods that can be used to maximize the performance of Ambarella hardware. However, the performance benefit can come at the cost of neural network accuracy and consistency. Our team provides optimal quantization strategies during and after training to minimize the loss of accuracy and consistency, and resolves performance and accuracy issues when a neural network is deployed on Ambarella hardware.
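To illustrate the kind of trade-off involved, here is a minimal sketch of symmetric per-tensor int8 quantization in plain NumPy. This is a simplified illustration, not Ambarella's actual quantization pipeline: real deployments typically use per-channel scales, calibration data, and quantization-aware training.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# The round-trip error is bounded by half a quantization step (scale / 2);
# this bounded error is the "cost to accuracy" the description refers to.
```

The key point is that the quantization step size is set by the largest-magnitude weight, so outliers directly inflate the error on all other weights.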
Investigate unfamiliar neural network models being ported to our chip and identify potential accuracy issues.
Devise innovative ways to implement neural network layers or modules that are both accurate and hardware-efficient.
Study approximations of real-valued functions, including mixtures of transcendental functions.
Accelerate tools running on the PC side using extended x86 instruction sets, multi-threading, and GPUs.
Work with TensorFlow and PyTorch to extract models from, and create custom operators for, these frameworks.
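As a flavor of the function-approximation work mentioned above, the sketch below fits a low-degree polynomial to tanh over a fixed interval. Hardware that lacks a fast transcendental unit often evaluates activations via such polynomial or lookup-table approximations; the degree and interval here are illustrative choices, not Ambarella-specific parameters.

```python
import numpy as np

# Fit a degree-7 polynomial to tanh over [-3, 3] by least squares.
# Outside this interval, tanh saturates and can be clamped to +/-1.
x = np.linspace(-3.0, 3.0, 1001)
coeffs = np.polyfit(x, np.tanh(x), deg=7)
approx = np.polyval(coeffs, x)

# The worst-case error over the fit interval is the figure of merit
# when deciding whether the approximation is accurate enough to deploy.
max_err = np.abs(approx - np.tanh(x)).max()
```

In practice one would compare several degrees and interval splits, trading polynomial evaluation cost (multiply-adds) against the worst-case error.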
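For the custom-operator side, here is a minimal PyTorch sketch using `torch.autograd.Function` to define an operator with an explicit backward pass. The operator itself (a ReLU clipped at 6) is a hypothetical example chosen because clipped activations are common on fixed-point hardware; it is not a specific Ambarella operator.

```python
import torch

class ClippedReLU(torch.autograd.Function):
    """Hypothetical custom op: ReLU clamped to [0, 6], with manual gradient."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clamp(min=0.0, max=6.0)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Gradient is 1 on the linear region (0 < x < 6), 0 where clipped.
        mask = (x > 0) & (x < 6)
        return grad_out * mask.to(grad_out.dtype)

x = torch.tensor([-1.0, 3.0, 7.0], requires_grad=True)
y = ClippedReLU.apply(x)
```

Framework-level custom operators like this let a tool chain match, layer by layer, what the target hardware actually computes.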
3-5 years of experience preferred.
Strong mathematical skills: transformations of transcendental functions, derivatives, and the chain rule.
Self-driven in finding problems buried deep within large quantities of data.
Strong C++, Python, and CUDA programming skills.