Senior Engineer
Develop and optimize GPU compute kernels using CUDA, Tensor Cores, and NVIDIA libraries Implement and tune high-performance matrix operations using cuBLAScuBLASLt, CUTLASS, and cuSPARSE Design batch grouped execution flows for large-scale GEMM SpMM workloads Optimize memory layouts, data movement, and kernel launch efficiency Conduct profiling and performance tuning using Nsight Compute Nsight Systems Build clean and stable CC APIs and integrate with Python (PyBind11 PyTorch extensions) Ensure numerical correctness, mixed-precision stability, and robust error handling
Essential Skills:
We are seeking a highly skilled Senior CCUDA Engineer with strong experience in GPU-accelerated compu-tation and deep learning inference. The role involves developing high-performance GPU kernels, optimizing large-scale matrix operations, and integrating with Python-based orchestration layers.Responsibilities De-velop and optimize GPU compute kernels using CUDA, Tensor Cores, and NVIDIA libraries Implement and tune high-performance matrix operations using cuBLAScuBLASLt, CUTLASS, and cuSPARSE Design batch grouped execution flows for large-scale GEMM SpMM workloads Optimize memory layouts, data move-ment, and kernel launch efficiency Conduct profiling and performance tuning using Nsight Compute Nsight Systems Build clean and stable CC APIs and integrate with Python (PyBind11 PyTorch extensions) Ensure numerical correctness, mixed-precision stability, and robust error handling .
Responsibilities
Salary : As per industry standard.
Industry :IT-Software / Software Services
Functional Area : IT Software - Application Programming , Maintenance