With advances in hardware-optimized deployment of Spiking Neural Networks (SNNs), SNN processors based on Field-Programmable Gate Arrays (FPGAs) have become a research hotspot due to their efficiency and flexibility. However, existing methods rely on multi-timestep training and reconfigurable computing architectures, which increase computational and memory overhead and reduce deployment efficiency. This work presents a high-efficiency, lightweight residual SNN accelerator that adopts algorithm-hardware co-design to optimize inference energy efficiency. On the algorithm side, we employ single-timestep training, integrate grouped convolutions, and fuse Batch Normalization (BN) layers, compressing the network to only 0.69 M parameters. Quantization-aware training (QAT) further constrains all weights and activations to 8-bit precision. On the hardware side, intra-layer resource reuse maximizes FPGA utilization, a fully pipelined cross-layer architecture boosts throughput, and on-chip Block RAM (BRAM) stores both network parameters and intermediate results to improve memory efficiency. Experimental results demonstrate that the proposed processor achieves 87.11% classification accuracy on the CIFAR-10 dataset, with an inference time of 3.98 ms per image and an energy efficiency of 183.5 FPS/W. Compared to mainstream Graphics Processing Unit (GPU) platforms, it achieves over twice the energy efficiency. Furthermore, compared to other SNN processors, it achieves at least a 4× improvement in inference speed and a 5× improvement in energy efficiency.
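To illustrate the BN-fusion step on the algorithm side, the following minimal PyTorch sketch folds a BatchNorm2d layer into the preceding grouped Conv2d for inference. The layer types and the helper name `fuse_conv_bn` are illustrative assumptions for exposition, not the implementation used in this work.

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm2d into the preceding Conv2d for inference.
    Illustrative sketch only; not the authors' actual code."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      conv.kernel_size, stride=conv.stride,
                      padding=conv.padding, groups=conv.groups, bias=True)
    # Per-output-channel scale: gamma / sqrt(running_var + eps).
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    # Fold the scale into the convolution weights.
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    # Fold mean subtraction and beta into the bias term.
    conv_bias = conv.bias.data if conv.bias is not None \
        else torch.zeros_like(bn.running_mean)
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias.data
    return fused
```

After fusion, the standalone BN layer can be dropped, so only the fused weights and biases remain to be quantized and stored in on-chip BRAM.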