The coherent-dispersion spectrometer (CODES) is an exoplanet detection instrument based on the radial velocity (RV) method. It detects changes in RV by measuring the Doppler phase shift of the interference spectrum of stellar absorption lines. However, the background white light in the stellar absorption spectrum disturbs the phase analysis of CODES, introducing a phase error that seriously degrades the accuracy of RV inversion. The larger the cosine amplitude of the background white light, the greater the error. To remove the background white light effectively and correct the Doppler phase shift, this study proposes a background white light prediction network (BWP-Net), built on the U-Net architecture and tailored to the working principle and data characteristics of CODES. To accelerate convergence, the interference spectrum of the absorption line from CODES and the ideal interference spectrum of the background white light are normalized as images and used as the model input and label, respectively; the model output, after inverse normalization, is the predicted interference spectrum of the background white light. BWP-Net consists of a symmetric 6-layer encoding path and decoding path. First, in the encoding path, features at different levels are extracted step by step from the interference spectrum of the stellar absorption line through a combination of multi-channel convolution and depthwise separable convolution, which extracts features effectively while keeping the computational cost low. In each convolution layer, spatial downsampling is performed by convolution with a stride of 2, and the number of feature channels is increased up to the fourth layer. In this way, features ranging from simple to abstract and from local to global are extracted in preparation for image reconstruction in the decoding path.
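As an illustration of the normalization and inverse normalization steps described above, the following is a minimal sketch that assumes min-max scaling to the interval 0 to 1; the abstract does not state which normalization is used, so this choice, the function names `normalize` and `denormalize`, and the synthetic array standing in for an interference spectrum are all assumptions made for the example.

```python
import numpy as np


def normalize(image, eps=1e-12):
    """Min-max scale an image to [0, 1] and return the bounds that were used.

    Assumption: min-max scaling; the source only says "image normalization".
    """
    lo = float(image.min())
    hi = float(image.max())
    # Guard against a constant image, where hi - lo would be zero.
    scaled = (image - lo) / max(hi - lo, eps)
    return scaled, (lo, hi)


def denormalize(scaled, bounds):
    """Map values in [0, 1] back to the original intensity range."""
    lo, hi = bounds
    return scaled * (hi - lo) + lo


# Synthetic stand-in for an interference spectrum (not CODES data).
rng = np.random.default_rng(0)
spectrum = 1000.0 + 200.0 * rng.random((8, 16))

scaled, bounds = normalize(spectrum)
restored = denormalize(scaled, bounds)
```

Under this assumption, a network output confined to the interval 0 to 1 can be passed directly to `denormalize` with stored bounds to recover physical intensities.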
Second, in the decoding path, the image details are gradually reconstructed from the extracted features through several layers of attention transposed convolution. In each such layer, spatial upsampling is performed on the fusion of shallow and deep features, obtained by matrix addition, and the number of feature channels is reduced. At the same time, a learnable weight matrix applies different levels of attention to the features, so that the absorption line information is gradually suppressed during image reconstruction. In the last layer of the decoding path, a sigmoid activation function constrains the model output to the interval from 0 to 1, which makes denormalization easier. Finally, a region-weighted loss function that combines mean-square error and multi-scale structural similarity is used for training. It accounts for both pixel-level differences and structural similarity between the model output and the labels, while region weighting strengthens the suppression of absorption lines in the central region of the interference spectrum. The output of BWP-Net is the predicted interference spectrum of the background white light, which is subtracted from the interference spectrum of the stellar absorption lines before phase analysis. The experimental results show that, for different absorption lines, different fixed optical path differences, and different RVs, the RV inversion error after removing the background white light predicted by BWP-Net is less than 1 m/s and mainly concentrated in the range 0–0.4 m/s, with an average error of 0.2353 m/s and a root mean square error of 0.3769 m/s. The RV inversion error is also distributed relatively uniformly across parameter conditions: the median error is less than 0.25 m/s at different absorption line wavelengths and less than 0.2 m/s at different fixed optical path differences.
These results indicate that BWP-Net not only predicts the background white light accurately but also has good stability and robustness, providing strong support for high-precision and stable RV inversion with CODES.
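The region-weighted loss described above can be sketched in NumPy as a weighted mean-square error plus a structural term. This is only an illustrative sketch under several assumptions that are not taken from the abstract: the central weight, the size of the central region, the mixing factor `alpha`, and all function names are hypothetical, and the structural term is a simplified multi-scale SSIM that uses global image statistics at each scale rather than the local Gaussian windows and per-scale exponents of standard MS-SSIM.

```python
import numpy as np


def center_weight_map(shape, center_weight=2.0, frac=0.5):
    """Weight map equal to 1 everywhere and center_weight in a central box.

    Assumption: a rectangular central region covering `frac` of each axis.
    """
    h, w = shape
    weights = np.ones(shape, dtype=float)
    dh, dw = int(h * frac / 2), int(w * frac / 2)
    weights[h // 2 - dh : h // 2 + dh, w // 2 - dw : w // 2 + dw] = center_weight
    return weights


def weighted_mse(pred, target, weights):
    """Mean-square error with per-pixel weights, normalized by the weight sum."""
    return float(np.sum(weights * (pred - target) ** 2) / np.sum(weights))


def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed from whole-image statistics (inputs assumed in [0, 1])."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return float(num / den)


def downsample2(img):
    """Halve the resolution by 2x2 average pooling (odd edges are cropped)."""
    h, w = img.shape
    h2, w2 = h - h % 2, w - w % 2
    return img[:h2, :w2].reshape(h2 // 2, 2, w2 // 2, 2).mean(axis=(1, 3))


def multiscale_ssim(x, y, scales=3):
    """Average of global SSIM over several scales (simplified stand-in)."""
    values = []
    for _ in range(scales):
        values.append(global_ssim(x, y))
        x, y = downsample2(x), downsample2(y)
    return float(np.mean(values))


def region_weighted_loss(pred, target, alpha=0.5, center_weight=2.0):
    """Combine region-weighted MSE with (1 - simplified multi-scale SSIM)."""
    weights = center_weight_map(pred.shape, center_weight)
    mse = weighted_mse(pred, target, weights)
    ssim_term = 1.0 - multiscale_ssim(pred, target)
    return alpha * mse + (1.0 - alpha) * ssim_term


# Synthetic normalized images standing in for the label and the model output.
rng = np.random.default_rng(0)
target = rng.random((32, 32))
pred = np.clip(target + 0.05 * rng.standard_normal((32, 32)), 0.0, 1.0)
loss_value = region_weighted_loss(pred, target)
```

With this weighting, an error of a given size in the central region contributes more to the loss than the same error near the border, which is the mechanism by which region weighting can emphasize suppression of the absorption lines there.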