Accelerating urbanization has made accurate prediction of intra-urban population mobility a fundamental requirement for urban planning and policy formulation. However, the adaptability and performance of existing mobility models across spatial scales remain poorly understood, and no systematic evaluation framework yet integrates spatial granularity, travel distance, and population heterogeneity. This study addresses these gaps by proposing a cross-scale comparative framework to evaluate three representative mobility models under varying urban conditions: the gravity model (GM), the radiation model (RM), and the population-weighted opportunities model (PWO). Using high-resolution mobile phone data from Shanghai, we construct three groups of controlled experiments to assess model performance across spatial (grid size), distance, and population density scales. Multivariate analysis of variance (MANOVA) is then used to decompose the relative contributions of different spatial factors to prediction errors. The results indicate that the models differ markedly in their scale sensitivity. The GM, built on an analogy with Newton’s law of gravitation, is highly robust over longer distances (>5 km), but its performance decreases under fine spatial granularity due to spatial heterogeneity. GM accuracy improves with population density but decreases significantly when regional area disparity exceeds a threshold, with prediction performance dropping by over 40% when the grid size difference exceeds 3 km. The RM, based on the nearest-best-opportunity assumption, performs well for short-distance, origin-driven flows such as commuting, but introduces systematic bias at small scales. Its sensitivity to origin population density makes it more suitable for high-density urban cores.
The PWO model extends the RM by incorporating destination population weights, and it accommodates spatial heterogeneity better in dense, polycentric cities. Although it performs best over short distances (<5 km), its accuracy degrades as travel distance increases. The MANOVA results show that GM is primarily influenced by population density and area scale, whereas RM and PWO are more sensitive to distance and destination-related factors. On the basis of these findings, we propose a model selection strategy matched to the dominant mobility drivers: GM is recommended for long-distance traffic prediction in spatially homogeneous regions, while PWO is recommended for short-distance traffic prediction between densely populated small areas. RM serves as a complementary model when origin-driven flows dominate. This study not only elucidates the physical mechanisms behind the scale-dependent performance of these models but also provides an actionable decision-making framework for model selection in different urban mobility scenarios. Future research will aim to improve predictive accuracy by: 1) developing hybrid models that integrate the strengths of multiple frameworks; 2) incorporating multi-source spatial data (e.g., POIs and land use); and 3) coupling traditional models with deep learning approaches to enhance non-linear pattern recognition while maintaining interpretability. By revealing the scale sensitivity of mobility models, this work lays theoretical and methodological foundations for multi-scenario mobility prediction in smart city planning and fine-grained urban governance.
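
For readers less familiar with the three models compared here, the following is a minimal Python sketch of their commonly cited textbook forms. It is not the calibrated implementation used in this study. Several things are assumptions made for illustration: the function names (`gravity`, `radiation`, `pwo`, and the helpers), the power-law distance decay with `gamma=2.0` in the GM, the use of Euclidean distances between centroids, the toy coordinates and populations, and the choice to rescale every origin's row so that it sums to that origin's total outflow, which makes the three outputs directly comparable.

```python
import numpy as np


def pairwise_distances(coords):
    """Euclidean distance matrix between location centroids (n x 2 array)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def population_within(pop, dist):
    """W[a, b] = total population within distance dist[a, b] of location a.

    The circle is centred on a and includes a and b themselves.
    """
    # inside[a, b, k] is True when k lies no farther from a than b does.
    inside = dist[:, None, :] <= dist[:, :, None]
    return (inside * pop[None, None, :]).sum(axis=-1)


def _scale_rows(weights, out_flows):
    """Zero the diagonal, then rescale each row to sum to the origin's outflow."""
    w = weights.copy()
    np.fill_diagonal(w, 0.0)
    return out_flows[:, None] * w / w.sum(axis=1, keepdims=True)


def gravity(pop, dist, out_flows, gamma=2.0):
    """GM (assumed form): flow grows with destination population, decays as d^-gamma."""
    d = dist.copy()
    np.fill_diagonal(d, np.inf)  # no self-flows, since inf ** -gamma == 0
    return _scale_rows(pop[None, :] * d ** (-gamma), out_flows)


def radiation(pop, dist, out_flows):
    """RM: weight m_i * m_j / ((m_i + s_ij) * (m_i + m_j + s_ij))."""
    m_i, m_j = pop[:, None], pop[None, :]
    # s_ij: population inside the circle of radius d_ij around origin i,
    # excluding the origin and the destination themselves.
    s = population_within(pop, dist) - m_i - m_j
    np.fill_diagonal(s, 0.0)  # keeps the (discarded) diagonal well defined
    return _scale_rows(m_i * m_j / ((m_i + s) * (m_i + m_j + s)), out_flows)


def pwo(pop, dist, out_flows):
    """PWO: weight m_j * (1 / S_ji - 1 / M), with the circle centred on destination j."""
    # S[i, j]: population within distance d_ij of destination j (includes i and j).
    S = population_within(pop, dist).T
    weights = pop[None, :] * (1.0 / S - 1.0 / pop.sum())
    return _scale_rows(np.maximum(weights, 0.0), out_flows)


# Toy example: five hypothetical grid cells (coordinates in km).
coords = np.array([[0, 0], [1, 0], [0, 2], [3, 1], [2, 3]], dtype=float)
pop = np.array([100, 50, 80, 30, 60], dtype=float)
out_flows = np.array([10, 5, 8, 3, 6], dtype=float)
dist = pairwise_distances(coords)

for name, model in [("GM", gravity), ("RM", radiation), ("PWO", pwo)]:
    print(name)
    print(np.round(model(pop, dist, out_flows), 2))
```

The sketch makes the structural difference visible: the GM depends on distance directly, the RM depends only on the population lying between the origin and the destination as seen from the origin, and the PWO measures the surrounding population from the destination's side. Note that in this PWO form an origin that is the farthest location from every destination would receive zero weight everywhere, so real applications need to guard against empty rows.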