Forecasts need to have error bars

Richard Rosenfeld, in the most recent Criminologist, published a piece about forecasting national level crime rates. People complain about the FBI releasing crime stats a year late; academics are worse - Richard provided "forecasts" for 2021 through 2025 in an article published in late 2023. Even ignoring the stalecasts that Richard provided, these forecasts had/have no chance of being correct. Point forecasts will always be wrong - a more reasonable approach is to provide prediction intervals for the forecasts. Showing error intervals around the forecasts makes clear why Richard's interpretation of minor trends is likely to be misleading. Here I provide some analysis using ARIMA models (in python) to illustrate what reasonable forecast error looks like in this scenario; code and data are on github.

You can get the dataset on github, but here is the upfront work of loading the libraries I need and getting the data in the right format:

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt

# via https://www.disastercenter.com/crime/uscrime.htm
ucr = pd.read_csv('UCR_1960_2019.csv')
ucr['VRate'] = (ucr['Violent']/ucr['Population'])*100000
ucr['PRate'] = (ucr['Property']/ucr['Population'])*100000
ucr = ucr[['Year','VRate','PRate']]

# adding in more recent years via https://cde.ucr.cjis.gov/LATEST/webapp/#/pages/docApi
# I should use the original counts/population, I don't know where to find those though
y = [2020,2021,2022]
v = [398.5,387,380.7]
p = [1958.2,1832.3,1954.4]
ucr_new = pd.DataFrame(zip(y,v,p),columns=list(ucr))
ucr = pd.concat([ucr,ucr_new],axis=0)
ucr.index = pd.period_range(start='1960',end='2022',freq='A')

# Richard fits the model for 1960 through 2015
train = ucr.loc[ucr['Year'] <= 2015,'VRate']

Now we are ready to fit our models. To keep it as close to apples-to-apples with Richard's paper as I can, I just fit an ARIMA(1,1,2) model - I do not do a grid search for the best fitting model (also Richard states he has exogenous factors for inflation in his model, which I do not include here). Note Richard says he fits an ARIMA(1,0,2) for the violent crime rates in the paper, but he also says he differenced the data, which makes it an ARIMA(1,1,2) model:

# Not sure if Richard's model had a trend term, here no trend
violent = ARIMA(train,order=(1,1,2),trend='n').fit()
violent.summary()

This produces the output:
                               SARIMAX Results
==============================================================================
Dep. Variable:                  VRate   No. Observations:                   56
Model:                 ARIMA(1, 1, 2)   Log Likelihood                -242.947
Date:                Sun, 19 Nov 2023   AIC                            493.893
Time:                        19:33:53   BIC                            501.923
Sample:                    12-31-1960   HQIC                           496.998
                         - 12-31-2015
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1         -0.4545      0.169     -2.688      0.007      -0.786      -0.123
ma.L1          1.1969      0.131      9.132      0.000       0.940       1.454
ma.L2          0.7136      0.100      7.162      0.000       0.518       0.909
sigma2       392.5640    104.764      3.747      0.000     187.230     597.898
===================================================================================
Ljung-Box (L1) (Q):                   0.13   Jarque-Bera (JB):                 0.82
Prob(Q):                              0.72   Prob(JB):                         0.67
Heteroskedasticity (H):               0.56   Skew:                            -0.06
Prob(H) (two-sided):                  0.23   Kurtosis:                         2.42
===================================================================================

So there is some potential evidence of over-differencing (the negative AR(1) coefficient). Looking at violent.test_serial_correlation('ljungbox'), there is no significant serial autocorrelation in the residuals. One could use some sort of auto-ARIMA approach to pick a "better" model (it clearly needs to be differenced at least once, and maybe the logged rate should be modeled as well). But there is not much to squeeze out of this - pretty much all of the ARIMA models will produce very similar forecasts (and error intervals).

In the statsmodels package you can append new data and do one-step-ahead forecasts, so this is comparable to Richard's out-of-sample one-step-ahead forecasts in the paper for 2016 through 2020:

# To make it apples to apples, only appending through 2020
av = (ucr['Year'] > 2015) & (ucr['Year'] <= 2020)
violent = violent.append(ucr.loc[av,'VRate'], refit=False)

# Now can show in-sample predictions and forecasts
forecast = violent.get_prediction('2016','2025').summary_frame(alpha=0.05)

If you print(forecast), below are the results. One thing I want to note is that for the one-step-ahead forecasts, here the years 2016 through 2020, the standard error is under 20 (well within Richard's guesstimate that to be useful the absolute error needs to be under 10%). When you start forecasting multiple years ahead though, the error compounds over time. So to forecast 2022, you need a forecast of 2021. To forecast 2023, you need to forecast 21, 22, and then 23, etc.

VRate        mean    mean_se  mean_ci_lower  mean_ci_upper
2016   397.743461  19.813228     358.910247     436.576675
2017   402.850827  19.813228     364.017613     441.684041
2018   386.346157  19.813228     347.512943     425.179371
2019   379.315712  19.813228     340.482498     418.148926
2020   379.210158  19.813228     340.376944     418.043372
2021   412.990860  19.813228     374.157646     451.824074
2022   420.169314  39.803285     342.156309     498.182318
2023   416.906654  57.846105     303.530373     530.282936
2024   418.389557  69.535174     282.103120     554.675994
2025   417.715567  80.282625     260.364513     575.066620

The standard error compounds with the forecast horizon - the error variances accumulate at each step (for a pure random walk the standard error would grow like sqrt(steps*se^2); the moving average terms in this model make it grow somewhat faster).
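To see where those widening intervals come from, here is a quick sketch (my addition, not in the original post) that reproduces the multi-step standard errors directly from the coefficient estimates printed in the summary above:

import numpy as np

# Recompute the h-step-ahead forecast standard errors from the fitted
# ARIMA(1,1,2) coefficients (values copied from the summary output above)
ar1, ma1, ma2, sigma2 = -0.4545, 1.1969, 0.7136, 392.564

# psi weights of the differenced ARMA(1,2) part
psi = [1.0, ar1 + ma1, ar1*(ar1 + ma1) + ma2]
for _ in range(7):
    psi.append(ar1*psi[-1])

# cumulative psi weights are the impulse responses of the integrated series
big_psi = np.cumsum(psi)

# h-step forecast variance = sigma2 times the sum of squared cumulative weights
fvar = sigma2 * np.cumsum(big_psi**2)
print(np.sqrt(fvar[:5]))  # ~[19.8, 39.8, 57.8, 69.5, 80.3], the mean_se column for 2021-2025

This is just arithmetic on the printed coefficients - statsmodels gets the mean_se column from the state space filter, but the two agree to rounding.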
Richard's forecasts do better than mine for some of the point estimates, but they are similar overall:

# Richard's estimates
forecast['Rosenfeld'] = [399.0,406.8,388.0,377.0,394.9] + [404.1,409.3,410.2,411.0,412.4]
forecast['Observed'] = ucr['VRate']
forecast['MAPE_Andy'] = 100*(forecast['mean'] - forecast['Observed'])/forecast['Observed']
forecast['MAPE_Rick'] = 100*(forecast['Rosenfeld'] - forecast['Observed'])/forecast['Observed']

And this now shows the results for each of the models:

VRate        mean  mean_ci_lower  mean_ci_upper  Rosenfeld    Observed  MAPE_Andy  MAPE_Rick
2016   397.743461     358.910247     436.576675      399.0  397.520843   0.056002   0.372095
2017   402.850827     364.017613     441.684041      406.8  394.859716   2.023785   3.023931
2018   386.346157     347.512943     425.179371      388.0  383.362999   0.778155   1.209559
2019   379.315712     340.482498     418.148926      377.0  379.421097  -0.027775  -0.638103
2020   379.210158     340.376944     418.043372      394.9  398.500000  -4.840613  -0.903388
2021   412.990860     374.157646     451.824074      404.1  387.000000   6.715985   4.418605
2022   420.169314     342.156309     498.182318      409.3  380.700000  10.367563   7.512477
2023   416.906654     303.530373     530.282936      410.2         NaN        NaN        NaN
2024   418.389557     282.103120     554.675994      411.0         NaN        NaN        NaN
2025   417.715567     260.364513     575.066620      412.4         NaN        NaN        NaN

So my MAPE in the held-out sample is worse than Rick's for the point estimates, but look at my prediction intervals - the observed values are still totally consistent with the model I have estimated here. Since this is a blog and I don't need to wait for peer review, I can however update my forecasts given more recent data.

# Given updated data until the end of the series, lets do 23/24/25
violent = violent.append(ucr.loc[ucr['Year'] > 2020,'VRate'], refit=False)
updated_forecast = violent.get_forecast(3).summary_frame(alpha=0.05)

And here are my predictions:

VRate        mean    mean_se  mean_ci_lower  mean_ci_upper
2023   371.977798  19.813228     333.144584     410.811012
2024   380.092102  39.803285     302.079097     458.105106
2025   376.404091  57.846105     263.027810     489.780373

You really need to graph these out to get a sense of the magnitude of the errors:

[Figure: observed violent crime rate with the original and updated forecasts and their prediction intervals]
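The post does not show the plotting code for this figure, but a rough matplotlib sketch along these lines (my reconstruction, assuming the ucr, forecast, and updated_forecast objects created above) would produce a similar graph:

# Rough reconstruction of the figure, not the original plotting code
fig, ax = plt.subplots(figsize=(8,5))
ax.plot(ucr['Year'], ucr['VRate'], color='k', label='Observed')

# predictions/forecasts from the model with data appended through 2020
fy = forecast.index.year
ax.plot(fy, forecast['mean'], color='blue', label='Forecast (data through 2020)')
ax.fill_between(fy, forecast['mean_ci_lower'], forecast['mean_ci_upper'],
                color='blue', alpha=0.2)

# updated forecasts after appending data through 2022
uy = updated_forecast.index.year
ax.plot(uy, updated_forecast['mean'], color='red', label='Forecast (data through 2022)')
ax.fill_between(uy, updated_forecast['mean_ci_lower'], updated_forecast['mean_ci_upper'],
                color='red', alpha=0.2)

ax.set_ylabel('Violent crime rate per 100,000')
ax.legend()
plt.show()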
Note how Richard's 2021 and 2022 forecasts, and the general increasing trend, have already been proven wrong. But it really doesn't matter - any reasonable model that admitted uncertainty would never let one interpret minor trends over time the way Richard did in the Criminologist article to begin with (forecasts from ARIMA models are essentially mean-reverting; they just trend to a mean term in a short number of steps). Richard including exogenous factors actually makes this worse - you then need to forecast inflation as well, and take that forecast error into account, for any multiple-year-out forecast.

Richard has consistently over his career overfit models and subsequently read the tea leaves in various macro level correlations (Rosenfeld, 2018). His current theory of inflation and crime is no different. I agree that forecasting is the way to validate criminological theories - but I don't believe picking up a new pet theory every time you are proven wrong will result in any substantive progress in criminology. Most of the short term trends criminologists interpret are simply due to normal volatility in the models over time (Yim et al., 2020). David McDowall has a recent article that is much more measured about our cumulative knowledge of macro level crime rate trends - and how they can potentially be related to different criminological theories (McDowall, 2023). Matt Ashby has a paper that compares typical errors for city level forecasts - forecasting several years out tends to produce quite inaccurate estimates, quite a bit larger than Richard's "under 10% to be useful" threshold (Ashby, 2023).

The final point I want to make is that, honestly, it doesn't even matter. Richard can keep making dramatic errors in macro level forecasts - it doesn't matter if he publishes estimates that are two-plus years old and already wrong before they go into print. Because, contrary to what Richard says, national macro level violent crime forecasts do not help policy response - why would Pittsburgh care about the national level crime forecast? They should not. It does not matter if we fit models that are more accurate than 5% (or 1%, or whatever); they are not helpful to folks on the Hill. No one is sitting in the COPS office thinking "hmm, two years from now violent crime rates are going up by 10, let's fund 1342 more officers to help with that". Richard can't have skin in the game for his perpetually wrong macro level crime forecasts - there is no skin to have.

I am a nerd, so I like looking at numbers and fitting models (or here it is more like that XKCD comic of yelling at people on the internet). I don't need to make up fairy tale hypothetical "policy" applications for the forecasts though. If you want a real application of crime forecasts, I have estimated for cities that adding an additional home or apartment unit increases the number of calls for service by about 1 per year. So for growing cities, that is how I suggest making longer term allocation plans to increase police staffing to meet the increased demand.
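To make that last suggestion concrete, here is a toy back-of-the-envelope version of the allocation math - only the roughly one additional call for service per new housing unit per year comes from the estimate above; the growth and workload numbers are made up for illustration:

# Toy staffing illustration; the ~1 call per new unit per year is from the post,
# the other inputs are hypothetical
new_units_per_year = 2000               # projected housing units added per year (made up)
extra_calls = new_units_per_year * 1.0  # ~1 additional call for service per unit per year
calls_per_officer_year = 800            # assumed calls one officer can absorb per year (made up)
print(extra_calls / calls_per_officer_year)  # ~2.5 additional officers per year of growth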
References

* Ashby, M. (2023). Forecasting crime trends to support police strategic decision making. CrimRxiv.
* McDowall, D. (2023). Empirical properties of crime rate trends. Journal of Contemporary Criminal Justice, 10439862231189979.
* Rosenfeld, R. (2018). Studying crime trends: Normal science and exogenous shocks. Criminology, 56(1), 5-26.
* Yim, H. N., Riddell, J. R., & Wheeler, A. P. (2020). Is the recent increase in national homicide abnormal? Testing the application of fan charts in monitoring national homicide trends over time. Journal of Criminal Justice, 66, 101656.