I'm pretty deep into this topic, and what might be interesting to an outsider is that the leading models (NeuralGCM and WeatherNext 1 before, as well as this model now) are all trained with a "CRPS" objective, which I haven't seen at all outside of ML weather prediction.
Essentially you add random noise to the inputs and train by minimizing the regular loss (like L1) while simultaneously maximizing the difference between two members with different random-noise initialisations. I wonder whether this will be applied to more traditional generative AI at some point.
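To make the trick concrete, here is a minimal sketch (my own illustration with made-up numbers, not Google's actual training code) of a fair CRPS estimator for a two-member ensemble: the skill term is the ordinary L1 error of each member, and the spread term, which is subtracted, rewards members that disagree with each other.

```python
import numpy as np

def fair_crps_two_member(pred_a, pred_b, target):
    """Fair CRPS estimator for a two-member ensemble.

    skill:  mean absolute error of each member (minimized)
    spread: half the absolute difference between the members
            (subtracted, so training *maximizes* member disagreement)
    """
    skill = 0.5 * (np.abs(pred_a - target) + np.abs(pred_b - target))
    spread = 0.5 * np.abs(pred_a - pred_b)
    return float(np.mean(skill - spread))

# Toy usage: two forecasts of the same target, produced with
# different random-noise seeds (standing in for the perturbed inputs).
rng = np.random.default_rng(0)
target = rng.normal(size=1000)
member_a = target + rng.normal(scale=0.3, size=1000)
member_b = target + rng.normal(scale=0.3, size=1000)
loss = fair_crps_two_member(member_a, member_b, target)
```

Note that if both members collapse to the same prediction, the spread term vanishes and the loss reduces to plain MAE, so a deterministic model gains nothing over a well-spread ensemble.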
What is the goal of doing that vs using L2 loss?
I find it interesting that they quantify the improvement in speed and in the number of forecasted scenarios, but give no details on how this results in improved forecast accuracy, per:
> WeatherNext 2 can generate forecasts 8x faster and with resolution up to 1-hour. This breakthrough is enabled by a new model that can provide hundreds of possible scenarios.
As an end user, all I care about is that there's one accurate forecasted scenario.
This is really important: You're not the end user of this product. These types of models are not built for laypeople to access them. You're an end user of a product that may use and process this data, but the CRPS scorecard, for example, should mean nothing to you. This is specifically addressing an under-dispersion problem in traditional ensemble models, due to a limited number (~50) and limited set of perturbed initial conditions (and the fact that those perturbations do very poorly at capturing true uncertainty).
Again, you, as an end user, don't need to know any of that. The CRPS scorecard is a very specific measure of error. I don't expect them to reveal the technical details of the model, but an industry expert instantly knows what WeatherBench[1] is, the code it runs, the data it uses, and how that CRPS scorecard was generated.
By having better dispersed ensemble forecasts, we can more quickly address observation gaps that may be needed to better solidify certain patterns or outcomes, which will lead to more accurate deterministic forecasts (aka the ones you get on your phone). These are a piece of the puzzle, though, and not one that you will ever actually encounter as a layperson.
[1]: https://sites.research.google/gr/weatherbench/
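The under-dispersion problem mentioned above can be illustrated with a toy spread-skill check (a standard diagnostic; all numbers here are made up): in a well-calibrated ensemble, the average member spread should roughly match the RMSE of the ensemble mean, and a ratio well below 1 signals under-dispersion.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
truth = rng.normal(size=n)
# A shared forecast error (std 0.5) that the perturbations fail to capture...
base = truth + rng.normal(scale=0.5, size=n)
# ...plus 50 members whose perturbations are too small (std 0.2).
members = base + rng.normal(scale=0.2, size=(50, n))

rmse = np.sqrt(np.mean((members.mean(axis=0) - truth) ** 2))
spread = np.mean(members.std(axis=0))
ratio = spread / rmse  # well-calibrated ensembles sit near 1.0
```

Here the members cluster tightly around a shared error they cannot see, so the ratio comes out well below 1: the ensemble is confidently wrong.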
> By having better dispersed ensemble forecasts, we can more quickly address observation gaps that may be needed to better solidify certain patterns or outcomes, which will lead to more accurate deterministic forecasts.
Sorry - not sure this is a reasonable take-away. The models here are all still initialized from analysis performed by ECMWF; Google is not running an in-house data assimilation product for this. So there's no feedback mechanism between ensemble spread/uncertainty and the observation itself in this stack. The output of this system could be interrogated using something like Ensemble Sensitivity Analysis, but there's nothing novel about that and we can do that with existing ensemble forecast systems.
For lay-users they could have explained that better. I think they may not have completely uninformed users in mind for this page though.
Developing an ensemble of possible scenarios has been the central insight of weather forecasting since the 1960s, when Edward Lorenz discovered that tiny differences in initial conditions can grow exponentially (the "butterfly effect"). Since it became computationally practical in the 90s, all competitive forecasts have been based on these ensemble models.
When you hear "a 70% chance of rain," it more or less means "there was rain in 70 of the 100 scenarios we ran."[0] There is no "single accurate forecast scenario."
[0] Acknowledging this dramatically oversimplifies the models and the location where the rain could occur.
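In code, that (admittedly oversimplified) reading of "chance of rain" is just counting members above a threshold; the ensemble values below are made up for illustration.

```python
import numpy as np

# Hypothetical 100-member ensemble of precipitation (mm) at one location.
rng = np.random.default_rng(42)
members = rng.gamma(shape=0.5, scale=2.0, size=100)

# "Chance of rain" as the fraction of members above a trace threshold.
RAIN_THRESHOLD_MM = 0.2
prob_rain = float(np.mean(members > RAIN_THRESHOLD_MM))
```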
As an end user I also want to see the variance, to get a feel for the uncertainty.
Quite a lot of weather sites offer this data in an easily digestible visual format.
Indeed. The most important benchmark is accuracy and how well it stacks up against existing physics-based models like GFS or ECMWF.
Sure, those big physics-based models are very computationally intensive (national weather bureaus run them on sizeable HPC clusters), but you only need to run them every few hours in a central location and then distribute the outputs online. It's not like every forecaster in a country needs to run a model, they just need online access to the outputs. Even if they could run the models themselves, they would still need the mountains of raw observation data that feeds the models (weather stations, satellite imagery, radars, wind profilers...). And these are usually distributed by... the national weather bureau of that country. So the weather bureau might as well do the number crunching as well and distribute that.
They integrated "MetNet-3" into Google products, and my personal perception was that accuracy decreased.
Is this the same model as provided the most accurate hurricane predictions this season?
https://arstechnica.com/science/2025/11/googles-new-weather-...
It feels like real weather AI|Forecast|whatever_you_want_to_call_it is still far, far away. Maybe it's just the consumer aspect of weather apps but I don't feel as if I get any more accurate data now than I did back when my parents turned to the daily weather channel for the forecast. Still a lot of clear days when rain was predicted or the even more dreaded torrential downpour when it was supposed to be sunny and clear.
Obviously all I have is anecdata for what I'm mentioning here but from a consumer perspective I don't feel like these model enhancements are really making average folks feel as if weather is any more understood than it was decades ago.
No need for anecdata! We have the data: https://ourworldindata.org/weather-forecasts
tl;dr: Weather forecasts have improved a lot.
That's actually really helpful to understand better, thank you!
I remember when it was a trope that the weatherman was always wrong and the weather was the prototypical example of something inherently "unpredictable".
> I don't feel as if I get any more accurate data now than I did back when my parents turned to the daily weather channel for the forecast.
The accuracy improvement is provable. A four-day forecast today is as accurate as a one-day forecast 30 years ago. And this is supremely impressive, because the difficulty of predicting the weather grows exponentially, not linearly, with time.
You are welcome to your feelings - and to be fair, I'm not sure that our understanding of the weather has improved as much as our computational power to extend predictions has.
The thing is that regular weather forecasts are also not that great.
I've found this to be more related to poor representation of the data than inaccurate data.
For example, in Apple's Weather app a "rainy" day means a high chance of rain at any point during the day. If there's an 80% chance of rain at 5am and it's sunny the rest of the day, that counts as rainy. You can see an hourly report for more info, and generally this is pretty accurate. You have to learn how to find the right data, know your local area, and interpret it yourself.
Then you have to consider what effects this has on your plans and it gets more complicated. Finding a window to walk the dog, choosing a day to go sailing, or determining conditions for backcountry skiing all have different requirements and resources. What I'd like AI to do is know my own interests and highlight what the forecast means for me.
In Norway people are extremely weather-focused, and the national weather service delivers quite advanced graphics for people to understand what is going on.
The standard graph that most people look at to get an idea about today and tomorrow: https://www.yr.no/en/forecast/graph/1-72837/Norway/Oslo/Oslo...
The live weather radar which shows where it is raining right now and prediction/history for rain +/- 90 minutes. This is accurate enough that you can use it to time your walk from the office to the subway and avoid getting wet: https://www.yr.no/en/map/radar/1-72837/Norway/Oslo/Oslo/Oslo
Then you have more specialised forecasts of course. Dew point, feels like temperature, UV, pollution, avalanche risks, statistics, sea conditions, tides, ... People tend to geek out quite heavily on these.
Anyone know whether we can use this to simulate hurricanes/floods in particular areas, instead of looking at real existing data and helping model an existing hurricane as it's happening? (which is definitely more important and impactful, but the simulation angle is the one I happen to be curious about at the moment).
Like if I wanted to simulate whether something like Hurricane Melissa would've gone through a handful of southern US states, what would the effect have been, from an insurance or resiliency standpoint.
That's not really what a weather model "does."
Is anyone aware of good sources of higher resolution models? Hourly resolution like this model provides doesn’t help much now that energy markets have moved to 15-min and 5-min resolution.
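For what it's worth, you can always downscale an hourly forecast to 15-minute steps by interpolation, but that adds no real information, which is presumably why native sub-hourly resolution matters for these markets. A toy sketch with made-up wind-speed values:

```python
import numpy as np

# Hypothetical hourly wind-speed forecast (m/s) for the next 6 hours.
hours = np.arange(0, 7)
wind = np.array([5.0, 6.2, 7.1, 6.8, 5.5, 4.9, 5.3])

# Naive downscaling to 15-minute resolution via linear interpolation;
# every sub-hourly value is a blend of its two hourly neighbours.
quarter_hours = np.arange(0, 6.01, 0.25)
wind_15min = np.interp(quarter_hours, hours, wind)
```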
Windy allows you to select your model. For that reason it's my go to for accuracy.
Different models have different strengths, though. Some are shorter range (72h) or longer range (1-3 weeks). Some are higher resolution for where you live (the size of an area which it assigns a forecast to, so your forecast is more local).
Some governments have their own weather model for their country that is the most accurate for where you live. What I did for a long time was use Windy with HRDPS (a Canadian short-range model with higher resolution over Canada, so I got more accurate forecasts). Now I just use the Government of Canada weather app.
I genuinely wonder what The Weather Channel, the official iPhone/Android weather apps, etc. use under the hood for global models. My gut says ECMWF (a European model with global coverage) mixed with a little magic.
How does one use weather data in an energy market, if you don't mind my asking?
15 years later and still no word from Google if they will use the barometers in Android devices to assimilate surface pressure data. It has been shown that this can improve forecast accuracy. I think IBM may be doing it with their weather apps, but Google/Apple would have dramatically more data available.
Apple even bought Dark Sky, which purported to do this but never released any information - so I doubt they really did do it. And if they did, I doubt Apple continued the practice.
Been waiting a long time to hear Google announce they'll use your barometer to give you a better forecast. Still waiting I guess.
> 15 years later and still no word from Google if they will use the barometers in Android devices to assimilate surface pressure data.
For WeatherNext, the answer is 'no'. The paper (https://arxiv.org/abs/2506.10772) describes in detail what data the model uses, and direct assimilation of user barometric data is not on the list.
Pricing, I think?
https://developers.google.com/maps/billing-and-pricing/prici...
This year, the wild variance in hourly weather reports on my phone has really been something. I attributed it to likely budget cuts as a result of DOGE, but if those forecasts came from Google itself the whole time, all is clear now.
I find that unlikely, my forecasts for much of Europe and East Asia have been consistently accurate.
How do DOGE-implemented budget cuts affect European or East Asian forecasts? Those aren't the forecasts that someone suspecting departmental DOGE-ing would point to.
If the US does less data gathering (balloon launches, buoy maintenance, setting up weather huts in super-remote sites, etc.), it will affect all forecasts.
Models all use a "current world state" of all sensors available to bootstrap their runs.
A similar thing happened at the beginning of Covid-19: modified cargo/passenger planes gather weather data during their routine trips, and suddenly this huge data source was gone (though it was partially replaced by the experimental ADM-Aeolus satellite, which turned out to be a huge global game changer due to its unexpectedly high-quality data).