I am working through my thinking on the Numerai changes proposed for September and how they affect profitability and staking. My tl;dr, especially for beginners who plan to take part in the tournament: join for the fun of it, not for profit. I am skeptical that most data scientists will be able to earn anything by participating once the changes go into effect.
Background reading
Here is some optional reading on the anticipated changes, which culminate in the elimination of the leaderboard bonus in September.
https://forum.numer.ai/t/mmc-payout-details-and-analysis/220
https://forum.numer.ai/t/code-to-calculate-mmc-vs-regular-payouts/238/6
https://docs.numer.ai/tournament/learn#metamodel-contribution
Understand the current system
Under the current leaderboard system, participating in the Numerai tournament can be quite profitable for data scientists, and the longer you participate, the more profitable it gets. Why? Because as long as you rank in the top 300 most of the time, the bonus lets your stake grow exponentially. I wrote about this exponential growth via the leaderboard bonus, and its implications for Numerai, in an earlier post. The caveat is that this only holds in the longer run, since the doubling rate is nowhere near what we see with, say, the COVID-19 situation. The current system went into effect around November 2019 and fully kicked in around April 2020. By phasing it out in September, Numerai is not giving the system much time to produce any meaningful growth, save for those who took the risk of staking large amounts of NMR back in November.
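To make the compounding claim concrete, here is a minimal sketch. The per-round bonus rate is a hypothetical number chosen only for illustration, not Numerai's actual bonus formula; the point is that a reinvested stake grows as (1 + r)^n and doubles after ln 2 / ln(1 + r) rounds, which for small r is a long time.

```python
import math

def stake_path(initial_stake, bonus_rate, rounds):
    """Stake after each round when every bonus is added back to the stake.

    bonus_rate is a hypothetical per-round bonus, as a fraction of the stake.
    """
    path = [initial_stake]
    for _ in range(rounds):
        path.append(path[-1] * (1 + bonus_rate))
    return path

def doubling_time(bonus_rate):
    """Rounds needed for a compounding stake to double: ln 2 / ln(1 + r)."""
    return math.log(2) / math.log(1 + bonus_rate)

# Hypothetical example: a 1% bonus per weekly round.
print(round(doubling_time(0.01), 1))  # 69.7 rounds, i.e. well over a year
print(stake_path(100.0, 0.01, 70)[-1])  # a 100 NMR stake just past doubling
```

Even at this made-up rate, doubling takes more than a year of weekly rounds, which is why a bonus phased out after a few months leaves little room for growth.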
To be more precise, exponential growth via the bonus requires two things. First, Numerai must not cap the payout in a way that lets the high-staking participants take the whole bonus. When the bonus was first introduced, Numerai said in the chat forum that in practice there was no cap, because they would raise it as needed, and that the cap was only a safeguard in case something unforeseen happened. They have since walked that back. Moreover, rather than keeping the leaderboard bonus with a cap, they have opted to eliminate it altogether.
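Why a cap matters can be sketched with a toy model. It assumes, purely for illustration, that each round's bonus pool is min(rate × total stake, cap) and is split pro rata by stake; the split rule, the names, and the numbers are hypothetical, not Numerai's payout code. Once the cap binds, the pool is fixed, the largest stakers collect almost all of it, and the total stake grows by at most the cap each round, i.e. linearly rather than exponentially.

```python
def round_bonus(stakes, rate, cap):
    """Split one round's bonus pool pro rata by stake (hypothetical rule).

    stakes maps participant -> stake; the pool is min(rate * total, cap).
    """
    total = sum(stakes.values())
    pool = min(rate * total, cap)
    return {name: pool * stake / total for name, stake in stakes.items()}

def run_rounds(stakes, rate, cap, rounds):
    """Reinvest every bonus for a number of rounds and return the final stakes."""
    stakes = dict(stakes)
    for _ in range(rounds):
        # Bonuses are computed from the stakes at the start of the round.
        for name, bonus in round_bonus(stakes, rate, cap).items():
            stakes[name] += bonus
    return stakes

start = {"whale": 1000.0, "small": 10.0}
capped = run_rounds(start, rate=0.01, cap=5.0, rounds=10)
uncapped = run_rounds(start, rate=0.01, cap=float("inf"), rounds=10)
print(sum(capped.values()))    # total grows by the cap (5) per round
print(sum(uncapped.values()))  # total grows by 1% per round
```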
Second, the staked model needs to rank high enough to earn the bonus. This isn't actually that hard. You can "cheat" a little by basing your analysis on the Numerai example model, which consistently ranks in the top 100. The new MMC ranking metric will penalize you for using the example model, but there are only so many models for forecasting the data. Much of the variation comes from hyperparameter calibration and data selection anyway. If you take the example model, change the hyperparameters and, say, weight the eras you feed into it, you will have a sufficiently different model that will likely do quite well. The example model code is included with each week's data file. If you are interested in coding in Python and running your model in Colab, you can start by using my version here. (I'm in the middle of updating the code to work with the larger dataset; you can expect a working version by the end of May 2020.) My code handles everything from downloading and unzipping the data to running the example Numerai code. I also explain in the code comments how you can save your model instead of calibrating everything from scratch each time. My code does not upload predictions automatically, though, since that requires hard-coding your public and secret keys, something I don't recommend. But if you want to do so, I also comment on how you would go about automating that part.
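One way to "put weights on the eras" is to compute a per-row sample weight from each row's era label. The sketch below is a hypothetical scheme, not the author's or Numerai's: it assumes era labels of the form `era42`, gives each era equal total weight regardless of its row count, and then discounts older eras geometrically by `decay`.

```python
from collections import Counter

def era_number(label):
    """Parse a hypothetical era label such as 'era42' into the integer 42."""
    return int(label[len("era"):])

def era_weights(eras, decay=0.9):
    """Per-row sample weights that favour recent eras.

    Each row first gets 1 / (row count of its era), so every era carries the
    same total weight; that is then scaled by decay ** (eras behind the latest).
    Finally the weights are rescaled to sum to the number of rows.
    """
    counts = Counter(eras)
    latest = max(era_number(e) for e in counts)
    raw = [decay ** (latest - era_number(e)) / counts[e] for e in eras]
    scale = len(eras) / sum(raw)
    return [w * scale for w in raw]

print(era_weights(["era1", "era1", "era2"], decay=0.5))  # [0.5, 0.5, 2.0]
```

Weights like these could then be passed as `sample_weight` to a model's fit method; XGBoost's `XGBRegressor.fit` and many scikit-learn estimators accept that argument. The decay value is just another hyperparameter to tune.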
If you like neural networks and have access to Matlab, you can use the code I posted to Matlab Central to generate your forecasts. I use this code for my own modeling and forecasting for the Numerai tournament. The code is generic, so you would need to work out the hyperparameters and data selection yourself.
So that is how the leaderboard bonus system works and why it makes the tournament profitable for participants. The bonus was needed in part because Numerai wanted to increase participation at the time, and because the rank correlation ranking and its associated daily payout don't really reward good forecasts.
The glaring issues are the volatility of the rank correlation metric and the fact that the payout is daily. In the short run, with luck, you can earn from the daily payout, but you are essentially gambling on a cycle of booms and busts. Moreover, since we are rewarded by the delta of the rank correlation, your expected profit in the long run is actually zero.
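Taking this paragraph's premise literally, that each day pays stake × (today's rank correlation - yesterday's), the daily payouts telescope: their sum is just stake × (final correlation - starting correlation), which is bounded by 2 × stake no matter how long you participate. The small simulation below illustrates this; the random-walk correlation path, the step size, and the stake are hypothetical values chosen only for illustration.

```python
import random

def simulate_delta_payouts(days, stake, step_sd=0.02, seed=0):
    """Daily payout = stake * (today's correlation - yesterday's correlation).

    The correlation path is a hypothetical random walk clipped to [-1, 1].
    """
    rng = random.Random(seed)
    corr = [0.0]
    for _ in range(days):
        nxt = corr[-1] + rng.gauss(0.0, step_sd)
        corr.append(max(-1.0, min(1.0, nxt)))
    payouts = [stake * (today - yesterday)
               for yesterday, today in zip(corr, corr[1:])]
    return corr, payouts

corr, payouts = simulate_delta_payouts(days=1000, stake=100.0)
total = sum(payouts)
# The sum telescopes to stake * (corr[-1] - corr[0]), so the average payout
# per day (total / days) shrinks toward zero as the number of days grows.
print(total, total / len(payouts))
```

The telescoping holds for any correlation path, not just this simulated one, so the bounded total does not depend on the random-walk assumption.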
Also, a lot of the top forecasters at any given time achieve their top rank via overfitting. This is partly motivated by the possible profits from using the p1p exploit to gain a high risk-adjusted return. In the current system this exploit is profitable, and even an imperfect implementation of it would be hard to catch while still being somewhat profitable. With the Numerai staff applying subjective penalties rather than designing a system that eliminates the incentive, I expect this cat-and-mouse game to continue.
What that means is that with a properly fitted model, you should on average rank high enough to get the leaderboard bonus.
MMC and leaderboard bonus elimination
The MMC metric and its related daily payout are similar to the current system. From the documentation, here is how it is calculated:
- select a random 67% of all staking users (with replacement)
- assume U staked the mean of these users
- calculate the stake weighted predictions of these users
- score those predictions against the round's real targets using rank correlation. This gives us score S
- remove U from the 67% of users
- recalculate the stake weighted predictions without U included
- score those new predictions against the real targets. This gives us S'
- U's MMC = S - S'
- repeat this whole process 20 times and keep the average MMC score
- multiply by the total number of stakers for the round to get your final MMC score
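The steps above can be sketched in pure Python. This is a minimal illustration under assumptions that are mine, not Numerai's: the 67% bootstrap sample is drawn from the other stakers and U is then appended with the mean stake of that sample; the "stake weighted predictions" are a plain stake-weighted average of each user's raw predictions; and rank correlation is Spearman's rho with average ranks for ties. All function names are hypothetical, and this is not Numerai's production scoring code.

```python
import random
from statistics import mean

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def stake_weighted(members):
    """Row-wise stake-weighted average of a list of (stake, predictions)."""
    total = sum(stake for stake, _ in members)
    n_rows = len(members[0][1])
    return [sum(stake * preds[r] for stake, preds in members) / total
            for r in range(n_rows)]

def mmc(user_preds, others, targets, n_iter=20, frac=0.67, seed=0):
    """Hypothetical MMC for user U; others is a list of (stake, predictions)."""
    rng = random.Random(seed)
    k = max(1, round(frac * len(others)))
    diffs = []
    for _ in range(n_iter):
        sample = rng.choices(others, k=k)             # 67%, with replacement
        u_stake = mean(stake for stake, _ in sample)  # U staked the mean
        s = spearman(stake_weighted(sample + [(u_stake, user_preds)]), targets)
        s_prime = spearman(stake_weighted(sample), targets)  # without U
        diffs.append(s - s_prime)
    return mean(diffs) * (len(others) + 1)  # times the number of stakers

# Three other stakers submit identical predictions that score 0.8; U's
# predictions rank perfectly and lift the ensemble to 1.0.
others = [(1.0, [2.0, 1.0, 4.0, 3.0, 5.0])] * 3
print(round(mmc([10.0, 20.0, 30.0, 40.0, 50.0], others,
                [1.0, 2.0, 3.0, 4.0, 5.0]), 6))  # 0.8
```

In the example, each repetition gives S - S' = 1.0 - 0.8 = 0.2, scaled by 4 stakers. A user whose predictions copy everyone else's gets an MMC of exactly zero, which is the penalty for submitting the example model unchanged.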
Read the above again. MMC addresses the volatility issue of the current rank correlation metric. However, nothing in the new ranking metric moves the expected long-run profit away from zero.
Conclusion
So come September, with the leaderboard bonus eliminated, there's no expected long-run profit to be had. The official Numerai write-up for MMC points out a few models that are very profitable, integration_test and dataman_ai, but what about everyone else's models? I am guessing this is not the case for, say, the top 300 models on average, though I have not crunched the numbers to confirm it. I welcome anyone to prove me wrong by crunching the average payout for the top 300 participants right now under MMC with no leaderboard bonus.