I'm testing 20 pin submit offers now at the same time. Same Geo. Never tested that many before.
Usually I look for a winner whose ROI is statistically significant at 95%. But with this many offers, it feels better to aggressively cut the worst ones from the group. My concern is that the result then wouldn't be a "safe" winner. Say 5 offers have 3 to 5 conversions each on 200 visitors each: would it be safe to cut the 3 lowest-performing of the 20, which also have 200 visitors each but no conversions, even though the result would not be statistically significant?
How do you guys do it?
Are you using a Bayesian or Frequentist calculator to get your statistical significance here? In a split-test like this I'd strongly recommend a Bayesian calculator like the one mentioned here.
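To put rough numbers on the case from the question (0 conversions vs. 4 conversions, 200 visitors each), here is a minimal Python sketch of the kind of calculation a Bayesian calculator performs. It is not the linked calculator's actual method: the uniform Beta(1, 1) prior, the function name `prob_b_beats_a` and the number of draws are all assumptions made for illustration.

```python
import random

def prob_b_beats_a(conv_a, visits_a, conv_b, visits_b, draws=50_000, seed=0):
    """Estimate P(true conversion rate of B > true conversion rate of A).

    Assumes a uniform Beta(1, 1) prior on each offer's conversion rate,
    so the posterior is Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + visits_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + visits_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws

# Offer A: 4 conversions on 200 visitors; offer B: 0 conversions on 200 visitors.
p = prob_b_beats_a(4, 200, 0, 200)
print(f"Chance the zero-conversion offer is actually better: {p:.1%}")
```

Under these assumptions the zero-conversion offer comes out with only a few percent chance of truly being the better one, which is a firmer basis for cutting it than a rule of thumb. Note that this compares conversion rates only and ignores payout.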
Cutting based on rules of thumb of any sort will not be as accurate as cutting by statistical significance, period.
The question to ask here is: How important is it for your decision to be accurate?
If you were cutting placements for a RON pop camp, where you're getting traffic from hundreds or thousands of placements, then I would say that using rules of thumb may be the better approach, as the sacrifice in accuracy may be justified by the gain in efficiency (in being able to cut faster and therefore cheaper). Plus, cutting out a few placements that would have ended up being profitable will not make-or-break your campaign.
However, for cutting offers and landers - ESPECIALLY offers - it is crucial for your decision to be accurate - because the offer is something that CAN make-or-break your campaign.
BTW kudos for being willing to test 20 offers! When you're testing so many offers, chances are you'll see the best offers rise to the top early on, which will enable you to cut a lot of the inferior offers sooner than you may expect.
Generally speaking, the more candidates you include in a split-test, the wider the range of performance - and therefore the faster you'll be able to cut the losers (compared to if you were to test the candidates in several batches).
Another suggestion: Consider setting up camps to either 1) target different OSs (e.g. one for Android, one for iOS, and maybe one for Windows Phone), with tracker rules for different carriers, or 2) target different carriers (e.g. one for wifi and another for carrier/3G/4G, or even one camp per major carrier), with tracker rules for different OSs. This will let you test more granularly to find out which offer converts best for each OS/carrier. Also, when you're testing so many offers, it's inevitable that each offer will accept different OSs and carriers, so setting up tracker rules correctly for all offers in a single camp would be hell.
Another check: You DO have a proven lander at this point I hope? Without one, your offer test will cost you.
Have fun and best of luck!
Amy
Thanks both for your responses. Amy, the win-vector calc I found you recommending on another thread is really cool stuff. Until now, if I had a $6 and a $9 offer, I'd enter 1.5 conversions of the first for every one conversion of the latter. No more!
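For anyone curious how a payout-aware comparison like the $6 vs. $9 case can work without rescaling conversions by hand, here is a rough Python sketch. It is not the win-vector calculator's actual method, just one way to compare earnings per visitor. The uniform Beta(1, 1) prior, the function name `prob_higher_epv` and the sample numbers are assumptions.

```python
import random

def prob_higher_epv(offer_a, offer_b, draws=50_000, seed=0):
    """Estimate P(earnings per visitor of A > earnings per visitor of B).

    Each offer is a tuple (conversions, visitors, payout).
    Assumes a uniform Beta(1, 1) prior on each conversion rate.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        epv = []
        for conversions, visitors, payout in (offer_a, offer_b):
            rate = rng.betavariate(1 + conversions, 1 + visitors - conversions)
            epv.append(rate * payout)  # sampled earnings per visitor
        if epv[0] > epv[1]:
            wins += 1
    return wins / draws

# Hypothetical data: a $9 offer and a $6 offer, 5 conversions on 200 visitors each.
p = prob_higher_epv((5, 200, 9.0), (5, 200, 6.0))
print(f"Chance the $9 offer earns more per visitor: {p:.1%}")
```

Multiplying each sampled conversion rate by its payout keeps the raw conversion counts intact, so the uncertainty reflects how many conversions were actually observed.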
About creatives, would you cut aggressively or wait for statistically significant results?
LOL yeah that win-vector calculator saved the day! Big thanks to the member who discovered it (edgekaos). Before that I was never able to cut offers very accurately.
When you say creatives do you mean landers or banners? For landers, DEFINITELY cut as they reach stat sig.
For banners - that would depend.
If you're still in the initial testing stage, where you're testing different angles to find the best one, then you'd probably want to look at trends. E.g. if you have several banners made for each angle, judge how good each angle is based on performance across all banners for that angle.
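A quick Python sketch of judging angles by pooled banner performance. CTR is used as the performance metric here purely as an assumption (the same pooling works for conversions or revenue), and the angle names and numbers are made up for illustration.

```python
from collections import defaultdict

def ctr_by_angle(banners):
    """Pool impressions and clicks across all banners sharing an angle.

    banners: iterable of (angle, impressions, clicks) tuples.
    Returns {angle: pooled click-through rate}.
    """
    totals = defaultdict(lambda: [0, 0])  # angle -> [impressions, clicks]
    for angle, impressions, clicks in banners:
        totals[angle][0] += impressions
        totals[angle][1] += clicks
    return {
        angle: clicks / impressions
        for angle, (impressions, clicks) in totals.items()
        if impressions > 0
    }

# Hypothetical stats: (angle, impressions, clicks) per banner.
stats = [
    ("curiosity", 12000, 96),
    ("curiosity", 9000, 54),
    ("discount", 11000, 44),
    ("discount", 10000, 61),
]
for angle, ctr in sorted(ctr_by_angle(stats).items(), key=lambda kv: -kv[1]):
    print(f"{angle}: {ctr:.2%}")
```

Pooling by angle gives each angle far more data than any single banner has, so a trend shows up sooner than it would banner by banner.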
If your camp is already in profit, and you're just testing more banners and wanting to only keep the ones that are likely to remain profitable, then you'd need this:
http://stmforum.com/forum/showthread...Banners-Part-2
And if you need to test several banners to find out which one is the best, you can use the peakconversion calculator described here:
http://stmforum.com/forum/showthread...Banners-Part-1
Hope that helps! 
Amy