Sounds like you are on the right track and you caught your errors already. The biggest one is only testing 1 offer. So get at least 3 offers from 3 different aff networks. Preferably 3 from each if viable.
Also you are definitely cutting landers way way too early with 300 impressions on pop. If you are taking the time to rip and modify, give them a chance. Use: http://getdatadriven.com/ab-significance-test or similar. Maybe apply a 2-4x payout rule before cutting a LP.
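For a rough idea of what a significance check on two landers looks like, here is a minimal Python sketch using a two-proportion z-test. This is an assumption about the kind of test involved, not necessarily what the linked tool does, and the function name and sample numbers are made up for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rate
    between lander A and lander B. n_* are visits, conv_* are conversions."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no conversions at all: nothing to compare yet
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical numbers: same conversion rates, 10x more traffic in the second call.
_, p_small = two_proportion_z_test(3, 300, 1, 300)
_, p_large = two_proportion_z_test(30, 3000, 10, 3000)
print(f"300 visits each:  p = {p_small:.3f}")
print(f"3000 visits each: p = {p_large:.4f}")
```

With only 300 visits per lander, a 3-vs-1 conversion split is nowhere near significant, which is why cutting at that point is premature.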
Looking forward to hearing your progress and good luck on this go round.
If possible, wait for more conversions before making decisions. The reason is that "random" conversions can happen, and so can late conversions. In cases where 1 or 2 conversions can totally change the outcome, you need to wait for more data.
It's not possible to give you an exact number of conversions to wait for, but you might want to read this thread, where I described it in more detail: http://stmforum.com/forum/showthread...nuggets-inside
Any insights, anyone?
Sorry for the late response! Don't know how I missed this one!
First of all - when comparing split-test candidates using the peakconversion calculator, there are 2 steps. 1) Identify the candidate that's in the lead (i.e. the "current-best", which may not actually end up being the best when stat sig is reached), then 2) compare EACH of the other candidates against this "current-best" and cut anything with <10% probability of being best.
So don't just plug 4 offers/landers into the calculator and compare them all at the same time - it takes 2 steps.
Nowadays I cut at <5% or even 0% probability of being best for the inferior candidate (or 95%-100% for the superior candidate). This is because there are so many variables that can affect the performance of the test candidates, and also because landers and offers are important enough to warrant a high degree of accuracy - they can make or break a camp.
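A rough Python sketch of the two-step comparison. The calculator's internal method isn't stated in this thread, so this assumes a simple Bayesian approach (Beta posteriors sampled by Monte Carlo) and compares raw conversion rates, which only makes sense when the candidates share the same payout. The function names and stats are hypothetical.

```python
import random

def prob_beats(challenger, leader, draws=20000, seed=0):
    """Estimate P(challenger's true conversion rate > leader's).
    Each argument is (conversions, clicks); uses Beta(1 + conv, 1 + misses)."""
    rng = random.Random(seed)
    c_conv, c_n = challenger
    l_conv, l_n = leader
    wins = 0
    for _ in range(draws):
        c = rng.betavariate(1 + c_conv, 1 + c_n - c_conv)
        l = rng.betavariate(1 + l_conv, 1 + l_n - l_conv)
        wins += c > l
    return wins / draws

def candidates_to_cut(stats, threshold=0.05):
    """stats maps name -> (conversions, clicks). Returns (leader, names to cut)."""
    # Step 1: find the "current-best" by observed conversion rate.
    leader = max(stats, key=lambda name: stats[name][0] / stats[name][1])
    # Step 2: compare EACH other candidate against the leader, one pair at a time.
    cut = [name for name, data in stats.items()
           if name != leader and prob_beats(data, stats[leader]) < threshold]
    return leader, cut

# Hypothetical lander stats: (conversions, clicks)
landers = {"lp_a": (40, 1000), "lp_b": (10, 1000), "lp_c": (36, 1000)}
leader, cut = candidates_to_cut(landers)
print("current-best:", leader, "| cut:", cut)
```

Here lp_b gets cut, while lp_c stays in the test because it still has a realistic chance of overtaking the leader.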
Another thing to clarify: You don't need to wait until ad spend reaches a certain multiple of the payout, or until a certain number of impressions has been reached. The calculator already takes sample size into consideration. When there's not enough data, it will be reflected in the "probability of being best" itself. So just cut when you see >90% (or, better, above 95%) probability of being best for the leading candidate.
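To illustrate how sample size shows up in the probability itself, here is a small Python sketch. It again assumes a Beta-posterior Monte Carlo estimate, which may differ from what the calculator actually does, and the numbers are hypothetical: the same 2% vs. 1% conversion rates at three traffic levels.

```python
import random

def prob_a_beats_b(conv_a, n_a, conv_b, n_b, draws=20000, seed=1):
    """Monte Carlo estimate of P(rate_A > rate_B) under Beta(1 + conv, 1 + misses)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += a > b
    return wins / draws

# Same observed rates (2% vs. 1%), increasing amounts of data.
for clicks in (100, 1000, 10000):
    conv_a, conv_b = clicks * 2 // 100, clicks // 100
    p = prob_a_beats_b(conv_a, clicks, conv_b, clicks)
    print(f"{clicks:>6} clicks each: P(A is best) = {p:.3f}")
```

The probability stays well short of the cut threshold at 100 clicks and only climbs toward 1 as data accumulates, so a separate impressions or spend rule adds little on top of it.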
As for cutting placements - as you've observed, cutting your way to green is not very effective. If you're swamped with traffic volume and in loss, cutting placements aggressively from the start could be a good strategy. But in your case you're breaking even already, so I'd say to NOT focus on cutting placements for now - unless you see big placements draining your budget without converting.
It sounds like you weren't cutting landers correctly the last time you tested them. I would suggest to rip more landers from adplexity, fix them up, and RETEST all landers using the best offer you have to date. Cut landers properly this time to arrive at a winning lander. You already have a pretty good offer, so retesting will not cost you that much.
Then, use this winning lander to test ALL the offers you can find for this offer type and geo. Doing this will have the best chances of increasing your ROI by the largest amount.
You can even get granular while mass-testing offers by identifying the best offer for each major OS (android vs. ios) and/or for wifi vs. individual carriers. You could set up separate camps for this purpose, find the best offer for each traffic segment, then optimize the bid. I would only suggest doing this if you're getting a lot of volume for your geo (otherwise it wouldn't be worth it to maintain a bunch of really small camps that each make only a little profit).
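A minimal Python sketch of the per-segment breakdown, assuming you can export rows of (segment, offer, clicks, revenue) from your tracker. The row layout, names, and numbers are all hypothetical. It only finds the current leader per segment by EPC (earnings per click); each leader would still need the stat-sig comparison before you cut anything.

```python
from collections import defaultdict

def leader_per_segment(rows):
    """rows: iterable of (segment, offer, clicks, revenue).
    Returns {segment: (offer, epc)} for the highest-EPC offer in each segment."""
    totals = defaultdict(lambda: [0, 0.0])  # (segment, offer) -> [clicks, revenue]
    for segment, offer, clicks, revenue in rows:
        totals[(segment, offer)][0] += clicks
        totals[(segment, offer)][1] += revenue
    best = {}
    for (segment, offer), (clicks, revenue) in totals.items():
        if clicks == 0:
            continue  # no traffic yet, nothing to rank
        epc = revenue / clicks
        if segment not in best or epc > best[segment][1]:
            best[segment] = (offer, epc)
    return best

# Hypothetical tracker export
rows = [
    ("android", "offer_x", 1000, 40.0),
    ("android", "offer_y", 1000, 25.0),
    ("ios", "offer_x", 500, 10.0),
    ("ios", "offer_y", 500, 30.0),
]
for segment, (offer, epc) in leader_per_segment(rows).items():
    print(f"{segment}: {offer} (EPC ${epc:.3f})")
```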
Looking forward to seeing new and better stats! 
Amy
Thanks Amy, appreciated - yes will try and get some new stats up for everyone shortly!