So, this isn't going to be your regular Follow-Along.
Rather than a campaign, I'm going to be working on a different project - an attempt to fix, once and for all, one of the biggest Pains In Our Collective Ass in affiliate marketing.
This is original research, so I'm going to be posting my findings as I go, so that we can combine the awesome powers of the STM collective to kill this hulking great bugbear once and for all.
So What's The Problem?
Prosper 202 slows to a crawl when it has more than a million or so clicks recorded, and eventually crashes.
Sometimes this happens with even less clicks. Sometimes it happens with a lot more. Regardless, when it does happen it's a colossal pain in the butt.
This problem tends to kick in just as affiliates are starting to have their first real success, and it continues to plague us up to super-affiliate level. I've heard tales of people dumping their entire tracking database every day - or even every 6 hours - just to keep the machinery running.
It affects everything that's built on the Prosper codebase, and similar things also happen with CPVLab.
What Makes Me Think It's Fixable?
The database engine that Prosper 202 uses, MySQL, certainly isn't limited to only a couple of million records. Here's an example of MySQl forum code that delivers results from 500 million rows in 0.02 seconds.
In addition, I've performed SQL queries on a Prosper database that was experiencing extreme slowdown, using the command line rather than the Prosper interface, and found that it was still very responsive. That suggests that we're not hitting hard data limits, but rather something internal - that we can, with any luck, fix.
So What's The Plan?
Firstly, I'm going to create a new test server, and I'm going to either write or modify a program to generate a dataset of any size in Prosper. That will mean I can easily simulate the effects of having a few million clicks stored in Prosper without having to actually pay for all that traffic
TIP: once I've got that test program up and running, you'll be able to use it to check how any new server will handle a large Prosper database - very useful information to know in advance!
After that, I'm going to figure out what size of database slows Prosper down, but doesn't crash it completely, and what size of database just locks Prosper up. I'll then start debugging, using MySQL's Slow Queries Log, to determine what's causing the slowdown.
Once I know exactly where the problem lies, I'll either attempt to speed up a small subsection of the code - probably the Keyword Analysis View, initially - or start making database changes to speed up the system overall.
Of course, I will publish my results here as I go!
How Can We Help?
We've got some very big programming brains here at STM, of course - including the maintainer of Prosper! I'd definitely welcome any and all technical input in terms of things to try, things you've tried before, or things that I definitely shouldn't try because they'll melt my server.
If you're not a technical expert but you've run into the Prosper 202 slowdown problem before, it would also be very helpful to know how much data caused the problem, what spec of server you were running, and if upgrading your hardware helped at all!
Right, let's go. First update in a couple of hours.
Let's get this thing done.
Day 1 progress
Version 0.1 of the Prosper Data-Testing Tool is finished, and you can get it from here. It's a couple of Ruby scripts (using Ruby 1.9.3) that will generate a Prosper click database of any size you like as a SQL file, and likewise, a keyword database of any size you want too.
Feel free to test it out if you like! If you don't know how to use it, don't worry - when it's a bit more advanced I will package it up with a GUI or at least a slightly more usable command-line and write instructions for it.
Currently it's quite simplistic: it only handles direct-link data, it doesn't produce unique timestamps on the data (so all clicks appear to have come in at the same time), and it only links clicks to the first keyword in your database.
Nonetheless, it's already giving some quite interesting results.
Results So Far
I'm testing on a 1Gb VPS running on DigitalOcean, using a standard Apache/MySQL setup. Because it's DigitalOcean, it's using SSDs, which I suspect make a huge difference to complex SQL queries.
Prosper started to slow down around the 100,000 recorded clicks mark, taking 3 seconds to produce a keyword report. At 500,000 recorded clicks, it was up to 11-12 sec to produce a keyword report.
Assuming that scales in a linear fashion, I'd expect the keyword reports to become effectively unusuable at about 4,000,000 clicks, which sounds about right.
Interestingly, MySQL doesn't eat massive quantities of memory when it's producing a report at these data levels. That implies that some configuration changes in MySQL, increasing memory allowances, might be the first things to test.
My initial assessment of the code that generates the keyword reports ( /tracking202/ajax/sort_keywords.php in the standard Prosper codebase ) shows that it appears to run a number of queries in a loop depending on the number of active keywords, so that might produce a HUGE slowdown with more keywords. However, I've done some initial testing on that, and it seems the code is efficient enough that it's not a simple geometric slowdown with more keywords.
Next Step
The next thing to check is whether those times for reports go up massively if the data is associated with a wide variety of keywords. I'm going to modify my data generator to associate clicks with keywords randomly, and also to spread out click times over a wider timeframe.
I'm also going to put some debugging code into Prosper, either using good old Echo commands or possibly NewRelic, and see if I can get an idea of what's going on, when and why, under the hood.
Incidentally, one tool I've been finding very useful is the Github repository for Prosper 202. You can browse and search the code there.
Couple of questions, one for the techies and one for everyone, as I keep developing this:
TECHIES: Any recommendations on what sort of data I should try to attempt to bring Prosper to its knees? I'm really looking for anything that will suddenly stop the wait times being a linear progression, and start being geometric.
NON-TECHIES: How much data have you found Prosper takes before it starts to behave poorly or not at all when it's processing?
Onward!
For the hosted Tracking202 that I spun off the prosper202 codebase I'm using summary tables for the reporting. That's enabled me to generate most reports even with 10million+ click in under a second.
I still haven't got around to optimizing the keyword reports, I've modified some of the table structures(data types) and indexes to speed things up. There's still a lot more work to be done
Also please note that the prosper202 on github is 1.6.1 I'll push the latest version up and would love to actually get a community actively submitting pull requests with fixes and updates.
Check out https://github.com/avazu/Avazu_202 a Mongodb version of 202, I haven't scanned it to see if it's safe to use (no backdoors) but it looks very promising
Awesome! Thanks, that's extremely useful. I'll look into both using summary tables and altering the schema to speed things up.
Would be very good to get the latest version on Github! I don't know how many commits I'll make at this point, but I'm sure I'll be doing something at least.
Avazu: that sounds very interesting. I've been considering writing a MongoDB-based tracker, so finding one that exists already is... nice.
Caurmen, this is awesome you're working to tackle this beast of a task....I salute you. I regularly have to export my prosper databases to keep things running smoothly for reporting, and I know lots of other people on here that haven't spoken up that have to do the same....One thing that seemed to helped my speeds was switching from Apache to Litespeed, so that may be something you could look into testing. I think collectively we can all make an already amazing software even better, more refined, and much faster. In the end I believe it would greatly benefit all Prosper users, with the end result being more profit for all of us.
This is probably the only gripe I've got with Prosper. I'm having to clear my database every day or two and I've become too lazy to make database backups. Sometimes I will just export Excel files but its a real pain in the ass.
Having the ability to handle large amounts of data would be GREAT!
Another thing that I was thinking of was having a function to easily export all your data to a separate database and have the ability to view it from within Prosper.
I'm no programmer but I would think that's not too hard to do?
@deondup - that's a definite backup plan! Hopefully, though, we'll be able to solve the scaling problem so you don't even need to export data any more.
MongoDB and NoSQL is the way to go, if someone could add mobile capacities to Avazu 202 and commit on github that would be great!
How about just change to TokuDB ?
@hd2010 - Good suggestion! That's on the list for testing, along with testing a couple of the MySQL variants.
@offerkiller - you've got a very interesting project there! I'm going to take a look over the next couple of days.
I love where this thread is heading!
Righty! Time to try some fixes.
So far, on my list to try are:
Things have gone very well so far, I must say!
OPTIMIZE TABLES: 2x speed improvement!
OK, I wasn't expecting this one.
I'm using a fresh Prosper install which has just had a SQL document containing 500k rows read into it. Given the data's only just been added, I really wasn't expecting OPTIMIZE TABLES to have any effect. I only even tried it to make sure my MariaDB test (below) was fair.
I used my standard test: my 500k rows of data and a refresh of the Analyze Keywords screen. I repeated the test 3 times and averaged the results.
Result?
BEFORE: 11.2 sec
AFTER: 6.8 sec
CONCLUSION: Even if your Prosper has only been running for a day, OPTIMIZE TABLES has a pretty massive effect. It might even be enough to bring a Prosper install that has stopped running back from the brink.
Testing MariaDB: 3x speed improvement!
So, the first thing I was intending to try was replacing MySQL with the far more up-to-date drop-in replacement, MariaDB.
I wasn't expecting much from this test - maybe a 10%-20% speed improvement.
Installing MariaDB is not exactly trivial - given the results I've seen from this test, I'll write up a tutorial on how to do it later this week.
However, the speed improvement it gave was more than worth it.
Once again, I was using a 500k rows data set here, and timing a refresh of the Analyze Keywords screen. I repeated the test 3 times and averaged the results.
BEFORE: 6.8 sec.
AFTER: 2.5 sec.
That's a 172% improvement in speed from the change!
Not Too Shabby.
CONCLUSION: Run, do not walk, to install MariaDB. It has a very good reputation and I've not heard any horror stories, and it appears to have a major speed impact.
As I said, I'll write up instructions for changing to MariaDB as a tutorial, but for now, if you want to get started here's:
The above post is pure gold!
It's something that requires minimal effort and has such a big impact.
Will def update all my P202 setups to MariaDB for the moment.
Wow I love this thread alread. I wish I had enough DB knowledge to help out more.
Thanks a ton for this caurmen!
No problem! I was pretty startled at its impact myself.
Do let me know how the upgrade goes on your systems - I'll be very interested to see if the effects of MariaDB vary between servers, datasets, etc.
Have you tried http://www.postgresql.org/ ?
I run a site on a dedicated server which handles massive amounts of data each day, and seen a huge increase in speed, and overall reduction in load by swapping from mySql..
I'll look into testing PostgreSQL - that's a good idea.
The major problem is likely to be that from what I recall last time I split-tested it with MySQL, it's not exactly a drop-in solution. I'm trying to avoid making huge changes to the Prosper codebase if I can, so anything that requires a rewrite of the entire schema and DB layer is probably somewhere I don't want to go immediately.
Yeah agree.. its not the simple solution.
Testing Day 2
No such huge breakthroughs today, I'm afraid.
Today I've tested:
Keep us updated with the new tests!
awesome! Gonna ditch MySQL for MariaDB asap!
Question: are you running Apache? Over the past few months I have transitioned every single one of my servers to Nginx...and for the first time my pages feel like they really load instantaneously. My RAM usage isn't taking a beating either. And for those wondering, I've been using this script to automate installation of everything.
@bbrock - will do!
@gritz - yep, currently this server is running Apache. I run Nginx on other servers (and LiteSpeed on still others), but for this test I wanted to go for the most standard configuration.
For tracking, there are actually some solid arguments for running an Apache server. The implementation of mod_php has been tested, and it's actually somewhat faster than Nginx/PHP-FPM. Not a huge amount, but it is.
Additionally, I've found that a lot of the memory advantages of Nginx are offset on complex PHP scripts by the memory usage of PHP-FPM.
So my usual approach as of June 2013 is to use Nginx for anything that isn't using PHP - static files, particularly - and to use Apache on servers with plenty of RAM which are primarily going to run massive, hard-to-cache PHP scripts.
In this case, however, I'm pretty sure that the bottleneck isn't the web server, but the database or possibly the code - although you have just reminded me that I should probably test with a PHP accelerator in the mix!
Nginx is the dog's bollocks, definitely, for an awful lot of things. I really like it as a server in general. (The fact that the darn thing can also cache better than Varnish, by some reports, doesn't hurt either.)
Interesting data - thanks! Are you tracking with Prosper on those servers?
^^ Nevermind, problem solved.
For anyone having the same issue, make sure you have phpmyadmin installed. I somehow skipped that from the installation guide anyways.
Thanks
BTW, has anyone tried to run STM mobile tracker on MariaDB? As it is a fork of Prosper, I tend to think it should work as well, but I would like to check if someone has already gone through the experience.
Thanks
I can't recall if I have, but there shouldn't be any issues with doing that.