As daily revenues keep climbing, it becomes more and more important to keep your tracker and websites up 24/7 to avoid huge losses. It should be in every affiliate's best interest to have a server configuration that offers high availability and redundancy in case something goes terribly wrong!
Until now I've been running a setup with 3 virtual private servers (tracking, hosting, backup), with static files served from a CDN, important data backed up with rsync every night and important services monitored by Pingdom. It's a setup that has luckily worked flawlessly for me, but there is still the odd chance of a DDoS attack, a catastrophic failure of the RAID system and whatnot, which made me want to take my setup to a much more reliable level.
What kind of setups are other STM'ers running? How have you made sure that everything keeps working even if something unfortunate happens?
I used to run 6 servers (one tracker in every geo) with Pingdom RUM monitoring, and had 3 VAs on call 24/7 to restart anything that went down right away. Now that
Voluum saved me money, stress, and made everything more efficient.
You could use something like CloudFlare which offers some level of security against attacks.
Otherwise you could have autoscaling Amazon stuff, or load balancing which can be done through many services.
DigitalOcean is really good, but they had a few storage failures, which is why I am not using them anymore. Their droplets are really fast and their support is pretty good if you know what you want. However, you are not protected from DDoS and/or storage failure with them. They are perfect for development purposes.
There is no 100% secure way to protect against both of these and have a high-availability setup at the same time, or at least there is no way to do that with fewer than 4 zeros in the number you pay monthly. The best option, which is also the cheapest, is to:
-- Get a server in a really good data-center with 2 x RAID 10 (or higher) arrays with 4+ disks each (if they add a spare disk you will never have to worry about disk failure): 1 array of SATA drives, 1 of SSD drives. With this setup and a high number of read/write operations, the first failure of one or more disks is estimated at 9-11 months for the SATA drives and 12-14 months for the SSD drives (this has nothing to do with the disk lifetime listed on the manufacturer's info page). With this in mind you can schedule maintenance for each array when needed and switch from one to the other while you are rebuilding, which lets you run all of this with minimal downtime (unfortunately, zero downtime is not an option). This is for your data protection.
-- DDoS -- Any dedicated server can easily be DDoS protected with Juniper or CiscoGuard, and most data-centers will provide that free for up to 48 hours, or monthly for a fee (the fee depends on the data-center you use). I pay 130/mo for the CiscoGuard, which includes the setup and the maintenance. I am not Cisco certified, so I prefer to have someone who knows what he is doing set that up for me. This stops pretty much any known DDoS attack, as long as the attack is not 50%+ of the pipe going to your data-center. If you get close to 50% of their uplink speed they will null route your IPs, and this is valid for any data-center.
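To put rough numbers on the RAID 10 point above, here is a minimal Python sketch. It assumes the usual RAID 10 layout of two-way mirrors striped together; the function name and the 2 TB disk size are made up for illustration and do not come from any particular controller or provider.

```python
def raid10_summary(disk_count: int, disk_size_tb: float) -> dict:
    """Usable capacity and failure tolerance for RAID 10 built from two-way mirrors."""
    if disk_count < 4 or disk_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    mirror_pairs = disk_count // 2
    return {
        # Half of the raw space is spent on the mirror copies.
        "usable_tb": mirror_pairs * disk_size_tb,
        # Any single disk can fail without data loss.
        "guaranteed_failures_survived": 1,
        # More can fail, but only if each failure hits a different mirror pair.
        "best_case_failures_survived": mirror_pairs,
    }


if __name__ == "__main__":
    # Hypothetical array: 4 disks of 2 TB each.
    print(raid10_summary(disk_count=4, disk_size_tb=2.0))
```

The thing to take away is that a second failure inside the same mirror pair kills the array, which is why a hot spare and scheduled rebuilds matter so much.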
Using CloudFlare does help a bit, but it will not protect you completely: once the attacker gets your server IP, which is not that hard, they will launch the attack directly against the IP, and at that point CloudFlare is no longer protecting you.
Amazon and any similar cloud provider will give you a ton of resources during a DDoS attack to keep your server running. However, the bill at the end of the month will be a very unpleasant number.
You can also use CloudFlare for DNS only, as their TTL can be set to a really low number, for example 1 minute. This way you can switch from one server to another really fast by changing the domain's IP addresses. You will need 2+ identical servers and keep them in sync every hour. The file sync is the only tricky part. The database will be set up with replication to multiple servers. If the tracking script you are using has the option to use multiple servers and split the read/write queries, this will also help speed it up. This is not that easy to set up, but it is possible, and once set up it will make your life way easier.
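The low-TTL DNS switch described above boils down to "point the record at the first server that still answers". Here is a minimal Python sketch of just that decision. The function names are hypothetical, the IPs are documentation-range placeholders, and the health check is passed in as a plain callable. The actual record update is left out on purpose, since it depends on your DNS provider's API.

```python
from typing import Callable, Optional, Sequence


def pick_active_ip(server_ips: Sequence[str],
                   is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy IP in priority order, or None if all are down."""
    for ip in server_ips:
        if is_healthy(ip):
            return ip
    return None


def plan_dns_change(current_ip: str,
                    server_ips: Sequence[str],
                    is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the IP the A record should switch to, or None if nothing should change."""
    target = pick_active_ip(server_ips, is_healthy)
    if target is None or target == current_ip:
        return None
    return target


if __name__ == "__main__":
    servers = ["203.0.113.10", "203.0.113.20"]  # placeholder IPs, primary first
    down = {"203.0.113.10"}                     # pretend the primary just died
    print(plan_dns_change("203.0.113.10", servers, lambda ip: ip not in down))
```

Because the list is in priority order, this also switches back to the primary once it is healthy again. If you would rather fail back by hand, keep the current IP whenever it is still healthy.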
I have set up a few of those dedicated servers with lots of drives in them, and the speed is way better compared to other services such as Amazon, DigitalOcean, Linode or any virtual server. As for the MySQL replication, anyone who knows a bit about databases should be able to set that up for you.
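For anyone wondering what splitting the read/write queries looks like once replication is running, here is a rough Python sketch. The class and host names are made up, and it only decides which server a query should go to; it does not open any database connections. It assumes one writable master and one or more read-only replicas.

```python
import itertools
from typing import List


class QueryRouter:
    """Send plain SELECTs to replicas in turn, everything else to the master."""

    def __init__(self, master: str, replicas: List[str]):
        self.master = master
        # cycle() hands out replicas one after another, forever.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        # Only plain reads go to replicas; writes and anything unknown stay on the master.
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.master


if __name__ == "__main__":
    router = QueryRouter("master-db", ["replica-1", "replica-2"])  # hypothetical hosts
    print(router.route("SELECT * FROM clicks WHERE id = 1"))
    print(router.route("INSERT INTO clicks (id) VALUES (2)"))
```

Keep in mind that replicas can lag behind the master, so anything that has to read back a row it just wrote should still go to the master.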
I hope this is not too boring, as I know all of you guys have a lot of experience testing servers here and there. I have not played with servers in too many data-centers, but from experience I can tell you the above is absolutely true. If anyone tells you they have a 100% DDoS-protected environment, this is BS. Even CloudFlare got DDoSed last year, and 300 Gbps is hard to stop. Also, for 100% guaranteed storage, the best way to know your data is safe is to have your own storage and know how it was set up, or to back up as often as possible.
If you find a cloud host that offers distributed storage, this guarantees 99% data protection, but in most cases it is pretty expensive. Most of you guys need such storage, so if you get together and have one of those set up, this will be the cheapest option. You can split it into separate volumes which will be completely independent, and no one will have access to a colleague's files. Just an idea.
Very true about the DO storage failures, I heard some people lost their data completely. That was a couple of years ago though, and so far I have had 0 problems with them, and like you said the droplets have been insanely fast and their support's been top notch. They actually introduced automated droplet backups, but I do agree that their decision to use RAID5 is probably not the best when looking for reliability, and they are more focused on attracting developers than on providing a very redundant production environment.
As for getting a dedicated server, do you have any providers you would recommend specifically? Having multiple RAID arrays is good, but obviously those wouldn't protect against other failure points, like PSU failures and whatnot. I have been talking with Rackspace recently, and they seem to have a "real" cloud where a very reliable setup could be possible. Think 2 servers in different cloud centers, one in the EU and one in the US, replicating 1:1 as master-master, plus a load balancer checking up on both of them to make sure they are healthy. In case one of them fails, it would just switch to the working one and fire off alerts.
Good points about DDoS. There is pretty much no way to be completely protected from it, but we can do some things to mitigate the risk, like using firewalls such as Juniper / CiscoGuard. There can still be application-level DDoS attacks and whatnot that are much harder or impossible for a network firewall to detect.
PSU failure is the one thing I have never experienced. The servers I use have 2 enterprise-grade PSUs; if one fails the other one kicks in, and they will replace the faulty part without taking down the server.
Not cloud. Get your own hardware for all the data, and run one small server on Linode with HAProxy on it, for example. This will load balance and will prevent downtime if one of the dedicated servers goes down for some reason. In any case, keeping identical content on 2+ servers is a pretty hard task if the size of the content is over a few GB.
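To show what the small load balancer box is doing for you, here is a toy Python sketch of round-robin balancing that skips dead backends. This is not HAProxy configuration and not how HAProxy is implemented, just the basic idea. The class name, backend names and the health check callable are all made up.

```python
from typing import Callable, List, Optional


class RoundRobinBalancer:
    """Hand out backends in turn, skipping any that fail the health check."""

    def __init__(self, backends: List[str], is_healthy: Callable[[str], bool]):
        if not backends:
            raise ValueError("need at least one backend")
        self._backends = list(backends)
        self._is_healthy = is_healthy
        self._next = 0

    def pick(self) -> Optional[str]:
        # Try each backend at most once per call, starting where we left off.
        for _ in range(len(self._backends)):
            backend = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if self._is_healthy(backend):
                return backend
        return None  # every backend is down


if __name__ == "__main__":
    down = set()
    lb = RoundRobinBalancer(["dedicated-1", "dedicated-2"], lambda b: b not in down)
    print(lb.pick(), lb.pick())  # traffic alternates between the two servers
    down.add("dedicated-1")      # simulate one dedicated server going down
    print(lb.pick(), lb.pick())  # everything now goes to the surviving server
```

Note that the balancer itself is now a single point of failure, so it is worth monitoring that small server as closely as the dedicated ones.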
As for application-level DDoS, this is easier to stop and can also be pinpointed pretty fast. One of the main problems with DDoS is Apache-based attacks, as a hardware firewall cannot prevent these. However if you use nginx or Varnish in front of Apache
.
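On pinpointing an application-level attack: a quick first step is usually to count requests per client IP in the access log. Here is a minimal Python sketch of that. It assumes the client IP is the first whitespace-separated field on each line, as in the default Apache and nginx log formats. The function name, the threshold and the sample lines are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_talkers(log_lines: Iterable[str], threshold: int) -> List[Tuple[str, int]]:
    """Count requests per client IP; return IPs at or above the threshold, busiest first."""
    counts: Counter = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return [(ip, n) for ip, n in counts.most_common() if n >= threshold]


if __name__ == "__main__":
    sample = [
        '198.51.100.7 - - [01/Jan/2015:00:00:01 +0000] "GET / HTTP/1.1" 200 512',
        '198.51.100.7 - - [01/Jan/2015:00:00:01 +0000] "GET / HTTP/1.1" 200 512',
        '192.0.2.15 - - [01/Jan/2015:00:00:02 +0000] "GET /lp HTTP/1.1" 200 900',
        '198.51.100.7 - - [01/Jan/2015:00:00:02 +0000] "GET / HTTP/1.1" 200 512',
    ]
    print(top_talkers(sample, threshold=3))
```

On a real server you would feed it the last few minutes of the log and pick a threshold well above what a normal visitor generates, then block or rate-limit whatever shows up.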
In any case, if you ever need a really good managed dedicated server, let me know.