Optimizing Pro V5...

Postby mickyj » Sat Jun 21, 2008 12:48 am

As many of you know, we released SyncBackSE V5 and SyncBackPro V5 betas recently. Here are some quick answers to obvious questions:

- Do I have to pay to upgrade from SyncBackSE V4 (or V3) to SyncBackSE V5? No. It's a free upgrade regardless of which version of SyncBackSE you have.

- Do I have to pay to upgrade from SyncBackSE (V5, V4, or V3) to SyncBackPro V5? Yes, US$19.95 per upgrade, regardless of which version of SyncBackSE you have.

- Do I need to upgrade from SE to Pro? No, it's your choice. If you don't need the extra features in Pro then stick with SE.

- If I have not purchased SyncBackSE, how much is it to buy SyncBackPro? US$49.95

- What's the difference between SyncBackSE V5 and SyncBackPro V5? Pro has all the features in SE, plus extras, e.g. CD/DVD burning, unlimited file backups, scripting, etc.

- When are they going to be released? Once we're confident we've squashed the bugs. You beta testing them will help us get to that point faster :wink:

One of the big features in SyncBackPro is that it can now back up an unlimited number of files. In SyncBackSE all the file information (not the file contents) is stored in memory. The benefit is that the information is very quick to access; the drawback is that the number of files you can back up is limited by how much free RAM you have. SyncBackPro still stores the information in memory, but once it reaches a certain number of files, or free RAM is getting low, it dumps it all into a database on disk (automatically, with no user intervention required). The benefit is that you can now back up an unlimited number of files; the drawback is that accessing information on disk is a lot slower than accessing it in memory.
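The spill-to-disk idea can be sketched roughly as follows. This is only an illustration, not how SyncBackPro is implemented: the class name `SpillingFileIndex`, the `max_in_memory` threshold, and the use of SQLite are all assumptions, and the sketch spills on file count only (it does not watch free RAM).

```python
import sqlite3


class SpillingFileIndex:
    """Hypothetical sketch: keep file metadata in a dict, then move it to SQLite."""

    def __init__(self, max_in_memory=100_000, db_path=":memory:"):
        # max_in_memory and db_path are assumed parameters for this sketch.
        # A real tool would point db_path at a file on disk.
        self.max_in_memory = max_in_memory
        self.db_path = db_path
        self.memory = {}  # path -> (size, mtime)
        self.db = None    # only created once we spill

    def add(self, path, size, mtime):
        if self.db is None:
            self.memory[path] = (size, mtime)
            if len(self.memory) > self.max_in_memory:
                self._spill()
        else:
            self.db.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?)", (path, size, mtime)
            )

    def _spill(self):
        # Dump everything held in memory into the database in one go.
        self.db = sqlite3.connect(self.db_path)
        self.db.execute(
            "CREATE TABLE files (path TEXT PRIMARY KEY, size INTEGER, mtime REAL)"
        )
        self.db.executemany(
            "INSERT INTO files VALUES (?, ?, ?)",
            ((p, s, m) for p, (s, m) in self.memory.items()),
        )
        self.memory.clear()

    def get(self, path):
        if self.db is None:
            return self.memory.get(path)
        # Returns None if the path is unknown, else a (size, mtime) tuple.
        return self.db.execute(
            "SELECT size, mtime FROM files WHERE path = ?", (path,)
        ).fetchone()

    def __len__(self):
        if self.db is None:
            return len(self.memory)
        return self.db.execute("SELECT COUNT(*) FROM files").fetchone()[0]
```

Callers use `add` and `get` the same way before and after the spill, which is the "no user intervention" part; the price is that every lookup after the spill is a database query.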

We can test the performance impact of using a database by using another feature introduced in Pro: scripting. We've written a little script that pretends to be a disk and pretends it has hundreds of thousands of files on it. Scripting allows for a high level of configuration and control and is something I'll need to cover in another post. Anyway, back to testing. Using the script we discovered that when you sort hundreds of thousands of items in the Differences window, and the database is being used, then it can be really, really slow. This is not surprising because, for example, with 200,000 files, the sorting algorithm (in the tree component on the Differences window) requests details on the files around 3.8 million times. That's not good. So to alleviate this we've done two things: provide more feedback (so users know what is going on), and speed up the sorting. To speed up the sorting we decided to let the database do the sort instead of the tree. With 200,000 files this brought the time down from around 2 minutes to 23 seconds. In future we'll have to look at changing the Differences window so it resembles Explorer, i.e. has a tree structure. That's not a small change as it changes the way many things work.
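To see where a figure like 3.8 million comes from: a comparison sort needs on the order of n * log2(n) comparisons, and for 200,000 items that is about 3.5 million, the same ballpark as the number above. The sketch below counts comparator calls for an in-memory sort and shows the alternative of letting a database do the ordering. SQLite, the table layout, and the function names are assumptions for illustration; the post does not say which database or schema SyncBackPro uses.

```python
import functools
import math
import random
import sqlite3


def count_comparisons(names):
    """Sort names with a counting comparator and return how often it was called."""
    calls = 0

    def compare(a, b):
        nonlocal calls
        calls += 1
        return (a > b) - (a < b)

    sorted(names, key=functools.cmp_to_key(compare))
    return calls


def estimate_comparisons(n):
    """Rough n * log2(n) estimate for a comparison sort."""
    return n * math.log2(n)


def sort_in_database(names):
    """Load the names into a table and let the database return them in order."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE files (name TEXT)")
    db.executemany("INSERT INTO files VALUES (?)", ((n,) for n in names))
    return [row[0] for row in db.execute("SELECT name FROM files ORDER BY name")]


random.seed(0)
names = [f"file_{random.randrange(10**9):09d}.dat" for _ in range(20_000)]
print("measured comparisons:", count_comparisons(names))
print("n*log2(n) for this list:", round(estimate_comparisons(len(names))))
print("n*log2(n) for 200,000 items:", round(estimate_comparisons(200_000)))
```

If every comparison means fetching two records from disk, the sort pays for millions of lookups; with `ORDER BY` the application reads each row once, already in order.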

When profiling (seeing what took the longest) we discovered something very interesting. Filtering has a huge impact on performance. It adds roughly 25% to the run-time of a profile (and even more if regular expressions are used). So if you want a profile to run a lot faster, simply stop using filtering. We've added an option to the next beta release so that you can switch off filtering. Can we speed up filtering? I don't think so. We've already optimized the filtering. The problem is simply that the more filters you have, the more strings you must compare.
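A small sketch of why the cost scales with the number of filters: every file that is not excluded has to be checked against every pattern. Glob-style patterns via Python's `fnmatch` are an assumption here, as are the function names; the post does not describe SyncBack's actual filter syntax or implementation.

```python
import fnmatch


def is_excluded(path, patterns):
    """Return (excluded, checks): whether any pattern matched, and how many were tried."""
    checks = 0
    for pattern in patterns:
        checks += 1
        if fnmatch.fnmatchcase(path, pattern):
            return True, checks
    return False, checks


def total_checks(paths, patterns):
    """Total number of pattern comparisons needed to filter all paths."""
    return sum(is_excluded(p, patterns)[1] for p in paths)


paths = ["a.txt", "b.doc"]
print(total_checks(paths, ["*.tmp", "*.bak", "~*"]))  # 6: nothing matches, so 2 files x 3 patterns
print(total_checks(paths, []))                        # 0: with filtering off there is nothing to compare
```

Files that match an early pattern get out cheaply, but in a typical backup most files are kept, so most files pay for the full list.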

Anyway, back to testing and optimizing... :color:
mickyj
2BrightSparks Staff
Posts: 7845
Joined: Mon Jan 05, 2004 6:51 pm
Location: In front of computer

Postby ianq » Sat Jun 21, 2008 5:44 am

One of my disks (I just checked) has 349,120 actual files on it - so I'll try and be a useful beta tester!

But first, I hope you don't mind me asking some techo developer questions, after all this is the developer blog! :-)

a) Why is it a problem storing them in memory? Say 250K files, and say the names average 250 characters (they should be less), and you store another 30 bytes of attributes. That's still only 100 MB, and assuming the output directory has the same files in it, that's only another 100 MB.
200 MB on a modern PC (which is what anybody with that many files will be running) isn't that bad!

b) Is a lot of this work for the Differences screen? I, and I assume others, never display it anyway, so I hope it isn't doing the work if we aren't displaying the screen!
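The back-of-envelope estimate in (a) can be written out as below. The function name and the `bytes_per_char` parameter are made up for this sketch, and the figures cover raw name and attribute bytes only, ignoring any per-record overhead of whatever data structure holds them.

```python
def estimate_bytes(file_count, avg_name_chars, attr_bytes, bytes_per_char=1):
    """Raw bytes for file names plus a fixed block of attributes per file."""
    return file_count * (avg_name_chars * bytes_per_char + attr_bytes)


MB = 1_000_000

one_side = estimate_bytes(250_000, 250, 30)
print(one_side / MB)      # 70.0  (under the rounded-up 100 MB in the post)
print(2 * one_side / MB)  # 140.0 (source and destination lists together)

# Assumption: 2 bytes per character, as with UTF-16 file names.
wide = estimate_bytes(250_000, 250, 30, bytes_per_char=2)
print(2 * wide / MB)      # 265.0
```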


Ian
ps this is the post that it wouldn't let me post, so I'm going to do it in two bits to see what happens...
ianq
Advanced
Posts: 42
Joined: Fri Jun 13, 2008 1:13 am

Postby ianq » Sat Jun 21, 2008 5:45 am

c) Why do you sort the whole file list? I would have thought you'd only be sorting within each directory (excluding whatever you're doing for the Differences screen).

d) On the filtering (which I've done a lot of work on in the past!) I think you're sort of right, and sort of wrong.

Yes, filtering fundamentally adds CPU cycles. Yes, once you've done the basic optimizing it gets very hard, and quite tricky, to get significant improvements (many papers have been written on it).

No: depending on where you do your filtering, it might not increase the running time of the program at all. Any backup program is fundamentally I/O bound (leaving compression and encryption aside for the moment), so if you did the filtering while you were waiting for I/O to happen, i.e. while you were copying the previous file, it shouldn't increase the overall running time.

(...And I suspect most people don't need filtering anyway..)
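The overlap suggested in (d) could look something like this two-thread pipeline: one thread filters upcoming paths while another performs the copies. It is only a sketch of the idea, not anything SyncBack is known to do; `run_pipeline`, the glob patterns, and the `copy_file` callback are all hypothetical.

```python
import fnmatch
import queue
import threading


def run_pipeline(paths, exclude_patterns, copy_file):
    """Filter paths on the calling thread while a worker thread copies accepted files."""
    work = queue.Queue()  # unbounded, so the filter never waits on the copier
    done = object()       # sentinel that tells the copier to stop
    copied = []

    def copier():
        while True:
            item = work.get()
            if item is done:
                break
            copy_file(item)  # the slow, I/O-bound step
            copied.append(item)

    worker = threading.Thread(target=copier)
    worker.start()

    # CPU-bound filtering runs here while the worker is busy with earlier files.
    for path in paths:
        if not any(fnmatch.fnmatchcase(path, pat) for pat in exclude_patterns):
            work.put(path)

    work.put(done)
    worker.join()
    return copied
```

For example, `run_pipeline(["a.txt", "b.tmp", "c.txt"], ["*.tmp"], my_copy)` would call the caller-supplied `my_copy` for `a.txt` and `c.txt` only. Whether the filtering time really disappears depends on the copies being slow enough to hide it.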

Ian
ps I did the same text as two posts, and it worked.. Not sure why..

Postby ianq » Sat Jun 21, 2008 5:50 am

And I just ran a test: I stored a tree with every file name (on my disk with 350K files), including the created date/time and the directory name (only once for each directory, as it was a tree). Total space was an insignificant 25 MB.
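A minimal sketch of why a tree is compact: each directory name is stored once instead of being repeated in every full path. This is not the code used for the test above; the nested-dict layout and function names are assumptions, and timestamps are left out to keep it short.

```python
def build_tree(paths):
    """Nested dicts: directories map to dicts, files map to None."""
    root = {}
    for path in paths:
        parts = path.split("/")
        node = root
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})  # each directory name stored once
        node[parts[-1]] = None
    return root


def chars_in_tree(node):
    """Total characters of all names stored in the tree."""
    total = 0
    for name, child in node.items():
        total += len(name)
        if child is not None:
            total += chars_in_tree(child)
    return total


def chars_in_flat_list(paths):
    """Total characters when every full path is stored separately."""
    return sum(len(p) for p in paths)


paths = [
    "docs/reports/2008/a.txt",
    "docs/reports/2008/b.txt",
    "docs/reports/2008/c.txt",
]
print(chars_in_flat_list(paths))          # 69
print(chars_in_tree(build_tree(paths)))   # 30
```

The saving grows with directory depth and with the number of files per directory.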

:-)

Ian

Postby mickyj » Mon Jun 23, 2008 6:44 am

Yes, I'm sure your list of strings didn't take up much memory. One thing I learnt a very long time ago was to never get into discussions about development with other developers. All developers think their code is perfect and everyone else's is garbage, they're right and everyone else is wrong, everything is easy until you actually have to do it, etc. I'm no different :lol:

Postby ianq » Mon Jun 23, 2008 8:42 am

mickyj wrote: Yes, I'm sure your list of strings didn't take up much memory. One thing I learnt a very long time ago was to never get into discussions about development with other developers. All developers think their code is perfect and everyone else's is garbage, they're right and everyone else is wrong


Well, no. I wasn't making a comment about your code or its efficiency, or about how good mine was. I was making the point that, ignoring code for a moment, I couldn't see why memory would be a constraint. If somebody has hundreds of thousands of files, they won't be running on a 486 with 32 MB, and the file list isn't going to be that big (even if it is Unicode and includes fast hash values), i.e. I don't see why RAM is a limiting factor.

On a general point, I have the opposite view to you on discussing things with other developers (else I wouldn't bother posting on this forum), as that's one of the ways I've improved over the years. It's also how I've trained the many programmers who have worked for me.

Each to his own. I won't try to discuss design, algorithms, or implementation in this developer blog group.