A beta version of WebCopy 1.4 has been released, complete with a fundamental change to how rules are run, various performance improvements, UI tweaks and miscellaneous bug fixes.
In previous versions of WebCopy, rule processing would stop as soon as the first rule was matched. This made it impossible to do standard tasks such as excluding all HTML pages from being downloaded (while still scanning them) and only downloading image resources.
Now rule processing will continue and the last match is the final result. This allows for much better control over processing. We've added a new Stop Processing flag too, so that you can halt the processing when a desirable match is found.
All projects created with older versions of WebCopy will automatically have the Stop Processing flag applied to existing rules so that it behaves in a backwards compatible manner.
As a result of this change, the hacked-in Reverse and Do not allow children to inherit this rule flags are deprecated and will be removed in the next version of the software.
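To make the difference between the old and new behaviour concrete, here is a minimal sketch of last-match-wins evaluation with a stop flag. It is not WebCopy's actual implementation: the `Rule` class, the `evaluate` and `migrate_legacy` functions, and the `"exclude"`/`"download"` action labels are all hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    pattern: str                   # regular expression tested against the URI
    action: str                    # hypothetical label, e.g. "exclude" or "download"
    stop_processing: bool = False  # models the new Stop Processing flag


def evaluate(rules: List[Rule], uri: str) -> Optional[str]:
    """Return the action of the last matching rule, or None if nothing matches."""
    result = None
    for rule in rules:
        if re.search(rule.pattern, uri):
            result = rule.action   # a later match overrides an earlier one
            if rule.stop_processing:
                break              # halt here: this match is final
    return result


def migrate_legacy(rules: List[Rule]) -> List[Rule]:
    """Set the stop flag on every rule, which reproduces first-match-wins."""
    return [replace(rule, stop_processing=True) for rule in rules]


# Exclude everything, then let a later rule re-allow images.
rules = [
    Rule(r".*", "exclude"),
    Rule(r"\.(png|jpe?g|gif)$", "download"),
]

html_result = evaluate(rules, "https://example.com/index.html")
image_result = evaluate(rules, "https://example.com/logo.png")
legacy_image_result = evaluate(migrate_legacy(rules), "https://example.com/logo.png")
```

With the new behaviour the image URI ends up as `"download"` because the second rule overrides the first. Once every rule carries the stop flag, as happens for migrated projects, the first rule wins and the same URI is `"exclude"`, which is why the old model could not express this kind of setup.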
Improved Quick Scan
When I was looking at metrics, I was shocked to see a huge number of calls to the documentation for the Quick Scan dialog. On the one hand, it was interesting to note that hyperlinks to more information were being used, but on the other hand Quick Scan was a hack I put in and the dialog was almost completely useless. Oh, and the documentation for it wasn't very helpful anyway.
We've made some improvements to the dialog so that it is now hopefully useful. The main part of the dialog is now taken up with a diagram showing the results of the quick scan. This dialog updates in real time as you change options so you can get a feel for how to configure the crawl.
You can also include or exclude domains and pages via a context menu - this will set up additional hosts or rules as appropriate.
There's still more work to be done - the diagram control doesn't support keyboard users at all and really needs to be able to zoom, so we'll continue to improve this over future updates.
The strange way the list-based editors for Rules and Forms worked has long annoyed me - it felt as though they went out of their way to make it difficult to add or edit items. These have now been rewritten to actually make sense, although visually they look the same as they did previously.
You can also now reorder rules in their list by dragging and dropping.
A new tool for quickly checking rules has been added - this can be useful if you want to test what rules will match a given URI.
You can activate this tool from the new button on the main window, although I expect this will be removed in a future update when we try to declutter the UI (it's also available from the window's menu).
Quite a few changes have been made to improve memory usage to avoid "Out of memory" crashes which can occur just about anywhere.
Previously WebCopy would load all data into these lists at once. Apart from the slow performance of filling lists with tens of thousands of items, it doesn't help with memory usage. Now, the lists are "virtual", meaning they actually only contain enough items to fill what's currently visible and the rest are fetched when required. You can still sort the lists by any column or search the lists, it's just a lot more efficient than it was.
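As a rough illustration of the "virtual" idea, the sketch below only asks the underlying data source for the rows that are currently on screen. It is an assumption-laden toy rather than WebCopy's code: `VirtualList`, `visible_rows`, `fetch_rows` and the example URLs are all made up for this example.

```python
from typing import Callable, List, Sequence, Tuple


class VirtualList:
    """Holds no rows itself; it requests only the visible window on demand."""

    def __init__(
        self,
        total_count: int,
        fetch: Callable[[int, int], Sequence[str]],
        page_size: int = 20,
    ) -> None:
        self.total_count = total_count  # number of rows in the backing store
        self.fetch = fetch              # callback returning rows in [start, end)
        self.page_size = page_size      # how many rows fit on screen

    def visible_rows(self, first_index: int) -> List[str]:
        # Clamp the window so it never runs past either end of the data.
        start = max(0, min(first_index, self.total_count))
        end = min(start + self.page_size, self.total_count)
        return list(self.fetch(start, end))


# Hypothetical backing store that records which ranges were requested.
requests: List[Tuple[int, int]] = []


def fetch_rows(start: int, end: int) -> List[str]:
    requests.append((start, end))
    return [f"https://example.com/page/{i}" for i in range(start, end)]


view = VirtualList(total_count=50_000, fetch=fetch_rows, page_size=3)
top_rows = view.visible_rows(0)        # user is at the top of the list
last_rows = view.visible_rows(49_999)  # user has scrolled to the very end
```

Even though the list reports 50,000 rows, only four are ever materialised here. In a design like this, sorting and searching would be handled by the backing store rather than by the list control.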
We've also begun work on reducing the memory requirements of site maps, and while we've made some progress (viewing a website diagram should be less likely to crash), major work still needs to be done - the site map you see in the application can have scant resemblance to its internal structure, and this will take more time to resolve.
Quite a number of additional minor bugs have been fixed; see the release notes for more information.
We've continued to iterate on the documentation in an effort to improve it.
We hope this update is useful. Of course, we'll continue to improve WebCopy; there's still lots more to be done!
- 2018-04-15 - First published
- 2020-11-23 - Updated formatting