A beta version of WebCopy 1.4 has been released, complete with a fundamental change to how rules are run, various performance improvements, UI tweaks and miscellaneous bug fixes.

Rule Changes

In previous versions of WebCopy, rule processing would stop as soon as the first rule was matched. This made it impossible to do common tasks such as excluding all HTML pages from being downloaded (while still scanning them) and downloading only image resources.

Now rule processing continues through the entire list, and the last matching rule determines the final result. This allows much better control over processing. We've also added a new Stop Processing flag, so that you can halt processing when a desirable match is found.

All projects created with older versions of WebCopy will automatically have the Stop Processing flag applied to existing rules so that they behave in a backwards-compatible manner.
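The difference between the old first-match behaviour and the new last-match behaviour can be illustrated with a small Python sketch. The `Rule` and `Action` types, the `evaluate` function and the use of regular expressions are hypothetical simplifications for illustration only; they are not WebCopy's actual rule engine or project format. In particular, "exclude" here simply means "not downloaded" and does not model the separate scan/download distinction WebCopy supports.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import List


class Action(Enum):
    INCLUDE = "include"
    EXCLUDE = "exclude"


@dataclass
class Rule:
    # Hypothetical, simplified rule: not WebCopy's real rule model
    pattern: str                   # regular expression tested against the URI
    action: Action                 # what happens when the pattern matches
    stop_processing: bool = False  # halt evaluation once this rule matches


def evaluate(uri: str, rules: List[Rule], default: Action = Action.INCLUDE) -> Action:
    """Run every rule in order; the last matching rule decides the outcome."""
    result = default
    for rule in rules:
        if re.search(rule.pattern, uri):
            result = rule.action      # a later match overrides any earlier one
            if rule.stop_processing:
                break                 # Stop Processing: ignore the remaining rules
    return result


# Exclude everything, then re-include images: only possible with last-match-wins.
rules = [
    Rule(r".*", Action.EXCLUDE),
    Rule(r"\.(png|jpe?g|gif)$", Action.INCLUDE),
]
print(evaluate("https://example.com/logo.png", rules))    # Action.INCLUDE
print(evaluate("https://example.com/about.html", rules))  # Action.EXCLUDE

# Setting Stop Processing on every rule reproduces the old first-match behaviour.
legacy = [Rule(r.pattern, r.action, stop_processing=True) for r in rules]
print(evaluate("https://example.com/logo.png", legacy))   # Action.EXCLUDE
```

Applying the Stop Processing flag to every rule, as is done for older projects, means evaluation always ends at the first match, which is exactly the previous behaviour.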

As a result of this change, the hacked-in Reverse and Do not allow children to inherit this rule flags are deprecated and will be removed in the next version of the software.

Improved Quick Scan

When I was looking at metrics, I was shocked to see a huge number of requests for the Quick Scan dialog's documentation. On the one hand, it was interesting to note that hyperlinks to more information were being used, but on the other hand, Quick Scan was a hack I put in and the dialog was almost completely useless. Oh, and the documentation for it wasn't very helpful anyway.

The original Quick Scan dialog

We've made some improvements to the dialog so that it is now hopefully useful. The main part of the dialog is now taken up with a diagram showing the results of the quick scan. This dialog updates in real time as you change options so you can get a feel for how to configure the crawl.

You can also include or exclude domains and pages via a context menu - this will set up additional hosts or rules as appropriate.

The new and improved Quick Scan dialog

There's still more work to be done: the diagram control doesn't support keyboard users at all and really needs to be able to zoom, so we'll continue to improve it in future updates.

The diagram updates with a real-time preview of what will be downloaded

Tweaked Editors

The strange way the list-based editors for Rules and Forms worked has long been an annoyance to me; it felt as though they went out of their way to make adding or editing items difficult. These have now been rewritten to actually make sense, although visually they look the same as they did previously.

You can also now reorder rules in their list by dragging and dropping.

Rule Checker

A new tool for quickly checking rules has been added. This can be useful if you want to test which rules will match a given URI.

The new Rule Checker dialog in action

You can activate this tool from the new button on the main window, although I expect this button will be removed in a future update when we try to declutter the UI (the tool is also available from the window's menu).
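The core of a rule checker, reporting every rule that matches a URI and which one decides the result under last-match-wins with Stop Processing, might look like the following Python sketch. The `Rule` type, `check_rules` and `report` are hypothetical names invented for this illustration; they do not reflect how WebCopy's Rule Checker is actually implemented.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rule:
    # Hypothetical, simplified rule for illustration only
    pattern: str                   # regular expression tested against the URI
    action: str                    # e.g. "include" or "exclude"
    stop_processing: bool = False  # halt evaluation once this rule matches


def check_rules(uri: str, rules: List[Rule]) -> Tuple[List[int], Optional[int]]:
    """Return the indices of every evaluated rule that matches, plus the deciding rule's index."""
    matched: List[int] = []
    for index, rule in enumerate(rules):
        if re.search(rule.pattern, uri):
            matched.append(index)
            if rule.stop_processing:
                break              # later rules are never evaluated
    deciding = matched[-1] if matched else None  # last match wins
    return matched, deciding


def report(uri: str, rules: List[Rule]) -> None:
    """Print each matching rule, marking the one that determines the result with '*'."""
    matched, deciding = check_rules(uri, rules)
    print(uri)
    for index in matched:
        marker = "*" if index == deciding else " "
        print(f"  {marker} rule {index}: {rules[index].pattern} -> {rules[index].action}")
    if deciding is None:
        print("  no rules matched; the default applies")


rules = [
    Rule(r".*", "exclude"),
    Rule(r"\.png$", "include"),
    Rule(r"/private/", "exclude", stop_processing=True),
    Rule(r"logo", "include"),
]
# Rule 2 decides here: its Stop Processing flag prevents rule 3 from being evaluated.
report("https://example.com/private/logo.png", rules)
```

Listing all matches rather than just the final answer makes it easier to see why an earlier rule was overridden.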

Performance

Quite a few changes have been made to reduce memory usage and avoid "Out of memory" crashes, which can occur just about anywhere.

URL Lists

Previously, WebCopy loaded all data into these lists at once. Apart from the slow performance of filling lists with tens of thousands of items, this didn't help with memory usage. Now the lists are "virtual", meaning they only contain enough items to fill what's currently visible, and the rest are fetched when required. You can still sort the lists by any column or search them; it's just a lot more efficient than it was.
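The idea behind a virtual list can be sketched in Python: the data stays in an underlying store, the list keeps only an ordering of indices, and display rows are built only for the range currently on screen. `UrlRecord`, `VirtualUrlList` and its methods are hypothetical names for illustration. In a Windows Forms application the equivalent is typically a `ListView` with `VirtualMode` enabled, which requests rows through its `RetrieveVirtualItem` event, but this sketch does not model that API or WebCopy's own code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class UrlRecord:
    url: str
    status: int
    size: int


class VirtualUrlList:
    """Hypothetical virtual list: display rows are built only when requested."""

    def __init__(self, records: List[UrlRecord]):
        self._records = records                  # underlying data store
        self._order = list(range(len(records)))  # display order as indices, not row objects
        self.rows_built = 0                      # how many display rows have been materialized

    def __len__(self) -> int:
        return len(self._order)

    def _build_row(self, display_index: int) -> Tuple[str, str, str]:
        # Equivalent to a list control asking "give me the item at index N"
        record = self._records[self._order[display_index]]
        self.rows_built += 1
        return (record.url, str(record.status), str(record.size))

    def visible_rows(self, top_index: int, rows_in_view: int) -> List[Tuple[str, str, str]]:
        """Build rows only for the window currently on screen."""
        end = min(top_index + rows_in_view, len(self._order))
        return [self._build_row(i) for i in range(top_index, end)]

    def sort_by(self, key: Callable[[UrlRecord], object], descending: bool = False) -> None:
        # Sorting reorders the index list only; no rows are created
        self._order.sort(key=lambda i: key(self._records[i]), reverse=descending)

    def find(self, text: str) -> Optional[int]:
        # Search runs against the data and returns a display index to scroll to
        for display_index, record_index in enumerate(self._order):
            if text in self._records[record_index].url:
                return display_index
        return None


records = [UrlRecord(f"https://example.com/page{i}.html", 200, i * 10) for i in range(50_000)]
view = VirtualUrlList(records)
first_screen = view.visible_rows(0, 25)  # builds 25 rows, not 50,000
view.sort_by(lambda r: r.size, descending=True)
print(view.rows_built)                   # 25: sorting created no new rows
```

Because sorting and searching operate on the data and an index list, the cost of building row objects is paid only for the handful of rows the user can actually see.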

Site maps

We've also begun work on reducing the memory requirements of site maps, and while we've made some progress (viewing a website diagram should be less likely to crash), there's still major work to be done: the site map you see in the application can bear little resemblance to its internal structure, and this will take more time to resolve.

Bug fixes

Quite a number of additional minor bugs have been fixed; see the release notes for more information.

Documentation

We've continued to iterate on the documentation in an effort to improve it.

Continual improvement

We hope this update is useful. Of course we'll continue to improve WebCopy, there's still lots more to be done!

Update History

  • 2018-04-15 - First published
  • 2020-11-23 - Updated formatting
