The latest update to WebCopy has just been released, and includes two new features which expand the usefulness of the product.
These features are considered experimental at this stage - they haven't been tested as thoroughly as other features, and as a result they might not work properly or might have unintended side effects.
Multiple Hosts
One of the odder omissions in WebCopy was that it wouldn't crawl other hosts. You could copy subdomains, but what if you used a CDN with a completely different domain name? Fortunately, that gap has now been closed. The Additional Hosts configuration page lets you specify additional domains to crawl.
Now, when WebCopy finds an external URI, it will check to see if the domain is listed as safe to crawl. If it is, it will promptly download the linked resource, and then attempt to scan it for further links, and expand from there.
As these additional hosts can be jumped into from any level, some project settings won't apply to the additional hosts - for example the Crawl Above Root setting. Therefore it is important to make sure you use rules to control how content is downloaded.
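The host check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not WebCopy's actual implementation: the function name is_crawlable, its parameters, and the example domains are all assumptions made for this sketch, and it matches host names exactly rather than handling subdomains.

```python
from urllib.parse import urlsplit


def is_crawlable(uri, primary_host, additional_hosts):
    """Return True if the URI points at the primary host or a listed additional host.

    Hypothetical helper for illustration; names and behaviour are assumptions.
    """
    # hostname is None for URIs without a network location (e.g. mailto:)
    host = (urlsplit(uri).hostname or "").lower()
    if not host:
        return False

    # Build the set of hosts the project considers safe to crawl
    allowed = {primary_host.lower()} | {h.lower() for h in additional_hosts}
    return host in allowed


# Example project: one primary site plus a CDN on a different domain
extra_hosts = ["cdn.example.net"]
cdn_ok = is_crawlable("https://cdn.example.net/app.js", "www.example.com", extra_hosts)
other_ok = is_crawlable("https://other.example.org/", "www.example.com", extra_hosts)
```

A check like this only decides whether a host may be entered at all. Because settings such as Crawl Above Root don't apply to additional hosts, rules are still needed to limit what is downloaded from them.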
Proxy Server Support
Previously, WebCopy would always use the system-defined proxy server settings. Now you can configure your own independent settings on a per-project basis. This allows all requests made during a crawl to be sent via the proxy you specify.
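To illustrate the difference between system-wide and per-project proxy settings, here is a minimal Python sketch using the standard urllib.request module. It is not how WebCopy itself is implemented: the make_opener function, the project_proxy parameter, and the proxy address are hypothetical names chosen for this example.

```python
import urllib.request


def make_opener(project_proxy=None):
    """Build a URL opener that uses a per-project proxy when one is given.

    Hypothetical helper for illustration; without a project proxy it falls
    back to whatever proxy settings the system defines.
    """
    if project_proxy:
        # Route both HTTP and HTTPS requests through the project's proxy
        handler = urllib.request.ProxyHandler(
            {"http": project_proxy, "https": project_proxy}
        )
    else:
        # No argument: ProxyHandler reads the system proxy settings
        handler = urllib.request.ProxyHandler()
    return urllib.request.build_opener(handler)


# Assumed proxy address, for demonstration only
opener = make_opener("http://proxy.example.local:8080")
```

Every request made through the returned opener would then go via the project's proxy, independently of the system configuration.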
Odds and ends
With these features being new and only tested in a limited fashion, there could be bugs or side effects - please let us know if you experience any problems.
As usual for these updates, there are also a handful of bug fixes and minor new functionality, mostly around UI interactions, but also including a fix for an issue where WebCopy would treat certain URIs as subdomains even though they weren't.
We hope you enjoy this update to the product!
Update History
- 2014-06-01 - First published
- 2020-11-23 - Updated formatting