SCM comes pre-installed with its own list of article sources to scrape content from.
You can add to this list and customize it.
This feature is available via custom article sources.
Example: Just check the box of the URL you want SCM to limit its search to. For instance, when you check buzzle.com, SCM will search for buzzle.com articles only. The search engine (SE) can be Google or Bing, with the latter as the default.
Custom Search Engines
Outside of the Bing/BingCache/Google options you can add your own Custom Search Engines.
Many sites have their own search boxes, which you can use to find content that may not appear in Bing’s or Google’s search listings.
Using a custom SE requires only two steps: 1) add the article sources; and 2) run the search.
1. Setup Article Sources
To go to Article Sources: Main Menu -> Article Downloader
(Another option is Main Menu -> Article Generator -> Article Sources -> Edit.)
The first thing you need is the URL of the site that you want to scrape content from.
Let's do a simple example below:
Visit the page and do a simple search.
Let's try “dog” for example. Run the search and copy the resulting URL from the address bar.
See the image below for an example.
For SCM to pass the correct keyword at runtime, replace the keyword in the URL with %keyword%.
So in our example, instead of:
For the SE column, you must select “CustomSearchEngine”.
Make sure your new entry is checked.
Optional: In the Sources region, set Google to None to limit results to only your new custom search engine.
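The %keyword% substitution from step 1 can be sketched in a few lines of Python. This is only an illustration of the idea, not SCM's actual code: the example.com URL, the function names, and the choice to encode spaces as "+" are all assumptions made for the sketch.

```python
from urllib.parse import quote_plus

PLACEHOLDER = "%keyword%"  # the token the tutorial tells you to put in the URL


def make_template(search_url: str, keyword: str) -> str:
    """Turn a copied search URL into a template by swapping the keyword for the placeholder."""
    encoded = quote_plus(keyword)
    if encoded not in search_url:
        raise ValueError(f"keyword {keyword!r} not found in {search_url!r}")
    return search_url.replace(encoded, PLACEHOLDER)


def fill_template(template: str, keyword: str) -> str:
    """Build the URL for a new keyword (assumption: spaces are encoded as '+')."""
    return template.replace(PLACEHOLDER, quote_plus(keyword))


# Hypothetical URL copied from the address bar after searching for "dog".
copied_url = "https://example.com/search?q=dog"
template = make_template(copied_url, "dog")
search_url = fill_template(template, "dog training")

print(template)
print(search_url)
```

The template string is what you would paste into Article Sources. The filled-in URL shows roughly what a request for the keyword “dog training” could look like at runtime.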
2. Run Search
Go to the Keyword tab and enter “dog training”. Then click Scrape Urls.
You can see the actual status of this job at any time by…
Go to Main Menu -> Content Task and in the Application Log, you will see some statistics.
You can see that the scraper fetches a lot of junk, pulling out every single link on the page. However, the pages that actually contain content are the ones with “hub” in the URL.
To make our life easier, we can actually filter the returned URLs by adding a simple filter.
Go back to Article Sources and put “hub” at the end of your custom URL. Make sure you put a space between the URL and the filter word.
If you go back to the search, clear everything, and re-run “Scrape Urls”, you will see that only pages with “/hub/” are saved in the URL box.
Run “Download Article” to fetch the content on each page.
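To make the filter idea concrete, here is a small Python sketch of how an entry of the form "URL, space, filter word" could be split and applied. It assumes a plain substring match, which may differ from what SCM does internally. The function names and the example.com URLs are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_source_entry(entry: str) -> Tuple[str, Optional[str]]:
    """Split 'URL filter' on the first run of whitespace; the filter word is optional."""
    parts = entry.split(None, 1)
    if not parts:
        raise ValueError("empty article source entry")
    url = parts[0]
    filter_word = parts[1].strip() if len(parts) > 1 else None
    return url, filter_word


def filter_urls(urls: List[str], filter_word: Optional[str]) -> List[str]:
    """Keep only URLs containing the filter word (assumption: simple substring match)."""
    if not filter_word:
        return list(urls)
    return [u for u in urls if filter_word in u]


entry = "https://example.com/search?q=%keyword% hub"
source_url, word = parse_source_entry(entry)

scraped = [
    "https://example.com/hub/Dog-Training-Tips",
    "https://example.com/about",
    "https://example.com/login",
]
kept = filter_urls(scraped, word)
print(kept)
```

Note that a bare substring such as “hub” would also match unrelated URLs that happen to contain those letters, so a more specific filter word gives cleaner results.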
Tip: You can use any page URL you like. SCM will fetch links on that page and follow them to find content for you.
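The link-following behaviour in the tip can be pictured with a short Python sketch that pulls every link out of a page and resolves it against the page URL. It uses only the standard library and works on an inline HTML string, so nothing is downloaded. The class and function names are hypothetical, and SCM's own crawler may work differently.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved to an absolute URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative links are resolved against the page they came from.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> List[str]:
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical page content standing in for a downloaded page.
page = (
    '<a href="/hub/Dog-Training">Dog training</a> '
    '<a href="https://example.com/about">About</a> '
    "<a>no link here</a>"
)
links = extract_links(page, "https://example.com/start")
print(links)
```

Each collected link is a candidate page to visit next, which is why a filter word is useful for narrowing the list down to content pages.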