How to use the Custom Search Engine Feature

SCM comes pre-installed with its own list of article sources to scrape content from.

You can add to this list and customize it.

This feature is available via custom article sources.


Custom sources grid via Article Downloader

Example: Just check the box next to the URL you want SCM to limit its search to. For instance, when you check buzzle.com, SCM searches for buzzle.com articles. The search engine (SE) can be Google or Bing, with Bing as the default.

Custom Search Engines

Outside of the Bing/BingCache/Google options, you can add your own Custom Search Engines.

Many sites have their own search boxes, which you can use to find content that may not appear in Bing’s or Google’s search listings.

Using a custom SE requires only two steps: 1) set up the Article Sources, and 2) run the search.

1. Setup Article Sources

To go to Article Sources: Main Menu -> Article Downloader

Note that the Edit button doubles as “Save.”

(Another option is Main Menu -> Article Generator -> Article Sources -> Edit.)

The first thing you need is a domain URL for the site that you want to scrape content from.

Let's do a simple example:

Visit the site and run a simple search.

Let's try “dog”, for example. Run the search and copy the URL that it generates.

See the image below for an example.

Go back into SCM, add a new row, and paste the URL under the “DOMAIN” column.

For SCM to pass the correct keyword at runtime, replace the keyword in the URL with %keyword%.

So in our example, instead of:

http://example.com/search/?s=dog

It becomes:

http://example.com/search/?s=%keyword%
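
The substitution can be pictured with a short Python sketch. This is only an illustration of the idea, not SCM's actual code: the function name `build_search_url` is hypothetical, and URL-encoding the keyword (so a space becomes `+`) is an assumption about how a multi-word keyword would need to be passed.

```python
from urllib.parse import quote_plus


def build_search_url(template: str, keyword: str) -> str:
    """Replace the %keyword% placeholder with a URL-encoded keyword.

    Hypothetical helper: SCM's real substitution logic is not documented here.
    """
    return template.replace("%keyword%", quote_plus(keyword))


template = "http://example.com/search/?s=%keyword%"
print(build_search_url(template, "dog training"))
# prints: http://example.com/search/?s=dog+training
```

A quick way to check your own template is to paste the resulting URL into a browser and confirm that the site shows search results for that keyword.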

For the SE column, you must select “CustomSearchEngine”.

Save your changes by clicking “Save”.

Make sure your new entry is checked.

Optional: In the Sources region, set Google to None to limit results to only your new custom search engine.


2. Run Search

Go to the Keyword tab and enter “dog training”. Then click “Scrape Urls”.

The program will visit the URL you specified to find links to content.

You can check the status of this job at any time: go to Main Menu -> Content Task, and the Application Log will show some statistics.

Once scraping is done, we can view the pages it found in the URL window.

You can see that the scraper fetches a lot of junk, pulling out every single link on the page. However, the pages that actually contain content are the ones with “hub” in the URL.

/hub/ are the only pages we want here
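
To see why the raw results are so noisy, here is a rough Python sketch of what "pulling out every single link" looks like. It is an assumption-laden illustration, not SCM's implementation: the class name `LinkCollector`, the sample HTML, and the example.com paths are all made up for the demonstration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


# Made-up search results page: two content pages plus navigation links.
sample_html = """
<a href="/hub/dog-training-basics">Dog training basics</a>
<a href="/about">About</a>
<a href="/login">Log in</a>
<a href="http://example.com/hub/puppy-tips">Puppy tips</a>
"""

collector = LinkCollector("http://example.com/search/?s=dog+training")
collector.feed(sample_html)
for link in collector.links:
    print(link)
```

In this made-up page, only two of the four collected links point at content; the rest are site navigation, which is the kind of noise a filter is meant to remove.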

To make our life easier, we can filter the returned URLs by adding a simple filter.

Go back to Article Sources and put “/hub/” at the end of your custom URL. Make sure you put a space between the URL and the filter word.

http://example.com/search/?s=%keyword% /hub/

If you go back to the search, clear everything, and re-run “Scrape Urls”, you will see that only pages with “/hub/” are saved in the URL box.

Run “Download Article” to fetch the content on each page.
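
The effect of the filter can be sketched in Python as follows. This is a guess at the behaviour based on the description above (the text after the space is treated as a plain substring that a URL must contain), not SCM's actual code; the names `split_source_entry` and `filter_links` and the sample links are hypothetical.

```python
def split_source_entry(entry: str):
    """Split 'URL filter' into (url_template, filter_word).

    The filter word is None when the entry has no space-separated suffix.
    """
    parts = entry.strip().split(None, 1)
    url_template = parts[0]
    filter_word = parts[1].strip() if len(parts) > 1 else None
    return url_template, filter_word


def filter_links(links, filter_word):
    """Keep only links containing filter_word; keep everything if no filter."""
    if not filter_word:
        return list(links)
    return [link for link in links if filter_word in link]


entry = "http://example.com/search/?s=%keyword% /hub/"
template, word = split_source_entry(entry)

# Made-up scrape results for illustration.
links = [
    "http://example.com/hub/dog-training-basics",
    "http://example.com/about",
    "http://example.com/login",
    "http://example.com/hub/puppy-tips",
]
print(filter_links(links, word))
# prints: ['http://example.com/hub/dog-training-basics', 'http://example.com/hub/puppy-tips']
```

Including the slashes in the filter word (“/hub/” rather than “hub”) makes the match stricter, since a bare “hub” would also match unrelated URLs that merely contain those letters.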

Tip: You can use any page URL you like. SCM will fetch links on that page and follow them to find content for you.

