How to determine the number of changes an incremental crawl will process prior to initiating the crawl

by dipbiswas 24. September 2010 06:21

How an incremental crawl works depends on the protocol handler being used. For SharePoint content, the crawler first tries to get the changes made since the last crawl. It does this from the mssdmn.exe process by calling a web service called the SiteData web service. The URL is:

http://servername/_vti_bin/sitedata.asmx


How to detect changes before starting an incremental crawl:

This can all be accomplished with a series of SQL queries. The first table you need to check is the MSSChangeLogCookies table in the Search database. This table keeps track of the last change that the crawler processed for each content database. Look at the ChangeLogCookie_new column. You'll see several rows, and the value in each will look something like this:
 

1;0;3cf6820c-b653-458c-a92e-6f50ae229f35;634206188395670000;46530572

 

The GUID - 3cf6820c-b653-458c-a92e-6f50ae229f35 - identifies the content database we're crawling against. The last value, 46530572, is the ID of the last change the crawler processed. So first, we need to find which content database this row refers to. To do this, we take the GUID, 3cf6820c-b653-458c-a92e-6f50ae229f35, and run the following query against the objects table of the configuration database:

 

select * from objects with (NOLOCK) where ID = '3cf6820c-b653-458c-a92e-6f50ae229f35'

 

This will output the name of the content DB. In my case, it's MOSS_ContentDB_Tax. So at this point, we know that the last change processed against MOSS_ContentDB_Tax is 46530572.

Now we need to find all of the changes in MOSS_ContentDB_Tax from 46530572 up to the latest. The eventcache table in the content database contains all of the changes up to the most recent. So in our example, we need every change with an ID greater than 46530572, and we run the following query:
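
The cookie layout described above can be pulled apart with a few lines of Python. This is only a minimal sketch and not part of SharePoint: the function name parse_change_log_cookie and the field names are hypothetical, and it assumes the value always has five ';'-separated fields as in the example. Only the third field (database GUID) and the last field (change ID) are explained in this post, so the remaining fields are kept as opaque strings.

```python
from typing import NamedTuple
from uuid import UUID


class ChangeLogCookie(NamedTuple):
    database_id: UUID      # GUID of the content database (third field)
    last_change_id: int    # last change ID the crawler processed (last field)
    other_fields: tuple    # remaining fields; their meaning is not covered here


def parse_change_log_cookie(cookie: str) -> ChangeLogCookie:
    """Split a ChangeLogCookie value on ';' and pick out the two fields used here."""
    parts = cookie.strip().split(";")
    if len(parts) != 5:
        # Assumption: five fields, matching the sample value in this post.
        raise ValueError(f"expected 5 ';'-separated fields, got {len(parts)}")
    return ChangeLogCookie(
        database_id=UUID(parts[2]),
        last_change_id=int(parts[4]),
        other_fields=(parts[0], parts[1], parts[3]),
    )


cookie = parse_change_log_cookie(
    "1;0;3cf6820c-b653-458c-a92e-6f50ae229f35;634206188395670000;46530572"
)
print(cookie.database_id)     # 3cf6820c-b653-458c-a92e-6f50ae229f35
print(cookie.last_change_id)  # 46530572
```

The GUID goes into the objects table lookup and the change ID into the eventcache query.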

 

select * from eventcache with (NOLOCK) where ID > 46530572

 

The ID column will show you all changes after 46530572. The last row is the latest change, which in my case is 46530582, and the query returns 10 rows. So before starting the incremental crawl, I know that the crawler will process 10 changes against this content database. After running an incremental crawl, if I check the MSSChangeLogCookies table in the Search DB, I'll see the following:

ChangeLogCookie_old column will contain:

1;0;3cf6820c-b653-458c-a92e-6f50ae229f35;634206188395670000;46530572

ChangeLogCookie_new column will now contain:

1;0;3cf6820c-b653-458c-a92e-6f50ae229f35;634206188395670000;46530582

And the process repeats itself...
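
The arithmetic behind this walkthrough can be sketched in Python. This is an illustration only: the helper names change_id and count_pending_changes are hypothetical, and the list of event IDs is made-up data standing in for the rows the eventcache query would return. In practice the row count from the query is what matters, since nothing in this post guarantees that change IDs are contiguous.

```python
def change_id(cookie: str) -> int:
    """Return the trailing change ID of a ChangeLogCookie string."""
    return int(cookie.rsplit(";", 1)[1])


def count_pending_changes(last_processed_id: int, event_ids) -> int:
    """Count event IDs strictly greater than the last change ID in the cookie."""
    return sum(1 for event_id in event_ids if event_id > last_processed_id)


old_cookie = "1;0;3cf6820c-b653-458c-a92e-6f50ae229f35;634206188395670000;46530572"
new_cookie = "1;0;3cf6820c-b653-458c-a92e-6f50ae229f35;634206188395670000;46530582"

# Hypothetical IDs standing in for the rows returned by the eventcache query.
event_ids = list(range(46530573, 46530583))

pending = count_pending_changes(change_id(old_cookie), event_ids)
print(pending)                                  # 10
print(change_id(new_cookie) == max(event_ids))  # True
```

After the crawl, the change ID in the new cookie matches the highest ID that the eventcache query returned beforehand.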

Tags:

Search and crawl

