CodeBetter.Com
CodeBetter.Com
RSS 2.0 via Feedburner
           Do you Twitter? Follow us @CodeBetter

Kyle Baley - The Coding Hillbilly

"We are stuck with technology when what we really want is just stuff that works" -- Douglas Adams

Trying to access Windows Search from SQL Server: An Appeal

Had some time to review my search dilemma today and three billable hours later, I'm no further along in my task.

There is no shortage of documentation on the new Windows Search and I do so desperately want to try it out. Just not as a client.

To sum up my requirements, I have a search screen that combines meta data in a SQL database with a contents search. That is, the user types in a search word or phrase and/or selects meta data about the documents (e.g. country, document type, date, and so on and so forth).

At present, using Indexing Services, I am able to perform such a query within SQL Server with a single SQL Statement. That is what I would like to do with Windows Search.

Alas, all efforts to connect to Windows Search via SQL Server have failed. The nearest I've come is to find a couple of lost souls to commiserate with who are having the same problem. I've tried every possible combination of provider, datasource, and provstr I can fathom with the connection string, Provider=Search.CollatorDSO;Extended Properties=\"Application=Windows\", and have come up empty.

The issue appears to be with SQL Server as I am able to run pretty much every sample app under the sun and get search results back. I've even overcome my fear of C++ to read through some samples in the Search SDK.

So I'm appealing to you, generous reader(s), for some help. I know I've broken the unwritten rule of finding at least one workable solution, however hideous, before asking for help but the nature of Windows Search is such that any workable solution would be the one I glom onto.

Again, the criteria is relatively straightforward: I want to be able to search for documents (Word, PowerPoint, PDF, and Excel only for the moment) based on their contents as well as metadata.

I can do that with Windows Search now but it would involve retrieving a result set based on metadata, retrieving a second result set based on contents, then merging the two. Given the size of the repository (about 2600 documents), this is do-able but it's the kind of bastardized union that perpetuates the hillbilly stereotype. And I'm trying to be more PC.

I would consider SharePoint only if someone can convince me it is an elegant solution that specifically meets these requirements. I am also open to a third-party component that indexes files and provides an API. But again, I can do this already with Windows Search. It would have to be something that can combine with SQL Server.

And if I don't get a decent answer, then I'm gonna...well, swear a bit maybe but that's probably it.

Kyle the Unthreatening



Comments

Sean Carpenter said:

I think you should check out Lucene.Net (incubator.apache.org/.../lucene.net.html).  You can index and store the document content as well as the metadata and searches are very fast.  The index file format is also compatible with the Java version, so anything that can read those indexes will be able to use yours as well (something like Luke (http://www.getopt.org/luke/) is very useful).

# July 8, 2008 8:08 AM

Kyle Baley said:

Thanks Sean. That's two endorsements I've had for Lucene.net which is enough to look into it in more detail. The part about having to extract the text from PDF documents and Office documents manually to insert into the index kind of troubles me but I like the idea of having the metadata and the contents in the same place.

# July 8, 2008 1:08 PM

BrianW said:

The technology at the foundation of Microsoft Sharepoint and Windows Search is Filtdump and iFilters. Filtdump is an exe that utilizes iFilters to extract the text from documents (word, excel, pdfs, etc). Best part is that adding support for additional file types is just a matter of finding an iFilter that supports it.

You can either crawl the documents, call filtdump.exe, and insert the output (i.e. the text of the documents) into a SQL Server table or you can use SQL Server's Full Text Index to do all this for you as it also uses iFilters under the hood (www.codeproject.com/.../sqlfulltextindexing.aspx).  

# July 8, 2008 4:01 PM

Kyle Baley said:

Sweet. Thanks! That looks even better assuming I can work out how to get the contents into the index somehow.

# July 8, 2008 4:25 PM

Greg said:

I was gonna say lucene as well but someone already has, consider it another endorsement.

# July 9, 2008 4:46 PM

Ted Jardine said:

Another plug for Lucene.

# July 11, 2008 2:30 AM

MSenn said:

Kyle?

Did you get information more about adding documents to the 'Windows Search' index?

Manfred

# August 1, 2008 2:11 PM

Kyle Baley said:

@MSenn

Not especially. The two questions I asked on the forums remain either unanswered or with "I'd like this too" responses. I have put this on hold for the time being but will report back once it becomes urgent again.

# August 1, 2008 8:54 PM

Leave a Comment

(required)  
(optional)
(required)  

Enter the numbers above:
Add
Check out Devlicio.us!

Our Sponsors

Free Tech Publications