by James Crowe
19. August 2008 14:32
Searching the contents of PDF files is a common requirement, but there is often confusion about how best to implement it.
A simple and flexible solution is to use an Adobe PDF IFilter.
IFilter is a Microsoft specification for components that scan a document for its text and properties, allowing Microsoft’s Indexing Service to extract and index that data.
Step 1 – Install the IFilter downloaded from Adobe. Accept all default steps.
Step 2 – Configure the Indexing service.
Basic steps include:
- Open Computer Management > Services and Applications > Indexing Service
- Create a new catalog, giving it a name and a folder location
- Expand the new catalog and add the directory containing the PDFs to be indexed
- Start the Indexing Service
- Under the Directories section, right-click the directory containing the PDFs and select Rescan (Full)
- Select ‘Query the catalog’. From this page you should be able to search for text contained in the PDF files. For more specific results you can query the metadata contained in PDF files. Standard attributes include Title, Subject, Author and Keywords. For further details refer to the Adobe PDF IFilter installation readme file.
- If the queries fail to return results, try restarting the server and re-indexing the PDF directory.
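The metadata queries mentioned in the steps above can be combined with a free-text search in a single query string. The following is a minimal sketch, not a definitive implementation: it assumes the Indexing Service friendly property names DocAuthor and DocTitle and the short-form `@Property "value"` syntax, and `PdfQueryBuilder` and `Build` are hypothetical names introduced for illustration. Check the IFilter readme for the properties your installation actually exposes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the Indexing Service API): combines
// free text with optional property restrictions into one query string.
public static class PdfQueryBuilder
{
    public static string Build(string freeText, string author, string title)
    {
        List<string> parts = new List<string>();

        if (!IsBlank(freeText))
            parts.Add(freeText.Trim());

        // Assumption: DocAuthor and DocTitle are the friendly names
        // that map to the PDF Author and Title attributes.
        if (!IsBlank(author))
            parts.Add("@DocAuthor " + Quote(author));
        if (!IsBlank(title))
            parts.Add("@DocTitle " + Quote(title));

        // All supplied restrictions must match.
        return string.Join(" AND ", parts.ToArray());
    }

    private static bool IsBlank(string value)
    {
        return value == null || value.Trim().Length == 0;
    }

    // Wraps the value in double quotes, dropping any embedded quotes
    // so the user cannot break out of the phrase.
    private static string Quote(string value)
    {
        return "\"" + value.Trim().Replace("\"", "") + "\"";
    }
}
```

The resulting string could then be typed into the ‘Query the catalog’ page or assigned to the Query property of the query object, for example `PdfQueryBuilder.Build("invoice", "Crowe", null)`.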
Step 3 – Add a reference to the ixsso Control Library

Step 4 – Write .NET code to query the indexing server
// Indexing Service libraries
CissoQueryClass query = new CissoQueryClass();
CissoUtilClass util = new CissoUtilClass();
OleDbDataAdapter dataAdaptor = new OleDbDataAdapter();
DataSet resultsDs = new DataSet("IndexServerResults");
string pdfFolder = @"C:\PDFS\";
// Search query
query.Query = txbSearchValue.Text;
// Catalog Name
query.Catalog = "PDFSearch";
// Columns to return
query.Columns = "Filename, Path, Size";
// Adds search path to query
// 'deep' will search subdirectories
// Or replace with 'shallow' to search specified folder only
util.AddScopeToQuery(query, pdfFolder, "deep");
// Create recordset
object recordSet = query.CreateRecordset("nonsequential");
// Populate dataset
dataAdaptor.Fill(resultsDs, recordSet, "IndexServerResults");
// Bind results to gridview
grdSearchResults.DataSource = resultsDs;
grdSearchResults.DataBind();
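If you need the results in code instead of binding them straight to a grid, the filled DataSet can be read row by row. This is a rough sketch under stated assumptions: `PdfHit` and `SearchResultReader` are hypothetical names, and the column names and the "IndexServerResults" table name are taken from the query above.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical container for one search hit.
public sealed class PdfHit
{
    public string FileName;
    public string Path;
    public long SizeBytes;
}

// Hypothetical helper: copies the Filename, Path and Size columns
// of the filled DataSet into a simple list.
public static class SearchResultReader
{
    public static List<PdfHit> Read(DataSet results, string tableName)
    {
        List<PdfHit> hits = new List<PdfHit>();
        if (results == null || !results.Tables.Contains(tableName))
            return hits; // nothing was filled

        foreach (DataRow row in results.Tables[tableName].Rows)
        {
            PdfHit hit = new PdfHit();
            // DataTable column lookups by name are case-insensitive.
            hit.FileName = Convert.ToString(row["Filename"]);
            hit.Path = Convert.ToString(row["Path"]);
            hit.SizeBytes = row.IsNull("Size") ? 0L : Convert.ToInt64(row["Size"]);
            hits.Add(hit);
        }
        return hits;
    }
}
```

With the code above this could be called as `SearchResultReader.Read(resultsDs, "IndexServerResults")`, for example to build links to the matching files.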
Search Results

The code sample above relates to PDF files because of the type of IFilter used. If other IFilters are installed, other types of document can be indexed in the same way.
Further Reading
Microsoft IFilters
Adobe PDF IFilter