
Software Engineer and DWF Technical Evangelist, Archana Naik, submitted the following article.
If you are looking for specific text in your collection of DWF files, this is the article for you. You may have noticed lately that Google indexes DWF files: it currently covers over 150,000 of them on the internet. But what about DWF files on your computer or your company’s private network? Most desktop search engines still cannot search DWF files for text. To fill that gap, I have written a small application that converts a given DWF file into either HTML or ASCII text, depending on a command-line argument. The generated output files can be fed into a desktop search engine, such as the ones provided by Yahoo or Google, for indexing. Once indexing is done, the text of all your DWF files is searchable.
So let’s get down to the details. I have named this application DWFStrings. DWFStrings uses the DWF Toolkit to parse DWF files and extract any text it finds. The application compiles and links against DWF Toolkit 7.3, but it should also build with earlier versions such as 7.1 and 7.2. It runs on both Windows and Linux. For Windows, the build configurations are defined in a .vcproj file; for Linux, a makefile is provided in the source tree. Originally, I designed the application as a standalone executable, but my co-worker and fellow Software Engineer and DWF Technical Evangelist, Gyorgy Ordody, made some improvements so that it can also be used as a DLL or a lib on Windows. Even better, isn’t it?
Now for the code base and how to build and run the application. Please download the DWFStrings zip file. Unzipping it creates a folder called "DWFStrings" containing build, doc, and src sub-folders. To compile the application, follow these steps:
- Build the toolkit first.
- For Windows:
  - Set up two environment variables as shown below:
    - DWFTK_SRC_73 = "C:\DWFToolkit-7.3\develop\global\src" (or the develop\global\src folder under wherever your toolkit is installed)
    - DWFTK_LIB_73 = "C:\DWFToolkit-7.3\develop\global\lib" (or the develop\global\lib folder under wherever your toolkit is installed)
  - If you set these variables in a command window, open the .vcproj file (located in the build folder) from that same window so that the build picks them up. Choose either Debug (Exe) or Release (Exe) in Configuration Manager, then build the project.
- For Linux: Run the makefile (located in the build/linux folder) using "make".
Compiling on Windows with the Debug (Exe) configuration generates DWFStrings.exe in the "DWFStrings\bin\DWFStrings\Debug (Exe)\vc8.0" folder.
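If the Windows build cannot find the toolkit headers or libraries, the two environment variables are the first thing to check. The following is a minimal Python sketch of such a check. The variable names come from the steps above; the helper name `missing_toolkit_vars` and the idea of running a check script at all are my own illustration, not part of the DWFStrings download.

```python
import os

# Variable names as given in the build steps; everything else here is illustrative.
REQUIRED_VARS = ("DWFTK_SRC_73", "DWFTK_LIB_73")


def missing_toolkit_vars(env=None):
    """Return the required variables that are unset or do not name an existing folder."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        # Tolerate values that were set with surrounding quotes.
        value = env.get(name, "").strip().strip('"')
        if not value or not os.path.isdir(value):
            problems.append(name)
    return problems


if __name__ == "__main__":
    bad = missing_toolkit_vars()
    if bad:
        print("Fix these before building:", ", ".join(bad))
    else:
        print("Toolkit environment looks ready.")
```

Run it from the same command window in which you set the variables, before opening the .vcproj file.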
Usage of the application is shown below:
dwfstrings -f [-h] [-o ] [-r] [-u url] [-fw] [-mvversion] [-mvexitonmin] [-mvnoindexonmin]
-f : Specifies the path of the source file
-h : Generates HTML output; if not specified, a text file is generated
-o : Specifies the path of the output file. If the -r switch is used, the images are generated in the parent folder of the output file.
-r : Extracts images/resources
-u : URL of the original document. If this switch is set, the URL is displayed at the top of the document.
-fw : If the -u and -h switches are both used, the resulting HTML file contains an IFrame with the Freewheel viewer showing the file.
-mvversion : Minimum DWF version to convert
-mvexitonmin : Exit if the DWF version does not meet the minimum
-mvnoindexonmin : Set the noindex tag if the DWF version does not meet the minimum
As the usage shows, the only required option is "-f".
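If you call DWFStrings from a script, it can help to assemble the argument list in one place. Here is a rough Python sketch that does so for the switches described above. The function name `build_dwfstrings_args` and its parameters are hypothetical, and the rule that rejects -fw without -h and -u is my reading of the option descriptions, not something the tool is known to enforce. The -mv switches are left out because the usage line does not show how their values are passed.

```python
def build_dwfstrings_args(source, html=False, output=None, resources=False,
                          url=None, freewheel=False, exe="dwfstrings"):
    """Build an argument list for dwfstrings from the documented switches."""
    if not source:
        raise ValueError("-f (source file) is required")
    if freewheel and not (html and url):
        # Per the option list, -fw only does something when -h and -u are also given.
        raise ValueError("-fw needs both -h and -u")
    args = [exe, "-f", source]
    if html:
        args.append("-h")
    if output:
        args += ["-o", output]
    if resources:
        args.append("-r")
    if url:
        args += ["-u", url]
    if freewheel:
        args.append("-fw")
    return args
```

The resulting list can be passed directly to `subprocess.run`, which avoids quoting problems with paths that contain spaces.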
To generate HTML output you need to use it like this:
dwfstrings -f "C:\test.dwf" -h -o "C:\output\test.html"
To generate text output, leave out "-h":
dwfstrings -f "C:\test.dwf" -o "C:\output\test.txt"
If you don’t specify the "-o" option, the output file is generated in the same directory as the DWF file.
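Since the point of the tool is to make a whole collection searchable, you will probably want to run it over a folder rather than one file at a time. The sketch below shows one way to do that in Python, using only the -f, -h, and -o switches. The function names, the mirrored output layout, and the assumption that dwfstrings is on your PATH and returns a non-zero exit code on failure are all mine; adjust them to your setup.

```python
import subprocess
from pathlib import Path


def plan_conversions(root, out_dir, html=True):
    """Yield (command, output_path) for every .dwf file under root."""
    root, out_dir = Path(root), Path(out_dir)
    suffix = ".html" if html else ".txt"
    # Note: the "*.dwf" pattern is case-sensitive on Linux.
    for dwf in sorted(root.rglob("*.dwf")):
        # Mirror the source folder layout so files with the same name do not collide.
        target = out_dir / dwf.relative_to(root).with_suffix(suffix)
        cmd = ["dwfstrings", "-f", str(dwf)]
        if html:
            cmd.append("-h")
        cmd += ["-o", str(target)]
        yield cmd, target


def convert_tree(root, out_dir, html=True):
    """Run dwfstrings once per planned conversion and report failures."""
    for cmd, target in plan_conversions(root, out_dir, html):
        # Create the output folder up front in case the tool does not.
        target.parent.mkdir(parents=True, exist_ok=True)
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print("failed:", cmd[2])
```

Point your desktop search engine at the output folder once `convert_tree` has finished.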
If you want to extract thumbnails from the DWF file, use the "-r" option. The thumbnail images are extracted to the output folder and shown in the HTML page under each sheet title.
The other command-line options are pretty much self-explanatory.
Thanks Archana and Gyorgy!