MS Office (COM, OOXML)
The MS Office plug-in reads and writes metadata items in a wide range of files that have a COM-based structure (older MS Office files, the default format up to Office 2003) as well as in all OpenXML formats. It consists of three files: ScavPISC.COM.dll, ScavPISC.zip.dll and ScavPI.MSOffice.dll.
| File Extensions | COM | OpenXML | Description |
| DOC, XLS, PPT, PPS, PUB, MSP | Read & Write | (n/a) | Legacy MS Office application files and templates (Word, Excel, PowerPoint 97 to 2003); most MS Project files, most MS Publisher files, some MS Works files |
| DOCX, XLSX, PPTX | (n/a) | Read & Write | Current MS Office file format (2007 and newer) for Word, Excel, PowerPoint |
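The two container types in the table can be told apart by their leading bytes: COM structured storage (compound) files begin with the signature D0 CF 11 E0 A1 B1 1A E1, and OpenXML files are ZIP archives that begin with "PK". The following is a minimal sketch of such a check; the function names are illustrative, and how the plug-in itself selects a reader is not documented here.

```python
# Well-known file signatures (magic bytes).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # compound file header
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP local file header


def detect_container(header: bytes) -> str:
    """Classify a file by its first bytes as 'COM', 'OpenXML' or 'unknown'."""
    if header.startswith(OLE2_MAGIC):
        return "COM"
    if header.startswith(ZIP_MAGIC):
        # Any ZIP archive matches here; a stricter check would also look
        # for the "[Content_Types].xml" entry inside the archive.
        return "OpenXML"
    return "unknown"


def detect_file(path: str) -> str:
    """Read the first eight bytes of a file and classify it."""
    with open(path, "rb") as handle:
        return detect_container(handle.read(8))
```

Checking the signature rather than the file extension also handles files that were renamed, for example a DOCX saved with a .doc extension.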
RawSourceElements in Office 97-2003 (COM-based)
The plug-in can process the following Summary items, which are common across all document types (text, spreadsheet, presentation, etc.):
| RSE DescriptorPath | Type | Description |
| |COM|Summary|Title| | String | The descriptive title of the document. |
| |COM|Summary|Subject| | String | The subject for this document. |
| |COM|Summary|Author| | String | The main author of the document. |
| |COM|Summary|Keywords| | String | Keywords that characterize the document. |
| |COM|Summary|Comments| | String | Comments to this document. |
| |COM|Summary|Template| | String (read-only) | |
| |COM|Summary|LastEditAuthor| | String | The editor that last saved this document. |
| |COM|Summary|RevisionN| | String | |
| |COM|Summary|LastPrintedDate| |COM|Summary|LastSavedDate| |COM|Summary|CreationDate| | DateTime (read-only) | Date and time when this document was last printed, last saved, and created. |
| |COM|Summary|TotalEditTime| | DateTime (read-only) | Total editing time. |
| |COM|Summary|PagesCount| |COM|Summary|WordsCount| |COM|Summary|CharsCount| | Numeric (read-only) | Number of pages, words and characters in the document. |
| |COM|Summary|Thumbnail| | Thumbnail (read-only) | Preview of the first page of the document. |
| |COM|Summary|AppName| | String | Name of the application that created the document. |
In addition, the following document-specific items are supported:
| RSE DescriptorPath | Type | Description |
| |COM|DocSummary|Category| | String | The category of the document. |
| |COM|DocSummary|Company| | String | Company name. |
| |COM|DocSummary|Manager| | String | Manager of the project. |
| |COM|DocSummary|Bytes| |COM|DocSummary|Lines| |COM|DocSummary|Paragraphs| |COM|DocSummary|Slides| | Numeric (read-only) | Size in bytes and number of lines, paragraphs and slides. |
| |COM|DocSummary|HiddenSlides| | Numeric (read-only) | Number of slides that are hidden. |
| |COM|DocSummary|MMClips| | Numeric (read-only) | Number of sound or video clips. |
| |COM|DocSummary|ScaleCrop| | Boolean (read-only) | Set to "true" if scaling of the thumbnails is desired. If not set, cropping is desired. |
| |COM|DocSummary|PresentationTarget| | String | The intended presentation format (for example, on-screen show). |
| |COM|DocSummary|Notes| | Numeric (read-only) | Number of pages that contain notes. |
| |COM|DocSummary|LinksUptoDate| | Boolean (read-only) | Indicates whether the links in the document are up to date. |
Note that not all file formats include all of the document-specific items above; some are used only in Word, others only in PowerPoint or Excel. The plug-in also extracts the so-called TitlesofParts and HeadingPairs as well as custom-defined items (used heavily by Microsoft Project).
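The descriptor paths in the tables above use the pipe character as both a leading and trailing delimiter. As a small illustration, the sketch below splits such a path into its components; the helper name is hypothetical and is not part of the plug-in.

```python
def split_descriptor_path(path: str) -> tuple:
    """Split '|COM|Summary|Title|' into ('COM', 'Summary', 'Title')."""
    stripped = path.strip()
    if len(stripped) < 2 or not (stripped.startswith("|") and stripped.endswith("|")):
        raise ValueError(f"not a descriptor path: {path!r}")
    # Drop the outer delimiters, then split on the inner ones.
    parts = tuple(part.strip() for part in stripped[1:-1].split("|"))
    if any(not part for part in parts):
        raise ValueError(f"empty component in descriptor path: {path!r}")
    return parts
```

The first component identifies the container (COM or OpenXML), the second the property group, and the last the item name.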
RawSourceElements in Office 2007/2010 (OpenXML)
Excel 2007, Word 2007 and PowerPoint 2007 (and newer versions) now use the OpenXML format, which has its own set of metadata items. The plug-in can process the following items:
| RSE DescriptorPath | Type | Description |
| |OpenXML|Core|Title| | String | The descriptive title of the document. |
| |OpenXML|Core|Creator| | String | |
| |OpenXML|Core|Description| | String | |
| |OpenXML|Core|Created| | DateTime (read-only) | |
| |OpenXML|Core|ContentStatus| | String | The status of the document (e.g., Draft, Reviewed, Final). |
| |OpenXML|Core|Keywords| | String | |
| |OpenXML|Core|Subject| | String | |
| |OpenXML|Core|LastModifiedBy| | String | |
| |OpenXML|Core|Modified| | DateTime (read-only) | |
| |OpenXML|Core|Category| | String | |
| |OpenXML|Core|Revision| | Numeric | |
| |OpenXML|Core|LastPrinted| | DateTime (read-only) | |
| |OpenXML|Core|Thumbnail| | Thumbnail (read-only) | |
| |OpenXML|App|Application| | String | |
| |OpenXML|App|AppVersion| | String | Specifies the version of the application that created the document file. |
| |OpenXML|App|Template| | String (read-only) | |
| |OpenXML|App|Company| | String | |
| |OpenXML|App|Pages| | Numeric (read-only) | |
| |OpenXML|App|DocSecurity| | Numeric (read-only) | Specifies the security level of a document as a simple numeric value. |
| |OpenXML|App|ScaleCrop| | Boolean | |
| |OpenXML|App|HyperlinksChanged| |OpenXML|App|LinksUpToDate| | Boolean (read-only) | |
| |OpenXML|App|SharedDoc| | Boolean (read-only) | Indicates if this document is currently shared between multiple producers. If set to TRUE, producers should take care when updating the document. |
| |OpenXML|App|TotalTime| | Numeric (read-only) | |
| |OpenXML|App|Manager| | String | |
| |OpenXML|App|HyperlinkBase| | String | |
| |OpenXML|App|PresentationFormat| | String | |
| |OpenXML|App|Slides| |OpenXML|App|Notes| |OpenXML|App|HiddenSlides| |OpenXML|App|MMClips| | Numeric (read-only) | |
Custom-defined OpenXML items are also extracted. Note that not all file formats include all of the items above; some are used only in Word, others only in PowerPoint or Excel. More details can be found on the OOXML Wikipedia page.
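For orientation, the Core items listed above are stored as XML elements in a part of the ZIP package that is conventionally named docProps/core.xml. The sketch below reads that part with the Python standard library and reports the values under the descriptor paths used in this document. The element-to-descriptor mapping is an assumption made for illustration, and the fixed part name is a simplification (a strict reader would locate the part through the package relationships).

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Assumed mapping from descriptor names to core.xml elements.
CORE_ELEMENTS = {
    "Title": "dc:title",
    "Creator": "dc:creator",
    "Description": "dc:description",
    "Subject": "dc:subject",
    "Keywords": "cp:keywords",
    "LastModifiedBy": "cp:lastModifiedBy",
    "Category": "cp:category",
    "ContentStatus": "cp:contentStatus",
    "Revision": "cp:revision",
    "Created": "dcterms:created",
    "Modified": "dcterms:modified",
    "LastPrinted": "cp:lastPrinted",
}


def read_core_items(source) -> dict:
    """Return {descriptor path: text} for the core properties that are set.

    `source` is a file name or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    items = {}
    for name, tag in CORE_ELEMENTS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            items[f"|OpenXML|Core|{name}|"] = element.text
    return items
```

Calling `read_core_items("report.docx")` on a hypothetical file would return a dictionary keyed by paths such as `|OpenXML|Core|Title|`. Items that are absent or empty are omitted.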
Synchronization of COM and OpenXML
MS Office 2007/2010 applications (Word, Excel, PowerPoint) use a new file format structure called OpenXML that also has an extended metadata schema. Corresponding metadata items in COM (MS Office 97-2003) and OpenXML may be mapped to a single InfoElement for consistency.
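One way to picture this mapping is a lookup table that pairs each InfoElement with its COM and OpenXML descriptor paths. The sketch below is illustrative only: the InfoElement names and the specific pairings (for example Author with Creator, Comments with Description) are assumptions, not the plug-in's documented configuration.

```python
# Hypothetical InfoElement names mapped to (COM path, OpenXML path).
SYNC_MAP = {
    "Title": ("|COM|Summary|Title|", "|OpenXML|Core|Title|"),
    "Subject": ("|COM|Summary|Subject|", "|OpenXML|Core|Subject|"),
    "Author": ("|COM|Summary|Author|", "|OpenXML|Core|Creator|"),
    "Keywords": ("|COM|Summary|Keywords|", "|OpenXML|Core|Keywords|"),
    "Comments": ("|COM|Summary|Comments|", "|OpenXML|Core|Description|"),
    "Company": ("|COM|DocSummary|Company|", "|OpenXML|App|Company|"),
    "Manager": ("|COM|DocSummary|Manager|", "|OpenXML|App|Manager|"),
}

# Reverse index: descriptor path -> InfoElement name.
PATH_TO_ELEMENT = {
    path: element for element, paths in SYNC_MAP.items() for path in paths
}


def to_info_elements(raw_items: dict) -> dict:
    """Collapse raw source elements into one value per InfoElement.

    Paths without a mapping are ignored; the first value seen wins.
    """
    merged = {}
    for path, value in raw_items.items():
        element = PATH_TO_ELEMENT.get(path)
        if element is not None:
            merged.setdefault(element, value)
    return merged
```

With such a table, a DOC file and a DOCX file that carry the same author both yield a single "Author" entry, regardless of which container the value came from.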
Metadata Scrubbing
Scrubbing metadata is not yet implemented. Most metadata items in COM and OOXML files cannot be completely removed; they can only be overwritten with an empty value. In addition, older values of metadata fields may still be stored within the file structure. Other information (prior edits, additional internal metadata, metadata created by application plug-ins) inside the file will not be removed or overwritten.
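To make "overwritten with an empty value" concrete for the OpenXML case, the sketch below copies a package and empties the text of the elements in the core properties part. It is an illustration using the Python standard library, not the plug-in's behavior; the function name, the `keep` parameter and the fixed part name docProps/core.xml are assumptions. As described above, it leaves every other part of the file untouched, so it is not a complete scrub.

```python
import zipfile
import xml.etree.ElementTree as ET

CORE_PART = "docProps/core.xml"  # conventional location; an assumption here

# Keep the usual prefixes when the XML is written back out.
ET.register_namespace(
    "cp", "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
)
ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")
ET.register_namespace("dcterms", "http://purl.org/dc/terms/")
ET.register_namespace("xsi", "http://www.w3.org/2001/XMLSchema-instance")


def blank_core_properties(src, dst, keep=()):
    """Copy an OOXML package, emptying the text of core property elements.

    `src` and `dst` are file names or binary file-like objects. `keep`
    lists local element names (e.g. "created") to leave unchanged.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename == CORE_PART:
                root = ET.fromstring(data)
                for element in root:
                    local_name = element.tag.rsplit("}", 1)[-1]
                    if local_name not in keep:
                        element.text = None  # element stays, value is emptied
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            zout.writestr(info.filename, data)
```

Date elements are a likely candidate for `keep`, since an empty date value may not be accepted by every consumer of the file.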
