The Office Open XML (OOXML) format, introduced with MS Office 2007, is essentially composed of XML files stored inside a ZIP container.
When an OOXML file (like a .docx file) is protected with a password for reading, it is encrypted. The encrypted OOXML file is stored inside a Compound File Binary Format file, or what I like to call an OLE file. This is the “old” MS Office file format (like .doc), the default file format used before MS Office 2007.
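Since the two containers start with different magic bytes, looking at the first bytes of a file already hints at whether a .docx is a plain ZIP or wrapped in an OLE file. Here is a minimal sketch in Python; the helper name container_type is made up for this illustration, and an OLE container alone does not prove encryption (an old .doc is an OLE file too):

```python
# Magic bytes found at the very start of each container format.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary Format
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header


def container_type(data: bytes) -> str:
    """Classify a file by its leading bytes (hypothetical helper)."""
    if data.startswith(OLE_MAGIC):
        return "ole"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

To use it on a file, read the first 8 bytes in binary mode and pass them to container_type.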
This is what an encrypted .docx file looks like when analyzed with oledump:
Stream EncryptedPackage contains the encrypted document, and stream EncryptionInfo contains information necessary to help with the decryption of stream EncryptedPackage.
The structure of stream EncryptedPackage is simple:
First there’s an integer with the size of the encrypted document, followed by the encrypted document. If we decode the binary data for the integer with format-bytes.py, we get the size 11841:
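The same decoding can be sketched in a few lines of Python. This assumes the size field is an 8-byte little-endian unsigned integer, which is how I read the MS-OFFCRYPTO specification; the function name and the dummy payload are only for illustration:

```python
import struct


def split_encrypted_package(stream: bytes):
    """Return (declared_size, encrypted_bytes) for an EncryptedPackage stream.

    Assumption: the stream starts with an 8-byte little-endian size field.
    """
    (size,) = struct.unpack_from("<Q", stream, 0)
    return size, stream[8:]


# Size field for the example above (0x2E41), followed by dummy payload bytes.
example = bytes.fromhex("412E000000000000") + b"\x00" * 16
size, payload = split_encrypted_package(example)
print(size)  # 11841
```

The encrypted data that follows can be a bit longer than the declared size because of padding, so the size field is what tells you where the decrypted document ends.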
The EncryptionInfo stream starts with binary data (the version information), followed by more binary data or by XML data, depending on the version:
The first bytes specify the major and minor version used for the EncryptionInfo stream. This example is mostly XML:
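As a rough sketch of that layout: the version is stored as two 16-bit little-endian integers (major, then minor). For version 4.4, my reading of MS-OFFCRYPTO is that another 4-byte field follows before the XML starts. The sample bytes below are synthetic, not taken from the document analyzed here, and the function name is hypothetical:

```python
import struct


def parse_encryption_info(stream: bytes):
    """Return (major, minor, rest) from an EncryptionInfo stream."""
    major, minor = struct.unpack_from("<HH", stream, 0)
    return major, minor, stream[4:]


# Synthetic sample: version 4.4, a 4-byte field, then a stub XML document.
sample = bytes.fromhex("0400040040000000") + b'<?xml version="1.0"?><encryption/>'
major, minor, rest = parse_encryption_info(sample)

# Assumption: only for version 4.4 is the remainder (after 4 more bytes) XML.
xml = rest[4:].decode("utf-8") if (major, minor) == (4, 4) else None
```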
Which can be further parsed with xmldump.py:
To help identify which version is used, I developed an oledump plugin named plugin_office_crypto:
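The idea of mapping a version number to an encryption type can be sketched as follows. This is not the plugin's code; the mapping reflects my reading of MS-OFFCRYPTO (which uses the name "extensible" for what is called extended encryption here), and the function name is hypothetical:

```python
def encryption_type(major: int, minor: int) -> str:
    """Map an EncryptionInfo version to an encryption type (illustrative)."""
    if (major, minor) == (4, 4):
        return "agile"
    if major in (2, 3, 4) and minor == 2:
        return "standard"
    if major in (3, 4) and minor == 3:
        return "extensible"
    return "unknown"
```

Note that the agile check has to come first, since major version 4 also occurs with the other types.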
Depending on the version, different tools can be used to decrypt Office documents.
The Python program msoffcrypto-tool can only decrypt agile encryption (for the moment; it is a work in progress).
The C program msoffice-crypt can decrypt standard, extended and agile encryption.
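msoffcrypto-tool can also be used as a Python library. The sketch below relies on its OfficeFile, load_key and decrypt calls; the wrapper function and its parameter names are mine, and the package has to be installed separately (pip install msoffcrypto-tool):

```python
def decrypt_office_file(in_path: str, out_path: str, password: str) -> None:
    """Decrypt a password-protected Office file (illustrative wrapper)."""
    # Imported here so this snippet loads even when the package is missing.
    import msoffcrypto  # third-party package: msoffcrypto-tool

    with open(in_path, "rb") as fin, open(out_path, "wb") as fout:
        office_file = msoffcrypto.OfficeFile(fin)
        office_file.load_key(password=password)  # derive the key from the password
        office_file.decrypt(fout)                # write the decrypted document
```

The decrypted output is a regular OOXML file again, so it can then be analyzed like any other ZIP container.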
Sometimes, malicious documents will be encrypted to try to avoid detection. The victim will have to enter the password to open the document. There is one exception though: Excel documents encrypted with the password VelvetSweatshop. Excel tries this default password automatically, so such a document opens without prompting the user.