Rahul Sharma

Solutions Architect - Microsoft Dynamics AX | Azure



Office Open XML and Dynamics Ax

Ever wondered what's special about the new Microsoft Office file formats, and why every file extension in Office 2007 or later ends with an X, as in docx, xlsx, and pptx?

The X stands for XML. These files follow the Office Open XML File Format specification, which originated at Microsoft and is now an open, international standard: ECMA-376 (Second Edition) and ISO/IEC 29500. That means Office document formats are no longer hidden from developers or proprietary to Microsoft; they are open to everyone. Once you understand the format, you can read, write, or modify Office documents without even installing MS Office on your computer.

The purpose of the Open XML standard is to decouple documents from the Microsoft Office applications that create them, so that other applications can manipulate the documents without depending on proprietary formats and without loss of data.

How does it work?

Office Open XML = ZIP compressed package + XML
This is a ZIP-based file format that describes an Office document in multiple XML files and packages them all together in a compressed ZIP file. An Open XML document is made up of multiple document parts containing XML markup. Because this is plain XML, you can view the content in any text editor and modify it the same way.

Need proof?
  • Create a Word document and type Hello World. Make the first letter of each word bold and italic as well.
  • Now save this document and change its extension from .docx to .zip.
  • Extract this zip file and see what it contains. There are a few folders and XML files in it.

These folders and XML files contain everything about an Office file. You can use any programming language to read or write these XML files. For reference, the main document content is in the file document.xml inside the folder word.
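
The same inspection can be done from code instead of renaming the file. The following is a minimal sketch, assuming .NET Framework 4.5 or later (for System.IO.Compression.ZipArchive); the class name PackageInspector and its two methods are hypothetical names chosen for illustration, not part of any SDK.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class PackageInspector
{
    // Returns the full name of every entry (part) stored in the ZIP package.
    public static List<string> ListParts(Stream package)
    {
        var names = new List<string>();
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                names.Add(entry.FullName);
            }
        }
        return names;
    }

    // Reads one part, for example "word/document.xml", as text.
    // Returns null when the package has no such part.
    public static string ReadPart(Stream package, string partName)
    {
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = zip.GetEntry(partName);
            if (entry == null)
            {
                return null;
            }
            using (var reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd();
            }
        }
    }

    public static void Main(string[] args)
    {
        // args[0] is assumed to be the path to a .docx file,
        // such as the Hello World document created above.
        using (FileStream package = File.OpenRead(args[0]))
        {
            foreach (string name in ListParts(package))
            {
                Console.WriteLine(name);
            }
            Console.WriteLine(ReadPart(package, "word/document.xml"));
        }
    }
}
```

No rename to .zip is needed here, because the reader only looks at the file content, not at the extension.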

To understand this XML, you can use a Microsoft tool created specifically for reading Open XML files:
Open XML SDK 2.0 Productivity Tool for Microsoft Office
Below is the document.xml file of our Hello World document:

    <w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
                xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
                xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml"
                xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
                xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
                xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
                xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
                xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 wp14">
      <w:body>
        <w:p w:rsidR="006D0D97" w:rsidRDefault="00334C80" w14:paraId="2DC4F30E" w14:textId="77777777">
          <w:r w:rsidRPr="00334C80">
            <w:rPr>
              <w:b />
              <w:i />
            </w:rPr>
            <w:t>H</w:t>
          </w:r>
          <w:r>
            <w:t xml:space="preserve">ello </w:t>
          </w:r>
          <w:r w:rsidRPr="00334C80">
            <w:rPr>
              <w:b />
              <w:i />
            </w:rPr>
            <w:t>W</w:t>
          </w:r>
          <w:r>
            <w:t>orl</w:t>
          </w:r>
          <w:bookmarkStart w:name="_GoBack" w:id="0" />
          <w:bookmarkEnd w:id="0" />
          <w:r>
            <w:t>d</w:t>
          </w:r>
        </w:p>
        <w:sectPr w:rsidR="006D0D97">
          <w:pgSz w:w="12240" w:h="15840" />
          <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0" />
          <w:cols w:space="720" />
          <w:docGrid w:linePitch="360" />
        </w:sectPr>
      </w:body>
    </w:document>

To understand this XML better, modify your Word document or modify this document.xml and see what comes out. To get you started, here is a brief explanation of some tags:
  • The root tag is <w:document>. 'w' is the XML namespace prefix declared by the xmlns:w attribute of this tag.
  • Content always goes into a specific hierarchy of tags: Paragraph --> Run --> Text.

    <w:p>
      <w:r>
        <w:t>your content.</w:t>
      </w:r>
    </w:p>
  • These tags have properties as well, and most of them are self-explanatory. Check how this XML defines the content of our Hello World document, and what it does to make text bold or italic.
    Note: If you are inserting a space before or after your content, pay attention to the xml:space="preserve" attribute of the text node.

    <w:t xml:space="preserve">ello </w:t>
    If you don't specify that the space should be preserved, you won't see this space in your document. Try playing with this XML in Notepad, then open your Word document to see your changes.
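
To make the Paragraph --> Run --> Text hierarchy concrete, here is a small sketch that walks it with LINQ to XML and joins the text of all runs in each paragraph. The class and method names (PlainTextReader, ReadParagraphs) are hypothetical, and the sample XML is a trimmed-down version of the Hello World document above. The sketch returns the raw text values as stored in the XML; it does not try to emulate how Word treats whitespace when xml:space is missing.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class PlainTextReader
{
    // The WordprocessingML namespace that the "w" prefix is bound to.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one string per paragraph, built by joining the text of its runs.
    public static string[] ReadParagraphs(string documentXml)
    {
        XDocument doc = XDocument.Parse(documentXml, LoadOptions.PreserveWhitespace);
        return doc.Descendants(W + "p")
                  .Select(p => string.Concat(
                      p.Elements(W + "r").Elements(W + "t").Select(t => t.Value)))
                  .ToArray();
    }

    public static void Main()
    {
        // Trimmed-down Hello World document: four runs in one paragraph.
        string xml =
            "<w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
            "<w:body><w:p>" +
            "<w:r><w:rPr><w:b/><w:i/></w:rPr><w:t>H</w:t></w:r>" +
            "<w:r><w:t xml:space='preserve'>ello </w:t></w:r>" +
            "<w:r><w:rPr><w:b/><w:i/></w:rPr><w:t>W</w:t></w:r>" +
            "<w:r><w:t>orld</w:t></w:r>" +
            "</w:p></w:body></w:document>";

        foreach (string line in ReadParagraphs(xml))
        {
            Console.WriteLine(line); // prints "Hello World"
        }
    }
}
```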
If you would rather not deal with this XML directly, Microsoft provides a dedicated SDK for Open XML, called Open XML SDK 2.0 for Microsoft Office. It provides a set of strongly typed .NET classes and an XML tool for working with Open XML.

After installing this SDK, create a .NET project and add references to DocumentFormat.OpenXml and WindowsBase to start playing with this developer-friendly document format.

To create the same document.xml as above in C#, use the code below:

    using DocumentFormat.OpenXml.Wordprocessing;
    using DocumentFormat.OpenXml;

    namespace GeneratedCode
    {
        public class GeneratedClass
        {
            // Creates a Document instance and adds its children.
            public Document GenerateDocument()
            {
                Document document1 = new Document() { MCAttributes = new MarkupCompatibilityAttributes() { Ignorable = "w14 wp14" } };
                document1.AddNamespaceDeclaration("wpc", "http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas");
                document1.AddNamespaceDeclaration("mc", "http://schemas.openxmlformats.org/markup-compatibility/2006");
                document1.AddNamespaceDeclaration("o", "urn:schemas-microsoft-com:office:office");
                document1.AddNamespaceDeclaration("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
                document1.AddNamespaceDeclaration("m", "http://schemas.openxmlformats.org/officeDocument/2006/math");
                document1.AddNamespaceDeclaration("v", "urn:schemas-microsoft-com:vml");
                document1.AddNamespaceDeclaration("wp14", "http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing");
                document1.AddNamespaceDeclaration("wp", "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing");
                document1.AddNamespaceDeclaration("w10", "urn:schemas-microsoft-com:office:word");
                document1.AddNamespaceDeclaration("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");
                document1.AddNamespaceDeclaration("w14", "http://schemas.microsoft.com/office/word/2010/wordml");
                document1.AddNamespaceDeclaration("wpg", "http://schemas.microsoft.com/office/word/2010/wordprocessingGroup");
                document1.AddNamespaceDeclaration("wpi", "http://schemas.microsoft.com/office/word/2010/wordprocessingInk");
                document1.AddNamespaceDeclaration("wne", "http://schemas.microsoft.com/office/word/2006/wordml");
                document1.AddNamespaceDeclaration("wps", "http://schemas.microsoft.com/office/word/2010/wordprocessingShape");

                Body body1 = new Body();

                Paragraph paragraph1 = new Paragraph() { RsidParagraphAddition = "006D0D97", RsidRunAdditionDefault = "00334C80",
                    ParagraphId = "2DC4F30E", TextId = "77777777" };

                Run run1 = new Run() { RsidRunProperties = "00334C80" };

                RunProperties runProperties1 = new RunProperties();
                Bold bold1 = new Bold();
                Italic italic1 = new Italic();

                runProperties1.Append(bold1);
                runProperties1.Append(italic1);
                Text text1 = new Text();
                text1.Text = "H";

                run1.Append(runProperties1);
                run1.Append(text1);

                Run run2 = new Run();
                Text text2 = new Text() { Space = SpaceProcessingModeValues.Preserve };
                text2.Text = "ello ";

                run2.Append(text2);

                Run run3 = new Run() { RsidRunProperties = "00334C80" };

                RunProperties runProperties2 = new RunProperties();
                Bold bold2 = new Bold();
                Italic italic2 = new Italic();

                runProperties2.Append(bold2);
                runProperties2.Append(italic2);
                Text text3 = new Text();
                text3.Text = "W";

                run3.Append(runProperties2);
                run3.Append(text3);

                Run run4 = new Run();
                Text text4 = new Text();
                text4.Text = "orl";

                run4.Append(text4);
                BookmarkStart bookmarkStart1 = new BookmarkStart() { Name = "_GoBack", Id = "0" };
                BookmarkEnd bookmarkEnd1 = new BookmarkEnd() { Id = "0" };

                Run run5 = new Run();
                Text text5 = new Text();
                text5.Text = "d";

                run5.Append(text5);

                paragraph1.Append(run1);
                paragraph1.Append(run2);
                paragraph1.Append(run3);
                paragraph1.Append(run4);
                paragraph1.Append(bookmarkStart1);
                paragraph1.Append(bookmarkEnd1);
                paragraph1.Append(run5);

                SectionProperties sectionProperties1 = new SectionProperties() { RsidR = "006D0D97" };
                PageSize pageSize1 = new PageSize() { Width = (UInt32Value)12240U, Height = (UInt32Value)15840U };
                PageMargin pageMargin1 = new PageMargin() { Top = 1440, Right = (UInt32Value)1440U, Bottom = 1440, Left = (UInt32Value)1440U,
                    Header = (UInt32Value)720U, Footer = (UInt32Value)720U, Gutter = (UInt32Value)0U };
                Columns columns1 = new Columns() { Space = "720" };
                DocGrid docGrid1 = new DocGrid() { LinePitch = 360 };

                sectionProperties1.Append(pageSize1);
                sectionProperties1.Append(pageMargin1);
                sectionProperties1.Append(columns1);
                sectionProperties1.Append(docGrid1);

                body1.Append(paragraph1);
                body1.Append(sectionProperties1);

                document1.Append(body1);
                return document1;
            }
        }
    }

The hierarchy is the same as in the XML: Document --> Body --> Paragraph --> Run --> Text.
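
A Document object on its own is only the content of document.xml; to get a file that Word can open, it has to be placed into a package. The following is a hedged sketch of that step using the Open XML SDK, written in the more compact nested-constructor style and leaving out the optional rsid attributes, bookmark, and section properties. The class name HelloWorldWriter and the output file name are assumptions made for illustration.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class HelloWorldWriter
{
    // Creates a new .docx package at the given path containing "Hello World",
    // with the first letter of each word in bold italic.
    public static void Write(string path)
    {
        using (WordprocessingDocument package =
            WordprocessingDocument.Create(path, WordprocessingDocumentType.Document))
        {
            // The main document part is stored as word/document.xml in the ZIP.
            MainDocumentPart mainPart = package.AddMainDocumentPart();

            mainPart.Document = new Document(
                new Body(
                    new Paragraph(
                        new Run(
                            new RunProperties(new Bold(), new Italic()),
                            new Text("H")),
                        new Run(
                            // Keep the trailing space after "ello".
                            new Text("ello ") { Space = SpaceProcessingModeValues.Preserve }),
                        new Run(
                            new RunProperties(new Bold(), new Italic()),
                            new Text("W")),
                        new Run(
                            new Text("orld")))));

            mainPart.Document.Save();
        }
    }

    public static void Main()
    {
        // "HelloWorld.docx" is an example file name; it is created in the working directory.
        Write("HelloWorld.docx");
    }
}
```

The nesting of the constructor calls mirrors the nesting of the tags, which makes it easier to compare the code with the XML it produces.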

Here is a real-world example that uses the Open XML SDK and the XML parser classes to modify a Word document. In this example, we find all the bookmarks in the Word document using XPath and modify the text nodes that fall between the bookmarks. After the modification, we save the XML back to the ZIP package. This gives us the basis of a smart and fast Word mail merge application to learn from. Although the example is in C#, it should give you an idea of how to do the same in any other language, since you can always modify XML files using the base XML classes of any modern programming language.

NOTE: Dynamics AX developers can use the rich Xml* system classes in X++ to modify Open XML directly, or create a .NET assembly using the Open XML SDK and add it as a reference in the AOT. AX has all of the classes mentioned in this C# example, so converting the example to X++ should not be a problem.

    using System.IO;
    using System.Xml;
    using DocumentFormat.OpenXml.Packaging;

    //xml w namespace
    string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    //Open the document as an Open XML package and extract the main document part.
    //currentFile is the path to the .docx file to modify.
    using (WordprocessingDocument wordPackage = WordprocessingDocument.Open(currentFile, true))
    {
        MainDocumentPart part = wordPackage.MainDocumentPart;

        //Setup the namespace manager so you can perform XPath queries
        //to search for bookmarks in the part.
        NameTable nt = new NameTable();
        XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
        nsManager.AddNamespace("w", wordmlNamespace);

        //Load the part's XML into an XmlDocument instance.
        XmlDocument xmlDoc = new XmlDocument(nt);
        using (Stream input = part.GetStream())
        {
            xmlDoc.Load(input);
        }

        //Iterate through the bookmarks. Bookmark ids are not guaranteed to run
        //from 0 to Count - 1, so loop over the nodes instead of counting up.
        foreach (XmlElement bookmarkStartNode in xmlDoc.DocumentElement.SelectNodes("//w:bookmarkStart", nsManager))
        {
            string id = bookmarkStartNode.GetAttribute("id", wordmlNamespace);

            //Get the text nodes that follow the bookmark start,
            //plus the bookmark end node for that ID.
            XmlNodeList followingNodesList = bookmarkStartNode.
                SelectNodes("following::w:t | following::w:bookmarkEnd[@w:id='" + id + "']", nsManager);

            foreach (XmlElement el in followingNodesList)
            {
                //Stop once the end of this bookmark is reached.
                if (el.LocalName == "bookmarkEnd") break;

                //play with the nodes that fall between your bookmark
            }
        }

        //Write the changes back to the document part.
        using (Stream output = part.GetStream(FileMode.Create))
        {
            xmlDoc.Save(output);
        }
    }
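
As one possible way to "play with the nodes" for a mail merge, the sketch below replaces the text between a bookmark's start and end tags. It works on an already loaded XmlDocument, so it can be tried without the SDK. The names BookmarkMerge and SetBookmarkText, the bookmark name Customer, and the sample XML are all assumptions made for illustration, and the bookmark name is assumed not to contain an apostrophe.

```csharp
using System;
using System.Xml;

public static class BookmarkMerge
{
    const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Puts newText into the first text node inside the named bookmark and
    // empties any further text nodes before the matching bookmarkEnd.
    // Returns true when at least one text node was changed.
    public static bool SetBookmarkText(XmlDocument doc, string bookmarkName, string newText)
    {
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("w", W);

        var start = (XmlElement)doc.SelectSingleNode(
            "//w:bookmarkStart[@w:name='" + bookmarkName + "']", ns);
        if (start == null)
        {
            return false;
        }
        string id = start.GetAttribute("id", W);

        // Text nodes after the bookmark start, plus the matching end marker.
        XmlNodeList following = start.SelectNodes(
            "following::w:t | following::w:bookmarkEnd[@w:id='" + id + "']", ns);

        bool firstTextNodeFound = false;
        foreach (XmlElement el in following)
        {
            if (el.LocalName == "bookmarkEnd")
            {
                break; // everything after this is outside the bookmark
            }
            el.InnerText = firstTextNodeFound ? "" : newText;
            firstTextNodeFound = true;
        }
        return firstTextNodeFound;
    }

    public static void Main()
    {
        // A made-up paragraph with one bookmark wrapped around a placeholder.
        string xml =
            "<w:document xmlns:w='" + W + "'><w:body><w:p>" +
            "<w:r><w:t xml:space='preserve'>Dear </w:t></w:r>" +
            "<w:bookmarkStart w:id='0' w:name='Customer'/>" +
            "<w:r><w:t>NAME</w:t></w:r>" +
            "<w:bookmarkEnd w:id='0'/>" +
            "<w:r><w:t>,</w:t></w:r>" +
            "</w:p></w:body></w:document>";

        var doc = new XmlDocument();
        doc.LoadXml(xml);

        SetBookmarkText(doc, "Customer", "Rahul");
        Console.WriteLine(doc.InnerText); // prints "Dear Rahul,"
    }
}
```

Writing the new value into the first text node and clearing the rest keeps the formatting of the first run, which is usually what a mail merge wants.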


Please join this blog if you liked this post.
Also, feel free to post your comments, feedback, or queries.