Generating Reports Automatically from Document Templates with Open XML

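Generating reports automatically from document templates with Open XML: how to create a document template and modify its contents programmatically. This article only covers replacing text and images.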

This article is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike license.

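The Open XML SDK is a class library from Microsoft for editing and manipulating MS Office documents. With it we can create and edit Office documents programmatically. There is a version requirement, of course: only the Office 2007+ file formats are supported.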

Open XML SDK download: click this link

Developer blog: http://openxmldeveloper.org/

Microsoft documentation: http://msdn.microsoft.com/zh-cn/library/bb448854.aspx

Source code for this article: click here to download

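Starting with Office 2007, Microsoft implemented the Office suite on a new, XML-based architecture. If we add a .zip extension to a Word 2007 or Word 2010 document and open it with an archive tool, we can see that the document consists of a set of XML files, as shown in the figure below: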

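The figure above shows how a Word document is put together. The word directory holds the key content: word/media contains the multimedia resources used by the document, such as images and sounds; word/theme contains the document's theme definition, such as fonts, somewhat like a website's CSS file; and word/document.xml contains the actual content, such as the text, layout and image references. document.xml is the file we will focus on. Below is the word/document.xml of a document containing just the single line "羅朝輝的blog:http://kesalin.cnblogs.com":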

  <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
  <w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 wp14">
    <w:body>
      <w:p w:rsidR="00111330" w:rsidRDefault="000D4700">
        <w:r>
          <w:rPr>
            <w:rFonts w:ascii="Verdana" w:hAnsi="Verdana" />
            <w:color w:val="000000" />
            <w:szCs w:val="21" />
            <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" />
          </w:rPr>
          <w:t>羅朝輝的</w:t>
        </w:r>
        <w:proofErr w:type="spellStart" />
        <w:r>
          <w:rPr>
            <w:rFonts w:ascii="Verdana" w:hAnsi="Verdana" />
            <w:color w:val="000000" />
            <w:szCs w:val="21" />
            <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" />
          </w:rPr>
          <w:t>blog</w:t>
        </w:r>
        <w:r w:rsidR="00984A94">
          <w:rPr>
            <w:rFonts w:ascii="Verdana" w:hAnsi="Verdana" w:hint="eastAsia" />
            <w:color w:val="000000" />
            <w:szCs w:val="21" />
            <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" />
          </w:rPr>
          <w:t>:</w:t>
        </w:r>
        <w:hyperlink r:id="rId5" w:history="1">
          <w:r w:rsidR="00984A94" w:rsidRPr="00984A94">
            <w:rPr>
              <w:rStyle w:val="a3" />
              <w:rFonts w:ascii="Verdana" w:hAnsi="Verdana" w:hint="eastAsia" />
              <w:szCs w:val="21" />
              <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" />
            </w:rPr>
            <w:t>http</w:t>
          </w:r>
          <w:proofErr w:type="spellEnd" />
          <w:r w:rsidR="00984A94" w:rsidRPr="00984A94">
            <w:rPr>
              <w:rStyle w:val="a3" />
              <w:rFonts w:ascii="Verdana" w:hAnsi="Verdana" w:hint="eastAsia" />
              <w:szCs w:val="21" />
              <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" />
            </w:rPr>
            <w:t>://kesalin.cnblogs.com</w:t>
          </w:r>
        </w:hyperlink>
        <w:bookmarkStart w:id="0" w:name="_GoBack" />
        <w:bookmarkEnd w:id="0" />
      </w:p>
      <w:sectPr w:rsidR="00111330">
        <w:pgSz w:w="11906" w:h="16838" />
        <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="851" w:footer="992" w:gutter="0" />
        <w:cols w:space="425" />
        <w:docGrid w:type="lines" w:linePitch="312" />
      </w:sectPr>
    </w:body>
  </w:document>

The XML above looks messy, but viewed in the tool that ships with the Open XML SDK its structure is clear at a glance:

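From this we can clearly see the structure of a Word document. A Word document has one main document element, which contains a body element; body contains paragraph or table elements; a paragraph contains run elements, and a run contains text elements. A table contains tableRow elements, a tableRow contains tableCell elements, and a tableCell is a container for block-level content such as paragraphs (which in turn contain runs) or nested tables. For the detailed hierarchy, see the MSDN article "控制 Open XML WordprocessingML 文檔中文本" (on working with text in Open XML WordprocessingML documents).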
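To make the hierarchy concrete, here is a small sketch that is not part of the original project: it parses a document.xml string with LINQ to XML (System.Xml.Linq, so no Open XML SDK is needed) and walks body -> p -> r -> t to recover the text of each paragraph. The BodyText class and its method name are illustrative. Because it collects runs at any depth under a paragraph, text inside a w:hyperlink is included as well.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Illustrative helper (not from the original project): walks
// document -> body -> p -> r -> t in a WordprocessingML document.xml.
public static class BodyText
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the concatenated text of each paragraph directly under w:body.
    public static List<string> Paragraphs(string documentXml)
    {
        XElement body = XDocument.Parse(documentXml).Root.Element(W + "body");
        if (body == null)
        {
            return new List<string>();
        }

        return body.Elements(W + "p")
                   // runs may sit directly in the paragraph or inside w:hyperlink
                   .Select(p => string.Concat(p.Descendants(W + "r")
                                               .Elements(W + "t")
                                               .Select(t => t.Value)))
                   .ToList();
    }
}
```

For the sample document above, the single paragraph would come back as 羅朝輝的blog:http://kesalin.cnblogs.com, since the proofErr and bookmark elements carry no text.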

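With this background in place, let's get to the main topic: how to create a document template and modify its contents programmatically. Only replacing text and images is covered here.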

I. First, create the document template.

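Open Word 2010 or 2007 and make the Developer tab visible on the ribbon. In Word 2010, go to File -> Options -> Customize Ribbon and check Developer (in Word 2007 the equivalent is the "Show Developer tab in the Ribbon" option under Word Options -> Popular).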

Then add text and picture content controls to the document, as shown in the figure below:

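To add one: insert a content control and give it default content (text or a picture). Then select the content control, click Developer -> Properties, and give it a Title or Tag (the tagID). This step is important: the tag is what identifies the content control, and our code locates each control by its tag, so every tag must be unique within the template.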
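Word itself does not stop you from giving two controls the same tag, so a quick pre-flight check on the template can help. The sketch below is a hypothetical helper (the TagCheck class is not in the original download) that scans a document.xml string with LINQ to XML and reports every w:tag value that appears on more than one content control.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical pre-flight check: the code in this article locates a content
// control by its tag, so a tag used twice would make the lookup ambiguous.
public static class TagCheck
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns every tag value that appears on more than one content control.
    public static List<string> DuplicateTags(string documentXml)
    {
        return XDocument.Parse(documentXml)
            .Descendants(W + "sdtPr")            // properties of each content control
            .Elements(W + "tag")
            .Select(t => (string)t.Attribute(W + "val"))
            .Where(v => v != null)
            .GroupBy(v => v)
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .ToList();
    }
}
```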

The final result (see Template.docx in the download):

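As the figure above shows, we have added rich text, plain text and picture content controls. Next we will replace the contents of these placeholder controls in code, which is the key technique for generating report documents automatically.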

If we open document.xml and look at the part that holds a text content control, its layout is easy to see:

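The figure shows that the text content control is wrapped in an sdt (Structured Document Tag) element. As described earlier, text content always ends up inside a Run -> Text element, so the replacement only needs to find the sdt element by the content control's tag and replace the contents of its Text element. Images are handled the same way, with a few extra details to take care of. Every content control is wrapped in some sdt element, which may be an SdtBlock, SdtCell, SdtRun and so on; all of these are subclasses of SdtElement.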
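For comparison, the same find-by-tag-and-replace idea can be sketched directly on document.xml with LINQ to XML, without the Open XML SDK. This is an illustrative alternative, not the author's code; the SdtText class and its behaviour (the first w:t in the control gets the new value, any other w:t in it is emptied) are assumptions of this sketch.

```csharp
using System.Linq;
using System.Xml.Linq;

// Illustrative sketch using LINQ to XML instead of the Open XML SDK:
// find a w:sdt whose w:sdtPr/w:tag has the given w:val and put new text
// into the first w:t inside its w:sdtContent, emptying any other w:t.
public static class SdtText
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static bool Replace(XDocument doc, string tag, string value)
    {
        XElement sdt = doc.Descendants(W + "sdt").FirstOrDefault(s =>
            (string)(s.Element(W + "sdtPr")?.Element(W + "tag")?.Attribute(W + "val")) == tag);
        if (sdt == null)
        {
            return false; // no content control with this tag
        }

        var texts = sdt.Element(W + "sdtContent")?.Descendants(W + "t").ToList();
        if (texts == null || texts.Count == 0)
        {
            return false; // control has no text to replace
        }

        // Placeholder text may be split across several runs; keep one, clear the rest
        texts[0].Value = value;
        foreach (var t in texts.Skip(1))
        {
            t.Value = string.Empty;
        }
        return true;
    }
}
```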

II. Opening and closing a Word document with Open XML.

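1. In Open XML, the class for working with Word documents is WordprocessingDocument; its methods make it easy to open and close a Word document. The WordprocessingDocument.Open overload used here takes two parameters: the document path, and a flag indicating whether to open the document for editing.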

        // Requires: using DocumentFormat.OpenXml.Packaging;

        /// <summary>
        /// Contains the word processing document
        /// </summary>
        private WordprocessingDocument _wordProcessingDocument;

        /// <summary>
        /// Contains the main document part
        /// </summary>
        private MainDocumentPart _mainDocPart;

        /// <summary>
        /// Open a Word document for editing
        /// </summary>
        /// <param name="docname">path of the document to be opened</param>
        public void OpenDocuemnt(string docname)
        {
            // open the .docx; true = open for editing
            _wordProcessingDocument = WordprocessingDocument.Open(docname, true);

            // get the main document part (word/document.xml)
            _mainDocPart = _wordProcessingDocument.MainDocumentPart;
        }

        /// <summary>
        /// Save changes and close the document
        /// </summary>
        public void CloseDocument()
        {
            // Dispose saves pending changes of an editable package and releases the file.
            // Close() did the same in SDK 2.x, but newer SDK releases drop Close().
            if (_wordProcessingDocument != null)
            {
                _wordProcessingDocument.Dispose();
                _wordProcessingDocument = null;
            }
        }

After opening the document, we get the main document part (the word/document.xml part).
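What MainDocumentPart resolves to can be seen in the package itself. The sketch below is illustrative and independent of the Open XML SDK: it opens a .docx with System.IO.Compression, reads the package relationships in _rels/.rels and returns the target of the officeDocument relationship, which for documents saved by Word is normally word/document.xml. The MainPartLocator class is hypothetical, and the sketch assumes the transitional relationship type URI (Strict Open XML files use a different namespace).

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Illustrative, SDK-free sketch of what "get the main document part" means:
// the package-level relationships (_rels/.rels) point at the main part.
public static class MainPartLocator
{
    static readonly XNamespace Rel = "http://schemas.openxmlformats.org/package/2006/relationships";
    const string OfficeDocType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    // Returns the part name of the main document, or null if none is found.
    public static string FindMainPartName(Stream docxStream)
    {
        // leaveOpen: true so the caller keeps ownership of the stream
        using (var zip = new ZipArchive(docxStream, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry rels = zip.GetEntry("_rels/.rels");
            if (rels == null)
            {
                return null;
            }

            using (var s = rels.Open())
            {
                string target = XDocument.Load(s).Root
                    .Elements(Rel + "Relationship")
                    .Where(r => (string)r.Attribute("Type") == OfficeDocType)
                    .Select(r => (string)r.Attribute("Target"))
                    .FirstOrDefault();

                // Targets may be written as "word/document.xml" or "/word/document.xml"
                return target?.TrimStart('/');
            }
        }
    }
}
```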

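2. Next, let's replace the text content controls in the document. Let's try a TDD-style approach: first, we know the tag of each content control and the text we want to put in it. These two things are our input: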

var textDict = new Dictionary<string, string>
                               {
                                   {"TextPlaceholder_01", "SdtBlock替換文本"},
                                   {"PH_Name", "張三"},
                                   {"PH_Age", "18"},
                                   {"PH_Class", "C82"},
                                   {"PH_Grade", "83.0"},
                                   {"PH_SdtRun", "SdtRun替換"},
                               };

Then we want to call a method that replaces the text of every text content control in the template whose tag matches:

        // Requires: using System.Collections.Generic; using System.Linq;
        //           using DocumentFormat.OpenXml; using DocumentFormat.OpenXml.Wordprocessing;

        /// <summary>
        /// Updates text placeholders with the given texts.
        /// </summary>
        /// <param name="tagValueDict">Pairs of placeholder tagID and replacement text.</param>
        public void UpdateText(Dictionary<string, string> tagValueDict)
        {
            foreach (var pair in tagValueDict)
            {
                var tagID = pair.Key;
                var value = pair.Value;

                foreach (var sdtElement in _mainDocPart.Document.Body.Descendants<SdtElement>())
                {
                    // Skip content controls without a tag or with a different tag
                    Tag tag = sdtElement.SdtProperties?.GetFirstChild<Tag>();
                    if (tag?.Val?.Value != tagID)
                    {
                        continue;
                    }

                    // Block and cell controls hold a Paragraph; inline (SdtRun) controls hold an SdtContentRun
                    OpenXmlElement parentElement = sdtElement.Descendants<Paragraph>().FirstOrDefault();
                    if (null == parentElement)
                    {
                        parentElement = sdtElement.Descendants<SdtContentRun>().FirstOrDefault();
                    }

                    if (null != parentElement)
                    {
                        // Text may be split across several runs: put the new value into the
                        // first Text element of this paragraph/run content and empty the others
                        var texts = parentElement.Descendants<Text>().ToList();
                        if (texts.Count > 0)
                        {
                            texts[0].Text = value;
                            for (int i = 1; i < texts.Count; i++)
                            {
                                texts[i].Text = string.Empty;
                            }
                        }

                        break;
                    }
                }
            }
        }

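The code above walks through all sdt elements in the body. When an sdt element's tag equals the tag we are looking for, we have found the corresponding content control; we then go to the Run/Text inside that sdt element and put the new content into its Text element. Only the main document body is searched; content controls in headers and footers live in separate parts and are not covered.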

3. Now let's see how to replace images, again TDD-style. First, we have the tags of the picture content controls and the image resources:

var imageDict = new Dictionary<string, MemoryStream>
                                {
                                    {"PH_ImageInSdtBlock_01", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtCell_01", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtCell_02", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtRun", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtBlock_02", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                };
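GetStreamFromImage ships with the article's download and its body is not shown here. A minimal stand-in, assuming it simply loads the image file into memory, could look like this (the ImageLoader class is hypothetical):

```csharp
using System.IO;

public static class ImageLoader
{
    // Hypothetical stand-in for GetStreamFromImage: read the whole file into
    // memory so the stream starts at position 0 and no file handle stays open.
    public static MemoryStream GetStreamFromImage(string path)
    {
        return new MemoryStream(File.ReadAllBytes(path));
    }
}
```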

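Then we want to call a method that replaces the picture in every picture content control of the template whose tag matches. As mentioned earlier, image resources are stored in the media directory. Open XML manages these resources: each one is assigned a relationship id (rId), and other parts refer to the resource through that id. So we need to find the picture content control, get the id of the image resource it references, use that id to look up related information such as the placeholder image's size, and then replace the resource behind that id. Here is the code: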

        // Types: Tag/SdtElement from DocumentFormat.OpenXml.Wordprocessing,
        //        Blip from DocumentFormat.OpenXml.Drawing
        internal static string GetImageRelID<TSdtType>(TSdtType sdt, string imageTag) where TSdtType : SdtElement
        {
            // loop through all tags within the sdt element
            foreach (Tag t in sdt.Descendants<Tag>())
            {
                // Do we have the correct tag? (case-insensitive, safe if the tag has no value)
                if (string.Equals(t.Val?.Value, imageTag, StringComparison.OrdinalIgnoreCase))
                {
                    // Get the BLIP for the image - there is only one image per placeholder so no need to loop through anything
                    Blip b = sdt.Descendants<Blip>().FirstOrDefault();
                    if (null != b && null != b.Embed)
                    {
                        // return the relationship id (r:embed) of the image part
                        return b.Embed.Value;
                    }
                }
            }

            return string.Empty;
        }

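The code above looks inside a given sdt element for the id of the image resource used by the content control with the matching tag. We then use that resource id to get the size of the placeholder image: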

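        // Drawing: DocumentFormat.OpenXml.Wordprocessing.Drawing
        // Extent:  DocumentFormat.OpenXml.Drawing.Wordprocessing.Extent (wp:extent)
        // Extents: DocumentFormat.OpenXml.Drawing.Extents (a:ext)
        internal static void GetPlaceholderImageSize(IEnumerable<Drawing> drawingList, string relID, out int width, out int height)
        {
            width = -1;
            height = -1;

            // The document size is in EMU; at 96 DPI, 1 pixel = 9525 EMU
            const long EmuPerPixel = 9525;

            // Loop through all Drawing elements in the document
            foreach (Drawing d in drawingList)
            {
                // Does this drawing reference the picture (Blip) with the given relationship id?
                if (!d.Descendants<Blip>().Any(b => b.Embed != null && b.Embed.Value == relID))
                {
                    continue;
                }

                // The displayed size of the image placeholder is located in the EXTENT element
                Extent e = d.Descendants<Extent>().FirstOrDefault();
                if (null != e && null != e.Cx && null != e.Cy)
                {
                    width = (int)(e.Cx.Value / EmuPerPixel);
                    height = (int)(e.Cy.Value / EmuPerPixel);
                }
                else
                {
                    // Otherwise fall back to the EXTENTS element of the picture's shape properties
                    Extents e2 = d.Descendants<Extents>().FirstOrDefault();
                    if (null != e2 && null != e2.Cx && null != e2.Cy)
                    {
                        width = (int)(e2.Cx.Value / EmuPerPixel);
                        height = (int)(e2.Cy.Value / EmuPerPixel);
                    }
                }

                // Use the first drawing that shows this image
                break;
            }
        }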
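The 9525 constant follows from the definition of the EMU (English Metric Unit): 1 inch = 914400 EMU, so at 96 DPI one pixel is 914400 / 96 = 9525 EMU. A small hypothetical helper that makes the DPI explicit, and rounds instead of truncating as the code above does, might look like this:

```csharp
using System;

// Hypothetical conversion helper; 914400 EMU per inch is fixed by the format,
// while the 96 DPI default is only the usual screen assumption.
public static class Emu
{
    public const long PerInch = 914400;

    public static int ToPixels(long emu, double dpi = 96)
    {
        return (int)Math.Round(emu * dpi / PerInch);
    }

    public static long FromPixels(int pixels, double dpi = 96)
    {
        return (long)Math.Round(pixels * PerInch / dpi);
    }
}
```

FromPixels is handy in the other direction, for example when a picture's wp:extent has to be written for a new image size.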

Once we have the size, we can use the resource id and the size to replace the placeholder image with the new image:

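        // Requires: using System.Drawing; using System.IO; using DocumentFormat.OpenXml.Packaging;
        private void UpdateImagePart(string relID, MemoryStream imageStream, int width, int height)
        {
            using (var originalBitmap = Image.FromStream(imageStream))
            {
                // resize the image to the placeholder size when it is known
                Image bitmap = width > 0 && height > 0
                    ? new Bitmap(originalBitmap, width, height)
                    : originalBitmap;

                using (var stream = new MemoryStream())
                {
                    // Encode the (possibly resized) image in the new image's own format.
                    // The ImagePart keeps its original content type, so the new image
                    // should ideally use the same format as the placeholder image.
                    bitmap.Save(stream, originalBitmap.RawFormat);
                    stream.Position = 0;

                    // Get the ImagePart
                    var imagePart = (ImagePart)_mainDocPart.GetPartById(relID);

                    // FeedData replaces the part's data completely; writing into
                    // GetStream() would overwrite in place and could leave trailing
                    // bytes of a larger old image behind
                    imagePart.FeedData(stream);
                }

                if (!ReferenceEquals(bitmap, originalBitmap))
                {
                    bitmap.Dispose();
                }
            }
        }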
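The relID used above is a relationship id: document.xml only stores r:embed="rIdN", and word/_rels/document.xml.rels maps that id to the actual part, for example media/image1.jpeg; in the SDK, GetPartById performs this lookup for us. The illustrative, SDK-free sketch below parses such a .rels file with LINQ to XML to show the mapping (the Relationships class is hypothetical).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Illustrative, SDK-free view of the rId indirection: a .rels file maps each
// relationship id to a target such as media/image1.jpeg or an external URL.
public static class Relationships
{
    static readonly XNamespace Rel = "http://schemas.openxmlformats.org/package/2006/relationships";

    // Returns a map from relationship id (e.g. "rId5") to its Target attribute.
    public static Dictionary<string, string> Load(string relsXml)
    {
        return XDocument.Parse(relsXml).Root
            .Elements(Rel + "Relationship")
            .ToDictionary(r => (string)r.Attribute("Id"),
                          r => (string)r.Attribute("Target"));
    }
}
```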

Putting these together gives us the method for updating images:

        public void UpdateImage(Dictionary<string, MemoryStream> tagValueDict)
        {
            foreach (var pair in tagValueDict)
            {
                var tagID = pair.Key;
                var imageStream = pair.Value;

                foreach (SdtElement sdtElement in _mainDocPart.Document.Body.Descendants<SdtElement>())
                {
                    string relID = GetImageRelID(sdtElement, tagID);
                    if (!string.IsNullOrEmpty(relID))
                    {
                        // Get size of image
                        int imageWidth;
                        int imageHeight;
                        GetPlaceholderImageSize(_mainDocPart.Document.Body.Descendants<Drawing>(), relID, out imageWidth, out imageHeight);

                        UpdateImagePart(relID, imageStream, imageWidth, imageHeight);

                        break;
                    }
                }
            }
        }

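III. Testing

Write a console test program that copies the template document to an output document and then replaces the text and images in the output document: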

        static void Main()
        {
            const string templateDocx = @"..\..\Template.docx";
            const string outputDocx = @"..\..\Output.docx";

            // copy the word doc so you can see the difference between the two
            File.Delete(outputDocx);
            File.Copy(templateDocx, outputDocx);

            var contentControlManager = new ContentControlManager();
            contentControlManager.OpenDocuemnt(outputDocx);

            var textDict = new Dictionary<string, string>
                               {
                                   {"TextPlaceholder_01", "SdtBlock替換文本"},
                                   {"PH_Name", "張三"},
                                   {"PH_Age", "18"},
                                   {"PH_Class", "C82"},
                                   {"PH_Grade", "83.0"},
                                   {"PH_SdtRun", "SdtRun替換"},
                               };

            contentControlManager.UpdateText(textDict);

            var imageDict = new Dictionary<string, MemoryStream>
                                {
                                    {"PH_ImageInSdtBlock_01", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtCell_01", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtCell_02", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtRun", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                    {"PH_ImageInSdtBlock_02", GetStreamFromImage(@"..\..\TestImage.jpg")},
                                };

            contentControlManager.UpdateImage(imageDict);

            contentControlManager.CloseDocument();
        }

Open the generated Output.docx and you can see that the contents have been replaced:


posted @ 2012-04-18 18:32 飄飄白雲
