The Road to LINQ 20: LINQ to XML: Documents, Declarations, and Namespaces

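In this installment we look at several more concepts that matter when working with XML documents: Documents, Declarations, and Namespaces. An XDocument wraps a root element and lets us add an XDeclaration, processing instructions, a document type, and other root-level objects. A standard XML file normally begins with a declaration, whose job is to make sure the file is read and interpreted correctly by a parser. And just as .NET types can have namespaces, XML elements and attributes can have namespaces too, which help keep XML documents manageable.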

XDocument

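As mentioned earlier, an XDocument wraps a root element and lets us add an XDeclaration, processing instructions, a document type, and other root-level objects. Unlike the W3C DOM, the X-DOM in LINQ to XML treats the XDocument as optional: the X-DOM does not need it to tie all the objects together.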

XDocument provides the same functional constructors as XElement. Because XDocument derives from XContainer, it also supports the AddXXX, RemoveXXX, and ReplaceXXX methods. Unlike XElement, however, an XDocument accepts only limited content:

A single XElement object (the root element)
A single XDeclaration object
A single XDocumentType object
Any number of XProcessingInstruction objects
Any number of XComment objects

Of these, only the root element is required for a valid XDocument. The XDeclaration is optional; if it is omitted, default settings are applied during serialization.
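The content restrictions are enforced at run time. The following is a minimal sketch, assuming a console program with top-level statements; the element name "another" is made up for illustration. It tries to add a second root element and expects the X-DOM to reject it with an InvalidOperationException.

```csharp
using System;
using System.Xml.Linq;

// A comment plus a single root element is valid document content.
var doc = new XDocument(
    new XComment("only one root element is allowed"),
    new XElement("test", "data"));

// A second root element would make the document malformed,
// so the Add call is expected to throw.
bool rejected = false;
try
{
    doc.Add(new XElement("another"));   // "another" is a hypothetical name
}
catch (InvalidOperationException)
{
    rejected = true;
}

Console.WriteLine(rejected);        // True
Console.WriteLine(doc.Root.Name);   // test
```

Comments and processing instructions, by contrast, can be added any number of times.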

The simplest XDocument has just a root element:

            var doc = new XDocument(
                          new XElement("test", "data")
                      );

Note that we did not include an XDeclaration object. The file produced by calling doc.Save still contains an XML declaration, generated automatically from default settings.
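This behavior is easy to check. The sketch below assumes a writable temporary directory; the file name is arbitrary. It saves a document that has no XDeclaration and reads the file back.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

var doc = new XDocument(new XElement("test", "data"));

// The target path is an assumption made for this sketch.
string path = Path.Combine(Path.GetTempPath(), "xdocument-default-declaration.xml");
doc.Save(path);

string saved = File.ReadAllText(path);
Console.WriteLine(saved);

// No XDeclaration object exists, yet the file starts with a declaration.
Console.WriteLine(doc.Declaration == null);                         // True
Console.WriteLine(saved.StartsWith("<?xml", StringComparison.Ordinal));
```")) throw new Exception("saved file should contain the root element");
</test>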

The next example produces a simple but correct XHTML file, and shows every kind of content an XDocument can accept:

            var styleInstruction = new XProcessingInstruction(
                "xml-stylesheet", "href='styles.css' type='text/css'");
 
            var docType = new XDocumentType("html",
                "-//W3C//DTD XHTML 1.0 Strict//EN",
                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd", null);
 
            XNamespace ns = "http://www.w3.org/1999/xhtml";
 
            var root = new XElement(ns + "html",
                            new XElement(ns + "head",
                                new XElement(ns + "title", "An XHTML page")),
                            new XElement(ns + "body",
                                new XElement(ns + "p", "This is the content"))
                       );
 
            var doc = new XDocument(
                          new XDeclaration("1.0", "utf-8", "no"),
                          new XComment("Reference a stylesheet"),
                          styleInstruction,
                          docType,
                          root);

            doc.Save("D:\\test.html");

The resulting test.html file looks like this:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!--Reference a stylesheet-->
<?xml-stylesheet href='styles.css' type='text/css'?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>An XHTML page</title>
  </head>
  <body>
    <p>This is the content</p>
  </body>
</html>

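XDocument's Root property is a shortcut to the document's single XElement. In the other direction, XObject's Document property gives quick access to the containing XDocument; every object in an X-DOM tree inherits it: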

            Console.WriteLine(doc.Root.Name.LocalName);         // html
            XElement bodyNode = doc.Root.Element(ns + "body");
            Console.WriteLine(bodyNode.Document == doc);        // True

Recall that the children of a document have no Parent:

            Console.WriteLine(doc.Root.Parent == null); // True
            foreach (XNode node in doc.Nodes())
                Console.Write(node.Parent == null); // TrueTrueTrueTrue

XML Declarations

A standard XML file begins with a declaration such as the following:

        <?xml version="1.0" encoding="utf-8" standalone="no"?>

An XML declaration makes sure the file is parsed and understood correctly by a reader. XElement and XDocument follow these rules when emitting XML declarations:

Calling Save with a file name always writes a declaration.
Calling Save with an XmlWriter writes a declaration unless the XmlWriter is instructed otherwise.
The ToString method never emits an XML declaration.

We can instruct an XmlWriter not to produce a declaration by setting the OmitXmlDeclaration and ConformanceLevel properties of an XmlWriterSettings object when constructing the XmlWriter.
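As a rough illustration of the XmlWriter rule, the sketch below saves the same document through two writers that target a StringBuilder. Only OmitXmlDeclaration is set here; ConformanceLevel is left at its default, which is an assumption of this sketch rather than a recommendation.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Linq;

var doc = new XDocument(new XElement("test", "data"));

// Default writer settings: Save writes a declaration.
var withDeclaration = new StringBuilder();
using (XmlWriter writer = XmlWriter.Create(withDeclaration))
    doc.Save(writer);

// OmitXmlDeclaration = true: the same Save call writes no declaration.
var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
var withoutDeclaration = new StringBuilder();
using (XmlWriter writer = XmlWriter.Create(withoutDeclaration, settings))
    doc.Save(writer);

Console.WriteLine(withDeclaration);      // starts with <?xml ...
Console.WriteLine(withoutDeclaration);   // <test>data</test>
```") throw new Exception("declaration should be omitted");
</test>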

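The presence or absence of an XDeclaration object has no effect on whether an XML declaration gets written. In other words, Save writes a default declaration even when there is no XDeclaration, and ToString() emits no declaration even when there is one. The purpose of an XDeclaration is instead to tell the XML serializer two things: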

Which text encoding to use
What to put in the encoding and standalone attributes of the XML declaration, should one be written

XDeclaration's constructor accepts three arguments, which correspond to the version, encoding, and standalone attributes. In the following example, test.xml is encoded in UTF-16:

            var doc = new XDocument(
                          new XDeclaration("1.0", "utf-16", "yes"),
                          new XElement("test", "data")
                      );
            doc.Save("test.xml");

In practice, the XML writer ignores whatever we pass as the version argument and always writes "1.0".
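To see both halves of that rule at once, here is a small sketch that saves a UTF-16 document to a temporary file (the path is an assumption) and compares the file with the ToString result.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

var doc = new XDocument(
    new XDeclaration("1.0", "utf-16", "yes"),
    new XElement("test", "data"));

// The target path is an assumption made for this sketch.
string path = Path.Combine(Path.GetTempPath(), "xdeclaration-utf16.xml");
doc.Save(path);

// File.ReadAllText detects the byte order mark and decodes the UTF-16 text.
string saved = File.ReadAllText(path);

// ToString ignores the XDeclaration entirely.
string inMemory = doc.ToString();

Console.WriteLine(saved);      // declaration carries encoding="utf-16" standalone="yes"
Console.WriteLine(inMemory);   // <test>data</test>
```") throw new Exception("ToString should not emit a declaration");
</test>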

Names and Namespaces

Just as .NET types can have namespaces, so can XML elements and attributes. A small project may get by without them, but production-grade software usually needs namespaces to keep its various XML documents under control.

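XML namespaces achieve two things. First, like namespaces in C#, they prevent naming collisions, which can become a problem when we merge data from one XML file into another. Second, they give a name a specific meaning. The name "nil", for instance, could mean anything. Within the http://www.w3.org/2001/XMLSchema-instance namespace, however, "nil" carries the same meaning as null in C#.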

Because XML namespaces are a common source of confusion, we first cover the topic in general and then discuss how LINQ to XML uses them.

Namespaces in XML

Suppose we want to define a customer element in the CNBlogs.LINQ namespace. There are two ways to do it. The first is to use the xmlns attribute:

        <customer xmlns="CNBlogs.LINQ"/>

xmlns is a special reserved attribute. Used as in the example above, it does two things:

It specifies a namespace for the current element.
It specifies a default namespace for all descendant elements.

This means that in the following example, address and postcode implicitly live in the CNBlogs.LINQ namespace:

<customer xmlns="CNBlogs.LINQ">
  <address>
    <postcode>02138</postcode>
  </address>
</customer>

If we want address and postcode to have no namespace, we have to say so explicitly:

<customer xmlns="CNBlogs.LINQ">
  <address xmlns="">
    <postcode>02138</postcode><!-- postcode now inherits the empty namespace -->
  </address>
</customer>
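One way to confirm how a default namespace propagates is to parse both variants and inspect the resulting names. This is only a sketch; the variable names are made up for illustration.

```csharp
using System;
using System.Xml.Linq;

XNamespace ns = "CNBlogs.LINQ";

// Children inherit the default namespace declared on customer.
XElement inherited = XElement.Parse(
    @"<customer xmlns=""CNBlogs.LINQ"">
        <address><postcode>02138</postcode></address>
      </customer>");

// An empty xmlns attribute resets the default namespace
// for address and everything beneath it.
XElement reset = XElement.Parse(
    @"<customer xmlns=""CNBlogs.LINQ"">
        <address xmlns=""""><postcode>02138</postcode></address>
      </customer>");

XElement postcode1 = inherited.Element(ns + "address").Element(ns + "postcode");
XElement postcode2 = reset.Element("address").Element("postcode");

Console.WriteLine(postcode1.Name);   // {CNBlogs.LINQ}postcode
Console.WriteLine(postcode2.Name);   // postcode
```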

Prefixes

The other way to specify a namespace is with a prefix. A prefix is an alias that we assign to a namespace to save typing.

Using a prefix takes two steps: defining it and using it. We can do both at once, as follows:

    <nut:customer xmlns:nut="CNBlogs.LINQ"/>

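Two separate things happen in this line. On the right, xmlns:nut="..." defines a prefix named nut and makes it available to this element and all of its descendants. On the left, nut:customer applies the newly defined prefix to the customer element.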

Unlike xmlns, a prefix does not set a default namespace for descendants: it affects only the elements that use it explicitly. In the following XML, firstname has an empty namespace:

    <nut:customer xmlns:nut="CNBlogs.LINQ">
      <firstname>Joe</firstname>
    </nut:customer>

To put firstname in the CNBlogs.LINQ namespace as well, we have to rewrite the XML like this:

    <nut:customer xmlns:nut="CNBlogs.LINQ">
      <nut:firstname>Joe</nut:firstname>
    </nut:customer>
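The difference between the two prefix examples shows up clearly once the XML is parsed. The following sketch uses illustrative variable names and well-formed versions of both fragments.

```csharp
using System;
using System.Xml.Linq;

XNamespace ns = "CNBlogs.LINQ";

// firstname carries no prefix, so it stays in the empty namespace.
XElement withPlainChild = XElement.Parse(
    @"<nut:customer xmlns:nut=""CNBlogs.LINQ"">
        <firstname>Joe</firstname>
      </nut:customer>");

// firstname uses the nut prefix, so it joins CNBlogs.LINQ.
XElement withPrefixedChild = XElement.Parse(
    @"<nut:customer xmlns:nut=""CNBlogs.LINQ"">
        <nut:firstname>Joe</nut:firstname>
      </nut:customer>");

Console.WriteLine(withPlainChild.Name);                                   // {CNBlogs.LINQ}customer
Console.WriteLine(withPlainChild.Element("firstname") != null);           // True
Console.WriteLine(withPrefixedChild.Element(ns + "firstname") != null);   // True
Console.WriteLine(withPrefixedChild.Element("firstname") == null);        // True
```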

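We can also define one or more prefixes for the benefit of descendants without using any of them on the current element. The following XML defines two prefixes, i and z, while the customer element itself stays in the empty namespace: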

    <customer xmlns:i="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:z="http://schemas.microsoft.com/2003/10/Serialization/">
      ...
    </customer>

If customer is the root element, the i and z prefixes are available throughout the whole document.

Prefixes are convenient when elements need to come from several namespaces: we can define one prefix for each namespace.
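The scope of a prefix can be inspected from code with XElement.GetNamespaceOfPrefix. In this sketch the firstname child is an assumption added purely so that there is a descendant to query.

```csharp
using System;
using System.Xml.Linq;

XElement customer = XElement.Parse(
    @"<customer xmlns:i=""http://www.w3.org/2001/XMLSchema-instance""
                xmlns:z=""http://schemas.microsoft.com/2003/10/Serialization/"">
        <firstname>Joe</firstname>
      </customer>");

// Hypothetical child element used only for this illustration.
XElement firstname = customer.Element("firstname");

// customer itself stays in the empty namespace.
Console.WriteLine(customer.Name.Namespace == XNamespace.None);   // True

// Prefixes declared on customer are visible from its descendants.
XNamespace i = firstname.GetNamespaceOfPrefix("i");
XNamespace z = firstname.GetNamespaceOfPrefix("z");
Console.WriteLine(i);   // http://www.w3.org/2001/XMLSchema-instance
Console.WriteLine(z);   // http://schemas.microsoft.com/2003/10/Serialization/
```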

Specifying Namespaces in the X-DOM

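So far we have used only simple strings as XElement and XAttribute names. A simple string corresponds to an XML name in the empty namespace, much like a .NET type defined in the global namespace.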

There are several ways to specify a namespace in the X-DOM. The first is to put it in braces before the local name:

            var e = new XElement ("{http://www.rzrgm.cn/xmlspace}customer", "LifePoem");
            Console.WriteLine (e.ToString());
 
            // The resulting XML:
            // <customer xmlns="http://www.rzrgm.cn/xmlspace">LifePoem</customer>

The second (and more efficient) way is to use the XNamespace and XName types. Here are their definitions:

    public sealed class XNamespace
    {
        public string NamespaceName { get; }
    }
    public sealed class XName // a local name with an optional namespace
    {
        public string LocalName { get; }
        public XNamespace Namespace { get; } // optional
    }

Both types define implicit conversions from string, so the following code is legal:

            XNamespace ns = "http://www.rzrgm.cn/xmlspace";
            XName localName = "customer";
            XName fullName = "{http://www.rzrgm.cn/xmlspace}customer";

XNamespace also overloads the + operator, which lets us combine a namespace and a local name into an XName without using braces:

            XNamespace ns = "http://www.rzrgm.cn/xmlspace";
            XName fullName = ns + "customer";
            Console.WriteLine(fullName); // {http://www.rzrgm.cn/xmlspace}customer

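In fact, every constructor and method in the X-DOM that accepts an element or attribute name takes an XName rather than a string. Plain strings worked in our earlier examples only because of the implicit conversion.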
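The different ways of obtaining an XName all yield equal names. A short sketch, using XName.Get as a third, explicit route alongside the two shown above:

```csharp
using System;
using System.Xml.Linq;

XNamespace ns = "http://www.rzrgm.cn/xmlspace";

XName fromBraces = "{http://www.rzrgm.cn/xmlspace}customer";   // implicit conversion from string
XName fromOperator = ns + "customer";                           // XNamespace + string
XName fromGet = XName.Get("customer", ns.NamespaceName);        // explicit factory method

Console.WriteLine(fromBraces == fromOperator);   // True
Console.WriteLine(fromOperator == fromGet);      // True
Console.WriteLine(fromGet.LocalName);            // customer
Console.WriteLine(fromGet.NamespaceName);        // http://www.rzrgm.cn/xmlspace
```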

Namespaces are specified the same way for elements and attributes:

            XNamespace ns = "http://domain.com/xmlspace";
            var data = new XElement(ns + "data",
                new XAttribute(ns + "id", 123)
            );
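Attribute lookups follow the same naming rule: the full XName, namespace included, has to match. In the sketch below, the "plain" attribute is a hypothetical addition used for contrast.

```csharp
using System;
using System.Xml.Linq;

XNamespace ns = "http://domain.com/xmlspace";

var data = new XElement(ns + "data",
    new XAttribute(ns + "id", 123),
    new XAttribute("plain", "x"));   // hypothetical attribute in the empty namespace

Console.WriteLine(data.Attribute(ns + "id") != null);   // True
Console.WriteLine(data.Attribute("id") == null);        // True: the namespace is part of the name
Console.WriteLine((int)data.Attribute(ns + "id"));      // 123
Console.WriteLine(data.Attribute("plain").Value);       // x
```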

The X-DOM and Default Namespaces

The X-DOM ignores the concept of default namespaces until it actually outputs XML. This means that when we construct a child element, we must explicitly give it a namespace if it needs one; it does not inherit one from its parent:

            XNamespace ns = "http://www.rzrgm.cn/xmlspace";
            var data = new XElement(ns + "data",
                           new XElement(ns + "customer", "Bloggs"),
                           new XElement(ns + "purchase", "Bicycle")
                       );
            Console.WriteLine(data.ToString());

The X-DOM does, however, apply default namespaces when reading and writing XML, so the code above produces this output:

    <data xmlns="http://www.rzrgm.cn/xmlspace">
      <customer>Bloggs</customer>
      <purchase>Bicycle</purchase>
    </data>

The same applies when we serialize a single child element:

            Console.WriteLine (data.Element (ns + "customer").ToString());
            // When a single element is output, the default namespace declaration is added:
            // <customer xmlns="http://www.rzrgm.cn/xmlspace">Bloggs</customer>

If we create the child XElements without specifying a namespace, they end up in the empty namespace:

            XNamespace ns = "http://www.rzrgm.cn/xmlspace";
            var data = new XElement(ns + "data",
                           new XElement("customer", "Bloggs"),
                           new XElement("purchase", "Bicycle")
                       );
            Console.WriteLine(data.ToString());

We then get this result instead:

    <data xmlns="http://www.rzrgm.cn/xmlspace">
      <customer xmlns="">Bloggs</customer>
      <purchase xmlns="">Bicycle</purchase>
    </data>

Another trap is forgetting to include the namespace when navigating the X-DOM:

            XNamespace ns = "http://www.rzrgm.cn/xmlspace";
            var data = new XElement(ns + "data",
                           new XElement(ns + "customer", "Bloggs"),
                           new XElement(ns + "purchase", "Bicycle")
                       );
            XElement x = data.Element(ns + "customer");     // OK
            XElement y = data.Element("customer");          // wrong: returns null

If we built an X-DOM tree without specifying namespaces, we can assign every element to a single namespace afterwards, as follows:

            foreach (XElement e in data.DescendantsAndSelf())
                if (e.Name.Namespace == "")
                    e.Name = ns + e.Name.LocalName;
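Put together as a complete program, the renaming loop might look like the following sketch. The tree is built without namespaces on purpose, and the final check is an addition for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace ns = "http://www.rzrgm.cn/xmlspace";

// Every element starts out in the empty namespace.
var data = new XElement("data",
    new XElement("customer", "Bloggs"),
    new XElement("purchase", "Bicycle"));

// Move each element that has no namespace into ns, keeping its local name.
foreach (XElement e in data.DescendantsAndSelf())
    if (e.Name.Namespace == XNamespace.None)
        e.Name = ns + e.Name.LocalName;

bool allInNamespace = data.DescendantsAndSelf().All(el => el.Name.Namespace == ns);

Console.WriteLine(data);             // the xmlns declaration appears once, on data
Console.WriteLine(allInNamespace);   // True
```

Attributes are left alone by this loop; only element names are changed.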

Prefixes

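The X-DOM treats prefixes purely as a serialization feature, which means we can choose to ignore prefixes completely. The only reason to do otherwise is to get more compact output when writing an XML file. Consider the following code: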

            XNamespace ns1 = "http://www.rzrgm.cn/space1";
            XNamespace ns2 = "http://www.rzrgm.cn/space2";
            var mix = new XElement(ns1 + "data",
                new XElement(ns2 + "element", "value"),
                new XElement(ns2 + "element", "value"),
                new XElement(ns2 + "element", "value")
            );

By default, XElement serializes this as follows:

<data xmlns="http://www.rzrgm.cn/space1">
  <element xmlns="http://www.rzrgm.cn/space2">value</element>
  <element xmlns="http://www.rzrgm.cn/space2">value</element>
  <element xmlns="http://www.rzrgm.cn/space2">value</element>
</data>

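As you can see, there is some needless repetition. The solution is not to change how we build the X-DOM, but to give the serializer a hint, before the XML is written, about which prefixes to use for the relevant namespaces. We do this by adding attributes that define the prefixes, typically on the root element: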

            mix.SetAttributeValue(XNamespace.Xmlns + "ns1", ns1);
            mix.SetAttributeValue(XNamespace.Xmlns + "ns2", ns2);

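This assigns the prefix "ns1" to the XNamespace variable ns1 and "ns2" to ns2. The X-DOM picks up these attributes automatically when serializing and uses them to condense the resulting XML. Here is the result of calling ToString on mix: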

<ns1:data xmlns:ns1="http://www.rzrgm.cn/space1"
          xmlns:ns2="http://www.rzrgm.cn/space2">
  <ns2:element>value</ns2:element>
  <ns2:element>value</ns2:element>
  <ns2:element>value</ns2:element>
</ns1:data>
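The effect of the prefix hints can be checked by comparing the serialized text before and after adding the attributes. This sketch uses two child elements and SaveOptions.DisableFormatting so that each result fits on one line; both choices are made for brevity here.

```csharp
using System;
using System.Xml.Linq;

XNamespace ns1 = "http://www.rzrgm.cn/space1";
XNamespace ns2 = "http://www.rzrgm.cn/space2";

var mix = new XElement(ns1 + "data",
    new XElement(ns2 + "element", "value"),
    new XElement(ns2 + "element", "value"));

// Without hints, every child repeats the xmlns declaration.
string before = mix.ToString(SaveOptions.DisableFormatting);

// Prefix hints on the root; the names in the tree are not changed.
mix.SetAttributeValue(XNamespace.Xmlns + "ns1", ns1);
mix.SetAttributeValue(XNamespace.Xmlns + "ns2", ns2);

string after = mix.ToString(SaveOptions.DisableFormatting);

Console.WriteLine(before);
Console.WriteLine(after);
Console.WriteLine(mix.GetPrefixOfNamespace(ns2));   // ns2
Console.WriteLine(mix.Name == ns1 + "data");        // True
```

Because prefixes affect only serialization, queries such as mix.Elements(ns2 + "element") work the same before and after the attributes are added.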