LINQ之路13:LINQ Operators之連接(Joining)
Joining
IEnumerable<TOuter>, IEnumerable<TInner>→IEnumerable<TResult>
|
Operator |
說明 |
SQL語義 |
|
Join |
應(yīng)用一種查詢策略來匹配兩個(gè)集合中的元素,產(chǎn)生一個(gè)平展的結(jié)果集 |
INNER JOIN |
|
GroupJoin |
同上,但是產(chǎn)生一個(gè)層次結(jié)果集 |
INNER JOIN, LEFT OUTER JOIN |
Join & GroupJoin Arguments
|
參數(shù) |
類型 |
|
外層/Outer sequence |
IEnumerable<TOuter> |
|
內(nèi)層/Inner sequence |
IEnumerable<TInner> |
|
外鍵選擇器/Outer key selector |
TOuter => TKey |
|
內(nèi)鍵選擇器/Inner key selector |
TInner => TKey |
|
結(jié)果選擇器/Result selector |
Join: (TOuter,TInner) => TResult GroupJoin: (TOuter,IEnumerable<TInner>) => TResult |
查詢表達(dá)式語法
from outer-var in outer-enumerable
join inner-var in inner-enumerable on outer-key-expr equals inner-key-expr
[ into identifier ]
簡(jiǎn)介
Join和GroupJoin通過匹配兩個(gè)輸入sequence來產(chǎn)生單個(gè)輸出sequence。Join產(chǎn)生平展結(jié)果集,而GroupJoin產(chǎn)生層次結(jié)果集。Join和GroupJoin提供了Select和SelectMany的替代策略。
Join和GroupJoin的優(yōu)點(diǎn)是他們對(duì)于本地內(nèi)存集合的執(zhí)行更加有效,因?yàn)樗麄冮_始就把內(nèi)層sequence裝載到一個(gè)按鍵排序的查找器,這樣就避免了重復(fù)的遍歷每一個(gè)內(nèi)層元素。他們的缺點(diǎn)則是他們只提供了inner和left out join的功能,而cross joins和不等連接non-equi joins還是只能通過Select/SelectMany來實(shí)現(xiàn)。
對(duì)于LINQ to SQL和Entity Framework查詢來講,Join和GroupJoin并沒有提供相對(duì)于Select和SelectMany的任何真正優(yōu)化,因?yàn)樗麄冎皇巧上鄳?yīng)的SQL語句,而執(zhí)行是在數(shù)據(jù)庫引擎中完成的。
下表總結(jié)了 每種join策略的差異:
表:連接策略
|
策略 |
結(jié)果形狀 |
本地查詢效率 |
Inner joins |
Left outer joins |
Cross joins |
Nonequi joins |
|
Select + SelectMany |
Flat |
Bad |
Yes |
Yes |
Yes |
Yes
|
|
Select + Select |
Nested |
Bad |
Yes |
Yes |
Yes |
Yes
|
|
Join |
Flat |
Good |
Yes |
- |
- |
- |
|
GroupJoin |
Nested |
Good |
Yes |
Yes |
- |
- |
|
GroupJoin + SelectMany |
Flat |
Good |
Yes |
Yes |
- |
- |
Join
Join運(yùn)算符執(zhí)行一個(gè)inner join,產(chǎn)生一個(gè)平展的輸出sequence。示范Join的最簡(jiǎn)單方式是使用LINQ to SQL,下面的示例列出所有的Customers和他們的Purchases:
IQueryable<string> query =
from c in dataContext.Customers
join p in dataContext.Purchases on c.ID equals p.CustomerID
select c.Name + " bought a " + p.Description;
//Result:
Tom bought a Bike
Tom bought a Holiday
Dick bought a Phone
Harry bought a Car
結(jié)果與我們使用SelectMany方式查詢時(shí)是一致的。
要看到Join相對(duì)于SelectMany的優(yōu)勢(shì),我們必須先把查詢轉(zhuǎn)為本地查詢,如下所示:
//把所有customers和purchases拷貝到數(shù)組,以實(shí)現(xiàn)本地查詢
Customer[] customers = dataContext.Customers.ToArray();
Purchase[] purchases = dataContext.Purchases.ToArray();
var slowQuery = from c in customers
from p in purchases
where c.ID == p.CustomerID
select c.Name + " bought a " + p.Description;
var fastQuery = from c in customers
join p in purchases on c.ID equals p.CustomerID
select c.Name + " bought a " + p.Description;
盡管上面兩個(gè)查詢產(chǎn)生相同的結(jié)果,但是使用join的查詢快得多,因?yàn)樗腅numerable實(shí)現(xiàn)把內(nèi)層collection預(yù)先裝載到一個(gè)按鍵排序的查找器。
join的查詢語法如下:
join inner-var in inner-sequence on outer-key-expr equals inner-key-expr
LINQ中的Join運(yùn)算符會(huì)區(qū)分對(duì)待outer sequence和inner sequence,語法上看:
- Outer sequence是輸入sequence (本例中是customers).
- Inner sequence是我們引入的新集合 (本例中是purchases).
Join執(zhí)行內(nèi)連接(inner joins),意味著沒有任何Purchases的Customers會(huì)被排除在結(jié)果之外。對(duì)于inner joins,我們可以交換查詢的inner和outer sequences并得到相同的結(jié)果:
var fastQuery = from p in purchases
join c in customers on p.CustomerID equals c.ID
select c.Name + " bought a " + p.Description;
我們可以在查詢中繼續(xù)添加join子句。比如,如果每個(gè)purchase有一個(gè)或多個(gè)purchase items,我們可以使用如下查詢來join purchase items:
from c in customers
join p in purchases on c.ID equals p.CustomerID // first join
join pi in purchaseItems on p.ID equals pi.PurchaseID // second join
...
purchases是第一個(gè)join的inner sequence和第二個(gè)join的outer sequence。我們可以通過嵌套的foreach來獲得相同的結(jié)果(低性能):
foreach (Customer c in customers)
foreach (Purchase p in purchases)
if (c.ID == p.CustomerID)
foreach (PurchaseItem pi in purchaseItems)
if (p.ID == pi.PurchaseID)
Console.WriteLine(c.Name + "," + p.Price + "," + pi.Detail);
在查詢語法中,前一個(gè)join引入的變量會(huì)保持在作用域之內(nèi),就像SelectMany樣式查詢中的外部范圍變量那樣,我們還可以在join子句中間插入where和let子句。
對(duì)多個(gè)鍵值進(jìn)行Join
我們可以使用匿名類型來對(duì)多個(gè)鍵值進(jìn)行Join,如下所示:
from x in sequenceX
join y in sequenceY on new { K1 = x.Prop1, K2 = x.Prop2 }
equals new { K1 = y.Prop3, K2 = y.Prop4 }
...
要使上面的查詢正確執(zhí)行,兩個(gè)匿名類型的結(jié)構(gòu)必須完全一致,這樣編譯器把他們對(duì)應(yīng)到同一個(gè)實(shí)現(xiàn)類型,從而使連接鍵值彼此兼容。
Join的方法語法
下面的join查詢語法
from c in customers
join p in purchases on c.ID equals p.CustomerID
select new { c.Name, p.Description, p.Price };
對(duì)應(yīng)的方法語法如下:
customers.Join( // outer collection
purchases, // inner collection
c => c.ID, // outer key selector
p => p.CustomerID, // inner key selector
(c, p) => new { c.Name, p.Description, p.Price } // result selector
);
結(jié)果選擇器表達(dá)式為輸出sequence創(chuàng)建每個(gè)element。如果我們需要在數(shù)據(jù)轉(zhuǎn)換之前添加其他子句比如orderby:
from c in customers
join p in purchases on c.ID equals p.CustomerID
orderby p.Price
select c.Name + " bought a " + p.Description;
那么在方法語法中,我們必須在結(jié)果選擇器中手動(dòng)構(gòu)建一個(gè)臨時(shí)的匿名類型,在該匿名類型中保存c和p:
customers.Join( // outer collection
purchases, // inner collection
c => c.ID, // outer key selector
p => p.CustomerID, // inner key selector
(c, p) => new { c, p }) // result selector
.OrderBy(x => x.p.Price)
.Select(x => x.c.Name + " bought a " + x.p.Description);
通常情況下,join的查詢表達(dá)式語法更加簡(jiǎn)潔。
GroupJoin
GroupJoin和Join一樣執(zhí)行連接操作,但它不是返回一個(gè)平展的結(jié)果集,而是一個(gè)層次結(jié)構(gòu)的結(jié)果集,使用每個(gè)外層element進(jìn)行分組。除了inner joins,GroupJoin還允許outer joins。GroupJoin的查詢語法也與Join相似,只是后面緊跟著into關(guān)鍵字。
// Here’s the most basic example of GroupJoin:
IEnumerable<IEnumerable<Purchase>> query =
from c in customers
join p in purchases on c.ID equals p.CustomerID
into custPurchases
select custPurchases; // custPurchases is a sequence
直接出現(xiàn)在join子句之后的into關(guān)鍵字會(huì)被翻譯為GroupJoin,而在select或group子句之后的into表示繼續(xù)一個(gè)查詢。雖然他們有一個(gè)共同的特征:都引入了一個(gè)新的查詢變量,但是into關(guān)鍵字的這兩種使用方式大不相同,必須引起注意。
其結(jié)果是一個(gè)包含了多個(gè)sequences的sequence,我們可以通過如下方式進(jìn)行遍歷:
foreach (IEnumerable<Purchase> purchaseSequence in query)
foreach (Purchase p in purchaseSequence)
Console.WriteLine(p.Description);
但是這種方式不是非常有用,原因在于outer sequence并沒有到outer customer的引用,所以purchaseSequence雖然是按customer進(jìn)行分組的,但我們?cè)诮Y(jié)果中卻失去了customer的相關(guān)信息。通常情況下,我們會(huì)在數(shù)據(jù)轉(zhuǎn)換中添加對(duì)外部范圍變量的引用:
from c in customers
join p in purchases on c.ID equals p.CustomerID
into custPurchases
select new { CustName = c.Name, custPurchases };
這會(huì)得到和下面的Select子查詢相同的結(jié)果(對(duì)于本地查詢來說,Select效率不如join):
from c in customers
select new
{
CustName = c.Name,
custPurchases = purchases.Where(p => c.ID == p.CustomerID)
};
默認(rèn)情況下,GroupJoin相當(dāng)于left outer join。要得到inner join(排除沒有任何purchases的customers),可以對(duì)custPurchases添加過濾條件:
from c in customers
join p in purchases on c.ID equals p.CustomerID
into custPurchases
where custPurchases.Any()
select ...
GroupJoin的into關(guān)鍵字之后的子句比如where針對(duì)subsequence,而不是單個(gè)的child elements。如果要對(duì)單獨(dú)的purchases添加條件,必須在join之前調(diào)用Where:
from c in customers
join p in purchases.Where(p2 => p2.Price > 1000)
on c.ID equals p.CustomerID
into custPurchases ...
平展的外連接/Flat outer joins
事情在我們希望得到一個(gè)outer join和平展的結(jié)果集時(shí)會(huì)陷入兩難的境地。GroupJoin讓我們獲得outer join;Join給了我們平展的結(jié)果集。所以解決方案就是先調(diào)用GroupJoin,然后對(duì)每個(gè)child sequence使用DefaultIfEmpty,最后調(diào)用SelectMany來獲取平展結(jié)果集:
from c in customers
join p in purchases on c.ID equals p.CustomerID into custPurchases
from cp in custPurchases.DefaultIfEmpty()
select new
{
CustName = c.Name,
Price = cp == null ? (decimal?)null : cp.Price
};
如果custPurchases為空,DefaultIfEmpty將產(chǎn)生一個(gè)null值。第二個(gè)from子句會(huì)被翻譯成SelectMany。所以,它會(huì)平展輸出所有的purchase subsequence,將他們合并到單一的輸出sequence。
使用lookups
在Enumerable的實(shí)現(xiàn)中,Join和GroupJoin的工作分為兩個(gè)步驟。首先,他們會(huì)把內(nèi)層sequence裝載到一個(gè)查找器,然后通過該查找器來查詢外層sequence。一個(gè)查找器(lookup)是一個(gè)分組的sequence并可以通過key來直接訪問。或者我們可以把它想象成一個(gè)dictionary,其中每個(gè)元素是一個(gè)sequence和對(duì)應(yīng)的key。
查找器是只讀的并通過如下接口定義:
public interface ILookup<TKey, TElement> :
IEnumerable<IGrouping<TKey, TElement>>, IEnumerable
{
int Count { get; }
bool Contains(TKey key);
IEnumerable<TElement> this[TKey key] { get; }
}
在處理本地集合時(shí),我們甚至可以手動(dòng)創(chuàng)建和查詢lookups來作為join運(yùn)算符的替代策略。這么做有如下優(yōu)點(diǎn):
- 我們可以在多個(gè)查詢之間重用同一個(gè)查找器(lookup)
- 對(duì)lookup進(jìn)行查詢可以讓我們更好的理解Join和GroupJoin的工作方式
ToLookup擴(kuò)展方法創(chuàng)建一個(gè)lookup,下面的代碼把所有的purchases裝載到一個(gè)lookup(用CustomerID作為Key):
ILookup<int?, Purchase> purchLookup =
purchases.ToLookup(p => p.CustomerID, p => p);
第一個(gè)參數(shù)選擇鍵值,第二個(gè)參數(shù)選擇作為value值被裝載到lookup中的對(duì)象。讀取一個(gè)lookup就像讀取一個(gè)dictionary,不同之處在于索引器返回的是一個(gè)包含多個(gè)匹配元素的sequence。下面的代碼遍歷所有CustomerID為1的purchases:
foreach (Purchase p in purchLookup[1])
Console.WriteLine(p.Description);
通過使用lookup,我們可以讓SelectMany/Select查詢運(yùn)行得像Join/GroupJoin查詢一樣高效。Join相當(dāng)于在lookup上使用SelectMany:
from c in customers
from p in purchLookup[c.ID]
select new { c.Name, p.Description, p.Price };
通過添加DefaultIfEmpty可以讓上面的查詢變成一個(gè)outer join:
from c in customers
from p in purchLookup[c.ID].DefaultIfEmpty()
select new
{
c.Name,
Descript = p == null ? null : p.Description,
Price = p == null ? (decimal?)null : p.Price
};
GroupJoin等價(jià)于在數(shù)據(jù)轉(zhuǎn)換時(shí)讀取lookup:
from c in customers
select new
{
CustName = c.Name,
CustPurchases = purchLookup[c.ID]
};
Enumerable實(shí)現(xiàn)
通過前面的介紹,現(xiàn)在我們可以來看看Join和GroupJoin在LINQ中的實(shí)現(xiàn)了。
下面是Enumerable.Join實(shí)現(xiàn)的簡(jiǎn)化版本(沒有null checking):
public static IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(
this IEnumerable<TOuter> outer,
IEnumerable<TInner> inner,
Func<TOuter, TKey> outerKeySelector,
Func<TInner, TKey> innerKeySelector,
Func<TOuter, TInner, TResult> resultSelector)
{
ILookup<TKey, TInner> lookup = inner.ToLookup(innerKeySelector);
return from outerItem in outer
from innerItem in lookup[outerKeySelector(outerItem)]
select resultSelector(outerItem, innerItem);
}
GroupJoin的實(shí)現(xiàn)與Join類似:
public static IEnumerable<TResult> GroupJoin<TOuter, TInner, TKey, TResult>(
this IEnumerable<TOuter> outer,
IEnumerable<TInner> inner,
Func<TOuter, TKey> outerKeySelector,
Func<TInner, TKey> innerKeySelector,
Func<TOuter, IEnumerable<TInner>, TResult> resultSelector)
{
ILookup<TKey, TInner> lookup = inner.ToLookup(innerKeySelector);
return from outerItem in outer
select resultSelector(outerItem, lookup[outerKeySelector(outerItem)]);
}

浙公網(wǎng)安備 33010602011771號(hào)