1.需求
![]()
2.應用場景
簡單的線性預測銷售額,比如三月份累計銷售額應該達到2a+b,實際a+b+c看看差距
3.sql實現
這道題先需要去分析結果集,本質上是一個迭代累加的過程,先要得到如下結果
![]()
如果在面試數倉中實現了以上結果,基本上面試官會很通過,也在短時間內可以實現,實現sql如下
with tb as (
select 1 as s,'a' as pv
union all
select 2 as s,'b' as pv
union all
select 3 as s,'c' as pv
union all
select 4 as s,'d' as pv
union all
select 5 as s,'e' as pv
union all
select 6 as s,'f' as pv
)
select s,pv,concat_ws('+',collect_list(re1) OVER (ORDER BY s ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) from
(select *,concat_ws('+',collect_list(pv) OVER (ORDER BY s ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) as re1 from tb) tb1;
首先開窗將pv分組連接起來作為一個結果集,然后再將結果集分組連接起來。核心知識點:collect_list,concat_ws,開窗函數及開窗中的ROWS BETWEEN。其中`ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`為開始行到當前行
![]()
另一種實現方式,通過不斷解析實現,分組實現,這種實現主要考察對sql的深度理解和組裝,實現起來還是比較復雜的,sql如下
with tb as (
select '1' as s,'a' as pv
union all
select '2' as s,'b' as pv
union all
select '3' as s,'c' as pv
union all
select '4' as s,'d' as pv
union all
select '5' as s,'e' as pv
union all
select '6' as s,'f' as pv
),
tb1 as
(select *,
concat_ws(',',collect_list(pv) OVER (ORDER BY s ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) as pv_n,
reverse(concat_ws(',',collect_list(s) OVER (ORDER BY s ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW))) as s_n
from tb),
tb2 as
(select s,pv,pv_n,s_n from tb1
lateral view explode(SPLIT(pv_n,',')) tb1 as pv_n),
tb3 as
(select *,row_number() over(partition by s order by pv_n) as rn from tb2)
select s,pv,concat_ws('+',collect_list(concat(pv_n,'*',split(s_n,',')[rn-1]))) as res from tb3 group by s,pv order by s;
該種實現方式包含的知識點比較多,包括collect_list,concat_ws,lateral view explode,開窗函數。其中將開窗出來的值作為切割出來的數組下標來使用這種思路在實際中不容易想到。但這個sql本質上有隱患,其中reverse函數如果s字段超過個位數后存在問題,當做練sql吧 實際中要多想想
![]()
上圖中,要取pv_n*s_n[rn-1],然后再將結果連接起來
![]()
然后再分組連接起來
![]()