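Machine Learning Resource Roundup
For applications that integrate machine learning, I think the following three approaches work well:
- Training/inference: train the model with scikit-learn, explain it with SHAP, export it to ONNX through the sklearn-onnx project, and then run inference with ML.NET.
- Training is still done with scikit-learn, and inference is also left to scikit-learn, but data processing is done in C#/Java. In other words, the engineering side is handed to C#/Java, and the processed data is passed to the Python side through DuckDB for inference.
- Training, inference, and data processing are all handled by ML.NET.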
General introductory material
https://developers.google.com/machine-learning/crash-course
https://github.com/microsoft/ML-For-Beginners
Open machine learning datasets
https://archive.ics.uci.edu/datasets
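This site hosts many kinds of datasets and reports the performance of different algorithms on them, which makes it well suited for learning. One example is the income-prediction dataset: https://archive.ics.uci.edu/dataset/2/adult
Datasets used by the ML.NET samples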
https://github.com/dotnet/machinelearning-samples/blob/main/docs/DATASETS.md
Datasets bundled with the rapaio jar
https://padreati.github.io/rapaio/tutorials/BuiltinDataSets.html
Python
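In machine learning and deep learning, Python undoubtedly has the best ecosystem. Within machine learning, scikit-learn + SHAP is arguably the most mainstream combination.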
Github Machine Learning Repositories for Data Scientists
https://www.geeksforgeeks.org/15-github-machine-learning-repositories-for-data-scientists/
https://www.geeksforgeeks.org/gradientboosting-vs-adaboost-vs-xgboost-vs-catboost-vs-lightgbm/?ref=asr3
These pages cover commonly used ML algorithms, compute frameworks, hyperparameter tuning tools, and interpretability tools.
Using XGBoost in Python Tutorial
https://www.datacamp.com/tutorial/xgboost-in-python
https://www.datacamp.com/tutorial/decision-tree-classification-python
https://www.datacamp.com/tutorial/machine-learning-python
Final paper for a machine learning course by Zhang Yuxiang (張宇翔). Very good overall; it implements most machine learning algorithms in Python.
https://zjtdzyx.github.io/machine-learning-project/
https://github.com/zjtdzyx/machine-learning-project
Analyzing and predicting income with XGBoost
https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/tree_based_models/Census%20income%20classification%20with%20XGBoost.html#Load-dataset
https://github.com/shap/shap/blob/master/notebooks/tabular_examples/tree_based_models/Census%20income%20classification%20with%20XGBoost.ipynb
https://www.kaggle.com/code/grayphantom/income-prediction-using-random-forest-and-xgboost
https://www.kaggle.com/code/apantazo/income-census-adult-xgboost#Feature-Engineering
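C# libraries
In the C# world, Microsoft's ML.NET is the most mainstream machine learning framework. One advantage of the framework is that, across many releases, its concepts and core API have stayed the same.
Microsoft ML.NET cookbook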
https://github.com/dotnet/machinelearning/blob/main/docs/code/MlNetHighLevelConcepts.md
https://github.com/dotnet/machinelearning/blob/main/docs/code/MlNetCookBook.md
The cookbook is fairly old, but the basic concepts still apply.
Official tutorials
https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/
PDF article: Introduction to ML.NET
https://assets.ctfassets.net/9n3x4rtjlya6/1WpeTHDK1eIRe1Toj0w8mU/eff3ee2e8eb5ed98c11bc3e46a716379/100533_1238993435_Jeff_Prosise_Machine_learning_for_C_developers_Introducing_ML.NET.pdf
PDF book:
https://ptgmedia.pearsoncmg.com/images/9780137383658/samplepages/9780137383658_Sample.pdf
Blog posts on ML.NET:
https://www.todaysoftmag.com/article/3286/machine-learning-101-with-microsoft-ml-net-part-1-3
https://rubikscode.net/2021/04/12/machine-learning-with-ml-net-evaluation-metrics/
https://rubikscode.net/2021/04/26/machine-learning-with-ml-net-sentiment-analysis/
https://rubikscode.net/2021/09/27/net-interactive-jupyter-notebooks/
https://rubikscode.net/2022/08/29/machine-learning-with-ml-net-introduction/
https://www.codemag.com/Article/1911042/ML.NET-Machine-Learning-for-.NET-Developers
https://www.microsoftpressstore.com/articles/article.aspx?p=3129454&seqNum=2
ML.NET sample projects with many examples; the code includes the datasets
https://github.com/jeffprosise/ML.NET
Sample projects using ML.NET with many examples; the code includes the datasets
https://github.com/feiyun0112/machinelearning-samples.zh-cn/tree/master
https://github.com/dotnet/machinelearning-samples
Machine learning algorithm libraries implemented in C#
https://github.com/mdabros/SharpLearning
https://github.com/mdabros/XGBoostSharp
Java libraries
Smile — Statistical Machine Intelligence and Learning Engine
Also supports SHAP.
https://github.com/haifengl/smile
https://haifengl.github.io/regression.html
https://haifengl.github.io/quickstart.html
Tribuo: a machine learning library from Oracle, Apache-licensed
https://tribuo.org/
rapaio: a statistics-oriented data mining library
https://github.com/padreati/rapaio
https://padreati.github.io/rapaio/tutorials/BuiltinDataSets.html
Adds a Java kernel to Jupyter
https://github.com/padreati/rapaio-jupyter-kernel
xgboost4j, the library officially provided by the XGBoost project
https://xgboost.readthedocs.io/en/latest/jvm/java_intro.html#
https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j-example/README.md
Weka: a data mining tool that includes many classic machine learning algorithms
https://ml.cms.waikato.ac.nz/weka/
The deeplearning4j project for Java
https://github.com/deeplearning4j/deeplearning4j
