moe - 技术专题深度解读 | GitHub 中文社区

vllm-project / vllm

Sponsor

A high-throughput and memory-efficient inference and serving engine for LLMs

Updated Jun 5, 2026
Python

hiyouga / LlamaFactory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Updated Jun 5, 2026
Python

sgl-project / sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

reinforcement-learning cuda inference transformer moe attention llama glm minimax wan diffusion vlm blackwell llm qwen deepseek gpt-oss qwen-image

Updated Jun 5, 2026
Python

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, Phi4, ...) (AAAI 2025).

Updated Jun 5, 2026
Python

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

cuda pytorch moe blackwell llm-serving

Updated Jun 5, 2026
Python

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

gpu cuda jit pytorch nvidia moe attention llm-inference large-large-models distributed-inference

Updated Jun 5, 2026
Python

czy0729 / Bangumi

An unofficial https://bgm.tv ui first app client for Android and iOS, built with React Native. 一个无广告、以爱好为驱动、不以盈利为目的、专门做 ACG 的类似豆瓣的追番记录，bgm.tv 第三方客户端。为移动端重新设计，内置大量加强的网页端难以实现的功能，且提供了相当的自定义选项。目前已适配 iOS / Android。

react android ios design react-native mobx ios-app moe bangumi android-app expo

Updated Jun 5, 2026
TypeScript

zai-org / GLM-4.5

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

agent moe glm llm reasoning-language-models

Updated Feb 1, 2026
Python

PKU-YuanGroup / MoE-LLaVA

【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models

moe multi-modal mixture-of-experts large-vision-language-model

Updated Jul 15, 2025
Python

MoonshotAI / MoBA

MoBA: Mixture of Block Attention for Long-Context LLMs

pytorch transformer moe llm llm-serving llm-training flash-attention

Updated Apr 3, 2025
Python

uccl-project / uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

ai networking hpc amd gpu collective cuda p2p nvidia broadcom moe rdma allreduce llm kvcache

Updated Jun 5, 2026
C++

davidmrau / mixture-of-experts

PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538

pytorch moe re-implementation mixture-of-experts sparsely-gated-mixture-of-experts

Updated Apr 19, 2024
Python

pjlab-sys4nlp / llama-moe

⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)

moe llama mixture-of-experts llm continual-pre-training expert-partition

Updated Dec 6, 2024
Python

microsoft / Tutel

Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4

pytorch moe mixture-of-experts llm deepseek

Updated Jun 4, 2026
C

NVIDIA / cudnn-frontend

cuDNN Frontend is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source kernels.

Updated Jun 5, 2026
Python

sail-sg / Adan

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models

Updated Jun 8, 2025
Python

ScienceOne-AI / DeepSeek-671B-SFT-Guide

An open-source solution for full parameter fine-tuning of DeepSeek-V3/R1 671B, including complete code and scripts from training to inference, as well as some practical experiences and conclusions. (DeepSeek-V3/R1 满血版 671B 全参数微调的开源解决方案，包含从训练到推理的完整代码和脚本，以及实践中积累一些经验和结论。)

python moe sft llm deepseek-r1

Updated Mar 13, 2025
Python

open-compass / MixtralKit

A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI

moe mistral llm

Updated Dec 15, 2023
Python

SharpAI / SwiftLM