Feixiang Shu's Blog
Paper Reading
Category
2025
11-03
S³: Social-network Simulation System with Large Language Model-Empowered Agents
10-29
Advancing LLM Safe Alignment with Safety Representation Ranking
10-29
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
10-29
Aligning Large Language Models with Human Preferences through Representation Engineering
10-29
CONTRANS: Weak-to-Strong Alignment Engineering via Concept Transplantation
10-29
DeAL: Decoding-time Alignment for Large Language Models
10-29
Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing
10-29
How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
10-29
Improving Alignment and Robustness with Circuit Breakers
10-29
InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance