Probing Chemical Language Models: Effects of Pre-training and Fine-tuning
Chemical language models (CLMs) are trained with linearized representations such as SMILES, yet it remains unclear which chemically meaningful substructures they encode. To foster a better understanding of CLMs, we conduct a systematic study and probe for 78 molecular substructures across eight pre-trained and six randomly initialized models. We furthermore study how fine-tuning on chemical downstream tasks affects the learned representations of molecular substructures. Our results show that pre-training generally improves molecular structure awareness of CLMs, particularly in the upper layers
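As a rough illustration of what probing for a substructure can look like, the sketch below trains a small logistic-regression probe on frozen feature vectors and compares held-out accuracy between an informative and an uninformative layer. Everything here is an assumption for illustration: the abstract does not specify the probe architecture, the feature vectors are synthetic stand-ins rather than real CLM hidden states, and the function names (`train_linear_probe`, `probe_accuracy`, `make_synthetic_layer`) are hypothetical.

```python
import math
import random


def train_linear_probe(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim = len(features[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            z = max(-30.0, min(30.0, z))  # clip to keep exp() numerically safe
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def probe_accuracy(weights, bias, features, labels):
    """Fraction of examples whose substructure label the probe predicts."""
    correct = 0
    for x, y in zip(features, labels):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        correct += int((z > 0) == bool(y))
    return correct / len(labels)


def make_synthetic_layer(n, dim, signal, rng):
    """Hypothetical stand-in for frozen layer embeddings (NOT real CLM output).

    Label 1 means "substructure present". `signal` controls how strongly
    the first feature dimension encodes that label.
    """
    features, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += signal if y else -signal
        features.append(x)
        labels.append(y)
    return features, labels


rng = random.Random(0)

# One "layer" that encodes the substructure and one that does not.
results = {}
for name, signal in [("informative", 2.0), ("uninformative", 0.0)]:
    train_x, train_y = make_synthetic_layer(200, 8, signal, rng)
    test_x, test_y = make_synthetic_layer(200, 8, signal, rng)
    w, b = train_linear_probe(train_x, train_y)
    results[name] = probe_accuracy(w, b, test_x, test_y)

print(results)
```

In an actual study the synthetic vectors would be replaced by per-layer hidden states extracted from a frozen model, with one binary label per substructure. Comparing probe accuracy across layers, or between pre-trained and randomly initialized models, then indicates where the substructure is linearly decodable.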
Lineage graph
Paper → model → repo connections mined from source citations (Tier-1 exact match).
Why these links exist
- Linked via arXiv author: Anna Karnysheva
- Linked via arXiv author: Dietrich Klakow
- Linked via arXiv author: Ji-Ung Lee
