Hello! I am an MS candidate in the Computer Science & Engineering Department at Korea University and a member of the Data Mining & Information Systems (DMIS) Lab, advised by Prof. Jaewoo Kang. Prior to my MS studies, I received a Bachelor's degree in Life Sciences & Bioinformatics from Ewha Womans University.
My research covers the broad field of machine learning, with a focus on causal learning and generative models. I am also interested in applying machine learning techniques to diverse domains, including drug discovery and bioinformatics. In particular, I focus on single cell omics, spatial omics, and perturbation research.
[sʌ.jʌn pak]
Seungheun Baek*, Soyon Park*, Chok Yan Ting, Junhyun Lee, Jueon Park, Mogan Gim, Jaewoo Kang (*Equal contribution).
[ISMB/ECCB 2025 / Bioinformatics] GPO-VAE: Modeling Explainable Gene Perturbation Responses utilizing GRN-Aligned Parameter Optimization
Seungheun Baek*, Soyon Park*, Chok Yan Ting, Mogan Gim, Jaewoo Kang (*Equal contribution).
[Bioinformatics Advances] BADGER: biologically-aware interpretable differential gene expression ranking model
Hajung Kim*, Mogan Gim*, Seungheun Baek, Soyon Park, Sunkyu Kim, Jaewoo Kang (*Equal contribution).
[BIBM 2025] CoTox: Chain-of-Thought based Molecular Toxicity Reasoning and Prediction
Jueon Park, Yein Park, Minju Song, Soyon Park, Donghyeon Lee, Seungheun Baek, Jaewoo Kang.
[CLEF 2025] Prompting Matters: Snippet-Aware Strategies for Biomedical QA with LLMs in BioASQ 13b
Hajung Kim*, Hoonick Lee*, Yewon Cho*, Jungwoo Park*, Jueon Park*, Soyon Park*, Yan Ting Chok*, Seungheun Baek*, Donghyeon Lee*, Jaewoo Kang. (2025)
[Under Review] Transductive Learning for Out-of-Distribution Molecular Property Prediction
Kiwoong Yoo, Hajung Kim, Soyon Park, Junseok Choe, Sunkyu Kim, Jaewoo Kang. (2025)
[Under Review] HiRef: Leveraging Hierarchical Ontology and Network Refinement for Robust Medication Recommendation
Yan Ting Chok, Soyon Park, Seungheun Baek, Hajung Kim, Junhyun Lee, Jaewoo Kang. (2025)
[ISMB 2024 / Bioinformatics] MolPLA: A Molecular Pretraining Framework for Learning Cores, R-Groups and their Linker Joints
Mogan Gim*, Jueon Park*, Soyon Park, Sanghoon Lee, Seungheun Baek, Junhyun Lee, Ngoc-Quang Nguyen, Jaewoo Kang (*Equal contribution).
Data Mining and Information Systems Lab (advisor: Prof. Jaewoo Kang)
Research Intern Machine Learning for Biomedicine Lab (Prof. Maria Brbić)
@Lausanne, Switzerland
Remote Research Intern Pinello Lab (Prof. Luca Pinello)
@Boston, MA, United States
Research Intern Functional & Molecular Imaging System Lab (Prof. Jaesung Lee)
@Seoul, South Korea
Research Intern Cancer Biology and Genomics Lab (Prof. Mijung Kwon)
@Seoul, South Korea
Organizer: CLEF 2025 BioASQ Workshop
Developed an ensemble of large language models (LLMs) for a large-scale biomedical semantic indexing and question answering competition (Team DMIS-KU)
Organizer: National Institute for Korean Medicine Development & Ministry of Health and Welfare
Analyzing herbal medicines using network pharmacology and AI models (Team 2Park)
Drug-Target Interaction (DTI) Prediction
Organizer: Ministry of Science and ICT
Scholarship for outstanding science and engineering students
Organizer: Ministry of Science and ICT
[Research with EPFL] Foundation model for single cell phenotype modeling
- designed a foundation model that predicts gene expression responses to perturbations and, in reverse, infers the underlying perturbations from post-perturbation expression profiles, spanning genetic mutations, chemical treatments, and developmental perturbations.
[Research with Harvard Medical School] Probabilistic modeling for single cell multiomics data
- proposed a probabilistic framework integrated with a heterogeneous graph structure to capture multimodal relationships in single cell multiomics data
[ISMB/ECCB 2025 poster, Ongoing Project] SPARTHA: Enhancing Spatial Gene Expression Prediction with Artifact Disentanglement from Histology Images
- developed a multi-modal conditional VAE that integrates histology images and spatial transcriptomics to generate spatial gene expression from tissue morphology while disentangling artifacts
Development of Multi-Cancer Biomarkers via AI-Driven Proteomic Analysis of Plasma MS/MS Data
- proposed a transformer-based, ID-free representation learning approach on peptide DIA MS/MS spectra to identify cancer-specific embeddings and classify ten cancer types