Publications | Hyeonseok Moon

2025

Metric Calculating Benchmark: Complicate Instruction Following Benchmark for Large Language Models

Hyeonseok Moon, Seongtae Hong, Jaehyung Seo, and 1 more author

In The 2025 Conference on Empirical Methods in Natural Language Processing, 2025

LimaCost: Data Valuation for Instruction Tuning of Large Language Models

Hyeonseok Moon, Jaehyung Seo, Seonmin Koo, and 4 more authors

In Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

The Impact of Negated Text on Hallucination with Large Language Models

Jaehyung Seo, Hyeonseok Moon, and Heuiseok Lim

In The 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Call for Rigor in Reporting Quality of Instruction Tuning Data

Hyeonseok Moon, Jaehyung Seo, and Heuiseok Lim

In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Jul 2025

Cross-Lingual Optimization for Language Transfer in Large Language Models

Jungseob Lee, Seongtae Hong, Hyeonseok Moon, and 1 more author

In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Jul 2025

Semantic Aware Linear Transfer by Recycling Pre-trained Language Models for Cross-lingual Transfer

Seungyoon Lee, Seongtae Hong, Hyeonseok Moon, and 1 more author

In Findings of the Association for Computational Linguistics: ACL 2025, Jul 2025

FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models

Dahyun Jung, Seungyoon Lee, Hyeonseok Moon, and 2 more authors

In Findings of the Association for Computational Linguistics: NAACL 2025, Apr 2025

MIRAGE: A Metric-Intensive Benchmark for Retrieval-Augmented Generation Evaluation

Chanhee Park, Hyeonseok Moon, Chanjun Park, and 1 more author

In Findings of the Association for Computational Linguistics: NAACL 2025, Apr 2025

Find the Intention of Instruction: Comprehensive Evaluation of Instruction Understanding for Large Language Models

Hyeonseok Moon, Jaehyung Seo, Seungyoon Lee, and 2 more authors

In Findings of the Association for Computational Linguistics: NAACL 2025, Apr 2025

MIGRATE: Cross-Lingual Adaptation of Domain-Specific LLMs through Code-Switching and Embedding Transfer

Seongtae Hong, Seungyoon Lee, Hyeonseok Moon, and 1 more author

In Proceedings of the 31st International Conference on Computational Linguistics, Jan 2025

2024

Leveraging Pre-existing Resources for Data-Efficient Counter-Narrative Generation in Korean

Seungyoon Lee, Chanjun Park, DaHyun Jung, and 4 more authors

In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024

Detecting Critical Errors Considering Cross-Cultural Factors in English-Korean Translation

Sugyeong Eo, Jungwoo Lim, Chanjun Park, and 5 more authors

In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024

Translation of Multifaceted Data without Re-Training of Machine Translation Systems

Hyeonseok Moon, Seungyoon Lee, SeongTae Hong, and 3 more authors

In Findings of the Association for Computational Linguistics: EMNLP 2024, Nov 2024

Length-aware Byte Pair Encoding for Mitigating Over-segmentation in Korean Machine Translation

Jungseob Lee, Hyeonseok Moon, Seungjun Lee, and 6 more authors

In Findings of the Association for Computational Linguistics: ACL 2024, Aug 2024

Generative Interpretation: Toward Human-Like Evaluation for Educational Question-Answer Pair Generation

Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, and 3 more authors

In Findings of the Association for Computational Linguistics: EACL 2024, Mar 2024

Hyper-BTS Dataset: Scalability and Enhanced Analysis of Back TranScription (BTS) for ASR Post-Processing

Chanjun Park, Jaehyung Seo, Seolhwa Lee, and 5 more authors

In Findings of the Association for Computational Linguistics: EACL 2024, Mar 2024

The recent advancements in the realm of Automatic Speech Recognition (ASR) post-processing have been primarily driven by sequence-to-sequence paradigms. Despite their effectiveness, these methods often demand substantial amounts of data, necessitating the expensive recruitment of phonetic transcription experts to rectify the erroneous outputs of ASR systems, thereby creating the desired training data. Back TranScription (BTS) alleviates this issue by generating ASR inputs from clean text via a Text-to-Speech (TTS) system. While initial studies on BTS exhibited promise, they were constrained by a limited dataset of just 200,000 sentence pairs, leaving the scalability of this method in question. In this study, we delve into the potential scalability of BTS. We introduce the “Hyper-BTS” dataset, a corpus approximately five times larger than that utilized in prior research. Additionally, we present innovative criteria for categorizing error types within ASR post-processing. This not only facilitates a more comprehensive qualitative analysis, which was absent in preceding studies, but also enhances the understanding of ASR error patterns. Our empirical results, both quantitative and qualitative, suggest that the enlarged scale of the Hyper-BTS dataset sufficiently addresses a vast majority of the ASR error categories. We make the Hyper-BTS dataset publicly available.
Exploiting Hanja-based resources in processing Korean historic documents written by common literati

Hyeonseok Moon, Myunghoon Kang, Jaehyung Seo, and 4 more authors

IEEE Access, Mar 2024

2023

KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing

Seonmin Koo, Chanjun Park, Jinsung Kim, and 4 more authors

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Mar 2023

Post-hoc Utterance Refining Method by Entity Mining for Faithful Knowledge Grounded Conversations

Yoonna Jang, Suhyune Son, Jeongwoo Lee, and 6 more authors

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Mar 2023

CHEF in the Language Kitchen: A Generative Data Augmentation Leveraging Korean Morpheme Ingredients

Jaehyung Seo, Hyeonseok Moon, Jaewook Lee, and 3 more authors

In The 2023 Conference on Empirical Methods in Natural Language Processing, Mar 2023

Informative Evidence-guided Prompt-based Fine-tuning for English-Korean Critical Error Detection

Dahyun Jung, Sugyeong Eo, Chanjun Park, and 3 more authors

In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), Mar 2023

Improving formality-sensitive machine translation using data-centric approaches and prompt engineering

Seungjun Lee, Hyeonseok Moon, Chanjun Park, and 1 more author

In Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023), Mar 2023

PEEP-Talk: A Situational Dialogue-based Chatbot for English Education

Seungjun Lee, Yoonna Jang, Chanjun Park, and 7 more authors

In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), Mar 2023

Towards Diverse and Effective Question-Answer Pair Generation from Children Storybooks

Sugyeong Eo, Hyeonseok Moon, Jinsung Kim, and 6 more authors

In Findings of the Association for Computational Linguistics: ACL 2023, Jul 2023

Doubts on the reliability of parallel corpus filtering

Hyeonseok Moon, Chanjun Park, Seonmin Koo, and 8 more authors

Expert Systems with Applications, Jul 2023

Uncovering the Risks and Drawbacks Associated with the Use of Synthetic Data for Grammatical Error Correction

Seonmin Koo, Chanjun Park, Seolhwa Lee, and 4 more authors

IEEE Access, Jul 2023

A Survey on Evaluation Metrics for Machine Translation

Seungjun Lee, Jungseob Lee, Hyeonseok Moon, and 5 more authors

Mathematics, Jul 2023

2022

QUAK: A Synthetic Quality Estimation Dataset for Korean-English Neural Machine Translation

Sugyeong Eo, Chanjun Park, Hyeonseok Moon, and 4 more authors

In Proceedings of the 29th International Conference on Computational Linguistics, Jul 2022

A dog is passing over the jet? A text-generation dataset for Korean commonsense reasoning and evaluation

Jaehyung Seo, Seounghoon Lee, Chanjun Park, and 5 more authors

In Findings of the Association for Computational Linguistics: NAACL 2022, Jul 2022

Empirical Analysis of Noising Scheme based Synthetic Data Generation for Automatic Post-editing

Hyeonseok Moon, Chanjun Park, Seolhwa Lee, and 4 more authors

In Proceedings of the Thirteenth Language Resources and Evaluation Conference, Jul 2022

Priming Ancient Korean Neural Machine Translation

Chanjun Park, Seolhwa Lee, Jaehyung Seo, and 3 more authors

In Proceedings of the Thirteenth Language Resources and Evaluation Conference, Jul 2022

PU-GEN: Enhancing generative commonsense reasoning for language models with human-centered knowledge

Jaehyung Seo, Dongsuk Oh, Sugyeong Eo, and 5 more authors

Knowledge-Based Systems, Jul 2022

K-NCT: Korean neural grammatical error correction gold-standard test set using novel error type classification criteria

Seonmin Koo, Chanjun Park, Jaehyung Seo, and 4 more authors

IEEE Access, Jul 2022

Plain Template Insertion: Korean-Prompt-Based Engineering for Few-Shot Learners

Jaehyung Seo, Hyeonseok Moon, Chanhee Lee, and 5 more authors

IEEE Access, Jul 2022

BERTOEIC: Solving TOEIC Problems Using Simple and Efficient Data Augmentation Techniques with Pretrained Transformer Encoders

Jeongwoo Lee, Hyeonseok Moon, Chanjun Park, and 3 more authors

Applied Sciences, Jul 2022

Empirical Analysis of Parallel Corpora and In-Depth Analysis Using LIWC

Chanjun Park, Midan Shim, Sugyeong Eo, and 4 more authors

Applied Sciences, Jul 2022

AI for Patents: A Novel Yet Effective and Efficient Framework for Patent Analysis

Junyoung Son, Hyeonseok Moon, Jeongwoo Lee, and 4 more authors

IEEE Access, Jul 2022

Return on Advertising Spend Prediction with Task Decomposition-Based LSTM Model

Hyeonseok Moon, Taemin Lee, Jaehyung Seo, and 7 more authors

Mathematics, Jul 2022

Word-level quality estimation for Korean-English neural machine translation

Sugyeong Eo, Chanjun Park, Hyeonseok Moon, and 2 more authors

IEEE Access, Jul 2022

Dense-to-question and sparse-to-answer: Hybrid retriever system for industrial frequently asked questions

Jaehyung Seo, Taemin Lee, Hyeonseok Moon, and 7 more authors

Mathematics, Jul 2022

Mimicking Infants’ Bilingual Language Acquisition for Domain Specialized Neural Machine Translation

Chanjun Park, Woo-Young Go, Sugyeong Eo, and 3 more authors

IEEE Access, Jul 2022

An automatic post editing with efficient and simple data generation method

Hyeonseok Moon, Chanjun Park, Jaehyung Seo, and 2 more authors

IEEE Access, Jul 2022

2021

Should we find another model?: Improving neural machine translation performance with ONE-piece tokenization method without model modification

Chanjun Park, Sugyeong Eo, Hyeonseok Moon, and 1 more author

In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers, Jul 2021

An empirical study on automatic post editing for neural machine translation

Hyeonseok Moon, Chanjun Park, Sugyeong Eo, and 2 more authors

IEEE Access, Jul 2021

Comparative analysis of current approaches to quality estimation for neural machine translation

Sugyeong Eo, Chanjun Park, Hyeonseok Moon, and 2 more authors

Applied Sciences, Jul 2021
