Junwen Chen

Ph.D. student at The University of Electro-Communications · Tokyo, Japan

Hello, I am Junwen Chen, from Sichuan, China. I am currently pursuing my Ph.D. at The University of Electro-Communications as a member of the Yanai Lab. My research focuses mainly on Human-Object Interaction (HOI) detection, Multimodal Large Language Models (MLLMs), and AI-Generated Content (AIGC).

Contact Me

About

My research centers on deep learning for computer vision. Throughout my master's and doctoral studies, I have focused on improving the accuracy and generalization of Human-Object Interaction (HOI) detection methods. Recently, I have been exploring how to integrate Multimodal Large Language Models (MLLMs) and AI-Generated Content (AIGC) into this work.

Research Interests

  • Machine Learning, Deep Learning
  • Computer Vision: Object Detection, Image Segmentation, Visual Question Answering, Video Action Recognition
  • AIGC: Text-to-Image Generation, Multi-layer Image Generation, Image Editing

Education

The University of Electro-Communications · Ph.D.

2023/10 — Present

Major: Informatics

Research Theme: Improving the Efficiency and Generality of Human-Object Interaction Detection Methods

  • Deep Learning
  • Computer Vision
  • Human-Object Interaction Detection
  • Transformer
  • VLM
  • MLLM

Major Achievements:

1. Chen, Junwen, Peilin Xiong, and Keiji Yanai. "HOI-R1: Exploring the Potential of Multimodal Large Language Models for Human-Object Interaction Detection." International Conference on Pattern Recognition (ICPR). [PDF] [Code]
2. Chen, Junwen, and Keiji Yanai. "Bridging Detection Architectures with Foundation Models: A Unified Framework for Human-Object Interaction Detection." IEEE Access, doi: 10.1109/ACCESS.2026.3659132. 2025. [PDF]
3. Chen, Junwen, Yingcheng Wang, and Keiji Yanai. "Focusing on what to Decode and what to Train: SOV Decoding with Specific Target Guided DeNoising and Vision Language Advisor." 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2025. [PDF] [Code]

The University of Electro-Communications · M.S.

2021/10 — 2023/09

Major: Informatics

Degree: Master of Informatics

Research Theme: Improvement of Human-Object Interaction Detection Methods and Their Application to Dietary Analysis

  • Deep Learning
  • Computer Vision
  • Human-Object Interaction Detection
  • Transformer

Major Achievements:

1. Chen, Junwen, and Keiji Yanai. "QAHOI: Query-based anchors for human-object interaction detection." 2023 18th International Conference on Machine Vision and Applications (MVA). IEEE, 2023. [PDF] [Code]
2. Chen, Junwen, and Keiji Yanai. "Parallel Queries for Human-Object Interaction Detection." Proceedings of the 4th ACM International Conference on Multimedia in Asia. 2022. [PDF]
3. Wang, Yingcheng, Junwen Chen, and Keiji Yanai. "HowToEat: Exploring Human Object Interaction and Eating Action in Eating Scenarios." Proceedings of the 8th International Workshop on Multimedia Assisted Dietary Management. 2023. [PDF]

North China University of Technology · B.S.

2016/09 — 2020/07

Major: Automation

Degree: Bachelor of Electrical and Control Engineering

Research Theme: Intelligent Driving Scene Segmentation with Deep Detection Model and Graph Convolutional Network

  • Deep Learning
  • Computer Vision
  • Instance Segmentation
  • Graph Neural Networks

Major Achievement:

1. Chen, J., Lu, Y., Chen, Y., Zhao, D., & Pang, Z. (2020, November). ContourRend: A segmentation method for improving contours by rendering. In International Symposium on Neural Networks (pp. 251-260). Cham: Springer International Publishing. [PDF] [Code]

Internships

Microsoft Research Asia · Full-time Research Intern

2024.10 — 2025.04 · Beijing
  • Conducted research on layout-based and multi-layer image generation, and on a knowledge graph-based benchmark for image generation.
  • Major Achievements:

    1. Chen, J., Jiang, H., Wang, Y., Wu, K., Li, J., Zhang, C., ... & Yuan, Y. (2025). PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models. arXiv preprint arXiv:2505.22523. [PDF] [Datasets]

    2. Wu, K., Chen, J., Liang, Z., Wang, Y., Li, J., Zhang, C., ... & Yuan, Y. (2025). Hybrid Layout Control for Diffusion Transformer: Fewer Annotations, Superior Aesthetics. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 17930-17940). [PDF] [Code]

    3. Luo, Y., Yuan, Y., Chen, J., Cai, H., Yue, Z., Yang, Y., ... & Lian, Z. (2025). MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning. arXiv preprint arXiv:2506.10963. (NeurIPS 2025 Poster) [PDF] [Project Page]

Institute of Automation, Chinese Academy of Sciences · Full-time Research Intern

2019.09 — 2020.06 · Beijing
  • Improved an interactive instance segmentation method and developed applications for the Huawei Atlas 200 DK AI Kit.
  • Major Achievements:

    1. Chen, J., Lu, Y., Chen, Y., Zhao, D., & Pang, Z. (2020, November). ContourRend: A segmentation method for improving contours by rendering. In International Symposium on Neural Networks (pp. 251-260). Cham: Springer International Publishing. [PDF] [Code]
    2. Road Segmentation Application based on Huawei Atlas 200 DK AI Kit. [Code]
    3. Object Detection Application based on Huawei Atlas 200 DK AI Kit. [Code]

Awards

Best Paper Award at the 18th International Conference on Machine Vision Applications (MVA 2023)

2023/06/25

This award has been given since 2011 to the authors of the most outstanding paper from the viewpoint of machine vision applications.

[Official Site]

Student Encouragement Award at the 86th National Convention of the Information Processing Society of Japan (IPSJ)

2024/03/15

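Presentation Title: Development of an Automatic Identification System for Food Items in a Refrigerator Using Image Recognition Technology (画像認識技術を活用した冷蔵庫内食材自動判別システムの開発)

I supported the student on this project as a teaching assistant.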

[Award Info] [Official Site] [Paper]

Domestic Conferences in Japan

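Meeting on Image Recognition and Understanding (MIRU 2025)

Kyoto

Peilin Xiong, Junwen Chen, Honghui Yuan, and Keiji Yanai: "Controlling Unseen Compositions in Diffusion Models by Swapping Positional Embeddings"

To generate unseen object compositions with diffusion models, this work proposes dynamically swapping positional embeddings, which produces novel visual compositions (e.g., a human head on a dog's body) without retraining the model.

  • Oral presentation by Peilin Xiong

Honghui Yuan, Junwen Chen, and Keiji Yanai: "Few-shot Font Generation for Japanese Kuzushiji with Differentiable Renderer"

Generates Kuzushiji characters with DiffVG, building on Word-as-Image.

  • Oral presentation by Honghui Yuan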

Meeting on Image Recognition and Understanding (MIRU 2024)

Kumamoto

Jing Yang, Junwen Chen, Jingbin Xu, and Keiji Yanai: "RecipeSD: Injecting Recipe Embedding into Food Image Synthesis using Stable Diffusion"

Generates food images from recipes with Stable Diffusion by using CookNet, which feeds cross-modal recipe embeddings into ControlNet as input.

  • Poster

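Meeting on Image Recognition and Understanding (MIRU 2023)

Hamamatsu

Junwen Chen, Yingcheng Wang, and Keiji Yanai: "HOI Detection with Separate Human, Object, and Action Decoders"

This work proposes a new one-stage framework consisting of a human decoder, an object decoder, and an action decoder. On HICO-DET, the method achieves higher accuracy than state-of-the-art methods with one third of the training epochs.

  • Oral presentation by Junwen Chen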

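Meeting on Image Recognition and Understanding (MIRU 2022)

Himeji

Junwen Chen and Keiji Yanai: "Human-Object Interaction Detection Using Multi-Scale Anchors"

Proposes QAHOI, which extracts features at multiple scales using a hierarchical backbone and a Deformable Transformer encoder.

  • Poster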

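IEICE Technical Committee on Pattern Recognition and Media Understanding (PRMU)

2022

Junwen Chen and Keiji Yanai: "Human-Object Interaction Detection Using Query-Based Anchors"

Proposes a one-stage method that adopts a Transformer-based multi-scale architecture and uses query-based anchors to predict all elements of an HOI instance.

  • Online presentation by Junwen Chen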

Skills

Programming

  • Python
  • C++
  • JavaScript
  • HTML/CSS
  • Kotlin

Frameworks/Tools

  • PyTorch
  • TensorFlow
  • Docker
  • Linux

Languages

  • Chinese (Native)
  • English (TOEFL 84, TOEIC 845)
  • Japanese (JLPT N1)

Personal Projects

Japanese Words Learning App (SuperWordsR)

A mobile application, written in Kotlin, that helps users learn Japanese words through interactive quizzes and flashcards.

Code

Japan Ski Resort Info (SkiMeta)

A website providing information about ski resorts in Japan, including location, facilities, and user reviews.

Website

Qwen-ASR-LLM-TTS_MCP-Chat-AI-Assistant

An easy-to-deploy, real-time multilingual AI assistant that integrates Qwen3-ASR for speech recognition, Ollama + Qwen3-30B-A3B as the LLM, Qwen3-TTS / Kokoro for speech synthesis, and MCP for smart device control.

Code

Hobbies

Tennis, Photography, Traveling