cv
Basics
Name | Yuhang Wen |
Email | wenyh29[at]mail2.sysu.edu.cn |
Url | https://necolizer.github.io |
Education
-
2023.09 - current Guangdong, China
-
2019.09 - 2023.06 Guangdong, China
Projects
- 2023.01 - current
Learning Skeletal Multi-Entity Interactions
This project aimed to recognize and understand interactions involving multiple entities, such as human bodies, hands, objects, or other elements within a scene. Our goal was to unify the recognition of person-person, hand-hand, and hand-object interactions, as well as group activities.
- Led every aspect of the project, including method design, coding, experiments, result analysis, and manuscript writing.
- Proposed a model based on entity rearrangement and token attention, achieving SOTA performance across 4 benchmarks.
- Enabled the single-entity backbones (e.g., CTR-GCN) to achieve SOTA performance on 6 multi-entity action benchmarks in subsequent research.
- Published this work in IROS 2023, with subsequent research published in NeurIPS 2024.
- 2023.05 - 2023.09
Vision-Language-Action Models for Robotic Manipulation
This project built a robot manipulation simulator based on UE5, which can automatically generate robot manipulation demonstration data containing complex language instructions. With this simulator, we created a robot manipulation benchmark with progressive inference tasks, and proposed a world model-based robot manipulation method.
- Reviewed recent works related to robotic manipulation and deployed 3+ multimodal models (e.g., BC-Z, RT-1) using PyTorch.
- Trained multimodal models using reinforcement learning in the simulation environment to enable accurate comprehension of human language instructions and object grasping.
Publications
-
2024 Facial Prior Guided Micro-Expression Generation
IEEE Transactions on Image Processing
-
2024 CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition
Thirty-eighth Conference on Neural Information Processing Systems (NeurIPS)
-
2023 Interactive Spatiotemporal Token Attention Network for Skeleton-Based General Interactive Action Recognition
2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
-
2021 Facial Prior Based First Order Motion Model for Micro-Expression Generation
Proceedings of the 29th ACM International Conference on Multimedia
Work
-
2024.07 - 2024.09 Deep Learning Intern (Sony R&D Center)
Sony (China) Ltd.
- Assisted in developing deep learning models for human action analysis based on skeletal data and RGB-D videos.
-
2022.10 - 2022.12 Algorithm Development Intern (Team Leader)
Hangzhou Lingxi Robot Intelligent Technology Co., Ltd. (LINX ROBOT)
- Completed a defect detection project using few-shot learning: reviewed 40+ research papers, archived 16 open-source datasets, and reproduced 4 key methods.
- Completed an image matching and retrieval project, collecting and annotating over 400 images, and fine-tuning DELF and CGD methods, resulting in a 10% increase in top-1 accuracy.
Awards
- 2022.06
CVPR 2022 5th UG2+ Challenge Second Runner-up
CVPR 2022 5th UG2+ Workshop Organization Committee
Awarded to our team (Titan5-HPL) for our solution submitted to Challenge 2: Semi-supervised Action Recognition in the Dark.
- 2021.10
ACMMM 2021 Facial Micro-Expression (FME) Challenge Winner
ACMMM 2021 Facial Micro-Expression (FME) Challenge Organization Committee
Awarded to our team (Titan5-HPL) for ranking 1st in the Facial Micro-Expression Generation Task.
Skills
Programming | |
Python | |
C++ | |
C# |
Deep Learning | |
PyTorch | |
TensorFlow | |
Work with Multi-GPU Server & GPU Cluster |
Languages
Mandarin | Native |
English | Fluent |
Japanese | Basic |
References
Prof. Mengyuan Liu | Peking University |
Prof. Beichen Ding | Sun Yat-sen University |