CV

Education

Projects

  • 2023.01 - current
    Learning Skeletal Multi-Entity Interactions
    This project aims to recognize and understand interactions involving multiple entities, such as human bodies, hands, objects, or other elements within a scene. Our goal is to unify the recognition of person-person, hand-hand, and hand-object interactions, as well as group activities.
    • Led the project end to end, including method design, implementation, experiments, result analysis, and manuscript writing.
    • Proposed a model based on entity rearrangement and token attention, achieving SOTA performance across 4 benchmarks.
    • In follow-up research, enabled single-entity backbones (e.g., CTR-GCN) to achieve SOTA performance on 6 multi-entity action benchmarks.
    • Published the work at IROS 2023 and the follow-up research at NeurIPS 2024.
  • 2023.05 - 2023.09
    Vision-Language-Action Models for Robotic Manipulation
    This project built a robot manipulation simulator on Unreal Engine 5 (UE5) that automatically generates manipulation demonstration data paired with complex language instructions. Using this simulator, we created a robot manipulation benchmark with progressive inference tasks and proposed a world-model-based manipulation method.
    • Reviewed recent work on robotic manipulation and deployed 3+ multimodal models (e.g., BC-Z, RT-1) in PyTorch.
    • Trained multimodal models with reinforcement learning in the simulator to accurately follow human language instructions and grasp objects.

Publications

Work

  • 2024.07 - 2024.09
    Deep Learning Intern (Sony R&D Center)
    Sony (China) Ltd.
    • Assisted in developing deep learning models for human action analysis using skeletal data and RGB-D videos.
  • 2022.10 - 2022.12
    Algorithm Development Intern (Team Leader)
    Hangzhou Lingxi Robot Intelligent Technology Co., Ltd. (LINX ROBOT)
    • Completed a few-shot defect detection project: reviewed 40+ research papers, archived 16 open-source datasets, and reproduced 4 key methods.
    • Completed an image matching and retrieval project: collected and annotated 400+ images and fine-tuned the DELF and CGD methods, achieving a 10% increase in top-1 accuracy.

Awards

Skills

Programming
Python
C++
C#
Deep Learning
PyTorch
TensorFlow
Multi-GPU Servers & GPU Clusters

Languages

Mandarin
Native
English
Fluent
Japanese
Basic

References

Prof. Mengyuan Liu
Peking University
Prof. Beichen Ding
Sun Yat-sen University