Detecting and Characterizing Group Interactions Using 3D Spatial Data to Enhance Human-Robot Engagement
As robotic systems become increasingly integrated into human environments, it is critical to develop advanced methods that enable them to interpret and respond to complex social dynamics. This work combines YOLOv8-based human pose estimation with 3D Mean Shift clustering to detect and analyze behavioral characteristics of social groups, using 3D point clouds generated by the Intel® RealSense™ D435i as a cost-effective alternative to LiDAR systems. Our proposed method achieves 97% accuracy in classifying social group geometric configurations (L, C, and I patterns) and demonstrates the value of depth information by reaching 50% precision in 3D group detection with adaptive clustering, significantly outperforming standard 2D approaches. Validation was conducted with 12 participants across 8 experimental scenarios, demonstrating robust estimation of body orientation (40° error), a key indicator for interaction analysis, while head-direction estimation showed greater variability (70° error); both were measured relative to the depth plane and compared against OptiTrack ground-truth data. The framework processes 120 samples at distances of 2–6 m, achieving 70% torso-orientation accuracy at 5 m and identifying triadic L-shaped groups with an F1-score of 0.91. These results enable autonomous robots to quantify group centroids, analyze interaction patterns, and navigate dynamically using real-time convex-hull approximations. The integration of accessible 3D perception with efficient processing could enhance human-robot interaction, demonstrating feasibility in applications such as social robotics, healthcare and care environments, and service industries, where social adaptability and collaborative decision-making are essential.
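A minimal sketch of the perception pipeline described above, assuming the ultralytics YOLOv8 pose model and scikit-learn's MeanShift: shoulder keypoints from each detected person are back-projected into 3D using the aligned depth frame, then clustered into groups. The fixed bandwidth and the pinhole deprojection helper are illustrative assumptions; the paper's adaptive-bandwidth scheme is not reproduced here.

```python
import numpy as np
from sklearn.cluster import MeanShift
from ultralytics import YOLO

model = YOLO("yolov8n-pose.pt")  # pretrained COCO pose model

def deproject(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at depth z (metres) to 3D."""
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

def detect_groups(color_image, depth_image, intrinsics, bandwidth=1.2):
    """One 3D torso point per detected person, grouped by 3D Mean Shift."""
    fx, fy, cx, cy = intrinsics
    result = model(color_image, verbose=False)[0]
    points = []
    for kpts in result.keypoints.xy.cpu().numpy():    # (persons, 17, 2)
        u, v = kpts[5:7].mean(axis=0)                 # shoulder midpoint
        z = float(depth_image[int(v), int(u)]) * 1e-3 # D435i depth, mm -> m
        if z > 0:                                     # skip invalid depth
            points.append(deproject(u, v, z, fx, fy, cx, cy))
    if not points:
        return np.empty((0, 3)), np.empty(0, dtype=int)
    pts = np.vstack(points)
    labels = MeanShift(bandwidth=bandwidth).fit_predict(pts)
    return pts, labels
```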
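As one illustrative reading of the L/C/I taxonomy (not the authors' classifier), a triad can be labelled from the interior angle at its middle member: roughly 180° suggests a line (I), roughly 90° a corner (L), and intermediate angles an arc-like formation (C). The tolerance value is an assumption.

```python
import numpy as np

def triad_shape(p, tol=25.0):
    """Label triad ground-plane positions p (3x2 array) as 'I', 'L', or 'C'."""
    # Middle member = the one opposite the longest pairwise edge.
    gaps = [np.linalg.norm(p[(k + 1) % 3] - p[(k + 2) % 3]) for k in range(3)]
    i = int(np.argmax(gaps))
    a = p[(i + 1) % 3] - p[i]
    b = p[(i + 2) % 3] - p[i]
    cosang = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    ang = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    if abs(ang - 180.0) < tol:
        return "I"   # members roughly in a line
    if abs(ang - 90.0) < tol:
        return "L"   # perpendicular, corner-like arrangement
    return "C"       # intermediate angle: arc-like arrangement
```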
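The convex-hull approximation used for navigation can be illustrated as follows; this is a hedged sketch assuming the points and labels produced above, with each group's members projected onto the floor plane before scipy's ConvexHull is applied. The function name group_footprints is hypothetical.

```python
import numpy as np
from scipy.spatial import ConvexHull, QhullError

def group_footprints(points, labels):
    """Per group: 3D centroid plus ground-plane convex-hull vertices."""
    footprints = {}
    for g in np.unique(labels):
        members = points[labels == g]
        centroid = members.mean(axis=0)      # group centroid in 3D
        ground = members[:, [0, 2]]          # project onto floor plane (x, z)
        try:
            hull = ground[ConvexHull(ground).vertices]
        except QhullError:                   # <3 or collinear members
            hull = ground
        footprints[g] = (centroid, hull)
    return footprints
```

A navigation stack could then treat each hull, padded by a proxemic margin, as a no-go region while steering relative to the group centroid.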