Title: Toward ergonomic risk prediction via segmentation of indoor object manipulation actions using spatiotemporal convolutional networks
Authors: Parsa, Behnoosh; Samani, Ekta U.; Hendrix, Rose; Devine, Cameron; Singh, Shashi M.; Devasia, Santosh; Banerjee, Ashis G.
Type: Journal Article
Journal: IEEE Robotics and Automation Letters (ISSN 2377-3766)
Issue date: October 2019 (issued 2019-10-01)
Pages: 3153-3160
DOI: 10.1109/LRA.2019.2925305
Scopus ID: 2-s2.0-85069767933
Handle: https://d8.irins.org/handle/IITG2025/23176
Preprint: https://arxiv.org/pdf/1902.05176
Date deposited: 2025-08-31
Keywords: Action segmentation | Computer vision for automation | Deep learning in robotics and automation | Ergonomic safety | Human-centered automation

Abstract: Automated real-time prediction of the ergonomic risks of manipulating objects is a key unsolved challenge in developing effective human-robot collaboration systems for logistics and manufacturing applications. We present a foundational paradigm to address this challenge by formulating the problem as one of action segmentation from RGB-D camera videos. Spatial features are first learned from the video frames using a deep convolutional model and then fed sequentially to temporal convolutional networks, which semantically segment the frames into a hierarchy of actions that are ergonomically safe, require monitoring, or need immediate attention. For performance evaluation, in addition to an open-source kitchen dataset, we collected a new dataset comprising 20 individuals picking up and placing objects of varying weights to and from cabinet and table locations at various heights. Results show high (87%-94%) F1 overlap scores between the ground truth and predicted frame labels for videos lasting over 2 min and consisting of a large number of actions.
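To make the pipeline described in the abstract concrete, below is a minimal PyTorch sketch of the general approach: per-frame spatial features from a pretrained CNN, followed by a dilated temporal convolutional network (TCN) that assigns an ergonomic-risk label to every frame. The encoder choice (ResNet-18), layer sizes, and the three-class label set are illustrative assumptions, not the authors' actual configuration or code.

```python
# Sketch only: CNN frame encoder + dilated TCN for per-frame risk labels.
# All hyperparameters and class names below are hypothetical.
import torch
import torch.nn as nn
import torchvision.models as models

RISK_CLASSES = ["safe", "monitor", "attention"]  # hypothetical label set

class FrameEncoder(nn.Module):
    """Extracts one spatial feature vector per RGB frame (assumed ResNet-18 backbone)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Keep everything up to (and including) global average pooling; drop the classifier.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.out_dim = 512

    def forward(self, frames):             # frames: (T, 3, H, W)
        feats = self.features(frames)      # (T, 512, 1, 1)
        return feats.flatten(1)            # (T, 512)

class TemporalSegmenter(nn.Module):
    """Dilated 1-D convolutions over the frame-feature sequence; one logit vector per frame."""
    def __init__(self, in_dim=512, hidden=64, n_classes=len(RISK_CLASSES), n_layers=4):
        super().__init__()
        layers = [nn.Conv1d(in_dim, hidden, kernel_size=1)]
        for i in range(n_layers):
            d = 2 ** i                     # exponentially growing dilation widens temporal context
            layers += [nn.Conv1d(hidden, hidden, kernel_size=3, padding=d, dilation=d),
                       nn.ReLU()]
        layers += [nn.Conv1d(hidden, n_classes, kernel_size=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, feats):              # feats: (T, in_dim)
        x = feats.t().unsqueeze(0)         # (1, in_dim, T)
        return self.net(x).squeeze(0).t()  # (T, n_classes) per-frame logits

# Usage: encode a short clip and label it frame by frame.
frames = torch.randn(16, 3, 224, 224)      # stand-in for 16 RGB frames
with torch.no_grad():
    feats = FrameEncoder()(frames)
    logits = TemporalSegmenter()(feats)
labels = [RISK_CLASSES[i] for i in logits.argmax(dim=1).tolist()]
```

The dilated convolutions preserve the sequence length while expanding the receptive field, which is what allows a TCN to label every frame of a multi-minute video with context from many surrounding frames.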