
Tokyo, Japan, August 27, 2025 - (JCN Newswire) - NEC Corporation (NEC; TSE: 6701) has developed AI technology that recognizes and digitalizes the tasks of workers without pre-training, using on-site video from wide area worksites covered by multiple cameras, such as distribution warehouses, factories, and construction sites.
As this newly developed technology can be installed immediately at industrial workplaces, it will contribute to the visualization of entire worksites, which had previously not been possible, thereby improving productivity, optimizing the allocation of human resources, and streamlining workflows. NEC intends to commercialize this technology by fiscal year 2026.

While labor shortages are intensifying at distribution warehouses, factories, construction sites, and other industrial workplaces, many processes reliant on manual labor remain. As such, there is a growing demand to optimize the allocation of personnel and work processes by visualizing working conditions so that the limited labor force can be used more effectively. Although technology for recognizing work tasks from video already exists, recognizing specific tasks at a given workplace has required a great deal of time and effort for preparations, including collecting on-site video data and training AI models.
Moreover, to digitalize work tasks over an entire wide area worksite using multiple cameras, workers must be identified across video from all cameras, and the work task recognition results must be consolidated for each worker. With conventional technology, however, it has been challenging to accurately distinguish workers wearing identical uniforms and to continue identifying the same person without error across multiple cameras.
The features of NEC’s newly developed technology are as follows.
1. Can be installed immediately since it is capable of recognizing work tasks in video from text input alone
Utilizing a vision language model (VLM), NEC has developed AI technology capable of recognizing a wide variety of work tasks without the need for pre-training on video data. Recognition simply requires text input explaining individual work tasks, such as "retrieving packages from a shelf" for picking tasks or "pushing a cart to transport items" for cart transporting tasks.
In the past, work task recognition required collecting and annotating video data and training AI models, which could take several weeks to complete. In addition, identifying the relevant objects that workers interact with or operate has conventionally proven difficult, so recognizing tasks from video at industrial sites where various objects are intermingled has been a challenge.
This newly developed technology first (A) uses the VLM in advance to analyze and extract features from the text input describing individual work tasks. When analyzing video, the technology (B) identifies the relevant objects that a worker interacts with or operates, using a proprietary AI model* for capturing relationships between people and objects, and then uses the VLM to extract features from images containing the worker and the identified objects. By comparing and matching the features extracted in (A) and (B), work tasks can be recognized from text input alone.
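The matching of (A) and (B) can be illustrated with a minimal sketch. It assumes the text features and the image features are already available as plain numeric vectors in a shared space, and it uses cosine similarity with a cutoff as the comparison; NEC has not disclosed its matching method, so the function names, the `min_score` threshold, and the toy vectors below are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def recognize_task(region_feature, task_text_features, min_score=0.5):
    """Return (label, score) for the best-matching task description.

    region_feature: features from an image region containing the worker
        and the related objects, step (B). Assumed to be precomputed.
    task_text_features: dict of task description -> text features, step (A).
    min_score: hypothetical cutoff below which no task is reported.
    """
    best_label, best_score = None, -1.0
    for label, text_feature in task_text_features.items():
        score = cosine_similarity(region_feature, text_feature)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_score:
        return None, best_score
    return best_label, best_score

# Toy vectors standing in for real VLM features (illustrative only).
TASK_TEXT_FEATURES = {
    "retrieving packages from a shelf": [0.9, 0.1, 0.0],
    "pushing a cart to transport items": [0.1, 0.9, 0.2],
}

label, score = recognize_task([0.2, 0.8, 0.1], TASK_TEXT_FEATURES)
unknown_label, unknown_score = recognize_task([0.0, 0.0, 1.0], TASK_TEXT_FEATURES)
```

In this sketch, adding a new task amounts to adding one more text entry to the dictionary, which mirrors the claim that no video collection or model training is needed per task.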
2. Contributes to the optimization of wide area worksites through digitalization of worker movement and work tasks
The utilization of multiple cameras contributes to the optimization of on-site work by identifying workers moving around the entire site without relying on clothing or other visual characteristics, and digitalizing the work tasks of each worker over an extended period of time across the entire worksite.
This technology estimates the location of each worker in a digital twin space (three-dimensional coordinates) from the worker's location in each camera image (two-dimensional coordinates). By measuring how close those estimated locations and movement patterns are on the digital twin, it can identify the same person across multiple cameras with a high degree of accuracy. Moreover, since the camera parameters (i.e., camera position and orientation) necessary to estimate worker locations are estimated automatically, the time and effort required for on-site installation can be minimized.
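The idea of identifying workers by position rather than appearance can be sketched as follows. This is a simplified illustration, not NEC's method: it assumes each camera already has a known 3x3 homography that maps image pixels to a shared ground plane (a two-dimensional stand-in for the digital twin space), it pairs detections by proximity only and ignores movement patterns, and the function names, the `max_dist` threshold, and the sample coordinates are hypothetical.

```python
import math

def image_to_ground(H, u, v):
    """Map pixel (u, v) to ground-plane (x, y) using a 3x3 homography H."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return (x / w, y / w)

def match_across_cameras(points_a, points_b, max_dist=0.5):
    """Greedily pair detections from two cameras by ground-plane distance.

    points_a, points_b: dict of detection id -> (x, y) in metres.
    max_dist: hypothetical limit beyond which two detections are
        treated as different people.
    """
    candidates = sorted(
        (math.dist(pa, pb), ia, ib)
        for ia, pa in points_a.items()
        for ib, pb in points_b.items()
    )
    used_a, used_b, pairs = set(), set(), []
    for dist, ia, ib in candidates:
        if dist > max_dist:
            break  # remaining candidates are even farther apart
        if ia in used_a or ib in used_b:
            continue
        pairs.append((ia, ib))
        used_a.add(ia)
        used_b.add(ib)
    return pairs

# Assumed homographies: 1 pixel = 1 cm, camera B shifted by 1 m along x.
H_A = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]]
H_B = [[0.01, 0.0, -1.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]]

# Foot positions of detected workers in each camera image (pixels).
cam_a = {"a1": (100, 200), "a2": (400, 300)}
cam_b = {"b1": (505, 310), "b2": (195, 205)}

ground_a = {k: image_to_ground(H_A, u, v) for k, (u, v) in cam_a.items()}
ground_b = {k: image_to_ground(H_B, u, v) for k, (u, v) in cam_b.items()}
pairs = match_across_cameras(ground_a, ground_b)
```

Because the pairing depends only on where each person stands in the shared coordinate system, two workers in identical uniforms are still told apart as long as they occupy different positions.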
* NEC develops image recognition technology to digitalize a wide variety of work activities
https://www.nec.com/en/press/202211/global_20221128_01.html
About NEC Corporation
NEC Corporation has established itself as a leader in the integration of IT and network technologies while promoting the brand statement of “Orchestrating a brighter world.” NEC enables businesses and communities to adapt to rapid changes taking place in both society and the market as it provides for the social values of safety, security, fairness and efficiency to promote a more sustainable world where everyone has the chance to reach their full potential. For more information, visit NEC at https://www.nec.com.