AWS for Industries
Using AWS generative AI to improve defect detection in Manufacturing
Product quality control and surface defect detection in manufacturing are critical for reducing overall product cost and improving customer satisfaction. Relying solely on humans to visually inspect products does not scale, and today's computer vision models require vast amounts of image data to detect surface defects. In today's smart manufacturing ecosystem, product cycle times are very short, so leveraging advanced technology is critical; however, few datasets exist to support model development for surface defect detection. Small and medium-sized manufacturers typically lack the required technical capabilities and computational resources. High-resolution images and inputs are required but often do not exist, so training new models for defect detection is constrained. To address these problems, DXC Technology and AWS leveraged AWS generative AI services to create image datasets that better train computer vision models to detect manufacturing defects.
In this blog post, DXC Technology showcases how Amazon SageMaker and its foundation models can be used to generate the infrequent image events crucial for modern manufacturing surface defect solutions. DXC used Stable Diffusion 2.1, a text-to-image model trained with a new text encoder, OpenCLIP, developed by LAION with support from Stability AI. The model generates images from text prompts and supports image variation, and because of its modularity it can be combined with other models such as Karlo. Later in this blog post, we will discuss how generative AI can be used to produce these rarely occurring images, but first let's quickly review how computer vision works.
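To illustrate the kind of text-to-image generation described above, here is a minimal sketch using the open-source Hugging Face diffusers library with the public Stable Diffusion 2.1 checkpoint. The model ID and prompt are illustrative; the actual work described in this post ran on Amazon SageMaker.

```python
# Minimal sketch: generate a synthetic surface-defect image from a text prompt
# with the public Stable Diffusion 2.1 checkpoint. Illustrative only; not the
# production pipeline described in this post.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a close-up photo of a deep scratch on a silver car door panel"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("synthetic_scratch.png")
```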
Surface detection: technology and challenges
Surface defect solutions rely on technology that automatically recognizes images and describes them accurately and efficiently: computer vision. Computer vision systems use specialized software and hardware to mimic the capabilities of the parts of the human brain responsible for object recognition and classification. These systems require access to a large volume of images and video sourced from cameras and other devices, and they use artificial intelligence and machine learning to process this data accurately for object identification and defect recognition. Surface detection solutions can interrupt the manufacturing process and recommend activities to remediate a defect.
In the past, developers had to manually tag tens of thousands of images for defect detection, a process that is time consuming, incomplete, and error prone. Image data is also unstructured and complex, so automating these tasks requires extensive compute power. Manufacturing use cases are highly specialized, with environmental considerations such as lighting, dust, fluids, and vibration. There is also a variety of materials to consider, because surfaces may be aluminum, plastic, glass, coatings, or fiber. The product being manufactured must be taken into account as well: washing machines, toasters, and automobiles each have their own size, shape, and surface requirements. Finally, rarely occurring instances of defects, or "edge cases," are needed to improve computer vision models for surface detection; this type of data is sparse and cannot simply be purchased from a third party.
Historically, most standard industrial applications of computer vision have used traditional artificial intelligence models, which require a large quantity of training data to make good predictions in deployment. It is not just the quantity of data that matters: the data must also be diverse and representative of the situations the model will see in production. The remainder of this post compares how a traditional computer vision model performs against the AWS generative AI solution.
Surface detection: comparing two methodologies
Two teams from DXC Technology trained and deployed a scratch classification model using the two methods listed below. Both relied on synthetic data for training a classification model: the first team used a traditional approach with on-premises resources, and the second team attempted to replicate and improve the results using generative AI technology on AWS.
The two methodologies are:
Group 1: Synthetic data generation from 3D models. Group 1 focused on synthetic data generation using 3D-rendered images of cars and cropped the images using animation software. The process injected scratches through digital alterations and applied traditional data augmentation to create several datasets on which the model was subsequently trained and evaluated. The group leveraged on-premises systems for training, evaluation, and deployment.
Group 1 used 4 x NVIDIA V100 GPUs for approximately 100 hours of training. The models were trained with the TensorFlow framework, and managing the infrastructure around training was a significant burden for Group 1, with a steep learning curve. The infrastructure provided enough GPU power to perform training; however, the team observed issues with on-premises computation: training processes stopped due to errors, and managing prolonged training times was challenging.
Group 2: Synthetic data generation with generative AI. Group 2 operated independently from Group 1 and leveraged generative AI Stable Diffusion models on AWS. Group 2 used a small number of images scraped from the web to fine-tune a diffusion model and generate synthetic data. The group leveraged AWS services including Amazon SageMaker, Amazon Simple Storage Service (Amazon S3), and Amazon CloudWatch, as well as accompanying storage and compute services.
The overall architecture for training and evaluation is presented in the diagram below.
The methodology was as follows:
- Scrape images of scratched cars from real dealership listings on the internet.
- Collect and crop around 10-15 images of scratched and non-scratched surfaces.
- Prompt engineering (instance prompt) – describe the fine-tuning dataset with a single prompt. Additional improvements were made as well, such as the inclusion of a class prompt.
- Prompt engineering (generation prompt) – select a prompt for generation; it is usually the same as the original fine-tuning prompt, but with varied parameters to reflect different embeddings in the fine-tuned space (see the sketch after this list).
- Train an image classifier to detect scratches on surfaces.
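As a hedged sketch of how steps 3 and 4 might look on AWS, the following uses SageMaker JumpStart to fine-tune the Stable Diffusion 2.1 base model and then generate synthetic images from it. The model ID follows JumpStart's public naming; the S3 path, instance type, and hyperparameter values are placeholders rather than the team's actual settings. In the JumpStart workflow, the instance prompt (and optional class prompt) accompany the training images in a dataset_info.json file.

```python
# Hedged sketch: fine-tune the JumpStart Stable Diffusion 2.1 base model on a
# small scratch dataset, then deploy it and generate synthetic images.
# Bucket names, instance types, and hyperparameter values are placeholders.
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="model-txt2img-stabilityai-stable-diffusion-v2-1-base",
    instance_type="ml.g5.2xlarge",
    hyperparameters={
        "epochs": "20",
        "with_prior_preservation": "True",  # regularize with the class prompt
    },
)

# The training channel holds the 10-15 cropped images plus a dataset_info.json
# carrying the instance prompt (and optional class prompt) from steps 3 and 4.
estimator.fit({"training": "s3://my-bucket/scratch-dataset/"})

# Deploy the fine-tuned model and generate a synthetic training image.
predictor = estimator.deploy(
    initial_instance_count=1, instance_type="ml.g5.2xlarge"
)
response = predictor.predict({"prompt": "a photo of a scratched car surface"})
```

Varying the generation prompt (for example, lighting, paint color, or scratch severity) then explores different embeddings in the fine-tuned space, yielding a more diverse synthetic dataset.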
Surface detection: results and outcomes
The key takeaways from the generative AI study are as follows:
AWS generative AI observed capabilities
- AWS generative AI can generate images quickly and affordably, even from relatively small amounts of data.
- The model generated high-quality images for scratch detection, and it was able to abstract words and produce images of scratches from a seemingly diverse fine-tuning dataset.
- The Stable Diffusion models generated good synthetic data after they were fine-tuned.
Direct comparison of AWS generative AI (Group 2) versus on-premises TensorFlow (Group 1)
- Group 1's 3D modeling method took the team two months, whereas Group 2 was able to train a model in about 1-2 weeks.
- Group 2 had significantly more flexibility and training options than Group 1's fixed infrastructure. The improvements in EfficientNet made detailed fine-tuning unnecessary, as the initial model provided near-perfect accuracy on the validation dataset in as few as 5 epochs (see the classifier sketch after this list).
- Group 2's generative AI image creation was more efficient than Group 1's by a factor of 8 to 1.
- Group 1 leveraged NVIDIA GPUs for training and had very different development timeframes due to infrastructure installation, setup, and support. For Group 2, this process was simple using Amazon SageMaker, SageMaker JumpStart, and ready-made Amazon S3 storage; Group 2 reduced infrastructure setup and support to almost zero.
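To make the classifier step concrete, below is a minimal transfer-learning sketch in TensorFlow/Keras using a pretrained EfficientNet backbone, in the spirit of the approach described above. The directory layout, image size, and hyperparameters are assumptions for illustration, not Group 2's exact configuration.

```python
# Hedged sketch: binary scratch/no-scratch classifier on a pretrained
# EfficientNetB0 backbone. Paths and hyperparameters are illustrative.
import tensorflow as tf

# Expects data/train/scratch/ and data/train/no_scratch/ containing the
# synthetic and real images produced earlier.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(224, 224), batch_size=32
)

base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", pooling="avg"
)
base.trainable = False  # freeze the backbone; only the new head is trained

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(scratch)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The post reports near-perfect validation accuracy within roughly 5 epochs.
model.fit(train_ds, epochs=5)
```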
Overall, using AWS generative AI for synthetic data generation is simple, flexible, and cost effective. With minimal effort, development, data collection, and manual work can be reduced, which makes computer vision development significantly easier and more accessible.