Technology Interchange—Getting Smart About Implementing Artificial Intelligence in Research & Development
Posted on Thursday, September 21, 2023
By Dr. Erik Sapper
Machine learning (ML) and artificial intelligence (AI) are rapidly becoming foundational tools in the research and development of paints and coatings. While companies are learning about the costs and benefits of integrating ML and AI into their R&D workflows, best practices for doing so are still not well established.
With the recent advent of large language models like ChatGPT, it has become easier than ever to begin implementing these tools in an organization. However, do not confuse ease of use with proper use. Implementation is one thing; effecting meaningful change in your organization is another thing entirely. In this article, I will provide context for when, how, and why to use ML and AI in a powder coatings research and development environment, along with some tips and strategies for getting started with your own data and machine learning workflow.
What Can ML and AI Do?
You should think about ML and AI tools as methods for squeezing information out of your data. These algorithms are great at finding trends in datasets that the typical human, even a subject matter expert, may overlook. The structure of a neural network illustrates how data is filtered through the model training process. A neural network may be thought of as a nested series of functions (nodes) that take a diverse set of inputs, pass them through successive layers of functions, and produce one or more predicted outputs. Figure 1 shows a schematic of a typical neural network. When more hidden layers of functions (or nodes) are added, the approach is called deep learning. Think of a well-trained neural network as a means of filtering or refining your dataset, separating the wheat from the chaff, to produce a complex function that can predict the outputs that are relevant to your research and development data.
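To make the nested-functions picture concrete, here is a minimal sketch of a forward pass through a tiny network, assuming NumPy is installed. The weights are random placeholders (training would tune them to fit your data), and the three inputs are hypothetical coating variables.

    import numpy as np

    rng = np.random.default_rng(0)

    def layer(x, W, b):
        # One layer of nodes: a weighted sum of the inputs passed
        # through a nonlinear activation function (tanh here).
        return np.tanh(W @ x + b)

    # Three hypothetical inputs: resin weight fraction, pigment weight
    # fraction, and cure temperature (deg C).
    x = np.array([0.55, 0.25, 180.0])

    # Random placeholder weights; training would adjust these values.
    W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # 4 hidden nodes
    W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)  # 1 output node

    prediction = W2 @ layer(x, W1, b1) + b2  # e.g., a predicted gloss value
    print(prediction)

Each layer feeds its outputs into the next, which is exactly the nesting described above; deep learning simply stacks many such layers.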
What Can ML and AI Not Do?
There is no machine learning algorithm that can fix a bad dataset. While it is common to impute or estimate missing values in large datasets, even the best data preparation and cleaning procedures cannot extract predictive or informative value from data generated in a poorly designed or executed experiment. Here, we define bad data as data that is untrustworthy, improperly collected, fabricated, or statistically unbalanced across a design space of parameters.
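Routine imputation itself is straightforward; what it cannot do is rescue a flawed experiment. As a minimal sketch, assuming scikit-learn is installed and using made-up values, column means can fill the gaps in a feature table:

    import numpy as np
    from sklearn.impute import SimpleImputer

    # Hypothetical feature table with two missing entries (np.nan).
    X = np.array([
        [0.55, 0.25, 180.0],
        [0.60, np.nan, 190.0],
        [np.nan, 0.30, 185.0],
    ])

    # Replace each missing value with the mean of its column.
    X_filled = SimpleImputer(strategy="mean").fit_transform(X)
    print(X_filled)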
Because of this, designing better experiments in the future often adds more value than applying machine learning analysis to historical data. The best practice of using a statistical design of experiments (DOE) approach still applies when it comes to collecting data. Note that bad data is not the same as data from bad or poorly performing formulations; that data should always be left in your dataset.
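Even Python's standard library is enough to lay out a simple full-factorial design; the factors and levels below are hypothetical placeholders:

    from itertools import product

    # Hypothetical factors and levels for a small full-factorial design.
    pigment_fraction = [0.20, 0.25, 0.30]
    cure_temp_c = [170, 180, 190]
    cure_time_min = [10, 20]

    # Every combination of levels: 3 x 3 x 2 = 18 experimental runs.
    runs = list(product(pigment_fraction, cure_temp_c, cure_time_min))
    for i, (frac, temp, time) in enumerate(runs, start=1):
        print(f"Run {i}: pigment={frac}, temp={temp} C, time={time} min")

In practice you would also randomize the run order to guard against drift in uncontrolled variables.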
What Mindset is Needed?
I recommend approaching the implementation of these tools as both an experiment and a mindset shift in how data is collected and analyzed. The nature of data analysis and model training is messy. Data can be cluttered, data cleaning procedures can be laborious, and the ways to train a model on your data are multiplying every day. Bring an experimental mindset to the work, embracing the scientist part of being a data scientist. I also stress not getting hung up on writing beautiful code. Computational costs are cheap, and your datasets do not count as Big Data, so don't waste time optimizing computer code when the difference is milliseconds or less in computation time. Worry about that once you have gigabytes of data to work with in a single database.
A Brief Roadmap for Getting Started
Feeling adventurous? Begin your journey in machine learning and artificial intelligence by following this simple roadmap.
1. Decide on a programming language and computing environment.
I recommend Python (python.org) as a programming language and Jupyter notebooks (jupyter.org) as a beginning environment in which to perform data analysis and model training. There are countless tutorials to be found online. Try a search for “Setting up Jupyter notebooks on a [Mac/PC]” and take it from there.
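Once Jupyter is running, a first notebook cell like this one, assuming pandas and scikit-learn have been installed (for example, with pip), confirms the basics are in place:

    # Run in your first notebook cell to confirm the core libraries load.
    import pandas as pd
    import sklearn

    print("pandas", pd.__version__)
    print("scikit-learn", sklearn.__version__)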
2. Collect and clean your data.
This is harder than it sounds. Start with a small but complete dataset that has a well-understood set of quantitative inputs (features or variables, such as weight fractions of formulation components, particle size, electrostatic spray conditions, or substrate preparation) and quantitative outputs (or responses, such as gloss, hardness, scratch recovery, or corrosion resistance). Your data should exist as a table containing columns of inputs and outputs for each case, instance, sample, formulation, or trial in your dataset.
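As a minimal sketch of that table structure, assuming pandas is installed and using a hypothetical file name and column names:

    import pandas as pd

    # Hypothetical file: one row per formulation, with columns of
    # inputs (features) and outputs (responses).
    df = pd.read_csv("formulations.csv")

    # Split the table into input features and the response to predict.
    input_cols = ["resin_fraction", "pigment_fraction",
                  "particle_size_um", "cure_temp_c"]
    X = df[input_cols]
    y = df["gloss_60deg"]

    print(df.head())  # sanity-check the first few rows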
3. Train a machine learning model on your data.
I was recently asked by a colleague what should be done to prevent the use of ChatGPT by students in their college course. My response? Nothing. Encourage it. Build lesson plans and assignments that integrate the critical use of large language models like ChatGPT. For example, have ChatGPT turn an outline into a rough draft of an essay, which you then edit and improve yourself. While I do recommend that budding data engineers and scientists consult the literature to learn about various algorithms, I also encourage new users to not be shy about having tools like ChatGPT assist their efforts. People will disagree with me on this, I am sure. My response to them is a question: what are you using ML for? Are you trying to develop new mathematical algorithms and procedures, or are you trying to understand and optimize your materials, coatings, and application methods? Are you trying to publish results in an academic journal, or are you trying to get new materials out the door in a timelier manner? I suspect in nearly every case the answer is the latter. Don't be timid about using tools that help to accelerate those outcomes.
Create an account with OpenAI for using ChatGPT at www.openai.com. The free version is sufficient. Ask ChatGPT to help you create Python code that can train a model on your dataset. Examples of a ChatGPT prompt written by a user (me!) and the ChatGPT response are shown in Figures 2 and 3.
The code that is generated by ChatGPT can be copied, pasted, and run in your Jupyter notebook. Ask ChatGPT to explain any steps that you are unfamiliar with. If your code doesn’t work, try pasting the error message into ChatGPT.
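For reference, the code ChatGPT returns for a request like this typically looks something like the sketch below. This is not the output shown in Figure 3; it is a representative example that reuses the hypothetical dataset and column names from step 2.

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    # Hypothetical dataset from step 2.
    df = pd.read_csv("formulations.csv")
    X = df[["resin_fraction", "pigment_fraction",
            "particle_size_um", "cure_temp_c"]]
    y = df["gloss_60deg"]

    # Hold out 20% of the rows to test the model on unseen data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    model = RandomForestRegressor(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)

    print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))

The held-out test set matters: a model judged only on the data it was trained on will look better than it really is.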
4. Design new materials, optimize processes, and keep learning.
With a trained model in hand, you can begin the more adventurous task of designing new powder coatings and optimizing manufacturing and application processes. You may wish to use the trained model to make one-off predictions, to extract new heuristics or rules of thumb for your materials or processes, or to discover interrelationships and dependencies between various formulation components, application processes, and end-use coating performance.
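As one illustration, continuing the hypothetical gloss model and dataset from step 3, a trained model can be swept across a grid of candidate formulations and the candidates ranked by predicted performance:

    import pandas as pd
    from itertools import product
    from sklearn.ensemble import RandomForestRegressor

    # Re-train on the full hypothetical dataset from step 3.
    df = pd.read_csv("formulations.csv")
    features = ["resin_fraction", "pigment_fraction",
                "particle_size_um", "cure_temp_c"]
    model = RandomForestRegressor(n_estimators=200, random_state=42)
    model.fit(df[features], df["gloss_60deg"])

    # Candidate grid over two formulation variables, holding particle
    # size and cure temperature fixed at hypothetical values.
    candidates = pd.DataFrame(
        [(r, p, 35.0, 180.0)
         for r, p in product([0.50, 0.55, 0.60], [0.20, 0.25, 0.30])],
        columns=features,
    )

    # Rank candidates by predicted gloss; the top rows are the most
    # promising formulations to verify at the bench.
    candidates["predicted_gloss"] = model.predict(candidates)
    print(candidates.sort_values("predicted_gloss", ascending=False).head())

Predictions like these are hypotheses, not guarantees; the best candidates still need to be made and measured.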
Re-train your models when new data is collected, keep asking questions and learning, and—most importantly—have fun!
Dr. Erik Sapper is an associate professor at California Polytechnic State University and a faculty fellow in the university's Center for Innovation and Entrepreneurship.