Verifying and Validating AI/ML
Artificial intelligence and machine learning (AI/ML) systems aid human decision-making or make decisions in applications ranging from medical systems to self-driving cars. We need to be able to trust the AI/ML decisions, so it is essential to verify and validate the AI/ML to ensure the system is functioning correctly in both normal and abnormal or edge conditions.
We’ll address the differences between verification and validation, as well as how planning should account for not only normal but also edge conditions.
There are many recent examples of an AI/ML system being fooled by relatively simple alterations of its sensor inputs. Street signs, for example, have been misread after innocently placed stickers or maliciously placed tape or shapes changed their appearance. Signs can also be altered by natural causes: being struck by a vehicle, environmental damage such as wind, or tilting because the sign was placed in soft soil.
In a clinical setting, there are many sources of "noise" or interference that can cause an acquired signal to appear different from the data used to train the algorithm, producing erroneous output. These include noise injected by nearby cell phones and interference from nearby devices such as MRI systems. Even the patient can generate noise by moving, tapping on the electrodes, or sitting on an ECG lead wire.
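One practical way to prepare for interference like this is to inject it synthetically into clean training and test data. The sketch below, a hypothetical helper assuming a 1-D ECG trace in millivolts and a chosen amplitude, adds a simulated 60 Hz mains sinusoid to a signal:

```python
import numpy as np

def inject_mains_noise(ecg, fs, amplitude_mv=0.05, mains_hz=60.0):
    """Add simulated 60 Hz mains interference to a clean ECG trace.

    ecg:          1-D array of ECG samples (mV)
    fs:           sampling rate in Hz
    amplitude_mv: peak amplitude of the injected sinusoid (assumed value)
    """
    t = np.arange(len(ecg)) / fs
    return ecg + amplitude_mv * np.sin(2 * np.pi * mains_hz * t)

# Example: a flat 1-second baseline sampled at 500 Hz
clean = np.zeros(500)
noisy = inject_mains_noise(clean, fs=500)
```

The same pattern extends to other interferers (baseline wander, motion artifact) by swapping in a different noise model.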
While these exceptional conditions or inputs, termed here "edge" conditions, can be numerous, a detailed review of the use cases and of input sources beyond the ideal ones should be part of the planning and design of the system.
Verification vs. Validation
In machine learning, verification is testing that your product meets the mathematical description, specifications, and requirements you have written: "Did I build what I said I would?"
Validation determines that the model responds accurately to real-world inputs or the intended application: "Did I build the right thing?"
A key aspect of training, verification, and validation is the data sets used to train, verify or test, and finally validate the model.
In addition, if the model changes dynamically or learns continually from incoming data, verification and validation must cover more than the behavior of the initially trained system. They must also cover the methods used to gather and potentially filter the incoming data and the way the system adapts to that data.
The data sets used to develop, architect, verify, and validate an AI/ML model should include examples of all expected inputs, including inputs with edge conditions caused by noisy, failed, or compromised input sources or sensors. In a continuous-learning scenario, that changing data may also update the model, either compensating for the change or adversely affecting the outcome.
For instance, audio input may contain background noise or speech, a camera lens may get dirty, a humidity sensor may degrade over time, or RF-induced noise may appear because of a nearby cell phone. During specification of the system, the span of expected and unexpected inputs should be considered, and data gathered or synthesized to include these potential edge conditions. Verification and validation then demonstrate that the AI/ML delivers on its purpose, without error, even under extreme or edge conditions.
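Edge conditions that are hard to capture in the field can often be synthesized. As one illustration (a rough sketch, not a physical model), the hypothetical helper below mimics dirt on a camera lens by darkening random circular regions of a grayscale image:

```python
import numpy as np

def add_lens_blotches(image, n_blotches=3, radius=8, opacity=0.6, seed=0):
    """Darken random circular regions to mimic dirt on a camera lens (sketch).

    image: 2-D grayscale array with values in [0, 1].
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape
    out = image.copy()
    yy, xx = np.mgrid[0:h, 0:w]  # pixel coordinate grids
    for _ in range(n_blotches):
        cy, cx = rng.integers(0, h), rng.integers(0, w)
        mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
        out[mask] *= (1.0 - opacity)  # attenuate pixels under the blotch
    return out

img = np.ones((64, 64))          # a plain white test frame
out = add_lens_blotches(img)
```

Synthesized samples like these can then be mixed into both the training and the validation sets alongside clean data.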
One advantage of an AI/ML system architected, trained, and updated with such data is that the system may "retrain" itself on changing data, compensating for changes or noise in the incoming inputs, provided the system tracks its key output metrics.
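Tracking key output metrics can be as simple as comparing a recent window against a baseline. A minimal sketch, assuming a chronological list of a metric such as daily accuracy and an illustrative drift tolerance:

```python
def drift_alarm(metric_history, window=5, tolerance=0.05):
    """Flag when a key output metric drifts below its baseline (sketch).

    metric_history: chronological list of a metric (e.g., daily accuracy)
    Returns True if the mean of the last `window` values falls more than
    `tolerance` below the mean of the earlier baseline values.
    """
    if len(metric_history) <= window:
        return False  # not enough data to compare yet
    baseline = sum(metric_history[:-window]) / (len(metric_history) - window)
    recent = sum(metric_history[-window:]) / window
    return (baseline - recent) > tolerance
```

An alarm like this would gate any automatic retraining, rather than letting the model silently absorb degraded inputs.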
Initial Design Choices
Upfront design choices and requirements covering the input data, how the data will be collected and stored, and how it will be partitioned into the required data sets need to be planned and specified. The overall plan should include a risk assessment of edge conditions or inputs.
Use case analysis and problem space analysis help identify not only the available data inputs but also the non-problem-space or edge inputs that may appear in the incoming data.
The design and architecture of the system can then take the discovered non-ideal or edge-condition inputs into account, so that this exception data has a low probability of affecting the system output. Knowledge of the edge conditions can also be used to make sure both the training and validation data sets include them.
Building a table of possible interferers, their characteristics, and their probabilities goes a long way toward managing, planning for, and "training in" robustness.
An interfering input would have a type, cause, magnitude, probability, harm or effect, and a training and validation strategy. In this manner, the interferers can be accounted for and either collected during data gathering or synthesized for inclusion in the testing and validation data sets.
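Such a table can be kept as structured data so it drives test generation directly. A sketch, with entirely illustrative rows and probability estimates for an ECG-style system:

```python
from dataclasses import dataclass

@dataclass
class Interferer:
    """One row of the interference/edge-condition table (illustrative fields)."""
    name: str
    cause: str
    magnitude: str
    probability: float   # estimated chance of appearing in field data
    effect: str
    strategy: str        # how it is covered in training/validation data

catalog = [
    Interferer("mains hum", "nearby powered devices", "up to 0.1 mV",
               0.30, "obscures low-amplitude waves", "inject synthetic 60 Hz"),
    Interferer("baseline wander", "patient movement, respiration", "0.5-1 mV drift",
               0.20, "shifts waveform baseline", "collect ambulatory recordings"),
    Interferer("lead-off", "detached electrode", "signal lost",
               0.05, "output invalid", "record lead-off events, label explicitly"),
]

# Review the highest-probability interferers first when planning data coverage
for row in sorted(catalog, key=lambda r: r.probability, reverse=True):
    print(f"{row.name:16s} p={row.probability:.2f} -> {row.strategy}")
```

Because each row names a training and validation strategy, the catalog doubles as a checklist when auditing data-set coverage.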
Building the Sets
In AI/ML, two different data sets are developed:

- A development data set, consisting of over-read or annotated data, used to develop the algorithms. It should ideally contain representative data for all expected parameters and edge conditions, as well as conditions that may be experienced in the field, such as 60 Hz interference or baseline wander in an ECG system, or contamination on a camera lens.

- A test data set, used to test and report on the performance of the algorithm. It should cover all expected normal parameter variations, abnormal conditions, expected noise, and so on. In addition, this data set requires "full space" coverage of the problem space.
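One way to keep both sets covering every condition is to partition annotated records per condition label, so edge conditions appear on both sides of the split. A minimal sketch, assuming records are simple (id, condition_label) pairs and an illustrative test fraction:

```python
import random

def split_records(records, test_fraction=0.3, seed=42):
    """Partition annotated records into disjoint development and test sets.

    records: list of (record_id, condition_label) pairs; splitting is done
    per condition label so both sets cover every expected/edge condition.
    """
    rng = random.Random(seed)
    by_label = {}
    for rec in records:
        by_label.setdefault(rec[1], []).append(rec)
    dev, test = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        n_test = max(1, int(len(group) * test_fraction))  # at least one per label
        test.extend(group[:n_test])
        dev.extend(group[n_test:])
    return dev, test
```

Splitting per label prevents a rare edge condition from landing entirely in one set by chance.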
Tools and Frameworks
In building the test set, it is important to build a framework and toolset that feeds data from large databases through the system and captures the outputs into a tool that calculates the performance statistics and criteria in a repeatable manner. This allows rapid development and experimentation while building the system, plus a repeatable mechanism for gauging its performance.
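The core of such a harness can be very small. A sketch, assuming the model under test is a callable and each test record carries a ground-truth label:

```python
def evaluate(model_fn, test_set):
    """Run every test input through the model and tally basic performance stats.

    model_fn: callable mapping an input to a predicted label (model under test)
    test_set: iterable of (input, expected_label) pairs with known ground truth
    """
    results = {"total": 0, "correct": 0, "failures": []}
    for x, expected in test_set:
        predicted = model_fn(x)
        results["total"] += 1
        if predicted == expected:
            results["correct"] += 1
        else:
            # Keep full failure records for later review of edge conditions
            results["failures"].append((x, expected, predicted))
    results["accuracy"] = (
        results["correct"] / results["total"] if results["total"] else 0.0
    )
    return results
```

Because the harness is deterministic given the same model and data, every run is directly comparable to the last, which is exactly the repeatability the text calls for.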
Verification will be complicated if an applicant cannot convincingly establish ground truth, because the approach is predicated on the ability to confidently relate the system's outputs to its inputs. Ground-truthing refers to the process of gathering the objective, provable data, the ground truth, for a test.
An additional complication arises when data containing the edge or exceptional conditions in the use case cannot be located. In that case, a plan to find, gather, or synthesize the data needs to be developed and reviewed to confirm coverage of both normal and compromised data.
It is critical to consider the use cases, the problem to be solved, and the not-so-normal data that the problem can present when architecting, designing, and implementing an AI/ML system.
The coverage of the development, training, testing, and validation data sets needs to be complete to ensure that the system will meet the requirements.