Primer: A Practicing Attorney’s Guide to Artificial Intelligence
Use of AI in both warfare and military administration is poised to increase dramatically, and a DoD that embraces AI and its potential will gain a strategic advantage over its competitors in the future.
Artificial intelligence (AI) can be defined as “the ability to perform tasks that normally require human intelligence.” This definition encompasses technology that has been around for nearly a century as well as decades-old technology already embedded throughout the Department of Defense (DoD), such as aircraft autopilots, missile guidance, signal processing systems, and even our human resource systems. While AI itself is not new, recent advancements in large data sets, increased computing power, improved machine learning algorithms, and open source code libraries have led to a considerable increase in real-world applications for AI. These advances are already reshaping our gadgets and lives toward a more AI-centric future. Such a future can offer increased accuracy, increased capability, reduced human capital requirements, and a distinct advantage in future military operations. There is even evidence that AI can make us happier and healthier. However, the promise of a more AI-centric future also brings unfamiliar threats driven by the speed of development and the technological sophistication of the field. The legal profession is poised to play a vital role in shaping how AI impacts our lives, the DoD, and society. But before it is possible to know how Air Force and other DoD attorneys can succeed in such an endeavor, it is necessary to understand how AI works.
Major Groups of Artificial Intelligence Systems
Generally speaking, AI is split into two large groups, rule-based (RB) systems and machine learning (ML) systems, depending on how the machine “learns.” The first group comprises systems that learn from rule-based techniques; these machines are called rule-based systems or handcrafted knowledge systems. RB systems learn through a process of reducing knowledge to if-then statements, known as a rule set or rule sets, whereby each rule dictates a specific output that is predetermined by the given input. A human operator “teaches” the machine using traditional software programming. A “classic” example of a RB system is International Business Machines Corporation’s (IBM) Deep Blue® chess-playing computer. The RB Deep Blue® system bested reigning World Chess Champion Garry Kasparov in New York City on 11 May 1997. This victory was the result of IBM's extensive collaboration with chess champions to develop if-then rule sets that the computer system would follow when countering a chess move made by a human player.
The concept of RB learning can also be seen in everyday work as a legal professional through the use of e-mail inbox rules. For example, an operator may teach a system to automatically move e-mails sent by firstname.lastname@example.org from an inbox folder to a subfolder labeled “Office.” More complex inputs by the operator illustrate multilayered rules, such as when an operator “teaches” a system to forward e-mails received from email@example.com which also contain the word “invoice” to a different e-mail address altogether (e.g., firstname.lastname@example.org). Depending on a system's function, the rule set(s) will be more or less complex and can even be used in conjunction with the second major group, machine learning systems, to form an AI system. As such, RB learning systems will remain relevant for the DoD.
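The inbox rules described above can be sketched in a few lines of Python. This is only an illustration of a fixed if-then rule set, not any mail client's actual rule engine; the addresses, the `Email` class, and the `apply_rules` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical addresses used only for illustration.
COLLEAGUE = "colleague@example.org"
VENDOR = "vendor@example.org"
FINANCE = "finance@example.org"


@dataclass
class Email:
    sender: str
    subject: str


def apply_rules(email: Email) -> str:
    """Return the action that the fixed, human-written rule set dictates."""
    # Multilayered rule: the sender and the keyword must both match.
    if email.sender == VENDOR and "invoice" in email.subject.lower():
        return f"forward to {FINANCE}"
    # Simple rule: one sender maps to one folder.
    if email.sender == COLLEAGUE:
        return "move to Office"
    # No rule matched, so the message stays where it is.
    return "leave in inbox"
```

Because every rule was written in advance by a human, the same e-mail always produces the same action, which is the defining trait of a RB system.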
Machine Learning Systems
Machine learning systems include machines which learn through adaptive capabilities. In contrast to RB systems, which are human-programmed and have fixed rule sets, ML systems “self-program” by creating rules that the system may later discard, modify, or supplement with new rules, to varying degrees. ML systems “learn” by interactions with a real environment, a simulated environment, and/or training data sets. In simple terms, the primary distinction between RB and ML systems is this: when an AI system executes the same task(s) on the same data population, a RB system will have the same output every time, but a ML system should produce a more efficient and effective output at each subsequent interval. To achieve more efficient and effective outputs, ML systems use a mathematical algorithm built with software code that gives varying values to data which is input into the system. ML systems are further divided into four subsets, called learning methods, determined by how the algorithm handles data. The four learning methods (supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning) are differentiated by learning algorithm and input data characteristics, as discussed below.
Supervised, Unsupervised, Semi-supervised, & Reinforcement Learning Methods
Supervised learning uses input data known as training data. Training data is data which has been labeled, often by a human supervisor, with the correct data class. This type of input training data is called labeled data. In other words, supervised learning involves the process of identifying raw data (images, text files, videos, et cetera) and adding one or more meaningful labels to provide context that is used by a software algorithm. The goal of each learning method is to teach a machine a particular function, and the human supervisor must keep that goal in mind when labeling input training data. For example, if the goal is to identify F-16 aircraft from overhead imagery, the human supervisor should collect a sample group of aircraft photographs and assign the photographs to a particular class (F-1s, F-14s, F-15s, F-16s, F-22s, et cetera). When shown a new image, the model will predict the aircraft classification of the new image by comparing the new image to the training data.
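A minimal sketch of this idea follows, assuming each photograph has already been reduced to two numbers (wingspan and length in meters). The measurements are rounded, illustrative values rather than authoritative specifications, and the nearest-neighbor comparison is just one simple way to "compare the new image to the training data"; a real imagery classifier would be far more involved.

```python
import math

# Labeled training data: (wingspan_m, length_m) paired with a human-assigned class.
# The numbers are illustrative placeholders, not official specifications.
TRAINING_DATA = [
    ((10.0, 15.0), "F-16"),
    ((9.8, 15.2), "F-16"),
    ((13.0, 19.4), "F-15"),
    ((13.1, 19.5), "F-15"),
    ((13.6, 18.9), "F-22"),
    ((13.5, 18.8), "F-22"),
]


def classify(features):
    """Predict a class by returning the label of the closest labeled example."""
    _, nearest_label = min(
        TRAINING_DATA,
        key=lambda example: math.dist(example[0], features),
    )
    return nearest_label
```

Calling `classify((10.1, 14.9))` returns the label of whichever training example lies closest, so the quality of the prediction depends directly on the quality of the human-supplied labels.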
In contrast, unsupervised learning uses unlabeled training data and assigns a particular data class based on detected patterns. Unsupervised learning occurs most frequently when there is not enough expert knowledge to assign correct class labels, when the training data is so large that it is economically or temporally impractical to label the data, or when researchers are asking questions without already knowing the correct answer. Supervised learning systems tend to have higher performance levels than unsupervised systems; however, they are more time-intensive to build and require sizable training data sets. Semi-supervised learning, as the name suggests, uses a combination of both labeled and unlabeled training input data. Once trained, a ML system may use labeled or unlabeled data regardless of whether it is a supervised, semi-supervised, or unsupervised system.
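The unsupervised case can be sketched with a bare-bones clustering loop. The example below assumes unlabeled one-dimensional measurements (the values are made up) and uses a simple two-group k-means procedure as a stand-in for "assigning a data class based on detected patterns"; the function name `two_means` is hypothetical.

```python
def two_means(values, iterations=10):
    """Split unlabeled numbers into two groups with a basic k-means loop (k = 2)."""
    low, high = min(values), max(values)  # starting guesses for the two centers
    if low == high:
        raise ValueError("need at least two distinct values to form two groups")
    for _ in range(iterations):
        # Assign every value to whichever center is closer.
        group_a = [v for v in values if abs(v - low) <= abs(v - high)]
        group_b = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each center to the average of the values assigned to it.
        low = sum(group_a) / len(group_a)
        high = sum(group_b) / len(group_b)
    return sorted(group_a), sorted(group_b)


# Unlabeled measurements: no one has told the machine which class each belongs to.
unlabeled = [13.0, 9.8, 13.6, 10.1, 10.0, 13.1]
small, large = two_means(unlabeled)
```

The algorithm finds two groups without being told what they mean; a human still has to decide afterward whether the groups correspond to anything useful.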
The last type of learning method is reinforcement learning. Reinforcement learning uses feedback obtained through trial and error: a machine is tasked to make a decision (action), receives a reward or punishment (feedback) based on whether the action was consistent with the machine's predefined goal(s), and then applies the feedback to influence subsequent decisions. Reinforcement learning appears to be the most complicated learning method at first glance, but it is easily demonstrated through a real-world example. When teaching a dog to sit (action), you provide the dog feedback based on whether the action aligns with your predefined goal (the dog sitting). If the dog sits, you give the dog a treat (positive feedback). If the dog does not sit, you do not give the dog a treat (negative feedback). The dog uses the earlier feedback (receiving a treat or not) to decide whether to comply the next time you ask the dog to sit. By repeating the action and feedback loop, the output accuracy should increase at each subsequent interval and, eventually, the dog should sit every time you ask.
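The dog-training loop can be sketched as a tiny trial-and-error learner. Everything here is an assumption made for illustration: the two actions, the reward of 1 for sitting, the learning rate, and the exploration rate are hypothetical choices, and the update rule is a basic running-estimate scheme rather than any particular production algorithm.

```python
import random

ACTIONS = ["stand", "sit"]


def reward(action):
    """Feedback from the trainer: a treat (1.0) only when the dog sits."""
    return 1.0 if action == "sit" else 0.0


def best_action(values):
    """Pick the action with the highest learned value."""
    return max(ACTIONS, key=lambda action: values[action])


def train_dog(trials=200, learning_rate=0.5, explore_rate=0.2, seed=0):
    """Learn a value for each action from repeated action-and-feedback loops."""
    rng = random.Random(seed)
    values = {action: 0.0 for action in ACTIONS}
    for trial in range(trials):
        if trial < len(ACTIONS):
            action = ACTIONS[trial]        # try every action once at the start
        elif rng.random() < explore_rate:
            action = rng.choice(ACTIONS)   # occasionally experiment
        else:
            action = best_action(values)   # otherwise do what has paid off
        # Nudge the stored value toward the feedback just received.
        values[action] += learning_rate * (reward(action) - values[action])
    return values
```

After training, `best_action(train_dog())` selects "sit", because only that action has ever earned a treat.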
It is worth noting that deep learning, also known as deep neural networks, is a ML technique that can be applied to any of the abovementioned learning methods. While an in-depth discussion of deep learning and neural networks goes beyond the scope of this article, it is important to highlight a common AI technique that is frequently used to improve the software algorithm. Generative adversarial networks (GANs) use two sub-models to train an AI system: a generator and a discriminator. A generator produces a plausible example, such as an image of a fake person, and a discriminator compares the plausible example against a real image to determine which is real. The generator improves its model based upon its ability to trick the discriminator.
Legal Principles for Artificial Intelligence
In general, an AI system consists of two components: the software that makes up an algorithm and the data that interacts with the software algorithm. Humans remain critical to the development and deployment of AI systems through choosing algorithms, formatting data, setting learning parameters, and troubleshooting problems. Potential legal challenges relating to AI systems are numerous and daunting, so having a baseline understanding of AI systems is crucial to a successful legal review. In this regard, the ability to distinguish between RB and ML systems and various ML models is essential for attorneys practicing in this field. For example, a legal review of RB systems does not require addressing training data because RB systems do not use training data. Similarly, understanding the kind of ML model that is being used can help the legal practitioner identify which types of data should be evaluated in the legal review. When examining an AI system, it is best to analyze the system from three distinct perspectives: first, an overarching view imposed by the DoD, called AI ethics; second, a view from the point of data, which interacts with the software algorithm to produce an output; and third, a view from the point of software, which consists primarily of the algorithm on which the AI system operates.
The DoD published five ethical principles for guiding the ethical development of combat and non-combat AI capabilities on 24 February 2020, as part of its efforts to be a leader in the fields of AI and AI regulation. Those principles, listed below, encompass the general areas of responsibility, equity, traceability, reliability, and governability. However, ethical principles exist beyond those listed below, and legal reviewers must exercise due diligence and remain cognizant of the fact that there may be other controlling regulations, dependent on the customer or audience.
1. Responsibility: DoD personnel will exercise appropriate levels of judgment and care, while remaining responsible for the development, deployment, and use of AI capabilities.
2. Equity: The Department will take deliberate steps to minimize unintended bias in AI capabilities.
3. Traceability: The Department’s AI capabilities will be developed and deployed such that relevant personnel possess an appropriate understanding of the technology, development processes, and operational methods applicable to AI capabilities, including with transparent and auditable methodologies, data sources, and design procedure and documentation.
4. Reliability: The Department’s AI capabilities will have explicit, well-defined uses, and the safety, security, and effectiveness of such capabilities will be subject to testing and assurance within those defined uses across their entire life-cycles.
5. Governability: The Department will design and engineer AI capabilities to fulfill their intended functions while possessing the ability to detect and avoid unintended consequences, and the ability to disengage or deactivate deployed systems that demonstrate unintended behavior.
On 26 May 2021, the Deputy Secretary of Defense issued a memorandum affirming DoD’s commitment to the DoD ethical principles and implementing responsible AI (RAI) in the DoD. Although its name is confusingly similar to the first ethical principle, RAI is DoD’s implementing strategy for all five of the ethical principles. The Joint Artificial Intelligence Center (JAIC) serves as DoD’s coordinator for development and implementation of RAI strategy, guidance, and policy. As of the date of this article, the JAIC has not published any official policy for interpreting AI ethical principles outside of the data strategy document and the RAI implementation memorandum; however, legal practitioners employed in the development or deployment of AI should check the JAIC website for updated material. Until the DoD formalizes such policies, attorneys should, at a minimum, note in their legal reviews that the DoD AI ethical principles were considered prior to procurement, development, or deployment of an AI system.
As AI becomes more widespread, complex legal questions will naturally arise regarding acquisition, development, use, and ownership of the underlying data used to train or operate an AI system. Just as the underlying software is becoming more advanced, so too is the availability of data and the complexity associated with handling that data. Forbes reported in 2018 that 2.8 quintillion bytes of data were created each day and that over 90% of the world's data had been created over the preceding two years. Since 2018, the amount, type, and availability of data have only increased. The DoD is starting to recognize the power that vast data can have in AI development and deployment. To that end, the DoD has begun to focus on becoming a more data-centric organization that uses data at speed and scale for operational advantage and increased efficiency. The transformation of the DoD to a data-centric organization created the need to re-think the importance of data throughout the organization and acquisition life-cycle. The result is that the federal government and the DoD now consider data a strategic asset.
Three Distinct States
Data is a strategic asset that does not exist in a single state, but exists across three distinct states: in use, at rest, and in transit (also called data in motion). Data at rest is all data in computer storage that is not currently being accessed or transferred; data in motion is data that is moving or being transferred between locations within or between computer systems; and data in use is data that is currently being updated, processed, accessed, or read by a system. The legal practitioner should consider what is happening to the data in each of these states with a general understanding of personnel with access, how the data will be used, and constitutional or other legal implications. This is particularly important in areas where there are restrictions, controls, or privacy implications to data access. For example, data containing personally identifiable information (PII) may require a privacy impact assessment, a system of records notice (SORN), contractor approval for handling, a public affairs review, or constitutional considerations for how the data is being used. Additionally, multiple contractor approvals may be necessary if different contractors handle the data in different states. For example, one contractor may work on storage and handling of the data and another contractor may work on handling the data when used to train an AI system.
In order for data to be usable for AI systems it must be properly formatted across all three states. For the DoD, proper formatting means that the data is visible, accessible, understandable, linked, trustworthy, interoperable and secure across each state. The Department of the Air Force, Chief Data Office, is responsible for the Air Force’s policies and procedures for handling data. As of the date of this article, the Chief Data Office has not published an official policy on how to satisfy DoD’s formatting requirements, but education and training standards are already being implemented. As such, legal practitioners should continue to monitor this area for new developments in standard licensing terms, formatting requirements, and other data-structuring procedures.
The DoD is directed to maximize data sharing and data-use rights. In fact, ownership in technical data is essential for ensuring Department of the Air Force systems remain affordable and sustainable. However, data suitable for AI training and use may carry various restrictions or terms which could limit the application of an AI system in its intended end state. That a Department of the Air Force unit has access to the data does not necessarily mean the unit owns the data itself. Failure to properly account for ownership and future use of underlying data can have drastic implications for usability of an AI system down the line. When evaluating an AI system, a legal practitioner must determine the underlying rights, if any, associated with the data. For example, some licenses limit data use to educational or research purposes only. Additionally, to ensure the data is ultimately usable, attorneys should be careful to clarify new rights, or changes to existing rights, when data is formatted. If a contractor formats or makes changes to the data, associated licenses or contracts must identify what rights are attached to the formatted data. The role of a legal practitioner when examining data for AI systems should be to ensure the Department of the Air Force is using data consistent with any terms or restrictions and that data acquired for AI enables the greatest flexibility well into the future.
The DoD Data Strategy calls out data ethics as a legal consideration distinct from AI ethics. While the DoD Data Strategy does not provide a clear definition for data ethics, the federal data ethics framework defines data ethics as “norms of behavior that promote appropriate judgments and accountability when acquiring, managing, or using data, with the goals of protecting civil liberties, minimizing risks to individuals and society, and maximizing public good.” Therefore, a legal review assessing data which was acquired for, or is used by, an AI system should regard ethical implications of the data itself as a discrete consideration. While most ethical considerations relating to data are self-evident, such as civil liberties, some are not as readily apparent. One less obvious consideration involves actions that may qualify as human subject research. Many AI systems utilize or analyze data containing PII, and such use may qualify as human subject research under applicable DoD regulations regardless of whether the data was training or operational data. For example, surveillance cameras outside of a Base Exchange (BX) capture images of its customers, and those images likely contain PII, or information which could be used to distinguish or trace the identities of those customers. If a ML system uses the BX live camera feeds, it may qualify as human subject research at two distinct times: as the machine trains on the data before operational use and as the machine adapts once it becomes operational.
Similar to data, ownership in software is essential for ensuring Department of the Air Force systems remain affordable and sustainable. Legal practitioners must determine software ownership and applicable restrictions, if any. While this task may seem relatively straightforward, it can quickly become complicated if employing a software suite that contains software code incorporating several different license structures. Aside from the complexities of multi-layered license structures, even so-called “open-source” software sets may convey restrictions and terms on software use. Addressing this early on can pay dividends for the command and mission in the long run.
Review of the software is also where a legal practitioner should look to federal and state laws addressing how AI systems can and will be used. Department of the Air Force attorneys may quickly jump to the weapons review process and considerations outlined in Department of Defense Directive (DoDD) 3000.09, Autonomy in Weapon Systems, and Air Force Instruction 51-401, The Law of War; however, those policies address legal considerations for some, but not all, of the uses of AI in a weapon system. Indeed, there currently exists a noted gap in DoDD 3000.09 for AI weapon system considerations and an ongoing debate, outside the scope of this article, on how to address that gap. While weapon systems are the obvious choice for legal review requirements, non-weapon systems may also violate policy and law at state or federal levels. The ability of DoD-compliant AI systems to lawfully operate is not at all assured. For example, several states have laws restricting or banning the use of biometrics, which would directly affect the feasibility of an AI facial recognition system for detecting intruders. Many of the data considerations will affect the software considerations and vice versa; however, a review should analyze these considerations separately given that data and software may operate independently from each other in an AI system.
The use of AI in both warfare and military administration is poised to increase dramatically, and a DoD that embraces AI and its potential will gain a strategic advantage over its competitors in the future. However, a number of challenges related to technology, policy, process, and data will continue to confront those working in the dynamic field of AI. To address these challenges, the DoD published ethical principles and an implementation memorandum for responsible AI, but the absence of formal policies related to data formatting and data rights is another obstacle to the successful integration of AI in military applications. Collectively, these factors represent an enormous charge for legal practitioners. To help navigate this ever-changing field, we established key concepts and provided a framework for legally examining AI from three viewpoints: data, software, and ethics. In evaluating data, an attorney must determine data ownership and whether there are restrictions, controls, or privacy implications, all while considering how the data is being used and accessed across all three states. Similarly, an attorney must evaluate governing licenses and laws to determine whether there are restrictions on software and its use. Lastly, there is an obligation to ensure ethical use of data, software, and AI systems generally. As such, practicing attorneys must examine an AI system from three distinct views in order to ensure the system as a whole is legal.
at Executive Summary.
Allen, Understanding AI Technology.
Allen, Understanding AI Technology.
Data can be defined as “the representation of information in a formalized manner suitable for communication, interpretation, or processing by humans or by automatic means, and is concerned with the encoding of information for repeatability, meaning, and proceduralized use.” See Chief Information Officer, Department of Defense, Glossary, available at: https://dodcio.defense.gov/Library/DoD-Architecture-Framework/dodaf20_info_data/
Allen, Understanding AI Technology
IJAIA, Vol. 2, No. 2.
Allen, Understanding AI Technology
at 11 - 13.
An example of a GAN in action can be seen on the website https://www.whichfaceisreal.com/. Here, a computer system trained through a GAN presents an image of a fake person. The visitor to the website, working as a discriminator, selects which image is a photograph of a real person. Each selection by the human improves the accuracy.
Brownlee, A Gentle Introduction to Generative Adversarial Networks (GANs).
Allen, Understanding AI Technology
DoD, DOD Adopts Ethical Principles for Artificial Intelligence
As defined in DoDI 5400.11, DoD Privacy Program.
4th Amendment, United States Constitution.
Deputy Secretary of Defense, Memorandum for Senior Leadership – Creating a Data Advantage.
Deputy Secretary of Defense, Memorandum for Senior Leadership – Creating a Data Advantage.
University at Buffalo, The State University of New York, IBM-UB Handwritten Database Sub-License Agreement, available at: https://cubs.buffalo.edu/hwdata/license-agreement (an example of a data license which restricts use to research and educational purposes only).
DoD, Executive Summary: DoD Data Strategy.
32 CFR 219, Protection of Human Subjects; DoDI 3216.02, Protection of Human Subjects and Adherence to Ethical Standards in DoD-Conducted and Supported Research, 15 April 2020.
Defense Acquisition University, Air Force Data Rights Guidebook