This chapter introduces common tools for implementing AI. Most AI implementations require extensive coding to implement or extend existing machine learning algorithms. Because AI is a fast-evolving field, such implementations usually cannot be carried out with software packages that offer a user-friendly Graphical User Interface (GUI). Instead, we need to resort to a programming language and/or packages built with it. There may be a pragmatic reason for this: at an AI high tide, AI develops so fast that commercial GUI software can hardly catch up, while during a low tide, AI is not attractive enough to software developers.
A typical programming language provides only very basic functions. We rarely rely on these basic functions alone when implementing AI. More commonly, we use functions built by others to avoid "reinventing the wheel". Many such functions have been wrapped up as packages shared across the community. As a result, when we talk about AI tools, we mostly refer to a development environment or a hierarchical ecosystem consisting of a basic programming language and the packages built on it. Here, hierarchy means that some tools may rely on others.
This reliance of one tool on another is the concept of "dependencies", which is very familiar in Linux environments but less visible in other operating systems. It may therefore be new to you if you are only familiar with GUI-based environments such as those in Windows. In addition, because AI development environments are largely contributed by open-source communities, you can still easily run into dependency or compatibility issues even if you use such tools in Windows. For example, this can happen when you use a software distribution that manages all the tools, including the programming language, such as Anaconda. Anaconda bundles Python with popular packages built with Python, which helps us easily construct an AI coding environment that runs on various operating systems including Windows. Therefore, one of the first jobs in AI implementation is to be prepared for dealing with such dependency and compatibility issues.
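As a minimal sketch of a first step in diagnosing such issues, the snippet below reports which versions of a few packages are installed in the current Python environment. It uses only the standard library module `importlib.metadata`. The helper names `installed_version` and `report_environment` and the list of packages are illustrative assumptions, not part of any tool discussed in this chapter.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def report_environment(package_names):
    """Map each package name to its installed version (None when not installed)."""
    return {name: installed_version(name) for name in package_names}


if __name__ == "__main__":
    # Illustrative list only; adjust it to the tools your project needs.
    packages = ["numpy", "pandas", "matplotlib", "scikit-learn"]
    for name, version in report_environment(packages).items():
        print(f"{name}: {version if version else 'not installed'}")
```

Comparing such a report between a machine where the code works and one where it fails is often a quick way to spot a version mismatch.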
Regarding the basic programming language, we need a language versatile enough to handle the complexity inherent in AI projects. According to statistics, most AI developers prefer Python, while a much less common choice is Java, closely followed by R, Prolog, and Lisp. Other languages such as Scala, Julia, and C++ have also been used for AI development.
Python is a general-purpose, object-oriented language with a strong emphasis on code readability. Its adoption by major AI developers has helped the language become more popular and powerful. Python held first place in Tiobe's language popularity ranking [29] as of August 2024, reflecting the predominance it has shown since 2022. In fact, Python is likely to remain at the top of most other language popularity rankings as the third wave of AI continues.
There are many reasons for Python's popularity in AI. First, Python is very easy to learn. Because it uses an English-like syntax, it can be written much faster than other major languages like C/C++ and Java. Second, because Python is an interpreted language, you can run the code on any platform with a Python interpreter. Third, Python's vast library support and huge community make it an excellent choice for beginners. In particular, in areas involving data, Python offers many powerful libraries such as NumPy, Pandas, and Matplotlib, which provide what you need for the acquisition, manipulation, analysis, and visualization of data. What is more, Python is well suited to large-scale machine learning thanks to multiple out-of-the-box deep learning and machine learning libraries such as Scikit-learn, Keras, TensorFlow, and PyTorch. In summary, the versatility, ease of use, and availability of powerful libraries, as well as the excellent integration and the vast and active community, have helped Python secure a predominant role in AI and relevant areas.
Figure 3.1: Structure of Python-based AI tools
Fig. 3.1 illustrates the structure of Python-based AI tools. We can divide the AI development environment into multiple layers: the coding language, data manipulation and visualization tools, machine learning and data analysis tools, and deep learning tools. The later (upper) layers typically rely on the earlier (lower) layers.
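The layering can be observed directly in an installed environment, because each package declares the packages it requires. The sketch below, which again relies only on the standard library, lists the declared requirements of a package. The helper names `requirement_name` and `declared_dependencies` are hypothetical, and the simple name extraction is an assumption that covers common requirement strings rather than the full packaging specification.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Extract the leading distribution name from a requirement string."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    return match.group(0).lower() if match else None


def declared_dependencies(package_name):
    """Return the names of packages that `package_name` declares as requirements.

    Returns an empty list if the package is not installed or declares nothing.
    Requirements that apply only to optional extras are skipped.
    """
    try:
        requirements = metadata.requires(package_name) or []
    except metadata.PackageNotFoundError:
        return []
    names = []
    for requirement in requirements:
        if "extra ==" in requirement:
            continue  # optional feature, not a core dependency
        name = requirement_name(requirement)
        if name and name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    # Illustrative packages from the upper layers; output depends on what is installed.
    for package in ["pandas", "scikit-learn"]:
        print(package, "->", declared_dependencies(package))
```

In a typical environment, packages in the upper layers are expected to list lower-layer packages such as NumPy among their requirements, although the exact output depends on the installed versions.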
Many packages are available for data manipulation and visualization. NumPy is preferred for array operations [30, 31], while SciPy provides additional, more advanced mathematical functions and algorithms [32]. Pandas provides powerful tools for data handling, such as the organization, rearrangement, storage, and input/output of data [33]. Visualization for viewing the structures and trends of the data can be performed with Matplotlib [34]. Additional visualization work can be done using packages like Seaborn [35] and Bokeh.
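To give a feel for how these packages divide the work, the following minimal sketch builds an array with NumPy, organizes it with Pandas, and plots it with Matplotlib. The data values and variable names are made up for illustration, and the non-interactive `Agg` backend is an assumption chosen so that the script also runs on machines without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# NumPy: build arrays of made-up readings (values are illustrative only).
time_s = np.arange(0, 10)        # 0, 1, ..., 9 seconds
signal = 2.0 * time_s + 1.0      # vectorized arithmetic, no explicit loop

# Pandas: organize the arrays as labeled columns and derive summary values.
frame = pd.DataFrame({"time_s": time_s, "signal": signal})
mean_signal = frame["signal"].mean()
late = frame[frame["time_s"] >= 5]   # keep only rows from 5 s onward

# Matplotlib: plot the trend and write the figure to an in-memory PNG.
fig, ax = plt.subplots()
ax.plot(frame["time_s"], frame["signal"], marker="o")
ax.set_xlabel("time (s)")
ax.set_ylabel("signal")
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)

print("mean signal:", mean_signal)
print("rows from 5 s onward:", len(late))
```

In interactive work, the figure would normally be displayed with `plt.show()` or saved to a file instead of a memory buffer.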
Mature packages are also available for machine learning and deep learning. Scikit-learn is one of the most popular packages for machine learning [36]. For statistics-intensive work like probability distributions and advanced statistical models, we can resort to packages like scipy.stats and statsmodels, which can achieve most of what the R framework offers. Many packages are available for deep learning. On a "lower" level, we have popular options like TensorFlow [37], Theano, and PyTorch [38]. Deep learning can be practiced with relatively high flexibility on this level. On a "higher" level, packages like Keras wrap functions from lower-level packages like TensorFlow to make deep learning more convenient to implement, at the cost of some flexibility [39].
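As a rough illustration of the machine learning layer, the sketch below trains and evaluates a classifier with Scikit-learn on the Iris dataset bundled with the package. The choice of logistic regression, the 25% test split, and the random seed are assumptions made only for this example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset: 150 flowers, 4 features, 3 classes.
features, labels = load_iris(return_X_y=True)

# Hold out a quarter of the samples to estimate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)

# Fit a simple classifier; max_iter is raised so the solver has room to converge.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# score() returns the mean accuracy on the held-out samples.
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Most Scikit-learn estimators follow this same pattern of constructing a model, calling `fit`, and then calling `predict` or `score`, so swapping in another algorithm usually changes only the import and the constructor line.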
Software for reinforcement learning is less well developed than that for deep learning. Nevertheless, packages like OpenAI Gym can still be good tools for beginners to practice reinforcement learning [40].
It is worth mentioning that the software pool for AI is highly dynamic given the fast iteration of technology in the world of AI, especially when AI is at a high tide. For example, the popularity shift in deep learning from Caffe [41] to TensorFlow to PyTorch occurred within only a few years. The development is not necessarily a one-way process and could be very complex. However, even if newer and fancier players come to the playground, the above packages can still provide an adequate environment for beginners to learn and practice AI.
The following sections briefly introduce, one by one, some essential functions of Python and popular Python-based AI tools, including NumPy, Pandas, Matplotlib, Scikit-learn, TensorFlow, Keras, and OpenAI Gym, which will be used in later chapters of this book for AI implementations. Note that only the most essential functionalities of these tools will be covered. Hence, the materials may resemble the many online tutorials titled "Learn Python in ** minutes". In fact, engineers' AI practices often need only a small set of frequently used functions from these packages, while other functions can be easily grasped by checking the official manuals or other learning materials for these well-documented packages. The purpose of the following sections is to provide engineers with a quick entry to the world of AI tools in Python, so that beginners will not be discouraged or overwhelmed by the explosion of information available online.