Data Science and AI
By Kat Campise, Data Scientist, Ph.D.
From robots and “smart” homes and cities to driverless cars, the possibilities surrounding the productive use of artificial intelligence (AI) continue to captivate our imaginations. But there is also some underlying unease, perpetuated by headlines such as “These Jobs Will be Dead Due to AI” or “Will AI Take Over the World?” Granted, automating certain job functions will naturally cause some angst, but this is nothing new. Throughout each phase of the Industrial Revolution, and in almost every industry, machines have replaced human labor to a certain extent. Amid the sci-fi-infused imaginings shaped by Hollywood’s take on the ultimate dominance of machines (e.g., Robocop, The Terminator), the actual picture falls along a spectrum: some of the speculation holds truth, but it needs a heavy dose of reality.
Additionally, overzealous marketing has transformed data science and AI from complex topics into mere catchphrases leveraged to increase click-through rates (CTRs) and Google search rankings. While these aren’t necessarily “bad” objectives, click-bait headlines and the rush to create “content” tend to breed misconceptions about these topics. We’ve covered data science extensively in other articles. The goal here is to clarify, as much as possible within a relatively short article, what AI is and how it intersects with a career in data science.
What is AI?
Any discussion about data science and AI must also include machine learning. A commonly held view of the connection between machine learning and AI is that the former is a subset of the latter. One simplified way to view the differences and similarities between the two is to consider machine learning as the early learning stages of AI. Human learning moves through pedagogical stages, beginning with a concrete stage (e.g., this is an apple; can you identify the apple in other pictures?) and progressing to increasingly abstract representations of things, such as interpreting the meaning behind a poem, detecting human emotion, or creating a new algorithm. Indeed, when you begin your machine learning journey, you’ll become familiar with how machine learning algorithms are trained to recognize pictures, numbers, and letters, not unlike what we’re taught in elementary school. The comparisons don’t end here.
Although not all AI experts agree as to what AI “is” exactly, the fundamental assumption centers on the “thinking machine.” We won’t dive into a philosophical discussion on what constitutes “thinking.” But, if we use human learning as a model, then a clearer picture emerges with regard to machine learning vs. AI.
Returning to the formal education example, machine learning is akin to the learning that takes place from kindergarten through the Bachelor’s degree. You learn to label and classify the “things” of each discipline: math, history, language, etc. This process begins with “this is what X is; now find X” (supervised learning), and then moves to unlabeled or undefined instances, e.g., “find all of the things that belong to X vs. Y” (semi-supervised or unsupervised learning). When you take an exam or turn in an assignment, your instructor’s response, as well as your score, helps you understand (hopefully) whether you’ve mastered the material. In a very loose sense, this feedback loop resembles reinforcement learning, yet another parallel to machine learning.
AI begins to diverge at the master’s and Ph.D. levels. In those learning scenarios, you’re expected to combine your existing knowledge with new knowledge and extend it to hypothesis generation and testing. Not all master’s programs are constructed this way, but Ph.D. programs do follow this essential protocol: you’re expected to create something new and/or further the discipline in some fashion. AI roughly approximates this learning scenario; it will create its own algorithms and enact decisions based on incoming data with minimal to no human assistance. In essence, AI will learn from the environment and interact with it through a digital feedback loop (ours is analog, so some data differentials are already present, but that aspect is beyond the scope of this article). Whether AI will fully replicate, if not supersede, human intelligence is still up for debate.
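To make the supervised vs. unsupervised distinction concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not a production technique: the two-number “features” for apples and oranges are invented, and the helper names (`fit_nearest_centroid`, `predict`, `k_means`) are hypothetical. The supervised part learns from labeled examples with a nearest-centroid classifier; the unsupervised part receives similar points without labels and groups them with a basic k-means loop.

```python
import math


def centroid(points):
    """Average each coordinate across a list of equal-length points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


# --- Supervised learning: labels are given ("this is an apple") ---
def fit_nearest_centroid(examples):
    """examples: list of (point, label). Learns one centroid per label."""
    by_label = {}
    for point, label in examples:
        by_label.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in by_label.items()}


def predict(model, point):
    """Return the label whose centroid is closest to the point."""
    return min(model, key=lambda label: math.dist(point, model[label]))


# --- Unsupervised learning: no labels, just "find what belongs together" ---
def k_means(points, k, iterations=10):
    """Basic k-means; naive initialization uses the first k points as centers."""
    centers = list(points[:k])
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the previous center if a cluster ends up empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical 2-D features (e.g., size and color score); values are invented.
labeled = [((1.0, 1.0), "apple"), ((1.2, 0.9), "apple"),
           ((5.0, 5.1), "orange"), ((4.8, 5.3), "orange")]
model = fit_nearest_centroid(labeled)
print(predict(model, (1.1, 1.0)))  # apple

unlabeled = [(1.0, 1.0), (5.0, 5.1), (1.2, 0.9), (4.8, 5.3)]
centers, clusters = k_means(unlabeled, k=2)
print(clusters)
```

The classifier can only answer with labels it was shown, whereas k-means returns groups with no names attached; deciding that a cluster “means” apples is still up to a human. Reinforcement learning, the feedback-driven case, is not shown here.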
Industries Where Data Science and AI Intersect
You are likely carrying the current iteration of AI around daily: your “smart” phone. If you’ve ever used Siri, Alexa, Pandora, or Netflix, you’ve used early-stage AI. We haven’t yet reached Ph.D.-level AI learning and knowledge production. Despite the knowledge we’ve acquired from studying human behavior and cognition, we’re not a wholly predictable species; if we were, financial markets would be far easier to predict than they are. This is where data scientists play a crucial role in improving AI functionality.
As you enter the data science field, you’ll notice that many employers (outside of Google, Facebook, Amazon, Apple, and the other big tech companies) remain unsure what a data scientist does or how one can help boost the company’s revenue. This is partly due to the marketing blitz discussed earlier and partly rooted in the pressure to stay ahead of the competition. As a data scientist, you’re not only analyzing data; you’re building predictive models and then devising the algorithms that put those models into action. Machine learning is therefore a core data science skill set, and since machine learning is a subset of AI, it’s a safe bet that you’ll ultimately be at the forefront of AI research and practical application (if you stay in data science).
Industries across the board are seeking to incorporate AI, and data scientists are well positioned to make this a reality. Keep in mind that AI is frequently used as a synonym for predictive analytics, but with the added dimension of automated action and reaction based on those predictions. We need to be cautious, however, about completely removing the human agent from these decisions. Regardless of how “smart” AI becomes (and some predict that “self-awareness” will be an organic outgrowth of this digital intelligence), it lacks the nuances that are decidedly human. Perhaps compassion and empathy can be algorithmically applied at some point. For now, and even if AI self-awareness comes to fruition, technological tools should continue to supplement human activity rather than overtake it.
In Which Industries are Data Scientists Involved in AI?
While the concept of machine learning has existed since at least the 1950s (some authors place its start at the creation of the Turing Test), the big jump in machine learning deployment was driven by the massive inflow of available data during the Web 1.0 and Web 2.0 eras. This has led to a steady developmental transition from machine learning to AI. Some industries have been slower to adopt data science and AI, while others (mainly the tech behemoths) have jumped in with both feet. Below is a brief overview of how several industries are implementing or planning to implement AI.
Financial Services
Cybersecurity is a huge issue throughout the world, and one of the main hacking targets is the financial services industry. Trillions of dollars flow through banks and payment systems, and highly sensitive personal information is stored in databases around the world. Digitization has made it easier for consumers to purchase and invest with their hard-earned money, and easier for hackers to steal that money, or data they can use for nefarious purposes.
Training AI to detect potential fraud or theft is a hot career path. Unfortunately, hackers can build AI to counteract cybersecurity AI, so the contest increasingly comes down to which AI wins. As a data scientist in financial services, you may build an intelligent system that distinguishes a human infiltrator from an attacking AI system and then automatically locks the account or system before the breach takes place. This is a precarious balance, because completely disrupting the transaction stream is far from ideal (an understatement). The added challenge, then, is to obstruct cyber theft while maintaining business as usual for the entire system.
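As a rough illustration of the detection side only, here is a minimal Python sketch of a statistical anomaly check. It is a deliberately simple stand-in (a z-score rule), not the AI-versus-AI systems described above; the function name `flag_anomalies`, the sample transaction history, and the 3.0 threshold are all hypothetical assumptions. It flags individual transactions for review rather than halting everything, echoing the goal of keeping business running as usual.

```python
import statistics


def flag_anomalies(history, new_amounts, threshold=3.0):
    """Flag amounts whose z-score against past amounts exceeds the threshold.

    history: past transaction amounts for one account (needs >= 2 values).
    threshold: hypothetical cutoff; a real system would tune this on fraud data.
    """
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    results = []
    for amount in new_amounts:
        # Avoid division by zero when history has no variation;
        # in that case nothing is flagged.
        z = (amount - mean) / stdev if stdev else 0.0
        results.append((amount, abs(z) > threshold))
    return results


# Invented account history: small everyday purchases.
history = [20, 25, 22, 30, 18, 27, 24, 21]
for amount, suspicious in flag_anomalies(history, [26, 900]):
    action = "hold for review" if suspicious else "approve"
    print(amount, action)
```

A production fraud system would combine many signals (location, device, timing) with learned models; a single-feature threshold like this is easy for an adaptive attacker to evade, which is part of why the problem is framed as one AI against another.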
Health Care
In the U.S., health care costs are out of control. Multiple factors drive these costs, and debates on the issue frequently devolve into trying to solve only one of the problematic components. Between insurance companies, health care providers (which add even more variety), and health decisions made at the individual level, data scientists in health care have plenty of challenging work available.
One area where AI can help improve the health care journey, and possibly lower costs, is disease risk mitigation through predictive analysis and health forecasting. Although we all share a similar physiological structure (e.g., a heart, lungs, a circulatory system), other individual factors can be used to make better predictions during health assessment and diagnosis. AI algorithms can scan and gather information from a far larger cache of health data than humans are capable of reading and analyzing. A health care data scientist is likely to be tasked with boosting predictive accuracy by training a machine learning algorithm to “read” health records (both qualitative and quantitative data, which is not an easy thing for machines to do), extract the relevant data, and then produce a diagnosis.
Smart patient rooms, robotic-assisted surgery, an AI nurse practitioner that is available 24/7, and consumer health tech (e.g., more advanced health monitoring technology that can be used by individual consumers), are additional areas for data scientists interested in AI-powered health care.
Supply Chain, Logistics, and Manufacturing
Robots are already used in manufacturing, and this will continue to be the case. Once the driverless car issues are solved (which goes back to human behavior, especially on the road, being less predictable than we envision), the next step logistics companies are closely reviewing is automated delivery driving. The U.S. is still experiencing a truck driver shortage, while consumers and companies continue to need goods delivered on a regular basis. Because supply and demand drive pricing, an increase in transportation costs generally produces an increase in product prices. AI might be a viable way to reduce costs if it can safely navigate a massive vehicle across long distances, regardless of the weather, and despite the unpredictable nature of human drivers (e.g., cutting off other drivers, driving too fast or too slow, not paying attention to the road).
Studying Data Science and AI
A frequent question among those trying to determine whether data science is a viable career is “Do I need a Ph.D. to be a data scientist?” The short answer is: not necessarily. A Ph.D. can be helpful for learning how to approach research design, such as hypothesis conception, testing, and applying advanced statistical techniques (depending on the discipline). You can, however, learn this during a master’s program or on your own if you’re determined to enter the field. AI is research driven, and no one truly knows what it will be able to do. In fact, the black box problem (AI generating algorithms that data scientists and other experts cannot figure out or explain) is an ongoing issue that we need to resolve before we unleash AI into every possible aspect of human life.
To study AI, begin with machine learning, since it is the mathematical and functional foundation of AI. Coursera and edX provide beginning, intermediate, and advanced courses from experts in the field (Andrew Ng is one of the top minds in AI). If linear algebra, calculus, and statistics aren’t your strong suit, plan on adding them to your learning plan. Khan Academy and the aforementioned MOOCs offer courses, including math for data science. Should any of the industries discussed here interest you as a data science sub-discipline, seek out coursework in cybersecurity, health care informatics, robotics, or supply chain and manufacturing processes. Expertise in data science, machine learning, and AI is central to your success as a data scientist, but specialized sub-disciplines are also emerging as the field continues to evolve.