What’s the difference between Data Science and Computer Science?
Computer science (CS) emerged alongside mainframe computers in the 1960s and 1970s and has been developing for decades. Its sub-disciplines include computer architecture, data structures, programming languages, software engineering, web design, database development, machine learning, algorithm development, and artificial intelligence. CS is an umbrella that covers many different areas.
Data science, on the other hand, is a more focused field that centers on one thing: big data. Data science and CS are sometimes perceived to be the same thing, most likely because data scientists do some programming, but computer scientists and data scientists have different end goals. Computer scientists build the software that data scientists use, while data scientists apply that software to identify trends and use statistics to determine which findings are significant.
Programming and the Emergence of “Big Data”
According to Wikipedia, William Cleveland’s pivotal 2001 article, “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics,” helped establish the field’s name, well after CS was on the scene. CS set the stage for data science by providing the programming languages needed to process big data.
Programmers are computer scientists who focus on software development. They use programming languages such as C++ and Java to implement specific algorithms. To do so, they need to understand the problem and the data involved, and then choose the right algorithm to solve it. Computer scientists are trained not only to write code but also to understand this entire process.
Computer scientists build data structures into their programs; these structures organize data, including big data, so that its elements are easy to retrieve. Complex data structures such as arrays and data frames are what make the sophisticated analyses that data scientists perform possible. Once data is properly structured, data scientists analyze it with programming languages created by computer scientists.
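To make the retrieval point concrete, here is a minimal sketch using only the Python standard library. The customer records and field names are hypothetical. Kept in a plain list, the records must be scanned one by one to find a given ID; indexed in a dictionary (a hash table), a record can be looked up directly by its key. The last structure stores each field as its own list, a column-oriented layout similar in spirit to a data frame, although real data frame libraries offer far more.

```python
from typing import Dict, List, Optional

# Hypothetical customer records kept in a plain list.
records: List[dict] = [
    {"id": 101, "name": "Ana", "spend": 250.0},
    {"id": 102, "name": "Ben", "spend": 80.5},
    {"id": 103, "name": "Cho", "spend": 410.0},
]

def find_by_scan(rows: List[dict], record_id: int) -> Optional[dict]:
    """Linear search: may have to compare against every row (O(n))."""
    for row in rows:
        if row["id"] == record_id:
            return row
    return None

# The same records indexed by ID in a dict (a hash table): average O(1) lookup.
by_id: Dict[int, dict] = {row["id"]: row for row in records}

# Column-oriented layout, similar in spirit to a data frame: one list per field.
columns: Dict[str, list] = {
    field: [row[field] for row in records] for field in ("id", "name", "spend")
}

print(find_by_scan(records, 103)["name"])  # Cho
print(by_id[103]["name"])                  # Cho
print(max(columns["spend"]))               # 410.0
```

The list scan gets slower as the data grows, while the dictionary lookup stays roughly constant, which is one reason structure matters so much once data becomes "big."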
Development vs Application
Computer scientists are problem solvers and developers. They solve problems by developing algorithms and implementing them in software. Some create websites using JavaScript, HTML, and CSS. Others build databases with SQL. Machine learning is a sub-discipline of CS in which algorithms are developed to learn from data. As a simple example, imagine a camera with built-in processing. It can be trained to recognize dogs by being shown thousands of pictures of different dogs, along with pictures of things that are not dogs. Once it is “trained,” it can take any new image and label it “dog” or “non-dog.” This simple kind of machine learning can be developed into advanced applications such as facial recognition. Artificial intelligence (AI) goes further still, aiming to develop algorithms and devices that replicate human thinking. AI uses sophisticated concepts and methods, such as neural networks, to simulate human cognitive processes.
Data scientists are generally not developers, although they rely heavily on code and on programming languages such as Python and R to run statistical analyses and identify trends. They use data mining to isolate data sets of interest and then characterize the data. They want to know how the data is distributed. They calculate means, medians, variances, and standard deviations so that they can summarize and properly normalize the data; this matters because, more often than not, big data does not follow a perfect bell curve, and on skewed data the median can describe a typical value better than the mean. They calculate correlations and use t-tests and other measures to identify significant differences. In short, data scientists are experts in applied math and statistics: they know how to find data sets of interest, how to process and analyze them, and how to identify important trends.
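The camera example is a binary classification task. The sketch below is a toy stand-in, not an image model: each "image" is replaced by a hypothetical two-number feature vector (the feature names are invented for illustration), and a nearest-centroid classifier "learns" by averaging the labeled training examples for each class. It then labels a new, unseen input with whichever class average it is closest to.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, float]

def train(examples: List[Tuple[Vector, str]]) -> Dict[str, Vector]:
    """Learn one centroid (mean feature vector) per label from labeled examples."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for (x, y), label in examples:
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += x
        s[1] += y
        counts[label] = counts.get(label, 0) + 1
    return {label: (s[0] / counts[label], s[1] / counts[label]) for label, s in sums.items()}

def predict(centroids: Dict[str, Vector], features: Vector) -> str:
    """Label a new input with the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

# Hypothetical training data: two made-up features per image, e.g. (ear_shape, snout_length).
training = [
    ((0.9, 0.8), "dog"), ((0.8, 0.9), "dog"), ((0.85, 0.7), "dog"),
    ((0.1, 0.2), "non-dog"), ((0.2, 0.1), "non-dog"), ((0.15, 0.3), "non-dog"),
]

model = train(training)
print(predict(model, (0.75, 0.85)))  # dog
print(predict(model, (0.2, 0.25)))   # non-dog
```

Averaging is about the simplest possible "training" step and is used here only because it is easy to follow; real image recognition learns from thousands of images using much richer features and models such as neural networks.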
The Art of Presentation
A significant portion of a data scientist’s job is to determine the best way to communicate results. Visual representations are just as important as the numbers behind them, and data scientists need to know standard plots such as scatter plots and histograms as well as more specialized ones such as volcano plots and heat maps. The average Joe may not know much about statistics, but he or she can understand a well-designed image that conveys a significant result. Data scientists need to know how to make their results compelling.
Computer scientists contribute to presentations by building the tools used to create them. PowerPoint and Google Slides are standard presentation platforms developed by computer scientists, and the R and Python packages that generate plots were created by computer scientists as well.
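Behind every histogram is a simple computation: counting how many values fall into each of several equal-width bins. The sketch below shows that binning step with the standard library and prints a crude text histogram; in practice a plotting package in R or Python would draw the chart. The data values and bin count are hypothetical, chosen to be right-skewed rather than bell-shaped.

```python
from typing import List, Tuple

def histogram(values: List[float], bins: int) -> List[Tuple[float, float, int]]:
    """Count values in `bins` equal-width intervals spanning min..max.

    Each interval is [low, high) except the last, which also includes the maximum.
    Returns (low, high, count) for every bin.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against all-equal values (zero width)
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[index] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]

# Hypothetical right-skewed measurements.
data = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 9.0]
for low, high, count in histogram(data, bins=4):
    print(f"{low:4.1f}-{high:4.1f} | {'#' * count}")
```

The long, sparse tail on the right is exactly the kind of shape a well-chosen plot makes obvious to a non-statistician at a glance.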
Infrastructure and Business
Nearly every business has an information technology (IT) infrastructure that includes its computers, networks, software, operating systems, and servers. Computer scientists designed the individual components of that infrastructure, and they also build company databases and security software.
More and more businesses are hiring data scientists, and as big data grows, so does the number of jobs. The more data a company collects, the more information there is to analyze. Companies are starting to realize that data science, and the predictions and trends it provides, can significantly boost business. Thanks to data science, the corporate environment is becoming more proactive instead of reactive.
To Academia and Beyond
Data science is hot right now and getting even hotter as more people hear about it. Colleges and universities are scrambling to keep up with demand and struggling to design the right degree programs, because big data is collected in many different fields. Math departments associate data science with statistics, and business schools approach it through a marketing and presentation lens, while computer science departments link it to machine learning, because mathematical models are the framework that machines compare new input against in order to evaluate it. Biologists already use data science in bioinformatics and are therefore seeking to incorporate it into their programs.
An emerging professional considering a job in CS or data science has a lot of options. CS is a well-established field with demand for software and hardware engineering, web and database design, machine learning, and AI. Data science is a fast-growing field, driven by the emergence of big data applications and the need to learn from big data.