Like many other technical terms, data science has become a buzzword: everyone seems to use it, yet no one really knows exactly what it means. I came across the definitions below a while ago and thought it would be nice to blog about them.
The term data science was (more or less) coined by Cleveland (2001) as part of an action plan to expand the technical areas of statistics. He defined data science by what it does: the single biggest stimulus of new tools and theories of data science is the analysis of data to solve problems posed in terms of the subject matter under investigation. Cleveland also stated that, historically, the field of data science has concerned itself with only one corner of a much larger domain: computational algorithms.
Provost and Fawcett (2013) define data science as a “set of fundamental principles that support and guide the principled extraction of information and knowledge from data.” They describe data mining, “the actual extraction of knowledge from data via technologies that incorporate these principles,” as possibly the concept most closely related to data science. According to Provost and Fawcett (2013), the main application areas of data science include targeted marketing, online advertising, and recommendations for cross-selling.
References:
- Provost, F., & Fawcett, T. (2013). Data Science and its Relationship to Big Data and Data-Driven Decision Making. Big Data, 1(1), 51-59.
- Cleveland, W. S. (2001). Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics. International Statistical Review, 69(1), 21-26.