Artificial intelligence (AI) is the mimicking of human thought and cognitive processes to solve complex problems automatically. AI uses programming techniques to represent and manipulate knowledge, and different techniques mimic the different ways that people think and reason (see case-based reasoning and model-based reasoning, for example). AI applications can be either stand-alone software, such as decision support software, or embedded within larger software or hardware systems.
Big data has been defined in many ways, but along similar lines. Doug Laney, then an analyst at the META Group, first defined big data in a 2001 report called “3D Data Management: Controlling Data Volume, Velocity and Variety.” The three V’s of Big Data:
Volume refers to the sheer size of the datasets. The McKinsey report, “Big Data: The Next Frontier for Innovation, Competition, and Productivity,” expands on the volume aspect by saying that, “’Big data’ refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”
Velocity refers to the speed at which the data is acquired and used. Not only are companies and organizations collecting more and more data at a faster rate, but they also want to derive meaning from that data as soon as possible, often in real time.
Variety refers to the different types of data that are available to collect and analyze in addition to the structured data found in a typical database. Barry Devlin of 9sight Consulting identifies four categories of information that constitute big data:
1. Machine-generated data, such as RFID data, physical measurements and geolocation data, from monitoring devices
2. Computer log data, such as clickstreams
3. Textual social media information from sources such as Twitter and Facebook
4. Multimedia social and other information from the likes of Flickr and YouTube
IDC analyst Benjamin Woo has added a fourth V to the definition: Value. Woo says that because big data is about supporting decisions, you need the ability to act on the data and derive value from it.
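The velocity aspect described above can be sketched as processing data while it arrives rather than storing it first and analyzing it later. The snippet below is a minimal illustration, with a hypothetical generator standing in for a real data feed.

```python
# Sketch of "velocity": compute a running aggregate as records arrive,
# instead of capturing the full dataset and analyzing it afterward.
# sensor_stream is a hypothetical stand-in for a real-time data feed.

def sensor_stream():
    """Hypothetical feed of incoming measurements (invented values)."""
    for reading in [12.0, 15.5, 11.2, 14.8, 13.1]:
        yield reading

def running_mean(stream):
    """Update the mean incrementally -- no need to hold the full dataset."""
    count, total = 0, 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count

means = list(running_mean(sensor_stream()))
print(means[-1])  # mean after the final reading arrives
```

Each new reading updates the answer immediately, which is the point of real-time use: the current mean is always available without waiting for the stream to end.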
In the Big Data toolbox, crowdsourcing covers techniques and processes for measuring and taking into account the collective opinion of a group of individuals, rather than a single expert, to answer a question. Global Crowd Intelligence is a great example.
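The idea of weighing a crowd's collective opinion against a single expert can be sketched as a simple aggregation. The numbers below are invented for illustration; the classic "wisdom of crowds" observation is that a robust aggregate of many independent guesses is often closer to the truth than an individual answer.

```python
# Sketch: aggregate a crowd's estimates instead of trusting one expert.
# All values here are invented for illustration.
import statistics

true_value = 1000                    # hypothetical quantity being estimated
expert_guess = 1300                  # one expert's answer
crowd_guesses = [850, 1100, 950, 1200, 1020, 980, 1090]

# The median is a robust aggregate: outlier guesses barely move it.
crowd_estimate = statistics.median(crowd_guesses)
print(crowd_estimate)  # 1020

# In this toy example the crowd's median lands closer to the true value
# than the expert's answer does.
print(abs(crowd_estimate - true_value) < abs(expert_guess - true_value))  # True
```

The median is used rather than the mean as a design choice: it is less sensitive to a few wildly wrong participants, which matters when opinions come from an open crowd.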
Data science is a rather recent term with multiple definitions, but it is generally accepted as a discipline that incorporates statistics, data visualization, computer programming, data mining, machine learning, and database engineering to solve complex problems.
In Knowledge Discovery, machine learning is most commonly used to mean the application of induction algorithms, which is one step in the knowledge discovery process. This is similar to the definition of empirical learning or inductive learning in Readings in Machine Learning by Shavlik and Dietterich. Note that in their definition, training examples are “externally supplied,” whereas here they are assumed to be supplied by a previous stage of the knowledge discovery process. Machine Learning is the field of scientific study that concentrates on induction algorithms and on other algorithms that can be said to “learn.”
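As a minimal sketch of what an induction algorithm does, the toy code below induces a one-feature threshold rule (a decision stump) from externally supplied training examples. The data and function name are invented for illustration; real induction algorithms handle many features and noisy labels.

```python
# Sketch of induction: generalize a rule from labeled training examples.
# Examples are (feature_value, label) pairs, "externally supplied" in the
# sense used above. All data here is invented for illustration.

def induce_stump(examples):
    """Pick the threshold whose rule (predict 1 when value >= threshold)
    misclassifies the fewest training examples."""
    best_threshold, best_errors = None, len(examples) + 1
    for t in sorted(v for v, _ in examples):
        errors = sum((v >= t) != bool(label) for v, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold

train = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1)]
threshold = induce_stump(train)
print(threshold)         # 4.0: the rule induced from the examples
print(6.0 >= threshold)  # True: an unseen value is classified as label 1
```

The key property of induction shown here is generalization: the learned threshold classifies values that never appeared in the training examples.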