Data Structures

Other than optimising code at a really detailed level, why would it be useful to understand data structures? At the very least, I think that learning about them is useful to:

  • understand the implementations of things like dictionaries "under the hood" of various programming languages;
  • be able to use concepts like graph theory when engineering your own solutions to problems.

But what is the definition of "data structure"? Well, Wikipedia says:

In computer science, a data structure is a data organization, management, and storage format that enables efficient access and modification. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data, i.e., it is an algebraic structure about data.

Makes sense. What about when they were "first" discovered or noted in the literature? Were ancient civilisations using some of these? Well, again according to Wikipedia:

  • the hash table (an implementation of an associative array) was invented in 1953;
  • the B-tree (an implementation of a self-balancing search tree, which is in turn an implementation of an associative array) was invented in 1970.

Why do we mention hash tables and search trees? Because they are so fundamental and widely used in, say, dictionaries and databases respectively, and also because they are both implementations of the associative array, which we are about to define.

It's also useful to note what an abstract data type is, from Wikipedia:

In computer science, an abstract data type (ADT) is a mathematical model for data types, defined by its behaviour (semantics) from the point of view of a user of the data, specifically in terms of possible values, possible operations on data of this type, and the behaviour of these operations. This mathematical model contrasts with data structures, which are concrete representations of data, and are the point of view of an implementer, not a user.

An associative array, or dictionary, is an abstract data type, while a hash table and a B-tree are data structures.

Associative Array

Just to go back to basics:

In computer science, an associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of (key, value) pairs, such that each possible key appears at most once in the collection.

Operations associated with this data type allow one to:

  • add a pair to the collection;
  • remove a pair from the collection;
  • modify an existing pair;
  • lookup a value associated with a particular key.
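
These four operations can be seen in action with Python's built-in dict (which, in CPython, is implemented as a hash table under the hood). A minimal sketch, using an illustrative `capitals` mapping:

```python
# The four associative-array operations, via Python's built-in dict.
capitals = {}

capitals["France"] = "Paris"   # add a pair
capitals["Japan"] = "Kyoto"    # add another pair
capitals["Japan"] = "Tokyo"    # modify an existing pair (same key)
del capitals["France"]         # remove a pair

print(capitals.get("Japan"))   # lookup: prints Tokyo
print(capitals.get("France"))  # missing key: prints None
```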

Implementing associative arrays poses the dictionary problem, a classic computer science problem: the task of designing a data structure that maintains a set of data during 'search', 'delete', and 'insert' operations. The two major solutions to the dictionary problem are a hash table and a search tree. In some cases it is also possible to solve the problem using directly addressed arrays, binary search trees, or other more specialized structures.
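
To make the hash-table solution concrete, here is a minimal sketch of one using separate chaining. It is illustrative only, not production code: the bucket count is fixed, there is no resizing, and the class and method names are my own invention.

```python
class ChainedHashTable:
    """A minimal hash table with separate chaining (a sketch:
    fixed bucket count, no resizing)."""

    def __init__(self, n_buckets=8):
        self._buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # Hash the key to pick one of the buckets.
        return self._buckets[hash(key) % len(self._buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key present: modify in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # key absent: add a new pair

    def search(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return
        raise KeyError(key)
```

With few keys per bucket, each of insert, search, and delete only scans a short chain, which is why hash tables achieve constant time on average.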

Many programming languages include associative arrays as primitive data types, and they are available in software libraries for many others. Content-addressable memory is a form of direct hardware-level support for associative arrays.

The name does not come from the associative property known in mathematics. Rather, it arises from the fact that we associate values with keys.

That is quite a chunk of a definition! But it does put things into context. Future posts will delve a bit into dictionaries and trees. Of note is that the two implementations of an associative array might have different algorithmic complexities for each of the operations, like add and delete.
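
To hint at that complexity difference, here is a sketch of the search-tree side: a plain, unbalanced binary search tree rather than a B-tree (the names are illustrative). Lookups walk left or right by key comparison, so they cost O(log n) on average (O(n) in the worst case when the tree degenerates), whereas hash-table lookups are O(1) on average.

```python
class BSTNode:
    """One node of a plain (unbalanced) binary search tree."""
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None

def bst_insert(node, key, value):
    # Recursively descend to the right spot, then attach a new node.
    if node is None:
        return BSTNode(key, value)
    if key < node.key:
        node.left = bst_insert(node.left, key, value)
    elif key > node.key:
        node.right = bst_insert(node.right, key, value)
    else:
        node.value = value  # key already present: modify in place
    return node

def bst_search(node, key):
    # Walk down the tree, comparing keys, until found or we fall off.
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None

root = None
for k, v in [("b", 2), ("a", 1), ("c", 3)]:
    root = bst_insert(root, k, v)
print(bst_search(root, "a"))  # prints 1
```

The trade-off, explored in future posts, is that a search tree keeps its keys in sorted order, which a hash table does not.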