A Data Lake is a centralised storage repository associated with a Big Data infrastructure. It enables an organisation – a company or an institution, for example – to import and store a large quantity of data, whatever its source or nature, in a single place. The data can come from websites, social networks, connected devices, mobile applications, company applications, and so on.
The user – data scientist, developer, business analyst, or researcher – can extract the data and apply different processes to it for uses such as reporting or designing advanced statistical models.
The increasing volume and variety of data collected complicates Big Data management. The Data Lake provides more flexibility than traditional data warehouses (such as relational databases). Large amounts of raw data, whether structured or not, can be stored there without needing to know in advance the use to which they will be put. This saves time and widens the possibilities for future analyses.
In other words, Data Lakes are huge reservoirs of heterogeneous data from which we will be able to draw according to our needs.
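A minimal sketch of this idea, using a local folder as a stand-in for a lake's raw storage zone (the paths and the `ingest` helper are hypothetical, for illustration only): raw data is stored as-is, and structure is applied only at read time, when a use for the data emerges.

```python
import csv
import json
from pathlib import Path

# Hypothetical local folder standing in for the lake's raw zone.
LAKE = Path("lake/raw")

def ingest(source: str, filename: str, payload: bytes) -> Path:
    """Store raw bytes untouched, partitioned by source; no schema is imposed."""
    dest = LAKE / source / filename
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(payload)
    return dest

# Heterogeneous inputs land side by side, in their original formats.
ingest("mobile_app", "events.json",
       json.dumps([{"user": 1, "action": "click"}]).encode())
ingest("crm", "contacts.csv",
       b"name,email\nAda,ada@example.com\n")

# Schema-on-read: structure is applied only when the data is consumed.
events = json.loads((LAKE / "mobile_app" / "events.json").read_text())
with (LAKE / "crm" / "contacts.csv").open() as f:
    contacts = list(csv.DictReader(f))
```

In a real deployment the raw zone would typically live on a distributed store (HDFS, object storage) rather than a local filesystem, but the principle is the same: ingestion is format-agnostic, and each consumer interprets the data for its own purpose.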
Care must be taken, however, to ensure that the lake does not become a swamp. Data governance is therefore a central issue for the Data Lake: without it, the data cannot be exploited efficiently or made easily accessible within the organisation.