Data Ingestion
Data ingestion is the first stage in most data architecture designs. The process has two steps: first, it consumes data from assorted sources; second, it loads that data into centralized storage, where the organization can access and use it.
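The two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SOURCES` dictionary, the `customers` table, and the `consume`, `load`, and `ingest` function names are hypothetical, and an SQLite database stands in for the centralized storage.

```python
import csv
import io
import sqlite3

# Hypothetical in-memory "sources"; real ones would be files, APIs, queues, etc.
SOURCES = {
    "crm": "id,name\n1,Ada\n2,Grace\n",
    "billing": "id,name\n3,Linus\n",
}


def consume(sources):
    """Step 1: read rows from every source, tagging each with its origin."""
    for source_name, payload in sources.items():
        for row in csv.DictReader(io.StringIO(payload)):
            yield (source_name, int(row["id"]), row["name"])


def load(rows, conn):
    """Step 2: write the consumed rows into one central table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (source TEXT, id INTEGER, name TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()


def ingest(sources, conn):
    """Run both steps and return the number of rows now in central storage."""
    load(consume(sources), conn)
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Calling `ingest(SOURCES, sqlite3.connect(":memory:"))` runs both steps against a throwaway database. Keeping `consume` and `load` separate means a new source or a different storage target can be swapped in without touching the other step.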
Warning
It's a critical component of data engineering because downstream systems rely entirely on the ingestion layer's output.
The ingestion layer works with various data sources, which data engineers typically don't have full control of.
Note
A good practice is to build a layer of data quality checks and a self-healing system that reacts to unexpected situations, such as data loss, corruption, or system failure.
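As a rough illustration of that practice, the sketch below pairs a simple completeness check with a retry loop. Everything here is an assumption made for the example: the `validate` and `fetch_with_retry` names, the required `id` and `name` fields, and the choice of exponential backoff as the "self-healing" reaction. A production system would likely add alerting, dead-letter storage, and more specific checks.

```python
import time


def validate(records, required_fields=("id", "name")):
    """Split records into (valid, rejected) using simple completeness checks."""
    valid, rejected = [], []
    for record in records:
        # A field counts as missing if it is absent, None, or an empty string.
        missing = [f for f in required_fields if record.get(f) in (None, "")]
        if missing:
            rejected.append((record, missing))
        else:
            valid.append(record)
    return valid, rejected


def fetch_with_retry(fetch, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call fetch(); on failure, wait with exponential backoff and try again."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # give up and surface the error after the last attempt
            sleep(base_delay * 2 ** attempt)
```

Rejected records are returned together with the names of the fields that failed, so they can be logged or quarantined instead of silently dropped. The `sleep` parameter is injectable so that the backoff can be skipped in tests.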
Data ingestion can be performed through several methods:
- Secure File Transfer Protocol (SFTP)
- Application Programming Interface (API)
- Object Storage
- Change Data Capture (CDC)
- Streaming Platform
Reference: Educative - Data Ingestion