ETL – Raymond Giorgi

Jan 31

Using Maps for ETL Aggregation

By Raymond Giorgi in ETL

Introduction You’re trying to decide which products are most important to your customers, or maybe the marketing team is interested in how many widgets have been sold to each industry, or you’re trying to list which five pages in your project are most viewed. To add another wrinkle, the key stakeholders in your project want to …

Batch Processing, Data Structures, ETL

Jan 25

Using Sets to Handle Duplication in Large Scale ETLs

By Raymond Giorgi in ETL

Introduction We’ve all been in the situation where you’re processing a ton of information for your ETLs, but you’re dealing with a lot of duplicates in the data. There are so many duplicates that you’re actually running out of memory or your database updates are taking forever. How do you deal with them? Do you throw all the entries …

Batch Processing, Data Structures, ETL

Jan 24

Directed Acyclic Graphs (DAGs) for Batch Processing

By Raymond Giorgi in Batch Processing, ETL

Introduction In the first and second parts of this series, we discussed what a DAG is and what abstract benefits it can offer you. In this article, we’ll talk about what benefits it can offer you in a batch processing system. We’ll use customer reviews as our example since retailers will often use these reviews to improve their recommendations for you …

Airflow, Batch Processing, DAG, ETL

Jan 22

Directed Acyclic Graphs (DAGs) for Queue Systems

By Raymond Giorgi in ETL, Queueing

Introduction In the first part of this series, we discussed what a DAG is and what abstract benefits it can offer you. In this article, we’ll talk about its specific implementation in a queuing system. For this article, I’ll use customer reviews as an example; they’re relatively straightforward, but you’d be surprised at the steps that can …

DAG, ETL, Queueing, RabbitMQ

Jan 19

Directed Acyclic Graphs (DAGs) for Non-Interactive Processing

By Raymond Giorgi in ETL

Introduction Since the introduction of the personal computer, user interaction has become the default mode of developing applications, but users have come to expect more and more features with less and less processing time. If Amazon generated a new recommendation list on each page load, they wouldn’t have gotten very far in the retail space. Explanation …

DAG, ETL
3 comments
