Introduction You’re trying to decide which products are most important to your customers, or maybe the marketing team is interested in how many widgets have been sold to each industry, or you’re trying to list which five pages in your project are most viewed. To add another wrinkle, the key stakeholders in your project want to …
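The page-view case in this excerpt is a top-N counting problem. Below is a minimal Python sketch that uses the standard library's `collections.Counter`. The `top_n` helper and the sample paths are hypothetical, and the full post may take a different approach.

```python
from collections import Counter

def top_n(events, n=5):
    """Return the n most frequent keys as (key, count) pairs, highest count first."""
    # Counter tallies every key in a single pass over the data
    counts = Counter(events)
    # most_common orders by count descending; ties keep first-seen order
    return counts.most_common(n)

# Hypothetical page-view log: one entry per view
page_views = ["/home", "/docs", "/home", "/blog", "/docs", "/home", "/pricing"]
print(top_n(page_views, n=2))
```

The same pattern covers the widgets-per-industry question. You would pass the industry of each sale instead of a page path.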
Tag: Batch Processing
Jan 25
Using Sets to Handle Duplication in Large Scale ETLs
Introduction We’ve all been in the situation where you’re processing a ton of information for your ETLs, but you’re dealing with a lot of duplicates in the data. There are so many duplicates that you’re actually running out of memory or your database updates are taking forever. How do you deal with them? Do you throw all the entries …
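The title names sets as the tool for this problem. Here is a minimal Python sketch of that idea. It assumes each record carries a hashable key that identifies duplicates. The `dedupe` helper, the `id` field and the sample rows are hypothetical and do not come from the post.

```python
def dedupe(records, key):
    """Yield each record whose key has not been seen before, in input order."""
    seen = set()  # stores only the keys, not the full records
    for record in records:
        k = key(record)
        if k in seen:
            continue  # duplicate: skip it before any expensive database write
        seen.add(k)
        yield record

# Hypothetical input rows with one repeated id
rows = [
    {"id": 1, "name": "widget"},
    {"id": 2, "name": "gadget"},
    {"id": 1, "name": "widget"},
]
unique_rows = list(dedupe(rows, key=lambda r: r["id"]))
print(unique_rows)
```

`dedupe` is a generator, so records stream through one at a time. Memory grows with the number of distinct keys, not with the total number of rows.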
Jan 24
Directed Acyclic Graphs (DAGs) for Batch Processing
Introduction In the first and second parts of this series, we discussed what a DAG is and the abstract benefits it can offer. In this article, we’ll talk about the concrete benefits it offers in a batch processing system. We’ll use customer reviews as our example, since retailers often use these reviews to improve their recommendations for you …
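To illustrate the DAG idea in a batch setting, here is a minimal sketch using Python's standard-library `graphlib` module (Python 3.9+). The review-processing step names are hypothetical and are not taken from the article. Each key lists the steps it depends on, and a topological order gives a valid sequence in which to run them.

```python
from graphlib import TopologicalSorter

# Hypothetical review pipeline: each step maps to the set of steps it depends on
pipeline = {
    "clean_text": {"load_reviews"},
    "extract_ratings": {"load_reviews"},
    "score_sentiment": {"clean_text"},
    "aggregate_by_product": {"score_sentiment", "extract_ratings"},
    "update_recommendations": {"aggregate_by_product"},
}

def run_order(graph):
    """Return an order in which every step comes after all of its dependencies."""
    # static_order raises graphlib.CycleError if the graph contains a cycle
    return list(TopologicalSorter(graph).static_order())

order = run_order(pipeline)
print(order)
```

`clean_text` and `extract_ratings` share no dependency on each other. A scheduler could therefore run them in parallel, and the DAG makes that kind of independence explicit.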