Purpose: To keep up with reading an article a (week)day and add my own thoughts to it.
#1: I spent 5 hours understanding how Uber built their ETL pipelines.
- This article talks about how Uber built a data lake platform, Apache Hudi, to process real-time, incremental data in a way that is efficient in both performance and cost. It then briefly covers the storage architecture and the types of operations that can be performed on top of Hudi. I haven’t used it before, but will check it out later.
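To make the "incremental" idea concrete for myself, here is a toy sketch in plain Python. It is not Hudi's API: the `upsert_batch` function and the `id` / `updated_at` field names are my own assumptions, and all it shows is the general pattern of applying only the changed records by key instead of rewriting the whole table.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]


def upsert_batch(
    table: Dict[str, Record],
    batch: Iterable[Record],
    key: str = "id",          # hypothetical record-key field
    ts: str = "updated_at",   # hypothetical ordering field
) -> List[str]:
    """Apply a batch of changed records to `table` in place.

    A record is written when its key is new, or when its timestamp is
    at least as recent as the stored one. Returns the keys written,
    in the order they were applied.
    """
    written: List[str] = []
    for rec in batch:
        k = rec[key]
        current = table.get(k)
        if current is None or rec[ts] >= current[ts]:
            table[k] = rec
            written.append(k)
    return written


# One existing row, then a batch with an update, an insert, and a stale record.
table = {"a": {"id": "a", "fare": 10, "updated_at": 1}}
batch = [
    {"id": "a", "fare": 12, "updated_at": 2},  # newer: replaces the stored row
    {"id": "b", "fare": 7, "updated_at": 2},   # new key: inserted
    {"id": "a", "fare": 9, "updated_at": 0},   # stale: ignored
]
changed = upsert_batch(table, batch)
```

Only the keys in `changed` were touched, which is the part that saves work compared with a full reload.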
#2: A visual guide to LLM agents
- This article gives a nice visual overview of LLM agents, insights into their components, how multiple agents can work together, and how to build them - which brings me to the next article.
#3: QueryGPT – Natural Language to SQL Using Generative AI
- Large scale text-to-SQL generation using LLM agents. Beneficial mostly to non-technical users or newcomers or engineers/analysts starting out on a new project.
- It didn’t mention dataset-level or table-level permissions, though I assume the ACLs are most likely verified after the Intent agent step, or users are limited to tables within their workspace.
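Since the article doesn't describe the permission check, the following is only my guess at what such a step could look like: a filter that drops any candidate table the user cannot read before it reaches SQL generation. The `filter_allowed_tables` function, the `acl` mapping, and the table names are all hypothetical.

```python
from typing import Dict, List, Set


def filter_allowed_tables(
    user: str,
    candidate_tables: List[str],
    acl: Dict[str, Set[str]],
) -> List[str]:
    """Keep only the candidate tables that `user` is allowed to read.

    `acl` maps a user name to the set of table names they can access.
    Unknown users get an empty result. Input order is preserved.
    """
    allowed = acl.get(user, set())
    return [t for t in candidate_tables if t in allowed]


# Hypothetical ACL and a list of tables proposed by an earlier agent step.
acl = {"alice": {"trips", "drivers"}}
candidates = ["trips", "payments", "drivers"]
visible = filter_allowed_tables("alice", candidates, acl)
unknown = filter_allowed_tables("bob", candidates, acl)
```

Filtering before generation, rather than rejecting the finished query, would also keep schemas the user cannot read out of the prompt.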
#4: On becoming competitive when joining a new company
- Quite an interesting post to read, especially for early-mid career professionals like me.
- I believe that doing exciting and meaningful work in your 20s to mid-30s has the greatest impact on long-term career growth, and this article gives some good tips on how to make the most of it.
#5: WTF: The Who to Follow Service at Twitter
- High-level introduction to the Who to Follow service at Twitter (sorry, X), which is a recommendation system that suggests accounts for users to follow.
- Since the original paper is quite old now, I am curious to see how the system has evolved over the years, especially as they have open-sourced parts of it (e.g., tweepcred).