It’s widely acknowledged that in any data warehousing and analytics project, around 80% of the effort is spent on ETL (Extract, Transform, Load) and other data preparation tasks.
Don’t forget that there is a large amount of “other people’s data” that you have absolutely no control over but still need to use. For such third-party data, there’s little choice other than to consume it and enrich it as best you can.
Love the analogy that extracted data is like a tree torn from its forest. Then... "The semantic layer helps with our problem of context by providing a consistent, business-centric view of the information which can be as rich as necessary, without moving data."
First, the greatest violence is caused by copying data, not just moving it. Now you have the dilemma of determining where the REAL data is. Here is where the tree analogy falters.
Second, there is a direct analogy between providing "context" and prompt engineering for LLMs. Where should you focus your ATTENTION (the principle enabling current AI innovations)? I wonder whether semantic layer vendors could adapt the attention principle from neural-net transformers. And I wonder what the analog of similarity latent/embedding spaces would be for a semantic layer.
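To make the thought concrete, here is a minimal toy sketch of what "attention over a semantic layer" might look like: given a user question, score the layer's entity descriptions by embedding similarity and softmax-normalize the scores into attention weights. Everything here is hypothetical, the entity names and descriptions are invented, and the bag-of-words "embedding" is just a stand-in for a real sentence encoder:

```python
import math
from collections import Counter

# Toy stand-in for a real embedding model: a bag-of-words vector.
# In practice you would use a learned sentence encoder instead.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

# Hypothetical semantic-layer entities with business descriptions.
entities = {
    "fct_orders":     "completed customer orders with revenue and discounts",
    "dim_customer":   "customer master data with region and segment",
    "fct_web_visits": "website visit events with page and referrer",
}

query = "total revenue by customer segment"
q_vec = embed(query)

# Attention-style weighting: similarity scores normalized with softmax,
# telling us which entities deserve the most "attention" for this query.
names = list(entities)
scores = [cosine(q_vec, embed(desc)) for desc in entities.values()]
for name, weight in zip(names, softmax([s * 5 for s in scores])):  # 5 = sharpness
    print(f"{name}: {weight:.2f}")
```

The point of the sketch is only the shape of the idea: the semantic layer's business descriptions become the "keys", the user's question the "query", and the weights say where the context budget should be spent.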
Beautifully stated: "But instead of asking why this work takes so long, we should be asking why this work is needed in the first place. Why does data need so much preparation and transformation?"