It’s widely acknowledged that in any data warehousing and analytics project, around 80% of the effort is spent on ETL (Extract, Transform, Load) and other data preparation tasks.
Don’t forget that there is a large amount of “other people’s data” that you have absolutely no control over but still need to use. For such third-party data, there’s little choice other than to consume it and enrich it as best you can.
Love the analogy that extracted data is like a tree torn from its forest. Then... "The semantic layer helps with our problem of context by providing a consistent, business-centric view of the information which can be as rich as necessary, without moving data."
First, the greatest violence is caused by copying data, not just moving it. Now you have the dilemma of determining where the REAL data is. Here is where the tree analogy falters.
Second, there is a direct analogy between providing "context" and prompt engineering for LLMs. Where should you focus your ATTENTION (the principle enabling current AI innovations)? I wonder whether semantic layer vendors could adapt the attention principle from neural-net transformers. And I wonder what the analog of similarity latent/embedding spaces would be for a semantic layer.
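To make the thought concrete, here is a minimal toy sketch of what "attention over a semantic layer" might look like: given a user question, score the layer's entity descriptions by embedding similarity and softmax-normalize the scores into attention weights. Everything here is hypothetical, the entity names and descriptions are invented, and the bag-of-words "embedding" is just a stand-in for a real sentence encoder:

```python
import math
from collections import Counter

# Toy stand-in for a real embedding model: a bag-of-words vector.
# In practice you would use a learned sentence encoder instead.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

# Hypothetical semantic-layer entities with business descriptions.
entities = {
    "fct_orders":     "completed customer orders with revenue and discounts",
    "dim_customer":   "customer master data with region and segment",
    "fct_web_visits": "website visit events with page and referrer",
}

query = "total revenue by customer segment"
q_vec = embed(query)

# Attention-style weighting: similarity scores normalized with softmax,
# telling us which entities deserve the most "attention" for this query.
names = list(entities)
scores = [cosine(q_vec, embed(desc)) for desc in entities.values()]
for name, weight in zip(names, softmax([s * 5 for s in scores])):  # 5 = sharpness
    print(f"{name}: {weight:.2f}")
```

The point of the sketch is only the shape of the idea: the semantic layer's business descriptions become the "keys", the user's question the "query", and the weights say where the context budget should be spent.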
Beautifully stated: "But instead of asking why this work takes so long, we should be asking why this work is needed in the first place. Why does data need so much preparation and transformation?"