“We consider the problem of building an organizational directory of data lakes to support effective user navigation. The organization directory is defined as an acyclic graph that contains nodes representing sets of attributes and edges indicating subset relationships between nodes. A probabilistic model is constructed to model user navigational behaviour. The model also predicts the likelihood of users finding relevant tables in a data lake given an organization.”
Read the paper and see the full list of authors in IEEE Transactions on Knowledge and Data Engineering.