“Modern data lakes are heterogeneous in the vocabulary that is used to describe data. … How can we determine if a data value occurring more than once in the lake has different meanings and is therefore a homograph? While word and entity disambiguation have been well studied, … we show that data lakes provide a new opportunity for disambiguation of data values. … We introduce DomainNet, which efficiently represents this network, and investigate to what extent it can be used to disambiguate values without requiring any supervision.”
Find the paper, with its full list of authors, in ACM Transactions on Database Systems.