UtterAccess Forums > General Chat > Data Lake?

Posted by: haresfur May 12 2019, 05:47 PM

At the risk of asking for serious replies (non-serious ones welcome, too): is anyone working with data lakes? Some of the data-wonkies here are heading in that direction and, frankly, I can't wrap my head around it. Most of the scientists are still working in spreadsheets because it is too much work to organize data into a database, and I don't see how you can throw all that, plus other stuff, into some storage location and expect to extract anything usable. But I'm willing to be convinced.

Posted by: nvogel May 12 2019, 11:30 PM

I've worked on several data lake projects. A difference from the enterprise data warehouse approach is that in a data lake you aim to land data on a common platform in its native or most general form and without integrating and conforming it with other data sources. The data lake then becomes a source for analysis and further processing. Technologies like Hadoop and Spark are what make this kind of approach feasible.
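To make the "land it in its native form, integrate later" idea concrete, here is a minimal local-filesystem sketch, assuming plain CSV source files. It is not how Hadoop or Spark work internally; it only illustrates the pattern. The raw zone layout (`raw/<source>/ingest_date=YYYY-MM-DD/`, loosely modelled on Hive-style `key=value` partition folders), the `.meta.json` sidecar, and the function names `land_raw` and `read_with_schema` are all hypothetical choices made for illustration. Files are copied byte-for-byte with no cleaning or conforming, and types are applied only when a particular analysis reads the data ("schema-on-read").

```python
import csv
import json
import shutil
from datetime import date
from pathlib import Path


def land_raw(src: Path, lake_root: Path, source_name: str, ingest_date: date) -> Path:
    """Copy a source file into the lake's raw zone unchanged (hypothetical layout).

    Layout assumed here: <lake_root>/raw/<source_name>/ingest_date=YYYY-MM-DD/<file>
    """
    target_dir = lake_root / "raw" / source_name / f"ingest_date={ingest_date.isoformat()}"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / src.name
    # Byte-for-byte copy: no cleaning, no integration with other sources.
    shutil.copy2(src, target)
    # Minimal sidecar metadata so later users can tell where the file came from.
    meta = {
        "source": source_name,
        "original_name": src.name,
        "ingest_date": ingest_date.isoformat(),
    }
    (target_dir / (src.name + ".meta.json")).write_text(json.dumps(meta))
    return target


def read_with_schema(path: Path, schema: dict) -> list:
    """Schema-on-read: cast only the columns a given analysis needs, at read time.

    `schema` maps column name -> conversion function, e.g. {"depth_m": float}.
    """
    rows = []
    with path.open(newline="") as f:
        for record in csv.DictReader(f):
            rows.append({col: cast(record[col]) for col, cast in schema.items()})
    return rows
```

For example, a hypothetical `field_readings.csv` with columns `site` and `depth_m` would be landed once with `land_raw(...)` and then read by one use case as `read_with_schema(path, {"site": str, "depth_m": float})`; another use case could read the same raw file with a different schema. The sidecar is only a bare-minimum stand-in for the documentation and QA that still have to happen somewhere.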

Posted by: haresfur May 13 2019, 06:36 PM

Thanks. I'll read up a little on Hadoop & Spark. IMO most of our attempts to create some sort of universal data repository have failed because no one has the time to document the data well enough for it to be usable. Never mind actually doing QA...

Posted by: nvogel May 14 2019, 12:30 AM

Hadoop and Spark are two popular technologies used for data lake solutions. There are others. I suggest you find out what your organisation is proposing to use.

Setting out to build a universal store is not the right way to start in my opinion. Look for some priority use-cases first. Source the data for those specific cases and build the analytical processes and tools needed to get some value out of the data. Then repeat.

Posted by: haresfur May 14 2019, 08:56 PM

Thanks.

QUOTE
I suggest you find out what your organisation is proposing to use.


We use the term "organisation" very loosely.

QUOTE
Setting out to build a universal store is not the right way to start in my opinion. Look for some priority use-cases first. Source the data for those specific cases and build the analytical processes and tools needed to get some value out of the data. Then repeat.


You are preaching to the choir. Right now, I'm trying to learn just enough to figure out if it is likely to be any use to my projects. If the answer is maybe, I could offer myself up as a sacrificial lamb-case study and ask them for funding. If not, I need to figure out what to do if other infrastructure is turned off.