TeraPetaExaZetta

Doer

Well-Known Member
Just doing a bunch of planning at work, and it is all about Data Mining this year... again. Data collection is a database. We've all heard GIGO: Garbage In, Garbage Out.

But when we try to get information out across time and multiple sources, it can still be garbage; we just call it Data Mining. We mine data to process it into information.

So, have you ever asked yourself how much data there is? Nobody knows exactly, but it is estimated at 2.7 zettabytes in the digital domain alone, leaving out all paper. And it grows by approximately 5 exabytes per DAY. We expect there to be 35 zettabytes of data to sell mining tools into by 2020.
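Those figures invite a quick sanity check. Below is a minimal Python sketch. It assumes decimal units (1 ZB = 1,000 EB). It also assumes, hypothetically, that the 2.7 ZB estimate sits about eight years before 2020, since the post doesn't date it. Under those assumptions, a constant 5 EB/day adds only about 1.8 ZB a year. Reaching 35 ZB by 2020 would therefore need the daily rate itself to keep climbing, to roughly 11 EB/day on average.

```python
# Back-of-envelope check of the growth figures in the post.
# Assumptions: decimal units, and a hypothetical 8-year window to 2020.
EB_PER_ZB = 1000  # 1 ZB = 10**21 bytes = 1,000 EB

def zb_per_year(eb_per_day: float, days_per_year: float = 365) -> float:
    """Convert a daily growth rate in exabytes to zettabytes per year."""
    return eb_per_day * days_per_year / EB_PER_ZB

def linear_projection(start_zb: float, eb_per_day: float, years: float) -> float:
    """Total stock after `years` of growth at a constant daily rate."""
    return start_zb + zb_per_year(eb_per_day) * years

def required_eb_per_day(start_zb: float, target_zb: float, years: float,
                        days_per_year: float = 365) -> float:
    """Average daily rate needed to grow from start_zb to target_zb."""
    return (target_zb - start_zb) * EB_PER_ZB / (years * days_per_year)

print(zb_per_year(5))                                 # 1.825
print(round(linear_projection(2.7, 5, 8), 1))         # 17.3
print(round(required_eb_per_day(2.7, 35, 8), 1))      # 11.1
```

Change the `years` argument if you know when the 2.7 ZB estimate was made. The conclusion that growth must be accelerating holds unless the window is much longer.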

Just like the population, it keeps growing. And, as with the population, there may be a tipping point where we begin to lose information, much as birth rates eventually adjust (a process well under way, as a matter of fact).

And almost all the information we have today is more like plain dirt: 80% of all information is unstructured. So almost all the important stuff has to be sifted, like placer panning for flour gold. A data mine is more like a hard-rock lode mine. Things are organized, and we are not sifting tons of dirt to find one flake of information.

In my mind, there is more opportunity in this world today than has ever existed. Plenty of work for me means plenty of work for you.

Just to give an idea: a gigabyte is 1,000,000,000 bytes, three sets of zeros. A zettabyte is 1,000,000,000,000,000,000,000 bytes, seven sets of zeros.
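The thread title climbs through tera, peta, exa, and zetta. Here is a small Python sketch of the decimal (SI) byte prefixes, where each step up multiplies by 1,000 and adds one more set of zeros. It assumes decimal units. Binary units such as the gibibyte (2**30 bytes) are a different scale.

```python
# Decimal (SI) byte prefixes: each step multiplies by 1,000,
# i.e. adds one more group of three zeros.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta"]

def zero_groups(prefix: str) -> int:
    """How many groups of three zeros one unit of this prefix has."""
    return PREFIXES.index(prefix) + 1

def bytes_for(prefix: str) -> int:
    """Number of bytes in one unit with the given SI prefix."""
    return 1000 ** zero_groups(prefix)

for p in PREFIXES:
    print(f"1 {p}byte = {bytes_for(p):,} bytes ({zero_groups(p)} sets of zeros)")
```

The last line printed is `1 zettabyte = 1,000,000,000,000,000,000,000 bytes (7 sets of zeros)`, which is twelve more zeros than a gigabyte.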
 

Doer

Well-Known Member
We are industry and not at all concerned with that. We sell to the largest enterprises in the world. Whatever we are allowed to do, we do. We can't afford to be illegal in any way, shape or form. I think you need to Zetta-ize your thinking on data. Only a giga's worth of it is privacy data. The rest is just saved, unorganized past production records and measurements that can yield great value for an enterprise if they can be mined properly.

Again, in my gold analogy, it is like taking back all the placer gold: a valley's worth of tons of unrelated, hard-to-find, extremely valuable particles. And, like gold, a few particles are worth next to nothing, but it all adds up. So the data mine concept is not about beginning a new mine that, after decades, will produce chunks of gold.

It is like a gold magnet that goes out, finds the placer gold, and sucks it all back into the rock vein for easy pickings. Every company has a valley of placer gold, if you will, that it cannot extract in a worthwhile manner. It needs good gold-magnet data tech to organize the data and data sources first.

This is the opposite of collecting all voice traffic on phones, for example. They do that. But the searcher-bot tech that catalogs, indexes, and constantly collates the most disparate information, across five degrees of separation that humans cannot keep in their heads, is the Data Miner tech.
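Here is a toy Python sketch of the catalog-and-collate idea. It assumes documents have already been reduced to sets of extracted terms. The document names and terms below are hypothetical. An inverted index maps each term to the documents that mention it. A breadth-first search then follows shared terms out to a fixed number of hops, five by default to match the "five degrees" above.

```python
from collections import defaultdict, deque

def build_index(docs: dict[str, set[str]]) -> dict[str, set[str]]:
    """Inverted index: term -> ids of the documents that mention it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index

def related_within(docs, index, start: str, max_hops: int = 5) -> dict[str, int]:
    """Breadth-first search from `start` through shared terms.
    Returns each reachable document with the number of hops to reach it."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        doc = queue.popleft()
        if hops[doc] == max_hops:
            continue  # don't expand beyond the hop limit
        for term in docs[doc]:
            for other in index[term]:
                if other not in hops:
                    hops[other] = hops[doc] + 1
                    queue.append(other)
    return hops

# Hypothetical documents, already reduced to extracted terms.
DOCS = {
    "memo_1998": {"supplier_x", "bracket_17"},
    "invoice_2003": {"supplier_x", "plant_b"},
    "test_log_2010": {"plant_b", "sensor_9"},
    "field_report_2015": {"sensor_9", "failure_42"},
}
INDEX = build_index(DOCS)
print(related_within(DOCS, INDEX, "memo_1998"))
# {'memo_1998': 0, 'invoice_2003': 1, 'test_log_2010': 2, 'field_report_2015': 3}
```

The 1998 memo and the 2015 field report share no terms at all. The chain through the invoice and the test log is the kind of link a person would never remember.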
 

Doer

Well-Known Member
I have no idea why you take this stuff so personally.

I certainly don't.
 

Doer

Well-Known Member
I don't take it any way at all. I was curious whether you did, because you know about it. You see how a question like that could be asked, though? It's a tone thing, really. I come off that way because ethics are interesting to me. It is not a self-righteous thing. I like to see the ways people do or do not choose to care, and the subject is new to me. I apologize if it seems any other way... don't take me so personally.
You are getting marginal again. You asked a simple question, I responded, and you puked on yourself. Knock it off. You said personal. I don't. For me, you are just trolling again. OK? You asked. I answered. And then you said this:
...your domain....under the assumption that people make that they are protected by privacy?
How does all this work legally...
is there and(sic) ethical dilemmas(sic) that you face?

I answer directly those points and you say:

It seems like you personally could have an issue with it.

Your opinions about me are not appropriate for me to discuss, in that I do not care and won't be baited. So knock off the personal comments, and stop picking at your own mental scabs just because the tone you read in your mind offends you.
 

heckler73

Well-Known Member
Uhhh, interesting mystery conversation aside, what is it about data mining that is making it such a lucrative venture?

A friend of mine who's doing a math major (switching over from Physics, ironically) wants to get into DMing through someone else he knows and expects to start at, or close to, 6 figures.
That seems like one helluva starting pay.

Who are the clients?
 

Doer

Well-Known Member
Well, my good buddy, with a math brain, all of us are.

Do you remember how we used to say the internet is the library with all the books on the floor? In any case, a datum is nothing, not even a line, not even an object. But it is everything, in that information can be derived from data. So think of layers of abstraction, because that is all computers are. We have now abstracted the on and off, the 1 or 0 state, to the point where we can store a lot of information that we painstakingly gathered and published to ourselves inside our organizations. That includes research, marketing position statements, and a giant amount of everything else we can think of and type up, scan in, or file as photos and videos with tags, etc.

Long ago, in internet time, a couple of guys invented the electronic spreadsheet (VisiCalc, later eclipsed by Lotus 1-2-3) to keep track of financial what-ifs. It wasn't patented, and they never got the money or credit you might expect for it. Back then it was just a cool trick for visualizing 2D array arithmetic, etc. Another new abstraction slips in and becomes all-fired essential.

I know the guys who grabbed a disk routine from IBM and a box that could only play static on a radio, and the PC was born. Abstraction.

I know the guys who saw the potential of relational database techniques in a paper from Berkeley Labs. Large-scale information handling was born, not just cells and math. That was abstraction into a new layer of need, by taking risk. That itself is an abstraction of how science proceeds: industry takes the reins.

Now we have all this information in the vast data stores you've heard about, collected by the government from the Gray Zone. But that is tiny compared to the private corporate information in this world.

So information on that scale (images, recordings of all kinds, telemetry, text, whatever) has now been abstracted, for these large global entities, into mere data again. Too Much!!! They have plenty of compute to store it and go find it, if they only knew what IT is. See? :)

So the abstraction continues. The data set is now made up of information sets, each representing actual man-years of expense. They sit on top of the many abstraction layers below them that we call the computer system, which is an evolutionary Wild Thing. The information must now first be treated as atomic data just to find it. Then it has to be peeled open so we can dive in for the data parts. That information is extremely wide, deep, and important.

And that is to say nothing of the fragility and time-bound nature of all data. Persistence of information now requires that we mine, that is, break open, individual information structures, like a photograph or a raw telemetry plot. We then use those data outside the information set they are in, and that creates a new information set. It is kinda like looking for gold and then realizing all that dirt has diamonds, but we can't figure out how to find them.
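Here is a minimal Python sketch of "breaking open" information structures. It assumes each archived report is a JSON blob with an embedded raw telemetry series. The report IDs, field names, and threshold are hypothetical. The extracted points are pooled into a new information set that no single report contained.

```python
import json

# Hypothetical archived reports: each blob is a self-contained information set
# (metadata plus an embedded raw telemetry series of [time, value] pairs).
ARCHIVE = [
    json.dumps({"report": "R-101", "unit": "pump_a",
                "telemetry": [[0, 71.2], [60, 74.8], [120, 90.3]]}),
    json.dumps({"report": "R-102", "unit": "pump_b",
                "telemetry": [[0, 65.0], [60, 66.1]]}),
]

def crack_open(blobs):
    """Break each blob open and yield its individual data points,
    tagged with the report and unit they came from."""
    for blob in blobs:
        record = json.loads(blob)
        for t, value in record["telemetry"]:
            yield {"report": record["report"], "unit": record["unit"],
                   "t": t, "value": value}

# A new information set built from the pieces: every reading above a
# (hypothetical) threshold, across all reports at once.
hot = [p for p in crack_open(ARCHIVE) if p["value"] > 74.0]
for p in hot:
    print(p["report"], p["unit"], p["t"], p["value"])
```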

So Data Mining refers to new techniques that first code and classify the "books" on the floor inside a company or agency, and then get them onto "shelves": databases. That is where we are today, with large-scale relational databases.
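Here is a minimal sketch of getting "books" off the floor and onto "shelves", using Python's built-in `sqlite3` module. The keyword rules, categories, and documents are hypothetical stand-ins. Real classification would be far more sophisticated than keyword matching.

```python
import sqlite3

# Hypothetical keyword rules: category -> words that suggest it.
RULES = {
    "finance": {"invoice", "budget", "forecast"},
    "engineering": {"drawing", "telemetry", "wiring"},
}

def classify(text: str) -> str:
    """Code a document with the first category whose keywords it contains."""
    words = set(text.lower().split())
    for label, keywords in RULES.items():
        if words & keywords:
            return label
    return "unclassified"

def shelve(docs: dict[str, str]) -> sqlite3.Connection:
    """Load classified documents into a relational table (the 'shelf')."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE shelf (doc_id TEXT PRIMARY KEY, category TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO shelf VALUES (?, ?, ?)",
        [(doc_id, classify(body), body) for doc_id, body in docs.items()],
    )
    conn.commit()
    return conn

floor = {
    "doc1": "Q3 budget forecast for plant B",
    "doc2": "Wiring drawing rev C for the power bus",
    "doc3": "Minutes of the holiday party",
}
conn = shelve(floor)
rows = conn.execute("SELECT doc_id, category FROM shelf ORDER BY doc_id").fetchall()
print(rows)  # [('doc1', 'finance'), ('doc2', 'engineering'), ('doc3', 'unclassified')]
```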

Data Mining then brings in the abstraction layer of info-bot retrieval: software techniques that go to the "book", open it up, find the right place, scan the photo on virtual page 220, and see if any company officers were present. That is just a tiny, tiny legal aspect.
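The retrieval step can be sketched in Python under one big simplifying assumption: the photo has already been tagged with the names of the people in it. Recognizing faces is out of scope here. The `Book`/`Page` classes, the officer names, and the tags are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    number: int
    text: str = ""
    photo_tags: list[str] = field(default_factory=list)  # names tagged in the page's photo

@dataclass
class Book:
    title: str
    pages: list[Page]

OFFICERS = {"J. Smith", "A. Lee"}  # hypothetical list of company officers

def officers_in_photo(book: Book, page_number: int, officers: set[str]) -> set[str]:
    """Open the 'book', go to the requested page, and report which officers
    appear among the names tagged in that page's photo."""
    for page in book.pages:
        if page.number == page_number:
            return officers & set(page.photo_tags)
    raise ValueError(f"{book.title!r} has no page {page_number}")

annual = Book("1998 annual review", [
    Page(219, text="Plant opening ceremony"),
    Page(220, photo_tags=["A. Lee", "R. Gomez"]),
])
print(officers_in_photo(annual, 220, OFFICERS))  # {'A. Lee'}
```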

But, I hope you get what this is, a little bit, now. :)
 

heckler73

Well-Known Member
Indeed, I believe I understand a little more of why and who DM serves.
With efficiency in data collection/storage has come a side-effect of a raw data clusterfuck, perhaps due to poor foresight (or apathy for data organization).
And now DM'rs are there to "rescue" the data from the bases.

It makes sense...I suppose I just never thought about it before.
 

Doer

Well-Known Member
No, it is not that. In Computer Tech, we don't know what we don't know, yet. It hasn't been invented.

So, not apathy, just the best you can do, with the tools at hand.

This is not easy. Do you realize the size of the engineering details for, say, the Airbus A350? They cannot be moved around electronically. They have to be written to tape, and the tape mailed... in the 21st century.

That process takes days. So see mining more as a virtualized search: something new, for limitations you didn't know about.
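Why mail tape at all? A rough Python estimate of network transfer time helps. The post doesn't give the size of the A350 data set, so the 1 PB below is a hypothetical figure. The link speeds are assumptions too.

```python
def transfer_days(size_bytes: float, link_bits_per_s: float, efficiency: float = 1.0) -> float:
    """Days needed to push `size_bytes` over a link of the given speed.
    `efficiency` (0 < e <= 1) discounts protocol overhead and contention."""
    seconds = size_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / 86_400  # seconds per day

PB = 10**15    # hypothetical data set size: 1 petabyte
GBIT = 10**9   # 1 gigabit per second

print(round(transfer_days(1 * PB, 1 * GBIT), 1))   # 92.6
print(round(transfer_days(1 * PB, 10 * GBIT), 1))  # 9.3
```

Even at a perfect 10 Gbit/s, that is over a week. A courier carrying tape takes about the same time no matter how much is on it.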

Instead of getting all those old drawings transferred so the NTSB can go through them, mining allows you to programmatically compare the "power bus" details of the A300 to the A320, and then to the A350, in a matter of hours.

And no one had to put their reading glasses on and burn the midnight oil. No hidden relationships can be missed.

The entire change structure of one sub-part can easily be traced, compared, and contrasted in one simple report. But to do that, three sets of information, those three airplane plans, are cracked open in the database and scanned. The relevant parts are lifted out and sorted. No data was harmed, no tapes were mailed, and no months of effort were required.
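Here is a much-simplified Python sketch of that one-report comparison. It assumes each plan's "power bus" has already been extracted as a map from part number to revision. The part numbers and revisions are invented for illustration and do not reflect real Airbus data.

```python
# Hypothetical "power bus" parts lists: part number -> revision.
PLANS = {
    "A300": {"PB-100": "A", "PB-200": "B", "PB-310": "A"},
    "A320": {"PB-100": "C", "PB-200": "B", "PB-400": "A"},
    "A350": {"PB-100": "C", "PB-400": "B", "PB-500": "A"},
}

def diff(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Parts added, removed, or revised between two versions of a sub-part."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "revised": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }

def change_report(plans: dict[str, dict[str, str]]) -> list[str]:
    """Trace the sub-part through each successive plan, in insertion order."""
    names = list(plans)
    lines = []
    for before, after in zip(names, names[1:]):
        d = diff(plans[before], plans[after])
        lines.append(f"{before} -> {after}: added {d['added']}, "
                     f"removed {d['removed']}, revised {d['revised']}")
    return lines

for line in change_report(PLANS):
    print(line)
# A300 -> A320: added ['PB-400'], removed ['PB-310'], revised ['PB-100']
# A320 -> A350: added ['PB-500'], removed ['PB-200'], revised ['PB-400']
```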

So, computers don't make mistakes, people do. And we have more compute horsepower than can be put to use.

So, the other side of computer tech is that when we stored the data, we did not know what we would need from it later. Hindsight is Data Mining. We may need things now that are already in the data store, but we had no idea we needed them back then.

And now we can easily make year-over-year comparisons of highly complex blobs of information in all formats.
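Once mined blobs have been reduced to dated records, a year-over-year comparison becomes simple grouping and differencing. Here is a minimal Python sketch with hypothetical `(year, value)` records. It skips any year whose previous total is zero to avoid dividing by zero.

```python
from collections import defaultdict

def yearly_totals(records):
    """Sum (year, value) records into one total per year, in year order."""
    totals = defaultdict(float)
    for year, value in records:
        totals[year] += value
    return dict(sorted(totals.items()))

def year_over_year(totals):
    """Percent change of each year's total versus the previous year's."""
    years = list(totals)
    return {
        cur: (totals[cur] - totals[prev]) * 100 / totals[prev]
        for prev, cur in zip(years, years[1:])
        if totals[prev] != 0
    }

# Hypothetical records already extracted from mined blobs.
records = [(2011, 40.0), (2011, 60.0), (2012, 120.0), (2013, 90.0)]
totals = yearly_totals(records)
print(totals)                  # {2011: 100.0, 2012: 120.0, 2013: 90.0}
print(year_over_year(totals))  # {2012: 20.0, 2013: -25.0}
```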

Well, before we could not.
 