We briefly introduce big data, the cautionary tales surrounding this recent phenomenon, and the need to be aware of those ponies…
April 26, 2017
Course: INFO 198 / BIOL 106B. University of Washington
Instructors: Carl T. Bergstrom and Jevin West
Synopsis: Our world is saturated with bullshit. Learn to detect and defuse it.
The course will be offered as a 1-credit seminar this spring through the Information School at the University of Washington. We aim to expand it to a 3 or 4 credit course for 2017-2018. For those who cannot attend in person, we aim to videotape the lectures this spring and make video clips freely available on the web.
Video edited by Bum Mook Oh
Music by Chris Zabriskie: Prelude No.7
JEVIN WEST: All right. Good afternoon, everyone.
Let me lower this down just a little bit.
We did a little switch in the order of what we're going to talk about.
So today, we're going to talk about the big data aspect of the class, which is one of our core lectures.
It's, you know, sort of the genesis of this project.
You can see it in the title, Calling Bullshit in the Age of Big Data. So we're going to talk about this subtitle, "in the age of big data," and what it means for the kinds of things that we've been learning throughout the quarter.
Next week, we'll focus on the visualizations.
Carl will primarily be doing that because I'll be out of town.
This is you.
This is an actual image of the class.
I've sort of grayed things out so you can't tell who anyone is, but I imagine that everyone here, of course, has a cell phone.
Most of you.
There's someone who doesn't have a cell phone? I'm proud of you.
It's pretty impressive to live in today's world without a cell phone.
Many of you maybe got here using an ORCA card, on a bus, or on a subway.
Many of you, of course, are using social media on a daily basis.
I certainly don't go even an hour of my life, except when I'm sleeping, without using one of the major search engines.
Some of you went to the grocery store today, and you had to use your credit card.
And when you got home, you put your groceries in your refrigerator, which, like many appliances nowadays, is connected.
The Nest thermostat is connected to the internet, which tells you something about people's habits from day to day.
If you're lucky, maybe some of you have been looking to buy a Tesla.
Certainly not me.
But you had to use your computer to look for that.
You watched a movie last night on Netflix.
One of you had to go to the doctor for a regular check-up. Some of you may be submitting your data to 23andMe, one of these organizations that collects your genetic data and tells you something about your health.
Maybe some of you are planning a trip after the quarter, because some of you, well, a lot of you, I think, are graduating at the end of this quarter.
All of these things are places where we're collecting massive amounts of data.
Your phone tells us your geolocation.
Your ORCA card does a similar thing.
Social media can tell me, as a data scientist, what you're thinking about, what you care about, what you like, and what you dislike.
Of course, from what you search for on these search engines, I can tell a lot about what's going on.
Grocery stores like Safeway collect data about what you buy, where you buy it, where you pick it up, and what items you pick out.
There are a lot of smart data scientists at the grocery store, and in the grocery industry.
The cars are now essentially being driven by machines, or soon will be, any day.
All of this information is interconnected, and we're sharing it to a high degree.
So this is really big data.
And this is the world in which we live now.
Depending on who you talk to, everyone has a different definition of big data.
In my class, we try to come to some sort of consensus.
We won't do that today.
What I will say is that a lot of this is simply the result of cheaper storage, faster communication, and the willingness of the world to put their personal information online and turn it over to these big companies.
People talk about the volume, velocity, and variety of this data, and about storage.
But really, all you need to know if you've never heard the term big data (which I'm sure most of you have) is that the real secondary market, the opportunity in this area from a data science perspective, from an industry perspective, from an education perspective, is all of this data being used for things it wasn't primarily built for.
So we'll talk a little bit about Google Flu.
This was a project where, just by using your search queries at Google, you could build predictive models of where the flu was breaking out in the United States.
A lot of people in epidemiology use this kind of information, based on airline networks and where people are moving, to parameterize their models.
It's all this data being collected for a primary purpose that is then used in lots of different ways. Now, there's a lot more to the big data definition.
But this is the aspect of it that I want you to focus on today.
Now, there are data brokers out there.
It's a multibillion-dollar business where companies share and sell their data, and for some companies, the data they share and sell to other companies is worth more than the subscriptions or the primary product it was originally built for.
Here's just one of many examples: Gillette was asking for data from Tinder's analytics group about the effectiveness of shaving. I mean, this is just sort of a cute example.
You know, do people swipe left or right based on facial hair?
This was a funny little experiment that hit social media.
The idea here is that there are these data brokers who share this data, and they make a lot of money: the brokers themselves, and the companies that are selling it.
There was a report several years ago that really got the industry excited.
It got the universities really excited, too. I've been part of that movement here at the University of Washington for many years now, and we were looking at reports like this one from McKinsey about what the job opportunities are for you.
The job opportunities are pretty much limitless right now. I don't think that demand has subsided at all.
So there is a big demand for this.
These are the kinds of things universities and parents like to see, too, because there are lots of jobs.
And there are lots of big problems to solve.
Here at the University of Washington, we care very much about it.
And throughout this talk, I'm going to throw a lot of criticism at big data. But I do want to give you one caveat: I do get really excited about big data and data science.
I do think there is a lot of opportunity, but there are a lot of things we should be aware of.
As students here at the University of Washington, just as a side note, I think you should take advantage of all the great things we've got going on across the campus, not just within the Information School and in biology.
Now, you see these kinds of articles in the news almost daily.
I track the social media around these topics, of course, because this is sort of the core of what I do in my research and my teaching.
And I read the newspapers, and there's hardly a day that goes by that I don't read some sort of newspaper article like this.
This one is about another great, smart MIT group with a new algorithm that claims to rub shoulders with human intuition in big data analysis.
What they are doing is really great.
And it gets me excited, and I read it, look at their techniques, and think about the possibilities around it.
But some of this starts to get a little bit hype-y.
Then you even see articles like this one, recently written in The Guardian by William Davies, who was talking about big data.
He said, "In the long term, the implications of big data will probably be as profound as the invention of statistics." That's quite a claim.
So now we're getting into territory where things are getting a little bit, maybe, too good to be true.
"The rise of big data provides far greater opportunities for quantitative analysis than any amount of polling or statistical modelling. But it's not just the quantity of data that is different. It represents an entirely different type of knowledge."
That's a big statement. And I'm not really making fun of William.
I say these things in my class, too. So I sort of get caught up in the enthusiasm of big data. But I just want to put that out, in this room, about calling B.S. when you see it.
This particular article was written by Chris Anderson.
It was published in Wired and got a lot of attention.
This article basically said the scientific method is done. We don't need the scientific method anymore. We don't need theory.
We've got big data; the numbers speak for themselves.
Look at Google. (I'm sorry, I'm speaking for him. I'm not saying this.)
He basically makes the claim that science should be looking at Google, because with just the numbers, you don't need to know anything about psychology.
The numbers speak for themselves.
What you collect is all that matters.
This is where I will come out and say, I think this is bullshit.
I do think there is a real need for the scientific method, and we're going to talk about some examples today where we really do still need these traditional forms of doing analysis, collecting data, and modelling, the kinds of things that we see in the social sciences and in biology.
What I want you to take from today's lecture: these are the five important take-homes that weave in and out of today's discussion. So, first:
The scientific method is not dead. I promise you, it's not dead.
Even as a lab that specializes in data, we very much use the scientific method.
Machines are not bias-free.
And maybe next to that one, the most important thing you can learn today without knowing anything about statistics or machine learning: garbage in, garbage out.
We'll give you some ways and some examples to apply this garbage-in, garbage-out principle.
There is major fallibility in machines. And I think, as a classroom of 160 students, I hope if there's one thing you'll do today when you walk out (or hopefully you're doing it already), it's this: don't get carried away too much with the enthusiasm for machine learning, self-driving cars, and data science. Be a little skeptical, and start to call out the fallibility of machines.
The reason for that is that there are lots of ethical problems.
CARL BERGSTROM: Jevin.
That was a super introduction to big data, although maybe a bit cheerlead-y.
You didn't really give us a good definition, though, and I think I've got a definition for you.
Can you just advance the slide? I stuck that in there.
JEVIN WEST: Sure.
CARL BERGSTROM: To me, it seems that big data is the idea that a sufficiently large pile of horseshit will, with probability one, somewhere contain a pony.
If you don't remember anything I said, at least you'll remember the pony now.
JEVIN WEST: Yeah.
So that'll get us.
Remember the pony and we'll go into the next section.
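Carl's pony line describes a real statistical trap: if you search a large enough pile of unrelated variables, some of them will appear strongly related to your target purely by chance. The Python sketch that follows is a hypothetical illustration, not something shown in the lecture. It assumes every variable is independent Gaussian noise, and the function names (`pearson`, `find_pony`) and sizes (`n_obs`, `n_features`) are made up for the demonstration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def find_pony(n_obs=20, n_features=1000, seed=0):
    """Search n_features columns of pure noise for the one that best
    'predicts' a pure-noise target; return (index, r) of the best match."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best_index, best_r = -1, 0.0
    for i in range(n_features):
        # Every feature is independent noise: no real relationship exists.
        feature = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson(target, feature)
        if abs(r) > abs(best_r):
            best_index, best_r = i, r
    return best_index, best_r


if __name__ == "__main__":
    # The bigger the pile, the more convincing the best "pony" looks.
    for n in (10, 100, 1000, 10000):
        idx, r = find_pony(n_features=n)
        print(f"{n:>6} noise features -> best |r| = {abs(r):.2f} (feature #{idx})")
```

Nothing in this data is real, yet as the number of features grows, the best match looks increasingly impressive; exact values depend on the seed. Before believing such a pony, one would at least want to check it on held-out data or correct for the number of comparisons made.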
A University of Washington seminar, “Calling BS in the Age of Big Data,” promises to help students develop a BS detector, and it's become a global phenomenon, with universities as far away as Australia planning to teach a version of it this fall.