Data Science "operationalization" is a nightmare
I've been working as the data guy in a data science team for about 6 months now. My job lately has been mostly about putting together databases and server-side code in R to put PhD modellers' code into production.
Before this job, I was a big fan of the idea that with Microsoft ML Server and RStudio Connect, you could simply take the code from these data scientists, polish it up a little, and plug it into a production workflow as opposed to re-writing the whole thing in C++, Java, or C#. In fact, the vendors of these platforms sell on that exact idea. <tbc>
October 9th, 2018 11:23am
But here's the reality: the data scientists are pretty familiar with R from grad school, but they are for the most part abysmal programmers. I'll often get some monolithic files of code with copied-and-pasted blocks all over the place, over-reliance on global variables, the same variables re-used for strings or numbers, no comments, etc. Many times, their models are too slow or even wrong, and I have to correct them.
So the polishing up part means I have to do what I used to do before: re-write the whole thing. I usually modularize it, reduce the size of the code base, and apply generally well-known software engineering principles. Then, after I do all that, someone will send me another block of code which is similar to the last one they sent except with a few corrective changes to the model. Now I have to take apart their new changes and make sure they still fit with what I did before - which can sometimes take just as long as re-writing it in the first place.
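To make the "modularize it" part concrete, here is a rough sketch of the kind of cleanup I mean. It's in Python rather than R, and every name in it (`scaled_average`, `summarize`, the segment names, the rate) is made up for illustration, not taken from anyone's real model.

```python
from statistics import mean

# Typical hand-off style (shown as comments only): one global variable is
# overwritten in copy-pasted blocks, one block per segment.
#
#   data = load("north"); data = [x * RATE for x in data]; north_avg = mean(data)
#   data = load("south"); data = [x * RATE for x in data]; south_avg = mean(data)


def scaled_average(values, rate):
    """Scale each value by rate and return the mean, using no globals."""
    if not values:
        raise ValueError("values must not be empty")
    return mean(v * rate for v in values)


def summarize(segments, rate):
    """Apply the same calculation to every named segment."""
    return {name: scaled_average(vals, rate) for name, vals in segments.items()}


# Hypothetical input data, standing in for whatever the model really loads.
segments = {"north": [10, 20, 30], "south": [5, 15]}
result = summarize(segments, rate=2)
print(result)
```

The point is only that the repeated block becomes one function with explicit inputs, so a later "corrective change" touches one place instead of every pasted copy.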
Now I'm starting to lean towards Python a bit more. Not because I think it does data science work better than R (it doesn't). But at least data science guys won't think they're programmers also. As easy as Python is, it does require a bit more understanding of software development as a discipline than R, IMO.
Don't get me wrong, the data science guys are smart and everything, but telling them that we can put their code in production almost "as is" makes them think they're programmers when they really are not.
October 9th, 2018 11:24am
Would you say that data science is a good path for someone with a programming background? I have been thinking about getting into that area or business analytics.
October 9th, 2018 12:01pm
You want those guys to value you, and see you as a valuable addition to the team. So when they send you their steaming pile of poo, smile and say thank you.
Also, keep copies of their original R source so you can do a diff with their updates.
Bonus if you send them common modules they can reuse to reduce the poo in the future.
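A minimal sketch of the diff idea, assuming you have the old and new versions of a script as text. It uses Python's standard `difflib`; the file names and the two tiny R snippets are invented for the example, and in practice a source control system gives you the same thing for free.

```python
import difflib


def diff_versions(old_source, new_source,
                  old_name="model_v1.R", new_name="model_v2.R"):
    """Return a unified diff between two versions of a script as one string.

    The default file names are placeholders, not real files.
    """
    diff = difflib.unified_diff(
        old_source.splitlines(keepends=True),
        new_source.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff)


# Hypothetical "before" and "after" versions sent by a modeller.
old = "x <- read.csv('in.csv')\nfit <- lm(y ~ a, data = x)\n"
new = "x <- read.csv('in.csv')\nfit <- lm(y ~ a + b, data = x)\n"

changes = diff_versions(old, new)
print(changes)
```

Lines starting with `-` are what they had before and lines starting with `+` are what changed, which is usually all you need to see before touching your cleaned-up version.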
Data Science is still a bullshit field. You shape data into your narrative. There is only so much you can say about data. Sure, you can see a trend this way or that way but that's about it. It isn't like building spaceships and rockets to Mars. You either have it or you don't!
October 9th, 2018 1:07pm
HP thought big data was the bee's knees...
...until they bought Autonomy for $11.7 billion.
Within a year, HP had written off $8.8 billion of Autonomy's value.
They have since sold the company to Micro Focus.
October 9th, 2018 1:13pm
This is actually very true. I had been a Data Warehouse/BI specialist for years so I see this from that angle as well. I'm finding that well over half the work being done by these data science teams is in fact BI. They often simply add, average, and graph trends in data in a way that we've been doing for decades already.
But because many data science teams are not part of IT, and because they're almost all PhD's, they get away with producing reports with no QA controls, no process engineering, and all with a direct line to the C-Suite.
It's really the fault of IT departments, IMO, for having allowed themselves to become cost centers without promoting their capabilities to improve market share. They raced to the bottom in wages by hiring cheap labor and resorting to offshoring which drastically dropped quality as well as their own standing in the company.
Data scientists have done a great job of positioning themselves as wizards ready to unlock secrets in their data. For that they're seen as sexy and they get a sexy budget to boot.
October 9th, 2018 1:29pm
> It's really the fault of IT departments, IMO, for having allowed themselves to become cost centers without promoting their capabilities
as profit centres.
October 9th, 2018 6:23pm
>Would you say that data science is a good path for someone with a programming background? I have been thinking about getting into that area or business analytics.
I had a completely clueless coworker at my last job. He works as a "Data Scientist" now.
The main problem with Data Science is that there are too many fools chasing it. That makes it harder for someone competent to stand out. (counterintuitive, but true)
October 9th, 2018 6:55pm
> Too true.
Good point HD
Also agree with FSK. All of tech has that problem
October 9th, 2018 10:43pm
Well the current barrier to entry is the PhD in mathematics or statistics. Even if you're good enough with statistical models to perform better than some that do have a higher degree, your boss will never let you build one.
I came in from programming and I'm really the data guy for the team, but not a real data scientist. This is a bit different from the programming field, where there is no objective qualification to keep out the burger flippers that went to a two-week coding boot camp.
October 10th, 2018 1:56am
Does it have to be specifically PhD though? I know a bloke who was doing a Bachelors majoring in data science.
October 10th, 2018 3:37am
Not sure about the bachelors. But I bet that might fly in a pinch. Point is, it seems for the moment to require some academic credentials that aren't required for programmers.
October 10th, 2018 5:05am
They're requiring a PhD for jobs that don't really require one (Data Scientist).
That proves there isn't really a shortage of STEM PhDs. If there really was a shortage, a PhD wouldn't be a hard requirement.
What happens is that the boss has a PhD, so he makes sure everyone else has a PhD (otherwise why did he waste 5+ years on his degree?).
October 10th, 2018 8:09pm
I agree, FSK. Not only that, but I find that the PhD's aren't a guarantee of being that smart. I believe they're also using this to impress the client. Non-PhD's probably have this idea they're dealing with genius types in lab coats. I was actually a bit intimidated at first, but now, not so much.
The PhD program costs some of these people a lot of time out of the workforce. So while they were researching their thesis, others were implementing things in the real world.
But if a PhD from Harvard or MIT fucks up a statistical model that costs the company millions, well hey, the boss can't be blamed. I mean c'mon, Harvard. But if a self-educated yokel like me fucks it up, the boss gets crucified.
October 11th, 2018 2:58am
For Physics, Chemistry, or Biology you want a PhD, since an undergraduate degree in those doesn't qualify you for high-level work.
Computer Science is different.
Every CS PhD I've met is an absolute dipshit who can't program. Their backstories are that they couldn't get a job with their BS or BA, so they went back to school. Only the total idiots couldn't get high paying jobs with their BS in CS.
October 11th, 2018 5:01am
> So while they were researching their thesis, others were implementing things in the real world.
All advancements in CS now take place in industry.
The CS professors they are studying with are 20-40 yrs behind the state of the art.
PhD students spend 8 yrs writing a thesis about 1 idea.
A good programmer at a company making new things will have implemented 50 novel PhD thesis quality ideas in that same time period.
October 11th, 2018 5:04am
> Every CS PhD I've met is an absolute dipshit who can't program.
Aren't MS and Google employing large batches of CS PhD guys? Do all of them match your description?
October 11th, 2018 5:06am
I'm not sure I'd put them all down as dipshits. Most of the people on my team are very smart. It's just that they have no experience.
The biggest problem I see with this is that they think that because they have a PhD from MIT or Stanford, they must be smart and if others don't, they must be stupid. Therefore, they think they know how to code much better than they do, they think they can get by without processes, and that requirements are for stupid people.
I've been at meetings where they roll their eyes and silently give each other knowing smiles as I explain why we need to use a source control system.
As for Google and MS, I'm partially convinced that their budget is correlated with the number of PhD's from big name schools they can hire. And they can promote the idea to the public and their stockholders that they're working on something really big because they're paying one or two guys a 700K salary.
Smoke and mirrors, IMO.
October 11th, 2018 6:09am