Posts Tagged ‘Big Data’

The Republicans scramble to learn data science to combat Democrats’ sophisticated data-analytics platform


Data science word scramble



12 February 2014 – President Obama’s 2012 re-election campaign relied on a sophisticated data-analytics platform that allowed organizers and volunteers to precisely target potential donors and voters. The centerpiece of that effort was Project Narwhal, which brought voter information—steadily accumulated since Obama’s 2008 campaign—onto a single platform accessible to a growing number of campaign-related apps.

We covered this in detail in our post “How Big Data, cloud computing, Amazon and poll quants won the U.S. election” (click here).

The GOP has only a few short years to prepare for the next Presidential election cycle, and the party is scrambling to build an analytics system capable of competing against whatever the Democrats deploy onto the field of battle. To that end, the Republican National Committee (RNC) has launched Para Bellum Labs, modeled after a startup, to produce digital platforms for election analytics and voter engagement.

Is this a genuine attempt to infuse the GOP’s infrastructure with data science, or merely an attempt to show that the organization hasn’t fallen behind the Democratic Party when it … Read more

Hadoop: what it is and how it works




24 May 2013 – You can’t have a conversation about Big Data for very long without running into the elephant in the room: Hadoop. This open source software platform managed by the Apache Software Foundation has proven to be very helpful in storing and managing vast amounts of data cheaply and efficiently.

But what exactly is Hadoop, and what makes it so special? Basically, it’s a way of storing enormous data sets across distributed clusters of servers and then running “distributed” analysis applications in each cluster.

It’s designed to be robust, in that your Big Data applications will continue to run even when individual servers — or clusters — fail. And it’s also designed to be efficient, because it doesn’t require your applications to shuttle huge volumes of data across your network.
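Those two design goals can be made concrete with a minimal sketch in plain Python. This is a toy model, not Hadoop's actual API: the block contents, the `REPLICAS` placement table, the node names and the `run_word_count` function are all hypothetical. It assumes each block of data is copied to two servers, runs a word-count "map" step per block on a server that still holds a copy, and then merges only the small per-block tallies in a "reduce" step.

```python
from collections import Counter

# Hypothetical toy data: three blocks of one large file.
BLOCKS = {
    "block-a": "big data big hadoop",
    "block-b": "data hadoop data",
    "block-c": "big hadoop",
}

# Hypothetical placement: every block is copied to two of three servers.
REPLICAS = {
    "block-a": ["node-1", "node-2"],
    "block-b": ["node-2", "node-3"],
    "block-c": ["node-3", "node-1"],
}


def map_block(text):
    """Map step: count the words in a single block."""
    return Counter(text.split())


def run_word_count(failed_nodes=frozenset()):
    """Count words across all blocks, avoiding servers that have failed.

    Returns the merged counts and a record of which node handled each block.
    """
    partials = []
    assignments = {}
    for block_id, nodes in REPLICAS.items():
        live = [node for node in nodes if node not in failed_nodes]
        if not live:
            raise RuntimeError(f"all replicas of {block_id} are unavailable")
        # In a real cluster the map step would run on the chosen node, next to
        # the data. This toy version only records the choice and runs locally.
        assignments[block_id] = live[0]
        partials.append(map_block(BLOCKS[block_id]))

    # Reduce step: only the small per-block tallies are merged, not raw data.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total, assignments


healthy_counts, healthy_plan = run_word_count()
degraded_counts, degraded_plan = run_word_count(failed_nodes={"node-2"})
```

With `node-2` marked as failed, `block-b` is handled by `node-3` instead, and the word counts come out the same as in the healthy run. That is the sense in which the job keeps running when an individual server fails.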

Here’s how Apache formally describes it:

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each

Read more

Nate Silver, the man who whispered in big data’s ear

Nate Silver


24 May 2013 – A portrait of Nate Silver, “the patron saint of nerds,” as Le Monde dubs him.

“There he is, in a gray suit and rectangular glasses, looking like a bird fallen from the nest, facing the rapt faces of his followers, as sure of his algorithms as of his aura. Months before the American presidential election of 6 November 2012, he predicted a comfortable victory for Barack Obama, thereby proving wrong the analysts and other experts who forecast a close vote.”

He is currently promoting his book, “The Signal and the Noise” (published in the United States in September 2012). On the nonfiction best-seller lists of The New York Times and Amazon upon its release, it has now been translated around the world. Except in France.

“It is not surprising that France has not translated his work,” says Stéphane Rozès, political adviser and president of Cap. “While France is a great consumer of polls, the very idea that a statistician could announce in advance the result of

Read more

Anonymity? HA! It’s becoming algorithmically impossible



10 May 2013 – Has Big Data made anonymity impossible?  As the amount of data expands exponentially, nearly all of it carries someone’s digital fingerprints. According to International Data Corporation (IDC), the amount of data created in 2012 reached a whopping 2.8 zettabytes — that’s 2.8 trillion gigabytes — and that number is predicted to double by 2015. Most of it is made by individuals as they go through their daily interactions, and consequently, as tracking and storing of that data improves, analysts are able to learn even more about those people. All of this is leading to a day when, according to computer scientist Arvind Narayanan, it will be “algorithmically impossible” to be truly anonymous.
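The IDC figure can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units, where a zettabyte is 10^21 bytes and a gigabyte is 10^9 bytes; the variable names are illustrative only.

```python
# Decimal (SI) unit sizes in bytes. This is an assumption about how the
# figure is expressed; binary units would give slightly different numbers.
ZETTABYTE = 10**21
GIGABYTE = 10**9
TRILLION = 10**12

# 2.8 zettabytes, written as an integer number of bytes to avoid float error.
created_2012_bytes = 28 * 10**20

created_2012_gigabytes = created_2012_bytes // GIGABYTE
created_2012_trillion_gb = created_2012_gigabytes / TRILLION

# "Predicted to double by 2015" would mean twice the 2012 volume.
predicted_2015_bytes = 2 * created_2012_bytes
predicted_2015_zettabytes = predicted_2015_bytes / ZETTABYTE

print(f"2012: {created_2012_trillion_gb} trillion GB")
print(f"2015 (if doubled): {predicted_2015_zettabytes} ZB")
```

Under that assumption, one zettabyte is exactly one trillion gigabytes, so the two figures in the paragraph agree.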

For more, here is a great article from MIT Technology Review (click here).

Related: “Laws as algorithms: converting simple laws to machine-readable code” 


 … Read more

The importance of making Big Data accessible to non-data scientists



4 May 2013 – Gartner analyst Doug Laney first coined the term “big data” (at least in its current form) over 12 years ago, although one suspects people have been complaining about “information overload” since Roman times. But the term’s meaning is still far from clear, and it wins continuous nominations in the “Tech Buzzword That Everyone Uses But Doesn’t Quite Understand” competitions, followed closely by “the cloud”.

When using the term, Gartner usually keeps the quote marks in place (i.e. it’s “big data”, not big data). And as we learned at the Gartner Business Intelligence and Analytics Summit in Barcelona two months ago, Gartner has spent a tremendous amount of time on it. As Gartner analyst Donald Feinberg warned people at the conference, “talking only about big data can lead to self-delusion,” and he urged people not to “surrender to the hype-ocracy.”

NOTE: next month we’ll have a chance to talk about “big data” more with Gartner analyst Debra Logan along with Jason R. Baron when our video crew travels to Rome to interview Read more

If Hadoop was easy … well, then everyone would be doing it

Big Data Hadoop word conglomeration


3 April 2013 – One of our “must reads” of late has been IBM Data Magazine (available in print and digital) because of its clear, in-depth technical advice and hands-on examples on the latest topics in data management, IBM databases, and Big Data. Its coterie of columnists includes Tom Deutsch on Big Data, Paula Wiles Sigmon on Governance, and James Kobielus, who writes our favorite column, “Rocketship to Planet Petabyte”. From the current issue:

The advantages of enhancing an analytic ecosystem with Apache Hadoop capabilities to leverage unstructured data have never been clearer. Just a few years ago, this technology was not available to enterprises—all we had was good-old structured data in the data warehouse. Today, though, the picture is very different. Hadoop is presenting tremendous opportunities to evolve from traditional data warehousing to big data platforms where organizations can process all kinds of data.

At IBM, we’ve spent quite a bit of time looking at how customers are leveraging big data technologies. Five key use cases have emerged:

  1. Enriching an organization’s information base with big data exploration
  2. Improving
Read more

The power of data visualization’s “Aha!” moments: an interview with Amanda Cox of The New York Times



19 March 2013 – Amanda Cox has been a graphics editor at the New York Times for eight years. Trained as a statistician, Cox develops visualizations across platforms, from simple print infographics to highly complex online interactive data tools. The Times is a visualization leader, but Cox believes the best is yet to come from this discipline, which she calls “both young and not young.” In an interview with the Harvard Business Review Blog Network, Cox spoke about the Times’ approach to visualization and the power of “Aha!” moments:

Do you think data visualization is entering a time when it’s becoming a core communication tool?

I wish there were more examples in the high-end data viz world to back that up. I wish there were more examples where data viz actually mattered. The case studies for us to lean on are sparser than they should be. On the other hand, you can argue it’s a young field and people are doing all kinds of crazy interesting things, and that’s a good thing. There’s that classic idea that it’s useful … Read more

Cloud Infographic: Companies Fighting For Data

Size of big data

7 March 2013 – NEWS ALERT!! The amount of data in our world increases massively day-by-day. Wow. Who knew? Hardly ANYBODY writes about this. Ok, a few. According to a McKinsey report (which we’ll admit is a bit dated now, having been written waaaaay back in 2011) U.S. companies from almost all industry sectors have, on average, hundreds of terabytes of data stored per company. The amount of data is growing as companies gather more and more information with each transaction and interaction with their customers. We may run out of those 86 GB flash drives at this rate.

Which is just a somewhat (we hope) comical intro to a cool graphic the folks at EVault came up with for our pals at CloudTweaks:

click here

[ NOTE: after you open the image just click on it to make it larger ]

 … Read more

For Debra Logan of Gartner, Big Data is “a solution looking for a problem”

Big Data confusion


22 February 2013 – Ok, it is an open and shut case. Big Data is all the rage. But now, if only someone had a clue what to do with it. The trouble is most businesses may be failing to ask the right questions, according to analyst extraordinaire Debra Logan of Gartner.

NOTE: I first met Debra at ZyLAB Universe 2011 and we have stayed in touch since. She is one of the organizers behind the DESI workshop on predictive coding and other machine learning algorithms, to be held in Rome, Italy in June. If the stars align, I will be conducting a video interview with my crew from Project Counsel Media, and we’ll be able to chat about a number of topics.

Many are looking at big data (large datasets from multiple sources) and trying to figure out what it is, according to Debra. As she said in a recent London roundtable debate, “what is it that a massive set of data tells you about a particular problem that a more reasonable set of … Read more

Big Data: Stop Focusing On Size

It’s not the size of your data, it’s what you do with it, says IBM analytics executive

14 January 2013 – Rich Rodts, who manages IBM’s analytics academic programs, often finds himself discussing big data with family, friends, clients and business partners, including representatives from top universities across the U.S.  “There really is no wrong definition of what big data is,” Rodts told InformationWeek in a phone interview. “I like to explain big data as taking a vast amount of information and being able to distill it in a way that can be consumed and acted upon.”

A common definition that’s often overused is one that focuses solely on the vast quantities of data being created, said Rodts, who offered an alternative view.  Big data, he said, “paints a picture” of a human being, including the often mundane tasks a person completes through the day: using an ATM, paying bills or buying movie tickets online, taking public transportation, and so on. “Each one of those things creates a unique data point,” said Rodts. “One that points back to me as … Read more