Archive for the ‘Big Data’ Category

This company has built a profile on every American adult. Or so it says.

A file on every American

6 August 2016 – Every move you make. Every click you take. Every game you play. Every place you stay. They’ll be watching you. IDI, a year-old company in the so-called data-fusion business, is the first to centralize and weaponize all that information for its customers. The Boca Raton, Fla., company’s database service, idiCORE, combines public records with purchasing, demographic, and behavioral data.

Chief Executive Officer Derek Dubner says the system isn’t waiting for requests from clients — it’s already built a profile on every American adult, including young people who wouldn’t be swept up in conventional databases, which only index transactions. ‘We have data on that 21-year-old who’s living at home with mom and dad,’ he says.

 Read more

The Republicans scramble to learn data science to combat Democrats’ sophisticated data-analytics platform





12 February 2014 – President Obama’s 2012 re-election campaign relied on a sophisticated data-analytics platform that allowed organizers and volunteers to precisely target potential donors and voters. The centerpiece of that effort was Project Narwhal, which brought voter information—steadily accumulated since Obama’s 2008 campaign—onto a single platform accessible to a growing number of campaign-related apps.

We covered this in detail in our post “How Big Data, cloud computing, Amazon and poll quants won the U.S. election”.

The GOP has only a few short years to prepare for the next Presidential election cycle, and the party is scrambling to build an analytics system capable of competing against whatever the Democrats deploy onto the field of battle. To that end, the Republican National Committee (RNC) has launched Para Bellum Labs, modeled after a startup, to produce digital platforms for election analytics and voter engagement.

Is this a genuine attempt to infuse the GOP’s infrastructure with data science, or merely an attempt to show that the organization hasn’t fallen behind the Democratic Party when it … Read more

Researchers connect 91% of phone numbers with names in metadata probe



24 December 2013 – One of the key tenets of the argument that the National Security Agency and some lawmakers have constructed to justify the agency’s collection of phone metadata is that the information it’s collecting, such as phone numbers and length of call, can’t be tied to the callers’ names.


A quick investigation by researchers at Stanford University, who have been collecting information from volunteer Android users, found that they could match numbers to names with very little effort. The researchers recently started a program called Metaphone that gathers data from volunteers with Android phones, including recent phone calls, text messages, and social network information.

The goal of the project, which is the work of the Stanford Security Lab, is to draw some lines connecting metadata and surveillance. As part of the project, the researchers decided to select a random set of 5,000 numbers from their data and see whether they could connect any of them to subscriber names using just freely available Web tools.

The result: They found … Read more

How in HELL do you visualize a yottabyte?! Well, by use of a brilliant infographic.




26 November 2013 – Nowadays, data size measurements such as kilobyte and megabyte are commonplace in tech parlance, but a new infographic puts those measurements into context, and takes a look at drive technologies, too. One byte? One character. 10 bytes? One word. A megabyte? A short novel. 10 terabytes? The entire printed collection in the U.S. Library of Congress. Etc., etc., etc.

Its source is our good friends at datascience@berkeley, part of the University of California, Berkeley. They offer a professional Master of Information and Data Science (MIDS), delivered online, with a multidisciplinary curriculum designed to prepare data science professionals to solve real-world problems using complex and unstructured data. For more information just click the link above. Two of our staffers are currently enrolled.

The infographic begins with the humble bit and works its way up in size all the way to the zettabyte and the yottabyte. A yottabyte equals 1,000 zettabytes and, according to the infographic, is the size of the entire World Wide Web.
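The unit ladder described above can be sketched in a few lines of Python. This is a minimal illustration, not taken from the infographic: it assumes decimal (SI) units, where each step is 1,000 times the previous one, and the names `UNITS`, `bytes_in`, and `convert` are made up for this example.

```python
# Decimal (SI) data-size units; each one is 1,000 times the previous one.
# Assumption: powers of 1,000, matching "a yottabyte equals 1,000 zettabytes".
UNITS = [
    "byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
    "petabyte", "exabyte", "zettabyte", "yottabyte",
]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one `unit`."""
    return 1000 ** UNITS.index(unit)


def convert(amount: int, src: str, dst: str) -> float:
    """Convert `amount` of `src` units into `dst` units."""
    return amount * bytes_in(src) / bytes_in(dst)


if __name__ == "__main__":
    # How many zettabytes fit in one yottabyte?
    print(convert(1, "yottabyte", "zettabyte"))
    # How many bytes is the 10-terabyte example?
    print(10 * bytes_in("terabyte"))
```

Operating systems often report sizes in powers of 1,024 instead, so figures shown on a real machine may differ slightly from these decimal values.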


Read more

Financial markets and “Big Data” crashes: “reckless behavior” replaces “market manipulation” as the standard for prosecuting misbehavior



29 May 2013 – Most of us have been following the regulators’ struggle to meet the challenges posed by high-frequency trading (HFT). This ultra-fast, computerized segment of finance now accounts for most trades. HFT also contributed to the infamous “flash crash” of 2010, the sudden, vertiginous fall in the Dow Jones Industrial Average. However, the HFT of today is very different from that of three years ago. This is because of … yep … our “new new” friend, “Big Data”. And financial markets are notorious producers of big data: trades, quotes, earnings statements, consumer research reports, official statistical releases, polls, news articles, etc.

Companies that have relied on the first generation of HFT, where unsophisticated “speed exploits” price discrepancies, have had a tough few years. Profits at ultra-fast trading firms were 74 per cent lower in 2012 than in 2009, according to Rosenblatt Securities, which tracks this sort of information for its institutional clients.

NOTE: In the hacking world an “exploit” is a piece of software, a chunk of data, … Read more

Hadoop: what it is and how it works




24 May 2013 – You can’t have a conversation about Big Data for very long without running into the elephant in the room: Hadoop. This open source software platform managed by the Apache Software Foundation has proven to be very helpful in storing and managing vast amounts of data cheaply and efficiently.

But what exactly is Hadoop, and what makes it so special? Basically, it’s a way of storing enormous data sets across distributed clusters of servers and then running “distributed” analysis applications in each cluster.

It’s designed to be robust, in that your Big Data applications will continue to run even when individual servers — or clusters — fail. And it’s also designed to be efficient, because it doesn’t require your applications to shuttle huge volumes of data across your network.
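The efficiency point above, running the analysis where the data already sits and moving only small results, can be illustrated with a toy word count in plain Python. This is a sketch of the general map-and-reduce idea, not Hadoop's actual API: the `BLOCKS` dictionary, the node names, and the functions below are all hypothetical, and replication and retry after a server failure are not modeled.

```python
from collections import Counter
from functools import reduce

# Hypothetical stand-in for blocks of one data set stored on three servers.
BLOCKS = {
    "node-1": ["big data big clusters"],
    "node-2": ["data moves slowly"],
    "node-3": ["clusters fail big"],
}


def map_block(lines):
    """Runs "on the node": count words in the locally stored block only."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(left, right):
    """Merge two partial results; only these small summaries travel."""
    return left + right


def word_count(blocks):
    """Map each block where it lives, then combine the partial counts."""
    partials = [map_block(lines) for lines in blocks.values()]
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    print(word_count(BLOCKS).most_common(3))
```

In this sketch the raw lines never leave their "node"; only the per-block counters are combined, which is the reason the approach avoids shuttling huge volumes of data across the network.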

Here’s how Apache formally describes it:

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each

Read more

Nate Silver, the man who whispered in big data’s ear



24 May 2013 – A portrait of Nate Silver, “the patron saint of nerds”, as Le Monde dubs him.

“There he is, in a gray suit and rectangular glasses, looking like a bird fallen from the nest, facing the rapt faces of his followers, as sure of his algorithms as of his aura. Months before the American presidential election of 6 November 2012, he predicted a comfortable victory for Barack Obama, proving wrong the analysts and other experts who forecast a tight vote.”

He is currently promoting his book, ‘The Signal and the Noise’ (published in the United States in September 2012). It was on the nonfiction best-seller lists of the New York Times and Amazon when it came out, and it has now been translated around the world. Except in France.

‘It is not surprising that France has not translated his work,’ says Stéphane Rozès, political adviser and president of Cap. ‘While France is a great consumer of polls, the very idea that a statistician could announce in advance the result of

Read more

A new goldmine: making official data public could spur lots of innovation




18 May 2013 – After a Soviet missile shot down a South Korean airliner that strayed into Russian airspace in 1983, President Ronald Reagan made America’s military satellite-navigation system, GPS, available to the world. Entrepreneurs pounced. Car navigation, precision farming and 3 million American jobs now depend on GPS. Official weather data are also public and avidly used by everyone from insurers to ice-cream sellers.

But this is not enough. On May 9th Barack Obama ordered that all data created or collected by America’s federal government must be made available free to the public, unless this would violate privacy, confidentiality or security. “Open and machine-readable”, the president said, is “the new default for government information.”

This is a big bang for big data, and will spur a frenzy of activity. Pollution numbers will affect property prices. Restaurant reviews will mention official sanitation ratings. Data from tollbooths could be used to determine prices for nearby billboards. Combining data from multiple sources will yield fresh insights. For example, correlating school data with transport information and tax returns may show that academic performance

Read more

Anonymity? HA! It’s becoming algorithmically impossible



10 May 2013 – Has Big Data made anonymity impossible? As the amount of data expands exponentially, nearly all of it carries someone’s digital fingerprints. According to International Data Corporation (IDC), the amount of data created in 2012 reached a whopping 2.8 zettabytes (that’s 2.8 trillion gigabytes), and that number is predicted to double by 2015. Most of it is made by individuals as they go through their daily interactions, and consequently, as the tracking and storing of that data improve, analysts are able to learn even more about those people. All of this is leading to a day when, according to computer scientist Arvind Narayanan, it will be “algorithmically impossible” to be truly anonymous.

For more, see this great article from MIT Technology Review.

Related: “Laws as algorithms: converting simple laws to machine-readable code” 


 … Read more