Archive for May, 2013

Financial markets and “Big Data” crashes: “reckless behavior” replaces “market manipulation” as the standard for prosecuting misbehavior



29 May 2013 – Most of us have been following the regulators’ struggle to meet the challenges posed by high-frequency trading (HFT). This ultra-fast, computerized segment of finance now accounts for most trades. HFT also contributed to the infamous “flash crash” of 2010, the sudden, vertiginous fall in the Dow Jones Industrial Average. But the HFT of today is very different from that of three years ago. The reason is … yep … our “new new” friend, “Big Data”. And financial markets are notorious producers of big data: trades, quotes, earnings statements, consumer research reports, official statistical releases, polls, news articles, etc.

Companies that relied on the first generation of HFT, where unsophisticated “speed” exploits price discrepancies, have had a tough few years. Profits at ultra-fast trading firms were 74 per cent lower in 2012 than in 2009, according to Rosenblatt Securities, which tracks this sort of information for its institutional clients.

NOTE: In the hacking world an “exploit” is a piece of software, a chunk of data, … Read more

Sophisticated AI program helps reassemble more than 100,000 document fragments collected across 1,000 years



28 May 2013 – One scholar likened it to finding the orphaned socks of generations of a family. Another compared it to law enforcement’s use of DNA databases and face-recognition software. The idea is to harness technology to help reassemble more than 100,000 document fragments collected across 1,000 years that reveal details of Jewish life along the Mediterranean, including marriage, medicine and mysticism. For decades, scholars relied mainly on memory to match up pieces of the Cairo genizah, a treasure trove of papers that includes works by the rabbinical scholar Maimonides, parts of Torah scrolls and prayer books, reams of poetry and personal letters, contracts and court documents, even recipes (there is a particularly vile one for honey-wine).

Now, for the first time, a sophisticated artificial intelligence program running on a powerful computer network is conducting 4.5 trillion calculations per second to vastly narrow down the possibilities.
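A quick back-of-the-envelope calculation shows why the matching has to be handed to a computer. This is a minimal sketch under a loudly stated assumption: a naive all-pairs comparison in which every fragment is checked once against every other fragment (the article does not say how the program prunes its candidates). The fragment count and the rate of 10 million pairs per hour are the figures quoted in this post; the function names are hypothetical.

```python
import math

# ASSUMPTION (hypothetical): naive all-pairs matching, every fragment
# compared once against every other fragment.
NUM_FRAGMENTS = 100_000        # "more than 100,000 document fragments"
PAIRS_PER_HOUR = 10_000_000    # rate quoted by Roni Shweka in the article


def candidate_pairs(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return math.comb(n, 2)


def hours_needed(n: int, rate: int) -> float:
    """Hours needed to compare all unordered pairs at `rate` pairs per hour."""
    return candidate_pairs(n) / rate


if __name__ == "__main__":
    pairs = candidate_pairs(NUM_FRAGMENTS)
    hours = hours_needed(NUM_FRAGMENTS, PAIRS_PER_HOUR)
    print(f"{pairs:,} candidate pairs")                      # 4,999,950,000 candidate pairs
    print(f"about {hours:,.0f} hours ({hours / 24:.1f} days)")  # about 500 hours (20.8 days)
```

Under that assumption there are roughly five billion candidate pairs, or about 500 hours of computing even at the quoted machine rate, which is why narrowing down the possibilities matters so much.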

“In one hour, the computer can compare 10 million pairs — 10 million pairs is something a human being cannot do in a lifetime,” said Roni Shweka, who has advanced degrees in both … Read more

Hadoop: what it is and how it works




24 May 2013 – You can’t have a conversation about Big Data for very long without running into the elephant in the room: Hadoop. This open source software platform managed by the Apache Software Foundation has proven to be very helpful in storing and managing vast amounts of data cheaply and efficiently.

But what exactly is Hadoop, and what makes it so special? Basically, it’s a way of storing enormous data sets across a distributed cluster of servers and then running “distributed” analysis applications on each server in the cluster, close to the data it holds.

It’s designed to be robust, in that your Big Data applications will continue to run even when individual servers — or clusters — fail. And it’s also designed to be efficient, because it doesn’t require your applications to shuttle huge volumes of data across your network.
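The “move the computation to the data” idea can be illustrated with the MapReduce word-count pattern that Hadoop popularized. The sketch below is a single-process Python simulation, not Hadoop’s actual API: the cluster layout, node names and sample text are hypothetical, and fault tolerance is not modelled. Each node counts words in its own local block, and only the small per-node tallies travel to the reduce step.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List

# Hypothetical cluster: each "node" holds its own block of text locally.
# In real Hadoop the blocks would live in HDFS; here they are plain lists.
CLUSTER: Dict[str, List[str]] = {
    "node-1": ["big data is big", "hadoop stores data"],
    "node-2": ["data moves less", "code moves to data"],
}


def map_phase(lines: Iterable[str]) -> Counter:
    """Runs 'on' one node: count words in that node's local lines only.
    Pre-aggregating per node keeps the data sent over the network small."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def shuffle(partials: Iterable[Counter]) -> Dict[str, List[int]]:
    """Group the partial counts by word, as the framework does before reducing."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for partial in partials:
        for word, n in partial.items():
            grouped[word].append(n)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the partial counts for each word."""
    return {word: sum(ns) for word, ns in grouped.items()}


def word_count(cluster: Dict[str, List[str]]) -> Dict[str, int]:
    # Only the small per-node Counters leave each node, never the raw text.
    partials = [map_phase(lines) for lines in cluster.values()]
    return reduce_phase(shuffle(partials))


if __name__ == "__main__":
    totals = word_count(CLUSTER)
    # Top three words, ties broken alphabetically.
    print(sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:3])
    # [('data', 4), ('big', 2), ('moves', 2)]
```

The design choice worth noting is the per-node pre-aggregation in map_phase (Hadoop calls a step like this a combiner): the network carries a handful of counts per word rather than the raw text itself.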

Here’s how Apache formally describes it:

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each

Read more

Nate Silver, the big data whisperer



24 May 2013 – A portrait of Nate Silver, “the patron saint of nerds,” as Le Monde dubs him.

“There he is, in a grey suit and rectangular glasses, looking like a fledgling fallen from the nest, facing the rapt faces of his followers, as sure of his algorithms as he is of his aura. Months before the US presidential election of 6 November 2012, he was forecasting a comfortable victory for Barack Obama, proving wrong the analysts and other experts who predicted a close vote.”

He is currently promoting his book, ‘The Signal and the Noise’ (published in the United States in September 2012). It made the New York Times and Amazon nonfiction best-seller lists on its release and has since been translated around the world. Except in France.

‘It is no surprise that France has not translated his work,’ says Stéphane Rozès, political adviser and president of Cap. ‘While France is a big consumer of opinion polls, the very idea that a statistician could announce in advance the result of

Read more

Sears turns old stores into data centres: converting shells of the 20th century into tools for the 21st century digital economy



23 May 2013 – Sears has decided that one of the best things to do with all those stores it had to close after the US ran out of money was to convert them into data centres. Sears has created a Ubiquity Critical Environments unit which will convert a Sears retail store in Chicago into data centre space.

The cunning plan is to market space from former Sears and Kmart retail stores as a home for data centres, disaster recovery space and wireless towers.

This way the company will convert the shells of the 20th-century retail industry into tools for the 21st-century digital economy. The big idea, Farney said, is that you have a technology platform laid atop a retail footprint, creating the possibility for a product with a different look to it. So what does Sears know about data centres? Well, Farney does know a thing or two: he previously managed Microsoft’s huge Chicago data centre, and then ran a network of low-latency services for the financial services firm Interactive Data.

For more from Datacenter … Read more

The continuing application of artificial intelligence to the legal process



By: Gregory P. Bufithis, Esq.

23 May 2013 – You can certainly say this: the IT industry is nothing if not a breeding ground for an infinite variety of acronyms and neologisms. Alongside cloud computing today sits the term Big Data, which of course we understand to mean a volume of data that a traditional database would struggle to store and process in the normal course of its work. The Big Data ethos and the zany, incomprehensible world of petabytes, zettabytes and yottabytes have given the computing world its neologism du jour, and a technology conference in 2013 is rarely considered complete without a smattering of mentions.

And to make it worse … never mind the fact that Big Data has made anonymity algorithmically impossible … we are typically barraged further by volume-related qualifiers (tsunamis of big data being by far the worst offender I have encountered – other suggestions welcome). My favorite, though, came from the Chief Data Officer of a major corporation (who shall remain nameless) at an analytics conference I recently attended: “Think … Read more

A new goldmine: making official data public could spur lots of innovation




18 May 2013 – After a Soviet missile shot down a South Korean airliner that strayed into Russian airspace in 1983, President Ronald Reagan made America’s military satellite-navigation system, GPS, available to the world. Entrepreneurs pounced. Car-navigation, precision farming and 3m American jobs now depend on GPS. Official weather data are also public and avidly used by everyone from insurers to ice-cream sellers.

But this is not enough. On May 9th Barack Obama ordered that all data created or collected by America’s federal government must be made available free to the public, unless this would violate privacy, confidentiality or security. “Open and machine-readable”, the president said, is “the new default for government information.”

This is a big bang for big data, and will spur a frenzy of activity. Pollution numbers will affect property prices. Restaurant reviews will mention official sanitation ratings. Data from tollbooths could be used to determine prices for nearby billboards. Combining data from multiple sources will yield fresh insights. For example, correlating school data with transport information and tax returns may show that academic performance

Read more

Anonymity? HA! It’s becoming algorithmically impossible



10 May 2013 – Has Big Data made anonymity impossible? As the amount of data expands exponentially, nearly all of it carries someone’s digital fingerprints. According to International Data Corporation (IDC), the amount of data created in 2012 reached a whopping 2.8 zettabytes (2.8 trillion gigabytes), and that number is predicted to double by 2015. Most of it is generated by individuals going about their daily interactions, so as the tracking and storage of that data improve, analysts are able to learn ever more about those people. All of this is leading to a day when, according to computer scientist Arvind Narayanan, it will be “algorithmically impossible” to be truly anonymous.

For more, MIT Technology Review has a great article on the subject.

Related: “Laws as algorithms: converting simple laws to machine-readable code” 


 … Read more

The importance of making Big Data accessible to non-data scientists



4 May 2013 – Gartner analyst Doug Laney first coined the term “big data”, at least in its current form, over 12 years ago, although one suspects people have been complaining about “information overload” since Roman times. But the term’s meaning is still far from clear, and it is a perennial nominee in the “Tech Buzzword That Everyone Uses But Doesn’t Quite Understand” competitions, followed closely by “the cloud”.

When using the term, Gartner usually keeps the quote marks in place (i.e. it’s “big data”, not big data). And as we learned at the Gartner Business Intelligence and Analytics Summit in Barcelona two months ago, Gartner has spent a tremendous amount of time on it. As Gartner analyst Donald Feinberg warned at the conference, “talking only about big data can lead to self-delusion,” and he urged people not to “surrender to the hype-ocracy.”

NOTE: next month we’ll have a chance to talk about “big data” more with Gartner analyst Debra Logan along with Jason R. Baron when our video crew travels to Rome to interview Read more