Archive for the ‘Cloud computing architecture’ Category

Hadoop: what it is and how it works

24 May 2013 – You can’t have a conversation about Big Data for very long without running into the elephant in the room: Hadoop. This open source software platform managed by the Apache Software Foundation has proven to be very helpful in storing and managing vast amounts of data cheaply and efficiently.

But what exactly is Hadoop, and what makes it so special? Basically, it’s a way of storing enormous data sets across distributed clusters of servers and then running “distributed” analysis applications in each cluster.

It’s designed to be robust, in that your Big Data applications will continue to run even when individual servers — or clusters — fail. And it’s also designed to be efficient, because it doesn’t require your applications to shuttle huge volumes of data across your network.
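The "distributed analysis without shuttling data" idea can be sketched in miniature. The snippet below is a toy, single-process word count in the MapReduce style that Hadoop popularized; the chunk list stands in for data blocks spread across cluster nodes, and the function names and sample data are illustrative, not Hadoop APIs:

```python
# A minimal, single-machine sketch of the MapReduce word-count idea.
# Real Hadoop splits the input across nodes in a cluster; here each
# "chunk" stands in for one node's share of the data.
from collections import Counter
from functools import reduce

def map_phase(chunk):
    """Count each word in one chunk (runs on the node holding that chunk)."""
    return Counter(chunk.split())

def reduce_phase(left, right):
    """Merge two partial counts (Hadoop shuffles and merges across nodes)."""
    return left + right

chunks = ["big data big clusters", "big data moves to the data"]
partials = [map_phase(c) for c in chunks]   # in Hadoop, these run in parallel
totals = reduce(reduce_phase, partials, Counter())
print(totals["big"], totals["data"])  # prints: 3 3
```

The key property is that only the small partial counts, never the raw chunks, need to travel between nodes.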

Here’s how Apache formally describes it:

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each … Read more

The Education Of An Algorithm

30 March 2013 – The story starts in the early 20th century with one of those obscure academic squabbles that usually don’t amount to much. A mathematician and theologian named Pavel Nekrasov argued that since the law of large numbers holds only for independent events, and social phenomena such as crime statistics obey that law, human acts must be independent — and therefore humans must have free will.

Andrei Markov, one of the great mathematicians of the day, thought Nekrasov’s argument was hogwash. After all, he noted, just because independent variables follow a certain mathematical law doesn’t mean that directed activity can’t do so as well.

To prove his point, he performed a mathematical analysis of Eugene Onegin, Pushkin’s famous novel in verse, and showed that the combinations of vowels and consonants followed the law of large numbers as well. A vowel would most likely be followed by a consonant and vice versa, in proportions that became more stable as you analyzed more text.
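Markov's vowel-to-consonant tally is easy to reproduce in miniature. In the sketch below, the sample sentence and function names are my own, not from the original analysis; it classifies each letter as vowel or consonant, counts the four possible transitions, and shows that alternating pairs dominate:

```python
# An illustrative re-run of Markov's counting idea on a stand-in sentence
# (his actual analysis covered 20,000 letters of Eugene Onegin).
from collections import Counter

VOWELS = set("aeiou")

def transition_counts(text):
    """Tally letter-pair transitions: VV, VC, CV, CC."""
    letters = [c for c in text.lower() if c.isalpha()]
    pairs = Counter()
    for a, b in zip(letters, letters[1:]):
        pairs[("V" if a in VOWELS else "C") + ("V" if b in VOWELS else "C")] += 1
    return pairs

counts = transition_counts("a vowel would most likely be followed by a consonant")
vc = counts["VC"] + counts["CV"]   # alternations between the two classes
total = sum(counts.values())
print(vc / total > 0.5)  # prints: True -- alternation dominates
```

As the input text grows, these proportions settle toward stable values even though each letter plainly depends on the one before it — exactly Markov's point against Nekrasov.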

And so, Markov succeeded in showing that dependent variables could still yield stable probabilities. It was the kind of interesting but relatively useless insight that … Read more

Facebook’s latest server architecture: a challenge to OEMs IBM, H-P, and Dell? And to Amazon and Google?

18 January 2013 – Facebook has provided a big endorsement for ARM server CPUs: the company is showing off a next-gen server architecture for its Open Compute platform for building cheap/dense/power-efficient IT infrastructures that allows companies to switch between various x86 and ARM CPUs by swapping boards. Applied Micro and Calxeda are the first ARM vendors to support it. The architecture will also support Intel’s silicon photonics tech, using it to handle 100G Ethernet links.

Pundits in the industry are saying this latest server architecture could present a huge challenge to server OEMs such as IBM, H-P, and Dell. Not only does the open-source architecture enable cheap, energy-efficient servers, it offers a tremendous amount of flexibility to install and swap out parts as users see fit. It arrives at a time when many incumbents are already struggling to deal with the adoption of cheap commodity servers by Internet/cloud giants.

And Rackspace is throwing its weight behind Facebook’s platform.  Embracing Open Compute could help Rackspace’s OpenStack solutions become more cost-competitive relative to Amazon and Google’s cloud infrastructure offerings – both … Read more

How Big Data, cloud computing, Amazon and poll quants won the U.S. election

By: Gregory P. Bufithis, Esq.   Founder/CEO, The Cloud and E-Discovery

15 November 2012 –   As Daniel Honan of Big Think pointed out, just like in baseball and politics, there are winners and losers in a data-driven world. The losers in baseball, for instance, are the over-rated prospects who will never be drafted because data analysis has a way of finding them out early on in their careers. In politics, the biggest loser will be the horse race pundit, the guy who spins the polls to reinforce one side’s belief that it is winning when it’s actually losing. Sometimes this is done for partisan reasons, in the hope of creating “momentum,” and sometimes it is done to create a more compelling media narrative.

This was indeed a choice election, and the choice was between following entertainment journalism and data-based journalism. As Andrew Beaujon has pointed out, entertainment is fun, and math is hard. Well, math won.

Data analysis at its best

It is a fascinating area of data analysis.  As part of my neuroinformatics degree program, I recently had the chance … Read more

Big data: Why it’s really an architecture challenge

31 October 2012 – What’s missing from all the conversations about big data is a focus on the infrastructure necessary to support it — and in particular its use in real time. For many companies, big data means opening up access to the data warehouses they have always maintained. Data warehousing has been and continues to be a critical component of enterprise-class organisations. Such systems aggregate data from across the organisation and enable it to be sliced and diced into consumable chunks, allowing business analysts to provide insight into business conditions. A properly designed architecture focused on scalability is paramount.

It is this form of the data — parsed and processed into actionable information — that will be integrated back into the datacentre, into applications and infrastructure, to serve as input to the myriad systems and processes making near real-time decisions. But data warehouses were not designed for the volume of integration and access required by such models — nor are the various business-intelligence systems that assist in processing the data.

What’s missing from all the conversations … Read more

Cloud Infographic: The Cloud Wars – Private vs Public

19 April 2012 – Cloud computing is not an all-or-nothing option. In the past decade, the industry has matured to a point where there are almost a dozen different options to move your data and processes to the cloud. Two of the most discussed are Private and Public clouds.

Here is an infographic presenting a breakdown of the two different cloud models from CloudTweaks (click here). … Read more

A Funny Thing Happened on the Way to the Data Center

13 April 2012 – Douglas Balog is Vice President and Business Line Executive, IBM Storage Systems, responsible for the overall IBM storage business, including business strategy and product plans for the company’s complete family of workload optimized storage systems.  In the current issue of Data Center Knowledge he writes:

“The personal computer, the Internet and World Wide Web, wireless and mobile computing, and now social media have all served to form and reform the way in which we in business think about, manage and exploit computing power. The incessant waves of innovation have led to incalculable levels of information sharing and leveraging and created entire industries as well as business disciplines. But a funny thing happened on the way to the data center. Over the course of the last 25 years of technical disruption, those responsible for managing all of this innovation started becoming saddled with more responsibility. For example, the constant influx of new technologies resulted in larger IT staffs. As innovation led to greater prosperity, employee ranks swelled as well, and the gatekeepers found themselves now setting access … Read more

I.B.M. aims to sharply simplify corporate data center technology

11 April 2012 – We have waxed lyrical in the past about how IBM seems to be in every aspect of  the disruptive technologies sweeping across whole industries and society, all of it concerning information technology and information management.  Just one example is what we learned at FutureMed (click here) which is part of our upcoming special series “IBM: a culture of analytics”.

But corporate data centers are, as Steve Lohr points out in today’s New York Times,  the slowpoke laggards of information technology.  Although the essence of cloud computing is a move towards highly standardized racks of commodity servers, it can take up to six months to get a new business application up and running, from buying the hardware to fine-tuning the software. An estimated 70 percent of corporate technology budgets is spent on installing, updating and maintaining current technology — keeping the digital lights on.

IBM intends to change all that in an effort that is the most ambitious step yet to simplify and streamline data center technology.  For the story from today’s New … Read more

Investors and users beware: Facebook is all about IT (and that 30-petabyte Hadoop cluster)

By:  Gregory P Bufithis, Esq.  

2 February 2012 – The nerve of Facebook. Filing its S-1 during LegalTech, when many of us were deeply immersed in the issues surrounding the cloud and e-discovery.

As Gigaom points out, by now every statement in Facebook’s S-1 filing has already been pored over to death (I downloaded the S-1 to my iPad and enjoyed it over dinner last night) … published, blogged and analyzed. Karsten Weide, a technology analyst with IDC, posted an analysis last night that reflected what many of the number crunchers have been saying:

“This filing implies Facebook is valued at $100 billion, which I think is too high. That’s about 27 times more than their 2011 revenue. But even assuming they can double revenue this year, I think it’s too high. It’s reminiscent of the valuations for stocks in the Internet 1.0 days. Even if it were valued at just $80 billion, I think it would be too high. There are a number of challenges and risks Facebook faces, and one is the growth of Google Plus … Read more