Friday 13 May 2016

R Tutorials

The following resources helped me build a good foundation in R programming.

Cookbook for R

http://www.cookbook-r.com/

R Training Video

https://www.youtube.com/watch?v=iffR3fWv4xw&list=PLOU2XLYxmsIK9qQfztXeybpHvru-TrqAP&index=1&noredirect=1

Quick-R

http://www.statmethods.net/

A tutorial on basic operations with spatial data in R, such as importing and exporting data (both vector and raster), plotting, analysing and making maps:

http://pakillo.github.io/R-GIS-tutorial/#intro

Many tutorials for using ggplot2 and lattice

http://learnr.wordpress.com/tag/ggplot2/

I am sure the resources above will help you learn R the way they helped me.

Enjoy your day, folks...

Thursday 12 May 2016

Why Spark in place of MapReduce?

Apache Spark has numerous advantages over Hadoop's MapReduce execution engine, both in the speed with which it carries out batch-processing jobs and in the wider range of computing workloads it can handle. According to Cloudera, Spark can execute batch-processing jobs between 10 and 100 times faster than the MapReduce engine, primarily by reducing the number of writes and reads to disc.

As Hadoop moves beyond MapReduce, an enterprise focus, in-memory technology and accessible machine learning are the next frontiers.

The slowdown in MapReduce comes from how it chains stages: "You have map and reduce tasks and after that there's a synchronisation barrier and you persist all of the data to disc". While this feature was designed to allow a job to be recovered in case of failure, "the side effect of that is that we weren't leveraging the memory of the cluster to the fullest".

"What Spark does really well is this concept of a Resilient Distributed Dataset (RDD), which allows you to transparently store data in memory and persist it to disc if it's needed. But there's no synchronisation barrier that's slowing you down. The usage of memory makes the system and the execution engine really fast."

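To make the difference concrete, here is a small toy sketch in plain Python. It is not Spark or Hadoop code, and the function names (run_with_disk_barrier, run_in_memory) are made up for illustration. It assumes a job is just a chain of per-record functions, and it only counts how often intermediate results hit the disc in each style.

```python
import json
import os
import tempfile


def run_with_disk_barrier(records, stages):
    """MapReduce-style chaining (toy model): after every stage the
    intermediate result is written to disc and read back before the
    next stage starts."""
    disk_writes = 0
    with tempfile.TemporaryDirectory() as workdir:
        for i, stage in enumerate(stages):
            records = [stage(r) for r in records]
            path = os.path.join(workdir, f"stage_{i}.json")
            # Persist the stage output (the "synchronisation barrier").
            with open(path, "w") as f:
                json.dump(records, f)
            disk_writes += 1
            # The next stage starts from what was stored on disc.
            with open(path) as f:
                records = json.load(f)
    return records, disk_writes


def run_in_memory(records, stages):
    """Spark-style chaining (toy model): intermediate results stay in
    memory, so no disc writes happen between stages."""
    for stage in stages:
        records = [stage(r) for r in records]
    return records, 0


# Hypothetical three-stage job over a tiny data set.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
data = list(range(5))

disk_result, disk_writes = run_with_disk_barrier(data, stages)
mem_result, mem_writes = run_in_memory(data, stages)

print(disk_result == mem_result)  # both styles compute the same answer
print(disk_writes, mem_writes)    # one disc write per stage vs none
```

Both versions produce the same result; the only difference is the extra write and read per stage in the first one. A real cluster adds network shuffles, fault recovery and much larger data, so treat this as an illustration of the idea and not as a benchmark.
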
Converged Data Platform

MapR's Converged Data Platform integrates Hadoop and Spark with real-time database capabilities, event streaming, and scalable enterprise storage to power a new generation of big data applications. According to MapR, it delivers enterprise-grade security, reliability, and real-time performance while lowering both the hardware and operational costs of your most important applications and data.

Predictive Analytics Future

Predictive analytics will become more user-driven and depend less on users knowing the nuances of statistics and machine learning algorithms. Users will ask questions of smart applications or expert systems, which will provide insight by extracting knowledge from the data. New data sources, such as wearables and IoT data, will be added.


Predictive analytics can be used to reveal not only what people will buy, but also how effective they will be in a job. Information Management described how Abbott Analytics used predictive data (as reported by Forbes) to help the U.S. Special Forces assess new candidates, based on information such as the qualities of people who had done the job effectively in the past, as well as “acceptable trade-offs” for those qualities.

See you tomorrow...