Thursday, 9 June 2016
Hadoop creator Doug Cutting on the near-future tech that will unlock big data
http://www.zdnet.com/article/hadoop-creator-doug-cutting-on-the-near-future-tech-that-will-unlock-big-data/
Wednesday, 8 June 2016
Three distinct trends in today’s predictive analytics process
1. Predictive hypothesis testing
As businesses begin to ask more strategic questions about what’s really driving their performance, executives are understandably demanding proof before executing on this new era of insight. Specifically, businesses want to understand the cause and effect of various sets of data, and know that this analysis can be extrapolated over different periods of time. This new form of analysis, powered by machine learning, is critical to businesses looking to gain a competitive edge.
2. Closing the gap between data and delivery
The hunger for Big Data doesn’t end with merely gathering the right kind of data. From the thousands of data sets available, executives who are looking to better leverage data to solve complex problems need more streamlined ways to glean insight from the variety of data being collected. Currently, companies often deal with this by creating their own analytics platforms, which is very expensive and doesn’t support all facets of the predictive analytics process. Companies are looking for easier ways to close the data gap and are turning toward more streamlined cloud computing in order to speed up the time to insights.
3. Shrinking the barrier between internal and external data
While internal systems have been consolidated for years, the influx of external data is creating unforeseen silos within businesses. In order to increase efficiency and streamline workflows, companies have implemented web-based data transactions, which have created a great divide between internal data and external data. Companies that are spending thousands on a tool to gather a wide variety of external data sets are now faced with spending even more on a solution to combine this data with their internal workflows. As such, executives are demanding easier, quicker and less expensive ways to remove this barrier.
Embracing the new predictive analytics process
The diversity of data sets and sheer amount of external data continues to grow with the speed of technology. Global companies certainly recognize its power, but only now are they beginning to find ways to glean real business value from the insights this data can provide.
From implementing more seamless processes for gathering and correlating external data to finding flexible solutions that enable hypothesis testing and analysis of various data sets, companies looking to fully leverage Big Data to solve big problems must embrace this new era of predictive analytics. By incorporating these three trends into business processes, companies can truly see what’s driving their performance and, ultimately, stay ahead of global competition.
Source:
A New Era of Predictive Analytics: 2016 Trends to Watch by Rich Wagner
Excellent Python Tutorial Sources for You to Use
1. Codecademy – Interactive, Beginners
There are many interactive Python tutorials that let you write code in the browser and see the results live. That is what makes learning fun! Codecademy hosts the best interactive Python tutorials for beginners.
As of today, 2.5 million students are enrolled in this course. The course takes approximately 13 hours.
It covers Python syntax, strings and console output, conditionals and control flow, functions, lists and dictionaries, loops, and file input and output, and also touches on advanced Python topics. You build small projects as you learn, and step-by-step instructions make coding these projects easy, right there in the browser.
Two of the example projects you build while learning Python on Codecademy are a “Tip calculator” and a small board game named “Battleship”.
2. TutorialsPoint.com – Beginners, Online
Want to learn Python from scratch? Tutorialspoint.com hosts one of the most comprehensive tutorials for learning Python basics and fundamental concepts. Anyone who is totally new to programming can also start with Tutorials Point’s Python tutorials. They begin with a high-level overview of Python, then cover environment setup, basic syntax, variable types, operators, decision making and loops before going deeper into the language.
Tutorials Point also covers advanced concepts such as CGI programming, database access, multithreading, XML processing, GUI programming and networking.
3. Codementor.io – Advanced, For Experts, Online
Codementor is a perfect place to find advanced tutorials if you are an expert developer. It is not a step-by-step guide to learning Python; instead it offers tutorials for completing specific development tasks using Python. The website adds new tutorials on a regular basis, and you can keep up with the latest ones by signing up for its newsletter.
Given below are a few examples of the kinds of Python tutorials you will find on Codementor.io:
- Building a movie recommendation service using Apache Spark and Flask, in Python
- Sorting Git authors in less than 10 lines of code, using a Python script
- Data Science with Python & R: Sentiment Classification Using Linear Methods
- Integrating Node.js & Python to Write Cross-Language Modules using pyExecJs
- Advanced Uses of Python Decorators
4. PythonChallenge.com – Advanced, Interactive
If you love challenges and want to explore the depths of the Python programming language, there is no better resource on the internet than PythonChallenge.com. It is not for absolute beginners, nor for those who just take notes in class for overnight cramming without exercising their brains much.
Visit this website if you want to keep your creative juices flowing. There are 33 levels at the moment, and the very first one has the potential to get you engaged and addicted to the website.
5. Google’s Python Class – Free eBook, Intermediate
The Python tutorials from Google developers are well written and cleanly organized. They are all about theory, though, without practical step-by-step instructions for building projects. They are best suited to reading on the go and to those who already have some basic programming background.
This entire Python tutorial set is organized in three sections:
Python Course – This section covers Python setup and the basics of Python, such as strings, lists, sorting, regular expressions and utilities, and gives you the fundamentals of the language.
Lecture videos day1, day2 – Not the reading kind? No problem: you can go through the video lectures, divided into day 1 and day 2, and grasp the fundamentals of the Python language.
Python Exercises – This is what makes learning Python interesting. There are basic exercises, an exercise around baby names, one for a “copy special” feature and one on the log puzzle. You need to put some thought into understanding and completing these exercises.
You can download the Python code used in the exercises and run it locally on your machine. You can also ask questions in Google Groups to clarify your doubts.
6. Python.org – Online, Beginners Python Tutorial
This is the official Python guide and is best suited for those who need a comprehensive tour of the Python language. The official Python documentation is a complete reference to the language and is always updated with the latest features and release notes.
It is always good to skim through the official guide at least once to ensure that you are not missing anything basic. The official Python guide covers what is new in Python, Python installation guides, library references, Python how-tos, and embedding, extending and distributing Python modules.
7. Learn Python the Hard Way – eBook, Beginners
Learn Python the Hard Way is one of the sure-shot ways to get started with Python programming. According to the official website, 1.5 million people read this eBook every year, and the book is the most successful beginner programming eBook on the market today.
The paper and digital versions of the book come at a cost, but you can read the complete book online for free. It is arguably the best Python tutorial out there.
8. LearnPython.org – Interactive Python Programming, Intermediate
This website offers interactive Python tutorials for mastering the syntax of the Python programming language. The current interpreter runs Python 2, but the tutorial highlights key differences between Python 2 and Python 3.
The tutorial starts off with Hello World and explains variables and types, lists, basic operators, string formatting, basic string operations, loops, functions, classes and objects, dictionaries, and modules and packages. You also get exercises at the end of each chapter to help you get your head around the depths of Python programming.
9. Invent with Python – Free Book, Online
My colleague used to teach students, and learning programming by building games is what kept them engaged for hours. Invent with Python does the same: each chapter has step-by-step instructions to build a small game.
As you keep learning, the games grow more complex and learning becomes more fun. The online version of the book is free, but you can also download the PDF version at a nominal price.
10. Dive into Python 3 – Beginners, Online
This is a good read for beginners as well as for those who already know Python 2 and want to move to Python 3. The author clearly highlights differences between Python 2 and Python 3, wherever applicable, and ensures that readers grasp the concepts by citing relevant examples.
11. Python Crash Course - Intermediate, Online
If you are already a programmer and want to quickly get on-boarded with Python, this is the right place for you. This course is meant for intermediate level programmers and assumes that you already understand object oriented programming.
The aim of the course is not to go into the depths of programming but to highlight what Python brings to the table and how you can code in Python if you already know programming.
12. Learning Python Magic Methods – Advanced
This is a collection of tutorials/blogs by Rafe Kettler, intended for advanced Python programmers. Magic methods have everything to do with object-oriented programming, but the official documentation does not seem to cover them well enough. Rafe Kettler explains the ins and outs of magic methods using good examples. A must-read for anyone looking to master the magic methods of Python.
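As a rough illustration of what magic methods are, here is a minimal sketch. The `Money` class and its `cents` field are hypothetical names invented for this example and are not taken from the tutorial; the double-underscore methods themselves are standard Python hooks.

```python
from functools import total_ordering


@total_ordering  # fills in <=, >, >= from __eq__ and __lt__
class Money:
    """Hypothetical value type, used only to illustrate magic methods."""

    def __init__(self, cents):
        self.cents = cents

    def __repr__(self):
        # Called by repr() and, with no __str__ defined, by print().
        return f"Money({self.cents})"

    def __eq__(self, other):
        # Called for the == operator.
        if not isinstance(other, Money):
            return NotImplemented
        return self.cents == other.cents

    def __lt__(self, other):
        # Called for the < operator, and therefore by sorted().
        if not isinstance(other, Money):
            return NotImplemented
        return self.cents < other.cents

    def __add__(self, other):
        # Called for the + operator.
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.cents + other.cents)


if __name__ == "__main__":
    print(Money(150) + Money(250))        # Money(400)
    print(sorted([Money(3), Money(1)]))   # [Money(1), Money(3)]
```

Because the operators dispatch to these methods, instances of the class can be added, compared and sorted like built-in numbers.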
13. Afterhoursprogramming.com – Online, Beginners
Here is another well-written tutorial for beginners, with a code simulator to test the code in the browser. In this tutorial, you not only learn the fundamentals of Python but also learn how to build interactive web applications.
Towards the end, this tutorial hosts a Python quiz to test your knowledge. The quiz is pretty useful for checking where you stand in terms of Python basics.
14. Python Basic Tutorials – Beginners, Video
If you want to learn the way they learn in the classroom, video tutorials are the way to go. This one is a series of Python video tutorials by theNewBoston. You get end-to-end coverage of Python by following these video tutorials.
15. Python Fundamentals Training – Beginners, Video
This is much like the previous one but goes a little beyond the basics of coding in Python. It is a full four-day training course that teaches Python fundamentals through videos. These Python fundamentals training videos are brought to you by NewCircle.com.
16. A Byte of Python - Free Online, PDF
A Byte of Python is for absolute novices in the world of computers and programming. It is written by Swaroop and is loved by beginners all over the globe. The language used is simple and the contents are organized neatly. You can read the book online or download a PDF copy.
17. Coursera Python Course
Coursera hosts online classes from top-notch universities, including Python courses provided by Rice University. As of today, there is a two-part course available, spread across many weeks. You can check the schedule and register for the online classes, which are free of cost.
If, however, you need a Python certificate after completing the courses, you need to pay a nominal charge for it.
18. Think Python - Free Online, Python Tutorial PDF
This is another beginner’s book, and the author has made the online version as well as the PDF version freely available. You can buy the paper copy from Amazon as well. The author’s main intent in this book is to teach computer science fundamentals, and Python happens to be the programming language used to do so.
A good read for all the students enrolling for computer science disciplines.
19. Learning Django – Beginners, Video
It is not possible to talk about Python without saying a word about Django. The Django framework has contributed greatly to the recent fame of the Python programming language.
If you want to learn Django, there is no better video tutorial available than Getting Started with Django. It has multiple videos covering multiple aspects and best practices of the framework.
20. Python Playgrounds - Coding in Python, Online
Tutorials based on interactive coding playgrounds let you try and learn the language without the hassles of setting up your system for development.
You may not grasp the depths of the language but definitely get to understand the basic concepts. Here are the Python playgrounds to make your learning fun -
Cloudera vs Hortonworks vs MapR: Comparing Hadoop Distributions
For all those looking to harness the potential of big data, Hadoop is the platform
of choice. This open source software framework enables processing of huge data
sets by distributing them across commodity servers. Thus, it eliminates
dependency on high-end hardware and makes the entire process economical for
businesses to implement. All of the big data enterprises today use Apache
Hadoop in some way or the other. To simplify working with Hadoop, enterprise
versions like Cloudera, MapR and Hortonworks have sprung up.
In its original version, Hadoop was designed as a simple write-once storage
infrastructure. But it has evolved through the years to expand beyond mere web
indexing capacity. Based on Google’s MapReduce model, Hadoop is designed to
store and process large amounts and variety of data that may reside in multiple
computer servers.
While Hadoop’s distributed file system (HDFS) helps break down all incoming data and
store them across multiple nodes, the MapReduce component facilitates the
simultaneous processing of data across multiple nodes.
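The map and reduce steps described above can be sketched in a few lines of plain Python. This is only a single-process illustration of the programming model, not Hadoop's API: the function names are made up for the example, and each string in `chunks` stands in for a block of data that HDFS would store on a different node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(chunks):
    # In a real cluster each chunk would be mapped on the node that holds it;
    # here the map calls simply run one after another.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    chunks = ["big data big", "data on many nodes"]
    print(word_count(chunks))
    # {'big': 2, 'data': 2, 'on': 1, 'many': 1, 'nodes': 1}
```

Because each call to `map_phase` depends only on its own chunk, the map work can be spread across machines, which is the property the paragraph above relies on.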
Hadoop is by no means an out-of-the-box solution. In order to build a truly
information-driven enterprise, where decisions are based on data and not
guesswork, companies require a data management solution that not only
offers robust data governance, but also is easily manageable and seamlessly
integrates with existing enterprise infrastructure.
The flexible, modular architecture of Hadoop allows for adding new functionality
to accomplish diverse Big Data tasks. A number of vendors have
taken advantage of Hadoop’s open-ended framework and tweaked its code to
change or enhance its functionality. In the process they have been able to
fix some of the inherent drawbacks of Apache Hadoop. So far as Hadoop
distributions are concerned, the three companies that really stand out in the
competition are Cloudera, MapR and Hortonworks.
Comparing top three Hadoop distributions: Cloudera vs Hortonworks vs MapR
Cloudera has been around the longest since the creation of Hadoop. Hortonworks
came later. While Cloudera and Hortonworks are 100 percent open source, most
versions of MapR come with proprietary modules. Each vendor/distribution has
its unique strengths and weaknesses, and each has certain overlapping features as
well. If you are looking to make the most of Hadoop’s immense data processing
power, it makes sense to make a comparative study of the top three Hadoop
distributions.
Cloudera
Cloudera Inc. was founded by big data geniuses from Facebook, Google, Oracle and Yahoo
in 2008. It was the first company to develop and distribute Apache Hadoop-based
software and still has the largest user base, with the greatest number of clients.
Although the core of the distribution is based on Apache Hadoop, it also provides
a proprietary Cloudera Management Suite to automate the installation process
and provide other services that enhance user convenience, including
reducing deployment time and displaying a real-time node count.
Cloudera Overview
Hortonworks
Hortonworks, founded in 2011, has quickly emerged as one of the leading vendors of Hadoop.
The distribution provides an open source platform based on Apache Hadoop for
analysing, storing and managing big data. Hortonworks is the only commercial
vendor to distribute complete open source Apache Hadoop without additional
proprietary software. Hortonworks’ distribution HDP2.0 can be directly
downloaded from their website free of cost and is easy to install. The
engineers of Hortonworks are behind most of Hadoop’s recent innovations,
including YARN, which improves on MapReduce in the sense that it enables the
inclusion of more data processing frameworks.
Hortonworks Overview
MapR
In its standard, open source edition, Apache Hadoop software comes with a number
of restrictions. Vendor distributions are aimed at overcoming the issues that
users typically encounter in the standard edition. Under the free Apache
license, all three distributions provide users with updates to the core
Hadoop software. But when it comes to picking one of them, one should
look at the additional value it provides to customers in terms of
improving the reliability of the system (detecting and fixing bugs etc.),
providing technical assistance and expanding functionality.
All three top Hadoop distributions, Cloudera, MapR and Hortonworks, offer
consulting, training, and technical assistance. But unlike its two rivals, Hortonworks’ distribution is claimed
to be 100 percent open source. Cloudera incorporates an array of proprietary elements in
its Enterprise 4.0 version, adding layers of administrative and management
capabilities to the core Hadoop software.
Going a step further, MapR replaces the HDFS
component with its own proprietary file system, called MapRFS.
MapRFS helps incorporate enterprise-grade features into Hadoop, enabling more
efficient management of data, reliability and, most importantly, ease of use. In
other words, it is more production ready than its two competitors.
Through a recent partnership with Canonical, the creator of the Ubuntu operating
system, MapR is offering Hadoop as a default component of Ubuntu. Under the terms of the
partnership, MapR’s M3 Edition for Apache Hadoop will be integrated into the Ubuntu
operating system.
Up to its M3 edition, MapR is free,
but the free version lacks some of its proprietary features, namely JobTracker
HA, NameNode HA, NFS-HA, Mirroring, Snapshot and a few more.
MapR Overview
Cloudera and Hortonworks: The Similarities
Cloudera and Hortonworks are
both built upon the same core of Apache Hadoop. As such, they have more
similarities than differences.
· Both offer enterprise-ready Hadoop distributions. The distributions have stood the
test of time as well as consumers, ensuring security and stability. Besides,
they provide paid training and services to familiarize the newcomers treading
the path of Big Data and Analytics.
· Both have established communities that actively participate and help with the
problems faced as well as demonstrations needed.
· Both distributions have master-slave architecture.
· Both have a shared-nothing computing framework.
· Both support MapReduce as well as YARN.
Cloudera vs. Hortonworks: The Differences
That being said, the differences are what play a deciding role in choosing
one vendor over the other. Broadly, Cloudera and Hortonworks differ in the
following aspects:
· Cloudera has announced that its long-term goal is to become an “enterprise data hub,”
thus diminishing the need for a data warehouse. Hortonworks, on the other hand, remains firmly a
provider of a Hadoop distro, and has partnered with data warehousing company
Teradata.
· While Cloudera CDH can be run on Windows Server, HDP is available as a native
component on Windows Server. A Windows-based Hadoop cluster can be deployed
on Windows Azure through the HDInsight Service.
· Cloudera has proprietary management software, Cloudera Manager, an SQL query handling
interface, Impala, as well as Cloudera Search for easy and real-time access to
products. Hortonworks has no proprietary software; it uses Ambari for management,
Stinger for handling queries, and Apache Solr for searches of data.
· Cloudera has a commercial license,
while Hortonworks has an open source license. Cloudera also allows the use of its
open-source projects free of cost, but the package doesn’t include the management suite Cloudera
Manager or any other proprietary software.
· Cloudera has a free 60-day trial; Hortonworks is completely free.
Cloudera is the oldest player
in the market, with more than 350 customers. But Hortonworks is fast catching
up and has made more innovations in the Hadoop ecosystem in the recent past.
Cloudera has several enterprise software products overlaid on its open source distributions
to aid consumers, whereas Hortonworks strives to provide a framework
consisting only of open source projects.
Top 10 Reasons Customers Choose MapR Hadoop
Not all Hadoop distributions are created equal. Beyond the marketing claims, there are real differences which impact the bottom line while making IT operations easier. Here are 10 reasons why more customers are choosing MapR.
1.
2.
3.
4.
5.
6.
7. Enterprise-Grade NoSQL
MapR-DB, the in-Hadoop NoSQL database that uses the HBase API, was recently recognized as the top-ranked NoSQL key-value database for current offering. Its high throughput, consistent low latency, enterprise-grade features, and Hadoop integration allow you to deploy business-critical, real-time operational analytics applications. As MapR customer Atzmon Hen-tov of Pontis puts it, "MapR-DB requires about half the machines compared to other [NoSQL] platforms. This dramatically reduces the cost of a new system."
8. Unbiased Open Source
The MapR Distribution provides customers with more flexibility and choice for their open source projects. MapR supports multiple execution frameworks such as YARN and Spark, and multiple options for SQL-on-Hadoop technologies, machine learning packages and NoSQL databases. Furthermore, MapR uniquely supports backward compatibility across multiple versions of projects.
9. Read-Write File System
Unlike HDFS, which follows the write-once-read-many paradigm, the MapR Data Platform delivers a true random read-write capable, POSIX compliant file-system providing unique features such as read-write NFS. These capabilities enable Hadoop as a real-time, enterprise storage and processing platform.
10. Enterprise-grade Security
MapR security includes wire-level encryption for Hadoop, granular authorization with ACLs and boolean access control expressions, and authentication via existing Kerberos infrastructure or a simplified username-password based mechanism. MapR also implements security projects from Apache for additional layers of protection.
You can get this from the following URL: https://www.mapr.com/top-ten-reasons
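To make the read-write file system point in item 9 concrete, here is a minimal sketch of what "random read-write" means at the POSIX level: changing bytes in the middle of an existing file in place. It runs against a local temporary file, which is an assumption made so the example is self-contained; the helper names `overwrite_at` and `demo` are hypothetical, and nothing here calls any MapR or Hadoop API.

```python
import os
import tempfile


def overwrite_at(path, offset, data):
    """Overwrite bytes at a given offset without rewriting the whole file."""
    # "r+b" opens an existing file for reading and writing without truncating it.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def demo():
    # A local temp file stands in for a file on a POSIX-compliant mount.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.dat")
        with open(path, "wb") as f:
            f.write(b"AAAABBBBCCCC")
        overwrite_at(path, 4, b"XXXX")  # replace the middle record in place
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(demo())  # b'AAAAXXXXCCCC'
```

Under a write-once-read-many model, the same change would mean writing out a new copy of the file rather than seeking to an offset and updating it.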