Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython

Name: Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
Rating: 4.6 (1581 reviews)

by: Wes McKinney

Get complete instructions for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.6, the second edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You’ll learn the latest versions of pandas, NumPy, IPython, and Jupyter in the process.

Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.

Use the IPython shell and Jupyter notebook for exploratory computing
Learn basic and advanced features in NumPy (Numerical Python)
Get started with data analysis tools in the pandas library
Use flexible tools to load, clean, transform, merge, and reshape data
Create informative visualizations with matplotlib
Apply the pandas groupby facility to slice, dice, and summarize datasets
Analyze and manipulate regular and irregular time series data
Learn how to solve real-world data analysis problems with thorough, detailed examples.

The Quotes

An important first distinction from Python’s built-in lists is that array slices are views on the original array. This means that the data is not copied, and any modifications to the view will be reflected in the source array.
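The view behavior described in this quote can be checked with a short sketch. The variable names below are illustrative and are not taken from the listing; the example assumes NumPy is installed.

```python
import numpy as np

arr = np.arange(10)          # array holding 0..9
view = arr[5:8]              # slicing an ndarray gives a view, not a copy
view[:] = 12                 # writing through the view changes arr as well

lst = list(range(10))
sub = lst[5:8]               # slicing a built-in list creates a new list
sub[0] = 12                  # lst is left untouched

# An explicit copy is needed when an independent array is wanted.
independent = arr[5:8].copy()
independent[:] = 0           # arr is not affected by this assignment

print(arr)                   # elements 5..7 now hold 12
print(lst)                   # still 0..9
```

Calling `.copy()` on the slice is the usual way to opt out of the shared-memory behavior.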

When you pass objects as arguments to a function, new local variables are created referencing the original objects without any copying.
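A minimal sketch of what this quote implies in practice: a function can mutate the caller's object through its local name, but rebinding that local name does not affect the caller. The function names here are hypothetical and chosen only for illustration.

```python
def append_element(some_list, element):
    # some_list is a new local name bound to the caller's list object,
    # so this mutation is visible to the caller.
    some_list.append(element)


def rebind(some_list):
    # Assigning to the local name only rebinds it inside the function;
    # the caller's list is not changed.
    some_list = [0]
    return some_list


data = [1, 2, 3]
append_element(data, 4)      # mutates the original list in place
result = rebind(data)        # returns a different list; data is unchanged

print(data)
print(result)
print(result is data)        # False: rebind returned a separate object
```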

The Reviews

Wes is the creator of Pandas but he is not an effective writer. This has left a bad taste of pandas in my mind. A lot of examples created in this book are using random numbers and this is a poor way of teaching someone as it's too abstract. Random number generated examples rarely have anything to do with data encountered in real life. This book's problem is the classic curse of knowledge. The author does not know what it's like to get started with pandas and what are the difficulties users will have.

This book falls somewhere between a manual page providing one example per function and a cookbook, tending more toward the former. Examples are dry and most are constructed using random data. There is very little in the way of practical use cases. I bought the book hoping to get some inspiration for using numpy and/or pandas for some types of analyses I find myself doing, but that didn't happen. Probably I've gathered enough overview that I now can put together useful queries that will provide useful hits on Stack Exchange. I wish I had better to say.

As others have said, this book provides a good manual. If you have a project in mind and some programming background, you can adapt the examples in the book to complete the task. That said, a lot of the book reads more as documentation than instruction, and the documentation is more sparse than the official pandas documentation. Furthermore, some of the examples are rather opaque in understanding the main point, and the use of random number generators for example data manipulation sometimes makes it difficult to understand what a specific block of code is doing. Overall, this book provides a jumping off point in understanding the capabilities of pandas as well as its strengths, but it wasn't terribly useful in even basic data science workflow and concepts. For that, I highly recommend something like Hadley Wickham's "R for Data Science," which is much more approachable and rewarding in its use of example datasets, its more personable writing style, and its outlining of good practices for data science.

This book has been my foundation of using Python as a data analyst. This book primarily focuses on the pandas Python library, which is awesome at processing and organizing data (Python pandas is like MS Excel times 100. This is not an exaggeration). It also introduces the reader to NumPy (lower level number crunching and arrays), matplotlib (data visualizations), scikit-learn (machine learning), and other useful data science libraries. The book contains other book recommendations for continuing education. Although this would be a challenging book for a brand new Python user, I would still recommend it, especially if you are currently doing a lot of work in MS Excel and/or exporting data from databases. I had a few false starts learning Python, and my biggest stumbling block was lack of application in what I was learning. This book puts practical tools in the reader's hands very quickly. I personally don't have time to make goofy games etc. that other books have used as practice examples. Despite other reviews criticizing the use of random data throughout the book, I found the examples easy to follow and useful. I would also argue that learning how to generate random data is useful in itself (thus the purpose of the numpy random library), and that there are practical examples throughout the book. Chapter 14 is devoted to real-world data analysis examples. I am almost finished with my second time through the book, this time working through every example. This book has been well worth the hours spent in it. For context, I previously relied on Excel, SQL, and some AutoHotKey. This book has significantly improved how I work. Thanks, Wes and team.

This is the Python book for the data scientist: already knows Python or at least OOP programming, but wants to be able to utilize the native and NumPy structures for writing machine learning algorithms. Slicing, broadcasting, tuples, pandas data frames -- all useful for applying Python's tools to data science. This is not a beginner book, but it's exactly what I needed to learn the details for translating equations to code.

First of all, if you've never used Python before, find an intro to Python. This book is more for people who are familiar to intermediate Python programmers. Meaning this is not for the complete beginner. The book mainly deals with introducing you to the NumPy and pandas libraries used for data analysis, such as cleaning, manipulating, wrangling, processing and visualisation. It's a great book to have as a reference and for learning data analysis techniques. There are plenty of code examples. So worth the purchase. Only negative: I wish there were mini projects to learn from.

This book covers all of the basics that you would want to know to get started in programming in Python for data analysis, as the title implies, but it doesn't really offer compelling real-world examples. The data seem to be made up and the analyses don't go into enough detail to help you really learn how pandas and numpy work. Overall this is a decent starter book but you will have to bookmark the python and pandas documentation online if you want to have a reference to all of the functionality those tools have, and there are many places online where you can get better examples to learn from. If you haven't made your mind up about which tool to use for data analysis, I highly recommend checking out dplyr in R, which has an excellent free book online (R for data science, hadley wickham). I find it very easy to learn and it is much easier to set up R and RStudio than it is to set up Python, even though I love Python and Pandas.

Concur with other reviews. Give it a 1 - very poor for beginner. Requires substantial experience with Python programming, and its still changing development platform, to fully take advantage of the textbook and to understand the rather contorted examples. Better yet, order Python Data Analytics: With Pandas, NumPy, and Matplotlib by Fabio Nelli. Excellent introduction to Python with a clear learning experience of Data Analytics. Mr. McKinney's book will make more sense after reading Fabio's book.

Great book, though a bit dry and slow. My only complaint is that I would have preferred more examples, but I don't think that was the aim of this book, and I think it accomplished its goal of being a good general resource for people beginning their career/learning with Python and data manipulation. Not comprehensive by any means, but thorough enough for any beginner/intermediate Python programmer.

I am not a programmer, but have been trying to learn python for data analysis for a while. This book does a great job of explaining some basics that other books/programs tend to skip over. Also seems like python is even easier to work with now than it was just a few years ago. If you have tried to pick up these skills without success before, this book might be a good way to re-enter.

This is one of the best Python books I've used. All of the examples worked which is rare in programming books. Some of them had slightly different output, but I believe that may be due to time lapsed since the book was published and my working through it in Mac OS. Highly recommend, but be sure you have some previous Python experience.

This book gave me my first job. And I am still learning from it. It is simple, gives some general idea of why functions are designed the way they are, and introduces some practical functions. Because in a real job you always need to look up documentation or google certain functions, I think understanding why Wes designs functions/variables this way, and what he wants to develop in the future, is very important. Anyway, I think this book is for data analysis beginners and some intermediate users. I learned Python first, so I recommend that beginners who want to use Python as a Data Analyst/Scientist learn Python programming first or simultaneously. At least understand lambda and Python expressions; otherwise, you can't feel the full magic.

I got this book when I was transitioning to doing data science with Python and was struggling to become familiar with standard tools. It's written by the creator of Pandas, and follows the style of the Pandas documentation: dense, telegraphic, peppered with examples. It's hard work because Wes McKinney often does not articulate why you would need to do something (assuming you are already knowledgeable on the underlying process), and writes like an impatient person who would rather be doing something else. Additionally examples often suffer from being both too long and too short - too long in that almost every example is on a toy dataset created from scratch, too short in that most of those datasets have only 5 or 10 elements and do not always showcase complex operations. Other examples (particularly involving time series) have an overabundance of data that make the critical results hard to spot. Frankly, my first month with Pandas was a miserable one. But I give the book 5 stars both because I came to love Pandas as I got more familiar with it, and because while McKinney is not fun to read, he does pack the book with useful information and it is (mostly) well organized. If anything it would benefit from being longer and with a more patient treatment of larger and more concrete datasets (e.g. the Titanic passenger dataset used in the Pandas documentation). The initial chapter on the basics of using Python could go - if you need this book, then you don't want to be trying to learn the rudiments of Python from it. If you can accept that you'll need a lot of bookmarks or margin notes to get through a rather steep learning curve, it will reward your persistence.