v0.13.0 (January 3, 2014)¶
This is a major release from 0.12.0 and includes a number of API changes, several new features and enhancements along with a large number of bug fixes.
Highlights include:
- support for a new index type
Float64Index, and other Indexing enhancements HDFStorehas a new string based syntax for query specification- support for new methods of interpolation
- updated
timedeltaoperations - a new string manipulation method
extract - Nanosecond support for Offsets
isinfor DataFrames
Several experimental features are added, including:
- new
eval/querymethods for expression evaluation - support for
msgpackserialization - an i/o interface to Google’s
BigQuery
Their are several new or updated docs sections including:
- Comparison with SQL, which should be useful for those familiar with SQL but still learning pandas.
- Comparison with R, idiom translations from R to pandas.
- Enhancing Performance, ways to enhance pandas performance with
eval/query.
Warning
In 0.13.0 Series has internally been refactored to no longer sub-class ndarray
but instead subclass NDFrame, similar to the rest of the pandas containers. This should be
a transparent change with only very limited API implications. See Internal Refactoring
API changes¶
read_excelnow supports an integer in itssheetnameargument giving the index of the sheet to read in (GH4301).Text parser now treats anything that reads like inf (“inf”, “Inf”, “-Inf”, “iNf”, etc.) as infinity. (GH4220, GH4219), affecting
read_table,read_csv, etc.pandasnow is Python 2/3 compatible without the need for 2to3 thanks to @jtratner. As a result, pandas now uses iterators more extensively. This also led to the introduction of substantive parts of the Benjamin Peterson’ssixlibrary into compat. (GH4384, GH4375, GH4372)pandas.util.compatandpandas.util.py3compathave been merged intopandas.compat.pandas.compatnow includes many functions allowing 2/3 compatibility. It contains both list and iterator versions of range, filter, map and zip, plus other necessary elements for Python 3 compatibility.lmap,lzip,lrangeandlfilterall produce lists instead of iterators, for compatibility withnumpy, subscripting andpandasconstructors.(GH4384, GH4375, GH4372)Series.getwith negative indexers now returns the same as[](GH4390)Changes to how
IndexandMultiIndexhandle metadata (levels,labels, andnames) (GH4039):# previously, you would have set levels or labels directly >>> pd.index.levels = [[1, 2, 3, 4], [1, 2, 4, 4]] # now, you use the set_levels or set_labels methods >>> index = pd.index.set_levels([[1, 2, 3, 4], [1, 2, 4, 4]]) # similarly, for names, you can rename the object # but setting names is not deprecated >>> index = pd.index.set_names(["bob", "cranberry"]) # and all methods take an inplace kwarg - but return None >>> pd.index.set_names(["bob", "cranberry"], inplace=True)
All division with
NDFrameobjects is now truedivision, regardless of the future import. This means that operating on pandas objects will by default use floating point division, and return a floating point dtype. You can use//andfloordivto do integer division.Integer division
In [3]: arr = np.array([1, 2, 3, 4]) In [4]: arr2 = np.array([5, 3, 2, 1]) In [5]: arr / arr2 Out[5]: array([0, 0, 1, 4]) In [6]: pd.Series(arr) // pd.Series(arr2) Out[6]: 0 0 1 0 2 1 3 4 dtype: int64
True Division
In [7]: pd.Series(arr) / pd.Series(arr2) # no future import required Out[7]: 0 0.200000 1 0.666667 2 1.500000 3 4.000000 dtype: float64
Infer and downcast dtype if
downcast='infer'is passed tofillna/ffill/bfill(GH4604)__nonzero__for all NDFrame objects, will now raise aValueError, this reverts back to (GH1073, GH4633) behavior. See gotchas for a more detailed discussion.This prevents doing boolean comparison on entire pandas objects, which is inherently ambiguous. These all will raise a
ValueError.>>> df = pd.DataFrame({'A': np.random.randn(10), ... 'B': np.random.randn(10), ... 'C': pd.date_range('20130101', periods=10) ... }) ... >>> if df: ... pass ... Traceback (most recent call last): ... ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). >>> df1 = df >>> df2 = df >>> df1 and df2 Traceback (most recent call last): ... ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). >>> d = [1, 2, 3] >>> s1 = pd.Series(d) >>> s2 = pd.Series(d) >>> s1 and s2 Traceback (most recent call last): ... ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Added the
.bool()method toNDFrameobjects to facilitate evaluating of single-element boolean Series:In [1]: pd.Series([True]).bool() Out[1]: True In [2]: pd.Series([False]).bool()