#215 A Visual Introduction to NumPy

Python Bytes - A podcast by Michael Kennedy and Brian Okken - Mondays

Categories:

Sponsored by us! Support our work through: Our courses at Talk Python Training pytest book Patreon Supporters Special guest: Jason McDonald Michael #1: 5 ways I use code as an astrophysicist Video by Dr. Becky (i.e. Dr Becky Smethurst, an astrophysicist at the University of Oxford) She has a great YouTube channel to check out. #1: Image Processing (of galaxies from telescopes) Noise removal #2: Data analysis Image features (brightness, etc) One example: 600k “rows” of galaxy properties #3: Model fitting e.g. linear fit (visually as well through jupyter) e.g. Galaxies and their black holes grow in mass together Color of galaxies & relative star formation #4: Data visualization #5: Simulations Beautiful example of galaxies colliding Star meets black hole Brian #2: A Visual Intro to NumPy and Data Representation Jay Alammar I’ve started using numpy more frequently in my own work. Problem: I think of np.array like a Python list. But that’s not right. This visualization guide helped me think of them differently. Covers: arrays creating arrays (I didn’t know about np.ones(), np.zeros(), or np.random.random(), so cool) array arithmetic indexing and slicing aggregation with min, max, sum, mean, prod, etc. matrices : 2D arrays matrix arithmetic dot product (with visuals, it takes seconds to understand) matrix indexing and slicing matrix aggregation (both all entries and column or row with axis parameter) transposing and reshaping ndarray: n-dimensional arrays transforming mathematical formulas to numpy syntax data representation All with excellent drawings to help visualize the concept. Jason #3: Qt 6 release (including PySide2) Qt 6.0 released on December 8: https://www.qt.io/blog/qt-6.0-released 3D Graphics abstraction layer called RHI (Rendering Hardware Interface), eliminating hard dependency on OpenGL, and adding support for DirectX, Vulkan, and Metal. Uses native 3D graphics on each device by default. Property bindings: https://www.qt.io/blog/property-bindings-in-qt-6 A bunch of refactoring to improve performance. QtQuick styling CAUTION: Many Qt 5 add-ons not yet supported!! They plan to support by 6.2 (end of September 2021). Pay attention to your 5.15 deprecation warnings; those things have now been removed in 6.0. PySide6/Shiboken6 released December 10: https://www.qt.io/blog/qt-for-python-6-released New minimum version is Python 3.6, supported up to 3.9. Uses properties instead of (icky) getters/setters now. (Combine with snake_case support from 5.15.2) from __feature__ import snake_case, true_property PyQt6 also just released, if you prefer Riverbank’s flavor. (I prefer official.) Michael #4: Is your GC hyper active? Tame it! Let’s think about gc.get_threshold(). Returns (700, 10, 10) by default. That’s read roughly as: For every net 700 allocations of a collection type, a gen 0 collection runs For every gen 0 collection run, 1/10 times it’ll be upgraded to gen 1. For every gen 1 collection run, 1/10 times it’ll be upgraded to gen 2. Aka for every 100 gen 0 it’s upgraded to gen 2. Now consider this: query = PageView.objects(created__gte=yesterday).all() data = list(query) # len(data) = 1,500 That’s multiple GC runs. We’ve allocated at least 1,500 custom objects. Yet never ever will any be garbage. But we can adjust this. Observe with gc.set_debug(gc.DEBUG_STATS) and consider this ONCE at startup: # Clean up what might be garbage gc.collect(2) # Exclude current items from future GC. gc.freeze() allocs, gen1, gen2 = gc.get_threshold() allocs = 50_000 # Start the GC sequence every 10K not 700 class allocations. gc.set_threshold(allocs, gen1, gen2) print(f"GC threshold set to: {allocs:,}, {gen1}, {gen2}.") May be better, may be worse. But our pytest integration tests over at Talk Python Training run 10-12% faster and are a decent stand in for overall speed perf. Our sitemap was doing 77 GCs for a single page view (77!), now it’s 1-2. Brian #5: Top 10 Python libraries of 2020 tryolabs criteria They were launched or popularized in 2020. They are well maintained and have been since their launch date. They are outright cool, and you should check them out. General interest: Typer : FastAPI for CLI applications I can’t believe first commit was right before 2020. Just about a year after the introduction of FastAPI, if you can believe it. Sebastián Ramírez is on 🔥 Rich : rich text and beautiful formatting in the terminal. Will McGugan yep. showed up in 2020, amazing. Dear PyGui : Python port of the popular Dear ImGui C++ project. PrettyErrors : transforms stack traces into color coded, well spaced, easier to read stack traces. Diagrams : lets you draw the cloud system architecture without any design tools, directly in Python code. Machine Learning: Hydra and OmegaConf PyTorch Lightning Hummingbird HiPlot : plotting high dimensional data Also general Scalene : CPU and memory profiler for Python scripts capable of correctly handling multi-threaded code and distinguishing between time spent running Python vs. native code, without having to modify your code to use it. Jason #6: Adoption of pyproject.toml — why is this so darned controversial? The goal of this file is to have a single standard place for all Python tool configurations. It was introduced in PEP 518, but the community seems divided over standardizing. A bunch of tools are lagging behind others in implementing. Tracked in this repo A few of the bigger “sticking points”: setuptools is working on it: https://github.com/pypa/setuptools/issues/1688 MyPy: GVR says it “doesn’t solve anything” and closed the PR. https://github.com/python/mypy/issues/5205 Flake8 objections: https://gitlab.com/pycqa/flake8/-/issues/428#note_251982786 Lack of standard TOML parser. “pip to change its behavior so mere presence of the file does not change functionality” Flake9 already implemented what Flake8 wouldn’t. Is this political? Bandit is sitting on a PR since 2019: https://github.com/PyCQA/bandit/issues/550 ReadTheDocs: It’s too much work? — https://github.com/readthedocs/readthedocs.org/issues/7065 PyOxidizer (shockingly), silent on the topic since 2019: https://github.com/indygreg/PyOxidizer/issues/93 Extras: Michael: PyXLL for Excel people, including Python Jupyter Notebooks in Excel. Django 3.1.5 Released Python 3.10.0a4 Is Now Available for Testing SciPy 1.6.0 Released M1 + PyCharm fast? Example. Flying solo with the M1 too - apparently 75% is shutdown time for my MBP! Joke “Why did the programmer always refuse to check his code into the repository? He was afraid to commit.”