{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "\n", "# Introduction to Python \n", "\n", "This material was prepared at the [Luthey-Schulten Group](http://www.scs.illinois.edu/schulten/), University of Illinois at Urbana-Champaign, for the [_\"Hands-On\" Workshop on Cell Scale simulations_](http://www.ks.uiuc.edu/Training/Workshop/Urbana2018e/).\n", "\n", "This introduction will cover main aspects of Python, some of it's main libraries, and Jupyter Notebooks. We will focus on concepts and techniques that will be used throughout the _\"Hands-On\" Workshop on Cell Scale simulations_. We will begin from the basics of the language, assuming you have never seen it before, but also assuming you have had *some* programing experience.\n", "\n", "Overview: \n", "\n", "- [Python language](#intro_python)\n", " - Interpreted Modern Day Stuff\n", " - Installation (system package/Anaconda)\n", " - Python 3 vs 2\n", "- [Jupyter Notebook (What is this thing I am looking at?)](#intro_notebook)\n", " - [Shortcuts!](#intro_shortcuts)\n", "- [Variables and Collections](#intro_vac)\n", " - [Native Types and Dynamic Typing](#intro_ntdt)\n", " - [Everything is a class](#intro_eiac)\n", " - [Numbers](#intro_numbers)\n", " - [Strings/bytes](#intro_strings)\n", " - Basic methods\n", " - [Lists](#intro_lists)\n", " - Indexing, slicing, negative indices\n", " - Basic methods\n", " - [Tuples](#intro_tuples)\n", " - [Sets](#intro_sets)\n", " - Basic methods\n", " - [Dictionaries](#intro_dictionaries)\n", " - Basic methods\n", " - [is vs. equals (or, value vs. reference)](#intro_ive)\n", " - [None](#intro_none)\n", "- [Control Flow](#intro_controlflow)\n", " - [Indentation and Scope](#intro_ias)\n", " - [If/Else](#intro_ifelse)\n", " - [For/While Loops](#intro_fwl)\n", " - Continue/Break/Else\n", " - [Try/Except](#intro_tryexcept)\n", "- [Functions](#intro_functions)\n", " - [Def/Lambda](#intro_deflambda)\n", " - [Arguments (and default arguments )](#intro_arguments)\n", " - \\*args and \\*\\*kwargs\n", " - [Comments and Doc-Strings](#intro_cads)\n", " - [Scope](#intro_scope)\n", "- [Classes](#intro_classes)\n", " - [Encapsulation](#intro_encapsulation)\n", " - [Inheritance](#intro_inheritance)\n", " - [Polymorphism](#intro_polymorphism)\n", "- [Iterators and Generators](#intro_iag)\n", " - [Comprehensions/range/map](#intro_crm)\n", "- [Modules](#intro_modules)\n", " - [Import syntax](#intro_importsyntax)\n", " - [Install new modules](#intro_inm)\n", "- [File IO](#intro_fio)\n", " - [open() and *with*](#intro_oaw)\n", " - [csv/Pickle/Json](#intro_cpj)\n", "- [Virtual Environments](#intro_ve)\n", " - [python -m venv](#intro_pve)\n", " - [conda new](#intro_condanew)\n", "- [Scientific Modules](#intro_scientificmodules)\n", " - [Numpy/Scipy and Matplotlib](#intro_nsm)\n", " - [Pandas](#intro_pandas)\n", " - Wide vs Long data formats: melting and casting\n", " - [Plotnine](#intro_plotnine)\n", " - [Cython and Numba](#intro_can)\n", " - [Mpi4Py](#intro_mpi4py)\n", "- [Jupyter](#intro_jupyter)\n", " - [ipywidgets](#intro_ipywidgets)\n", "- [Integrations](#intro_integrations)\n", " - [Magics](#intro_magics)\n", " - [pybind11](#intro_pybind11)\n", "- [Sources and Aknowledgements](#intro_saa)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "\n", "########\n", "\n", "# The Python Language \n", "\n", "[Python][1] was created almost 30 years go as an interpreted, high-level, object-oriented programing language. 
It allows for different interactive environments (such as the notebook you are looking at), and lets one develop, test and distribute code much faster than traditional compiled languages like C/C++ and Fortran.\n", "\n", "[1]: https://www.python.org/\n", "\n", "## Installation\n", "\n", "- In **windows**, use [Anaconda][2].\n", "\n", "- In **MacOS** (which is almost Windows), also use [Anaconda][3].\n", "\n", "- In **Linux**, Python should come pre-installed (if not, use your package manager to add it), but you will need to install many interesting packages. You can do that using the `pip` tool, as in\n", "\n", " `pip install scipy`\n", "\n", "We will talk about creating individual environments (which would look like different python installations), so that one can organize packages, keep different versions of the same package, or keep conflicting packages in the same computer.\n", "\n", "[2]: https://conda.io/docs/user-guide/install/windows.html\n", "[3]: https://conda.io/docs/user-guide/install/macos.html\n", "\n", "## Python 2 vs 3\n", "\n", "Python 3 is better! Also, [\"The End Of Life date (EOL, sunset date) for Python 2.7 has been moved five years into the future, to 2020.\"](https://www.python.org/dev/peps/pep-0373/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Jupyter Notebook (What is this thing I am looking at?) \n", "\n", "A notebook is a special type of interactive interface that allows us to combine text, image and video, with code blocks that are executed on demand, and can create interactive interfaces, as we will see later in this introduction and along the tutorials in this workshop.\n", "\n", "The image below describes some of the toolbar functionalities, but you will find some useful shortcuts below.\n", "\n", "Image of Jupyter interface with descriptive labels\n", "\n", "
[(credits)](https://github.com/michhar/python-jupyter-notebooks)
\n", "\n", "- **shift + enter**\n", "\n", "The next \"cell\" contains python code, and we can execute that code clicking on the **Run cell** button (on the toolbar above) or by typing **shift + return** or **shift + enter**." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "1 + 2 * 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- **control + enter**\n", "\n", "You will note that the code was executed, the answer was shown below the executed cell, and the following cell (this text block) was selected automatically. You can use the arrow keys to select the next code block, and then use **control + enter** to execute the code block *without* automatically selecting the next cell (particularly useful when writing and testing new code in a cell)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Hello World!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The previous cell had a `print` statement. A jupyter notebook cell will always print the last variable or results it creates, like the first cell you ran with a mathematical statement. However, if you want to print several results from the same cell, or simply format and control the output, you can use the `print` statement. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- **alt + enter**\n", "\n", "The last (of the most useful) shortcut is the **alt + enter** (or **option + enter** in Macs). This will execute the cell and automatically *create a new cell* bellow the one executed. \n", "\n", "Once created, a cell can be **deleted** by typing **d,d** (typing the \"d\" key twice), and **recovered** by typing **z** (typing the \"z\" key once).\n", "\n", "Make sure you are not **editing** the text or code inside the cell! You can easily tell if you are in edit mode by checking if there is a little **pencil** on the upper right corner of the notebook, next to the \"Python 3\".\n", "You can exit the edit mode by typing **esc** or by clicking outside the cell. Once you exit the edit mode, the blinking cursor will disappear and the color of the vertical bar to the left of the cell will change from **green** to **blue**." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"alt + enter\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All cells share the same python interpreter, meaning variables you create in one cell will be available in the next cell(s). Also, changes made in any following cell will affect the current cell. Try executing the next three cells (notice they have *comments* in lines that begin with a \"hash\" `#`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Cell 1\n", "var = 123" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Cell 2\n", "print(var)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Cell 3\n", "var = 456" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now go back two cells (to the cell with the `print(var)` statement), and execute it again. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Shortcuts! \n", "
[(top)](#goto_top)
\n", "\n", "* A complete list can be found under help, but these are some of the more commonly used shortcuts. There is a *command* mode and *edit* mode much like the unix editor `vi/vim`. `Esc` will take you into command mode. `Enter` (when a cell is highlighted) will take you into edit mode.\n", "\n", "Mode | What | Shortcut\n", "------------- | ------------- | -------------\n", "Command (Press `Esc` to enter) | Run cell | Shift-Enter\n", "Command | Add cell below | B\n", "Command | Add cell above | A\n", "Command | Delete a cell | d-d\n", "Command | Undo delete cell | z\n", "Command | Go into edit mode | Enter\n", "Command | Exit edit mode | Esc\n", "Edit (Press `Enter` to enable) | Run cell | Shift-Enter\n", "Edit | Indent | Clrl-] (**or** Tab)\n", "Edit | Unindent | Ctrl-[ (**or** Shift-Tab)\n", "Edit | Toggle comment section | Ctrl-/\n", "Edit | Function introspection | Shift-Tab*\n", "Both | Run cell and select next | Shift-Enter\n", "Both | Run cell and keep selected cell | Ctrl-Enter\n", "Both | Run cell and *add* cell below | Alt-Enter\n", "\n", "\\* For function introspection instead of indent, the edit cursor inside function call\n", "\n", "**You can also left-double-click with the mouse to \"Enter\" a markdown cell for modifying text**\n", "\n", "(this shortcut table was adapted from [Azure Notebooks - Welcome][1])\n", "[1]:https://notebooks.azure.com/Microsoft/libraries/samples/html/Azure%20Notebooks%20-%20Welcome.ipynb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Variables and Collections \n", "\n", "You will have noticed from the previous cells and examples that the python interpreter can serve as a simple calculator when used with simple numbers. They do not need to be defined as variables, and will automatically match the necessary precision level, assuming roles that other languages would have assigned to an integer, long or a float.\n", "\n", "Python variables can be of many other types, however, from numbers to strings to *containers* like lists, sets and dictionaries, among others, not to mention user-defined types. When created, a variable will be given its space in memory automatically, and when it goes out of scope (more on that later) and is not needed anymore, the interpreter automatically deletes it and frees the memory.\n", "\n", "## Native Types and Dynamic Typing \n", "
[(top)](#goto_top)
\n", "\n", "As we mentioned before, variables are created as they are used, and both their *values* and *types* can be changed any time, anywhere." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Defining the variable \"numb\" that will hold a real number\n", "numb = 123.45\n", "numb" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The same variable can be re-used to store a new value, but of a different type:\n", "numb = \"one hundred and twenty three\"\n", "numb" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can find out if a variable belong to a given type using the \"isinstance\" method.\n", "numb = 2\n", "\n", "print(isinstance(numb, int)) # Is it an integer?\n", "print(isinstance(numb, float)) # Is it a floating point number?\n", "print(isinstance(numb, str)) # Is it a string?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "numb = \"2\"\n", "\n", "print(isinstance(numb, int))\n", "print(isinstance(numb, float))\n", "print(isinstance(numb, str))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Or simply asking for its \"type\"\n", "\n", "print(type(numb))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that we used several `print` calls to output the results of the function `isinstance`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Everything is a Class \n", "
[(top)](#goto_top)
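\n", "\n", "As a quick, runnable preview (a small sketch; nothing here is specific to this workshop), even a plain literal is an object that carries methods and attributes:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Even literals are objects: they carry methods and attributes.\n",
"print( (42).bit_length() )        # number of bits needed to represent 42\n",
"print( (3.14).is_integer() )      # False\n",
"print( \"hello\".upper() )          # 'HELLO'\n",
"print( type(42), type(\"hello\") )  # the classes behind the literals" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "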
\n", "\n", "Everything is a class, but what is a class?\n", "\n", "Python is built around the notion of Object-Oriented Programming (OOP). We will focus on properties of classes and creation of user-defined classes later on, but for now, we will go over python's native types, which are all defined as classes. What that means is, unlike traditional compiled languages like C/C++ or Fortran, a number is not a single region in memory that holds integer or floating point information, it is a combination of properties and functions. A string is not a simple vector of characters, it is a dynamic list of elements that grows and shrinks as needed, and has several convenience functions available to it at the moment of its creation.\n", "\n", "That is why, in the examples above, we verified if a variable was an `int` with the function `isinstance`. That function checks if the variable is an instance of the class `int`.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Numbers \n", "
[(top)](#goto_top)
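\n", "\n", "Before going through the details below, here is a compact sketch of the basic operators (note that integer, or *floor*, division is written with `//`):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# A compact tour of the basic numeric operators.\n",
"print(7 / 2)      # true division always returns a float: 3.5\n",
"print(7 // 2)     # floor (integer) division: 3\n",
"print(7 % 2)      # remainder: 1\n",
"print(2 ** 10)    # exponentiation: 1024\n",
"print(complex(1, 2) * 1j)   # complex arithmetic: (-2+1j)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "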
\n", "\n", "The basic numerical types are `int`, `float` and `complex`. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "num_i = 42\n", "num_f = 3.14\n", "num_c = 1 + 2j" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print( type(num_i))\n", "print( type(num_f))\n", "print( type(num_c))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The python interpreter will accept all classic opperators, and parenthesis for grouping.\n", "40 - num_i * 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(40 - num_i) * 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Division will always return a float, but integer division can be requested using `\\\\`.\n", "\n", "print(num_f / 2)\n", "print(num_f // 2) # Integer division (or floor division)\n", "print(num_f % 2) # Remainder of the division" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Conversions can be achieved by calling the class name.\n", "\n", "x = int(42) \n", "y = int(num_f) # Will keep only the integer part of the float.\n", "z = int(\"42\")\n", "\n", "print(x)\n", "print(y)\n", "print(z)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Strings \n", "
[(top)](#goto_top)
\n", "\n", "Python is particularly powerful when it comes to string operations. The string class provides a comprehensive set of operations, making it very easy to create, edit, parse and format strings." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "'We can create a string using single quotes...'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"... or double quotes.\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Calling the `str` method would be redundant.\n", "s = str(\"This is also a string\")\n", "s" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# You may need to escape the quote, if you want it in the output:\n", "print('\"Isn\\'t,\" she said.')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The backslash can also be used to create special characters:\n", "\n", "s = 'First line.\\nSecond line.' # \\n means newline\n", "s # without print(), \\n is included in the output" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(s) # with print(), \\n produces a new line" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If you don't want any processing of special characters to be done, use a RAW string:\n", "\n", "print(r'C:\\some\\name') # note the r before the quote" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# For multiple lines, use triple quotes:\n", "\n", "print(\"\"\"\\\n", "Usage: thingy [OPTIONS]\n", " -h Display this usage message\n", " -H hostname Hostname to connect to\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Strings can be added and multiplied\n", "\"one\" * 3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# String literals (enclosed in quotes) will be automatically concatenated\n", "\"one\" \"two\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var1 = \"one\"\n", "var2 = \"two\"\n", "var1 + var2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var1 + var2 * 3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var1 + \" - \" + var2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "They can be **accessed** with indices but not **modified**, they are *immutable*. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var = \"variables\"\n", "\n", "print(var[0:3])\n", "print(var[3:6])\n", "print(var[6:9])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Open ends\n", "print(var[1:])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# WARNING\n", "#var[6:9] = \"var\" # This will NOT work!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# But this will:\n", "\"V\" + var[1:]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Indexing from the end of the string is also possible:\n", "print(var[-1])\n", "print(var[-3:])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# And the *length* of the string is easy to find:\n", "len(var)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Basic Methods\n", "\n", "There are many methods implemented in the string class, but we will only highlight a couple here which may prove particularly useful." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Spliting using an arbitrary character returns a list of strings\n", "bla = \"123 ia a number, 456 is another number.\"\n", "\n", "print(bla.split(','))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Substitutions are easy\n", "bla.replace(\"number\",\"cow\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# It is easy to investigate a string:\n", "print( bla.find(\"number\") ) # Tells you if the substring appears in the main string (returns its index)\n", "print( bla.count(\"number\") ) # Counts the number of times the substring appears in the main string \n", "print( bla.index(\"number\") ) # Returns the index of the substring, or an ERROR if the string is not there." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Note the difference between this:\n", "print( bla.find(\"cow\") ) # Returns -1 because there are no \"cow\"s" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# And this:\n", "#print( bla.index(\"cow\") ) # Errors out!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Starts (or ends) with?\n", "bla = \"Variable\"\n", "bla.startswith(\"Var\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can check if a string contains ONLY digits, or also contains letters and other characters\n", "bla = \"this is a number 123\"\n", "bla.isdigit()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# However, if the string is only composed of digits\n", "bla = \"123\"\n", "bla.isdigit()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that string methods can also be called from a string definition, without explicitly creating a variable. The python interpreter understands the use of quotes as the creation of an instance of the `string` class, and gives it all the power of the class." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"bla\".upper()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\" -bla- \".strip())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "String methods can be called in sequence." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bla = \"Variable\" # Notice the uppercase \"V\"\n", "bla.lower().startswith(\"var\") # First transforms all letters in lower case, then looks for the substring." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lists \n", "
[(top)](#goto_top)
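\n", "\n", "As a quick preview of the list operations covered below (the numbers are just illustrative values):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Build, index, slice and sort a list (values are illustrative).\n",
"masses = [12.011, 1.008, 15.999]\n",
"masses.append(14.007)          # add one element at the end\n",
"print(masses[0], masses[-1])   # first and last element\n",
"print(masses[1:3])             # a slice with elements 1 and 2\n",
"masses.sort()                  # sorts the list in place\n",
"print(masses)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "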
\n", "\n", "There are four main containers in Pyhton:\n", "\n", "- **List** is a collection which is ordered and changeable. Allows duplicate members.\n", "- **Tuple** is a collection which is ordered and unchangeable. Allows duplicate members.\n", "- **Set** is a collection which is unordered, changeable and unindexed. No duplicate members.\n", "- **Dictionary** is a collection which is unordered, changeable and indexed. No duplicate members.\n", "\n", "\n", "Lists are one of the most powerful and flexible tools in python. They can be easily created and modified, and the same list can contain members of different types." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The pythonic way of creating a list:\n", "bla = [1,2,3,4,5]\n", "bla" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Appending is easy\n", "bla = [1,2,3] + [4,5]\n", "bla" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Or a more \"classical\" way of creating a list:\n", "bla = list((1,2,3,4,5)) # Notice the double parenthesis\n", "bla" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# And this is an empty list\n", "bla = []\n", "bla" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# And a mixed type one:\n", "bla = [1, \"two\", 3.14]\n", "bla" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Indexing, slicing, negative indices" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Like the string (or is the string like a list?)\n", "print(bla[1]) # This will return the content of the index 1 (indices start at ZERO)\n", "print(bla[1:]) # This will return a new string (a slice of the original string)\n", "print(bla[:1])\n", "print(bla[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Basic methods" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bla = []\n", "\n", "bla.append(1)\n", "bla.append(\"two\")\n", "bla.append(3.14)\n", "\n", "bla" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# These may be familiar\n", "print(bla.count(1))\n", "print(bla.index(3.14))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# pop will extract the last item of the list and return it:\n", "print( bla.pop() )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# now the list is one item shorter!\n", "print(bla)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bla = [4,2,6,1,3,7,5]\n", "bla.sort() # \"sort\" does not return anything, it orders the list *in-place*, changing the original variable. \n", "bla" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Combining a string method (join) with a list of strings!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "','.join( [\"A\",\"B\",\"C\"] ) # The method expects a list of strings, \n", " # and uses the main string to join the items in the list." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tuples \n", "
[(top)](#goto_top)
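\n", "\n", "A quick preview: in practice, tuples are most often met through *packing* and *unpacking*, as in the sketch below:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Packing and unpacking tuples.\n",
"point = (1.0, 2.0, 3.0)   # packing three values into one tuple\n",
"x, y, z = point           # unpacking them into three variables\n",
"print(x, y, z)\n",
"\n",
"a, b = 10, 20\n",
"a, b = b, a               # the classic swap, no temporary variable needed\n",
"print(a, b)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "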
\n", "\n", "Tuples are immutables, they can be built much like lists, with mixed types, but can never be changed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bla = (1,\"two\",3.14)\n", "bla" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This will NOT work!\n", "#bla[2] = 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Tuple can be nested (lists can too, by the way)\n", "bla = tuple((\"a\", \"b\", \"c\"))\n", "ble = (bla, 123)\n", "ble" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Tuples can also be easily created with commas \n", "# (this is helpful for returning funciton calls, which we will see latter on)\n", "ble = bla, 123\n", "ble" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sets \n", "
[(top)](#goto_top)
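\n", "\n", "A very common practical use of sets is removing duplicates from a list; a quick sketch before the details below:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Removing duplicates from a list by converting it to a set.\n",
"measurements = [1, 2, 2, 3, 3, 3]\n",
"unique = set(measurements)    # duplicates disappear\n",
"print(unique)\n",
"print(sorted(unique))         # back to an ordered list if needed" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "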
\n", "\n", "Sets are containers where no repeteated items are allowed. They are not ordered like lists and tuples, so cannot be accessed with an index, but unlike tuples they are mutable." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We use curly braces to create a set.\n", "bla = {1,\"two\", 3.14}\n", "bla" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An empty set needs to be created with the `set()` function. Using \"{}\" will create an empty **dictionary**!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bla = set()\n", "bla.add(1)\n", "bla.add(\"two\")\n", "bla" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# And deleting items is easy\n", "bla.remove(1)\n", "bla" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Basic methods\n", "\n", "Sets accept common mathematical set opperations:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Defining sets from strings will automatically use each letter as an individual item.\n", "a = set('abracadabra')\n", "b = set('alacazam')\n", "print(a) # unique letters in a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(a - b) # letters in a but not in b" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Union (OR)\n", "print(a | b) # letters in either a or b" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Intersection (AND)\n", "print(a & b) # letters in both a and b" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Exclusive OR (XOR)\n", "print(a ^ b) # letters in a or b but not both" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dictionaries \n", "
[(top)](#goto_top)
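\n", "\n", "As a preview, dictionaries shine whenever you need fast lookups by name; the sketch below maps residue names to masses (the values are only illustrative):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# A small lookup table: residue name -> mass (values are illustrative).\n",
"masses = {\"ALA\": 71.08, \"GLY\": 57.05, \"SER\": 87.08}\n",
"print( masses[\"GLY\"] )            # direct lookup by key\n",
"print( masses.get(\"TRP\", 0.0) )   # .get() returns a default instead of an error\n",
"masses[\"TRP\"] = 186.21            # adding a new entry\n",
"print( len(masses), \"entries\" )" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "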
\n", "\n", "Dictionaries are containers that link *keys* and *values* (AKA, a mapping type). Unlike lists and tuples, dictionaries are not ordered, they create a [hashing table][1] using the *keys* to quickly access their respective *values*.\n", "\n", "[1]:https://en.wikipedia.org/wiki/Hash_table" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This is an empty dictionary\n", "bla = {}\n", "# or this\n", "# bla = dict()\n", "print(bla)\n", "\n", "# This adds items to the dictionary\n", "bla[\"a\"] = 123\n", "bla[\"b\"] = 456\n", "print(bla)\n", "\n", "# This accesses items to a dictionary\n", "print(bla[\"a\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We just used strings as *key* to index a dictionary, and then used it to access a numeric *value*, but *keys* and *values* can be of any type, and even of mixed types in a same dictionary." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bla = dict()\n", "\n", "# Number as key and string as value\n", "bla[1] = \"a\"\n", "# Vice versa\n", "bla[\"label_B\"] = 234\n", "# Touples as both key and value\n", "bla[ (5,6,7) ] = (8,9)\n", "\n", "print(bla)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dictionaries can be indexed by many native types and by user-defined classes as well, as long as those are [*hashable*][1]. We don't need to go in this deep right now, but know that dictionaries, sets and frozensets require elements to be hashable. A list, for example is not, so it cannot be used as key.\n", "\n", "[1]: https://docs.python.org/3/glossary.html#term-hashable" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Indexing with a list a bad idea... will give you an ERROR!\n", "#bla[ [1,2,3] ] = 123" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can also create dictionaries explicitely:\n", "bla = { \"a\":1, \"b\" : 2 }\n", "bla" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Or use the `dict` constructor to build a dictionary from a list of key-value pairs.\n", "# In this case, we use a list of touples, but other combinations can be used.\n", "bla = dict( [(\"a\",1),(\"b\",2),(\"c\",3)] )\n", "bla" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Basic methods" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Removing items is easy\n", "bla = dict( [(\"a\",1),(\"b\",2),(\"c\",3)] )\n", "del(bla[\"a\"])\n", "bla" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can access all keys and all values easily:\n", "\n", "bla = dict( [(\"a\",1),(\"b\",2),(\"c\",3)] )\n", "\n", "print(bla.keys())\n", "print(bla.values())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Or all key-value pairs as a list of tuples\n", "print(bla.items())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can combine dictionaries too:\n", "bla.update( {\"x\":10, \"y\":11} )\n", "print(bla)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## is vs. equals (or, value vs. reference) \n", "
[(top)](#goto_top)
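\n", "\n", "This section contrasts *values* with *references*; the sketch below previews that distinction and adds one caveat about nested containers, using the standard `copy` module:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import copy\n",
"\n",
"a = [[1, 2], [3, 4]]\n",
"b = a.copy()            # a *shallow* copy: the outer list is new...\n",
"b[0][0] = 99            # ...but the inner lists are still shared!\n",
"print(a)                # [[99, 2], [3, 4]]\n",
"\n",
"c = copy.deepcopy(a)    # deepcopy duplicates the nested lists as well\n",
"c[0][0] = -1\n",
"print(a)                # unchanged this time" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "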
\n", "\n", "Depending on the variable or container, we can easily determine if a variable **has the same value as** another, or if it **is the same** variable as another." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Assignes the value to the variable\n", "var1 = [1,2,3]\n", "var2 = [1,2,3]\n", "\n", "# Checks if the value in the variable `var1` equals the values in `var2`\n", "print(var1 == var2)\n", "# Checks if the name `var1` accesses the same variable as the name `var2`\n", "print(var1 is var2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If we assign one to the other:\n", "var1 = var2\n", "\n", "# Checks if the value in the variable `var1` equals the values in `var2`\n", "print(var1 == var2)\n", "# Checks if the name `var1` accesses the same variable as the name `var2`\n", "print(var1 is var2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# In this case, if we operate on the second list, we will change the first!\n", "\n", "var1[1] = \"one\"\n", "\n", "print(var2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the examples above, we have assigned one list to another, and that made both variables share the same **reference** to a given value (namely, the list \"1,2,3\").\n", "However, in some cases we want to keep two separate and independent lists, and just copy their values. For that, one uses the function `copy`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var1 = [\"a\",\"b\",\"c\"]\n", "var2 = [1,2,3]\n", "\n", "var1 = var2.copy()\n", "\n", "print(var1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now we modify var1\n", "var1[0] = \"zero\"\n", "\n", "print(var1)\n", "print(var2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Another alternative is to use the `slice` notation:\n", "var1 = var2[:]\n", "\n", "print(var1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pythonic \"in\" \n", "
[(top)](#goto_top)
\n", "\n", "The more one reads about Python, the more one hears about the \"pythonic\" way of writing code. This usually means making code more readable, hiding complex features inside classes, and using the language operators (+, -, is, in, etc.) to answer questions.\n", "\n", "There are examples everywhere, but a simple one would be to check if a value is in a list. The C/C++ way would iterate over the list and check every value. We *can* use the `index` method and check if it returns an index equal to or larger than zero, or an error. Or we can just **ask** if the value is in the list, the **pythonic** way." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Creates containers\n", "var_list = [123, 456, 789]\n", "var = 789\n", "\n", "# Checks for presence in the container\n", "\n", "# 1: Loop and check\n", "for x in var_list:\n", " if x == var:\n", " print(\"True\")\n", " break\n", "\n", "# 2: Use the `index` method\n", "print( var_list.index(var) >= 0 )\n", "\n", "# 3: Just ask\n", "print( var in var_list )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## None \n", "
[(top)](#goto_top)
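\n", "\n", "The conventional way to test whether a variable holds `None` is the `is` operator; a quick sketch:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"result = None\n",
"if result is None:\n",
"    print(\"No value yet\")\n",
"\n",
"# Many in-place methods deliberately return None:\n",
"bla = [3, 1, 2]\n",
"print( bla.sort() )   # prints None; the list itself was sorted in place\n",
"print( bla )" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "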
\n", "\n", "**None** is a special class that denotes \"lack of value\", and is used extensively in Python, particularly as a return value of functions and operations that fail. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Control Flow \n", "\n", "## Indentation and Scope \n", "\n", "In Python, code indentation defines scope. This forces code to be written in an *understandable* way, and makes it very clear what is in or out of a function or a loop.\n", "We will start with if/else statements:\n", "\n", "## If/Else \n", "
[(top)](#goto_top)
\n", "\n", "In any `if` statement, the `else` clause is optional, and the `elif` (short for `else if` found in other languages) avoids excessive indentation, and provides an alternative to the `switch` or `case` statements." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Feel free to change this value!\n", "x = 3\n", "\n", "if x < 0:\n", " x = 0\n", " print('Negative changed to zero')\n", "elif x == 0:\n", " print('Zero')\n", "elif x == 1:\n", " print('Single')\n", "else:\n", " print('More')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## For/While Loops \n", "
[(top)](#goto_top)
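\n", "\n", "Once you are comfortable with the basic loops below, two helpers worth knowing about are `enumerate` and `zip`; a small sketch (the lists are made up):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"names  = [\"ala\", \"gly\", \"ser\"]\n",
"masses = [71.08, 57.05, 87.08]\n",
"\n",
"# enumerate() gives the index together with the item\n",
"for i, name in enumerate(names):\n",
"    print(i, name)\n",
"\n",
"# zip() walks over two (or more) lists in parallel\n",
"for name, mass in zip(names, masses):\n",
"    print(name, mass)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "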
\n", "\n", "Similar to the If/Else clause, loops are defined and enclosed based on indentation.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "xlist = [1,2,3,4]\n", "for x in xlist:\n", " print(x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "index = 0\n", "while index < len(xlist):\n", " print(xlist[index])\n", " index += 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One can create ranges of numbers dynamically using the `range` command:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Note that the values start from zero.\n", "for x in range(5):\n", " print(x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Or within a defined range.\n", "for x in range(5,8):\n", " print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cntinue/Break/Else \n", "\n", "We can control the progression of loops using special keywords.\n", " - `continue` **skips** the rest of the code in the loop and updates the looping variable.\n", " - `break` **stops** the code in the loop and exits.\n", " - `else` creates a code block executed when the loop terminates through exhaustion of the list (with for) or when the condition becomes false (with while), but not when the loop is terminated by a break statement." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for x in range(5):\n", " # Will skip printing numbers smaller than 3\n", " if x < 3:\n", " continue\n", " print(x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = 0\n", "while x < 10:\n", " print(x)\n", " # Will stop the loop when x equals 3\n", " if x == 3:\n", " break\n", " x += 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for x in range(3):\n", " print(x)\n", "else:\n", " print(\"Print all numbers!\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for x in range(10):\n", " print(x)\n", " if x == 3:\n", " break\n", "else:\n", " print(\"Print all numbers!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Try/Except \n", "
[(top)](#goto_top)
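\n", "\n", "Besides `except`, a `try` block can also carry an `else` clause (runs only when nothing failed) and a `finally` clause (always runs); a small sketch before the basics below:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"bla = \"3.14\"\n",
"try:\n",
"    value = float(bla)       # this may fail for arbitrary strings\n",
"except ValueError:\n",
"    print(\"Not a number!\")\n",
"    value = None\n",
"else:\n",
"    print(\"Conversion succeeded:\", value)\n",
"finally:\n",
"    print(\"This line always runs\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "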
\n", "\n", "The Try/Except block allows us to write code that may fail, and write code that will recover from the failure, and maybe continue down an alternative route.\n", "\n", "One good example of the Try/Except block in action is the more *pythonic* way of testing if a variable belongs to a certain type. It is common in python idiom to ask for forgiveness instead of permission." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bla = \"this is the number 123\"\n", "try:\n", " ans = int(bla) + 100\n", " print(ans)\n", "except:\n", " print(\"Could not convert string into an integer!\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bla = \"123\"\n", "try:\n", " ans = int(bla) + 100\n", " print(ans)\n", "except:\n", " print(\"Could not convert string into an integer!\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can catch specific types of exceptions:\n", "\n", "try:\n", " x = 1/0\n", "except ZeroDivisionError as err:\n", " print('Handling run-time error:', err)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "# We can raise custom exceptions and pass relevant information:\n", "\n", "try:\n", " raise Exception(123, 'Important INFO')\n", " \n", "except Exception as inst:\n", " print(type(inst)) # the exception instance\n", " print(inst.args) # arguments stored in .args\n", " print(inst) # __str__ allows args to be printed directly,\n", " # but may be overridden in exception subclasses\n", " arg1, arg2 = inst.args # unpack args\n", " print('argument 1 =', arg1)\n", " print('argument 2 =', arg2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# And exeptions that happen in other scopes are raised up the scopes untill they reach the first try/except block.\n", "# We will go over scopes a little later on.\n", "\n", "def this_fails():\n", " x = 1/0\n", "\n", "try:\n", " this_fails()\n", "except ZeroDivisionError as err:\n", " print('Handling run-time error:', err)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Functions \n", "
[(top)](#goto_top)
\n", "\n", "\n", "## Def/Lambda \n", "
[(top)](#goto_top)
\n", "\n", "Functions in python can be expressed in two different ways, with the \"classical\" form using `def` keyword, or using **lambda** definitions. We will go over the `def` first.\n", "\n", "Like previous code blocks for `if/else` statements or `for/while` loops, the code that belongs to a function will be all the code below the function definition that is shifted one indentation up: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Classical function deffinition\n", "\n", "def funcPrintX(x):\n", " print(x)\n", "\n", "# Now execute the function\n", "funcPrintX(\"bla\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Note the change in identaton:\n", "\n", "def funcPrintX2(x):\n", " print( x**2 )\n", "print(\"this is outside the function!\")\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now exwcute the function\n", "funcPrintX2(4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Lambdas** are *anonymous* functions, they are created quickly, with a very simple syntax, and allow flexibility for event handling. They are especially useful for short code that needs to be executed repeatedly, but cannot be entirely written at the time the software is written (due to lack of information, parameter values, etc.).\n", "\n", "The following example demonstrates the *format* of a `lambda` definition. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Lambda definition being defined.\n", "funcL = lambda x : x**2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for i in [1,2,3,4]:\n", " print( funcL(i) )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A more common use for lambda functions:\n", "\n", "Suppose we have a `list` of `tuples`, and we wish to sort the `list` based on the second element in the `tuples`. We can make use of the `sort` method in lists and use the `key` argument. The `key` expects a function that will return exaclty one value per item in the list, so instead of `def`ining a new function that will simply extract the second element in a tuple, we create a lambda right then and there." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Creates a list sorted alphabetically by the first tuple element.\n", "tlist = [(\"SampleA\", 1.243), (\"SampleB\", 0.243), (\"SampleC\", -3.243), (\"SampleD\", 5.243) ]\n", "\n", "# Use the list `sort` function with the `key` argument.\n", "# The lambda receives an item from the list, and returns the contents of index 1.\n", "tlist.sort(key = lambda item: item[1])\n", "\n", "# Prints the sorted list (rmember that `sort` will re-order the list in place, that is, will change the list itself).\n", "print(tlist)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# As opposed to:\n", "\n", "tlist = [(\"SampleA\", 1.243), (\"SampleB\", 0.243), (\"SampleC\", -3.243), (\"SampleD\", 5.243) ]\n", "\n", "def returnSecond(x):\n", " return x[1]\n", "\n", "# Use the list `sort` function with the `key` argument.\n", "# The lambda receives an item from the list, and returns the contents of index 1.\n", "tlist.sort(key = returnSecond)\n", "\n", "# Prints the sorted list (rmember that `sort` will re-order the list in place, that is, will change the list itself).\n", "print(tlist)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another good example is the creation of functions with fixed parameters defined during run-time. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Creates a function that *returns* another function\n", "def createFunc(a, b):\n", " return lambda x: a*x + b\n", "\n", "# Creates a line function with angular coefficient 2 and Y offset of 4.\n", "lineFun = createFunc(2, 4)\n", "\n", "for x in [0,1,2,3,4]:\n", " print( lineFun(x) )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Arguments (and default arguments ) \n", "
[(top)](#goto_top)
\n", "\n", "Diving deeper into `def`ining functions:\n", "\n", "Functions can be defined to have *positional* arguments and *keyword* arguments. Positional arguments are *required* when a function is called, but keyword arguments are not, they have default values that will be used in case the function call does not provide one. Keyword arguments **must** follow positional arguments." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Function definition with position arguments\n", "def funcPos(first, second, third):\n", " print(\"First:\", first)\n", " print(\"Second:\", second)\n", " print(\"Third:\", third)\n", "\n", "funcPos(1, 2, 3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "funcPos(3,2,1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Providing keywords for the arguments and default values.\n", "def funcKey(first=1, second=2):\n", " print(\"First:\", first)\n", " print(\"Second:\", second)\n", "\n", "funcKey(10, 20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Using the default value for the second argument.\n", "funcKey(10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# With keywords, you can provide values in *any order*\n", "funcKey(second=10, first=20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# And you can ommit any keyword(s) you want.\n", "funcKey(second=10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# With a mixed function declaration, you must always provide the positional argument(s)\n", "def funcPosKey(first, second=2, third=3):\n", " print(\"First:\", first)\n", " print(\"Second:\", second)\n", " print(\"Third:\", third)\n", "\n", "funcPosKey(5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "funcPosKey(5, third=30)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This WIL NOT work! Missing positional argument!\n", "#funcPosKey(third=30)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# You can also change the type of the default values\n", "funcPosKey(5, \"two\", 3.14)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \\*args and \\*\\*kwargs\n", "\n", "For even more flexibility, Python allows you to add extra arguments to a function call at run-time. For extra positional arguments, you can access the values using the `*args` keyword, and for extra keyword arguments, use the `**kwargs` keyword. As before, `*args` must come before `**kwargs`.\n", "\n", "If present, extra positional arguments will populate a tuple in `args`, and extra keyword arguments will be passed through a dictionary in `kwargs`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# *args and **kwargs\n", "def funcArgs(first, *args, **kwargs):\n", " print(\"First:\", first)\n", " print(\"Extra positional:\",args)\n", " print(\"Extra keyword:\",kwargs)\n", "\n", "funcArgs(1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "funcArgs(1, 2, 3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "funcArgs(1, 2, 3, fourArg=\"4\", five=(5.12, 6.34))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A *pythonic* way of checking whether extra arguments were passed (positional or not) is to check if the tuple and/or dictionary were populated with values. In python, empty containers themselves can be tested in an `if` statement, and will evaluate to `False` if they are empty." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compare with the rpevious outputs.\n", "def funcArgs(first, *args, **kwargs):\n", " print(\"First:\", first)\n", " \n", " if args:\n", " print(\"Extra positional:\",args)\n", " \n", " if kwargs:\n", " print(\"Extra keyword:\",kwargs)\n", " \n", "funcArgs(1, 2, 3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "funcArgs(1, fourArg=\"4\", five=(5.12, 6.34))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a side note, the important points are the `*`(s) in the function definition, not the words \"*args*\" or \"*kwargs*\". One could define a function as:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# *arguments, **keyarguments\n", "def funcArgs(first, *arguments, **keyarguments):\n", " print(\"First:\", first)\n", " print(\"Extra positional:\",arguments)\n", " print(\"Extra keyword:\",keyarguments)\n", "\n", "funcArgs(1, 2, 3, four=\"5\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comments and Docstrings \n", "
[(top)](#goto_top)
\n", "\n", "By now you know that a comment in Python code starts with `#`, anywhere in a line, and goes on until the end of the line (but preferably uses its own line). Docstring are like comments, but they take a special place in a python function definition (and class and module definition too, as we will see later on), and can be accessed in a very easy way. We define docstring using tuple quotes at the beginning of the function block, right below its definition. \n", "\n", "Comments are **very** important to explain **how** the code works, while **docstrings** should describe **what** the code does.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def funcDoc(x):\n", " '''\\\n", " Calculate and print the square of the value passed as argument.\n", " '''\n", " \n", " # Multiply the number by itself and store the result.\n", " y = x**2\n", " print(y)\n", "\n", "funcDoc(2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can access the documentation directly using an attribute of the function:\n", "funcDoc.__doc__" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Or use the Jupyter interface to access it. Execute this cell to see the jupyter notebook \n", "# create window at the bottom of the page woth documentation.\n", "?funcDoc" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Or place the cursor inside the parenthesis and use *Shift + Tab*.\n", "funcDoc(4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now you can try going back to some previous cell and use these techniques to learn more about the `sort` method in lists, for example.\n", "\n", "Or create a new cell here to try it out." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scope \n", "
[(top)](#goto_top)
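\n", "\n", "The LEGB rule summarized below is easiest to see in a runnable sketch (the variable names are arbitrary):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"x = \"global\"              # G: defined at the top (module/notebook) level\n",
"\n",
"def outer():\n",
"    x = \"enclosing\"       # E: local to outer(), enclosing for inner()\n",
"    def inner():\n",
"        x = \"local\"       # L: local to inner()\n",
"        print(\"inner sees:\", x)\n",
"    inner()\n",
"    print(\"outer sees:\", x)\n",
"\n",
"outer()\n",
"print(\"top level sees:\", x)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "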
\n", "\n", "Python introduces the concept of controlling **scope** using indentation, but that gets more complicated when we introduce `functions/lambdas` and `modules` (more on that later). The scope of a variable is the portion of code that can access it (that is, code that is capable to read and/or modify the variable). A [great explanation][1] would be the LEGB Rule.\n", "\n", "L, Local — Names assigned in any way within a function (`def` or `lambda`)), and not declared global in that function.\n", "\n", "E, Enclosing-function locals — Name in the local scope of any and all statically enclosing functions (`def` or `lambda`), from inner to outer.\n", "\n", "G, Global (module) — Names assigned at the top-level of a module file, or by executing a `global` statement in a `def` within the file.\n", "\n", "B, Built-in (Python) — Names preassigned in the built-in names module : `open`,`range`,`SyntaxError`,...\n", "\n", "[1]:https://stackoverflow.com/questions/291978/short-description-of-the-scoping-rules" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Classes \n", "
[(top)](#goto_top)
\n", "\n", "Classes are structures that aggregate variables and functions. They are the core of [Object Oriented Programming][1] (OOP), which aims at organizing code in terms of classes that contain information (or **attributes**) and functions (or **methods**) related to that information, and then create instances of the class (called **objects**) that can actually carry on with the code's functionality. The modularity allows for better control of execution, leading to less bugs, and more understandable code, meaning it is easier to maintain and extend the software.\n", "\n", "The three main concepts in OOP are **Encapsulation**, **Inheritance** and **Polymorphism**, and we will go over how Python implements these properties in sequence. First, lets look at a basic example of creation and use of classes.\n", "\n", "[1]:https://en.wikipedia.org/wiki/Object-oriented_programming\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class MyClass:\n", " \"\"\"A simple example class\"\"\"\n", " i = 12345\n", " \n", " # When a method is called from an object, the first argument\n", " # it receives is a link to the object itself (thus, \"self\").\n", " # Class methods must always have \"self\" as the first argument.\n", " def f(self):\n", " return 'hello world'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Creating an object from the class looks like calling a function.\n", "obj = MyClass()\n", "\n", "# Now we call a method from the object.\n", "obj.f()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# And now we access one of its attributes\n", "print( obj.i )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each new instance of a class can be created while giving it values, and processing initial data. That is done with the '\\__init\\__()' method. The **self** keyword comes in again when differentiating *class* and *instance* variables. Class variables are variables shared by *all* instances of a class, meaning all objects created from the class will share the same value. Instance variables are unique to each object." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class MyNewClass:\n", " \"\"\"A simple example of class and instance variables\"\"\"\n", " \n", " class_var = 12345\n", " \n", " def __init__(self, init_data):\n", " self.inst_var = init_data\n", " \n", " def print_vars(self):\n", " print(\"Class variable:\",self.class_var)\n", " print(\"Instance variable:\", self.inst_var)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "obj1 = MyNewClass(\"one\")\n", "obj2 = MyNewClass(\"two\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "obj1.print_vars()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "obj2.print_vars()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Encapsulation \n", "
[(top)](#goto_top)
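\n", "\n", "Python also offers the `property` decorator as a more idiomatic alternative to explicit get/set methods; a small sketch, independent of the `MyCapsule` example below:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"class Temperature:\n",
"    def __init__(self, kelvin):\n",
"        self._kelvin = kelvin        # private by convention (leading underscore)\n",
"\n",
"    @property\n",
"    def celsius(self):               # accessed like an attribute, without ()\n",
"        return self._kelvin - 273.15\n",
"\n",
"t = Temperature(300.0)\n",
"print(t.celsius)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "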
\n", "\n", "With encapsulation, a class can be created to hold important information and implement functions that operate on the information, not allowing others parts of the code to interfere or modify the data directly. Functions (or methods) and variables (or attributes) that are *not* intended to be accessed from *outside* the class are called **private**, the rest are called **public** methods and attributes.\n", "\n", "A class in Pyhton may provide public dedicated methods to create, access, modify or delete some of its attributes, but in Python, classes cannot **prohibit** outside access to private attributes or methods. That is done only through *convention*: private methods and attributes will start with an underscore." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class MyCapsule:\n", " \"\"\"A simple example of class encapsulation\n", " The class keeps a list and a separate variable with\n", " the size of the list for easy access. \n", " \"\"\"\n", " \n", " def __init__(self, init_list):\n", " self._my_list = init_list\n", " self._my_list_len = len(init_list)\n", " \n", " def get_list(self):\n", " return self._my_list\n", " \n", " def get_list_len(self):\n", " return self._my_list_len\n", " \n", " def set_list(self, new_list):\n", " self._my_list = new_list\n", " self._my_list_len = len(new_list)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "obj = MyCapsule( [1,2,3,4,5] )\n", "print( \"List:\", obj.get_list() )\n", "print( \"List length:\", obj.get_list_len() )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Using the set method gives the expected result:\n", "\n", "obj.set_list( [1,2,3] )\n", "print( \"List:\", obj.get_list() )\n", "print( \"List length:\", obj.get_list_len() )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# NOT using the public method, and accessing the private variable\n", "# will NOT WORK.\n", "\n", "obj._my_list = [1,2,3,4,5,6]\n", "print( \"List:\", obj.get_list() )\n", "print( \"List length:\", obj.get_list_len() )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inheritance \n", "
[(top)](#goto_top)
\n", "\n", "One of the greatest providers of flexibility in OOP code is *inheritance*: it allows us to create classes that reuse and extend functionality already coded in other classes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Use template library to substitute values in strings.\n", "from string import Template\n", "\n", "class RateFrom:\n", " '''Defines basic functionalities for chemical rate forms'''\n", " \n", " def __init__(self):\n", " self._rateform = Template(\"\")\n", " \n", " def get_rate(self):\n", " return self._rateform\n", " \n", " def set_rate(self, new_rateform):\n", " self._rateform = new_rateform\n", " \n", " def get_final_rate(self, values_dict):\n", " return self._rateform.safe_substitute(values_dict)\n", "\n", "class MMRate(RateFrom):\n", " '''Derived class with extra attributes and methods'''\n", " \n", " def __init__(self):\n", " self._name = \"Michaelis-Menten\"\n", " self._rateform = Template(\"($Vmax * $S)/($Km + $S)\")\n", " \n", " def get_name(self):\n", " return self._name\n", "\n", "class MMRateKcat(RateFrom):\n", " '''Derived class with extra attributes and methods'''\n", " \n", " def __init__(self):\n", " self._name = \"Michaelis-Menten with Enzyme concentration\"\n", " self._rateform = Template(\"($Kcat * $E * $S)/($Km + $S)\")\n", " \n", " def get_name(self):\n", " return self._name" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mich_ment_rate = MMRate()\n", "\n", "print(mich_ment_rate.get_name() )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mich_ment_rate2 = MMRateKcat()\n", "\n", "print(mich_ment_rate2.get_name() )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "param_dict = {\"Vmax\":100, \"Km\": 3.14}\n", "print( mich_ment_rate.get_final_rate(param_dict) )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "param_dict = {\"Vmax\":100, \"Km\": 3.14, \"Kcat\":5.6}\n", "print( mich_ment_rate2.get_final_rate(param_dict) )" ] }, 
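{ "cell_type": "markdown", "metadata": {}, "source": [ "A derived class can also call the parent's initializer explicitly with `super()` instead of re-assigning every attribute itself. A minimal sketch (the class names here are hypothetical and not part of the rate-form example):\n", "\n", "```python\n", "class Base:\n", "    def __init__(self, label):\n", "        self.label = label\n", "\n", "class Derived(Base):\n", "    def __init__(self, label, extra):\n", "        super().__init__(label)  # reuse the parent's initialization\n", "        self.extra = extra\n", "\n", "d = Derived(\"rate\", 42)\n", "print(d.label, d.extra)   # rate 42\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Polymorphism \n", "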
[(top)](#goto_top)
\n", "\n", "The last main aspect of OOP is *polymorphism*, which is the ability to reshape and modify inherited methods, or methods common between different classes. The idea is that you can call the same function on different objects, but the implementation that runs will be specific to each object.\n", "\n", "Using the previous example, suppose you want to create a new reaction rate class that gives you more information about the internal operations taking place." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class MMRateVerbose(RateFrom):\n", " '''Derived class with extra attributes and methods'''\n", " \n", " def __init__(self):\n", " self._name = \"Michaelis-Menten\"\n", " self._rateform = Template(\"($Vmax * $S)/($Km + $S)\")\n", " \n", " def get_name(self):\n", " return self._name\n", " \n", " # New definition of method get_final_rate\n", " def get_final_rate(self, values_dict):\n", " print(\"Substituting values in rate form...\") # We add an informative print statement\n", " return self._rateform.safe_substitute(values_dict)\n", "\n", "mich_ment_rate_V = MMRateVerbose()\n", "\n", "print(mich_ment_rate_V.get_name() )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "param_dict = {\"Vmax\":100, \"Km\": 3.14}\n", "print( mich_ment_rate_V.get_final_rate(param_dict) )" ] }, 
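{ "cell_type": "markdown", "metadata": {}, "source": [ "Because all of the rate objects expose the same interface, the *same* call works on every one of them, and each runs its own implementation. A minimal sketch reusing the objects created above:\n", "\n", "```python\n", "all_rates = [mich_ment_rate, mich_ment_rate2, mich_ment_rate_V]\n", "params = {\"Vmax\": 100, \"Km\": 3.14, \"Kcat\": 5.6, \"E\": 1.0, \"S\": 2.0}\n", "\n", "for rate in all_rates:\n", "    # Same method name, object-specific behavior (the verbose one also prints a message).\n", "    print(rate.get_final_rate(params))\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "# Iterators and Generators \n", "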
[(top)](#goto_top)
\n", "\n", "In previous examples, we looped over container items using the `for` statement. This very convenient access can be created for any user defined container class through the definition of a couple of methods: \\__iter\\__() and \\__next\\__().\n", "Python uses the \\__iter\\__() function to get an iterator object, which will exhibit an item of the container, and will respond to the \\__next\\__() method by moving to the next element in the container. If there are no more elements in the container, the StopIteration exception is raised." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s = 'abc'\n", "it = iter(s)\n", "it" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "next(it)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "next(it)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we create our own class (which is a great example from the Python tutorial):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Reverse:\n", " \"\"\"Iterator for looping over a sequence backwards.\"\"\"\n", " def __init__(self, data):\n", " self.data = data\n", " self.index = len(data)\n", "\n", " def __iter__(self):\n", " return self\n", "\n", " def __next__(self):\n", " if self.index == 0:\n", " raise StopIteration\n", " self.index = self.index - 1\n", " return self.data[self.index]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rev = Reverse([1,2,3,4,5])\n", "iter(rev)\n", "\n", "for char in rev:\n", " print(char)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Generators** are special functions that only compute the requested value on-demand. They use the `yeld` command (instead of `return`) to \"hold off\" on the computation and wait until they are called again. They create the \\__iter\\__() and \\__next\\__() methods automatically, and when they finish, they raise the StopIteration behind the scenes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can easily create a generator that returns the square of numbers for as long as we want.\n", "# Note that we are not creating a list and then accessing it. This is much more memory efficient.\n", "\n", "def get_numbers():\n", " number = 0\n", " while True:\n", " number += 1\n", " yield number**2\n", "\n", "for i in get_numbers():\n", " print(i)\n", " if i >10:\n", " break" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comprehensions/range/map \n", "
[(top)](#goto_top)
\n", "\n", "Much like *lambdas*, comprehensions provide a quick and readable way to create lists, dictionaries, and complex collections. Comprehensions can be created for lists, tuples and dictionaries:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# List Comprehension\n", "\n", "# Instead of:\n", "ret = []\n", "for i in [1,2,3,4,5]:\n", " ret.append(i**2)\n", "print(ret)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can do:\n", "ret = [i**2 for i in [1,2,3,4,5]]\n", "print(ret)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This creates a Dictionary!\n", "\n", "{ a:a**2 for a in [1,2,3,4,5]}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This uses the same syntax but creates a GENERATOR, not a tuple.\n", "\n", "ret = (i/2.0 for i in [1,2,3,4,5])\n", "print(ret)\n", "\n", "for i in ret:\n", " print(i)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Which can be used immediately\n", "\n", "sum(i/2.0 for i in [1,2,3,4,5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `range` command creates an iterator that returns a sequence of numbers:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This is an iterable list of numbers.\n", "range(5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Starts from ZERO by default.\n", "for i in range(3):\n", " print(i)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Start:Stop:Step\n", "for i in range(3,10,2):\n", " print(i)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This comprehension is more fancy:\n", "# It combines all vs all values from two lists of numbers, and creates tuples with the pairs.\n", "\n", "[ (x,y) for x in range(3) for y in range(3,6) ]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can also add an `if` statement:\n", "\n", "[ (x,y) for x in range(3) for y in range(3,6) if x + y < 6]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A **map** allows one to apply the same function to an iterable. **Map** itself returns a generator that will `yield` the results of the function call as it is called." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def funcRaise(x):\n", " return x**2\n", "\n", "for result in map(funcRaise, range(4)):\n", " print(result)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Or to create a tuple with the results:\n", "\n", "tuple(map(funcRaise, range(4)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another \"quality of life\" function is the `zip`, which combines items from different iterators:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Returning to an earlier example:\n", "# Zip avoids the \"all-to-all\" behavior of a list comprehension with two iterables.\n", "\n", "[ pair for pair in zip(range(3), range(3,6)) ]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Modules \n", "
[(top)](#goto_top)
\n", "\n", "Modules are the main vehicle for distributing new code and functionality. Since Python is an open-source project, many members of the community are not only users of the language but also contributors. Anyone can build a package with their own code, providing a new functionality (or a new implementation of an existing one), and submit it to the [Python Package Index](https://pypi.org/), or to their GitHub/BitBucket/etc. repository.\n", "\n", "## Import syntax \n", "
[(top)](#goto_top)
\n", "\n", "As you have seen throughout this introduction to Python, in order to import a module we can use the `import` keyword. This will import the main module and will make available sub-modules through the `.` `(dot)` operator. For example, the `numpy` module (which will be explored below) provides a large number of scientific functions. To access the `random` sub-module, and the `normal` function, which samples numbers from a normal distribution, we can do:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy\n", "numpy.random.normal(0,1,5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also use shortcuts to make available only the sub-module we want, under a shorter name:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy.random as rd\n", "rd.normal(0,1,5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or import only a function from the whole module:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from numpy.random import normal as rnorm\n", "rnorm(0,1,5)" ] }, 
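{ "cell_type": "markdown", "metadata": {}, "source": [ "The most common pattern you will see in practice (and later in this notebook) simply aliases the top-level module itself:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Alias the whole module; sub-modules remain reachable through the dot operator.\n", "import numpy as np\n", "np.random.normal(0,1,5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install new modules \n", "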
[(top)](#goto_top)
\n", "\n", "Most modules you will want to install will be available in the main [Python Package Index](https://pypi.org/), which means you can use Python's `pip` command. For example, to install the *SciPy* package:\n", "\n", "`pip install scipy`\n", "\n", "Custom packages can also be installed using `pip` after downloading the package (for example, `pip install ./path/to/package/`)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# File IO \n", "
[(top)](#goto_top)
\n", "\n", "Python provides different ways to handle file operations.\n", "\n", "## open() and *with* \n", "
[(top)](#goto_top)
\n", "\n", "A classic way of handling a file would be to `open` and `close` it:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Open a file with permission to Write to it.\n", "# If it does not exist, it will be created.\n", "\n", "fileHandler = open(\"test_file.txt\", \"w\")\n", "\n", "fileHandler.write(\"Test file:\")\n", "\n", "# List of lines:\n", "lineList = [\"First line\", \"Second line\", \"Third line\"]\n", "\n", "fileHandler.writelines(lineList)\n", "\n", "fileHandler.close()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now we open the file for Reading (ONLY):\n", "\n", "fileHandler = open(\"test_file.txt\", 'r')\n", "\n", "# The file handler can be used as an iterator:\n", "for index, line in enumerate(fileHandler):\n", " print(index, line)\n", "\n", "fileHandler.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You will notice that we only wrote one line in the file, even though we called `write` and `writelines` with several inputs. Writing into files does *NOT* automatically add line breaks. In Linux, the special character for new lines is `\\n`, so to get our input in different lines we would need to add that to the end of every line." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fileHandler = open(\"test_file.txt\", \"w\")\n", "\n", "# List of lines:\n", "lineList = [\"Test file:\", \"First line\", \"Second line\", \"Third line\"]\n", "\n", "# Provide new list with line ending characters in all lines\n", "fileHandler.writelines( [line + \"\\n\" for line in lineList] )\n", "\n", "fileHandler.close()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fileHandler = open(\"test_file.txt\", 'r')\n", "\n", "# The file handler can be used as an iterator:\n", "for index, line in enumerate(fileHandler):\n", " print(index, line)\n", "\n", "fileHandler.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We get extra lines between the lines in the file because the `print` statement always adds a new line after every call.\n", "\n", "Another way to handle files without having to `open` and `close` it is to use the **context manager** `with`. This is the *Pythonic* way, and it keeps us from forgetting to close a file:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open(\"test_file.txt\", \"w\") as fileHandler:\n", " \n", " lineList = [\"Test file:\", \"First line\", \"Second line\", \"Third line\"]\n", "\n", " fileHandler.writelines( [line + \"\\n\" for line in lineList] )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open(\"test_file.txt\", 'r') as fileHandler:\n", "\n", " for index, line in enumerate(fileHandler):\n", " print(index, line)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## csv/Pickle/Json \n", "
[(top)](#goto_top)
\n", "\n", "Once we open a file, we may need help handling its contents. If the file is formatted, we can use Python modules to parse its contents and present a more accessible handler than each raw full line.\n", "\n", "We will use a spreadsheet from the SI of a [2015 paper](https://www.nature.com/articles/nbt.3418) where the authors quantified the expression of 2359 proteins in *E. coli* cells, in 22 different conditions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import csv\n", "\n", "# This is a micro subset of the SI of the paper (for readability) saved as a CSV file.\n", "with open(\"data/protein_copies_per_cell_small.csv\",'r',newline='') as infile:\n", " csvreader = csv.reader(infile)\n", " for row in csvreader:\n", " print(row)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For simpler access, we can also handle each line as a dictionary:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open(\"data/protein_copies_per_cell_small.csv\",'r',newline='') as infile:\n", " \n", " # Using `DictReader`\n", " csvreader = csv.DictReader(infile)\n", " for row in csvreader:\n", " print(row[\"Gene\"])\n", " \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For fast access (in particular of large amounts of data) we can also store and read information in `pickle` format. This is a binary format that is NOT intended for long ter use, but that can seep up workflows that depend on file IO. We can write entire data structures directly to file:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pickle" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pfile = open(\"test_pickle.pickle\",'wb')\n", "\n", "lineList = []\n", "\n", "# Create a list of dictionaries:\n", "with open(\"data/protein_copies_per_cell_small.csv\",'r',newline='') as infile:\n", " \n", " # Using `DictReader`\n", " csvreader = csv.DictReader(infile)\n", " for row in csvreader:\n", " lineList.append(row)\n", " \n", "# \"Dumps\" the whole object into the file\n", "pickle.dump(lineList, pfile)\n", "\n", "pfile.close()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now we open the binary file for reading and load its contents to memory\n", "with open(\"test_pickle.pickle\",'rb') as infile:\n", " new_lineList = pickle.load(infile)\n", "\n", "new_lineList" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A final, but no less important, mention should go to Json. Its interface is similar to Pickle, but it writes in JavaScript Object Notation instead of binary format, and allows for a standardized exchange of information. 
Unlike pickle, however, it only handles basic data types (numbers, strings, booleans, lists and dictionaries), not arbitrary Python objects:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "# This is what the file looks like:\n", "print(json.dumps([123, 456, \"This and that\", {'4': 5, '6': 7}], sort_keys=True, indent=4))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create a string in JSON readable format\n", "string_json = '''\\\n", "[\n", " 123,\n", " 456,\n", " \"This and that\",\n", " {\n", " \"4\": 5,\n", " \"6\": 7\n", " }\n", "]'''\n", " \n", "# And load it into the original list:\n", "json.loads(string_json)" ] }, 
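{ "cell_type": "markdown", "metadata": {}, "source": [ "Writing to and reading from an actual file works the same way (a minimal sketch, assuming we are free to create `test_json.json` in the working directory):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Dump the structure to a JSON file...\n", "with open(\"test_json.json\", \"w\") as outfile:\n", "    json.dump([123, 456, \"This and that\", {'4': 5, '6': 7}], outfile, indent=4)\n", "\n", "# ...and load it back into memory.\n", "with open(\"test_json.json\", \"r\") as infile:\n", "    print(json.load(infile))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Virtual Environments \n", "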
[(top)](#goto_top)
\n", "\n", "Virtual environments are independent Python installations, created to organize modules and prevent conflicts between versions of the same module or different conflicting modules.\n", "\n", "## python -m venv \n", "
[(top)](#goto_top)
\n", "\n", "Python provides the `venv` tool (previously `pyvenv` in versions 3.3 and 3.4, deprecated in 3.6), which creates a lightweight, isolated copy of your Python installation in a user-defined location:\n", "\n", "```bash\n", "python3 -m venv /path/to/new/virtual/environment\n", "```\n", "\n", "You can then activate your environment with:\n", "\n", "```bash\n", "source /path/to/new/virtual/environment/bin/activate\n", "```\n", "\n", "And deactivate it with the `deactivate` command.\n", "\n", "This ensures that the Python interpreter and all the modules you load, install or remove are in that particular, isolated environment.\n", "\n", "## conda create \n", "
[(top)](#goto_top)
\n", "\n", "If you are using Anaconda to manage your Python installation (common on MacOS), you can use the `conda create` command (for example, `conda create --name myenv scipy`) to perform the same action. It will create an isolated environment to contain specific modules and versions, which you can then switch to with `conda activate myenv` (or `source activate myenv` on older conda versions). **Remember**, after working with an environment for some time, if you create a *new* environment from scratch, you will need to install all relevant modules again, since this will be a *new* and *independent* environment.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Scientific Modules \n", "
[(top)](#goto_top)
\n", "\n", "## Numpy/Scipy and Matplotlib \n", "\n", "From their documentation, \"In an ideal world, NumPy would contain nothing but the array data type and the most basic operations: indexing, sorting, reshaping, basic element-wise functions, et cetera. All numerical code would reside in SciPy.\" However, you may find some overlap of functionalities between them. \n", "\n", "[NumPy][1] arrays are commonly used in data analysis because accessing and operating on them is extremely more efficient than using Python's lists. For large datasets and/or intensive computation, using NumPy data structure is essential.\n", "\n", "[SciPy][2] has a large array of functionalities already programmed and ready to use. They are also usually very efficient and completely integrated with NumPy data structures. From random number generators to parameter optimization, signal processing to statistics. \n", "\n", "[1]:http://www.numpy.org/\n", "[2]:https://docs.scipy.org/doc/scipy-1.1.0/reference/" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import scipy as sp\n", "import matplotlib.pyplot as plt\n", "%matplotlib notebook\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create a series of 20 numbers betwen 0 and 10\n", "npar = np.linspace(0,10,20)\n", "npar" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy arrays provide *vectorial* operations which are easy to use and internally optimized for performance. One example would be to multiply all elements of an array by a constant. In `C/C++`, one would create a loop to multiply each element and overwrite the original number:\n", "\n", "```Python\n", "const = 1.5\n", "npar = np.linspace(0,10,20)\n", "for index in range(len(npar)):\n", " npar[index] *= const\n", "```\n", "\n", "With vector operations, you can simply multiply *the whole array* by using it directly:\n", "\n", "```Python\n", "const = 1.5\n", "npar = np.linspace(0,10,20)\n", "npar *= const\n", "```\n", "\n", "In practice:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "const = 1.5\n", "npar = np.linspace(0,10,20)\n", "npar *= const\n", "npar" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The arrays in NumPy allow for a series of operations, like the creation of masks which will select only some indices of an array. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Behind the scenes, NumPy can verify a condition and return an array of Boolean values.\n", "npar < 5" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Application of a boolean mask:\n", "# We keep only the numbers smaller than 5\n", "npar[ npar < 5 ]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A simple example of a SciPy application using NumPy structures would be fitting observations of an oscillator using SciPy's optimization module. What we will be optimizing are the amplitude, frequency and phase parameters of the function `funcOsc`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load optimiation module\n", "from scipy.optimize import curve_fit as cfit\n", "\n", "# Define the function that will be used in the optimization procedure.\n", "def funcOsc(ang, k, n, d):\n", " # Simple oscillator.\n", " return k * (1 + np.cos(n*ang - d))\n", "\n", "# Generate initial data: we create 2D data, one axis at a time.\n", "# First, the x axis values are created with NumPy's `linspace` function, \n", "# that generates evenly spaced numbers over a specified interval.\n", "# Note that we are creating 12 numbers between 0 and 360, then multipling \n", "# the array by pi over 180, to transform angle vbalues from degrees into radians.\n", "xdata = (np.linspace(0, 360, 13))*np.pi/180\n", "\n", "# Second, we use example values from a potential with three peaks. These particular \n", "# values came from a quantum mechanical torsion angle scan of a chemical bond.\n", "# NumPy's \"asarray\" transforms a Python list into a NP array (for efficiency).\n", "ydata = np.asarray([0,-4.37,-7.06,-3.69,.00,-4.32,-7.01,-3.74,-0.01,-4.44,-7.01,-3.69,0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Creates plot with original data. We can get an idea of what the oscillation looks like.\n", "\n", "# Matplotlib uses the `plot` function with positional arguments for X and Y values. The third\n", "# argument \"ro\" indicates the style of the data. The \"r\" indicates the color red, and the \"o\"\n", "# indicates we want to plot solid points.\n", "plt.plot(xdata, ydata, 'ro')\n", "plt.xlabel(\"Angle (rad)\")\n", "plt.ylabel(\"Energy (kcal/mol)\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The function cfit uses non-linear least squares to fit a function, f, to data.\n", "# We provide the X and Y axis points (xdata and energy arrays), and provide an\n", "# initial guess for the values of the variables we are trying to fit. \n", "popt, pcov = cfit(funcOsc, xdata, ydata, p0=[-3,3,3])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optimized parameters:\n", "popt" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now we join two representations on the same plot: the original data and new data\n", "# from the parameter fitting opperation.\n", "# For the original data, we just repat the same plot call.\n", "# For the second plot, we provide the same X values, but use the original function to \n", "# produce the Y values given the X values and the optimized parameter values.\n", "# As for the style, \"b\"lue dashed lines (given by \"-.\").\n", "# You will also notice we asked for lable for our data. Each plot call has a lable \n", "# argument. 
On the second case, we use a string substitution pattern like the on found in C.\n", "\n", "plt.plot(xdata, ydata, 'ro', label='data')\n", "\n", "plt.plot(xdata, funcOsc(xdata, *popt), 'b-.',\n", " label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))\n", "plt.xlabel(\"Angle (rad)\")\n", "plt.ylabel(\"Energy (kcal/mol)\")\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another example would be simulating the ODE system defined by the Lotka-Volterra model.\n", "\n", "Given the prey $y_1$ and predator $y_2$, we define their change over time by the following equations:\n", "\n", "$$ \\frac{dy_1}{dt} = \\alpha y_1 - \\beta y_1 y_2 $$\n", "$$ \\frac{dy_2}{dt} = \\gamma y_1 y_2 - \\delta y_2 $$\n", "\n", "This can be easily programmed and simulated using an ODE solver present in SciPy. First we define a function that will calculate the change in X and Y over time:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Model function\n", "def rhs(t, y):\n", " '''\\\n", " Devines the right hand side of the ODE system, representing the change in population\n", " according to the Lotka-Volterra model.'''\n", " \n", " dy1 = alpha*y[0] - beta*y[0]*y[1]\n", " dy2 = gamma*y[0]*y[1] - delta*y[1]\n", " \n", " return [dy1,dy2]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We create the time points for which we will store population values, \n", "# and set the parametes for the model:\n", "\n", "totTime = 25 # Total simulatio time\n", "sampling = 250 # Sampling along total time\n", "\n", "# Numpy function that creates a range of numbers in an optimized NumPy array,\n", "# instead of a list or a generator. This improves speed of calculation.\n", "times = np.linspace(0,totTime, sampling)\n", "\n", "# Global parameters for the function.\n", "alpha = 2.0/3.0 # Growth of prey\n", "beta = 4.0/3.0 # Death of pray dependent on predator population.\n", "gamma = 1.0 # Growth of predator dependent on prey population.\n", "delta = 1.0 # Death of predator\n", "\n", "# Initial Population\n", "preyInit = 2 # Populaiton (individuals per square mile?)\n", "predInit = 1 # Populaiton (individuals per square mile?)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now we load the necessary modules:\n", "from scipy.integrate import solve_ivp\n", "\n", "sol = solve_ivp(rhs, [0, totTime], [preyInit, predInit], t_eval=times,)\n", "\n", "print(sol.message)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now we plot the results\n", "\n", "plt.plot(times, sol.y[0], 'b-', label = \"Prey\")\n", "plt.plot(times, sol.y[1], 'g-', label = \"Predator\")\n", "plt.legend()\n", "plt.xlabel(\"Times\")\n", "plt.ylabel(\"Population\")\n", "plt.tight_layout()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Phase space plot.\n", "\n", "plt.plot(sol.y[0], sol.y[1], 'b-', label = \"Prey\")\n", "plt.legend()\n", "plt.xlabel(\"Prey Population\")\n", "plt.ylabel(\"Predator Population\")\n", "plt.tight_layout()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Combining Python capabilities\n", "\n", "# Create series of initial conditions\n", "initCond = [ [x, x] for x in np.arange(0.9,1.3,0.1) ]\n", "\n", "results = []\n", "\n", "# for initC in initCond:\n", "# Enumerate returns both the index of the entry, and the entry 
itself.\n", "for indx,initC in enumerate(initCond):\n", " # Run the Initial Value Problem with the current initial condition\n", " sol = solve_ivp(rhs, [0, totTime], initC, t_eval=times)\n", " \n", " # Stores the output\n", " results.append( sol.y )\n", " \n", " # Create a line in the plot.\n", " plt.plot(results[indx][0], results[indx][1], '-')\n", " # Places a point on the initial conditions.\n", " plt.plot(initC[0], initC[1], 'o') \n", "\n", "plt.xlabel(\"Prey Population\")\n", "plt.ylabel(\"Predator Population\")\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pandas \n", "
[(top)](#goto_top)
\n", "\n", "*Pandas* is a python module created to handle large sets of structured data. It is extremely efficient and provides large amounts of tools for signal processing and data analysis." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# From Pandas documentation, one way to create a DataFrame is by manually combining data.\n", "df = pd.DataFrame({ 'A' : 1.,\n", " 'B' : pd.Timestamp('20130102'),\n", " 'C' : pd.Series(1,index=list(range(10)),dtype='float32'),\n", " 'D' : np.array([3] * 10,dtype='int32'),\n", " 'E' : pd.Categorical([\"test\",\"train\",\"test\",\"train\",\"test\"]*2),\n", " 'F' : 'foo' })\n", "df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# They have different types:\n", "df.dtypes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# For long DataFrames, we can look at the\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Or the\n", "df.tail(3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Instead of manually combining data, we can read data from an Excel File.\n", "\n", "# We will use the large version of the protein count file mentioned in the File IO section:\n", "\n", "largedf = pd.read_excel(\"./data/protein_copies_per_cell.xlsx\")\n", "largedf" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Melt the data set so that the conditionos become variables\n", "largedfMelt = largedf.melt(id_vars=[\"Description\",\"Gene\"])\n", "\n", "# Show all unique variables, i.e., all growth conditions:\n", "largedfMelt.variable.unique()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can select specific growth conditions by creating a mask for the rows. \n", "# For that, we ask which rows have a \"variable\" value in a given list.\n", "\n", "largedfMelt.loc[ largedfMelt[\"variable\"].isin(['Glucose', 'LB']) ,: ]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pandas DataFrames provide a variety of access and manipulations tools, for example, we can access columns by name and select rows by index or value in one or more columns:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# iloc accesses rows and columns by index.\n", "largedf.iloc[20:25, 1:5]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# loc can give you more flexibility\n", "largedf.loc[20:25, [\"Gene\", \"LB\"] ]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Select rows by column value and change the selection and order of columns shown.\n", "largedf.loc[ largedf[\"LB\"] > 100000 , [\"Gene\", \"LB\",\"Description\"] ]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Wide vs Long data formats: melting and casting\n", "\n", "When handling large datasets, the concepts of melting and casting data become increasingly important, and allows one to quickly iterate between plotting, filtering and analyzing information.\n", "\n", "The classic *spreadsheet* format would present observations arranged line-by-line, with columns indicating their individual attributes. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test = pd.DataFrame()\n", "test[\"SampleID\"] = range(1,13)\n", "\n", "# For reproducibility\n", "np.random.seed(1)\n", "\n", "# Randomly choose values to create our simple test dataset\n", "test[\"Day1\"] = np.random.normal(0,1,12)\n", "test[\"Day2\"] = np.random.normal(0,1,12)\n", "test[\"Day3\"] = np.random.normal(0,1,12)\n", "test[\"Group\"] = ['a']*3 + ['b']*3 + ['a']*3 + ['b']*3\n", "test[\"Replica\"] = [1]*6 + [2]*6\n", "\n", "test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the wide format, which would be very useful for certain operations. However, if we could re-arrange it into a long format, it would make it much easier perform statistical calculations and plots.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_melt = pd.melt(test, id_vars=[\"SampleID\",\"Group\",\"Replica\"], var_name=\"Day\", value_name=\"Growth\")\n", "test_melt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotnine \n", "
[(top)](#goto_top)
\n", "\n", "Structured data can be very easy to plot, particularly with a package that is ready for it. [Plotnine][1] is an implementation of the \"Grammar of Graphics\" in Python, and it provides an entirely new interface to plotting in python. It is not the first package to try, but it is the newest and most comprehensive implementation yet. It copies the `ggplot` package in R, trying to provide the same user experience.\n", "\n", "One of the main advantages over packages like Matplotlib is the simplicity (in code) in order to create complex graphs. \n", "\n", "[1]:https://plotnine.readthedocs.io/en/stable/" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the module\n", "import plotnine as p9" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Lets use the same data we used before to compare all occurences per day, but divide the plot per replica. \n", "\n", "p9.ggplot(test_melt) + p9.geom_point( p9.aes(x=\"Day\",y=\"Growth\",color=\"Group\") ) + p9.facet_wrap(facets=\"Replica\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_cast = test_melt.drop(columns=['SampleID']).groupby(by=[\"Group\",\"Replica\",\"Day\"]).mean().reset_index()\n", "\n", "p9.ggplot(test_cast) + p9.geom_col( p9.aes(x=\"Day\",y=\"Growth\",fill=\"Group\"), position = \"dodge\" ) \\\n", " + p9.facet_wrap(facets=\"Replica\") + p9.theme_linedraw()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is easy to calculate statistics *per group* with our data. **Pandas** provides the `group` method to every DataFrame, which returns sub-sets of the full dataset. We can then operate on each group by calculating statistics (or other custom analysis with user-defined functions) using the `aggregate` method." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Explicit creation of a \"group\" object that contains subsets of original dataset.\n", "groupby = test_melt.drop(columns=['SampleID']).groupby(by=[\"Group\",\"Replica\",\"Day\"], as_index=False)\n", "\n", "# Aggregates the \"Growth\" data by calculating both mean and standard deviation of values. 
We also use \n", "# Panda's own `count` method to give us the number of elements per group.\n", "test_cast_agg = groupby.aggregate([np.mean, np.std, 'count']).reset_index()\n", "\n", "# Changes the column names to combine value and statistic: Growth_mean and Growth_std\n", "test_cast_agg.columns = ['_'.join(col).rstrip(\"_\") for col in test_cast_agg.columns.values]\n", "\n", "# Creates new columns with the ends of a confidence interval.\n", "test_cast_agg[\"Growth_min\"] = test_cast_agg[\"Growth_mean\"] - test_cast_agg[\"Growth_std\"]/test_cast_agg[\"Growth_count\"]\n", "test_cast_agg[\"Growth_max\"] = test_cast_agg[\"Growth_mean\"] + test_cast_agg[\"Growth_std\"]/test_cast_agg[\"Growth_count\"]\n", "\n", "test_cast_agg\n", "\n", "# Plots the previous plot adding an error bar.\n", "p9.ggplot(test_cast_agg) + p9.geom_col( p9.aes(x=\"Day\",y=\"Growth_mean\",fill=\"Group\"), position = \"dodge\", width=.8) \\\n", " + p9.facet_wrap(facets=\"Replica\") + p9.theme_seaborn() \\\n", " + p9.geom_errorbar(p9.aes(x=\"Day\", y=\"Growth_mean\", ymin = \"Growth_min\", ymax = \"Growth_max\", group=\"Group\"), \\\n", " position = \"dodge\", width=.8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now a larger dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Larger test\n", "\n", "size = 20000\n", "\n", "test = pd.DataFrame()\n", "test[\"SampleID\"] = range(size)\n", "\n", "# For reproducibility\n", "np.random.seed(1)\n", "\n", "# Randomly choose values to create our simple test dataset\n", "test[\"Day1\"] = np.random.gamma(1,1,size)\n", "test[\"Day2\"] = np.random.normal(1,1,size)\n", "test[\"Day3\"] = np.random.normal(1,2,size)\n", "test[\"Group\"] = ['a']*int(size/4) + ['b']*int(size/4) + ['a']*int(size/4) + ['b']*int(size/4)\n", "test[\"Replica\"] = [1]*int(size/2) + [2]*int(size/2)\n", "\n", "# Bias Replica 1, day 1\n", "test.loc[0:size, \"Day1\"] = test.loc[0:size, \"Day1\"] * 0.7 \n", "\n", "test" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_melt = pd.melt(test, id_vars=[\"SampleID\",\"Group\",\"Replica\"], var_name=\"Day\", value_name=\"Growth\")\n", "\n", "# Explicit creation of a \"group\" object that contains subsets of original dataset.\n", "groupby = test_melt.drop(columns=['SampleID']).groupby(by=[\"Group\",\"Replica\",\"Day\"], as_index=False)\n", "\n", "# Aggregates the \"Growth\" data by calculating both mean and standard deviation of values.\n", "test_cast_agg = groupby.aggregate([np.mean, np.std, 'count']).reset_index()\n", "\n", "# Changes the column names to combine value and statistic: Growth_mean and Growth_std\n", "test_cast_agg.columns = ['_'.join(col).rstrip(\"_\") for col in test_cast_agg.columns.values]\n", "\n", "test_cast_agg[\"Growth_min\"] = test_cast_agg[\"Growth_mean\"] - test_cast_agg[\"Growth_std\"]/test_cast_agg[\"Growth_count\"]\n", "test_cast_agg[\"Growth_max\"] = test_cast_agg[\"Growth_mean\"] + test_cast_agg[\"Growth_std\"]/test_cast_agg[\"Growth_count\"]\n", "\n", "test_cast_agg\n", "\n", "# Plots the previous plot adding an error bar.\n", "p9.ggplot(test_cast_agg) + p9.geom_col( p9.aes(x=\"Day\",y=\"Growth_mean\",fill=\"Group\"), position = \"dodge\", width=.8) \\\n", " + p9.facet_wrap(facets=\"Replica\") + p9.theme_linedraw() \\\n", " + p9.geom_errorbar(p9.aes(x=\"Day\", y=\"Growth_mean\", ymin = \"Growth_min\", ymax = \"Growth_max\", group=\"Group\"), \\\n", " position = \"dodge\", width=.8)" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "p9.ggplot(test_melt) + p9.geom_violin( p9.aes(x=\"Day\",y=\"Growth\",fill=\"Group\") ) + \\\n", " p9.facet_wrap(facets=\"Replica\") + p9.theme_bw() + p9.scale_fill_brewer(type=\"qual\",palette=6)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can have fun too..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "final=pd.DataFrame()\n", "final[\"x\"] = range(7)\n", "final[\"y\"] = final[\"x\"]**2\n", "\n", "p9.ggplot(final) + p9.geom_path( p9.aes(x=\"x\", y=\"y\") ) + \\\n", " p9.labs(x=\"Such Class\", y=\"Much Learning\", title=\"Wow\") + \\\n", " p9.theme_gray()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cython and Numba \n", "
[(top)](#goto_top)
\n", "\n", "Even using the advanced modules we just discussed, like NumPy and SciPy, \"standard\" Python code will be slower than `C/C++` code. For compute-intensive situations where no suitable module exists and we need to write new code, a few options are available to create simple \"pythonic\" code that performs better than standard Python code.\n", "\n", "One of these options is **Cython**. This module will read standard Python code, process it to create `C` code, and then compile it, while providing a function that can be called from Python as if it were all written in Python. We will see better examples of Cython utilization below, in the Magics section, which allows us to use Cython directly from a notebook cell.\n", "\n", "**Numba** is another option for compiling Python code into faster code in compute-intensive situations. In Numba's case, the code is compiled \"on-the-fly and in-memory\", and can produce *CPU or GPU* binaries. It works at the function level, and will compile [Just In Time (*jit*)](https://en.wikipedia.org/wiki/Just-in-time_compilation), meaning it compiles the function *during* execution of the code, not before, allowing adjustments to be made to the code on the fly, before optimization and compilation. Like Cython, we will see an example below in the Magics section." ] }, 
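{ "cell_type": "markdown", "metadata": {}, "source": [ "As a quick taste, here is a minimal sketch of the decorator-based Numba usage (the function and values are made up for illustration; timed comparisons appear in the Magics section below):\n", "\n", "```python\n", "import numba\n", "\n", "@numba.jit(nopython=True)\n", "def sum_of_squares(n):\n", "    total = 0.0\n", "    for i in range(n):\n", "        total += i * i\n", "    return total\n", "\n", "# The first call triggers compilation; subsequent calls run the compiled version.\n", "sum_of_squares(1000)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Mpi4Py \n", "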
[(top)](#goto_top)
\n", "\n", "[Message Passing Interface (MPI)](https://en.wikipedia.org/wiki/Message_Passing_Interface) is a standard developed by an international community of researchers to enhance and facilitate the creation and portability of parallel code. There are many open-source implementations of the standard, and Mpi4Py provides access to MPI functionality directly from Python code.\n", "\n", "A Jupyter notebook is not the best place to write Mpi4Py code since notebooks cannot be executed in parallel, but they can access services (written in Python with Mpi4Py) that run in multiple nodes of a cluster. A typical parallel code in Python would start by loading the module and getting the current instance `rank`, and a `communication` object so each instance of the code can talk to all other instances, in the same node or different nodes.\n", "\n", "\n", "```python\n", "from mpi4py import MPI\n", "\n", "comm = MPI.COMM_WORLD\n", "rank = comm.Get_rank()\n", "```\n", "\n", "Your code can then use the `comm` object to `send` and `receive` data. Python's native types (and even user defined objects) are automatically `pickled` behind the scenes in order to create a binary representation that can be easily transfered. \n", "\n", "```python\n", "if rank == 0:\n", " data = {'a': 7, 'b': 3.14}\n", " comm.send(data, dest=1, tag=11)\n", "elif rank == 1:\n", " data = comm.recv(source=0, tag=11)\n", "```\n", "\n", "In the previous example, a code running on two cores would have a dictionary created in the first core (or rank), and then transfered to the second core. It is common for one core to handle basic I/O operations and initial setup of your program, parsing user input and generating initial conditions, and then sending that information to all other cores.\n", "The most efficient way to transfer data is to first place it in NumPy objects. This way, Python does not need to `pickle` the object before transferring it to the underlying MPI implementation, NumPy provides direct access to the data in the `C/C++` level. 
Mpi4Py can automatically determine the data type (`int`, `float`, etc.) or it can be told explicitly.\n", "\n", "```python\n", "if rank == 0:\n", " data = numpy.arange(100, dtype=numpy.float64)\n", " # Automatic type determination\n", " comm.Send(data, dest=1, tag=13)\n", "elif rank == 1:\n", " data = numpy.empty(100, dtype=numpy.float64)\n", " # Manual type determination (the buffer holds float64 values)\n", " comm.Recv([data, MPI.DOUBLE], source=0, tag=13)\n", "```\n", "\n", "If the same data will be sent to *all* other cores, we can use the method `broadcast`:\n", "\n", "```python\n", "if rank == 0:\n", " data = np.arange(100, dtype='i')\n", "else:\n", " data = np.empty(100, dtype='i')\n", "\n", "# Blocking operation: every rank waits here until everybody has the data.\n", "comm.Bcast(data, root=0)\n", "```\n", "\n", "One main difference when using NumPy objects with Mpi4Py is that you need to create a **receiving** object with the same size as the data being **sent**, which may require a first step of communication:\n", "\n", "```python\n", "# Every rank starts with zero data\n", "data_size = 0\n", "\n", "# Rank zero parses user options and input\n", "if rank == 0:\n", " data_size = func_parse_input(the_input)\n", " data = np.arange(data_size, dtype='i')\n", "\n", "# Broadcasts the size of the data (a plain Python int, so we use the lowercase,\n", "# pickle-based bcast)\n", "data_size = comm.bcast(data_size, root=0)\n", "\n", "# Other ranks allocate the space\n", "if rank != 0:\n", " data = np.empty(data_size, dtype='i')\n", "\n", "# Actually sends the data.\n", "comm.Bcast(data, root=0)\n", "```" ] }, 
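{ "cell_type": "markdown", "metadata": {}, "source": [ "A script written with Mpi4Py is typically launched from the command line through the MPI launcher, with something like `mpiexec -n 4 python my_mpi_script.py` (the script name here is just a placeholder, and the exact launcher and options depend on the MPI implementation and on the cluster's job scheduler)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Jupyter \n", "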
[(top)](#goto_top)
\n", "\n", "[Jupyter][1] is a great interface for interactive data analysis and software development. You already know the basics of how to use it, so we will now focus on more interesting things notebooks can do.\n", "\n", "[1]:http://jupyter.org/\n", "\n", "Every Jupyter notebook runs with its own instance of the Python interpreter, called a kernel. Jupyter notebooks can use other kernels for different versions of Python, or even for other languages like R, Java or Lua.\n", "\n", "## ipywidgets \n", "
[(top)](#goto_top)
\n", "\n", "Widgets are eventful python objects that have a representation in the browser, often as a control like a slider, textbox, etc.\n", "\n", "Widgets are useful for building interactive GUIs (Graphical User Interfaces) in your notebooks.\n", "\n", "Here is a link to the *complete* documentation from which I took the above sentences (citation is important): [Jupyter Widgets Documentation](https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20Basics.html)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the module\n", "import ipywidgets as widgets\n", "from IPython.display import display" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Widgets will automatically dysplay when created, but you can also build a complex widget and `display` it later." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "widgets.IntSlider()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "w = widgets.IntSlider(value=10, \n", " min=2, \n", " max=20, \n", " step=2)\n", "display(w)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can link widgets to make it easier to pass and/or read values. We can do that using `jslink`, which will embed the link in the html page created by jupyter, and will not depend on an active python kernel." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = widgets.FloatText()\n", "b = widgets.FloatSlider()\n", "display(a,b)\n", "\n", "mylink = widgets.jslink((a, 'value'), (b, 'value'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can organize widgets for a specific layout:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from ipywidgets import HBox, Label\n", "\n", "# Creates a horizontal box\n", "HBox([Label('A very very long description'), widgets.IntSlider()])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from ipywidgets import Button, HBox, VBox\n", "\n", "# Creates several widgets:\n", "words = ['correct', 'horse', 'battery', 'staple']\n", "items = [Button(description=w) for w in words]\n", "\n", "# Organizes them in two columns with `vbox`:\n", "left_box = VBox([items[0], items[1]])\n", "right_box = VBox([items[2], items[3]])\n", "HBox([left_box, right_box])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Widgets become very useful for updating parameters and observing changes in real-time. If a function is called to plot calculations done on-demand, or to process data recovered from a long computation, we will need plots that respond to widget changes.\n", "\n", "*Matplotlib* can do that using the **%matplotlib notebook** command. 
Lets see an example where we dynamically probe the effect of the shape and scale parameters in a [gamma distribution][1].\n", "\n", "[1]:https://en.wikipedia.org/wiki/Gamma_distribution" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from ipywidgets import interact, interactive\n", "%matplotlib notebook\n", "\n", "# Dynamically sample from a Gamma distribution and plot a histogram\n", "def g(shape, scale, nbins):\n", " data = np.random.gamma(shape, scale, size=10000)\n", " plt.hist(data, nbins, facecolor='g')\n", " plt.show()\n", "\n", "# Ipywidgets are automatically created to provide values for all required arguments,\n", "# allowing us to update the shape and scale of the gamma distribution, and the number of bins in the histogram. \n", "interactive_plot = interactive(g, shape=10, scale=20, nbins=50)\n", "display(interactive_plot)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a more elaborated example, we can create individual widgets so we better control their characteristics, and allow the user to restrict the shape and scale values so they will maintain a constant mean.\n", "\n", "In the gamma distribution, the mean equals the multiplication of the shape and scale." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "\n", "# Generates initial data\n", "defaultShape = 10\n", "defaultScale = 20\n", "minShapeScale = 4\n", "maxShapeScale = 50\n", "\n", "# Plotnine uses the base functiona from Matplotlib to create an image,\n", "# so we prepare variable to be used to update the plots.\n", "fig = None\n", "axs = None\n", "\n", "# We will now use Plotnine to make a density plot of the Probability Density Function (PDF)\n", "# for the gamma distribution.\n", "def plotString(*args):\n", " \n", " # We use the \n", " global fig, axs\n", " \n", " # Shortcut to the parameterized gamma PDF.\n", " pdf = lambda x: sp.stats.gamma.pdf(x, shapeSlider.value, loc=0, scale=scaleSlider.value)\n", " \n", " # Creates the plot:\n", " # Uses SciPy to calculate probablities for values along the X axis.\n", " # Defines the `look` of the plot.\n", " # Defines axis lables.\n", " p = p9.ggplot( pd.DataFrame(data={\"x\": [0, maxShapeScale**2]}), p9.aes(x=\"x\") ) \\\n", " + p9.stat_function(fun=pdf , n = 300) \\\n", " + p9.theme_linedraw() \\\n", " + p9.labs(x=\"Random Variable\", y=\"Probability\") \n", " \n", " \n", " # Matplotlib does not cooperate with dynamic updates of independent images.\n", " # This is a little work-around...\n", " if fig is None:\n", " fig, plot = p.draw(return_ggplot=True)\n", " axs = plot.axs\n", " else:\n", " # We manually clean the figure data and draw it again.\n", " for artist in plt.gca().lines +\\\n", " plt.gca().collections +\\\n", " plt.gca().artists + plt.gca().patches + plt.gca().texts:\n", " artist.remove()\n", " # Redraw image using the same `figure` and `axis` objects\n", " p._draw_using_figure(fig, axs)\n", "\n", "\n", "\n", "shapeSlider = widgets.FloatSlider( description=\"Shape:\",\n", " value=minShapeScale, min=minShapeScale, max=maxShapeScale\n", ")\n", "\n", "scaleSlider = widgets.FloatSlider( description=\"Scale:\",\n", " value=defaultScale, min=minShapeScale, max=maxShapeScale\n", ")\n", "\n", "vboxSliders = widgets.VBox([shapeSlider,scaleSlider])\n", "\n", "# The text widget shows a value and allows for the value to be changes.\n", "meanText = widgets.FloatText(\n", " value=defaultShape*defaultScale, min=0, max=maxShapeScale**2,\n", " 
description=\"Mean:\",\n", " style={'description_width': 'initial'}\n", ")\n", "\n", "# The toggle button widget is a button that changes state between True and False.\n", "constMeanTB = widgets.ToggleButton(\n", " value=False,\n", " description='Keep Mean',\n", " disabled=False,\n", " button_style='info', # 'success', 'info', 'warning', 'danger' or ''\n", " tooltip='Locks shape and scale to keep a constant mean.',\n", " icon='square' # \"check-square\" vs \"square\"\n", ")\n", "\n", "vboxMean = widgets.VBox([meanText,constMeanTB])\n", "\n", "# Function is called when the button is pressed. It changes the icon on\n", "# the button between an empty box and a checked box.\n", "def toggle_keepMean(*args):\n", " if constMeanTB.value:\n", " constMeanTB.icon = \"check-square\"\n", " shapeSlider.value = defaultShape\n", " else:\n", " constMeanTB.icon = \"square\"\n", "\n", "# The `observe` method calls a function whenever an action occurs with a widget.\n", "# In our case, it calls \"toggle_keepMean\" when the value of the toggle-button changes.\n", "constMeanTB.observe(toggle_keepMean,\"value\")\n", "\n", "# This function is called any time scale is changed.\n", "# It updates the `mean` value and changes the shape value in case \n", "# the mean is to be kept constant.\n", "def change_scale(*args):\n", " \n", " if constMeanTB.value:\n", " scaleSlider.value = defaultShape*defaultScale / shapeSlider.value\n", " \n", " meanText.value = shapeSlider.value * scaleSlider.value\n", " \n", " plotString()\n", "\n", "shapeSlider.observe(change_scale,\"value\")\n", "\n", "# This function is called any time shape is changed.\n", "# It updates the `mean` value and changes the shape value in case \n", "# the mean is to be kept constant.\n", "def change_shape(*args):\n", " \n", " if constMeanTB.value:\n", " shapeSlider.value = defaultShape*defaultScale / scaleSlider.value\n", " \n", " meanText.value = shapeSlider.value * scaleSlider.value\n", " \n", " plotString()\n", "\n", "scaleSlider.observe(change_shape,\"value\")\n", "\n", "# Combines all widgets and displays the on the notebook.\n", "ui = widgets.HBox([vboxSliders, vboxMean])\n", "display(ui)\n", "\n", "\n", "shapeSlider.value = defaultShape\n", "plt.tight_layout()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Integrations \n", "
[(top)](#goto_top)
\n", "\n", "Python can be integrated with other languages very easily!\n", "\n", "## Magics \n", "
[(top)](#goto_top)
\n", "\n", "`Magics` are modules in Jupiter (available through IPython) that allow cells (or individual lines) to be executed using a different kernel. One simple example would be to use a cell to explore the local directory, and check or modify files, using regular `bash` commands: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "\n", "ls" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A more powerfull example would be to parse a cell using Cython, improving the performance of a function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load_ext Cython" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "%%cython\n", "\n", "def funcCyt(x):\n", " y = x**2\n", " print(y)\n", "\n", "funcCyt(2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "\n", "# This function is defined within a Cython cell\n", "def CythonPower(x):\n", " return x**2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This IDENTICAL function is defined in a \"standard\" cell\n", "def PythonPower(x):\n", " return x**2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time, statistics\n", "\n", "functions = CythonPower, PythonPower\n", "numIter = 1000\n", "# Allocate one array of timing results per function\n", "times = {func.__name__: np.zeros(numIter) for func in functions}\n", "\n", "for i in range(numIter):\n", " for func in functions:\n", " \n", " tinit = time.time()\n", " func(i) # Function call\n", " tfinal = time.time() # In miliseconds\n", " \n", " times[func.__name__][i] = (tfinal - tinit) * 1000\n", "\n", "df = pd.DataFrame()\n", "for name, numbers in times.items():\n", " df[name] = numbers\n", "\n", "# Melts data for better analysis.\n", "dfMelt = df.melt(id_vars=[], var_name=\"Function\", value_name=\"Timing\")\n", "\n", "# Casts/Aggregates with statistics.\n", "dfCast = dfMelt.groupby(\"Function\").aggregate([np.mean, np.std]).reset_index()\n", "\n", "# Change column names\n", "dfCast.columns = ['_'.join(col).rstrip(\"_\") for col in dfCast.columns.values]\n", "\n", "def calcConfInt(dfline):\n", " interval = sp.stats.norm.interval(0.9, loc=dfline[\"Timing_mean\"], \n", " scale=dfline[\"Timing_std\"]/numIter)\n", " # Rounds up the limits\n", " interval = [round(limit,6) for limit in interval]\n", " \n", " dfline[\"IntMin\"] = interval[0]\n", " dfline[\"IntMax\"] = interval[1]\n", " \n", " return dfline\n", "\n", "# Adds confidence interval to the mean time\n", "dfStats = dfCast.apply(func = calcConfInt, axis=1)\n", "dfStats" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "p9.ggplot(dfStats) + p9.geom_col(p9.aes(x=\"Function\", y=\"Timing_mean\", fill=\"Function\"), width=0.4, show_legend=False) \\\n", " + p9.geom_errorbar(p9.aes(x=\"Function\", ymin=\"IntMin\", ymax=\"IntMax\"), width=0.3) \\\n", " + p9.labs(y=\"Mean time (s)\") + p9.theme_linedraw() " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A more expressive example would be:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "from libc.math cimport sin\n", "\n", "# Define a pure C function (note the \"cdef\") specifying variable types\n", "# and using a C standard 
{ "cell_type": "markdown", "metadata": {}, "source": [ "A more powerful example is to compile a whole cell with Cython, improving the performance of a function:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load_ext Cython" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "%%cython\n", "\n", "def funcCyt(x):\n", "    y = x**2\n", "    print(y)\n", "\n", "funcCyt(2)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "\n", "# This function is defined within a Cython cell\n", "def CythonPower(x):\n", "    return x**2" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This IDENTICAL function is defined in a \"standard\" cell\n", "def PythonPower(x):\n", "    return x**2" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import time\n",
"import numpy as np\n",
"import pandas as pd\n",
"import scipy as sp\n",
"import scipy.stats  # makes sp.stats available below\n",
"\n",
"functions = CythonPower, PythonPower\n",
"numIter = 1000\n",
"# Allocate one array of timing results per function\n",
"times = {func.__name__: np.zeros(numIter) for func in functions}\n",
"\n",
"for i in range(numIter):\n",
"    for func in functions:\n",
"        \n",
"        tinit = time.time()\n",
"        func(i)  # Function call\n",
"        tfinal = time.time()\n",
"        \n",
"        # Store the elapsed time in milliseconds\n",
"        times[func.__name__][i] = (tfinal - tinit) * 1000\n",
"\n",
"df = pd.DataFrame()\n",
"for name, numbers in times.items():\n",
"    df[name] = numbers\n",
"\n",
"# Melt the data from wide to long format for easier analysis.\n",
"dfMelt = df.melt(id_vars=[], var_name=\"Function\", value_name=\"Timing\")\n",
"\n",
"# Cast/aggregate per function, computing mean and standard deviation.\n",
"dfCast = dfMelt.groupby(\"Function\").aggregate([np.mean, np.std]).reset_index()\n",
"\n",
"# Flatten the multi-level column names\n",
"dfCast.columns = ['_'.join(col).rstrip(\"_\") for col in dfCast.columns.values]\n",
"\n",
"def calcConfInt(dfline):\n",
"    # 90% confidence interval for the mean, using the standard error of the mean\n",
"    interval = sp.stats.norm.interval(0.9, loc=dfline[\"Timing_mean\"],\n",
"                                      scale=dfline[\"Timing_std\"]/np.sqrt(numIter))\n",
"    # Round the limits to six decimal places\n",
"    interval = [round(limit, 6) for limit in interval]\n",
"    \n",
"    dfline[\"IntMin\"] = interval[0]\n",
"    dfline[\"IntMax\"] = interval[1]\n",
"    \n",
"    return dfline\n",
"\n",
"# Add a confidence interval for the mean time\n",
"dfStats = dfCast.apply(func=calcConfInt, axis=1)\n",
"dfStats"
] },
{ "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [
"import plotnine as p9\n",
"\n",
"p9.ggplot(dfStats) + p9.geom_col(p9.aes(x=\"Function\", y=\"Timing_mean\", fill=\"Function\"), width=0.4, show_legend=False) \\\n",
"    + p9.geom_errorbar(p9.aes(x=\"Function\", ymin=\"IntMin\", ymax=\"IntMax\"), width=0.3) \\\n",
"    + p9.labs(y=\"Mean time (ms)\") + p9.theme_linedraw()"
] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "A more impressive example calls a C standard library function:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"%%cython\n",
"from libc.math cimport sin\n",
"\n",
"# Define a pure C function (note the \"cdef\"), specifying variable types\n",
"# and using a C standard library function.\n",
"cdef double cythonSinC(double x):\n",
"    return sin(x * x)\n",
"\n",
"# Wrap the C code in a function accessible to Python code\n",
"def cythonSin(x):\n",
"    return cythonSinC(x)"
] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# \"Standard\" Python code\n", "def pythonSin(x):\n", "    return np.sin(x * x)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"functions = cythonSin, pythonSin\n",
"numIter = 1000\n",
"# Allocate one array of timing results per function\n",
"times = {func.__name__: np.zeros(numIter) for func in functions}\n",
"\n",
"for i in range(numIter):\n",
"    for func in functions:\n",
"        \n",
"        tinit = time.time()\n",
"        func(i)  # Function call\n",
"        tfinal = time.time()\n",
"        \n",
"        # Store the elapsed time in milliseconds\n",
"        times[func.__name__][i] = (tfinal - tinit) * 1000\n",
"\n",
"df = pd.DataFrame()\n",
"for name, numbers in times.items():\n",
"    df[name] = numbers\n",
"\n",
"# Melt the data from wide to long format for easier analysis.\n",
"dfMelt = df.melt(id_vars=[], var_name=\"Function\", value_name=\"Timing\")\n",
"\n",
"# Cast/aggregate per function, computing mean and standard deviation.\n",
"dfCast = dfMelt.groupby(\"Function\").aggregate([np.mean, np.std]).reset_index()\n",
"\n",
"# Flatten the multi-level column names\n",
"dfCast.columns = ['_'.join(col).rstrip(\"_\") for col in dfCast.columns.values]\n",
"\n",
"def calcConfInt(dfline):\n",
"    # 90% confidence interval for the mean, using the standard error of the mean\n",
"    interval = sp.stats.norm.interval(0.9, loc=dfline[\"Timing_mean\"],\n",
"                                      scale=dfline[\"Timing_std\"]/np.sqrt(numIter))\n",
"    # Round the limits to six decimal places\n",
"    interval = [round(limit, 6) for limit in interval]\n",
"    \n",
"    dfline[\"IntMin\"] = interval[0]\n",
"    dfline[\"IntMax\"] = interval[1]\n",
"    \n",
"    return dfline\n",
"\n",
"# Add a confidence interval for the mean time\n",
"dfStats = dfCast.apply(func=calcConfInt, axis=1)\n",
"dfStats"
] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"p9.ggplot(dfStats) + p9.geom_col(p9.aes(x=\"Function\", y=\"Timing_mean\", fill=\"Function\"), width=0.4, show_legend=False) \\\n",
"    + p9.geom_errorbar(p9.aes(x=\"Function\", ymin=\"IntMin\", ymax=\"IntMax\"), width=0.3) \\\n",
"    + p9.labs(y=\"Mean time (ms)\") + p9.theme_linedraw()"
] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "If you want to inspect the code Cython generates, in order to improve its efficiency, you can pass `--annotate` (or just `-a`) to the magic. It will show you, line by line, all the code produced behind the scenes, with a highlight that indicates which lines generate more code and are therefore more costly.\n", "\n", "(author note: this is one of the most beautiful things I have ever seen, right in between Machu Picchu and the Venus de Milo)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython -a\n", "\n", "def funcCyt(x):\n", "    y = x**2\n", "    print(y)\n", "\n", "funcCyt(2)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"%%cython -a\n",
"\n",
"from libc.math cimport sin\n",
"\n",
"# Define a pure C function (note the \"cdef\"), specifying variable types\n",
"# and using a C standard library function.\n",
"cdef double cythonSinC(double x):\n",
"    return sin(x * x)\n",
"\n",
"# Wrap the C code in a function accessible to Python code\n",
"def cythonSin(x):\n",
"    return cythonSinC(x)"
] },
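{ "cell_type": "markdown", "metadata": {}, "source": [ "The more static type information we give Cython, the less Python interaction the annotated view highlights. As a small sketch (the typed variant `funcCytTyped` below is not part of the original example), declaring C integer types for the argument and the intermediate variable lets the squaring compile down to plain C arithmetic:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"%%cython -a\n",
"\n",
"# Sketch: same logic as funcCyt, but with C types declared.\n",
"# The annotated view should show far less highlighting on the squaring line.\n",
"def funcCytTyped(long x):\n",
"    cdef long y = x**2\n",
"    print(y)\n",
"\n",
"funcCytTyped(2)"
] },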
{ "cell_type": "markdown", "metadata": {}, "source": [ "Now to **Numba**, let's take a nice example case from their [documentation](http://numba.pydata.org/numba-doc/0.12.2/tutorial_firststeps.html)." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Define a sorting function\n",
"def bubblesort(X):\n",
"    N = len(X)\n",
"    for end in range(N, 1, -1):\n",
"        for i in range(end - 1):\n",
"            cur = X[i]\n",
"            if cur > X[i + 1]:\n",
"                tmp = X[i]\n",
"                X[i] = X[i + 1]\n",
"                X[i + 1] = tmp"
] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import numpy as np\n",
"\n",
"# Create a large ORDERED dataset\n",
"original = np.arange(0.0, 10.0, 0.01, dtype='f4')\n",
"# Create a working copy\n",
"shuffled = original.copy()\n",
"# Randomize the copy\n",
"np.random.shuffle(shuffled)\n",
"\n",
"# Create a target array that will be sorted\n",
"sorted = shuffled.copy()\n",
"# Call the sorting function\n",
"bubblesort(sorted)\n",
"# Check that the sorting function correctly ordered the array\n",
"print(np.array_equal(sorted, original))"
] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Use the \"timeit\" magic to time the execution:\n", "%timeit sorted[:] = shuffled[:]; bubblesort(sorted)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create a JIT-compiled version with an explicit signature\n", "import numba\n", "bubblesort_jit = numba.jit(\"void(f4[:])\")(bubblesort)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Check that the Numba JIT version also works:\n", "sorted[:] = shuffled[:]  # reset to shuffled before sorting\n", "bubblesort_jit(sorted)\n", "print(np.array_equal(sorted, original))" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compare the performance:\n", "%timeit sorted[:] = shuffled[:]; bubblesort_jit(sorted)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compare to the \"autojit\" version, compiled without a signature\n", "bubblesort_autojit = numba.jit(bubblesort)\n", "%timeit sorted[:] = shuffled[:]; bubblesort_autojit(sorted)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Like Cython, Numba performs better when we provide type information, but even without an explicit signature it still accelerates the code significantly." ] },
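{ "cell_type": "markdown", "metadata": {}, "source": [ "In everyday code, the most common way to use Numba is the `@numba.jit` decorator. As a small sketch (the `sum_of_squares` function below is not part of the original tutorial), `nopython=True` asks Numba to compile the whole function without falling back to Python objects:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import numba\n",
"import numpy as np\n",
"\n",
"# Decorator form: JIT-compile the function, rejecting any fallback\n",
"# to Python objects (\"nopython\" mode).\n",
"@numba.jit(nopython=True)\n",
"def sum_of_squares(arr):\n",
"    total = 0.0\n",
"    for value in arr:\n",
"        total += value * value\n",
"    return total\n",
"\n",
"data = np.arange(10000, dtype=np.float64)\n",
"print(sum_of_squares(data))\n",
"%timeit sum_of_squares(data)"
] },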
{ "cell_type": "markdown", "metadata": {}, "source": [ "To see all available Magics:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%lsmagic" ] },
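{ "cell_type": "markdown", "metadata": {}, "source": [ "To read the documentation of any individual magic, append a question mark to its name (the standard IPython help mechanism), for example:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Show the docstring of the %timeit magic in the help pane\n", "%timeit?" ] },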
{ "cell_type": "markdown", "metadata": {}, "source": [
"\n",
"## pybind11 \n",
"[(top)](#goto_top)\n",
"\n",
"Sometimes, large portions of code are already written, or can be better written, in another language. Since it is 2018, we are going to ignore Fortran as a language and focus on the real thing: `C/C++`. One great way of integrating external `C++` code, assuming we are using the `C++11` standard or newer (it IS 2018), is **pybind11**, which connects `C++11` code to Python code.\n",
"\n",
"While writing the C++ code, it is very easy to create a Python module and the necessary wrappers so that the C++ function can be called from Python (from the [documentation](http://pybind11.readthedocs.io/en/stable/basics.html)):\n",
"\n",
"```C++\n",
"#include <pybind11/pybind11.h>\n",
"\n",
"int add(int i, int j) {\n",
"    return i + j;\n",
"}\n",
"\n",
"PYBIND11_MODULE(example, m) {\n",
"    m.doc() = \"pybind11 example plugin\"; // optional module docstring\n",
"\n",
"    m.def(\"add\", &add, \"A function which adds two numbers\");\n",
"}\n",
"```\n",
"\n",
"In Python, you just import the module and use the new function:\n",
"\n",
"```Python\n",
"import example\n",
"example.add(1, 2)\n",
"```\n",
"\n",
"There is a Magic for pybind11, which would make it possible to use C++11 code directly from the notebook, but after Cython and Numba, pybind11 may be a module better left for external Python code."
] },
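{ "cell_type": "markdown", "metadata": {}, "source": [
"Before `import example` can work, the C++ file has to be compiled into a shared library that Python can load. On Linux or macOS, a command along these lines (adapted from the pybind11 documentation; the exact flags and paths may differ on your system) builds an importable `example` module from a hypothetical `example.cpp`:\n",
"\n",
"```bash\n",
"c++ -O3 -Wall -shared -std=c++11 -fPIC $(python3 -m pybind11 --includes) example.cpp -o example$(python3-config --extension-suffix)\n",
"```"
] },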
{ "cell_type": "markdown", "metadata": {}, "source": [
"\n",
"# Sources and Acknowledgements \n",
"[(top)](#goto_top)\n",
"\n",
"Main text and code authorship: **Marcelo C. R. Melo**. This notebook also received contributions from **Tyler Earnest**, **David Bianchi**, and **Katherine Ritchie**.\n",
"\n",
"A large portion of the source material came from the [Python 3.5 online tutorial and documentation](https://docs.python.org/3.5/tutorial/introduction.html).\n",
"\n",
"You should also look at [w3schools](https://www.w3schools.com/python/python_tuples.asp).\n",
"\n",
"A great (much more extensive) introduction can be found [here](https://github.com/jerry-git/learn-python3).\n",
"\n",
"Some great references for Magics can be found [here](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/).\n"
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.4" } }, "nbformat": 4, "nbformat_minor": 2 }