Python Brief Introduction
Outline
Why Python
Python is a general-purpose programming language used for many purposes:
- System administration
- Web development
- Game development
- Multimedia, natural language processing
- Scientific computing
- Data science and statistics
- Machine Learning, …
Python is increasingly popular in the scientific community in nearly all areas. Economists have been rapidly adopting Python in research and teaching, and it is familiar to many young PhD students and researchers.
The ecosystem of Python developers and users is huge, and the language is used in almost every area of computing.
Python was named programming language of the year for 2020 by the TIOBE index.
Jupyter notebook
The Jupyter Notebook is an interactive computing environment that enables users to author notebook documents that include:
- Live code
- Interactive widgets
- Plots
- Narrative text
- Equations
- Images
- Video
These documents provide a complete and self-contained record of a computation that can be converted to various formats and shared with others using email, Dropbox, version control systems (like git/GitHub) or nbviewer.jupyter.org.
The Jupyter Notebook combines three components:
- The notebook web application: An interactive web application for writing and running code interactively and authoring notebook documents.
- Kernels: Separate processes started by the notebook web application that run users’ code in a given language and return output back to the notebook web application.
- Notebook documents: Self-contained documents that contain a representation of all content visible in the notebook web application, including inputs and outputs of the computations, narrative text, equations, images, and rich media representations of objects. Each notebook document has its own kernel.
Notebooks consist of a sequence of cells. There are three basic cell types:
- Code cells: Input and output of live code that is run in the kernel
- Markdown cells: Narrative text with embedded LaTeX equations
- Raw cells: Unformatted text that is included, without modification, when notebooks are converted to different formats using nbconvert
Internally, notebook documents are JSON data with binary values base64 encoded. This allows them to be read and manipulated programmatically by any programming language. Because JSON is a text format, notebook documents are version control friendly.
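Because a notebook is plain JSON, it can be inspected with nothing more than a JSON parser. The sketch below parses a tiny hand-written notebook (hypothetical content, laid out in the nbformat 4 style of a top-level `cells` list) and pulls out the cell types and the code sources. A real `.ipynb` file could be loaded the same way with `json.load` on an open file.

```python
import json

# A tiny hand-written notebook document (hypothetical content, nbformat 4 layout).
raw = """
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
    {"cell_type": "code", "metadata": {}, "execution_count": 1,
     "source": ["3 + 4"],
     "outputs": [{"output_type": "execute_result", "execution_count": 1,
                  "metadata": {}, "data": {"text/plain": ["7"]}}]}
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 5
}
"""

notebook = json.loads(raw)

# Each cell is a dict; "source" is stored as a list of text lines.
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
code_sources = ["".join(cell["source"])
                for cell in notebook["cells"]
                if cell["cell_type"] == "code"]

print(cell_types)
print(code_sources)
```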
Notebooks can be exported to different static formats including HTML, reStructuredText, LaTeX, PDF, and slide shows (reveal.js) using Jupyter’s nbconvert utility.
Furthermore, any notebook document available from a public URL or on GitHub can be shared via nbviewer. This service loads the notebook document from the URL and renders it as a static web page. The resulting web page may thus be shared with others without their needing to install the Jupyter Notebook.
Through Jupyter’s kernel and messaging architecture, the Notebook allows code to be run in a range of different programming languages. For each notebook document that a user opens, the web application starts a kernel that runs the code for that notebook. Each kernel is capable of running code in a single programming language and there are kernels available in the following languages:
- Python (https://github.com/ipython/ipython)
- Julia (https://github.com/JuliaLang/IJulia.jl)
- R (https://github.com/IRkernel/IRkernel)
- Ruby (https://github.com/minrk/iruby)
- Haskell (https://github.com/gibiansky/IHaskell)
- Scala (https://github.com/Bridgewater/scala-notebook)
- node.js (https://gist.github.com/Carreau/4279371)
- Go (https://github.com/takluyver/igo)
The default kernel runs Python code. The notebook provides a simple way for users to pick which of these kernels is used for a given notebook.
Each of these kernels communicates with the notebook web application and web browser using a JSON over ZeroMQ/WebSockets message protocol described in the Jupyter messaging documentation. Most users don’t need to know about these details, but it helps to understand that “kernels run code.”
Python starter
Python is general-purpose. It can serve as a calculator to start with, and it also powers large systems run by corporations.
Data types
- boolean
- int, float, complex
- strings
- None
Operators
- mathematical
- logical
- bitwise
- membership
- identity
- assignment and in-place operators
- operator precedence
Collections
- Sequence containers - list, tuple
- Set and mapping containers - set, dict
- The collections module
Functions and methods
- Anatomy of a function
- Docstrings
- Class methods
Control flow
- if and the ternary operator
- Checking conditions - what evaluates as true/false?
- if-elif-else
- while
- break, continue
- pass
Loops and comprehensions
- for, range, enumerate
- lazy and eager evaluation
- list, set, dict comprehensions
- generator expression
Packages and namespace
- Modules (file)
- Package (hierarchical modules)
- Namespace and naming conflicts
- Using import
- Batteries included
3+4
7
Arithmetic Operators
The arithmetic operators in Python are:
Operator | Description |
---|---|
+ | addition |
- | subtraction |
* | multiplication |
/ | division |
** | exponentiation |
% | remainder (or modulo) |
// | integer division |
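Notice that division of integers with / always returns a float, while // returns the integer quotient. A long expression can be split across lines by wrapping it in parentheses: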
long_winded_computation = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 +
13 + 14 + 15 + 16 + 17 + 18 + 19 + 20)
print("long_winded_computation=",long_winded_computation)
long_winded_computation= 210
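The remark about integer division can be checked directly. The following is a minimal sketch with illustrative values (the numbers 7, 2, 6 and 3 are chosen here, not taken from the original notebook).

```python
quotient = 7 / 2      # true division: the result is a float
exact = 6 / 3         # still a float, even though 3 divides 6 evenly
floor_q = 7 // 2      # integer (floor) division
remainder = 7 % 2     # remainder after floor division

print(quotient, exact, floor_q, remainder)
print(type(exact), type(floor_q))
```

The identity `7 == 2 * floor_q + remainder` ties the `//` and `%` operators together.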
Comparison Operators
Comparison operators produce Boolean values as output. For example, if we have variables x and y with numeric values, we can evaluate the expression x < y and the result is a boolean value, either True or False.
Comparison Operator | Description |
---|---|
< | strictly less than |
<= | less than or equal |
> | strictly greater than |
>= | greater than or equal |
== | equal |
!= | not equal |
For example:
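A short sketch of these comparisons, using two illustrative variables x and y (the values are chosen here and are not from the original notebook):

```python
x, y = 3, 5

less = x < y             # strictly less than
equal = x == y           # equality test (not assignment)
different = x != y       # not equal
in_range = 1 < x <= 3    # comparisons can be chained

print(less, equal, different, in_range)
```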
Boolean Operators
We combine logical expressions using the boolean operators and, or and not.
Boolean Operator | Description |
---|---|
A and B | returns True if both A and B are True |
A or B | returns True if either A or B is True |
not A | returns True if A is False |
For example:
math_is_scary = False
print(math_is_scary)
False
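To see and, or and not working together, here is a small sketch with hypothetical flags (the names is_sunny and is_weekend are made up for illustration). Note that not binds more tightly than and, which in turn binds more tightly than or.

```python
is_sunny = True
is_weekend = False

go_hiking = is_sunny and is_weekend            # both must be True
stay_home = not is_sunny or not is_weekend     # parsed as (not is_sunny) or (not is_weekend)
either = is_sunny or is_weekend                # at least one is True

print(go_hiking, stay_home, either)
```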
# Note the indentation
names = ["Bob", "Alice", "Zack", "Tyler"]
for x in names:
    print(x)
Bob
Alice
Zack
Tyler
Reserved Words
Summarized below are the reserved words in Python 3. Python will raise an error if you try to assign a value to any of these keywords and so you must avoid these as variable names.
False | class | finally | is | return |
None | continue | for | lambda | try |
True | def | from | nonlocal | while |
and | del | global | not | with |
as | elif | if | or | yield |
assert | else | import | pass |
break | except | in | raise |
Built-in Function Names
There are several functions which are included in the standard Python library. Do not use the names of these functions as variable names, otherwise the reference to the built-in function will be lost. For example, do not use sum, min, max, list or sorted as a variable name. See the full list of builtin functions.
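The sketch below shows what "losing the reference" looks like in practice. It deliberately rebinds the name sum (bad practice, done here only for illustration) and then recovers the original function through the standard builtins module.

```python
import builtins

values = [3, 1, 2]

sum = 10  # deliberately bad: this name now hides the built-in function

try:
    sum(values)              # an int is not callable
    shadow_error = None
except TypeError as err:
    shadow_error = err

# The original function is still reachable through the builtins module.
total = builtins.sum(values)

print(type(shadow_error).__name__, total)
```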
"""Anonymous functions are handy"""
fib = [0,1,1,2,3,5,8,13,21,34,55]
results = list(filter(lambda x: x % 2==0, fib))
print('results:', results)
results: [0, 2, 8, 34]
Loops
for Loops
A for loop allows us to execute a block of code multiple times with some parameters updated each time through the loop. A for loop begins with the for statement:
iterable = [1,2,3]
for item in iterable:
    # code block indented 4 spaces
    print(item)
1
2
3
while Loops
What if we want to execute a block of code multiple times but we don’t know exactly how many times? We can’t write a for loop because this requires us to set the length of the loop in advance. This is a situation when a while loop is useful.
The following example illustrates a while loop:
n = 5
while n > 0:
    print(n)
    n = n - 1
5
4
3
2
1
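The countdown above could also be written as a for loop. A case where the number of iterations is less obvious in advance is looping until a quantity crosses a threshold. A minimal sketch, with an illustrative starting value and threshold chosen here:

```python
value = 100.0
steps = 0

# Keep halving until the value drops below 1; the loop count is not set in advance.
while value >= 1.0:
    value = value / 2
    steps += 1

print(steps, value)
```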
Sequences
The main sequence types in Python are lists, tuples and range objects. Main differences between these sequence objects are:
- Lists are mutable and their elements are usually homogeneous (things of the same type making a list of similar objects)
- Tuples are immutable and their elements are usually heterogeneous (things of different types making a tuple describing a single structure)
- Range objects are efficient sequences of integers (commonly used in for loops), use a small amount of memory and yield items only when needed
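The three sequence types can be compared side by side. This is a sketch with made-up values; the memory comparison uses sys.getsizeof, whose exact numbers depend on the Python implementation, so only the ordering is relied on here.

```python
import sys

scores = [80, 95, 70]            # list: mutable, similar items
scores.append(88)                # lists can grow in place

person = ("Alice", 31, True)     # tuple: immutable, mixed types describing one record

numbers = range(0, 1_000_000)    # range: items are computed on demand
materialized = list(range(1000)) # a list stores every element

print(len(scores), person[0], numbers[10], 999_999 in numbers)
print(sys.getsizeof(numbers) < sys.getsizeof(materialized))
```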
Lists
Create a list using square brackets [ ... ] with items separated by commas. For example, start from a list of Fibonacci numbers, use a list comprehension to keep the even ones, and use the built-in function print() to display each item:
fib = [0,1,1,2,3,5,8,13,21,34,55]
results2 = [x for x in fib if x % 2 == 0]
for x in results2:
    print(x)
0
2
8
34
Tuples
Tuples are similar to lists but are immutable, i.e., elements in a tuple cannot be changed.
listx = [3,4]
tupley = (3,4)
listx[1]=6
print("listx",listx)
try:
    tupley[1] = 0
except TypeError:
    print("cannot modify a tuple")
listx [3, 6]
cannot modify a tuple
Dictionaries
grades = {"Joel": 80, "Tim": 95}
x = grades.keys()
print("x",x)
y = grades.values()
print("y",y)
x dict_keys(['Joel', 'Tim'])
y dict_values([80, 95])
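Beyond listing keys and values, the most common dictionary operations are lookup, insertion and iteration over key-value pairs. A brief sketch that extends the grades example (the entries "Kate" and "Ann" are hypothetical additions):

```python
grades = {"Joel": 80, "Tim": 95}

grades["Kate"] = 100                 # insert a new key-value pair
joel = grades["Joel"]                # lookup by key
missing = grades.get("Ann", 0)       # .get returns a default instead of raising KeyError

# Iterate over (key, value) pairs; dicts preserve insertion order.
above_90 = [name for name, score in grades.items() if score > 90]

print(joel, missing, above_90)
```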
Functions
A function is a named block of code that takes inputs, returns results, and can be called repeatedly.
"""
This code solves the consumer's utility maximization problem. U(C,l) is Cobb-Douglas, l is leisure, C is consumption.
I worked out the first-order conditions as equations and translated them to code.
"""
import numpy as np
# utility function
def utilityfun(l,C,shareC, sharel, sigma):
    tmplc = (l**sharel) * (C**shareC)
    return (tmplc**(1.0-sigma) - 1.0)/(1.0-sigma)
# optimal choices
def optimalChoice(sharel,shareC,wage,hmax,profit,tax):
    lstar = (sharel/(sharel+shareC)) *(wage*hmax + profit - tax) / wage
    Cstar = lstar * wage * shareC / sharel
    return lstar,Cstar
# parameters
shareC = 0.65
sharel = 1.0 - shareC
sigma = 1.4
wage, profit, tax = 0.5, 1.8, 0.5
hmax = 5.5
#---optimal choices: c/l = w * sharec/sharel
lstar, Cstar=optimalChoice(sharel,shareC,wage,hmax,profit,tax)
# maximized utility under wage
util0 = utilityfun(lstar,Cstar,shareC, sharel, sigma)
print('Optimal C=', Cstar, "optimal l=", lstar)
print('Maximized utility level=', util0)
Optimal C= 2.6325 optimal l= 2.8349999999999995
Maximized utility level= 0.8200869606862031
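One way to sanity-check the closed-form solution is to confirm that it exhausts the budget and that nearby bundles on the budget line give lower utility. The sketch below assumes the budget constraint C + wage*l = wage*hmax + profit - tax, which is inferred from the formula for lstar rather than stated in the original; the snake_case names are introduced here for illustration.

```python
def utility(l, C, share_C, share_l, sigma):
    """CRRA transform of a Cobb-Douglas composite of leisure and consumption."""
    composite = (l ** share_l) * (C ** share_C)
    return (composite ** (1.0 - sigma) - 1.0) / (1.0 - sigma)

# Same parameter values as in the example above.
share_C, sigma = 0.65, 1.4
share_l = 1.0 - share_C
wage, profit, tax, hmax = 0.5, 1.8, 0.5, 5.5

# Assumed budget: C + wage*l = full income.
income = wage * hmax + profit - tax

l_star = share_l / (share_l + share_C) * income / wage
C_star = l_star * wage * share_C / share_l
u_star = utility(l_star, C_star, share_C, share_l, sigma)

# The optimum should lie exactly on the budget line.
budget_gap = C_star + wage * l_star - income

# Move along the budget line in both directions and record the utility.
nearby_utils = []
for dl in (-0.1, 0.1):
    l_alt = l_star + dl
    C_alt = income - wage * l_alt
    nearby_utils.append(utility(l_alt, C_alt, share_C, share_l, sigma))

print(budget_gap, u_star, nearby_utils)
```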
Built-in Functions
The standard Python library has a collection of built-in functions ready for us to use. We have already seen a few of these functions in previous sections such as type(), print() and sum(). The following is a list of built-in functions that we’ll use most often:
Function | Description |
---|---|
print(object) | print object to output |
type(object) | return the type of object |
abs(x) | return the absolute value of x (or modulus if x is complex) |
int(x) | return the integer constructed from float x by truncating toward zero |
len(sequence) | return the length of the sequence |
sum(sequence) | return the sum of the entries of sequence |
max(sequence) | return the maximum value in sequence |
min(sequence) | return the minimum value in sequence |
range(a,b,step) | return the range object of integers from a to b (exclusive) by step |
list(sequence) | return a list constructed from sequence |
sorted(sequence) | return the sorted list from the items in sequence |
reversed(sequence) | return the reversed iterator object from the items in sequence |
enumerate(sequence) | return the enumerate object constructed from sequence |
zip(a,b) | return an iterator that aggregates items from sequences a and b |
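Several of the functions in the table are typically used together. The following sketch, with made-up names and scores, pairs two lists with zip, ranks them with sorted, and numbers the result with enumerate.

```python
names = ["Bob", "Alice", "Zack"]
scores = [72, 95, 88]

# zip pairs items position by position; sorted orders the pairs by score.
ranked = sorted(zip(scores, names), reverse=True)

for position, (score, name) in enumerate(ranked, start=1):
    print(position, name, score)

summary = (len(scores), sum(scores), min(scores), max(scores))
evens = list(range(0, 10, 2))        # 10 is excluded
backwards = list(reversed(evens))    # reversed returns an iterator, so wrap it in list
truncated = int(-2.7)                # truncates toward zero

print(summary, evens, backwards, truncated)
```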
Python for scientific computing
Python is one of the core languages of scientific computing. In fact, it is the modules (libraries) available in Python that have become the essential tools for scientific computing.
Many Python modules wrap numerical libraries written in C and Fortran.
Its popularity in economics has been rising rapidly; one collaborative effort is quantecon.org.
We take a bird’s-eye view of Python modules for scientific computing.
In Python, a module is a file that contains Python definitions and statements. A module must be imported before it can be used.
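The common import forms can be sketched with standard-library modules, so nothing extra needs to be installed. The alias stats is chosen here for illustration, in the same spirit as the np convention for NumPy.

```python
import math                  # import the whole module; access names as math.<name>
from math import sqrt        # import one name directly into the current namespace
import statistics as stats   # import under a shorter alias

area = math.pi * 2.0 ** 2    # area of a circle with radius 2
root = sqrt(16)
average = stats.mean([1, 2, 3, 4])

print(area, root, average)
```

Importing the whole module keeps names qualified (math.pi), which helps avoid the naming conflicts mentioned in the outline.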
NumPy is fundamental to many other modules. It provides matrix and array manipulation and operations.
SciPy is a comprehensive numerical library.
Pandas is a package for data processing and analysis, popular largely because there are no good substitutes. It is modeled on the R data frame.
Matplotlib is a basic (but rich) library for visualization. Complements include Bokeh, Plotly, etc.
Statsmodels is a library for statistical estimation and inference.
Scikit-learn is a Python machine learning package, mostly for supervised learning. Other machine learning libraries include TensorFlow for neural networks.
Numpy
import numpy as np # convention to name numpy as np in your codes
x = np.array([[1,2,3],[4,5,6]])
print('x=',x)
for xi in x:
    print('xi=',xi)
y = x.T
print('y=',y)
x= [[1 2 3]
[4 5 6]]
xi= [1 2 3]
xi= [4 5 6]
y= [[1 4]
[2 5]
[3 6]]
a = np.linspace(-np.pi, np.pi, 10) # Create even grid from -π to π
print('a=', a)
a= [-3.14159265 -2.44346095 -1.74532925 -1.04719755 -0.34906585 0.34906585
1.04719755 1.74532925 2.44346095 3.14159265]
np.fromfunction(lambda x, y: x*3 + y + 1, (2,3))
array([[1., 2., 3.],
[4., 5., 6.]])
Solving a system of linear equations: \begin{align*} 3 x_0 + 2 x_1 - x_2 &= 9 \\ x_0 - 2 x_1 + 0.5 x_2 &= 2 \\ 5 x_0 + 0.2 x_1 - 2 x_2 &= 10 \end{align*} In matrix form, $A x = b$.
from numpy import linalg
A = np.array([[3, 2, -1], [1,-2, 0.5], [5,0.2,-2]])
b = np.array([9,2,10])
x = np.linalg.solve(A, b)
print('x=',x)
x= [3.11428571 1.28571429 2.91428571]
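A quick way to check a reported solution is to substitute it back into the equations. The sketch below does this in plain Python using the rounded values printed above, so the residuals are small but not exactly zero. With NumPy the same check is usually written as a comparison of A @ x with b.

```python
A = [[3, 2, -1],
     [1, -2, 0.5],
     [5, 0.2, -2]]
b = [9, 2, 10]

# Solution copied from the printed output (rounded to 8 decimals).
x = [3.11428571, 1.28571429, 2.91428571]

# Residual of each equation: (row of A) . x - b_i
residuals = [
    sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i
    for row, b_i in zip(A, b)
]
max_residual = max(abs(r) for r in residuals)

print(residuals)
print(max_residual < 1e-6)
```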
Scipy
SciPy is built on top of NumPy. It includes modules for function interpolation, integration, optimization, linear algebra, Fourier transforms, eigenvalues, and more. See the SciPy documentation for a complete list.
Example: calculate $ \int_{-1.2}^2 \phi(x) dx $ where $ \phi(x) = \frac{1}{\sqrt{2\pi}} \cdot e^{-x^2/2}$ is the standard normal density function.
from scipy.stats import norm
from scipy.integrate import quad
phi = norm()
y, error = quad(phi.pdf, -1.2, 2) # Integrate using Gaussian quadrature
print('y=',y)
y= 0.8621801978301126
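The numerical result can be cross-checked against the closed form, since the integral of the density between two points is a difference of the standard normal CDF, $\Phi(b) - \Phi(a)$, and $\Phi(x) = \tfrac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)$. The sketch below uses only the standard library and the limits that appear in the quad call above, -1.2 and 2; the helper name std_normal_cdf is introduced here.

```python
import math

def std_normal_cdf(x):
    """Standard normal CDF written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Same limits as in the quad call: lower = -1.2, upper = 2
area = std_normal_cdf(2.0) - std_normal_cdf(-1.2)

print(area)
```

The value agrees with the quadrature output to many decimal places, which is a useful habit when a closed form is available.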
Matplotlib
Given data in NumPy arrays, Matplotlib plots 2D or 3D figures and can also create animations.
The first thing to look at should be the anatomy of a figure.
Example: Standard normal distribution.
import numpy as np;
from scipy.stats import norm;
import matplotlib.pyplot as plt;
import seaborn;
seaborn.set(style='ticks');
x = np.linspace(norm.ppf(0.0000001),norm.ppf(0.9999999), 120);
pdfx = norm.pdf(x);
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(14, 6), sharey='all');
ax.plot(x,pdfx, color="green", linewidth=1.6, linestyle="-");
ax.fill_betweenx(pdfx,x,x2=1.96, where = x>1.96,interpolate=True);
plt.axvline(x=0.0, color='black', linewidth=1.4,linestyle='--');
plt.text(2.0,.1,'\n$\\alpha$=shaded area',fontsize=16, horizontalalignment='left');
plt.autoscale(enable=True, axis='x', tight=True);
ax.set_title('Standard normal distribution', fontsize=20);
ax.tick_params(axis='y', labelsize=16);
ax.tick_params(axis='x', labelsize=16);
ax.set_xlim([-4,4]);
ax.set_ylim([0.0,0.4]);
plt.xlabel("z", fontsize=16);
ax.set_ylabel(r'$\phi(z)$', fontsize=14);
#ax.set_aspect('equal')
#fig.tight_layout();
Pandas
Pandas is not especially fast, efficient, or flexible, and it is not especially well designed. Somehow, I find its syntax confusing.
But Pandas has been improving.
For large data sets, consider Dask (https://dask.org).
# Example 1: use pandas to create time series
import pandas as pd
np.random.seed(1234)
data = np.random.randn(5, 2) # 5x2 matrix of N(0, 1) random draws
dates = pd.date_range('2010-12-28', periods=5)  # ISO format avoids ambiguous day/month parsing
df = pd.DataFrame(data, columns=('price', 'weight'), index=dates)
print(df)
price weight
2010-12-28 0.471435 -1.190976
2010-12-29 1.432707 -0.312652
2010-12-30 -0.720589 0.887163
2010-12-31 0.859588 -0.636524
2011-01-01 0.015696 -2.242685
df.mean()
price 0.411768
weight -0.699135
dtype: float64
# Example 2: read Canadian labor market data from a CSV file
import pandas as pd
cansimid = '14100287'
filename = 'tbl14100287Final3.csv'
lfs = pd.read_csv(filename, index_col=0)
lfs.index = pd.to_datetime(lfs.index)
lfsQtr = lfs.resample('QS-OCT').mean()
lfsQtr = lfsQtr.round(2)
lfsQtr
emplBothSex | emplFemale | emplFullBothSex | emplFullFemale | emplFullMale | emplMale | emplPartBothSex | emplPartFemale | emplPartMale | emplRateBothSex | ... | participRateMale | populationBothSex | populationFemale | populationMale | unemplBothSex | unemplFemale | unemplMale | unemplRateBothSex | unemplRateFemale | unemplRateMale | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ref_date | |||||||||||||||||||||
1976-01-01 | 9666.90 | 3566.33 | 8469.30 | 2735.10 | 5734.20 | 6100.53 | 1197.57 | 831.23 | 366.30 | 57.23 | ... | 77.93 | 16891.70 | 8538.53 | 8353.20 | 718.17 | 307.77 | 410.40 | 6.93 | 7.93 | 6.30 |
1976-04-01 | 9737.53 | 3604.93 | 8529.63 | 2756.80 | 5772.83 | 6132.60 | 1207.87 | 848.10 | 359.73 | 57.27 | ... | 77.73 | 17008.10 | 8599.70 | 8408.40 | 718.13 | 312.23 | 405.90 | 6.87 | 7.97 | 6.20 |
1976-07-01 | 9778.40 | 3635.07 | 8562.97 | 2779.30 | 5783.70 | 6143.37 | 1215.43 | 855.83 | 359.67 | 57.10 | ... | 77.53 | 17121.43 | 8659.10 | 8462.33 | 753.40 | 336.07 | 417.33 | 7.17 | 8.43 | 6.37 |
1976-10-01 | 9824.70 | 3673.90 | 8568.27 | 2790.33 | 5777.90 | 6150.80 | 1256.43 | 883.60 | 372.87 | 57.07 | ... | 77.53 | 17210.97 | 8705.67 | 8505.33 | 785.97 | 338.47 | 447.50 | 7.43 | 8.43 | 6.77 |
1977-01-01 | 9869.73 | 3700.57 | 8605.30 | 2810.80 | 5794.50 | 6169.17 | 1264.43 | 889.73 | 374.67 | 57.03 | ... | 77.70 | 17300.70 | 8752.77 | 8548.00 | 829.93 | 358.83 | 471.10 | 7.77 | 8.83 | 7.10 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2019-07-01 | 19102.03 | 9081.50 | 15495.80 | 6783.90 | 8711.87 | 10020.57 | 3606.27 | 2297.57 | 1308.70 | 62.00 | ... | 70.17 | 30806.87 | 15611.33 | 15195.57 | 1140.47 | 503.03 | 637.43 | 5.63 | 5.27 | 5.97 |
2019-10-01 | 19124.53 | 9102.93 | 15521.23 | 6787.10 | 8734.17 | 10021.60 | 3603.27 | 2315.83 | 1287.43 | 61.83 | ... | 69.90 | 30931.17 | 15671.30 | 15259.87 | 1154.90 | 509.90 | 645.00 | 5.70 | 5.33 | 6.07 |
2020-01-01 | 18842.40 | 8899.60 | 15438.27 | 6697.90 | 8740.33 | 9942.83 | 3404.13 | 2201.70 | 1202.47 | 60.70 | ... | 69.23 | 31032.37 | 15719.97 | 15312.40 | 1268.40 | 607.10 | 661.30 | 6.30 | 6.43 | 6.23 |
2020-04-01 | 16695.60 | 7783.43 | 13971.77 | 6042.47 | 7929.30 | 8912.17 | 2723.80 | 1740.97 | 982.87 | 53.67 | ... | 66.47 | 31118.57 | 15760.20 | 15358.33 | 2496.70 | 1199.43 | 1297.27 | 13.00 | 13.37 | 12.73 |
2020-07-01 | 18135.83 | 8564.80 | 14692.03 | 6352.70 | 8339.33 | 9571.07 | 3443.80 | 2212.10 | 1231.70 | 58.13 | ... | 69.47 | 31196.97 | 15796.80 | 15400.17 | 2021.03 | 895.67 | 1125.37 | 10.03 | 9.47 | 10.50 |
179 rows × 27 columns
#--unemployment rate, raw
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12, 8), sharey='all')
ax.plot(lfsQtr['unemplRateBothSex'],color="blue", linewidth=2.5, linestyle="-",label='Unemployment rate')
plt.autoscale(enable=True, axis='x', tight=True)
ax.set_title('Unemployment rate, Canada', fontsize=18)
ax.tick_params(axis='x', labelsize=14)
plt.xlabel("\nLatest date: July 2020. Data source: Statistics Canada.", fontsize=14)
ax.set_ylabel('Percentage', fontsize=16)
fig.tight_layout()
filename = 'UnemploymentRate_Canada.png'
plt.savefig(filename, dpi=200, format='png')
plt.show()
Statsmodels
We read in Mroz’s data on wages from the PSID, then estimate the wage equation by OLS.
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
mroz = pd.read_csv("mroz1987.csv")
mroz
inlf | hours | kidslt6 | kidsge6 | age | educ | wage | repwage | hushrs | husage | ... | faminc | mtr | motheduc | fatheduc | unem | city | exper | nwifeinc | lwage | expersq | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1610 | 1 | 0 | 32 | 12 | 3.3540 | 2.65 | 2708 | 34 | ... | 16310 | 0.7215 | 12 | 7 | 5.0 | 0 | 14 | 10.910060 | 1.210154 | 196 |
1 | 1 | 1656 | 0 | 2 | 30 | 12 | 1.3889 | 2.65 | 2310 | 30 | ... | 21800 | 0.6615 | 7 | 7 | 11.0 | 1 | 5 | 19.499980 | 0.328512 | 25 |
2 | 1 | 1980 | 1 | 3 | 35 | 12 | 4.5455 | 4.04 | 3072 | 40 | ... | 21040 | 0.6915 | 12 | 7 | 5.0 | 0 | 15 | 12.039910 | 1.514138 | 225 |
3 | 1 | 456 | 0 | 3 | 34 | 12 | 1.0965 | 3.25 | 1920 | 53 | ... | 7300 | 0.7815 | 7 | 7 | 5.0 | 0 | 6 | 6.799996 | 0.092123 | 36 |
4 | 1 | 1568 | 1 | 2 | 31 | 14 | 4.5918 | 3.60 | 2000 | 32 | ... | 27300 | 0.6215 | 12 | 14 | 9.5 | 1 | 7 | 20.100060 | 1.524272 | 49 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
748 | 0 | 0 | 0 | 2 | 40 | 13 | 0.0000 | 0.00 | 3020 | 43 | ... | 28200 | 0.6215 | 10 | 10 | 9.5 | 1 | 5 | 28.200000 | NaN | 25 |
749 | 0 | 0 | 2 | 3 | 31 | 12 | 0.0000 | 0.00 | 2056 | 33 | ... | 10000 | 0.7715 | 12 | 12 | 7.5 | 0 | 14 | 10.000000 | NaN | 196 |
750 | 0 | 0 | 0 | 0 | 43 | 12 | 0.0000 | 0.00 | 2383 | 43 | ... | 9952 | 0.7515 | 10 | 3 | 7.5 | 0 | 4 | 9.952000 | NaN | 16 |
751 | 0 | 0 | 0 | 0 | 60 | 12 | 0.0000 | 0.00 | 1705 | 55 | ... | 24984 | 0.6215 | 12 | 12 | 14.0 | 1 | 15 | 24.984000 | NaN | 225 |
752 | 0 | 0 | 0 | 3 | 39 | 9 | 0.0000 | 0.00 | 3120 | 48 | ... | 28363 | 0.6915 | 7 | 7 | 11.0 | 1 | 12 | 28.363000 | NaN | 144 |
753 rows × 22 columns
# OLS estimation of wage equation for husbands
mroz['ones'] = 1.0
X = mroz.loc[:,['ones','huseduc']]
Y = np.log(mroz['huswage'])
#Y = mroz4['huswage']
model = sm.OLS(Y, X)
results = model.fit()
print(" ")
print(results.summary())
# prediction: results.fittedvalues
Yhat = results.params['ones'] + results.params['huseduc']*X['huseduc']
Yhat2 = results.params['ones']*0.7 + results.params['huseduc']*1.4*X['huseduc']
OLS Regression Results
==============================================================================
Dep. Variable: huswage R-squared: 0.155
Model: OLS Adj. R-squared: 0.154
Method: Least Squares F-statistic: 138.3
Date: Thu, 12 Jan 2023 Prob (F-statistic): 2.03e-29
Time: 09:20:08 Log-Likelihood: -598.39
No. Observations: 753 AIC: 1201.
Df Residuals: 751 BIC: 1210.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
ones 0.9105 0.083 10.943 0.000 0.747 1.074
huseduc 0.0761 0.006 11.758 0.000 0.063 0.089
==============================================================================
Omnibus: 144.321 Durbin-Watson: 1.890
Prob(Omnibus): 0.000 Jarque-Bera (JB): 371.275
Skew: -0.986 Prob(JB): 2.39e-81
Kurtosis: 5.819 Cond. No. 55.0
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
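Because the dependent variable is the log of huswage, the coefficient on huseduc is read as an approximate proportional change in the wage per additional year of education. The sketch below converts the reported coefficient (0.0761, copied from the table above) into a percentage, both with the usual approximation and with the exact exponential formula. This is the standard log-linear interpretation, not something computed in the original notebook.

```python
import math

beta_educ = 0.0761   # coefficient on huseduc, copied from the regression table

approx_pct = 100 * beta_educ                    # first-order approximation
exact_pct = 100 * (math.exp(beta_educ) - 1)     # exact change implied by the log model

print(round(approx_pct, 2), round(exact_pct, 2))
```

For coefficients this small the two numbers are close; the gap widens as the coefficient grows.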
Networks and Graphs
Disclaimer: This subsection is credited to quantecon.org
Python has many libraries for studying graphs.
One well-known example is NetworkX. Its features include, among many other things:
- standard graph algorithms for analyzing networks
- plotting routines
Here’s some example code that generates and plots a random graph, with node color determined by shortest path length from a central node.
%matplotlib inline
import networkx as nx
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (10,6)
np.random.seed(1234)
# Generate a random graph
p = dict((i, (np.random.uniform(0, 1), np.random.uniform(0, 1)))
         for i in range(200))
g = nx.random_geometric_graph(200, 0.12, pos=p)
pos = nx.get_node_attributes(g, 'pos')
# Find node nearest the center point (0.5, 0.5)
dists = [(x - 0.5)**2 + (y - 0.5)**2 for x, y in list(pos.values())]
ncenter = np.argmin(dists)
# Plot graph, coloring by path length from central node
p = nx.single_source_shortest_path_length(g, ncenter)
plt.figure()
nx.draw_networkx_edges(g, pos, alpha=0.4)
nx.draw_networkx_nodes(g,
                       pos,
                       nodelist=list(p.keys()),
                       node_size=120, alpha=0.5,
                       node_color=list(p.values()),
                       cmap=plt.cm.jet_r)
plt.show()