⚠️ The writing is a work in progress. The functions work, but the text needs retouching. ⚠️
Please read everything on the main page before continuing, disclaimer and all.
%%capture
!pip install geopandas
!pip install VitalSigns
ls
In graph theory, a clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. Evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterized by a relatively high density of ties; this likelihood tends to be greater than the average probability of a tie randomly established between two nodes (Holland and Leinhardt, 1971; Watts and Strogatz, 1998).
Two versions of this measure exist: the global and the local. The global version was designed to give an overall indication of the clustering in the network, whereas the local version gives an indication of the embeddedness of single nodes. - GeeksforGeeks
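import networkx as nx

# Random graph: 10 nodes, each possible edge included with probability 0.4
G = nx.erdos_renyi_graph(10, 0.4)
# Average of the local clustering coefficients
cc = nx.average_clustering(G)
# Local clustering coefficient of each node
c = nx.clustering(G)
c
nx.draw(G)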
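Because the random graph above changes on every run, a small fixed graph makes the difference between the local and global versions easier to check by hand. The sketch below is only an illustration: the toy graph (a triangle with one extra node hanging off it) and the names `G_toy`, `local_cc`, `avg_cc` and `global_cc` are assumptions made for this example, not part of the original analysis.

```python
import networkx as nx

# Toy graph: a triangle (0, 1, 2) with one pendant node (3) attached to node 2.
G_toy = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])

# Local version: one coefficient per node.
local_cc = nx.clustering(G_toy)
# Average of the local coefficients.
avg_cc = nx.average_clustering(G_toy)
# Global version (transitivity): 3 * triangles / connected triples.
global_cc = nx.transitivity(G_toy)

print(local_cc)
print(round(avg_cc, 4), round(global_cc, 4))
```

Nodes 0 and 1 have all of their neighbors connected to each other, node 2 has only one of its three neighbor pairs connected, and node 3 has a single neighbor, so its coefficient is 0.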
u = intaker.Intake
rdf = u.getData('https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Biz1_/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson')
# rdf.set_index('CSA2010', drop=True, inplace=True)
rdf.drop(labels=['OBJECTID_1', 'Shape__Area', 'Shape__Length'], axis=1, inplace=True)
vs10to19Ind = rdf.filter(regex='biz1|CSA2010', axis=1)
Get only the columns we want to work with
vs10to19Ind.head()
We want one row per year and one column per CSA. To get there, transpose the dataset, use the CSA labels (the first row after transposing) as the column names, rename the index for clarity, and cast the values to floats.
vs10to19Indt = vs10to19Ind.T
vs10to19Indt.columns = vs10to19Indt.iloc[0]
vs10to19Indt = vs10to19Indt[1:]
vs10to19Indt.index.name = 'variable'
vs10to19Indt = vs10to19Indt.astype('float64')
vs10to19Indt
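To see what this reshaping does, here is a minimal sketch on made-up data. The CSA names and values are hypothetical, and it uses `set_index` before transposing rather than promoting the first row afterwards. This is a swapped-in variant that should give the same layout, not the approach used above.

```python
import pandas as pd

# Hypothetical toy data: two made-up CSAs and two years of a 'biz1' indicator.
toy = pd.DataFrame({
    'CSA2010': ['CSA A', 'CSA B'],
    'biz1_10': [1.0, 3.0],
    'biz1_11': [2.0, 5.0],
})

# Index by CSA first, so the labels become column names after the transpose
# and never end up mixed into the data rows.
toy_t = toy.set_index('CSA2010').T
toy_t.index.name = 'variable'
toy_t = toy_t.astype('float64')
print(toy_t)
```

Each row of `toy_t` is now one year of the indicator and each column is one CSA, which is the shape `corr()` needs to compare CSAs with each other.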
cor_matrix = vs10to19Indt.iloc[:,:].corr()
#shows the first 5 rows
cor_matrix.head(5)
df = vs10to19Indt.copy()
import matplotlib.pyplot as plt
f = plt.figure(figsize=(19, 15))
plt.matshow(df.corr(), fignum=f.number)
irange = range(df.select_dtypes(['number']).shape[1])
labels = df.select_dtypes(['number']).columns
# plt.xticks(irange, labels, fontsize=14, rotation=45)
plt.yticks(irange, labels, fontsize=14)
cb = plt.colorbar()
cb.ax.tick_params(labelsize=14)
plt.title('Correlation Matrix', fontsize=16);
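import numpy as np
import networkx as nx

lblVals = cor_matrix.index.values
# Convert the DataFrame to a plain NumPy array for NetworkX
cor_matrix = np.asarray(cor_matrix)
cor_matrix
# from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array is its replacement
G = nx.from_numpy_array(cor_matrix)
# Relabel the nodes to match the CSA names
G = nx.relabel_nodes(G, lambda x: lblVals[x])
# Show the first 5 edges with their corresponding weights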
# OLD: G.edges(data=True)[:5]
list(G.edges(data=True))[0:5]
create_corr_network_5(G, corr_direction="positive",min_correlation=0.7)
create_corr_network_5(G, corr_direction="negative",min_correlation=-0.7)
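`create_corr_network_5` is not defined in this section. As a rough idea of the filtering step such a function would need, the sketch below keeps only the edges whose weight passes a correlation threshold. The helper name `filter_corr_edges`, its behavior, and the toy graph are all assumptions made for illustration; the real function may filter, style and draw the network differently.

```python
import networkx as nx

def filter_corr_edges(G, corr_direction="positive", min_correlation=0.7):
    """Hypothetical helper: copy G, keeping only edges past the threshold."""
    H = G.copy()
    for a, b, w in G.edges(data="weight"):
        if a == b:
            drop = True  # self-correlation is always 1.0 and adds nothing
        elif corr_direction == "positive":
            drop = w < min_correlation   # keep strong positive correlations
        else:
            drop = w > min_correlation   # keep strong negative correlations
        if drop:
            H.remove_edge(a, b)
    return H

# Made-up correlation graph with three nodes.
toy_G = nx.Graph()
toy_G.add_weighted_edges_from([("A", "B", 0.9), ("A", "C", -0.8), ("B", "C", 0.1)])

pos_G = filter_corr_edges(toy_G, "positive", 0.7)
neg_G = filter_corr_edges(toy_G, "negative", -0.7)
print(list(pos_G.edges(data="weight")), list(neg_G.edges(data="weight")))
```

The helper returns a copy, so the original graph stays available for both the positive and the negative pass.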
We want to fit a linear regression for each CSA, using the year as X and the indicator value as Y.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
# Create 3 columns: CSA2010 variable value
wdf = vs10to19Ind.melt(id_vars='CSA2010', value_vars=vs10to19Ind.columns[1:])
# Convert indicator labels into our X (Year) column
wdf['variable'] = wdf['variable'].apply(lambda x: int(x.replace('biz1_','') ) )
findf = {'CSA':[], 'B':[], 'M':[] }
# For each CSA
for csa in wdf.CSA2010.unique():
    CsaData = wdf[wdf['CSA2010'] == csa]
    X = CsaData[['variable']]  # .values returns: [10 11 12 13 14 15 16 17 18 19]
    y = CsaData[['value']]
    regressor = LinearRegression()
    regressor.fit(X, y)
    y_pred = regressor.predict(X)
    plt.scatter(X, y, color='red')
    plt.plot(X, y_pred, color='blue')
    plt.title('biz1: ' + csa)
    plt.xlabel('YEAR')
    plt.ylabel('VALUE')
    plt.show()
    # M is the slope and B is the intercept, matching the keys stored below
    print('M: ', regressor.coef_, 'B: ', regressor.intercept_)
    findf['CSA'].append(csa)
    findf['B'].append(regressor.intercept_[0])
    findf['M'].append(regressor.coef_[0][0])
lin_reg_df = pd.DataFrame(data=findf)
lin_reg_df.head()
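To sanity-check the slope (M) and intercept (B) columns, the same per-CSA fit can be sketched with `np.polyfit` on data where the answer is obvious by eye. This is a swapped-in technique for illustration, not the scikit-learn code used above, and the toy CSA names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data shaped like wdf: one row per (CSA, year).
toy_long = pd.DataFrame({
    'CSA2010': ['CSA A'] * 3 + ['CSA B'] * 3,
    'variable': [10, 11, 12, 10, 11, 12],
    'value': [1.0, 3.0, 5.0, 4.0, 4.0, 4.0],
})

rows = []
for csa, grp in toy_long.groupby('CSA2010'):
    # With deg=1, np.polyfit returns [slope, intercept] of the least-squares line.
    m, b = np.polyfit(grp['variable'], grp['value'], deg=1)
    rows.append({'CSA': csa, 'B': b, 'M': m})

toy_fit = pd.DataFrame(rows)
print(toy_fit)
```

Here 'CSA A' rises by 2 per year and 'CSA B' is flat, so the fitted slopes should come out close to 2 and 0.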
lin_reg_dft = lin_reg_df.T
lin_reg_dft.columns = lin_reg_dft.iloc[0]
lin_reg_dft = lin_reg_dft[1:]
lin_reg_dft.index.name = 'variable'
lin_reg_dft = lin_reg_dft.astype('float64')
lin_reg_dft
We may need to normalize the data for this to be usable.
df = lin_reg_dft.copy()
import matplotlib.pyplot as plt
f = plt.figure(figsize=(19, 15))
plt.matshow(df.corr(), fignum=f.number)
irange = range(df.select_dtypes(['number']).shape[1])
labels = df.select_dtypes(['number']).columns
# plt.xticks(irange, labels, fontsize=14, rotation=45)
plt.yticks(irange, labels, fontsize=14)
cb = plt.colorbar()
cb.ax.tick_params(labelsize=14)
plt.title('Correlation Matrix', fontsize=16);
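One possible form of the normalization mentioned above is a row-wise z-score, so that intercepts and slopes sit on a comparable scale across CSAs. The sketch below is an assumption about what such a step could look like, on a hypothetical stand-in for `lin_reg_dft`; the original does not say which normalization is intended.

```python
import pandas as pd

# Hypothetical stand-in for lin_reg_dft: rows 'B' and 'M', columns are made-up CSAs.
toy_coef = pd.DataFrame(
    {'CSA A': [-19.0, 2.0], 'CSA B': [4.0, 0.0], 'CSA C': [10.0, 1.0]},
    index=['B', 'M'],
)

# Z-score each row: subtract the row mean, divide by the row's sample std (ddof=1).
row_mean = toy_coef.mean(axis=1)
row_std = toy_coef.std(axis=1)
toy_norm = toy_coef.sub(row_mean, axis=0).div(row_std, axis=0)
print(toy_norm.round(3))
```

With only two rows (B and M), the Pearson correlation between any two CSA columns is either ±1 or undefined, so rescaling alone may not make the correlation matrix informative. Comparing the normalized B and M values directly may be more useful.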