This notebook shows how to build and visualize networks.

⚠️ The writing is a work in progress. The functions work, but the text still needs retouching. ⚠️

Please read everything on the main page before continuing, disclaimer and all.

%%capture
!pip install geopandas
!pip install VitalSigns
ls

In graph theory, a clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. Evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterized by a relatively high density of ties; this likelihood tends to be greater than the average probability of a tie randomly established between two nodes (Holland and Leinhardt, 1971; Watts and Strogatz, 1998).

Two versions of this measure exist: the global and the local. The global version was designed to give an overall indication of the clustering in the network, whereas the local gives an indication of the embeddedness of single nodes. - GeeksforGeeks
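To make the global/local distinction concrete, here is a minimal sketch on a toy graph. The graph (a triangle with one extra node attached) and the variable names are assumptions chosen for illustration; only `nx.clustering`, `nx.average_clustering` and `nx.transitivity` are NetworkX calls.

```python
import networkx as nx

# Toy graph (illustrative assumption): triangle 0-1-2 plus a pendant node 3 attached to node 2.
G_toy = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])

# Local clustering: for each node, the fraction of pairs of its neighbors that are connected.
local = nx.clustering(G_toy)

# Mean of the local values over all nodes.
avg_local = nx.average_clustering(G_toy)

# Global clustering (transitivity): 3 * triangles / connected triples.
global_cc = nx.transitivity(G_toy)

print(local)
print(avg_local, global_cc)
```

Node 2 has three neighbors but only one connected pair among them, so its local value is 1/3, while the global value summarizes the whole graph in a single number.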

average_clustering(G, trials=1000)

Estimates the average clustering coefficient of G.

The local clustering of each node in G is the fraction of triangles that actually exist over all possible triangles in its neighborhood. The average clustering coefficient of a graph G is the mean of local clusterings.

This function finds an approximate average clustering coefficient for G by repeating the following experiment `trials` times: choose a node at random, choose two of its neighbors at random, and check whether they are connected. The approximate coefficient is the fraction of triangles found over the number of trials [1].

Parameters

G : NetworkX graph

trials : integer
    Number of trials to perform (default 1000).

Returns

c : float
    Approximated average clustering coefficient.
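The sampling experiment described above can be sketched in a few lines. `sampled_average_clustering` is a hypothetical helper written for illustration, not NetworkX's own code; the library's estimator is `networkx.algorithms.approximation.average_clustering`, which is called at the end for comparison. The graph size, edge probability and seeds are arbitrary assumptions.

```python
import random

import networkx as nx
from networkx.algorithms import approximation


def sampled_average_clustering(G, trials=1000, seed=None):
    """Hypothetical helper: estimate the average clustering coefficient by sampling."""
    rng = random.Random(seed)
    nodes = list(G)
    triangles = 0
    for _ in range(trials):
        node = rng.choice(nodes)          # choose a node at random
        nbrs = list(G[node])
        if len(nbrs) < 2:
            continue                      # no neighbor pair to test: the trial finds no triangle
        u, v = rng.sample(nbrs, 2)        # choose two distinct neighbors at random
        if G.has_edge(u, v):              # are the two neighbors connected?
            triangles += 1
    return triangles / trials


G_demo = nx.erdos_renyi_graph(50, 0.2, seed=1)
exact = nx.average_clustering(G_demo)
estimate = sampled_average_clustering(G_demo, trials=5000, seed=1)
library_estimate = approximation.average_clustering(G_demo, trials=5000, seed=1)
print(exact, estimate, library_estimate)
```

With a few thousand trials, both estimates should land close to the exact value, and the sampling cost does not grow with the number of triangles in the graph.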

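import networkx as nx

# Random graph: 10 nodes, each possible edge present with probability 0.4
G = nx.erdos_renyi_graph(10, 0.4)
# Exact average clustering; the sampled estimate described above is nx.algorithms.approximation.average_clustering
cc = nx.average_clustering(G)
# Local clustering coefficient of every node, as a dict {node: coefficient}
c = nx.clustering(G)
print(cc, c)
nx.draw(G)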

Data Prep

u = intaker.Intake
rdf = u.getData('https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Biz1_/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson')
# rdf.set_index('CSA2010', drop=True, inplace=True)
rdf.drop(labels=['OBJECTID_1', 'Shape__Area', 'Shape__Length'], axis=1, inplace=True)

vs10to19Ind = rdf.filter(regex='biz1|CSA2010', axis=1)

Get only the columns we want to work with

vs10to19Ind.head()
CSA2010 biz1_10 biz1_11 biz1_12 biz1_13 biz1_14 biz1_15 biz1_16 biz1_17 biz1_18 biz1_19
0 Allendale/Irvington/S. Hilton 8.921933 10.970464 9.486166 3.914591 9.15751 5.217391 3.686636 3.846154 2.912621 4.761905
1 Beechfield/Ten Hills/West Hills 3.896104 8.088235 4.255319 5.747126 6.00000 4.316547 2.400000 3.937008 3.100775 5.128205
2 Belair-Edison 8.888889 12.053571 7.569721 4.651163 4.64135 6.250000 2.732240 5.960265 6.547619 8.522727
3 Brooklyn/Curtis Bay/Hawkins Point 2.735562 7.487923 6.940063 5.303030 5.50661 5.908096 4.500000 2.623907 4.469274 4.057971
4 Canton 7.643312 10.869565 11.538462 5.250000 3.35196 8.554572 5.232558 4.430380 3.951368 4.137931

What we want is one record per year, with every CSA as a column. To get this, transpose the dataset, set the CSA labels (the first row after transposing) as the column names, relabel the index for clarity, and cast the values to float.

vs10to19Indt = vs10to19Ind.T
vs10to19Indt.columns = vs10to19Indt.iloc[0]
vs10to19Indt = vs10to19Indt[1:]
vs10to19Indt.index.name = 'variable'
vs10to19Indt = vs10to19Indt.astype('float64')
vs10to19Indt
CSA2010 Allendale/Irvington/S. Hilton Beechfield/Ten Hills/West Hills Belair-Edison Brooklyn/Curtis Bay/Hawkins Point Canton Cedonia/Frankford Cherry Hill Chinquapin Park/Belvedere Claremont/Armistead Clifton-Berea Cross-Country/Cheswolde Dickeyville/Franklintown Dorchester/Ashburton Downtown/Seton Hill Edmondson Village Fells Point Forest Park/Walbrook Glen-Fallstaff Greater Charles Village/Barclay Greater Govans Greater Mondawmin Greater Roland Park/Poplar Hill Greater Rosemont Greenmount East Hamilton Harbor East/Little Italy Harford/Echodale Highlandtown Howard Park/West Arlington Inner Harbor/Federal Hill Lauraville Loch Raven Madison/East End Medfield/Hampden/Woodberry/Remington Midtown Midway/Coldstream Morrell Park/Violetville Mount Washington/Coldspring North Baltimore/Guilford/Homeland Northwood Oldtown/Middle East Orangeville/East Highlandtown Patterson Park North & East Penn North/Reservoir Hill Pimlico/Arlington/Hilltop Poppleton/The Terraces/Hollins Market Sandtown-Winchester/Harlem Park South Baltimore Southeastern Southern Park Heights Southwest Baltimore The Waverlies Upton/Druid Heights Washington Village/Pigtown Westport/Mount Winans/Lakeland
variable
biz1_10 8.921933 3.896104 8.888889 2.735562 7.643312 9.389671 3.409091 5.714286 7.344633 4.945055 3.349282 3.125000 12.138728 7.336683 7.272727 8.350305 13.698630 8.016878 10.180995 7.382550 8.444444 5.714286 9.324759 10.112360 6.083650 10.294118 4.676259 9.844560 2.873563 7.901235 12.393162 5.092593 8.962264 7.317073 10.090909 10.087719 5.831533 6.392694 6.326034 13.375796 7.789474 6.509946 8.292683 6.504065 7.222222 12.751678 11.788618 8.196721 6.958763 9.359606 9.292035 4.705882 9.872611 8.000000 11.336032
biz1_11 10.970464 8.088235 12.053571 7.487923 10.869565 10.554090 9.420290 7.031250 7.027027 7.692308 9.326425 5.128205 10.734463 8.534799 10.344828 7.575758 11.971831 6.250000 9.976247 9.150327 11.068702 5.337079 9.507042 9.090909 5.394191 8.740360 9.493671 8.130081 7.246377 9.517601 10.454545 6.896552 8.641975 5.990220 6.646526 6.763285 5.701754 5.154639 7.360406 8.965517 6.250000 4.797048 10.309278 11.250000 6.927711 6.716418 9.049774 6.355932 8.716707 8.510638 10.859729 9.146341 8.724832 5.263158 7.818930
biz1_12 9.486166 4.255319 7.569721 6.940063 11.538462 9.828010 5.000000 9.655172 7.894737 5.487805 5.797101 9.375000 11.956522 6.568594 7.692308 7.392996 7.746479 7.528409 9.821429 5.128205 11.387900 8.860759 10.126582 12.087912 8.301887 8.591885 9.122807 9.290954 5.172414 9.123649 8.368201 3.755869 11.926606 4.711425 9.267841 8.256881 4.816514 6.000000 6.265060 5.882353 6.639004 6.761566 11.111111 7.228916 9.433962 10.135135 10.441767 6.976744 6.074766 10.294118 8.995816 9.782609 5.592105 9.536785 6.147541
biz1_13 3.914591 5.747126 4.651163 5.303030 5.250000 6.873614 3.105590 6.395349 4.672897 9.259259 2.602230 3.636364 8.056872 4.801670 4.615385 5.871212 9.202454 4.605263 5.527638 6.989247 5.033557 2.117647 8.206687 3.428571 5.743243 6.822612 5.307263 4.326923 5.241935 3.744493 5.904059 4.065041 7.826087 3.552207 5.624483 4.330709 5.607477 0.840336 3.571429 6.043956 1.782531 3.535354 5.963303 8.465608 6.845966 11.538462 7.473310 6.779661 3.649635 8.119658 7.707129 6.467662 6.333333 4.859335 6.000000
biz1_14 9.157510 6.000000 4.641350 5.506610 3.351960 8.986180 8.333330 3.355700 4.326920 7.602340 6.122450 2.500000 9.900990 5.601090 3.703700 6.951870 7.741940 6.397770 5.973450 8.045980 7.692310 5.866670 13.437500 4.026850 4.230770 5.895200 5.167170 5.053190 8.823530 7.287930 8.032130 7.042250 9.210530 5.151180 5.703770 7.826090 6.024100 4.845820 7.625270 5.084750 5.052630 6.172840 7.281550 10.181800 6.609200 7.333330 6.225680 4.982210 7.783020 7.456140 6.779660 4.926110 7.443370 6.878310 14.232200
biz1_15 5.217391 4.316547 6.250000 5.908096 8.554572 6.504065 9.271523 6.976744 4.347826 4.975124 7.692308 7.142857 4.444444 8.851523 8.333333 6.476190 7.947020 4.013378 8.628842 9.859155 5.857741 8.706468 5.985915 7.518797 7.000000 7.488987 9.236948 6.052632 6.302521 8.050314 3.571429 9.389671 5.140187 6.113033 7.192982 6.161137 6.971154 6.912442 7.236842 7.909605 24.918033 10.651828 7.303371 7.420495 9.715640 9.547739 8.333333 7.272727 5.250597 7.407407 7.947020 5.928854 7.817590 4.790419 2.661597
biz1_16 3.686636 2.400000 2.732240 4.500000 5.232558 2.686567 5.223881 5.839416 2.409639 5.468750 5.641026 9.090909 3.921569 7.209805 11.111111 4.418605 7.751938 4.675716 5.542453 6.870229 2.755906 5.898876 3.688525 4.878049 3.418803 4.946237 1.102941 6.666667 5.472637 5.044136 1.492537 9.604520 3.658537 7.629108 6.445498 4.324324 4.415584 8.715596 5.390836 5.517241 21.689786 6.529210 4.733728 2.985075 12.374582 5.882353 3.389831 5.243446 5.555556 0.613497 2.222222 2.395210 3.070175 4.347826 3.448276
biz1_17 3.846154 3.937008 5.960265 2.623907 4.430380 5.333333 8.823529 5.511811 3.797468 2.830189 6.818182 5.555556 5.072464 7.178312 4.255319 4.060914 5.128205 5.298013 6.017926 2.564103 4.566210 4.823151 2.512563 1.980198 5.116279 5.714286 6.225681 4.193548 5.882353 4.993065 7.692308 7.821229 9.154930 6.459330 4.907975 3.205128 7.124011 9.615385 6.358382 7.874016 13.318025 8.318584 2.985075 4.545455 10.067114 7.216495 1.948052 3.358209 5.159705 2.898551 7.430341 7.975460 3.827751 3.389831 3.550296
biz1_18 2.912621 3.100775 6.547619 4.469274 3.951368 4.491018 5.970149 5.511811 6.632653 2.985075 3.680982 3.333333 3.680982 7.424242 7.142857 5.361305 7.692308 7.165109 5.854241 7.758621 3.162055 5.642633 2.745098 6.956522 2.362205 5.760369 8.394161 4.913295 6.310680 5.889885 3.535353 6.735751 4.054054 7.118255 4.953560 6.470588 7.052897 7.339449 7.397260 6.611570 15.651359 8.044164 3.571429 3.278688 9.118541 3.669725 2.500000 5.415163 11.670481 5.917160 5.753425 4.046243 4.858300 4.320988 1.117318
biz1_19 4.761905 5.128205 8.522727 4.057971 4.137931 6.271777 8.730159 8.612440 3.317536 2.580645 4.812834 3.030303 9.333333 8.391098 8.620690 4.398148 3.906250 4.562738 10.539216 8.724832 4.597701 7.530120 6.578947 1.886792 5.042017 5.985037 6.153846 3.560831 5.464481 7.010014 9.142857 12.435233 2.040816 6.498195 5.845511 1.898734 6.544503 11.363636 7.024793 5.426357 24.050633 8.805031 6.451613 3.738318 8.443272 4.137931 3.571429 5.405405 4.326923 5.298013 5.029586 1.970443 3.007519 5.405405 3.977273

a. Calculate the correlation matrix

cor_matrix contains the full correlation matrix. The table below shows a snapshot of the first 5 rows.

# Pairwise Pearson correlation between CSA columns
cor_matrix = vs10to19Indt.corr()
# Show the first 5 rows
cor_matrix.head(5)
CSA2010 Allendale/Irvington/S. Hilton Beechfield/Ten Hills/West Hills Belair-Edison Brooklyn/Curtis Bay/Hawkins Point Canton Cedonia/Frankford Cherry Hill Chinquapin Park/Belvedere Claremont/Armistead Clifton-Berea Cross-Country/Cheswolde Dickeyville/Franklintown Dorchester/Ashburton Downtown/Seton Hill Edmondson Village Fells Point Forest Park/Walbrook Glen-Fallstaff Greater Charles Village/Barclay Greater Govans Greater Mondawmin Greater Roland Park/Poplar Hill Greater Rosemont Greenmount East Hamilton Harbor East/Little Italy Harford/Echodale Highlandtown Howard Park/West Arlington Inner Harbor/Federal Hill Lauraville Loch Raven Madison/East End Medfield/Hampden/Woodberry/Remington Midtown Midway/Coldstream Morrell Park/Violetville Mount Washington/Coldspring North Baltimore/Guilford/Homeland Northwood Oldtown/Middle East Orangeville/East Highlandtown Patterson Park North & East Penn North/Reservoir Hill Pimlico/Arlington/Hilltop Poppleton/The Terraces/Hollins Market Sandtown-Winchester/Harlem Park South Baltimore Southeastern Southern Park Heights Southwest Baltimore The Waverlies Upton/Druid Heights Washington Village/Pigtown Westport/Mount Winans/Lakeland
CSA2010
Allendale/Irvington/S. Hilton 1.000000 0.605979 0.605189 0.507454 0.648479 0.921941 0.107267 0.115496 0.611571 0.427827 0.415661 0.000816 0.835669 0.040085 0.095439 0.841667 0.571702 0.561711 0.589688 0.193949 0.946892 0.249394 0.824577 0.627311 0.475588 0.713383 0.316741 0.701471 0.144267 0.825075 0.709727 -0.381243 0.660943 -0.215071 0.600828 0.690546 -0.346096 -0.302116 0.351810 0.352127 -0.541845 -0.340371 0.880987 0.750134 -0.535245 0.358733 0.752120 0.420298 0.212735 0.682800 0.696283 0.511747 0.688641 0.723615 0.746989
Beechfield/Ten Hills/West Hills 0.605979 1.000000 0.571762 0.568973 0.298865 0.705692 0.382271 0.054397 0.226702 0.593402 0.422438 -0.360351 0.550078 -0.020585 -0.091276 0.419000 0.272529 -0.026668 0.321435 0.365019 0.590637 -0.214675 0.630727 0.019445 0.233356 0.318546 0.409636 0.006927 0.463691 0.403744 0.538658 -0.130710 0.323130 -0.532674 -0.089087 0.063635 0.017314 -0.451999 0.151200 0.025085 -0.500684 -0.464260 0.538822 0.843557 -0.761006 0.086658 0.359371 0.120023 0.014098 0.505539 0.622637 0.444035 0.477959 0.104997 0.474047
Belair-Edison 0.605189 0.571762 1.000000 0.265417 0.582821 0.668239 0.287643 0.425437 0.635941 -0.075309 0.349756 -0.210996 0.578843 0.544018 0.313644 0.497728 0.397796 0.391600 0.812766 0.278095 0.647464 0.201212 0.261946 0.469465 0.320972 0.684410 0.622760 0.423824 -0.110876 0.722502 0.729789 -0.061750 0.201213 0.076952 0.349109 0.267052 0.163344 0.064190 0.438349 0.527658 -0.168226 -0.039333 0.589691 0.362015 -0.472272 0.020896 0.452725 0.347306 0.288477 0.543448 0.709866 0.390695 0.460600 0.278573 0.155166
Brooklyn/Curtis Bay/Hawkins Point 0.507454 0.568973 0.265417 1.000000 0.629369 0.476382 0.201100 0.362371 0.337922 0.575786 0.496196 0.345643 0.263571 0.029724 0.300512 0.435568 0.186834 -0.000434 0.217443 0.414081 0.584904 0.258588 0.480856 0.458953 0.347988 0.221897 0.543421 0.279241 0.489635 0.536225 -0.059439 -0.232488 0.257001 -0.518738 0.114084 0.239097 -0.358256 -0.469413 0.131288 -0.323001 -0.252186 -0.275219 0.682841 0.649968 -0.203775 0.025861 0.417550 0.308177 0.153206 0.477877 0.374190 0.505585 0.281974 0.285541 0.092251
Canton 0.648479 0.298865 0.582821 0.629369 1.000000 0.621510 -0.019197 0.611347 0.649433 0.254587 0.457624 0.541926 0.470096 0.318307 0.460996 0.652246 0.501614 0.292745 0.644059 0.112305 0.794750 0.415290 0.323596 0.834692 0.737474 0.763350 0.551138 0.790775 -0.200812 0.739704 0.313655 -0.385994 0.492988 -0.180166 0.696485 0.485753 -0.375193 -0.235837 0.066509 0.361617 -0.213811 -0.110073 0.827967 0.458524 -0.054970 0.442614 0.761052 0.622450 0.014986 0.591480 0.678758 0.689069 0.487879 0.518006 0.068852
df = vs10to19Indt.copy()
import matplotlib.pyplot as plt
f = plt.figure(figsize=(19, 15))
plt.matshow(df.corr(), fignum=f.number)
irange = range(df.select_dtypes(['number']).shape[1])
labels = df.select_dtypes(['number']).columns
# plt.xticks(irange, labels, fontsize=14, rotation=45)
plt.yticks(irange, labels, fontsize=14)
cb = plt.colorbar()
cb.ax.tick_params(labelsize=14)
plt.title('Correlation Matrix', fontsize=16);
import numpy as np

# Keep the CSA names so the graph nodes can be relabeled later
lblVals = cor_matrix.index.values
cor_matrix = np.asmatrix(cor_matrix)
cor_matrix
matrix([[1.        , 0.60597919, 0.60518925, ..., 0.68864074, 0.72361456,
         0.74698939],
        [0.60597919, 1.        , 0.57176228, ..., 0.47795921, 0.10499745,
         0.47404715],
        [0.60518925, 0.57176228, 1.        , ..., 0.46059962, 0.2785729 ,
         0.15516615],
        ...,
        [0.68864074, 0.47795921, 0.46059962, ..., 1.        , 0.42548492,
         0.63154204],
        [0.72361456, 0.10499745, 0.2785729 , ..., 0.42548492, 1.        ,
         0.59025009],
        [0.74698939, 0.47404715, 0.15516615, ..., 0.63154204, 0.59025009,
         1.        ]])

b. Create graph

# from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array is the replacement
G = nx.from_numpy_array(np.asarray(cor_matrix))

# Relabel the nodes with the CSA names
G = nx.relabel_nodes(G, lambda x: lblVals[x])

# Show the first 5 edges with their corresponding weights
list(G.edges(data=True))[0:5]
[('Allendale/Irvington/S. Hilton',
  'Allendale/Irvington/S. Hilton',
  {'weight': 1.0}),
 ('Allendale/Irvington/S. Hilton',
  'Beechfield/Ten Hills/West Hills',
  {'weight': 0.6059791922334785}),
 ('Allendale/Irvington/S. Hilton',
  'Belair-Edison',
  {'weight': 0.6051892477527611}),
 ('Allendale/Irvington/S. Hilton',
  'Brooklyn/Curtis Bay/Hawkins Point',
  {'weight': 0.5074537422160204}),
 ('Allendale/Irvington/S. Hilton', 'Canton', {'weight': 0.6484785610661029})]
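The first edge listed above connects a CSA to itself with weight 1.0, because the diagonal of a correlation matrix is all ones. A minimal sketch of dropping such self-loops follows. The 3x3 matrix and the labels are made-up values, and `nx.from_numpy_array` assumes a NetworkX version that provides it.

```python
import numpy as np
import networkx as nx

# Toy 3x3 correlation matrix (made-up values, not the CSA data).
toy_corr = np.array([
    [1.0, 0.6, -0.2],
    [0.6, 1.0, 0.3],
    [-0.2, 0.3, 1.0],
])
toy_labels = ['A', 'B', 'C']

# Every nonzero entry becomes a weighted edge, including the diagonal.
G_toy = nx.from_numpy_array(toy_corr)
G_toy = nx.relabel_nodes(G_toy, lambda i: toy_labels[i])
edges_with_loops = G_toy.number_of_edges()

# Drop the self-loops created by the diagonal of ones.
G_toy.remove_edges_from(list(nx.selfloop_edges(G_toy)))
print(edges_with_loops, G_toy.number_of_edges())
print(list(G_toy.edges(data=True)))
```

Removing self-loops before computing degrees keeps each node's degree equal to the number of other nodes it is linked to.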

Part 5: Styling the nodes based on the number of edges linked (degree)

create_corr_network_5(G, corr_direction, min_correlation)

create_corr_network_5(G, corr_direction="positive",min_correlation=0.7)
create_corr_network_5(G, corr_direction="negative",min_correlation=-0.7)
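The body of `create_corr_network_5` is not shown here, so the following is only a sketch of what a function with that signature might do, under stated assumptions: keep edges whose correlation passes the threshold in the requested direction, then size each node by its degree. `sketch_corr_network`, the toy graph and the size formula are all hypothetical.

```python
import networkx as nx


def sketch_corr_network(G, corr_direction, min_correlation):
    """Hypothetical stand-in, NOT the notebook's create_corr_network_5."""
    H = G.copy()
    # Self-correlations (weight 1.0) carry no information about links between nodes.
    H.remove_edges_from(list(nx.selfloop_edges(H)))
    for u, v, w in list(H.edges(data='weight')):
        if corr_direction == 'positive':
            drop = w < min_correlation   # keep only correlations >= threshold
        else:
            drop = w > min_correlation   # keep only correlations <= (negative) threshold
        if drop:
            H.remove_edge(u, v)
    # Node size grows with the number of remaining edges (degree); the formula is arbitrary.
    sizes = [100 * (1 + H.degree(n)) for n in H.nodes()]
    return H, sizes


# Toy weighted graph with made-up correlations.
G_toy = nx.Graph()
G_toy.add_weighted_edges_from([
    ('A', 'B', 0.9), ('A', 'C', 0.8), ('B', 'C', 0.2), ('C', 'D', -0.75),
])
H_pos, sizes_pos = sketch_corr_network(G_toy, 'positive', 0.7)
H_neg, sizes_neg = sketch_corr_network(G_toy, 'negative', -0.7)
print(sorted(H_pos.edges()), sizes_pos)
print(sorted(H_neg.edges()), sizes_neg)
```

The returned graph and sizes could then be passed to a drawing call such as `nx.draw(H_pos, node_size=sizes_pos, with_labels=True)`.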
 

We want to fit a linear regression for each CSA, using {X: year, Y: value} for a given indicator.
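As a small worked example of one such fit, the sketch below uses the ten biz1 values for Allendale/Irvington/S. Hilton copied from the head() output earlier. It swaps in `np.polyfit` for brevity, which is an assumption of this sketch rather than the method used in the rest of the notebook. The variable names are illustrative.

```python
import numpy as np

# X: the two-digit year suffixes of biz1_10 ... biz1_19.
years = np.arange(10, 20)
# Y: biz1 values for Allendale/Irvington/S. Hilton, copied from the head() output.
values = np.array([8.921933, 10.970464, 9.486166, 3.914591, 9.15751,
                   5.217391, 3.686636, 3.846154, 2.912621, 4.761905])

# Degree-1 least-squares fit: value ~ M * year + B
M, B = np.polyfit(years, values, 1)
print(round(M, 4), round(B, 4))
```

The slope M is the average change in the indicator per year and B is the fitted value at year 0, so a negative M means the indicator trended downward over the decade.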

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

# Reshape to long format with 3 columns: CSA2010, variable, value
wdf = vs10to19Ind.melt(id_vars='CSA2010', value_vars=vs10to19Ind.columns[1:])

# Convert indicator labels (biz1_10 ... biz1_19) into our X (year) column: 10 ... 19
wdf['variable'] = wdf['variable'].apply(lambda x: int(x.replace('biz1_', '')))

# B collects the intercepts, M collects the slopes
findf = {'CSA': [], 'B': [], 'M': []}
# For each CSA
for csa in wdf.CSA2010.unique():
  CsaData = wdf[wdf['CSA2010'] == csa]
  X = CsaData[['variable']]  # years: 10, 11, ..., 19
  y = CsaData[['value']]
  regressor = LinearRegression()
  regressor.fit(X, y)
  y_pred = regressor.predict(X)
  plt.scatter(X, y, color='red')
  plt.plot(X, y_pred, color='blue')
  plt.title('biz1: ' + csa)
  plt.xlabel('YEAR')
  plt.ylabel('VALUE')
  plt.show()
  # coef_ is the slope (M); intercept_ is the y-intercept (B)
  print('M: ', regressor.coef_, 'B: ', regressor.intercept_)
  findf['CSA'].append(csa)
  findf['B'].append(regressor.intercept_[0])
  findf['M'].append(regressor.coef_[0][0])
M:  [[-0.7676921]] B:  [17.41907254]
M:  [[-0.22508909]] B:  [7.95072375]
M:  [[-0.32746983]] B:  [11.53006711]
M:  [[-0.19889243]] B:  [7.83718388]
M:  [[-0.66888445]] B:  [16.1948353]
M:  [[-0.65466176]] B:  [16.58442803]
M:  [[0.3039354]] B:  [2.32169095]
M:  [[-0.02009846]] B:  [6.75182556]
M:  [[-0.40157394]] B:  [10.99995577]
M:  [[-0.49404331]] B:  [12.54628288]
M:  [[-0.06396662]] B:  [6.511798]
M:  [[-0.06974003]] B:  [6.20298308]
M:  [[-0.7691253]] B:  [19.0763536]
M:  [[0.09235921]] B:  [5.85057322]
M:  [[-0.0203046]] B:  [7.60364247]
M:  [[-0.43978483]] B:  [12.46261036]
M:  [[-0.8201571]] B:  [20.17098339]
M:  [[-0.23034246]] B:  [9.19129306]
M:  [[-0.25422869]] B:  [11.4925597]
M:  [[-0.05470182]] B:  [8.04050129]
M:  [[-0.80450422]] B:  [18.12196378]
M:  [[0.07561718]] B:  [4.95331986]
M:  [[-0.7946793]] B:  [18.73421173]
M:  [[-0.79799401]] B:  [17.76760905]
M:  [[-0.30745897]] B:  [9.72745959]
M:  [[-0.47312123]] B:  [13.88416679]
M:  [[-0.10561882]] B:  [8.01954753]
M:  [[-0.58508737]] B:  [14.68703487]
M:  [[0.11205558]] B:  [4.25424319]
M:  [[-0.29943398]] B:  [11.19802507]
M:  [[-0.59855651]] B:  [15.7377276]
M:  [[0.63182316]] B:  [-1.87756507]
M:  [[-0.75660519]] B:  [18.0323737]
M:  [[0.13611165]] B:  [4.08038371]
M:  [[-0.411554]] B:  [12.63543847]
M:  [[-0.62237948]] B:  [14.956962]
M:  [[0.15020353]] B:  [3.83100144]
M:  [[0.62909985]] B:  [-2.40394805]
M:  [[0.07323157]] B:  [5.39377336]
M:  [[-0.46557307]] B:  [14.01992571]
M:  [[1.9705589]] B:  [-15.85895658]
M:  [[0.39170441]] B:  [1.3328431]
M:  [[-0.65473736]] B:  [16.29400569]
M:  [[-0.68673404]] B:  [16.51748551]
M:  [[0.29808065]] B:  [4.35365163]
M:  [[-0.77695571]] B:  [19.15878425]
M:  [[-1.04493622]] B:  [21.62375446]
M:  [[-0.31586691]] B:  [10.57869196]
M:  [[-0.02666733]] B:  [6.9012915]
M:  [[-0.69244689]] B:  [16.62795866]
M:  [[-0.58921763]] B:  [15.74535184]
M:  [[-0.48830303]] B:  [12.81487531]
M:  [[-0.64902173]] B:  [15.46557375]
M:  [[-0.38971961]] B:  [11.33013992]
M:  [[-0.88092192]] B:  [18.80231421]
lin_reg_df = pd.DataFrame(data=findf)
lin_reg_df.head()
CSA B M
0 Allendale/Irvington/S. Hilton 17.419073 -0.767692
1 Beechfield/Ten Hills/West Hills 7.950724 -0.225089
2 Belair-Edison 11.530067 -0.327470
3 Brooklyn/Curtis Bay/Hawkins Point 7.837184 -0.198892
4 Canton 16.194835 -0.668884
lin_reg_dft = lin_reg_df.T
lin_reg_dft.columns = lin_reg_dft.iloc[0]
lin_reg_dft = lin_reg_dft[1:]
lin_reg_dft.index.name = 'variable'
lin_reg_dft = lin_reg_dft.astype('float64')
lin_reg_dft
CSA Allendale/Irvington/S. Hilton Beechfield/Ten Hills/West Hills Belair-Edison Brooklyn/Curtis Bay/Hawkins Point Canton Cedonia/Frankford Cherry Hill Chinquapin Park/Belvedere Claremont/Armistead Clifton-Berea Cross-Country/Cheswolde Dickeyville/Franklintown Dorchester/Ashburton Downtown/Seton Hill Edmondson Village Fells Point Forest Park/Walbrook Glen-Fallstaff Greater Charles Village/Barclay Greater Govans Greater Mondawmin Greater Roland Park/Poplar Hill Greater Rosemont Greenmount East Hamilton Harbor East/Little Italy Harford/Echodale Highlandtown Howard Park/West Arlington Inner Harbor/Federal Hill Lauraville Loch Raven Madison/East End Medfield/Hampden/Woodberry/Remington Midtown Midway/Coldstream Morrell Park/Violetville Mount Washington/Coldspring North Baltimore/Guilford/Homeland Northwood Oldtown/Middle East Orangeville/East Highlandtown Patterson Park North & East Penn North/Reservoir Hill Pimlico/Arlington/Hilltop Poppleton/The Terraces/Hollins Market Sandtown-Winchester/Harlem Park South Baltimore Southeastern Southern Park Heights Southwest Baltimore The Waverlies Upton/Druid Heights Washington Village/Pigtown Westport/Mount Winans/Lakeland
variable
B 17.419073 7.950724 11.530067 7.837184 16.194835 16.584428 2.321691 6.751826 10.999956 12.546283 6.511798 6.202983 19.076354 5.850573 7.603642 12.462610 20.170983 9.191293 11.492560 8.040501 18.121964 4.953320 18.734212 17.767609 9.727460 13.884167 8.019548 14.687035 4.254243 11.198025 15.737728 -1.877565 18.032374 4.080384 12.635438 14.956962 3.831001 -2.403948 5.393773 14.019926 -15.858957 1.332843 16.294006 16.517486 4.353652 19.158784 21.623754 10.578692 6.901292 16.627959 15.745352 12.814875 15.465574 11.33014 18.802314
M -0.767692 -0.225089 -0.327470 -0.198892 -0.668884 -0.654662 0.303935 -0.020098 -0.401574 -0.494043 -0.063967 -0.069740 -0.769125 0.092359 -0.020305 -0.439785 -0.820157 -0.230342 -0.254229 -0.054702 -0.804504 0.075617 -0.794679 -0.797994 -0.307459 -0.473121 -0.105619 -0.585087 0.112056 -0.299434 -0.598557 0.631823 -0.756605 0.136112 -0.411554 -0.622379 0.150204 0.629100 0.073232 -0.465573 1.970559 0.391704 -0.654737 -0.686734 0.298081 -0.776956 -1.044936 -0.315867 -0.026667 -0.692447 -0.589218 -0.488303 -0.649022 -0.38972 -0.880922

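We may need to normalize the data for this to be usable. As it stands, each CSA column holds only two values (B and M), so every pairwise Pearson correlation computed below is exactly +1 or -1.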

df = lin_reg_dft.copy()
import matplotlib.pyplot as plt
f = plt.figure(figsize=(19, 15))
plt.matshow(df.corr(), fignum=f.number)
irange = range(df.select_dtypes(['number']).shape[1])
labels = df.select_dtypes(['number']).columns
# plt.xticks(irange, labels, fontsize=14, rotation=45)
plt.yticks(irange, labels, fontsize=14)
cb = plt.colorbar()
cb.ax.tick_params(labelsize=14)
plt.title('Correlation Matrix', fontsize=16);
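The matrix plotted above is computed from a frame with only two rows (B and M), which limits what the correlations can show. The sketch below illustrates this on a toy frame with made-up values, then applies one possible normalization. Z-scoring each row across areas is an assumption of this sketch, not something the notebook prescribes, and all names are illustrative.

```python
import pandas as pd

# Toy frame shaped like lin_reg_dft: rows B and M, one column per area (made-up values).
toy = pd.DataFrame(
    {'Area1': [17.4, -0.77], 'Area2': [8.0, -0.23], 'Area3': [-1.9, 0.63]},
    index=['B', 'M'],
)

# With two observations per column, any two columns are perfectly (anti)correlated.
toy_corr = toy.corr()
print(toy_corr)

# One possible normalization: z-score each row across areas so that B and M share a scale.
z = toy.sub(toy.mean(axis=1), axis=0).div(toy.std(axis=1), axis=0)
print(z)
```

After standardizing, areas could be compared with a distance between their (B, M) pairs instead of a correlation, which does not degenerate when only two features are available.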