Prospects for observing dynamically formed Binary Black Holes in the Local Universe with Gravitational Waves

Dongming Jin

  • Research Supervisor
    • Dr. Matthew Benacquista
  • Committee Members
    • Dr. Zdzislaw Musielak (co-chair)
    • Dr. Manfred Cuntz
    • Dr. Joseph Romano
    • Dr. Sangwook Park
    • Dr. Teviet Creighton
  • Graduate Advisor

    • Dr. Qiming Zhang
  • April 27th, 2018

Overview

Populating GCs in the local universe

  • LoGal: galaxy within 30 Mpc

    • 7,909 with GCs
    • 1,037 has no measured luminosity
    • 8,946 total
  • GClib: 662,772 GCs over 7,909 Galaxy

    • manipulated LoGal.Ngc with different GC age
    • assign random GC simulation

Simulating GCs

  • GClib models:

    • $2\times5\times3\times3\times3 \times 10 = 324 \times 10 = 3,240$ simulations
    • Node-hour: $150,000 + 90,000 = 240,000$
    • tmp file: over 100 TB
  • GClib model diversity:

    • 1,739 out of 662,772 are duplicated
    • only half of the 3,240 GC simulations have one duplicated entry

Relativistic BBHs

  • BBHlib: extracted BBHs from 3,240 GC simulations

    • 87,365 ejected BBHs
    • 11,095 ejected BHBs
  • BBHdata: 17,883,760 BBHs with host galaxy info. + host GC info. + BBH info.

Prospects for Detection

  • 86,939 BBH mergers + 17,883,760 BBH inspirals
  • Merger event rate (6.1.3)

    • $\eta = N \frac{8 k_0}{3} f_\text{min}^{8/3} \left(\frac{30 \text{ Mpc}}{1 \text{ Gpc}} \right)^3 \left( \frac{13.5 \text{ Gyr}}{1 \text{ yr}} \right) \simeq 587 \text{ Gpc}^{-3} \text{ yr}^{-1}$
  • Localizationable BBHs: 14

   PGC    Dist          M1           M2   Seperation        Ecc

  1628   0.787    3.158000    10.250000     0.807000   0.000000
  1628   0.787   12.728000    19.122000     0.157281   0.852028
   616   2.188    3.158000    10.250000     0.807000   0.000000
  1628   0.787    3.158000    10.250000     0.807000   0.000000
  1628   0.787    3.158000    10.250000     0.807000   0.000000
  1628   0.787    3.158000    10.250000     0.807000   0.000000
  1628   0.787   20.632000   137.730000     0.178258   0.477929
  1628   0.787   21.424999    25.615999     0.165200   0.750944
  1628   0.787   28.433001    26.503000     0.175125   0.813286
  1628   0.787    3.158000    10.250000     0.807000   0.000000
131228   0.762    3.158000    10.250000     0.807000   0.000000
  1628   0.787    3.158000    10.250000     0.807000   0.000000
  1628   0.787    5.319600    10.276000     0.974700   0.000000
  1627   0.813    5.319600    10.276000     0.974700   0.000000

Introduction (chap 1)

Motivation

  • Detection -> GWs from BBHs, overview about GW, BBH
  • BBH formation -> GCs
  • Research interns
    • N-body, MOCCA: Newtonian, statistical
    • 50BiN: optical, time series
    • DSFP: statistics, data analysis

Objective

Background

Introduction (chap 1)

Motivation

Objective

  • GW from BBH formed in GCs -> Astrometry
    • High performance computing: numerical simulations on superclusters
    • Long baseline/timeline observation: time series analysis + multi-messanger
    • Petascale data: systematic surveys, global multi-discipline collaborations

Background

Introduction (chap 1)

Motivation

Objective

Background

  • Astrometry (sec 2.1.2)
    • Cosmology (sec 1.2.1)
    • Dark matter content (sec 6.2.2)
  • GCs and BBHs
    • GCs (sec 3.3)
    • BBHs (sec 5.2.2)
  • GR and GWs from BBHs
    • GR -> GW (sec 5.1.2)
    • GW from BBH (sec 5.2.3)

Populating GCs in the local universe (chap 2)

GWGC for galaxy abundance (sec 2.1)

Data prepare

  1. [=] load GWGC data: wrong data type object due to ~ symbol => impute ~ as NaN and parse data type to float
  2. [=] check duplicates: explain pandas table, duplicated galaxies with Dist > 30 Mpc => drop

  3. extract local galaxies within 30 Mpc and 3D view

  4. spatial distribution and edge galaxies

  5. check interesting features

Description

   1-  7  I7    ---      PGC      [2,4715229]? Identifier from HYPERLEDA
                                  (empty for globular clusters)
   9- 36  A28   ---      Name     Common name of galaxy or globular
  38- 46  F9.5  h        RAhour   Right ascension (J2000, decimal hours)
  48- 55  F8.4  deg      DEdeg    Declination (J2000)
  57- 60  F4.1  ---      TT       [-9,10]? Morphological type code (1)
  62- 66  F5.2  mag      Bmag     ? Apparent blue magnitude
  68- 74  F7.3  arcmin   a        ? Major diameter (arcmins)
  76- 82  F7.3  arcmin   e_a      ? Error in major diameter (arcmins)
  84- 90  F7.3  arcmin   b        ? Minor diameter (arcmins)
  92- 98  F7.3  arcmin   e_b      ? Error in minor diameter (arcmins)
 100-104  F5.3  ---      b/a      [0,1]? Ratio of minor to major diameters
 106-110  F5.3  ---      e_b/a    ? Error on ratio of minor to major diameters
 112-116  F5.1  deg      PA       [0,180]? Position angle of galaxy
                                    (degrees from north through east)
 118-123  F6.2  mag      BMAG     ? Absolute blue magnitude
 125-131  F7.2  Mpc      Dist     ? Distance (Mpc)
 133-138  F6.2  Mpc      e_Dist   ? error on Distance (Mpc)
 140-143  F4.2  mag      e_Bmag   ? error on Apparent blue magnitude
 145-148  F4.2  mag      e_BMAG   ? error on Absolute blue magnitude
In [5]:
# 1. load GWGC data
GWGC=pd.read_csv(path_GWGC, delim_whitespace=True)
for key, n in Counter(GWGC.dtypes).most_common():
    print("'%s' (count %d): \n - %s" % (key, n, '\n - '.join(x for x in GWGC.columns[GWGC.dtypes==key])))
'object' (count 19): 
 - Name
 - Type
 - App_Mag_B
 - err_App_Mag_B
 - Abs_Mag_B
 - err_Abs_Mag_B
 - App_Mag_I
 - err_App_Mag_I
 - Abs_Mag_I
 - err_Abs_Mag_I
 - App_Mag_K
 - err_App_Mag_K
 - Maj_Diam(a)
 - err_Maj_Diam
 - Min_Diam(b)
 - err_Min_Diam
 - b/a
 - err_b/a
 - PA
'float64' (count 4): 
 - RA
 - Dec
 - Dist
 - err_Dist
'int64' (count 1): 
 - PGC
In [6]:
# 1-. convert to numeric, impute '~' as NaN 
for key in GWGC:
    if key != 'Name':
        GWGC[key]=pd.to_numeric(GWGC[key], errors='coerce')

# 2. check duplicates, duplicated galaxies with Dist > 30 Mpc
display(GWGC[GWGC.Name.duplicated(keep=False)] \
        .loc[: ,['Name','Dist','err_Dist','RA','Dec','Type','App_Mag_B']])
Name Dist err_Dist RA Dec Type App_Mag_B
63638 PGC138606 34.347 5.152 3.18005 61.105202 5.0 16.32
68443 PGC166081 72.361 15.919 4.09247 71.469597 7.0 17.47
68880 PGC166081 24.847 3.727 4.09246 71.469470 NaN 17.39
130474 PGC138606 35.569 7.825 3.17891 61.113250 1.2 15.51
178917 6dFJ1705055-200214 112.653 24.784 17.08485 -20.037250 NaN NaN
178919 6dFJ1704153-203840 117.694 25.893 17.07090 -20.644350 NaN NaN
179298 6dFJ1705055-200214 112.653 24.784 17.08485 -20.037250 NaN NaN
179620 6dFJ1704153-203840 116.194 25.563 17.07090 -20.644350 NaN NaN

Populating GCs in the local universe (chap 2)

GWGC for galaxy abundance (sec 2.1)

Data prepare

  1. [x] load GWGC data
  2. [x] check duplicates

  3. [=] extract local galaxies within 30 Mpc and 3D view: 8,946 galaxies out of 183,327, present in 3D plot

  4. spatial distribution and edge galaxies: present in distplot

  5. check interesting features

In [8]:
# 3. extract local galaxies within 30 Mpc and 3D view [slow]
GWGC['Mcat'] = 'Unknown'
GWGC['iMType'] = 0
GWGC.Mcat[GWGC.Type.between(-4,-3)] = 'E/S0'
GWGC.iMType[GWGC.Type.between(-4,-3)] = 6
GWGC.Mcat[GWGC.Type.between(0,11)] = 'S/Irr'
GWGC.iMType[GWGC.Type.between(0,11)] = 1
GWGC.Mcat[GWGC.Type.between(-6,-4)] = 'E'
GWGC.iMType[GWGC.Type.between(-6,-4)] = 2
GWGC.Mcat[GWGC.Type.between(-3,0)] = 'S0'
GWGC.iMType[GWGC.Type.between(-3,0)] = 3

LoGal = GWGC[GWGC.Dist<30]

if not flag_skip:
    fig = figure(figsize=sfig_size)
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter3D(GWGC.Dist*np.cos((GWGC.RA)/12*pi)*np.sin((90-GWGC.Dec)/180*pi),
                 GWGC.Dist*np.sin((GWGC.RA)/12*pi)*np.sin((90-GWGC.Dec)/180*pi),
                 GWGC.Dist*np.cos((90-GWGC.Dec)/180*pi),
                 alpha=0.1,s=2,edgecolors=None) # all GWGCs

    ax.scatter3D(LoGal.Dist*np.cos((LoGal.RA)/12*pi)*np.sin((90-LoGal.Dec)/180*pi),
                 LoGal.Dist*np.sin((LoGal.RA)/12*pi)*np.sin((90-LoGal.Dec)/180*pi),
                 LoGal.Dist*np.cos((90-LoGal.Dec)/180*pi),alpha=0.1,c='g',s=2,edgecolors=None) # all LoGals

    ax.view_init(20, 90)
    r = [-100, 100]
    for s, e in combinations(np.array(list(product(r, r, r))), 2):
        if np.sum(np.abs(s-e)) == r[1]-r[0]:
            ax.plot3D(*zip(s, e), color="k") # box to show orientation

    ax.set_axis_off()
    ax.set_xlim(left=-100,right=100)
    ax.set_ylim(bottom=-100,top=100)
    ax.set_zlim(-100,100)
    ax.set_title('%d galaxies out of the %d galaxies are located in the Local Universe (within 30 Mpc)' 
                 % (len(LoGal), len(GWGC)))

    if flag_svfg:
        fig.savefig(path.join(path_fig, 'GWGC3D.jpeg'),**params_svfg)
else:
    ax = figure(figsize=sfig_size).add_subplot(111)
    ax.imshow(imread(path.join(path_fig, 'GWGC3D.jpeg')))
    ax.grid(False)

Populating GCs in the local universe (chap 2)

GWGC for galaxy abundance (sec 2.1)

Data prepare

  1. [x] load GWGC data
  2. [x] check duplicates

  3. [x] extract local galaxies within 30 Mpc and 3D view

  4. [=] spatial distribution and edge galaxies: present in distplot

    • D < 30 Mpc but D + $\Delta$D > 30 Mpc: 1,856 orange
    • D > 30 Mpc but D - $\Delta$D < 30 Mpc: 3,015 blue
    • be cautious with bin in histgrams
    • galaxy morphological type, numerical Hubble stage, not well explained in paper or references therein: => a future topic
      • [-4, -3]: E/S0 => GCpG.iMType==6
      • [0, 11]: S/Irr => GCpG.iMType==1
      • [-6, -4): E => GCpG.iMType==2
      • (-3, 0): S0 => GCpG.iMType==3
      • NaN: Unknown, 0
  5. check interesting features
In [9]:
# 4. edge galaxies with morphology type, be cautious with bins in histograms
fig, ax = subplots(figsize=fig_size)
sns.distplot(GWGC.Dist,color='g',ax=ax) #  NO. of galaxies at different Dist
ax.set_xlim([0, 180])
if 'GWGC_DT' not in locals():
    GWGC_DT = GWGC.loc[:,['Dist','err_Dist','Mcat']].dropna()
    GWGC_DT['Dcat'] = 0
    GWGC_DT.Dcat[(GWGC_DT.Dist>30) & (GWGC_DT.Dist-GWGC_DT.err_Dist<30)] = -1
    GWGC_DT.Dcat[(GWGC_DT.Dist<30) & (GWGC_DT.Dist+GWGC_DT.err_Dist>30)] = 1
sub_axes = axes([0.16, 0.56, 0.25, 0.25])
sub_axes.set_yticklabels([])
sub_axes.set_xlim([24, 38])
sub_axes2=sub_axes.twinx()
sub_axes2.set_yticklabels([])
sub_axes.grid(None)
sub_axes2.grid(None)
sns.distplot(GWGC_DT.Dist[GWGC_DT.Dcat==-1],kde=False,ax=sub_axes) 
sns.distplot(GWGC_DT.Dist[GWGC_DT.Dcat==1],kde=False,ax=sub_axes)
sns.distplot(GWGC_DT.Dist[GWGC_DT.Dcat!=0],hist=False,ax=sub_axes2,color='y')

sub_axes = axes([0.6, 0.25, 0.25, 0.25]) 
sub_axes.set_xlabel('Morphology Type')
sub_axes.set_yticklabels([])
sub_axes.hist([GWGC_DT.Mcat[(GWGC_DT.Dcat==-1) & (GWGC_DT.Mcat!='Unknown')].values,
               GWGC_DT.Mcat[(GWGC_DT.Dcat==1) & (GWGC_DT.Mcat!='Unknown')].values], 
              histtype='bar')

if flag_svfg:
    savefig(path.join(path_fig,'Distance.jpeg'),**params_svfg)
/opt/conda/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6462: UserWarning: The 'normed' kwarg is deprecated, and has been replaced by the 'density' kwarg.
  warnings.warn("The 'normed' kwarg is deprecated, and has been "
/opt/conda/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6462: UserWarning: The 'normed' kwarg is deprecated, and has been replaced by the 'density' kwarg.
  warnings.warn("The 'normed' kwarg is deprecated, and has been "
/opt/conda/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6462: UserWarning: The 'normed' kwarg is deprecated, and has been replaced by the 'density' kwarg.
  warnings.warn("The 'normed' kwarg is deprecated, and has been "

Populating GCs in the local universe (chap 2)

GWGC for galaxy abundance (sec 2.1)

Data prepare

  1. [x] load GWGC data
  2. [x] check duplicates

  3. [x] extract local galaxies within 30 Mpc and 3D view

  4. [x] spatial distribution and edge galaxies

  5. [=] check interesting features: missing data

    • statistics: present in pandas table
    • missing data visualization: present in msnoplot
In [11]:
# 5-A. statistics
display(LoGal.loc[:,['Name','RA','Dec','Dist','Type','Abs_Mag_B','Abs_Mag_I']] \
     .describe(include='all').loc[['count','unique','min','max','mean','std'],:])
Name RA Dec Dist Type Abs_Mag_B Abs_Mag_I
count 8946 8946.000000 8946.000000 8946.000000 5979.000000 7877.000000 5606.000000
unique 8946 NaN NaN NaN NaN NaN NaN
min NaN 0.005940 -89.334590 0.010000 -5.000000 -22.590000 -25.520000
max NaN 23.996680 86.131800 29.986000 10.000000 -4.970000 -1.680000
mean NaN 10.973250 3.455383 19.011495 4.524034 -16.210734 -17.158776
std NaN 5.277991 34.889882 6.616150 4.707900 2.451796 2.627928
In [12]:
# 5-B. visualization
img = msno.matrix(LoGal.loc[:, ['Name','RA','Dec','Dist','Type','Abs_Mag_B','Abs_Mag_I']], inline=False)
if flag_svfg:
    img.savefig(path.join(path_fig, 'GWGC_NaN.jpeg'), **params_svfg)

Populating GCs in the local universe (chap 2)

Harris catalog for GCpG model (sec 2.2.1)

Data prepare

  1. load GCpG data
  2. check Morphology Type
  3. check duplicates and NaNs

  4. check interested features

Data explore

Description

0   1- 14  A14   ---        Name    Galaxy identifier
   16- 27  A12   ---        OName   Other galaxy identifier
   29- 38  F10.7 h          RAhour  ? Right Ascension in decimal hours (J2000)
   40- 50  F11.7 deg        DEdeg   ? Declination in decimal degrees (J2000)
4  52- 56  A5    ---        MType   Morphology type
5  58- 63  F6.2  Mpc        Dist    ? NED distance; see Section 2
   65- 69  F5.2  Mpc        e_Dist  ? Uncertainty in Dist
   71- 83  A13   ---        n_Dist  Method used to determine Dist
   85- 89  F5.3  mag        Av      NED foreground V band extinction
9  91- 97  F7.3  mag        VMag    [-25/-11]? Absolute visual magnitude (1)
   99-104  F6.3  mag        e_VMag  [0.2/1.5]? Uncertainty in VMag
11 106-115  F10.3 mag        KMag    [-28/-14]? Absolute K band magnitude (2)
   117-125  F9.3  mag        e_KMag  [0.1/0.3]? Uncertainty in KMag
       127  A1    ---        l_Ngc   [>] Limit flag on Ngc
14 128-134  F7.1  ---        Ngc     [0/32500] Number of Globular Clusters (NGC)
   136-142  F7.1  ---        e_Ngc   [0/20000] Uncertainty in Ngc
   144-152  A9    ---        r_Ngc   Reference(s) for Ngc (see refs.dat file)
17 154-159  F6.2  km/s       sigma   [10/414]? Stellar velocity dispersion (3)
   161-165  F5.2  km/s      e_sigma  ? Uncertainty in sigma
19 167-171  F5.2  kpc        Reff    [0.1/55]? Effective radius (4)
   173-177  F5.2  kpc        e_Reff  ? Uncertainty in Reff
21 179-184  F6.3  [Msun]     logMd   [7.6/12.8]? Log of dynamical mass
   186-191  F6.3  [Msun]    e_logMd  ? Uncertainty in logMd
23 193-197  F5.2  [Msun]     logMGC  ? Log of total globular cluster mass (5)
   199-203  F5.2  [Msun]   e_logMGC  ? Uncertainty in logMGC
25 205-209  F5.2  [Msun]     logMB   ? Log of central supermassive black hole mass(6)
   211-215  F5.2  [Msun]    E_logMB  ? Upper uncertainty in logMB
   217-221  F5.2  [Msun]    e_logMB  ? Lower uncertainty in logMB

Populating GCs in the local universe (chap 2)

Harris catalog for GCpG model (sec 2.2.1)

Data prepare

  1. [=] load GCpG data
  2. [=] check Morphology Type: MType: str => to numeric, & parse in Harris manner, present in barplot
  3. [=] check duplicates and NaNs: explain pandas table, 418 galaxy enteries

    • 0: Nan in [VMag, KMag, Dist], MilkyWay => drop
    • 356: Nan in [VMag, KMag], A1689-BCG, 770 Mpc => drop
    • 227,228: completely duplicated => drop 228
    • 272,273: different data reduction => drop 272(higher uncertainty in data)
    • 138: NaN in logMGC, 3 in Ngc => data is dirty, check in 4-A-b
  4. check interested features

Data explore

In [14]:
# 1. load GCpG data
GCpG=pd.read_csv(path_GCpG)

# 2. MType to numeric, in Harris manner
GCpG['iMType'] = ''
GCpG['Mcat'] = ''
# E -> 2: 226(+18)
GCpG.iMType[GCpG.MType.str.contains(r'E')] = 2
GCpG.Mcat[GCpG.MType.str.contains(r'E')] = 'E'
# S0 -> 3: 75(+18)
GCpG.iMType[GCpG.MType.str.contains(r'S0')] = 3
GCpG.Mcat[GCpG.MType.str.contains(r'S0')] = 'S0'
# S/Irr -> 1: 103
GCpG.iMType[-(GCpG.MType.str.contains(r'E') | GCpG.MType.str.contains(r'S0'))] = 1
GCpG.Mcat[-(GCpG.MType.str.contains(r'E') | GCpG.MType.str.contains(r'S0'))] = 'S/Irr'
# E/S0 -> 6: 18
GCpG.iMType[GCpG.MType.str.contains(r'E') & GCpG.MType.str.contains(r'S0')] = 6
GCpG.Mcat[GCpG.MType.str.contains(r'E') & GCpG.MType.str.contains(r'S0')] = 'E/S0'

fig = figure(figsize=fig_size)
ax = fig.add_subplot(111)
GCpG.loc[:,['Name','MType','Mcat']].groupby(['Mcat','MType']).count() \
        .unstack().plot(kind='bar',stacked=True,legend=False,rot=0, ax=ax)
text(0.8, 150,'%10s' % GCpG.loc[:,['Name','Mcat']].groupby('Mcat').count() \
     .to_string().replace('Name','count'), **params_font)
text(1.8, 150, '%6s' % GCpG[GCpG.Mcat.isin(['S0','E/S0'])].loc[:,['Name','MType','Mcat']] \
     .groupby(['Mcat','MType']).count().to_string().replace('Name','count').replace(' ', '%2s' % ' '), **params_font)
xlabel('Morphology Type')

if flag_svfg:
    savefig(path.join(path_fig, 'MType.jpeg'),**params_svfg)
    
print('Our catalog contains 248 ellipticals, 93 S0\'s,'+
      ' and 81 spirals or irregulars, (Harris et. al. 2013).')
Our catalog contains 248 ellipticals, 93 S0's, and 81 spirals or irregulars, (Harris et. al. 2013).
In [15]:
# 3. check duplicates and NaNs 
display(GCpG.reindex([0,356,227,228,272,273,138]).loc[:,
         ['Name','RAhour','DEdeg','Dist','e_Dist','VMag','e_VMag','KMag',
          'logMGC','Ngc','r_Ngc','sigma','Reff','logMd','logMB']])
if 0 in GCpG.index:
    GCpG.drop([0,356,228,272],inplace=True)
Name RAhour DEdeg Dist e_Dist VMag e_VMag KMag logMGC Ngc r_Ngc sigma Reff logMd logMB
0 MilkyWay NaN NaN NaN NaN -21.30 0.3 NaN 7.66 160.0 44 105.0 0.70 9.856 6.61
356 A1689-BCG 13.191528 -1.338056 790.00 40.00 NaN NaN NaN 10.10 30000.0 66b NaN NaN NaN NaN
227 NGC4417 12.447387 9.584243 16.03 0.53 -20.00 0.2 -22.860 7.24 72.0 77 135.2 1.22 10.318 NaN
228 NGC4417 12.447387 9.584243 16.03 0.53 -20.00 0.2 -22.860 7.24 72.0 77 135.2 1.22 10.318 NaN
272 VCC-1386 12.530939 12.656659 16.00 2.00 -17.34 0.2 NaN 6.67 26.6 67 20.5 1.80 8.848 NaN
273 VCC-1386 12.530939 12.656659 16.10 1.13 -16.90 0.2 NaN 6.64 26.0 24 NaN NaN NaN NaN
138 NGC2915 9.436528 -76.626389 4.06 0.20 -16.12 0.2 -18.298 NaN 3.0 66a NaN 0.31 NaN NaN

Populating GCs in the local universe (chap 2)

Harris catalog for GCpG model (sec 2.2.1)

Data prepare

  1. [x] load GCpG data
  2. [x] check Morphology Type
  3. [x] check duplicates and NaNs

  4. [=] check interested features:

    1. [=] statistics, present in 2 pandas tables

      • tab1: var(Ngc(6:0)) significantly larger => add logNgc(6:-inf)
      • tab2: explain unreasonable value in Ngc, e_Ngc(0:3), logMGC(1:NaN), e_logMGC(2:0):
        • Ngc derived from logMGC => focus on Ngc/logNgc(412/418)
        • Ngc == 0: [7,51,87,88,271,304] => Ngc = 0.5 e_Ngc, avoid -inf temporately
        • e_Ngc == 0: [1,28,138] => e_Ngc = 0.5 Ngc, avoid -inf temporately
        • logMGC, e_logMGC == NaN: [138] => linear interpolate NaN logMGC from logNgc(412))
        • e_logMGC ==0: [1,28] => leave alone
    2. visualization

Data explore

In [17]:
# 4-A-a. statistics
display(GCpG.loc[:,['Name','VMag','KMag','Ngc','e_Ngc','r_Ngc',
                    'sigma','Reff','logMd','logMGC','Dist','logMB']] \
        .describe(include='all').loc[['count','unique','min','max','mean','std'],:])
Name VMag KMag Ngc e_Ngc r_Ngc sigma Reff logMd logMGC Dist logMB
count 418 418.000000 344.000000 418.000000 418.000000 418 271.000000 340.000000 253.000000 417.000000 418.000000 64.000000
unique 418 NaN NaN NaN NaN 106 NaN NaN NaN NaN NaN NaN
min NaN -24.190000 -27.080000 0.000000 0.000000 NaN 10.400000 0.130000 7.693000 4.880000 0.030000 0.000000
max NaN -11.170000 -14.510000 32500.000000 13000.000000 NaN 414.000000 55.000000 12.726000 10.120000 375.000000 10.320000
mean NaN -19.127943 -22.583840 1512.699761 351.961962 NaN 168.750923 4.287588 10.727692 7.369305 34.758971 5.933750
std NaN 2.799122 2.659367 3957.436127 1075.182192 NaN 93.365634 6.092928 0.969581 1.229806 46.445852 3.953719
In [18]:
# 4-A-b. i: take logarithms -> logNgc; ii: fit NaN in logMGC and e_logMGC
display(GCpG.loc[(GCpG.loc[:,['Ngc','e_Ngc','logMGC','e_logMGC']]==0).sum(axis=1)==True,
                 ['Name','Ngc','e_Ngc','logMGC','e_logMGC']])
if 'has_GC' not in locals(): # i
    has_GC = (GCpG.Ngc > 0) # mark for logMGC interpolation
    GCpG.Ngc[~has_GC] = 0.5 * GCpG.e_Ngc[~has_GC]
    GCpG.e_Ngc[GCpG.e_Ngc==0] = 0.5 * GCpG.Ngc[GCpG.e_Ngc==0]
if 'logNgc' not in GCpG.columns:
    GCpG['logNgc'] = GCpG.Ngc.apply(log10)
if 'nn_MGC' not in locals(): # ii
    nn_MGC = GCpG.logMGC.notna() # update 138: logMGC NaN from Ngc/logNgc
    LRsk = linear_model.LinearRegression()
    LRsk.fit(GCpG.logNgc[has_GC & nn_MGC].values.reshape(-1,1),
             GCpG.logMGC[has_GC & nn_MGC].values.reshape(-1,1))
    GCpG.logMGC[~nn_MGC] = LRsk.predict(GCpG.logNgc[~nn_MGC].values.reshape(-1,1))
    LRsk.fit(GCpG.e_Ngc[has_GC & nn_MGC].apply(log10).values.reshape(-1,1),
             GCpG.e_logMGC[has_GC & nn_MGC].values.reshape(-1,1))
    GCpG.e_logMGC[~nn_MGC] = LRsk.predict(GCpG.e_Ngc[~nn_MGC].apply(log10).values.reshape(-1,1))
Name Ngc e_Ngc logMGC e_logMGC
7 NGC221 0.0 1.0 6.26 0.10
51 FCC-0059 0.0 4.0 6.21 0.19
87 FCC-242 0.0 3.3 5.51 0.30
88 FCC-246 0.0 3.4 5.51 0.30
138 NGC2915 3.0 0.0 NaN NaN
271 VCC-1363 0.0 3.0 6.19 0.24
304 VCC-1714 0.0 4.5 6.19 0.16

Populating GCs in the local universe (chap 2)

Harris catalog for GCpG model (sec 2.2.1)

Data prepare

  1. [x] load GCpG data
  2. [x] check Morphology Type
  3. [x] check duplicates and NaNs

  4. [=] check interested features

    1. [x] statistics, present in 2 pandas tables

    2. [=] visualization

      1. logNgc: present KDE in distplot
      2. data completeness: represent in barplot
      3. spatial sparseness of NaN data: present in msnoplot

Data explore

In [19]:
# 4-B. visualization
fig = figure(figsize=fig_size)
ax = fig.add_subplot(211)
GCpG.logNgc.plot('kde',title='logNgc',ax=ax)
ax2 = fig.add_subplot(212)
GCpG.loc[:,GCpG.columns[GCpG.dtypes==float]].describe().loc['count',:].plot('bar', ax=ax2)
Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f34ba0885c0>
In [20]:
img = msno.matrix(GCpG.loc[:, ['Name','VMag','KMag','Ngc','r_Ngc',
                               'sigma','Reff','logMd','Dist','logMB','logNgc']], inline=False)
if flag_svfg:
    img.savefig(path.join(path_fig, 'GCpG_NaN.jpeg'), **params_svfg)

Populating GCs in the local universe (chap 2)

Harris catalog for GCpG model (sec 2.2.1)

Data prepare

Data explore

  1. [=] explore correlations for data type in float: present in pairplots with heatmap

    • properties:
      • 'Dist', 'e_Dist', 'Av', 'VMag', 'e_VMag', 'KMag', 'e_KMag', 'Ngc', 'e_Ngc', 'sigma', 'e_sigma', 'Reff', 'e_Reff', 'logMd', 'e_logMd', 'logMGC', 'e_logMGC','logMB', 'E_logMB', 'e_logMB', 'logNgc'
    • correlated =>
      • xxx & e_xxx: intuitive, not interesting
      • Av & Dist: Dist derived from NED Av => focus on Dist (more intuitive)
      • VMag & KMag(74:NaN): intuitive, LR in 2-A
    • probably correlated =>
      • Dist & Ngc(6:0) & Reff(78:NaN)
      • VMag & KMag(74:NaN) & sigma(147:NaN) & logMd(165:NaN) & logMGC(1:NaN) & logNgc(6:-inf)
  2. check correlated pairs [and interpolate values]

  3. correlation ranking

In [21]:
# 1. explore correlations [slow]
def pp_heat(df, cor = 0.5, figname = 'pairplots.jpeg'):
    '''
    Input
        df: pandas dataframe
    Output
        corr: df.corr()
        pairplot with heatmap
    '''
    with sns.axes_style("white"):
        pp = sns.pairplot(df.fillna(0), diag_kind='kde', markers='x', plot_kws={'s':15,'alpha':0.2})
        corr = df.corr()
        for i, j in zip(*triu_indices_from(pp.axes, 1)):
            pp.axes[i, j].set_visible(False)
            # pp.axes[i, j].figure.text((j+0.5)/len(corr),(len(corr)-i-0.5)/len(corr),
            #                           str(round(corr.values[i, j],3)), **params_font)
        for i, j in zip(*tril_indices_from(pp.axes, -1)):
            if abs(corr.values[i, j]) < cor or isnan(corr.values[i, j]):
                sns.despine(ax=pp.axes[i, j],top=True,right=True,left=True,bottom=True)
            else:
                sns.despine(ax=pp.axes[i, j],top=False,right=False,left=False,bottom=False)
                for pos in ['top','right','left','bottom']:
                    pp.axes[i, j].spines[pos].set_edgecolor('gray')
                pp.axes[i, j].figure.text((j+0.7)/len(corr),(len(corr)-i-0.3)/len(corr),
                                          str(round(corr.values[i, j],3)), **params_font)
        ax = pp.axes[0,0].figure.add_subplot(222)
        sns.heatmap(corr, square=True, cmap=cmap, alpha=0.8, ax=ax)
        savefig(path.join(path_fig,figname), **params_svfg)
    return 0

if not flag_skip:
    pp_heat(GCpG.loc[:,GCpG.columns[GCpG.dtypes==float][2:]], 0.5, 'pp_GCpG.jpeg')
else:
    ax = figure(figsize=sfig_size).add_subplot(111)
    ax.imshow(imread(path.join(path_fig, 'pp_GCpG.jpeg')))
    ax.grid(False)

Populating GCs in the local universe (chap 2)

Harris catalog for GCpG model (sec 2.2.1)

Data prepare

Data explore

  1. [x] explore correlations for data type in float

  2. [=] check correlated pairs [and interpolate values]

    1. [=] VMag(418) & KMag(344+74): LR interpolate => KMag(74:NaN), present in errorbar & OLS LR summary

    2. Dist & Ngc(6:0) & Reff(78:NaN)

    3. VMag(418)/KMag(344) & logMd(253)/sigma(271) & logMGC(417)/logNgc(412)

  3. correlation ranking

In [22]:
# 2. check correlated pairs [and interpolate values]
# 2-A. `VMag` & `KMag`: linear regression
if 'nn_KMag' not in locals():
    nn_KMag = GCpG.KMag.notna()
    LRsk = linear_model.LinearRegression()
    LRsk.fit(GCpG.VMag[nn_KMag].values.reshape(-1,1), GCpG.KMag[nn_KMag].values.reshape(-1,1))
    GCpG.KMag[~nn_KMag] = LRsk.predict(GCpG.VMag[~nn_KMag].values.reshape(-1,1))
    LRsk.fit(GCpG.e_VMag[nn_KMag].values.reshape(-1,1), 
             GCpG.e_KMag[nn_KMag].values.reshape(-1,1))
    GCpG.e_KMag[~nn_KMag] = LRsk.predict(GCpG.e_VMag[~nn_KMag].values.reshape(-1,1))
LRols = sm.OLS(GCpG.KMag.values, GCpG.VMag.values).fit() # OLS(y, X)
print(LRols.summary2())
                 Results: Ordinary least squares
==================================================================
Model:              OLS              Adj. R-squared:     0.999    
Dependent Variable: y                AIC:                803.0247 
Date:               2018-06-17 22:09 BIC:                807.0602 
No. Observations:   418              Log-Likelihood:     -400.51  
Df Model:           1                F-statistic:        5.005e+05
Df Residuals:       417              Prob (F-statistic): 0.00     
R-squared:          0.999            Scale:              0.39886  
---------------------------------------------------------------------
         Coef.     Std.Err.       t        P>|t|     [0.025    0.975]
---------------------------------------------------------------------
x1       1.1304      0.0016    707.4353    0.0000    1.1273    1.1336
------------------------------------------------------------------
Omnibus:              176.239      Durbin-Watson:         1.879   
Prob(Omnibus):        0.000        Jarque-Bera (JB):      1964.668
Skew:                 1.479        Prob(JB):              0.000   
Kurtosis:             13.201       Condition No.:         1       
==================================================================

In [23]:
fig = figure(figsize=fig_size)
ax = fig.add_subplot(111)
ax.errorbar(GCpG.VMag[nn_KMag], GCpG.KMag[nn_KMag],
            xerr=GCpG.e_VMag[nn_KMag], yerr=GCpG.e_KMag[nn_KMag], fmt='g+', **params_errbar)
ax.errorbar(GCpG.VMag[~nn_KMag], GCpG.KMag[~nn_KMag],
            xerr=GCpG.e_VMag[~nn_KMag], yerr=GCpG.e_KMag[~nn_KMag], fmt='rx', **params_errbar)
ax.invert_xaxis()
ax.invert_yaxis()
ax.set_xlabel('VMag')
ax.set_ylabel('KMag')
legend(['existing KMag vs VMag: %d/%d' % (sum(nn_KMag),len(nn_KMag)),
        'interpolated KMag vs VMag: %d/%d' % (sum(~nn_KMag),len(nn_KMag))])
if flag_svfg:
    savefig(path.join(path_fig,'KMag_VMag.jpeg'), **params_svfg)

Populating GCs in the local universe (chap 2)

Harris catalog for GCpG model (sec 2.2.1)

Data prepare

Data explore

  1. [x] explore correlations for data type in float
  2. [=] check correlated pairs [and interpolate values]

    1. [x] VMag(418) & KMag(344+74)

    2. [=]Dist & Ngc(6:0) & Reff(78:NaN): no significant correlation => leave alone, present in pairplot

    3. VMag(418)/KMag(344) & logMd(253)/sigma(271) & logMGC(417)/logNgc(412)

  3. correlation ranking

In [24]:
# 2-B. `Dist` & `Ngc` & `Reff`: no significant correlations
# pp = sns.pairplot(GCpG, vars= ['Dist','Ngc','Reff'], hue='Mcat', diag_kind= 'kde',
#                   plot_kws=params_hex, diag_kws={'lw':1}, size=3.2)
# for i, j in zip(*triu_indices_from(pp.axes, 1)):
#     pp.axes[i, j].set_visible(False)
# upgrade to PairGrid https://seaborn.pydata.org/generated/seaborn.PairGrid.html 
pp = sns.PairGrid(GCpG, vars= ['Dist','Ngc','Reff'], hue='Mcat',diag_sharey=False,size=3.2)
pp.map_lower(scatter, **params_hex)
pp.map_upper(sns.residplot)#, ci=97, scatter_kws={'s':15}, line_kws={'lw':1})
pp.map_diag(sns.kdeplot, lw=1)
pp.add_legend()

if flag_svfg:
    savefig(path.join(path_fig,'pp_DNR.jpeg'), **params_svfg)

Populating GCs in the local universe (chap 2)

Harris catalog for GCpG model (sec 2.2.1)

Data prepare

Data explore

  1. [x] explore correlations for data type in float
  2. [x] check correlated pairs [and interpolate values]

    1. [x] VMag(418) & KMag(344+74)
    2. [x] Dist & Ngc(6:0) & Reff(78:NaN)

    3. [=] VMag(418)/KMag(344) & logMd(253)/sigma(271) & logMGC(417)/logNgc(412)

      1. [=] logNgc(6:-inf) & logMGC(1:NaN): Ngc(6:0) deduct from logMGC (Harris et al. 2011), present in errorbar & OLS LR summary

        • => logNgc(412-1) LR interpolate logMGC(1:NaN)
        • => logMGC(418-1-6) LR interpolate Ngc/logNgc(6:0/-inf)
      2. VMag/KMag (418/344+74) & logNgc/logMGC (412/417)

      3. logMd(253) & sigma(271) & logNgc/logMGC (412/417)

  3. correlation ranking

In [25]:
# 2-C-a. `logNgc` & `logMGC`: linear regression, interpolate logNgc from logMGC
LRsk = linear_model.LinearRegression()
LRsk.fit(GCpG.logMGC[has_GC].values.reshape(-1,1),
         GCpG.logNgc[has_GC].values.reshape(-1,1))
GCpG.logNgc[~has_GC] = LRsk.predict(GCpG.logMGC[~has_GC].values.reshape(-1,1))
LRsk.fit(GCpG.e_logMGC[has_GC].values.reshape(-1,1),
         GCpG.e_Ngc[has_GC].apply(log10).values.reshape(-1,1))
GCpG.e_Ngc[~has_GC] = 10**LRsk.predict(GCpG.e_logMGC[~has_GC].values.reshape(-1,1))
GCpG.Ngc[~has_GC] = 10**GCpG.logNgc[~has_GC]
LRols = sm.OLS(GCpG.logNgc.values, GCpG.logMGC.values).fit() # OLS(y, X)
print(LRols.summary2())
                 Results: Ordinary least squares
==================================================================
Model:              OLS              Adj. R-squared:     0.892    
Dependent Variable: y                AIC:                954.0297 
Date:               2018-06-17 22:09 BIC:                958.0651 
No. Observations:   418              Log-Likelihood:     -476.01  
Df Model:           1                F-statistic:        3437.    
Df Residuals:       417              Prob (F-statistic): 1.75e-203
R-squared:          0.892            Scale:              0.57241  
---------------------------------------------------------------------
          Coef.     Std.Err.       t       P>|t|     [0.025    0.975]
---------------------------------------------------------------------
x1        0.2906      0.0050    58.6292    0.0000    0.2808    0.3003
------------------------------------------------------------------
Omnibus:               40.119       Durbin-Watson:          1.430 
Prob(Omnibus):         0.000        Jarque-Bera (JB):       18.367
Skew:                  0.316        Prob(JB):               0.000 
Kurtosis:              2.191        Condition No.:          1     
==================================================================

In [26]:
fig = figure(figsize=fig_size)
ax = fig.add_subplot(111)
ax.set_xlim([-1,6])
ax.set_ylim([4.5,10])
ax.errorbar(GCpG.logNgc[has_GC], GCpG.logMGC[has_GC],
            xerr=log10(GCpG.e_Ngc[has_GC]), yerr=GCpG.e_logMGC[has_GC], 
            fmt='g+', **params_errbar)
ax.errorbar(GCpG.logNgc[~has_GC], GCpG.logMGC[~has_GC],
            xerr=log10(GCpG.e_Ngc[~has_GC]), yerr=GCpG.e_logMGC[~has_GC], 
            fmt='rx', **params_errbar)
legend(['existing logNgc vs logMGC','interpolated logNgc vs logMGC'], loc=4)
ax.set_xlabel('logNgc')
ax.set_ylabel('logMGC')
if flag_svfg:
    savefig(path.join(path_fig,'Ngc_MGC.jpeg'), **params_svfg)

Populating GCs in the local universe (chap 2)

Harris catalog for GCpG model (sec 2.2.1)

Data prepare

Data explore

  1. [x] explore correlations for data type in float

  2. [x] check correlated pairs [and interpolate values]

    1. [x] VMag(418) & KMag(344+74)

    2. [x] Dist & Ngc(412+6) & Reff(78:NaN)

    3. [=] VMag(418)/KMag(344) & logMd(253)/sigma(271) & logMGC(417)/logNgc(412)

      1. [x] logNgc(412+6) & logMGC(417+1)

      2. [=] VMag/KMag (418/344+74) & logNgc/logMGC (412/417): all complete

        • => VMag(original) vs logNgc (more intuitive)
        • => Luminosity model in sec 2.2.1, present in pairplot
      3. logMd(253) & sigma(271) & logNgc/logMGC (412/417)

  3. correlation ranking

In [27]:
# 2-C-b. `VMag(KMag) & logNgc(logMGC)`
pp = sns.pairplot(GCpG, vars=['VMag','KMag','logNgc','logMGC'], hue='Mcat',
                  plot_kws=params_scatter, diag_kind='kde', diag_kws={'lw':1})
for i, j in zip(*triu_indices_from(pp.axes, 1)):
    pp.axes[i, j].set_visible(False)
pp.axes[0,0].invert_xaxis()
pp.axes[0,1].invert_xaxis()
pp.axes[1,0].invert_yaxis()
if flag_svfg:
    savefig(path.join(path_fig,'pp_Luminosity.jpeg'), **params_svfg)

Populating GCs in the local universe (chap 2)

Harris catalog for GCpG model (sec 2.2.1)

Data prepare

Data explore

  1. [x] explore correlations for data type in float

  2. [x] check correlated pairs [and interpolate values]

    1. [x] VMag(418) & KMag(344+74)

    2. [x] Dist & Ngc(6:0) & Reff(78:NaN)

    3. [=] VMag(418)/KMag(344) & logMd(253)/sigma(271) & logMGC(417+1)/logNgc(412+6)

      1. [x] logNgc(412+6) & logMGC(417+1)

      2. [x] VMag/KMag (418/344+74) & logNgc/logMGC (412+6/417+1)

      3. [=] logMd(253) & sigma(271) & logNgc/logMGC (412+6/417+1)

        • => logMd(253) = f(sigma(271),Reff(340)) vs logNgc(412+6):logMd(253) limiting term
        • => Dynamical mass model in sec 2.2.2, present in pairplot
  3. correlation ranking

In [28]:
# 2-C-c. `logMd & sigma & logNgc(logMGC)`
pp = sns.pairplot(GCpG, vars=['logMd','sigma','logNgc','logMGC'], hue='Mcat', 
                  plot_kws=params_scatter, diag_kind='kde', diag_kws={'lw':1})
for i, j in zip(*triu_indices_from(pp.axes, 1)):
    pp.axes[i, j].set_visible(False)
if flag_svfg:
    savefig(path.join(path_fig,'pp_Dynamical.jpeg'), **params_svfg)

Populating GCs in the local universe (chap 2)

Harris catalog for GCpG model (sec 2.2.1)

Data prepare

Data explore

  1. [x] explore correlations for data type in float

  2. [x] check correlated pairs [and interpolate values]

    1. [x] VMag & KMag

    2. [x] Dist & Ngc & Reff

    3. [x] VMag(KMag) & logMd (sigma) & logMGC(logNgc)

  3. [=] correlation ranking: Random Forest with feature importance, explain in heatmap

    • VMag/KMag(344+74) vs logNgc(412+6)/logMGC(417+1)
    • impute median for NaN: => KMag & logMGC > KMag & logNgc > VMag & logNgc
    • impute mean for NaN: => KMag & logNgc > KMag & logMGC > VMag & logMGC
    • impute most frequent for NaN: => KMag & logNgc > KMag & logMGC > VMag & logMGC
    • 'normalize' by data availibility: 253/418(logMd) subset => KMag & Ngc > logMd & logMGC > VMag & logMGC, max correlation score drops
    • => possible selection effect
    • => overfitting in interpolation
    • => debating GCpG nature
    • (color-blind/friendly seaborn colors and matplotlib)
In [30]:
# 3. correlation ranking, feature importance with Random Forest
## 3-A. impute median for NaN
## 3-B. impute mean for NaN
## 3-C. impute most frequent for NaN
## 3-D. 'normalize' by data avalibility 253/418

if 'strategies' not in locals():
    features = ['Dist', 'e_Dist', 'Av', 'VMag', 'e_VMag', 'KMag', 'e_KMag', 'sigma', 'e_sigma',
                'Reff', 'e_Reff', 'logMd', 'e_logMd', 'logMB', 'E_logMB', 'e_logMB']
    targets = ['Ngc', 'e_Ngc', 'logNgc', 'logMGC', 'e_logMGC', 'iMType']
    norm_feats = ['Dist', 'e_Dist', 'Av', 'VMag', 'e_VMag', 'KMag', 'e_KMag', 'sigma', 'e_sigma',
                'Reff', 'e_Reff', 'logMd', 'e_logMd']
    strategies = ['median', 'mean', 'most_frequent','norm']

def RF_NaN(df, feats, tags, strategy):
    '''
    Input
        df: pandas dataframe
        feats: list of feature colume names
        tags: list of target colume names
        strategy: str of strategy to deal with NaN
    Output
        df_rf: dataframe of Random Forest feature importances 
    '''
    df_rf = pd.DataFrame(index=feats)
    if strategy == 'norm':
        X_rf = df.loc[:, feats].values
    else:
        X_rf = df.loc[:, feats].values
        imp = Imputer(missing_values='NaN', strategy=strategy, axis=0)
        X_rf = imp.fit_transform(X_rf)

    RFreg = RandomForestRegressor()
    for target in tags:
        y_rf = df.loc[:, target].values
        RFreg.fit(X_rf, y_rf)
        df_rf[target] = RFreg.feature_importances_
    return df_rf

axs = figure(figsize=sfig_size).subplots(2,2).ravel()
for i in range(len(strategies)):
    if strategies[i] == 'norm':
        df_rf = RF_NaN(GCpG.loc[:, norm_feats+targets].dropna(), norm_feats, targets, strategies[i])
        axs[i].set_title("\'subset\' 253/418")
    else:
        df_rf = RF_NaN(GCpG, features, targets, strategies[i])
        axs[i].set_title(strategies[i])
    sns.heatmap(df_rf, cmap=cmap, alpha=0.8, ax=axs[i])

if flag_svfg:
    savefig(path.join(path_fig,'RF_heat.jpeg'), **params_svfg)

Populating GCs in the local universe (chap 2)

GCpG models (sec 2.2)

Linear models: logMd vs logMGC/logNgc

  1. [=] dynamical mass model logMGC/logNgc $=\mathscr{LR}$(logMd)

    1. [=] scatter plot with LR: strong correlation, present in scatter + regplot

    2. OLS LR summary

  2. luminosity model $\mathscr{LR}$(VMag/KMag)

  3. revisit with errorbar

GC SN model

In [32]:
# 1-A. `logMd vs logMGC/logNgc` in scatter
fig = figure(figsize=fig_size)
ax = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
ax.set_xlim([7.5,13])
ax2.set_xlim([7.5,13])
ax2.set_ylim([-0.5, 5.5])
for i in range(len(GCpG.Mcat.unique())):
    mask_MT = (GCpG.Mcat == GCpG.Mcat.unique()[i])
    ax.scatter(GCpG.logMd[mask_MT], GCpG.logMGC[mask_MT],**markers[i])
    sns.regplot(GCpG.logMd[mask_MT], GCpG.logMGC[mask_MT],
                scatter=False, ci=None, line_kws={'lw':1}, ax=ax)
    ax2.scatter(GCpG.logMd[mask_MT], GCpG.logNgc[mask_MT],**markers[i])
    sns.regplot(GCpG.logMd[mask_MT], GCpG.logNgc[mask_MT],
                scatter=False, ci=None, line_kws={'lw':1}, ax=ax2)
sns.regplot(GCpG.logMd,GCpG.logMGC, scatter=False, ci=ci, line_kws={'lw':1}, ax=ax)
sns.regplot(GCpG.logMd,GCpG.logNgc, scatter=False, ci=ci, line_kws={'lw':1}, ax=ax2)
ax.legend()
ax2.legend()

if flag_svfg:
    savefig(path.join(path_fig,'LR_Mdyn.jpeg'), **params_svfg)

Populating GCs in the local universe (chap 2)

GCpG models (sec 2.2)

Linear models: logMd vs logMGC/logNgc

  1. [=] dynamical mass model logMGC/logNgc $=\mathscr{LR}$(logMd)

    1. [x] scatter plot with LR

    2. [=] OLS LR summary: for logMd(253): logMGC(0.996) > logNgc(0.923)

  2. luminosity model $\mathscr{LR}$(VMag/KMag)

  3. revisit with errorbar

GC SN model

In [33]:
# 1-B. `logMd vs logMGC/logNgc` in OLS regression
LRols = sm.OLS(GCpG.logNgc[GCpG.logMd.notna()].values,
               GCpG.logMd[GCpG.logMd.notna()].values).fit() # OLS(y, X)
print(LRols.summary2())
                 Results: Ordinary least squares
==================================================================
Model:              OLS              Adj. R-squared:     0.923    
Dependent Variable: y                AIC:                566.4509 
Date:               2018-06-17 22:09 BIC:                569.9842 
No. Observations:   253              Log-Likelihood:     -282.23  
Df Model:           1                F-statistic:        3037.    
Df Residuals:       252              Prob (F-statistic): 1.41e-142
R-squared:          0.923            Scale:              0.54723  
---------------------------------------------------------------------
          Coef.     Std.Err.       t       P>|t|     [0.025    0.975]
---------------------------------------------------------------------
x1        0.2379      0.0043    55.1067    0.0000    0.2294    0.2464
------------------------------------------------------------------
Omnibus:              10.559        Durbin-Watson:           1.338
Prob(Omnibus):        0.005         Jarque-Bera (JB):        5.680
Skew:                 0.159         Prob(JB):                0.058
Kurtosis:             2.338         Condition No.:           1    
==================================================================

In [34]:
LRols = sm.OLS(GCpG.logMGC[GCpG.logMd.notna()].values,
               GCpG.logMd[GCpG.logMd.notna()].values).fit() # OLS(y, X)
print(LRols.summary2())
                 Results: Ordinary least squares
==================================================================
Model:              OLS              Adj. R-squared:     0.996    
Dependent Variable: y                AIC:                340.7403 
Date:               2018-06-17 22:09 BIC:                344.2737 
No. Observations:   253              Log-Likelihood:     -169.37  
Df Model:           1                F-statistic:        7.156e+04
Df Residuals:       252              Prob (F-statistic): 2.49e-311
R-squared:          0.996            Scale:              0.22424  
---------------------------------------------------------------------
         Coef.     Std.Err.       t        P>|t|     [0.025    0.975]
---------------------------------------------------------------------
x1       0.7394      0.0028    267.5077    0.0000    0.7339    0.7448
------------------------------------------------------------------
Omnibus:              5.236         Durbin-Watson:           1.515
Prob(Omnibus):        0.073         Jarque-Bera (JB):        7.433
Skew:                 -0.017        Prob(JB):                0.024
Kurtosis:             3.839         Condition No.:           1    
==================================================================

Populating GCs in the local universe (chap 2)

GCpG models (sec 2.2)

Linear models: VMag/KMag vs logNgc

  1. [x] dynamical mass model $\mathscr{LR}$(logMd)

  2. [=] luminosity model logNgc $=\mathscr{LR}$(VMag/KMag)

    1. [=] scatter plot with LR: strong correlation, present in scatter + regplot

    2. OLS LR summary

  3. revisit with errorbar

GC SN model

In [35]:
# 2-A. `VMag/KMag vs logNgc` in scatter
fig = figure(figsize=fig_size)
ax = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
ax.set_xlim([-12, -27])
ax2.set_xlim([-12, -27])
for i in range(len(GCpG.Mcat.unique())):
    mask_MT = (GCpG.Mcat == GCpG.Mcat.unique()[i])
    ax.scatter(GCpG.VMag[mask_MT], GCpG.logNgc[mask_MT],**markers[i])
    sns.regplot(GCpG.VMag[mask_MT], GCpG.logNgc[mask_MT],
                scatter=False, ci=None, line_kws={'lw':1}, ax=ax)
    ax2.scatter(GCpG.KMag[mask_MT], GCpG.logNgc[mask_MT],**markers[i])
    sns.regplot(GCpG.KMag[mask_MT], GCpG.logNgc[mask_MT],
                scatter=False, ci=None, line_kws={'lw':1}, ax=ax2)
sns.regplot(GCpG.VMag,GCpG.logNgc, scatter=False, ci=ci, line_kws={'lw':1}, ax=ax)
sns.regplot(GCpG.KMag,GCpG.logNgc, scatter=False, ci=ci, line_kws={'lw':1}, ax=ax2)
ax.legend()
ax2.legend()

if flag_svfg:
    savefig(path.join(path_fig,'LR_Mag.jpeg'), **params_svfg)

Populating GCs in the local universe (chap 2)

GCpG models (sec 2.2)

Linear models: VMag/KMag vs logNgc

  1. [x] dynamical mass model $\mathscr{LR}$(logMd)

  2. [x] luminosity model logNgc=$\mathscr{LR}$(VMag/KMag)

    1. [x] scatter plot with LR

    2. [=] OLS LR summary

      • VMag(0.866) < KMag(0.869) < logMd(0.923/0.996)
      • recall logMd(253/418), KMag(344/418) interpolated from VMag
      • => might be selection effect: logMd requires good photometry data
      • => overfitting in interpolation of KMag
  3. revisit

GC SN model

In [36]:
# 2-B. `VMag/KMag vs logNgc` in OLS linear regression
LRols = sm.OLS(GCpG.logNgc.values, GCpG.VMag.values).fit() # OLS(y, X)
print(LRols.summary2())
                 Results: Ordinary least squares
==================================================================
Model:              OLS              Adj. R-squared:     0.866    
Dependent Variable: y                AIC:                1043.0025
Date:               2018-06-17 22:10 BIC:                1047.0379
No. Observations:   418              Log-Likelihood:     -520.50  
Df Model:           1                F-statistic:        2698.    
Df Residuals:       417              Prob (F-statistic): 3.34e-184
R-squared:          0.866            Scale:              0.70819  
---------------------------------------------------------------------
       Coef.     Std.Err.       t        P>|t|      [0.025     0.975]
---------------------------------------------------------------------
x1    -0.1106      0.0021    -51.9461    0.0000    -0.1148    -0.1064
------------------------------------------------------------------
Omnibus:               24.883       Durbin-Watson:          1.467 
Prob(Omnibus):         0.000        Jarque-Bera (JB):       19.434
Skew:                  0.434        Prob(JB):               0.000 
Kurtosis:              2.397        Condition No.:          1     
==================================================================

In [37]:
LRols = sm.OLS(GCpG.logNgc.values, GCpG.KMag.values).fit() # OLS(y, X)
print(LRols.summary2())
                 Results: Ordinary least squares
==================================================================
Model:              OLS              Adj. R-squared:     0.870    
Dependent Variable: y                AIC:                1030.4039
Date:               2018-06-17 22:10 BIC:                1034.4393
No. Observations:   418              Log-Likelihood:     -514.20  
Df Model:           1                F-statistic:        2794.    
Df Residuals:       417              Prob (F-statistic): 6.21e-187
R-squared:          0.870            Scale:              0.68716  
---------------------------------------------------------------------
       Coef.     Std.Err.       t        P>|t|      [0.025     0.975]
---------------------------------------------------------------------
x1    -0.0980      0.0019    -52.8557    0.0000    -0.1017    -0.0944
------------------------------------------------------------------
Omnibus:               22.487       Durbin-Watson:          1.448 
Prob(Omnibus):         0.000        Jarque-Bera (JB):       19.376
Skew:                  0.453        Prob(JB):               0.000 
Kurtosis:              2.458        Condition No.:          1     
==================================================================

Populating GCs in the local universe (chap 2)

GCpG models (sec 2.2)

Linear models:

  1. [x] dynamical mass model logMGC/logNgc=$\mathscr{LR}$(logMd)

  2. [x] luminosity model logNgc=$\mathscr{LR}$(VMag/KMag)

  3. [=] revisit with errorbar, present in 2 errorbars

    1. [=] dynamical mass model: limited by data completeness (253/418)
    2. luminosity model

GC SN model

In [38]:
# 3-A. revisit `logMd vs logMGC/logNgc` with errorbar
fig = figure(figsize=fig_size)
ax = fig.add_subplot(111)
ax2 = ax.twinx()
ax.set_xlim([7.5,13])

for i in range(len(GCpG.Mcat.unique())):
    mask_MT = (GCpG.Mcat == GCpG.Mcat.unique()[i])
    ax.errorbar(GCpG.logMd[mask_MT], GCpG.logMGC[mask_MT],
                xerr=GCpG.e_logMd[mask_MT], yerr=GCpG.e_logMGC[mask_MT], **fmts[i])
    ax2.errorbar(GCpG.logMd[mask_MT], GCpG.logNgc[mask_MT],
                 xerr=GCpG.e_logMd[mask_MT], yerr=GCpG.e_Ngc[mask_MT].apply(log10), **fmts[i])
ax.set_ylabel('logMGC $[M_\odot]$')
ax2.set_ylabel('logNgc')
ax.set_xlabel('logMd $[M_\odot]$')
sns.regplot(GCpG.logMd,GCpG.logMGC, scatter=False, ci=ci, line_kws={'lw':1}, ax=ax)
sns.regplot(GCpG.logMd,GCpG.logNgc, scatter=False, ci=ci, line_kws={'lw':1}, ax=ax2)
ax.legend()

if flag_svfg:
    savefig(path.join(path_fig,'LR_Mdyn_err.jpeg'), **params_svfg)
LRols = sm.OLS(GCpG.logNgc.values, GCpG.KMag.values).fit() # OLS(y, X)
print(LRols.summary2())
                 Results: Ordinary least squares
==================================================================
Model:              OLS              Adj. R-squared:     0.870    
Dependent Variable: y                AIC:                1030.4039
Date:               2018-06-17 22:10 BIC:                1034.4393
No. Observations:   418              Log-Likelihood:     -514.20  
Df Model:           1                F-statistic:        2794.    
Df Residuals:       417              Prob (F-statistic): 6.21e-187
R-squared:          0.870            Scale:              0.68716  
---------------------------------------------------------------------
       Coef.     Std.Err.       t        P>|t|      [0.025     0.975]
---------------------------------------------------------------------
x1    -0.0980      0.0019    -52.8557    0.0000    -0.1017    -0.0944
------------------------------------------------------------------
Omnibus:               22.487       Durbin-Watson:          1.448 
Prob(Omnibus):         0.000        Jarque-Bera (JB):       19.376
Skew:                  0.453        Prob(JB):               0.000 
Kurtosis:              2.458        Condition No.:          1     
==================================================================

Populating GCs in the local universe (chap 2)

GCpG models (sec 2.2)

Linear models: logMd vs logMGC/logNgc and VMag/KMag vs logNgc

  1. [x] dynamical mass model logMGC/logNgc=$\mathscr{LR}$(logMd)

  2. [x] luminosity model logNgc=$\mathscr{LR}$(VMag/KMag)

  3. [=] revisit with errorbar, present in 2 errorbars

    • [x] dynamical mass model (253/418)
    • [=] luminosity model: no better with huge uncertainties

GC SN model

In [39]:
# 3. revisit `VMag/KMag vs logNgc` with errorbar
fig = figure(figsize=fig_size)
ax = fig.add_subplot(111)
ax2 = ax.twiny()
ax.set_xlim([-12, -27])
ax2.set_xlim([-12, -27])
sub_axes = axes([0.15, 0.42, 0.4, 0.4])

for i in range(len(GCpG.Mcat.unique())):
    mask_MT = (GCpG.Mcat == GCpG.Mcat.unique()[i])
    ax.errorbar(GCpG.VMag[mask_MT], GCpG.logNgc[mask_MT],
                xerr=GCpG.e_VMag[mask_MT], yerr=GCpG.e_Ngc[mask_MT].apply(log10), **fmts[i])
    ax2.errorbar(GCpG.KMag[mask_MT], GCpG.logNgc[mask_MT],
                 xerr=GCpG.e_KMag[mask_MT], yerr=GCpG.e_Ngc[mask_MT].apply(log10), **fmts[i])
    sub_axes.scatter(GCpG.VMag[mask_MT], GCpG.logNgc[mask_MT], **markers[i])
sns.regplot(GCpG.VMag,GCpG.logNgc, scatter=False, ci=ci, line_kws={'lw':1}, ax=ax)
sns.regplot(GCpG.KMag,GCpG.logNgc, scatter=False, ci=ci, line_kws={'lw':1}, ax=ax2)
ax.set_ylabel('logNgc')
ax.set_xlabel('VMag')
ax2.set_xlabel('KMag')
sub_axes.set_xlim([-10, -26])
sub_axes.grid(False)
sub_axes.axis('off')
sub_axes.legend()
sub_axes.set_title('Fig. 3, Rodriguez et al. (2015)')

if flag_svfg:
    savefig(path.join(path_fig,'LR_Mag_err.jpeg'), **params_svfg)
In [40]:
LRols = sm.OLS(GCpG.logNgc.values, GCpG.KMag.values).fit() # OLS(y, X)
print(LRols.summary2())
                 Results: Ordinary least squares
==================================================================
Model:              OLS              Adj. R-squared:     0.870    
Dependent Variable: y                AIC:                1030.4039
Date:               2018-06-17 22:10 BIC:                1034.4393
No. Observations:   418              Log-Likelihood:     -514.20  
Df Model:           1                F-statistic:        2794.    
Df Residuals:       417              Prob (F-statistic): 6.21e-187
R-squared:          0.870            Scale:              0.68716  
---------------------------------------------------------------------
       Coef.     Std.Err.       t        P>|t|      [0.025     0.975]
---------------------------------------------------------------------
x1    -0.0980      0.0019    -52.8557    0.0000    -0.1017    -0.0944
------------------------------------------------------------------
Omnibus:               22.487       Durbin-Watson:          1.448 
Prob(Omnibus):         0.000        Jarque-Bera (JB):       19.376
Skew:                  0.453        Prob(JB):               0.000 
Kurtosis:              2.458        Condition No.:          1     
==================================================================

Populating GCs in the local universe (chap 2)

GCpG models (sec 2.2)

Linear models: bothare messy and trustless

GC SN model

  1. [=] the $N_\text{GC}$ per unit of parent galaxy luminosity VMag, normalized to $M_V$ = −15, present in scatter
    • $S_N \equiv N_\text{GC} \times 10^{0.4(M_V^T + 15)}$
    • $N_\text{GC} = S_N / 10^{0.4(M_V^T + 15)}$
  2. [=] LOWESS regression for $S_N$ vs VMag, present in plot

  3. error comparison

In [42]:
# 1. compute GC SN and represent
if 'Sn' not in GCpG.columns:
    GCpG['Sn'] = GCpG.Ngc*10**(0.4*(GCpG.VMag+15.0))
    GCpG['e_Sn'] = GCpG.e_Ngc*10**(0.4*(GCpG.VMag+15.0))

fig = figure(figsize=fig_size)
ax = fig.add_subplot(111)
ax.set_ylim([-2,40])
ax.set_xlim([-10,-25])
ax.set_xlabel("VMag")
ax.set_ylabel("GC SN")
for i in range(len(GCpG.Mcat.unique())):
    mask_MT = (GCpG.Mcat == GCpG.Mcat.unique()[i])
    ax.scatter(GCpG.VMag[mask_MT], GCpG.Sn[mask_MT], **markers[i])
ax.set_title('Fig. 10, Harris et al. (2013)')

# 2. lowess regression
lowess = sm.nonparametric.lowess(GCpG.Sn, GCpG.VMag, frac=0.36)  # lowess(y, X)
f_Sn = interp1d(lowess[:,0],lowess[:,1],bounds_error=False)
ax.plot(GCpG.VMag.sort_values(), f_Sn(GCpG.VMag.sort_values()))
ax.plot(GCpG.VMag.sort_values(), f_Sn(GCpG.VMag.sort_values()),'*', markersize=16, 
        markevery=len(GCpG)-1,label='VMag $\in$ [%.2f,%.2f]' % (GCpG.VMag.max(), GCpG.VMag.min()))
ax.legend(loc=0)

if flag_svfg:
    savefig(path.join(path_fig, 'GCSN.jpeg'), **params_svfg)
LRols = sm.OLS(GCpG.logNgc.values, GCpG.KMag.values).fit() # OLS(y, X)
In [43]:
print(LRols.summary2())
                 Results: Ordinary least squares
==================================================================
Model:              OLS              Adj. R-squared:     0.870    
Dependent Variable: y                AIC:                1030.4039
Date:               2018-06-17 22:10 BIC:                1034.4393
No. Observations:   418              Log-Likelihood:     -514.20  
Df Model:           1                F-statistic:        2794.    
Df Residuals:       417              Prob (F-statistic): 6.21e-187
R-squared:          0.870            Scale:              0.68716  
---------------------------------------------------------------------
       Coef.     Std.Err.       t        P>|t|      [0.025     0.975]
---------------------------------------------------------------------
x1    -0.0980      0.0019    -52.8557    0.0000    -0.1017    -0.0944
------------------------------------------------------------------
Omnibus:               22.487       Durbin-Watson:          1.448 
Prob(Omnibus):         0.000        Jarque-Bera (JB):       19.376
Skew:                  0.453        Prob(JB):               0.000 
Kurtosis:              2.458        Condition No.:          1     
==================================================================

Populating GCs in the local universe (chap 2)

GCpG models (sec 2.2)

Linear models: bothare messy and trustless

GC SN model

  1. [x] the $N_\text{GC}$ per unit of parent galaxy luminosity VMag
    • $S_N \equiv N_\text{GC} \times 10^{0.4(M_V^T + 15)}$
    • $N_\text{GC} = S_N / 10^{0.4(M_V^T + 15)}$
  2. [x] LOWESS regression for $S_N$ vs VMag

  3. [=] error comparison, zoom in view, justificationin errorbar

    • => error suppressed on high luminosity end: a few noisy outliners less-weighted
    • => fair performance on low luminosity end: low influence as the base $M_V$ will be low
    • => well represented in the most range
    • => truncated by VMag range [-11.17, -24.19]: lower-bound
In [45]:
# 3. error comparison, zoomed in
fig = figure(figsize=fig_size)
ax = fig.add_subplot(111)
ax.axis([-10,-25,-2,40])
ax.set_xlabel("VMag")
ax.set_ylabel("GC SN")
axins = zoomed_inset_axes(ax, 3, loc=9)
axins.axis([-18, -20, -0.2, 4.8])
for i in range(len(GCpG.Mcat.unique())):
    mask_MT = (GCpG.Mcat == GCpG.Mcat.unique()[i])
    ax.scatter(GCpG.VMag[mask_MT], GCpG.Sn[mask_MT], alpha=0.8, **markers[i])
    ax.errorbar(GCpG.VMag[mask_MT], GCpG.Sn[mask_MT],
                xerr=GCpG.e_VMag[mask_MT], yerr=GCpG.e_Sn[mask_MT], **fmts[i])
    axins.scatter(GCpG.VMag[mask_MT], GCpG.Sn[mask_MT], alpha=0.8, **markers[i])
    axins.errorbar(GCpG.VMag[mask_MT], GCpG.Sn[mask_MT],
                   xerr=GCpG.e_VMag[mask_MT], yerr=GCpG.e_Sn[mask_MT], **fmts[i])    
ax.legend(GCpG.Mcat.unique())
ax.add_patch(patches.Rectangle((-20,-0.2),2,5,linewidth=1,edgecolor='k',facecolor='none'))
axins.add_patch(patches.Rectangle((-19.99,-0.19),1.98,4.96,linewidth=1,edgecolor='k',facecolor='none'))
ax.set_title('Fig. 10, Harris et al. (2013)')
axins.grid(False)

lowess = sm.nonparametric.lowess(GCpG.Sn, GCpG.VMag, frac=0.36)  # lowess(y, X)
f_Sn = interp1d(lowess[:,0],lowess[:,1],bounds_error=False)
ax.plot(GCpG.VMag.sort_values(), f_Sn(GCpG.VMag.sort_values()))
axins.plot(GCpG.VMag.sort_values(), f_Sn(GCpG.VMag.sort_values()))
_, pp1,pp2 = mark_inset(ax,axins,loc1=1, loc2=1, color='k')
pp1.loc1=1
pp1.loc2=2
pp2.loc1=3
pp2.loc2=4

if flag_svfg:
    savefig(path.join(path_fig, 'GCSN_err.jpeg'), **params_svfg)
In [46]:
LRols = sm.OLS(GCpG.logNgc.values, GCpG.KMag.values).fit() # OLS(y, X)
print(LRols.summary2())
                 Results: Ordinary least squares
==================================================================
Model:              OLS              Adj. R-squared:     0.870    
Dependent Variable: y                AIC:                1030.4039
Date:               2018-06-17 22:10 BIC:                1034.4393
No. Observations:   418              Log-Likelihood:     -514.20  
Df Model:           1                F-statistic:        2794.    
Df Residuals:       417              Prob (F-statistic): 6.21e-187
R-squared:          0.870            Scale:              0.68716  
---------------------------------------------------------------------
       Coef.     Std.Err.       t        P>|t|      [0.025     0.975]
---------------------------------------------------------------------
x1    -0.0980      0.0019    -52.8557    0.0000    -0.1017    -0.0944
------------------------------------------------------------------
Omnibus:               22.487       Durbin-Watson:          1.448 
Prob(Omnibus):         0.000        Jarque-Bera (JB):       19.376
Skew:                  0.453        Prob(JB):               0.000 
Kurtosis:              2.458        Condition No.:          1     
==================================================================

Populating GCs in the local universe (chap 2)

Modeling GC populations (sec 2.3)

  1. [=] bandpass conversion: B/I/K in GWGC from White et al. (2011) to V/K in GCSN model from Harris et al. (2013)

    1. [=] GWGC luminosities, Abs_Mag_B/I(7,877/5,606), App_Mag_B/I/K(7,877/5,606/2,995), and err_Abs/App_Mag_B/I/K:

      • count statistics, present in pandas table
      • completeness, msnoplot, lots of NaNs
    2. explore correlations

    3. fit conversion constant

    4. choose most probable value

  2. revert to Ngc

  3. GC distribution

In [47]:
# 1-A. statistics of GWGC luminosity
display(LoGal.loc[:,LoGal.columns[LoGal.columns.str.contains('Mag')]] \
        .describe().loc[['count','min','max','mean','std'],:].transpose().merge(
            LoGal.loc[:,LoGal.columns[LoGal.columns.str.contains('Mag')]].isna().sum(axis=0).to_frame('NaNs'),
            left_index=True,right_index=True).transpose())
App_Mag_B err_App_Mag_B Abs_Mag_B err_Abs_Mag_B App_Mag_I err_App_Mag_I Abs_Mag_I err_Abs_Mag_I App_Mag_K err_App_Mag_K
count 7877.000000 7877.000000 7877.000000 7877.000000 5606.000000 5606.000000 5606.000000 0.0 2995.000000 2287.00000
min -4.600000 0.040000 -22.590000 0.070000 5.730000 0.030000 -25.520000 NaN 7.271000 0.02000
max 25.260000 3.020000 -4.970000 3.020000 30.440000 6.880000 -1.680000 NaN 15.978000 0.29000
mean 14.981126 0.377188 -16.210734 0.387227 14.158673 0.174930 -17.158776 NaN 12.968526 0.10208
std 2.280469 0.176294 2.451796 0.175116 2.463852 0.317297 2.627928 NaN 1.594678 0.04896
NaNs 1069.000000 1069.000000 1069.000000 1069.000000 3340.000000 3340.000000 3340.000000 8946.0 5951.000000 6659.00000
In [48]:
img = msno.matrix(LoGal.loc[:,LoGal.columns[LoGal.columns.str.contains('Mag')]], inline=False)
if flag_svfg:
    img.savefig(path.join(path_fig, 'GWGC_mags_NaN.jpeg'), **params_svfg)

Populating GCs in the local universe (chap 2)

Modeling GC populations (sec 2.3)

  1. [=] bandpass conversion: B/I/K in GWGC from White et al. (2011) to V/K in GCSN model from Harris et al. (2013)

    1. [x] GWGC luminosities

    2. [=] explore correlations, explain in pairplots with heatmap

    3. fit conversion constant

    4. choose most probable value

  2. revert to Ngc

  3. GC distribution

In [49]:
# 1-B. correlations between bands
if not flag_skip:
    corr = pp_heat(LoGal.loc[:,LoGal.columns[LoGal.columns.str.contains('Mag')]], 0.8, 'pp_GWGC_mags.jpeg')
else:
    ax = figure(figsize=sfig_size).add_subplot(111)
    ax.imshow(imread(path.join(path_fig, 'pp_GWGC_mags.jpeg')))
    ax.grid(False)

Populating GCs in the local universe (chap 2)

Modeling GC populations (sec 2.3)

  1. [=] bandpass conversion: B/I/K in GWGC from White et al. (2011) to V/K in GCSN model from Harris et al. (2013)

    1. [x] GWGC luminosities

    2. [x] explore correlations

    3. [=] fit conversion constant, 197 galaxies in both catalogs, inspect in mag/color scatter

      • [Abs_Mag_B (7,877), err_Abs_Mag_B (7,877)] > [Abs_Mag_I (5,606), err_Abs_Mag_I (0)]: => prefer B over I for Ngc estimation
      • B/I in GWGC (197/148) vs V/K in GCSN=> almost homogenous, prefer B for fit
      • B/I in GWGC (197/148) vs V-K in GCSN=> color/Mcat didn't help (unequal axis)
    4. choose most probable value

  2. revert to Ngc

  3. GC distribution

In [51]:
# 1-C. fit conversion constant, color scatter
idx_GCpG = GCpG.Name.isin(LoGal.Name)
idx_LoGal = LoGal.Name.isin(GCpG.Name)

nullfmt = NullFormatter()  # no labels
left, width = 0.1, 0.65
bottom, height = 0.1, 0.65
bottom_h = left_h = left + width + 0.02
rect_main = [left, bottom, width, height]
rect_histx = [left, bottom_h, width, 0.2]
rect_histy = [left_h, bottom, 0.2, height]

figure(figsize=sfig_size)
ax_histx = axes(rect_histx)
ax_histy = axes(rect_histy)
ax_histx.xaxis.set_major_formatter(nullfmt)
ax_histy.yaxis.set_major_formatter(nullfmt)
ax_main = axes(rect_main)
ax_main.set_xlabel('VMag in GCpG from Harris et al. (2013)')
ax_main.set_ylabel('BMag in GWGC from White et al. (2011)')

ax_main.scatter(GCpG.VMag[idx_GCpG], LoGal.Abs_Mag_B[idx_LoGal],
                c=GCpG.iMType[idx_GCpG], s=40*(2+log2(30/GCpG.Dist[idx_GCpG])), cmap=cmap)
ax_main.errorbar(GCpG.VMag[idx_GCpG], LoGal.Abs_Mag_B[idx_LoGal],
                 xerr=GCpG.e_VMag[idx_GCpG], yerr=LoGal.err_Abs_Mag_B[idx_LoGal], 
                 fmt='+', elinewidth=3, alpha=0.6)

xx, yy = mgrid[ax_main.get_xlim()[0]:ax_main.get_xlim()[1]:100j, 
               ax_main.get_ylim()[0]-0.5:ax_main.get_ylim()[1]+0.5:100j]
positions = vstack([xx.ravel(), yy.ravel()])
values = vstack([GCpG.VMag[idx_GCpG].values, LoGal.Abs_Mag_B[idx_LoGal].values])
kernel = st.gaussian_kde(values)
zz = reshape(kernel(positions).T, xx.shape)
cfset = ax_main.contourf(xx, yy, zz, cmap='Greens', alpha=0.3)  # Contourf plot
cset = ax_main.contour(xx, yy, zz, colors='gray', alpha=0.7)  # Contour plot
ax_main.clabel(cset, inline=1, fontsize=16)
ax_main.legend(['Galaxy in both catalog: %d' % idx_GCpG.sum()], fontsize=16)

ax_main.set_xlim([-23,-16])
ax_main.set_ylim([-22,-15])
ax_histx.set_xlim(ax_main.get_xlim())
ax_histy.set_ylim(ax_main.get_ylim())
binwidth = 0.25
xymax = max([max(fabs(GCpG.VMag[idx_GCpG])), max(fabs(LoGal.Abs_Mag_B[idx_LoGal]))])
lim = (int(xymax/binwidth) + 1) * binwidth
bins = arange(-lim, lim + binwidth, binwidth)
_ = ax_histx.hist(GCpG.VMag[idx_GCpG], bins=bins)
_ = ax_histy.hist(LoGal.Abs_Mag_B[idx_LoGal], bins=bins, orientation='horizontal')

if flag_svfg:
    savefig(path.join(path_fig,'joint_mags.jpeg'), **params_svfg)
In [52]:
# 1-C. fit conversion constant, color scatter
figure(figsize=sfig_size)
ax_histx = axes(rect_histx)
ax_histy = axes(rect_histy)
ax_histx.xaxis.set_major_formatter(nullfmt)
ax_histy.yaxis.set_major_formatter(nullfmt)
ax_main = axes(rect_main)
ax_main.set_xlabel('VMag-KMag in GCpG from Harris et al. (2013)')
ax_main.set_ylabel('BMag in GWGC from White et al. (2011)')

ax_main.scatter(GCpG.VMag[idx_GCpG]-GCpG.KMag[idx_GCpG], LoGal.Abs_Mag_B[idx_LoGal],
                c=GCpG.iMType[idx_GCpG], s=40*(2+log2(30/GCpG.Dist[idx_GCpG])), cmap=cmap)
ax_main.errorbar(GCpG.VMag[idx_GCpG]-GCpG.KMag[idx_GCpG], LoGal.Abs_Mag_B[idx_LoGal],
                 xerr=GCpG.e_VMag[idx_GCpG]-GCpG.e_KMag[idx_GCpG], yerr=LoGal.err_Abs_Mag_B[idx_LoGal],
                 fmt='+', elinewidth=3, alpha=0.6)

xx, yy = mgrid[ax_main.get_xlim()[0]:ax_main.get_xlim()[1]:100j, 
               ax_main.get_ylim()[0]-0.5:ax_main.get_ylim()[1]+0.5:100j]
positions = vstack([xx.ravel(), yy.ravel()])
values = vstack([GCpG.VMag[idx_GCpG].values-GCpG.KMag[idx_GCpG].values,
                 LoGal.Abs_Mag_B[idx_LoGal].values])
kernel = st.gaussian_kde(values)
zz = reshape(kernel(positions).T, xx.shape)
cfset = ax_main.contourf(xx, yy, zz, cmap='Greens', alpha=0.3)  # Contourf plot
cset = ax_main.contour(xx, yy, zz, colors='gray', alpha=0.7)  # Contour plot
ax_main.clabel(cset, inline=1, fontsize=16)
ax_main.legend(['Galaxy in both catalog: %d' % idx_GCpG.sum()], fontsize=16)

ax_main.set_xlim([1,4])
ax_main.set_ylim([-22,-14])
ax_histx.set_xlim(ax_main.get_xlim())
ax_histy.set_ylim(ax_main.get_ylim())
binwidth = 0.25
xymax = max([max(fabs(GCpG.VMag[idx_GCpG])), max(fabs(LoGal.Abs_Mag_B[idx_LoGal]))])
lim = (int(xymax/binwidth) + 1) * binwidth
bins = arange(-lim, lim + binwidth, binwidth)
_ = ax_histx.hist(GCpG.VMag[idx_GCpG]-GCpG.KMag[idx_GCpG], bins=bins)
_ = ax_histy.hist(LoGal.Abs_Mag_B[idx_LoGal], bins=bins, orientation='horizontal')

if flag_svfg:
    savefig(path.join(path_fig,'joint_color.jpeg'), **params_svfg)

Populating GCs in the local universe (chap 2)

Modeling GC populations (sec 2.3)

  1. [=] bandpass conversion: B/I/K in GWGC from White et al. (2011) to V/K in GCSN model from Harris et al. (2013)

    1. [x] GWGC luminosities

    2. [x] explore correlations

    3. [x] fit conversion constant: B/I in GWGC (197/148) vs V/K in GC SN on 197 shared galaxies

    4. [=] choose most probable value, based on kdeplot

      • =>$\mathrm{VMag}_\max = -20.515828946699045$
      • =>$\mathrm{BMag}_\max = -19.627615047946385$
      • =>$\mathrm{IMag}_\max = -21.008287606264027$
  2. revert to Ngc

  3. GC distribution

In [53]:
# 1-D. most probable value from kdeplot
f_joint, (ax1, ax2, ax3) = subplots(3, sharex=True, sharey=True, figsize=sfig_size)
ax1.set_xlim([-25,-10])
p_v = sns.kdeplot(GCpG.VMag[idx_GCpG], shade=True, ax=ax1, color='g')
x,y = p_v.get_lines()[0].get_data()
V_max = x[y.argmax()]
ax1.vlines(V_max, 0, y.max(), linestyle='dashed')
ax1.legend(['$\mathrm{VMag}_{\max}$ = %.2f, from Harris et al. (2013)' % V_max])

p_b = sns.kdeplot(LoGal.Abs_Mag_B[idx_LoGal], shade=True, ax=ax2, color='b')
x,y = p_b.get_lines()[0].get_data()
B_max = x[y.argmax()]
ax2.vlines(B_max, 0, y.max(), linestyle='dashed')
ax2.legend(['$\mathrm{BMag}_{\max}$ = %.2f, from White et al. (2011)' % B_max])

p_i = sns.kdeplot(LoGal.Abs_Mag_I[idx_LoGal], shade=True, ax=ax3, color='r')
x,y = p_i.get_lines()[0].get_data()
I_max = x[y.argmax()]
ax3.vlines(I_max, 0, y.max(), linestyle='dashed')
ax3.legend(['$\mathrm{IMag}_{\max}$ = %.2f, from White et al. (2011)' % I_max])

f_joint.subplots_adjust(hspace=0)

if flag_svfg:
    savefig(path.join(path_fig,'mags_max.jpeg'), **params_svfg)

Populating GCs in the local universe (chap 2)

Modeling GC populations (sec 2.3)

  1. [x] bandpass conversion: B/I/K in GWGC from White et al. (2011) to V/K in GCSN model from Harris et al. (2013)

    1. [x] GWGC luminosities

    2. [x] explore correlations

    3. [x] fit conversion constant: B/I in GWGC (197/148) vs V/K in GC SN on 197 shared galaxies

    4. [x] choose most probable value, based on kdeplot

      • =>$\mathrm{VMag}_\max = -20.515828946699045$
      • =>$\mathrm{BMag}_\max = -19.627615047946385$
      • =>$\mathrm{IMag}_\max = -21.008287606264027$
  2. [=] revert to Ngc:

    • convert BMag/IMag in GWGC to VMag in GCSN: B to V (7,877) + I to V (extra 32) = 7,909/8,946
    • apply $N_\text{GC} = S_N / 10^{0.4(M_V^T + 15)}$
    • result statistics and 5 'Interpolated Galaxy Samples', present in pandas table
      • => lower-bound GC Population is 658,910, over 8,946 galaxies within 30 Mpc
      • => 1,037 (11.59%) GWGC galaxy not in count
      • => 130 (1.45%) extra galaxy not counted due to GC SN truncation
      • => lowess fit is modest
  3. GC distribution

In [54]:
# 2. revert to Ngc
# 2-A. convert BMag/IMag to VMag
LoGal['VMag']=pd.Series(LoGal.Abs_Mag_B + V_max - B_max, index=LoGal.index)
LoGal['e_VMag']=pd.Series(LoGal.err_Abs_Mag_B 
                          - GCpG[idx_GCpG].e_VMag.median()
                          + LoGal[idx_LoGal].err_Abs_Mag_B.median())
_ = LoGal.Abs_Mag_B.isnull() & LoGal.Abs_Mag_I.notnull()
LoGal.VMag[_] = LoGal.Abs_Mag_I[_]  + V_max - I_max

# 2-B. apply GCSN to get Ngc
LoGal['Ngc'] = pd.Series(map(lambda x: 0 if isnan(f_Sn(x))
                             else round(f_Sn(x)/10**(0.4*(x+15))), LoGal.VMag),
                         index=LoGal.index)
LoGal['e_Ngc'] = pd.Series(map(lambda x: LoGal.Ngc[x] if isnan(f_Sn(LoGal.VMag[x]-LoGal.e_VMag[x]))
                               else abs(LoGal.Ngc[x] - (f_Sn(LoGal.VMag[x]-LoGal.e_VMag[x]) / 
                                                    10**(0.4*(LoGal.VMag[x]-LoGal.e_VMag[x]+15)))), 
                               LoGal.index), index=LoGal.index)

# 2-C. statistics of result and 5 'Interpolated Galaxy Samples'
# print('Total GC Population in the Local Universe (30 Mpc) is %d' % LoGal.Ngc.sum())
_ = ['Name', 'RA', 'Dec', 'VMag', 'Dist', 'Abs_Mag_B', 'Abs_Mag_I', 'Ngc','e_Ngc', 'iMType']
display(pd.concat([LoGal.loc[:,_].describe(include='all').loc[['count','min','max','mean','std']], 
                   LoGal.loc[:,_].sample(5,random_state=427)]).loc[:,_])
Name RA Dec VMag Dist Abs_Mag_B Abs_Mag_I Ngc e_Ngc iMType
count 8946 8946.000000 8946.000000 7909.000000 8946.000000 7877.000000 5606.000000 8946.000000 8946.000000 8946.000000
min NaN 0.005940 -89.334590 -23.478214 0.010000 -22.590000 -25.520000 0.000000 0.000000 0.000000
max NaN 23.996680 86.131800 -5.858214 29.986000 -4.970000 -1.680000 11608.000000 7187.308776 6.000000
mean NaN 10.973250 3.455383 -17.094336 19.011495 -16.210734 -17.158776 74.085848 46.053841 0.916387
std NaN 5.277991 34.889882 2.452026 6.616150 2.451796 2.627928 262.963677 190.324592 0.936047
168135 SDSSJ142852.74+123454.0 14.481330 12.581660 NaN 28.958000 NaN NaN 0.000000 0.000000 0.000000
168242 SDSSJ144213.99+333517.4 14.703880 33.588320 NaN 24.528000 NaN NaN 0.000000 0.000000 0.000000
146627 SDSSJ080937.39+413526.6 8.160380 41.590710 -14.168214 11.625000 -13.280000 -12.870000 5.000000 2.037453 0.000000
24031 UGC06251 11.223940 53.595040 -17.718214 18.450000 -16.830000 NaN 25.000000 7.388049 1.000000
171592 SDSSJ151605.06+561614.7 15.268080 56.270820 -16.628214 9.875000 -15.740000 -16.790000 14.000000 3.364954 0.000000

Populating GCs in the local universe (chap 2)

Modeling GC populations (sec 2.3)

  1. [x] bandpass conversion: B/I/K in GWGC from White et al. (2011) to V/K in GCSN model from Harris et al. (2013)

    1. [x] GWGC luminosities

    2. [x] explore correlations

    3. [x] fit conversion constant: B/I in GWGC (197/148) vs V/K in GC SN on 197 shared galaxies

    4. [x] choose most probable value, based on kdeplot

      • =>$\mathrm{VMag}_\max = -20.515828946699045$
      • =>$\mathrm{BMag}_\max = -19.627615047946385$
      • =>$\mathrm{IMag}_\max = -21.008287606264027$
  2. [x] revert to Ngc: 7,909/8,946 with Ngc

    • => lower-bound GC Population is 658,910, over 8,946 galaxies within 30 Mpc
    • => 1,037 (11.59%) GWGC galaxy not in count
    • => 130 (1.45%) extra galaxy not counted due to GC SN truncation
    • => lowess fit is modest
  3. [=] GC distribution

    • present in all-sky plot
In [56]:
# 2-D. all-sky zoom in view
fig = figure(figsize=fig_size)
ax = fig.add_subplot(111, projection="mollweide")
ax.set_xticklabels(['2h','4h','6h','8h','10h','12h','14h','16h','18h','20h','22h'])
ax.grid(True)
vmin = 0.1
host_GC = LoGal.Ngc > 0
# RA: [0h-24h] -> [-pi,pi]; Dec: [-90°,90°] -> [-pi/2,pi/2]
ax.scatter((LoGal.RA[host_GC]-12)/12*pi, LoGal.Dec[host_GC]/180*pi,
           s=5*(2+log2(30./LoGal.Dist[host_GC])),c=LoGal.Ngc[host_GC],alpha=0.5,edgecolors=None,
           norm=LogNorm(vmin=vmin,vmax=LoGal.Ngc.max()),cmap=cmap)
ax.scatter((LoGal.RA[-host_GC]-12)/12*pi, LoGal.Dec[-host_GC]/180*pi,
           s=2*log2(30./LoGal.Dist[-host_GC]),c='k',alpha=0.5,edgecolors=None)
axins = zoomed_inset_axes(ax, 3.2, loc=5, bbox_to_anchor=(1, 0.5), bbox_transform=ax.figure.transFigure)
axins.axis([-0.35, 0.3, -0.1, 0.4])
cax = axins.scatter((LoGal.RA[host_GC]-12)/12*pi, LoGal.Dec[host_GC]/180*pi,
                    s=5*(2+log2(30./LoGal.Dist[host_GC])),c=LoGal.Ngc[host_GC],alpha=0.5,edgecolors=None,
                    norm=LogNorm(vmin=vmin,vmax=LoGal.Ngc.max()),cmap=cmap)
axins.scatter((LoGal.RA[-host_GC]-12)/12*pi,LoGal.Dec[-host_GC]/180*pi,
              s=2*log2(30./LoGal.Dist[-host_GC]),c='k',alpha=0.5,edgecolors=None)
axins.set_xticklabels([])
axins.set_yticklabels([])
cbar = fig.colorbar(cax, orientation='horizontal',fraction=0.07,anchor=(0.5,0.0))

ax.add_patch(patches.Rectangle((-0.35,-0.1),0.65,0.5,linewidth=1,edgecolor='k',facecolor='none'))
axins.add_patch(patches.Rectangle((-0.349,-0.096),0.646,0.495,linewidth=1,edgecolor='k',facecolor='none'))
anno_kws = {'width':1, 'headwidth':1, 'headlength':1, 'shrink':0.05}
ax.annotate('',xy=(2.44,0.80),xytext=(-0.48,0.4),arrowprops=anno_kws)
ax.annotate('',xy=(2.50,-.84),xytext=(-0.48,-0.07),arrowprops=anno_kws)
ax.set_title('Total GC Population in the Local Universe (30 Mpc) is %d' % LoGal.Ngc.sum())

if flag_svfg:
    savefig(path.join(path_fig, 'all-sky.jpeg'), **params_svfg)

Populating GCs in the local universe (chap 2)

GWGC for galaxy abundance (sec 2.1): 8,946 galaxies within 30 Mpc

Harris catalog for GCpG model (sec 2.2.1): linear model can be improved

GCpG models (sec 2.2): a modest GC SN model

Modeling GC populations (sec 2.3): 658,910 GCs in 7,909 out of the galaxies within 30 Mpc

Name RA Dec VMag Dist Abs_Mag_B Abs_Mag_I Ngc e_Ngc iMType
count 8946 8946.000000 8946.000000 0.0 8946.000000 7877.000000 5606.000000 0.0 0.0 8946.000000
min NaN 0.005940 -89.334590 NaN 0.010000 -22.590000 -25.520000 NaN NaN 0.000000
max NaN 23.996680 86.131800 NaN 29.986000 -4.970000 -1.680000 NaN NaN 6.000000
mean NaN 10.973250 3.455383 NaN 19.011495 -16.210734 -17.158776 NaN NaN 0.916387
std NaN 5.277991 34.889882 NaN 6.616150 2.451796 2.627928 NaN NaN 0.936047
168135 SDSSJ142852.74+123454.0 14.481330 12.581660 NaN 28.958000 NaN NaN NaN NaN 0.000000
168242 SDSSJ144213.99+333517.4 14.703880 33.588320 NaN 24.528000 NaN NaN NaN NaN 0.000000
146627 SDSSJ080937.39+413526.6 8.160380 41.590710 NaN 11.625000 -13.280000 -12.870000 NaN NaN 0.000000
24031 UGC06251 11.223940 53.595040 NaN 18.450000 -16.830000 NaN NaN NaN 1.000000
171592 SDSSJ151605.06+561614.7 15.268080 56.270820 NaN 9.875000 -15.740000 -16.790000 NaN NaN 0.000000

Simulating GCs (chap 3 + chap 4)

Modeling GCs

  1. Characteristics (3.1)

    • Monte Carlo method (4.1)
  2. Stellar evolution (3.2)

    • SSE and BSE (4.2.1)
  3. Stellar dynamics (3.3)

    • Fewbody code (4.2.2)
  4. External environment (3.4)

    • Escape (4.2.3)
  5. MOCCA (4.2)

    • Monte Carlo + SSE and BSE + Fewbody code + Escape
In [57]:
ax = figure(figsize=(16,20)).add_subplot(111)
ax.imshow(imread('Fig_D/mont.png'))
ax.grid(False)
_ = ax.axis('off')