Earth Engine 데이터 다루기 with Python

본 문서는 Earth Engine Python API를 사용하여 ee.Image/Feature 등을 Pandas의 DataFrame 또는 Numpy의 Array 형태로 얻는 방법을 담고 있다.

Earth Engine과 Streamlit에서는 ImageCollection으로부터 직접 DataFrame을 생성하였다면, computeFeatures을 사용하여 바로 type을 바꾸는 방법을 담고 있다.

1. Library

필요한 library를 import 하고, earth engine 계정 인증과 초기화를 진행한다.

import altair as alt
import ee
import eerepr
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from mpl_toolkits.axes_grid1 import ImageGrid

# ee.Authenticate()
ee.Initialize()

2. Data

2-1. BASINS

WWF HydroSHEDS Basins Level 6

전세계의 500m 해상도의 수로 정보를 제공한다.

2-2. BOUNDARIES

FAO GAUL: Global Administrative Unit Layers 2015, First-Level Administrative Units

Global Administrative Unit Layers(GAUL, 전세계 행정구역 레이어)

2-3. CLIMATE

WorldClim Climatology V1

최저, 평균, 최고 기온과 강수량에 대한 월간 평균 지구 기후 데이터를 제공한다.

BASINS_ID = 'WWF/HydroSHEDS/v1/Basins/hybas_6'
BOUNDARIES_ID = 'FAO/GAUL/2015/level1'
CLIMATE_ID = 'WORLDCLIM/V1/MONTHLY'

BOUNDARIES_ID에서 대한민국을 필터링하여 대한민국 레이어(ko)를 얻고, ko를 사용하여 BASINS_ID에서 대한민국을 필터링한다.

basins = ee.FeatureCollection(BASINS_ID)
ko = ee.FeatureCollection(BOUNDARIES_ID).filter(
ee.Filter.eq('ADM0_NAME', 'Republic of Korea')
)

ko_basins = basins.filterBounds(ko)

CLIMATE_ID에서 강수량 밴드를 부분 집합화하고 개별 이미지를 단일 이미지로 쌓는다.

precip = ee.ImageCollection(CLIMATE_ID).select('prec')

months = precip.aggregate_array('month').getInfo()

band_names = [f'prec_month_{str(m).zfill(2)}' for m in months]

monthly_precip = ee.Image(precip.toBands().rename(band_names))

# monthly_precip.bandNames()

각 유역(ko_basins)의 평균 월 강수량을 계산한다.

ko_basins = monthly_precip.reduceRegions(
    collection=ko_basins,
    reducer=ee.Reducer.mean(),
    scale=1e3
)

3. FeatueCollection to Pandas’s DataFrame

ee.data.computeFeatures에서 'fileFormat': 'PANDAS_DATAFRAME'`을 사용하여 데이터를 pandas의 DataFrame 형태로 가져온다.

ko_basins_df = ee.data.computeFeatures({
    'expression': ko_basins,
    'fileFormat': 'PANDAS_DATAFRAME'
})

display(type(ko_basins_df))
ko_basins_df.head()

pandas.core.frame.DataFrame

	geo	HYBAS_ID	MAIN_BAS	NEXT_SINK	ORDER	...	prec_month_03	prec_month_04	prec_month_05	prec_month_06	prec_month_07	prec_month_08	prec_month_09	prec_month_10	prec_month_11	prec_month_12
0	{'type': 'Polygon', 'coordinates': [[[127.4993...	4060003940	4060003940	4060003940	1	...	58.028414	99.990045	93.973703	162.829607	263.427185	232.806206	143.810684	52.779612	44.205669	24.069764
1	{'type': 'Polygon', 'coordinates': [[[126.8624...	4060004100	4060004100	4060004100	1	...	62.543293	118.377280	109.540524	204.908318	303.571153	265.380813	154.833211	58.992146	50.466053	32.036306
2	{'type': 'Polygon', 'coordinates': [[[126.4374...	4060004500	4060004500	4060004500	1	...	61.486832	105.254806	101.688776	178.860403	259.184409	229.985039	148.628859	53.997028	52.833427	33.797719
3	{'type': 'Polygon', 'coordinates': [[[126.6916...	4060004680	4060004680	4060004680	1	...	53.852809	93.427826	91.965696	156.287028	303.727885	250.188197	138.769812	50.896502	48.903727	30.928487
4	{'type': 'Polygon', 'coordinates': [[[127.4375...	4060005050	4060005050	4060005050	1	...	51.468401	88.617984	96.304852	150.415489	338.487644	286.903079	168.188267	56.982934	49.862071	29.780648

5 rows × 26 columns

3-1. melt

ko_basins_df = ko_basins_df.melt(
    id_vars=["HYBAS_ID"],
    value_vars=band_names,
    var_name="Month",
    value_name="Precipitation",
)
ko_basins_df.head()

	HYBAS_ID	Month	Precipitation
0	4060003940	prec_month_01	27.502053
1	4060004100	prec_month_01	35.703600
2	4060004500	prec_month_01	36.438101
3	4060004680	prec_month_01	28.868651
4	4060005050	prec_month_01	28.169797

3-2. groupby & bar plot

groupby를 사용하여 각 유역의 평균 연간 강수량 그래프를 그린다.

ko_basins_df.groupby(['HYBAS_ID'])['Precipitation'].sum().plot.bar();

그림1

alt.Chart(ko_basins_df).mark_bar().encode(
    x=alt.X('HYBAS_ID:O'),
    y=alt.Y('Precipitation:Q', title='Precipitation (mm)'),
    color=alt.Color('Month', scale=alt.Scale(scheme='rainbow')),
    tooltip=alt.Tooltip(['HYBAS_ID', 'Precipitation', 'Month'])
).interactive()

4. FeatureCollection to GeoPandas’s GeoDataFrame

ee.data.computeFeatures에서 'fileFormat': 'GEOPANDAS_GEODATAFRAME'`을 사용하여 데이터를 geopandas의 GeoDataFrame 형태로 가져온다. 좌표계는 EPSG:5186로 지정해준다.

ko_basins_gdf = ee.data.computeFeatures({
    'expression': ko_basins,
    'fileFormat': 'GEOPANDAS_GEODATAFRAME'
})
ko_basins_gdf.crs = 'EPSG:5186'

display(type(ko_basins_gdf))
ko_basins_gdf.head()

geopandas.geodataframe.GeoDataFrame

	geometry	HYBAS_ID	MAIN_BAS	NEXT_SINK	ORDER	...	prec_month_03	prec_month_04	prec_month_05	prec_month_06	prec_month_07	prec_month_08	prec_month_09	prec_month_10	prec_month_11	prec_month_12
0	POLYGON ((127.499 35.408, 127.504 35.402, 127....	4060003940	4060003940	4060003940	1	...	58.028414	99.990045	93.973703	162.829607	263.427185	232.806206	143.810684	52.779612	44.205669	24.069764
1	POLYGON ((126.862 35.471, 126.864 35.467, 126....	4060004100	4060004100	4060004100	1	...	62.543293	118.377280	109.540524	204.908318	303.571153	265.380813	154.833211	58.992146	50.466053	32.036306
2	POLYGON ((126.437 35.175, 126.438 35.168, 126....	4060004500	4060004500	4060004500	1	...	61.486832	105.254806	101.688776	178.860403	259.184409	229.985039	148.628859	53.997028	52.833427	33.797719
3	POLYGON ((126.692 36.192, 126.694 36.191, 126....	4060004680	4060004680	4060004680	1	...	53.852809	93.427826	91.965696	156.287028	303.727885	250.188197	138.769812	50.896502	48.903727	30.928487
4	POLYGON ((127.438 38.742, 127.437 38.710, 127....	4060005050	4060005050	4060005050	1	...	51.468401	88.617984	96.304852	150.415489	338.487644	286.903079	168.188267	56.982934	49.862071	29.780648

5 rows × 26 columns

ko_gdf = ee.data.computeFeatures({
    'expression': ko,
    'fileFormat': 'GEOPANDAS_GEODATAFRAME'
})

ko_gdf.crs = 'EPSG:5186'
ko_basins_gdf = ko_basins_gdf.clip(ko_gdf).to_crs(ko_gdf.crs)

4-1. prec_total

각 유역의 12개월 평균 강수량을 합산하여 prec_total열을 만든다.

ko_basins_gdf['prec_total'] = ko_basins_gdf[band_names].sum(axis=1)
ko_basins_gdf.head()

	geometry	COAST	HYBAS_ID	MAIN_BAS	NEXT_SINK	ORDER	...	prec_month_04	prec_month_05	prec_month_06	prec_month_07	prec_month_08	prec_month_09	prec_month_10	prec_month_11	prec_month_12	prec_total
6	MULTIPOLYGON (((126.166 33.282, 126.165 33.283...	1	4060034010	4060034010	4060034010	0	...	142.514948	152.581449	223.035944	239.163145	234.208038	191.590624	67.014047	70.792926	46.237866	1594.097751
9	MULTIPOLYGON (((125.209 34.685, 125.209 34.685...	1	4060004110	4060004110	4060004110	0	...	125.082529	126.071002	215.532278	254.175067	242.603207	157.447390	54.749149	50.128210	28.387179	1404.952484
2	POLYGON ((126.438 35.179, 126.444 35.183, 126....	0	4060004500	4060004500	4060004500	1	...	105.254806	101.688776	178.860403	259.184409	229.985039	148.628859	53.997028	52.833427	33.797719	1307.807837
10	MULTIPOLYGON (((126.010 35.326, 126.001 35.324...	1	4060004510	4060004510	4060004510	0	...	95.282169	91.067951	152.975195	258.768160	220.711349	137.526791	52.846184	54.326684	36.889545	1234.627188
11	MULTIPOLYGON (((126.021 37.065, 126.014 37.063...	1	4060004690	4060004690	4060004690	0	...	90.999584	86.807500	131.694396	301.573913	244.272398	144.170222	50.639889	50.635402	30.537785	1234.594562

5 rows × 27 columns

4-2. 단계구분도

단계구분도를 만들고, 최소 강수량 지역은 빨간색 테두리로, 최대 강수량 지역은 분홍색 테두리로 표시해본다.

# Define the choropleth map.
ax = ko_basins_gdf.plot(
    column='prec_total',
    cmap='viridis_r',
    vmin=ko_basins_gdf['prec_total'].min(),
    vmax=ko_basins_gdf['prec_total'].max(),
    legend=False,
    edgecolor='grey', linewidth=0.5
)

# Highlight the basin with the minimum annual precipitation: subset the geometry
# with the minimum precipitation total and then add it to the basin
# precipitation plot.
min_prec_gdf = ko_basins_gdf.loc[[ko_basins_gdf['prec_total'].idxmin()]]
min_prec_gdf.plot(ax=ax, color='none', edgecolor='red', linewidth=2)

max_prec_gdf = ko_basins_gdf.loc[[ko_basins_gdf['prec_total'].idxmax()]]
max_prec_gdf.plot(ax=ax, color='none', edgecolor='pink', linewidth=2)
# Add axis labels, a colorbar, and rotate x axis ticks.
ax.set_xlabel('Eastings [m]')
ax.set_ylabel('Northings [m]')
colorbar = plt.colorbar(ax.get_children()[0], fraction=0.03)
colorbar.set_label('Precipitation (mm)')
plt.xticks(rotation=45)

plt.show()

그림1

5. Image to Numpy Array

ee.data.computeFeatures에서 'fileFormat': 'NUMPY_NDARRAY'`을 사용하여 데이터를 Numpy 구조 배열로 계산된 이미지 형태로 가져온다.

monthly_precip_korea = monthly_precip.clipToBoundsAndScale(
    geometry=ko_basins, scale=1500
)

monthly_precip_npy = ee.data.computePixels({
    'expression': monthly_precip_korea,
    'fileFormat': 'NUMPY_NDARRAY'
})

monthly_precip_npy.shape

(609, 541)

이 작업은 다중 대역 이미지 데이터에 적합하다. 각 밴드가 열 이름인 배열 테이블로 생각할 수 있다. 또, 각 밴드가 서로 다른 데이터 유형을 가질 수 있다.

5-1. 한 장의 지도

names = monthly_precip_npy.dtype.names
print('field names:', names)

prec_month_10_arr = monthly_precip_npy['prec_month_10']
print('Selected array (band) shape:', prec_month_10_arr.shape)
plt.imshow(prec_month_10_arr, vmin=0, vmax=320);

field names: ('prec_month_01', 'prec_month_02', 'prec_month_03', 'prec_month_04', 'prec_month_05', 'prec_month_06', 'prec_month_07', 'prec_month_08', 'prec_month_09', 'prec_month_10', 'prec_month_11', 'prec_month_12')
Selected array (band) shape: (609, 541)

그림1

5-2. 시계열 강수량 지도

각 달의 평균 강수량을 그려 시계열로 표시해본다.

# Set up the figure and grid.
fig = plt.figure(figsize=(20.0, 20.0))
grid = ImageGrid(
    fig,
    111,
    nrows_ncols=(4, 3),
    axes_pad=0.4,
    cbar_mode="single",
    cbar_location="right",
    cbar_pad=0.4,
    cbar_size="2%",
)

# Display each band to a grid cell.
for ax, name in zip(grid, names):
    ax.imshow(monthly_precip_npy[name], vmin=0, vmax=500)
    ax.set_title(name)

# Add colorbar.
colorbar = plt.colorbar(ax.get_children()[0], cax=grid[0].cax)
colorbar.set_label("Precipitation (mm)")

plt.show()

그림1

6. Soreted Data

6-1. FeatureCollection to DataFrame

속성 값으로 필터링하는 것이 가능하다. 한국의 유역으로 필터링하고, 하천 차수가 1이상인 유역만 DataFrame 또는 GeoDataFrame으로 가져온다. ORDER로 필터링을 하면 16개 중 6개만 남기 때문에 주석 처리하였다.

high_order_ko_basins_df = ee.data.listFeatures({
    'assetId': 'WWF/HydroSHEDS/v1/Basins/hybas_6',
    'region': ko.geometry().getInfo(),
    # 'filter': 'ORDER >= 1',
    'fileFormat': 'PANDAS_DATAFRAME'
})

high_order_ko_basins_df.head()

	geo	HYBAS_ID	MAIN_BAS	NEXT_SINK	ORDER	PFAF_ID	SORT	SUB_AREA	UP_AREA
0	{'type': 'Polygon', 'coordinates': [[[127.4993...	4060003940	4060003940	4060003940	1	424102	317	23296.4	23296.4
1	{'type': 'Polygon', 'coordinates': [[[126.8624...	4060004100	4060004100	4060004100	1	424104	319	4833.0	4833.0
2	{'type': 'Polygon', 'coordinates': [[[126.4374...	4060004500	4060004500	4060004500	1	424106	321	2860.9	2861.4
3	{'type': 'Polygon', 'coordinates': [[[126.6916...	4060004680	4060004680	4060004680	1	424108	323	9776.8	9776.8
4	{'type': 'Polygon', 'coordinates': [[[127.4375...	4060005050	4060005050	4060005050	1	424200	325	25245.3	25245.3

high_order_ko_basins_gdf = ee.data.listFeatures({
    'assetId': 'WWF/HydroSHEDS/v1/Basins/hybas_6',
    'region': ko.geometry().getInfo(),
    # 'filter': 'ORDER >= 1',
    'fileFormat': 'GEOPANDAS_GEODATAFRAME'
})

display(type(ko_basins_gdf))
high_order_ko_basins_gdf.head()

geopandas.geodataframe.GeoDataFrame

	geometry	HYBAS_ID	MAIN_BAS	NEXT_SINK	ORDER	PFAF_ID	SORT	SUB_AREA	UP_AREA
0	POLYGON ((127.49939 35.40779, 127.50356 35.402...	4060003940	4060003940	4060003940	1	424102	317	23296.4	23296.4
1	POLYGON ((126.86250 35.47083, 126.86371 35.466...	4060004100	4060004100	4060004100	1	424104	319	4833.0	4833.0
2	POLYGON ((126.43750 35.17500, 126.43824 35.167...	4060004500	4060004500	4060004500	1	424106	321	2860.9	2861.4
3	POLYGON ((126.69167 36.19166, 126.69412 36.191...	4060004680	4060004680	4060004680	1	424108	323	9776.8	9776.8
4	POLYGON ((127.43750 38.74167, 127.43715 38.710...	4060005050	4060005050	4060005050	1	424200	325	25245.3	25245.3

# Create an initial plot with the high river order watersheds.
ax = high_order_ko_basins_gdf.plot(edgecolor='purple', linewidth=1)

# Overlay the Washington state border for context.
ko_gdf.plot(ax=ax, color='none', edgecolor='black', linewidth=1)

# Set axis labels.
ax.set_xlabel('Eastings [degrees]')
ax.set_ylabel('Northings [degrees]')

plt.show()

그림1

6-2. Image to Numpy Array

ee.Image.clipToBoundsAndScale함수를 사용해 요청 영역과 스케일을 정의할 수 없다. 따라서 grid 매개변수를 사용해서 영역을 지정한다.

SCALE_FACTOR = 5

jan_mean_temp_npy = ee.data.getPixels({
    'assetId': 'WORLDCLIM/V1/MONTHLY/01',
    'fileFormat': 'NUMPY_NDARRAY',
    'grid': {
        'dimensions': {
            'width': 360 * SCALE_FACTOR,
            'height': 180 * SCALE_FACTOR
        },
        'affineTransform': {
            'scaleX': 1 / SCALE_FACTOR,
            'shearX': 0,
            'translateX': -180,
            'shearY': 0,
            'scaleY': -1 / SCALE_FACTOR,
            'translateY': 90
        },
        'crsCode': 'EPSG:4326',
    },
    'bandIds': ['tavg']
})

jan_mean_temp_npy.shape

(900, 1800)

배열에서 -9999보다 작은 값들은 Nan으로 대체한다. 배열의 모든 원소에 0.1을 곱해 값을 스케일링해 실제 온도로 변환한다.

jan_mean_temp_npy = jan_mean_temp_npy['tavg']

jan_mean_temp_npy = np.where(jan_mean_temp_npy < -9999, np.nan, jan_mean_temp_npy)
jan_mean_temp_npy = jan_mean_temp_npy * 0.1

fig = plt.figure(figsize=(10., 10.))
ax = plt.imshow(jan_mean_temp_npy, cmap='coolwarm', vmin=-40, vmax=40)

colorbar = plt.colorbar(ax, fraction=0.0235)
colorbar.set_label('Mean January Temp (°C)')

plt.show()

그림1