Python Tensorflow CNN¶

pooling¶

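In the previous post we built a conv2d layer from filters and applied it.
In a CNN pipeline, the feature maps produced by that layer are then downsampled (pooled).
The two common variants are average pooling and max pooling;

max pooling is used more often.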
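To make the difference between the two variants concrete, here is a minimal NumPy sketch. The helper `pool2d` and the 4*4 `feature_map` are made up for illustration (they are not part of the original post); the sketch assumes a square window and no padding.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Pool a 2-D array with a square window and no padding (illustrative helper)."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=np.float32)
    reduce_fn = np.max if mode == "max" else np.mean
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = reduce_fn(window)
    return out

# Hypothetical 4*4 feature map used only for this example
feature_map = np.array([[1, 2, 0, 1],
                        [6, 4, 3, 2],
                        [0, 1, 5, 7],
                        [2, 3, 8, 4]], dtype=np.float32)

max_pooled = pool2d(feature_map, mode="max")    # keeps the largest value per window
avg_pooled = pool2d(feature_map, mode="mean")   # keeps the mean value per window
print(max_pooled)
print(avg_pooled)
```

Max pooling keeps only the strongest response in each window, while average pooling blends all four values.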

Module setup¶

import numpy as np
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt

Max pooling practice¶

Without padding

# Create a 2*2 test image; the values of the image tensor are shown below

image = tf.constant([[[[1],[2]],
                    [[6],[4]]]], dtype=np.float32)

# Run the MaxPool2D operation
# pool_size 2, strides=1, no padding
pool = keras.layers.MaxPool2D(pool_size=(2,2), strides=1, padding='VALID')(image)
print(pool.shape)
print(pool.numpy())

(1, 1, 1, 1)
[[[[6.]]]]

With padding

# Create a 2*2 test image; the values of the image tensor are shown below

image = tf.constant([[[[1],[2]],
                    [[6],[4]]]], dtype=np.float32)

# Run the MaxPool2D operation
# pool_size 2, strides=1, with padding
pool = keras.layers.MaxPool2D(pool_size=(2,2), strides=1, padding='SAME')(image)
print(pool.shape)
print(pool.numpy())

(1, 2, 2, 1)
[[[[6.]
   [4.]]

  [[6.]
   [4.]]]]
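The two output shapes above follow from TensorFlow's padding rules: with 'VALID' the output length is ceil((n - window + 1) / stride), and with 'SAME' it is ceil(n / stride). A small sketch of that arithmetic follows; the function name `pooled_size` is a hypothetical helper, not a TensorFlow API.

```python
import math

def pooled_size(n, window, stride, padding):
    """Output length along one spatial axis (hypothetical helper for illustration)."""
    if padding == "SAME":
        return math.ceil(n / stride)
    if padding == "VALID":
        return math.ceil((n - window + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

valid_len = pooled_size(2, window=2, stride=1, padding="VALID")  # the 2*2 image without padding
same_len = pooled_size(2, window=2, stride=1, padding="SAME")    # the 2*2 image with padding
print(valid_len, same_len)
```

The same rule applies per axis to convolution layers, which is why a 28*28 input with strides of 2 and 'SAME' padding comes out as 14*14.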

conv -> pooling practice¶

Using the MNIST data

# Load the MNIST data
mnist = keras.datasets.mnist
class_names = ['0','1','2','3','4','5','6','7','8','9']

Data setup¶

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Checking the data¶

print(train_images.shape)
print(train_labels.shape)
print(test_images.shape)
print(test_labels.shape)

(60000, 28, 28)
(60000,)
(10000, 28, 28)
(10000,)

print(train_images[0].shape)
print(train_images)

(28, 28)
[[[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 ...

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]]

img = train_images[0]
plt.imshow(img, cmap='gray')
plt.show()

Data normalization¶

Normalize the tensor values to the range 0 to 1.

train_images = train_images.astype(np.float32) / 255.
test_images = test_images.astype(np.float32) / 255.

Checking the data¶

The image looks the same after normalization. Only one image from train_images is used from here on.

img = train_images[0]
plt.imshow(img, cmap='gray')
plt.show()

img.shape

(28, 28)

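conv2d layer operation¶

Convert the image to 4 dimensions before working with it.

# Convert to 4 dimensions
# The tensor must be 4-D: batch size, height, width, channels
# Passing -1 as the batch size lets reshape infer it automatically.
# Only one image is used here, so it becomes 1
# There is a single (grayscale) color channel, so channels is 1
img = img.reshape(-1,28,28,1)
print("image.shape, 배치size, 세로, 가로, 채널", img.shape)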
img = tf.convert_to_tensor(img)


# Initialize the filter weights with random values
weight_init = keras.initializers.RandomNormal(stddev=0.01)


# 5 filters, kernel (filter) size 3*3, strides 2*2, with padding
conv2d = keras.layers.Conv2D(filters=5, kernel_size=3, strides=(2, 2), padding='SAME', 
                             kernel_initializer=weight_init)(img)


print("conv2d.shape , 배치size, 세로, 가로, 채널 ",conv2d.shape)
feature_maps = np.swapaxes(conv2d, 0, 3)
for i, feature_map in enumerate(feature_maps):
    plt.subplot(1,5,i+1), plt.imshow(feature_map.reshape(14,14), cmap='gray')
plt.show()

image.shape, 배치size, 세로, 가로, 채널 (1, 28, 28, 1)
conv2d.shape , 배치size, 세로, 가로, 채널  (1, 14, 14, 5)

Pooling layer operation¶

# pooling size 2*2
# strides 2
# with padding
# input: conv2d
pool = keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2), padding='SAME')(conv2d)
print("pool.shape , 배치size, 세로, 가로, 채널", pool.shape)

feature_maps = np.swapaxes(pool, 0, 3)
for i, feature_map in enumerate(feature_maps):
    plt.subplot(1,5,i+1), plt.imshow(feature_map.reshape(7, 7), cmap='gray')
plt.show()

pool.shape , 배치size, 세로, 가로, 채널 (1, 7, 7, 5)

The fully connected layer comes next.

The pooled output is flattened, passed through that layer to produce the output values, and then goes through an operation such as softmax.
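The flatten, fully connected, and softmax steps described above can be sketched in NumPy as follows. This is only an illustration under stated assumptions: `pooled` is random stand-in data with the same shape as the pooling output, and the weights `W`, `b` and the 10-class output size are made up for the example rather than taken from a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pooling output of shape (batch, height, width, channels)
pooled = rng.normal(size=(1, 7, 7, 5)).astype(np.float32)

# Flatten everything except the batch axis: (1, 7*7*5) = (1, 245)
flat = pooled.reshape(pooled.shape[0], -1)

# Hypothetical fully connected layer mapping 245 features to 10 class scores
W = rng.normal(scale=0.01, size=(flat.shape[1], 10)).astype(np.float32)
b = np.zeros(10, dtype=np.float32)
logits = flat @ W + b

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

probs = softmax(logits)
print(flat.shape, probs.shape, probs.sum())
```

The softmax output sums to 1 per image, so each entry can be read as a class probability.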

This post was written with reference to the 모두를 위한 딥러닝 (Deep Learning for Everyone) lectures and the boostcourse lectures.

CNN basic principles practice¶

A hands-on walkthrough of CNN code in Python TensorFlow.
CNN: Convolutional Neural Network
It is well suited to finding features in images and classifying them.

Module setup¶

import numpy as np
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt

Preparing the image for practice¶

# Prepare a 4-D tensor
image= tf.constant([[[[1],[2],[3]],
                    [[4],[5],[6]],
                    [[7],[8],[9]]]],dtype = np.float32)
# batch size: number of images
# channels: color channels (1 because the image is grayscale only)
print("배치size, 세로, 가로, 채널 수",image.shape)
plt.imshow(image.numpy().reshape(3,3), cmap='Greys')
plt.show()

배치size, 세로, 가로, 채널 수 (1, 3, 3, 1)

Filter and conv layer setup, without padding¶

print("image.shape, 배치size, 세로, 가로, 채널", image.shape)

# Weight to use as the filter
# A 2*2 filter whose values are all 1
# height, width, channels, number of filters
weight = np.array([[[[1.]],[[1.]]],
                   [[[1.]],[[1.]]]])
print("weight.shape, 세로, 가로, 채널, 갯수", weight.shape)
# Initializer setup
weight_init = tf.constant_initializer(weight)

# conv layer setup
# 1 filter, kernel (filter) size 2*2 -> 2, no padding (valid), stride = 1 (default)

conv2d = keras.layers.Conv2D(filters=1, kernel_size=2, padding='VALID', 
                             kernel_initializer=weight_init)(image)
# batch size, height, width, channels
print("conv2d.shape , 배치size, 세로, 가로, 채널 ", conv2d.shape)
print(conv2d.numpy().reshape(2,2))
plt.imshow(conv2d.numpy().reshape(2,2), cmap='gray')
plt.show()

# No padding was used, so the height and width shrank after the conv2d operation.

image.shape, 배치size, 세로, 가로, 채널 (1, 3, 3, 1)
weight.shape, 세로, 가로, 채널, 갯수 (2, 2, 1, 1)
conv2d.shape , 배치size, 세로, 가로, 채널  (1, 2, 2, 1)
[[12. 16.]
 [24. 28.]]

With padding¶

print("image.shape, 배치size, 세로, 가로, 채널", image.shape)

# Weight to use as the filter
# A 2*2 filter whose values are all 1
# height, width, channels, number of filters
weight = np.array([[[[1.]],[[1.]]],
                   [[[1.]],[[1.]]]])
print("weight.shape, 세로, 가로, 채널, 갯수", weight.shape)
# Initializer setup
weight_init = tf.constant_initializer(weight)

# conv layer setup
# 1 filter, kernel (filter) size 2*2 -> 2, with padding (same), stride = 1 (default)

conv2d = keras.layers.Conv2D(filters=1, kernel_size=2, padding='SAME', 
                             kernel_initializer=weight_init)(image)
# batch size, height, width, channels
print("conv2d.shape , 배치size, 세로, 가로, 채널 ", conv2d.shape)
print(conv2d.numpy().reshape(3,3))
plt.imshow(conv2d.numpy().reshape(3,3), cmap='gray')
plt.show()

# 'SAME' padding was used, so the height and width are unchanged after the conv2d operation.

image.shape, 배치size, 세로, 가로, 채널 (1, 3, 3, 1)
weight.shape, 세로, 가로, 채널, 갯수 (2, 2, 1, 1)
conv2d.shape , 배치size, 세로, 가로, 채널  (1, 3, 3, 1)
[[12. 16.  9.]
 [24. 28. 15.]
 [15. 17.  9.]]
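The numbers printed for the 3*3 image can be reproduced by hand. The sketch below is a plain NumPy re-implementation for illustration only, not how Keras computes it internally; `conv2d_single` is a hypothetical helper that assumes stride 1 and a single channel. For 'SAME' with a 2*2 kernel, TensorFlow places the one extra row and column of zeros at the bottom and right, which the helper mirrors.

```python
import numpy as np

def conv2d_single(image, kernel, padding="VALID"):
    """Stride-1 sliding-window sum of products for one 2-D image and one 2-D kernel."""
    kh, kw = kernel.shape
    if padding == "SAME":
        pad_h, pad_w = kh - 1, kw - 1
        # When the total padding is odd, the extra zero row/column goes at the bottom/right
        image = np.pad(image, ((pad_h // 2, pad_h - pad_h // 2),
                               (pad_w // 2, pad_w - pad_w // 2)))
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image_2d = np.arange(1, 10, dtype=np.float32).reshape(3, 3)   # the values 1..9
ones_kernel = np.ones((2, 2), dtype=np.float32)               # the all-ones 2*2 filter

valid_out = conv2d_single(image_2d, ones_kernel, "VALID")
same_out = conv2d_single(image_2d, ones_kernel, "SAME")
print(valid_out)
print(same_out)
```

The top-left entry, for instance, is 1 + 2 + 4 + 5 = 12, and the bottom-right entry of the padded result is just 9 because the rest of its window is zero padding.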

Using multiple filters¶

print("image.shape, 배치size, 세로, 가로, 채널", image.shape)


# All filters are 2*2
# Filter 1 is filled with 1
# Filter 2 is filled with 10
# Filter 3 is filled with -1


weight = np.array([[[[1.,10.,-1.]],[[1.,10.,-1.]]],
                   [[[1.,10.,-1.]],[[1.,10.,-1.]]]])
print("weight.shape, 세로, 가로, 채널, 갯수", weight.shape)

weight_init = tf.constant_initializer(weight)
conv2d = keras.layers.Conv2D(filters=3, kernel_size=2, padding='SAME',
                             kernel_initializer=weight_init)(image)
print("conv2d.shape , 배치size, 세로, 가로, 채널 ", conv2d.shape)
feature_maps = np.swapaxes(conv2d, 0, 3)
for i, feature_map in enumerate(feature_maps):
    print(feature_map.reshape(3,3))
    plt.subplot(1,3,i+1), plt.imshow(feature_map.reshape(3,3), cmap='gray')

plt.show()

image.shape, 배치size, 세로, 가로, 채널 (1, 3, 3, 1)
weight.shape, 세로, 가로, 채널, 갯수 (2, 2, 1, 3)
conv2d.shape , 배치size, 세로, 가로, 채널  (1, 3, 3, 3)
[[12. 16.  9.]
 [24. 28. 15.]
 [15. 17.  9.]]
[[120. 160.  90.]
 [240. 280. 150.]
 [150. 170.  90.]]
[[-12. -16.  -9.]
 [-24. -28. -15.]
 [-15. -17.  -9.]]

This post was written with reference to the 모두를 위한 딥러닝 (Deep Learning for Everyone) lectures and boostcourse.

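Python big-data processing practice with real-estate transaction data (preprocessing, visualization)¶

Hello. This post is a hands-on exercise in working with big data in Python.
It feels a little awkward to call it big data, since the dataset is small and already looks tidy,
but let's get started.

Setting up the required modules¶

First, download the data as a csv file from KOSIS (국가통계포털).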

import numpy as np
from pandas import read_csv
import pandas as pd
from pandas import DataFrame
from pandas import merge
from pandas import ExcelWriter
from matplotlib import pyplot

Checking the file¶

.head() shows the first 5 rows of a DataFrame.

# Read the file and check it
시도별거래량csv = read_csv(r"/Users/donut/tstory/행정구역별_주택매매거래현황_20200812163714.csv", encoding="euc-kr")
시도별거래량csv.head()

시도별거래량csv.shape

(281, 32)

Checking the DataFrame for missing values¶

결측치확인 = 시도별거래량csv.isna()
결측치합계=결측치확인.sum()
결측치합계

행정구역별(1)    0
행정구역별(2)    0
2018. 01    0
2018. 02    0
2018. 03    0
2018. 04    0
2018. 05    0
2018. 06    0
2018. 07    0
2018. 08    0
2018. 09    0
2018. 10    0
2018. 11    0
2018. 12    0
2019. 01    0
2019. 02    0
2019. 03    0
2019. 04    0
2019. 05    0
2019. 06    0
2019. 07    0
2019. 08    0
2019. 09    0
2019. 10    0
2019. 11    0
2019. 12    0
2020. 01    0
2020. 02    0
2020. 03    0
2020. 04    0
2020. 05    0
2020. 06    0
dtype: int64

Data preprocessing¶

Copying the DataFrame¶

시도별거래량DF = 시도별거래량csv.copy()

Checking column names¶

# How to check the column names
시도별거래량col = list(시도별거래량DF.columns)
print(시도별거래량col)

['행정구역별(1)', '행정구역별(2)', '2018. 01', '2018. 02', '2018. 03', '2018. 04', '2018. 05', '2018. 06', '2018. 07', '2018. 08', '2018. 09', '2018. 10', '2018. 11', '2018. 12', '2019. 01', '2019. 02', '2019. 03', '2019. 04', '2019. 05', '2019. 06', '2019. 07', '2019. 08', '2019. 09', '2019. 10', '2019. 11', '2019. 12', '2020. 01', '2020. 02', '2020. 03', '2020. 04', '2020. 05', '2020. 06']

Preparing the new column names¶

Remove the spaces from each column name in the list with .replace(" ","")

시도별거래량ncol = []
for i in range(len(시도별거래량col)):
    if i==0:
        시도별거래량ncol.append("시도")
    elif i==1:
        시도별거래량ncol.append("구군")    
    else:
        변환=시도별거래량col[i].replace(" ","")
        시도별거래량ncol.append(변환)
시도별거래량ncol

['시도',
 '구군',
 '2018.01',
 '2018.02',
 '2018.03',
 '2018.04',
 '2018.05',
 '2018.06',
 '2018.07',
 '2018.08',
 '2018.09',
 '2018.10',
 '2018.11',
 '2018.12',
 '2019.01',
 '2019.02',
 '2019.03',
 '2019.04',
 '2019.05',
 '2019.06',
 '2019.07',
 '2019.08',
 '2019.09',
 '2019.10',
 '2019.11',
 '2019.12',
 '2020.01',
 '2020.02',
 '2020.03',
 '2020.04',
 '2020.05',
 '2020.06']

Building the column-name mapping¶

To rename the columns, pair each existing column name with its new name in a dict.

colDict = {}
for i, v in enumerate(시도별거래량ncol):
    # Take the i-th original column name and map it to the new name v
    before = 시도별거래량col[i]
    colDict[before] = v
    
colDict

{'행정구역별(1)': '시도',
 '행정구역별(2)': '구군',
 '2018. 01': '2018.01',
 '2018. 02': '2018.02',
 '2018. 03': '2018.03',
 '2018. 04': '2018.04',
 '2018. 05': '2018.05',
 '2018. 06': '2018.06',
 '2018. 07': '2018.07',
 '2018. 08': '2018.08',
 '2018. 09': '2018.09',
 '2018. 10': '2018.10',
 '2018. 11': '2018.11',
 '2018. 12': '2018.12',
 '2019. 01': '2019.01',
 '2019. 02': '2019.02',
 '2019. 03': '2019.03',
 '2019. 04': '2019.04',
 '2019. 05': '2019.05',
 '2019. 06': '2019.06',
 '2019. 07': '2019.07',
 '2019. 08': '2019.08',
 '2019. 09': '2019.09',
 '2019. 10': '2019.10',
 '2019. 11': '2019.11',
 '2019. 12': '2019.12',
 '2020. 01': '2020.01',
 '2020. 02': '2020.02',
 '2020. 03': '2020.03',
 '2020. 04': '2020.04',
 '2020. 05': '2020.05',
 '2020. 06': '2020.06'}

Renaming the columns¶

Rename DataFrame columns in pandas with .rename()

시도별거래량전처리 = 시도별거래량DF.rename(columns = colDict)
시도별거래량전처리.head()
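The two loops that build the new names and the mapping dict can also be collapsed into a single dict comprehension. The sketch below is an alternative formulation, not the approach used in this post; `old_cols` is a shortened stand-in for the real column list and `special` is a hypothetical name for the two hand-picked replacements.

```python
# Shortened stand-in for the real column list
old_cols = ['행정구역별(1)', '행정구역별(2)', '2018. 01', '2018. 02']

# The two columns that get entirely new names
special = {'행정구역별(1)': '시도', '행정구역별(2)': '구군'}

# Every other column just has its spaces removed
col_map = {col: special.get(col, col.replace(" ", "")) for col in old_cols}
print(col_map)
```

The resulting dict has the same shape as colDict and could be passed to .rename(columns=...) in the same way.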

Dropping a row by index¶

Drop the row at position 0 of the DataFrame

시도별거래량DF = 시도별거래량전처리.drop(시도별거래량전처리.index[0])
시도별거래량DF.head()

Setting the index¶

Clean up the column that will become the index first, then set it as the index.

old시도 =list(set(시도별거래량DF['시도']))
old시도

['충청남도',
 '경상남도',
 '강원도',
 '전라남도',
 '인천광역시',
 '전라북도',
 '대구광역시',
 '전국',
 '광주광역시',
 '충청북도',
 '제주특별자치도',
 '부산광역시',
 '울산광역시',
 '세종특별자치시',
 '경상북도',
 '서울특별시',
 '경기도',
 '대전광역시']

# Any equivalent branching construct would work just as well here.
new시도 = []
for i in range(len(old시도)):
    if len(old시도[i]) == 4:
        new시도.append(old시도[i][::2])
        
    elif len(old시도[i]) == 3:
        new시도.append(old시도[i][:2])  
    
    elif len(old시도[i]) >4:
        new시도.append(old시도[i][:2])
        
    else:
        new시도.append(old시도[i])
new시도

['충남',
 '경남',
 '강원',
 '전남',
 '인천',
 '전북',
 '대구',
 '전국',
 '광주',
 '충북',
 '제주',
 '부산',
 '울산',
 '세종',
 '경북',
 '서울',
 '경기',
 '대전']
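The length-based rule in the loop above can be wrapped in a small function so it is easy to test on individual names. This is a sketch with the same behavior as the branches above; the function name `abbreviate` is made up for illustration.

```python
def abbreviate(name):
    """Shorten a province/city name using the same length-based rule as the loop above."""
    if len(name) == 4:
        # Four-character names keep characters 0 and 2, e.g. 충청남도 -> 충남
        return name[::2]
    if len(name) >= 3:
        # Three-character names and names longer than four keep the first two characters
        return name[:2]
    # Two-character names such as 전국 are left unchanged
    return name

samples = ['충청남도', '강원도', '서울특별시', '제주특별자치도', '전국']
shortened = [abbreviate(s) for s in samples]
print(shortened)
```

Because the rule depends only on string length, it fits this particular list of names; an explicit lookup dict would be a safer choice if the set of names could change.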

시도DF = DataFrame(old시도)
시도DF['1'] = new시도
시도DF

시도DF.rename(columns = {0:'시도2', '1': 'sido'}, inplace=True)
시도DF

Joining the new 시도DF onto the existing DataFrame¶

original_df.join(other_df.set_index('key column of other_df')['column to attach'], on='key column of original_df')

# The DataFrame name is getting long, so shorten it to df.
df=시도별거래량DF.join(시도DF.set_index('시도2')['sido'],on='시도')
df.head()
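To see what the join pattern does on a small scale, here is a sketch with toy data. The frames `sales` and `lookup` are hypothetical stand-ins for 시도별거래량DF and 시도DF; the `Series.map` line is an alternative shown for comparison, not something the post uses.

```python
import pandas as pd

# Toy stand-ins for the transaction table and the name lookup table
sales = pd.DataFrame({"시도": ["서울특별시", "서울특별시", "강원도"],
                      "구군": ["소계", "종로구", "소계"]})
lookup = pd.DataFrame({"시도2": ["서울특별시", "강원도"],
                       "sido": ["서울", "강원"]})

# Same pattern as above: index the lookup by its key, pick one column, join on the key
joined = sales.join(lookup.set_index("시도2")["sido"], on="시도")

# Equivalent result for a single column using Series.map
mapped = sales.assign(sido=sales["시도"].map(lookup.set_index("시도2")["sido"]))
print(joined)
```

Both approaches look up each row's 시도 value in the lookup table and attach the matching short name.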

Changing the index¶

Replace the index with the values of the sido column

현재인덱스 = list(df.index)
시도리스트 = list(df['sido'])

indexDict ={}
for i, v in enumerate(시도리스트):
    before = 현재인덱스[i]
    indexDict[before] = v
indexDict

{1: '전국',
 2: '서울',
 3: '서울',
 4: '서울',
 5: '서울',
 6: '서울',
 7: '서울',
 8: '서울',
 9: '서울',
 10: '서울',
 11: '서울',
 12: '서울',
 13: '서울',
 14: '서울',
 15: '서울',
 16: '서울',
 17: '서울',
 18: '서울',
 19: '서울',
 20: '서울',
 21: '서울',
 22: '서울',
 23: '서울',
 24: '서울',
 25: '서울',
 26: '서울',
 27: '서울',
 28: '부산',
 29: '부산',
 30: '부산',
 31: '부산',
 32: '부산',
 33: '부산',
 34: '부산',
 35: '부산',
 36: '부산',
 37: '부산',
 38: '부산',
 39: '부산',
 40: '부산',
 41: '부산',
 42: '부산',
 43: '부산',
 44: '부산',
 45: '대구',
 46: '대구',
 47: '대구',
 48: '대구',
 49: '대구',
 50: '대구',
 51: '대구',
 52: '대구',
 53: '대구',
 54: '인천',
 55: '인천',
 56: '인천',
 57: '인천',
 58: '인천',
 59: '인천',
 60: '인천',
 61: '인천',
 62: '인천',
 63: '인천',
 64: '인천',
 65: '인천',
 66: '광주',
 67: '광주',
 68: '광주',
 69: '광주',
 70: '광주',
 71: '광주',
 72: '대전',
 73: '대전',
 74: '대전',
 75: '대전',
 76: '대전',
 77: '대전',
 78: '울산',
 79: '울산',
 80: '울산',
 81: '울산',
 82: '울산',
 83: '울산',
 84: '세종',
 85: '세종',
 86: '경기',
 87: '경기',
 88: '경기',
 89: '경기',
 90: '경기',
 91: '경기',
 92: '경기',
 93: '경기',
 94: '경기',
 95: '경기',
 96: '경기',
 97: '경기',
 98: '경기',
 99: '경기',
 100: '경기',
 101: '경기',
 102: '경기',
 103: '경기',
 104: '경기',
 105: '경기',
 106: '경기',
 107: '경기',
 108: '경기',
 109: '경기',
 110: '경기',
 111: '경기',
 112: '경기',
 113: '경기',
 114: '경기',
 115: '경기',
 116: '경기',
 117: '경기',
 118: '경기',
 119: '경기',
 120: '경기',
 121: '경기',
 122: '경기',
 123: '경기',
 124: '경기',
 125: '경기',
 126: '경기',
 127: '경기',
 128: '경기',
 129: '경기',
 130: '경기',
 131: '경기',
 132: '경기',
 133: '경기',
 134: '경기',
 135: '강원',
 136: '강원',
 137: '강원',
 138: '강원',
 139: '강원',
 140: '강원',
 141: '강원',
 142: '강원',
 143: '강원',
 144: '강원',
 145: '강원',
 146: '강원',
 147: '강원',
 148: '강원',
 149: '강원',
 150: '강원',
 151: '강원',
 152: '강원',
 153: '강원',
 154: '충북',
 155: '충북',
 156: '충북',
 157: '충북',
 158: '충북',
 159: '충북',
 160: '충북',
 161: '충북',
 162: '충북',
 163: '충북',
 164: '충북',
 165: '충북',
 166: '충북',
 167: '충북',
 168: '충북',
 169: '충북',
 170: '충남',
 171: '충남',
 172: '충남',
 173: '충남',
 174: '충남',
 175: '충남',
 176: '충남',
 177: '충남',
 178: '충남',
 179: '충남',
 180: '충남',
 181: '충남',
 182: '충남',
 183: '충남',
 184: '충남',
 185: '충남',
 186: '충남',
 187: '충남',
 188: '전북',
 189: '전북',
 190: '전북',
 191: '전북',
 192: '전북',
 193: '전북',
 194: '전북',
 195: '전북',
 196: '전북',
 197: '전북',
 198: '전북',
 199: '전북',
 200: '전북',
 201: '전북',
 202: '전북',
 203: '전북',
 204: '전북',
 205: '전남',
 206: '전남',
 207: '전남',
 208: '전남',
 209: '전남',
 210: '전남',
 211: '전남',
 212: '전남',
 213: '전남',
 214: '전남',
 215: '전남',
 216: '전남',
 217: '전남',
 218: '전남',
 219: '전남',
 220: '전남',
 221: '전남',
 222: '전남',
 223: '전남',
 224: '전남',
 225: '전남',
 226: '전남',
 227: '전남',
 228: '경북',
 229: '경북',
 230: '경북',
 231: '경북',
 232: '경북',
 233: '경북',
 234: '경북',
 235: '경북',
 236: '경북',
 237: '경북',
 238: '경북',
 239: '경북',
 240: '경북',
 241: '경북',
 242: '경북',
 243: '경북',
 244: '경북',
 245: '경북',
 246: '경북',
 247: '경북',
 248: '경북',
 249: '경북',
 250: '경북',
 251: '경북',
 252: '경북',
 253: '경북',
 254: '경남',
 255: '경남',
 256: '경남',
 257: '경남',
 258: '경남',
 259: '경남',
 260: '경남',
 261: '경남',
 262: '경남',
 263: '경남',
 264: '경남',
 265: '경남',
 266: '경남',
 267: '경남',
 268: '경남',
 269: '경남',
 270: '경남',
 271: '경남',
 272: '경남',
 273: '경남',
 274: '경남',
 275: '경남',
 276: '경남',
 277: '경남',
 278: '제주',
 279: '제주',
 280: '제주'}

df.rename(index=indexDict, inplace=True)
df.drop(['시도','sido'], axis=1, inplace=True)
df

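Checking the data values¶

Check the type of an arbitrary numeric value.
In this dataset the numbers turn out to be stored as object (str), not int,

so they need to be converted to int.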

print(df.dtypes.head())
print(df['2019.02'][2])
print(type(df['2019.02'][2]))

구군         object
2018.01    object
2018.02    object
2018.03    object
2018.04    object
dtype: object
88
<class 'str'>

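Converting strings to numbers (str->int)¶

The '구군' column of df holds text,
so only the numeric columns should be converted to numbers.
Split the DataFrame with filter, convert, then put it back together.

df문자 = df.filter(df.columns[0:1])
df문자

# Note: df.columns[2:] starts at '2018.02', so the '2018.01' column is left out
# of the rest of this walkthrough; df.columns[1:] would keep it.
df숫자 = df.filter(df.columns[2:])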
df숫자.head()

Converting DataFrame strings to numbers¶

Running df숫자.apply(pd.to_numeric) raises an error because of the "-" strings in the data.
Replace "-" with "0", then run df숫자.apply(pd.to_numeric) again.

df숫자.replace("-","0",inplace =True)
df숫자2=df숫자.apply(pd.to_numeric)
df숫자2.dtypes.head()

2018.02    int64
2018.03    int64
2018.04    int64
2018.05    int64
2018.06    int64
dtype: object
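The effect of the "-" placeholder can be reproduced on a tiny frame. The sketch below uses made-up values in `raw`; it shows the replace-then-convert approach from this post next to `errors="coerce"`, an alternative that turns unparseable cells into NaN instead of 0. Which one is appropriate depends on whether "-" really means zero transactions, which this post assumes.

```python
import pandas as pd

# Made-up sample with the same "-" placeholder as the real data
raw = pd.DataFrame({"2018.02": ["88", "-", "120"],
                    "2018.03": ["5", "17", "-"]})

# Approach used in this post: treat "-" as zero, then convert
as_zero = raw.replace("-", "0").apply(pd.to_numeric)

# Alternative: keep "-" as missing data (NaN) while converting
as_nan = raw.apply(pd.to_numeric, errors="coerce")

print(as_zero.dtypes)
print(as_nan)
```

With the first approach the columns stay integer typed; with the second they become float because NaN has no integer representation in a plain int64 column.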

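Putting the DataFrames back together¶

Now recombine the DataFrames that were split for the string-to-number conversion.
concat failed to run for a reason I could not identify, so a different approach is used.
The text column is added back and the columns are reordered as shown below.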

df숫자2['구군']=list(df문자['구군'])
cols = df숫자2.columns.tolist()
newcols = cols[-1:] + cols[:-1]

df완성=df숫자2[newcols]
df완성.head()

Transposing the DataFrame¶

Swap the rows and columns of the DataFrame

소계df = df완성.query('구군 == "소계"')
전치 = 소계df.T
전치.drop(전치.index[0],inplace = True)

전치.head()

# Korean font and figure size settings ('AppleGothic' is a macOS font; pick another Korean font on other systems)
pyplot.rcParams["font.family"] = 'AppleGothic'
pyplot.rcParams["font.size"] = 16
pyplot.rcParams["figure.figsize"] = (20, 10)

전치.plot()
pyplot.grid()
pyplot.legend(bbox_to_anchor=(1, 0.9))
pyplot.title("월별 시도별 부동산 거래량")
pyplot.xlabel("월")
pyplot.ylabel("거래량")
pyplot.show()


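Saving the result to a file¶

# The with block writes and closes the file on exit, so no explicit save call is needed
with ExcelWriter('부동산내역.xlsx') as writer:
    전치.to_excel(writer, sheet_name='부동산거래량원장')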

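Wrap-up¶

The data was tidied up lightly as preprocessing practice.
It was enough to see roughly when real-estate transaction volume dropped and when it surged again.

I plan to process this data more thoroughly in a later post.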

DNN(Deep Neural Network)_XOR¶

This post is part 3 of the XOR implementation series.
Earlier posts worked through XOR with one classification function and with three.
This post works through it with four.

Module imports¶

import numpy as np
import tensorflow as tf
tf.random.set_seed(0)
print(tf.__version__)

2.1.0

XOR data declaration¶

x = [[0, 0],
    [0, 1],
    [1, 0],
    [1, 1]]
y = [[0],
    [1],
    [1],
    [0]]

Declaring the training dataset¶

# Prepare the dataset to train on
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(len(x))

# Check the dataset
[i for i in dataset]

[(<tf.Tensor: shape=(4, 2), dtype=int32, numpy=
  array([[0, 0],
         [0, 1],
         [1, 0],
         [1, 1]], dtype=int32)>,
  <tf.Tensor: shape=(4, 1), dtype=int32, numpy=
  array([[0],
         [1],
         [1],
         [0]], dtype=int32)>)]

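Preprocessing function¶

# Cast the data to float.
# tf.cast shows up in two ways in these posts:
# casting a condition (a boolean) returns 1 or 0 depending on whether it is true or false;
# casting a float to an integer type discards the fractional part.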
def preprocess_data(features, labels):
    features = tf.cast(features, tf.float32)
    labels = tf.cast(labels, tf.float32)
    return features, labels

preprocess_data(x,y)

(<tf.Tensor: shape=(4, 2), dtype=float32, numpy=
 array([[0., 0.],
        [0., 1.],
        [1., 0.],
        [1., 1.]], dtype=float32)>,
 <tf.Tensor: shape=(4, 1), dtype=float32, numpy=
 array([[0.],
        [1.],
        [1.],
        [0.]], dtype=float32)>)

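Declaring W and b¶

# Create four pairs of W and b for the four classification layers.
# The dimensions must be chosen so that the matrix products are valid.
# ex) (2,1) * (1,1) = (2,1)
#     (3,2) * (2,2) = (3,2)
# The XOR input is (4,2), so it must be multiplied by a (2,?) matrix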
W1 = tf.Variable(tf.random.normal((2, 10)), name='weight1')
b1 = tf.Variable(tf.random.normal((1,)), name='bias1')

W2 = tf.Variable(tf.random.normal((10, 10)), name='weight2')
b2 = tf.Variable(tf.random.normal((1,)), name='bias2')

W3 = tf.Variable(tf.random.normal((10, 10)), name='weight3')
b3 = tf.Variable(tf.random.normal((1,)), name='bias3')

W4 = tf.Variable(tf.random.normal((10, 1)), name='weight4')
b4 = tf.Variable(tf.random.normal((1,)), name='bias4')
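The shape rule in the comments above can be checked without TensorFlow. The sketch below pushes a (4, 2) array of zeros through four weight matrices with the same shapes as W1 to W4; the zero-valued arrays are placeholders used only to trace shapes, not the randomly initialized variables from this post.

```python
import numpy as np

x = np.zeros((4, 2), dtype=np.float32)                  # same shape as the four XOR inputs
weight_shapes = [(2, 10), (10, 10), (10, 10), (10, 1)]  # shapes of W1..W4 above

out = x
layer_shapes = []
for shape in weight_shapes:
    W = np.zeros(shape, dtype=np.float32)
    b = np.zeros((1,), dtype=np.float32)   # a (1,) bias broadcasts across every column
    out = out @ W + b                      # (4, in) @ (in, out) -> (4, out)
    layer_shapes.append(out.shape)
print(layer_shapes)
```

Each product is valid because the column count of the left operand matches the row count of the next weight matrix, and the final layer reduces the output to one value per input row.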

Hypothesis function¶

# Apply several stacked sigmoid layers.
def deep_nn(features):
    layer1 = tf.sigmoid(tf.matmul(features, W1) + b1)
    layer2 = tf.sigmoid(tf.matmul(layer1, W2) + b2)
    layer3 = tf.sigmoid(tf.matmul(layer2, W3) + b3)
    hypothesis = tf.sigmoid(tf.matmul(layer3, W4) + b4)
    return hypothesis

# Loss function
def loss_fn(hypothesis, features, labels):
    cost = -tf.reduce_mean(labels * tf.math.log(hypothesis) + (1 - labels) * tf.math.log(1 - hypothesis))
    return cost
# Accuracy function
def accuracy_fn(hypothesis, labels):
    predicted = tf.cast(hypothesis > 0.5, dtype=tf.float32)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted, labels), dtype=tf.float32))
    return accuracy
# Gradient function (computes the gradients used for gradient descent)
def grad(hypothesis, features, labels):
    with tf.GradientTape() as tape:
        loss_value = loss_fn(deep_nn(features),features,labels)
    return tape.gradient(loss_value, [W1, W2, W3, W4, b1, b2, b3, b4])

optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

EPOCHS = 5000

for step in range(EPOCHS+1):
    for features, labels  in dataset:
        features, labels = preprocess_data(features, labels)
        grads = grad(deep_nn(features), features, labels)
        optimizer.apply_gradients(grads_and_vars=zip(grads,[W1, W2, W3,W4, b1, b2, b3, b4]))
        if step % 500 == 0:
            print("Iter: {}, Loss: {:.4f}".format(step, loss_fn(deep_nn(features),features,labels)))

Iter: 0, Loss: 0.6387
Iter: 500, Loss: 0.5629
Iter: 1000, Loss: 0.2864
Iter: 1500, Loss: 0.0466
Iter: 2000, Loss: 0.0176
Iter: 2500, Loss: 0.0100
Iter: 3000, Loss: 0.0068
Iter: 3500, Loss: 0.0051
Iter: 4000, Loss: 0.0040
Iter: 4500, Loss: 0.0033

# Check the accuracy on the XOR data
x_data, y_data = preprocess_data(x, y)
test_acc = accuracy_fn(deep_nn(x_data),y_data)
print("Testset Accuracy: {:.4f}".format(test_acc))

Testset Accuracy: 1.0000

This post was written with reference to boostcourse and the 모두를 위한 딥러닝 (Deep Learning for Everyone) lectures.

NN(Neural Network)_XOR¶

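This post continues from the previous one, which tried to solve the XOR problem with a single classification function.

This time the problem is solved by stacking several logistic functions.

The same data as last time is used.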

import numpy as np
import tensorflow as tf

tf.random.set_seed(0)

print(tf.__version__)

2.1.0

# Set up the XOR data

x = [[0, 0],
     [0, 1],
     [1, 0],
     [1, 1]]
y = [[0],
     [1],
     [1],
     [0]]

# Prepare the dataset to train on
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(len(x))
# Preprocessing function (casts the data to a consistent type)
def preprocess_data(features, labels):
    features = tf.cast(features, tf.float32)
    labels = tf.cast(labels, tf.float32)
    return features, labels

# Create three pairs of W and b for the three classification functions.

W1 = tf.Variable(tf.random.normal((2, 1)), name='weight1')
b1 = tf.Variable(tf.random.normal((1,)), name='bias1')

W2 = tf.Variable(tf.random.normal((2, 1)), name='weight2')
b2 = tf.Variable(tf.random.normal((1,)), name='bias2')

W3 = tf.Variable(tf.random.normal((2, 1)), name='weight3')
b3 = tf.Variable(tf.random.normal((1,)), name='bias3')

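# Define the neural-net function
# The sigmoid is applied three times in total:
# layers 1 and 2 each apply a sigmoid to the input,
# and layer 3 applies one more sigmoid to their combined outputs.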
def neural_net(features):
    layer1 = tf.sigmoid(tf.matmul(features, W1) + b1)
    layer2 = tf.sigmoid(tf.matmul(features, W2) + b2)
    layer3 = tf.concat([layer1, layer2],-1)
    layer3 = tf.reshape(layer3, shape = [-1,2])
    hypothesis = tf.sigmoid(tf.matmul(layer3, W3) + b3)
    return hypothesis
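To see why this two-unit hidden layer is enough for XOR, the sketch below plugs hand-picked weights into the same structure using NumPy. These values are chosen by hand for illustration and are not what training produces; the names mirror W1 to W3 and b1 to b3 above. One hidden unit acts roughly like OR, the other like NAND, and the output unit combines them like AND.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)

# Hand-picked (not learned) parameters
W1, b1 = np.array([[20.0], [20.0]]), -10.0     # hidden unit 1: close to OR
W2, b2 = np.array([[-20.0], [-20.0]]), 30.0    # hidden unit 2: close to NAND
W3, b3 = np.array([[20.0], [20.0]]), -30.0     # output unit: close to AND

layer1 = sigmoid(x @ W1 + b1)
layer2 = sigmoid(x @ W2 + b2)
layer3 = np.concatenate([layer1, layer2], axis=-1)   # shape (4, 2)
hypothesis = sigmoid(layer3 @ W3 + b3)

predicted = (hypothesis > 0.5).astype(int).ravel()
print(predicted)
```

Since OR and NAND are both true only when exactly one input is 1, their AND reproduces the XOR labels.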

# Loss, accuracy, and gradient functions
def loss_fn(hypothesis, labels):
    cost = -tf.reduce_mean(labels * tf.math.log(hypothesis) + (1 - labels) * tf.math.log(1 - hypothesis))
    return cost

optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

def accuracy_fn(hypothesis, labels):
    predicted = tf.cast(hypothesis > 0.5, dtype=tf.float32)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted, labels), dtype=tf.float32))
    return accuracy

def grad(hypothesis, features, labels):
    with tf.GradientTape() as tape:
        loss_value = loss_fn(neural_net(features),labels)
    return tape.gradient(loss_value, [W1, W2, W3, b1, b2, b3])

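# With the single classification function from the previous post,
# the loss stayed constant instead of decreasing,
# but with a neural net, i.e. layer-by-layer computation, the loss can be seen to decrease.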
EPOCHS = 5000

for step in range(EPOCHS):
    for features, labels  in dataset:
        features, labels = preprocess_data(features, labels)
        grads = grad(neural_net(features), features, labels)
        optimizer.apply_gradients(grads_and_vars=zip(grads,[W1, W2, W3, b1, b2, b3]))
        if step % 500 == 0:
            print("Iter: {}, Loss: {:.4f}".format(step, loss_fn(neural_net(features),labels)))

Iter: 0, Loss: 1.0160
Iter: 500, Loss: 0.6482
Iter: 1000, Loss: 0.5609
Iter: 1500, Loss: 0.4236
Iter: 2000, Loss: 0.2926
Iter: 2500, Loss: 0.2021
Iter: 3000, Loss: 0.1465
Iter: 3500, Loss: 0.1120
Iter: 4000, Loss: 0.0895
Iter: 4500, Loss: 0.0739

This post was written with reference to the boostcourse lectures and the 모두를 위한 딥러닝 (Deep Learning for Everyone) lectures.

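NN(Neural Network)_XOR¶

A neural network (NN) lets a computer carry out complex computations
in a way loosely modeled on the human brain.

XOR is a classic example.
As the data below shows, XOR outputs 1 when the two inputs differ and 0 when they are the same. Unlike OR, it outputs 0 when both inputs are 1.

Solving XOR takes several computations.
A single classification function cannot do it; classification functions have to be applied in stages.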
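The claim that one classification function is not enough can be checked by brute force. The sketch below tries every linear threshold unit whose weights and bias lie on a small grid (the grid range and step are arbitrary choices for this illustration) and records the best accuracy on the four XOR rows.

```python
from itertools import product

# XOR truth table: ((input a, input b), label)
xor_rows = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def accuracy(w1, w2, b):
    """Fraction of XOR rows that the unit w1*a + w2*b + bias > 0 classifies correctly."""
    hits = sum(int((w1 * a + w2 * c + b > 0) == bool(label))
               for (a, c), label in xor_rows)
    return hits / len(xor_rows)

grid = [i / 2 for i in range(-10, 11)]   # -5.0, -4.5, ..., 5.0
best = max(accuracy(w1, w2, b) for w1, w2, b in product(grid, repeat=3))
print(best)   # 0.75
```

No single line separates the two classes, so at most three of the four rows can be classified correctly; stacking units removes that limit.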

In this post, we try to solve XOR with a single classification function.

x = [[0, 0],
     [0, 1],
     [1, 0],
     [1, 1]]
y = [[0],
     [1],
     [1],
     [0]]

Building the XOR model in TensorFlow code¶

import numpy as np
import tensorflow as tf
print(tf.__version__)

2.1.0

# Set up the dataset to train on
# No batch size specified
dataset= tf.data.Dataset.from_tensor_slices((x,y))

# Check the dataset contents
elem = [i for i in dataset]
print(elem)
print(len(elem))

[(<tf.Tensor: shape=(2,), dtype=int32, numpy=array([0, 0], dtype=int32)>, <tf.Tensor: shape=(1,), dtype=int32, numpy=array([0], dtype=int32)>), (<tf.Tensor: shape=(2,), dtype=int32, numpy=array([0, 1], dtype=int32)>, <tf.Tensor: shape=(1,), dtype=int32, numpy=array([1], dtype=int32)>), (<tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 0], dtype=int32)>, <tf.Tensor: shape=(1,), dtype=int32, numpy=array([1], dtype=int32)>), (<tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 1], dtype=int32)>, <tf.Tensor: shape=(1,), dtype=int32, numpy=array([0], dtype=int32)>)]
4

# Set up the dataset to train on
# Batch size set to the length of x
# The batch size is the number of samples trained on at once
dataset= tf.data.Dataset.from_tensor_slices((x,y)).batch(len(x))

# Check the dataset contents
elem = [i for i in dataset]
print(elem)
print(len(elem))

[(<tf.Tensor: shape=(4, 2), dtype=int32, numpy=
array([[0, 0],
       [0, 1],
       [1, 0],
       [1, 1]], dtype=int32)>, <tf.Tensor: shape=(4, 1), dtype=int32, numpy=
array([[0],
       [1],
       [1],
       [0]], dtype=int32)>)]
1

Preprocessing function (matching data types)¶

def preprocess_data(features, labels):
    features = tf.cast(features, tf.float32)
    labels = tf.cast(labels, tf.float32)
    return features, labels

Setting up W and b¶

# W and b can be initialized to zeros or to random values
#W = tf.Variable(tf.random.normal((2,1)))
#b = tf.Variable(tf.random.normal((1,)))
W = tf.Variable(tf.zeros((2,1)), name= 'weight')
b = tf.Variable(tf.zeros((1,)), name= 'bias')
print(W.numpy(), b.numpy())

[[0.]
 [0.]] [0.]

Sigmoid definition¶

def logistic_regression(features):
    # Sigmoid: 1 / (1 + exp(-z)); note the minus sign in the exponent
    hypothesis = tf.divide(1., 1. + tf.exp(-(tf.matmul(features, W) + b)))
    return hypothesis

Cost function definition¶

def loss_fn(hypothesis, features, labels):
    # Binary cross-entropy, computed from the hypothesis passed in
    cost = -tf.reduce_mean(labels * tf.math.log(hypothesis) + (1-labels) * tf.math.log(1-hypothesis))
    return cost
# The learning rate is set here as well
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

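Implementing the prediction function¶

The hypothesis produced by the sigmoid is cast to 0 or 1.

tf.cast => outputs 1 when the condition is true and 0 when it is false
tf.equal => True when the given values are equal, False otherwise
The examples below illustrate how cast and equal behave.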

test1 = 0.3
test2 = tf.cast(test1 > 0.5,dtype = tf.float32)
test2.numpy()

0.0

test1 = 0.6
test2 = tf.cast(test1 > 0.5,dtype = tf.float32)
test2.numpy()

1.0

test1 = 0.6
test2 = 0.6
test3 = tf.equal(test1,test2)
test3.numpy()

True

test7 = tf.cast(tf.equal(test1,test2),dtype = tf.float32)
test7.numpy()

1.0

def accuracy_fn(hypothesis, labels):
    predicted = tf.cast(hypothesis > 0.5, dtype=tf.float32)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted, labels), dtype=tf.float32))
    return accuracy

Gradient computation for gradient descent¶

def grad(hypothesis, features, labels):
    with tf.GradientTape() as tape:
        loss_value = loss_fn(logistic_regression(features),features,labels)
    return tape.gradient(loss_value, [W,b])

Run¶

A single logistic function could not learn XOR correctly.
The loss stays at 0.6931 and never decreases.

EPOCHS = 1001

for step in range(EPOCHS):
    for features, labels  in dataset:
        features, labels = preprocess_data(features, labels)
        grads = grad(logistic_regression(features), features, labels)
        optimizer.apply_gradients(grads_and_vars=zip(grads,[W,b]))
        if step % 100 == 0:
            print("Iter: {}, Loss: {:.4f}".format(step, loss_fn(logistic_regression(features),features,labels)))
print("W = {}, B = {}".format(W.numpy(), b.numpy()))

Iter: 0, Loss: 0.6931
Iter: 100, Loss: 0.6931
Iter: 200, Loss: 0.6931
Iter: 300, Loss: 0.6931
Iter: 400, Loss: 0.6931
Iter: 500, Loss: 0.6931
Iter: 600, Loss: 0.6931
Iter: 700, Loss: 0.6931
Iter: 800, Loss: 0.6931
Iter: 900, Loss: 0.6931
Iter: 1000, Loss: 0.6931
W = [[0.]
 [0.]], B = [0.]
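The log above shows W and b staying at exactly zero. One way to see why is to compute the gradient of the cross-entropy loss by hand at the zero starting point, as in the NumPy sketch below (a re-derivation for illustration, using the standard gradient of sigmoid plus cross-entropy, x^T (h - y) / N). With zero weights every prediction is 0.5, and on the XOR data the error terms cancel exactly.

```python
import numpy as np

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)
y = np.array([[0], [1], [1], [0]], dtype=np.float64)

W = np.zeros((2, 1))
b = np.zeros((1,))

h = 1.0 / (1.0 + np.exp(-(x @ W + b)))                       # every prediction is 0.5
loss = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))     # equals ln 2

grad_W = x.T @ (h - y) / len(x)   # gradient of the loss with respect to W
grad_b = np.mean(h - y)           # gradient of the loss with respect to b
print(loss, grad_W.ravel(), grad_b)
```

Because both gradients are zero, gradient descent never moves away from the starting point, and the loss stays at ln 2, about 0.6931. A random starting point would move, but a single logistic unit is still a linear classifier and cannot separate the XOR classes.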

This post was written with reference to the boostcourse lectures and the 모두를 위한 딥러닝 (Deep Learning for Everyone) lectures.

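Multiple linear regression review¶

This post reviews the multiple linear regression covered in an earlier post.
That post went through the details, so comments are kept to a minimum here.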

import tensorflow as tf
import numpy as np
print(tf.__version__)

2.1.0

Creating variables (list comprehension review)¶

test1 = [i for i in range(0,10) if i>2]
test1

[3, 4, 5, 6, 7, 8, 9]

test1 = list(range(0,10))
test2 = list(range(10,20))
print(test1)
print(test2)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

test3 = []
for i,v in zip(test1,test2):
    test3.append([i,v])
test3

[[0, 10],
 [1, 11],
 [2, 12],
 [3, 13],
 [4, 14],
 [5, 15],
 [6, 16],
 [7, 17],
 [8, 18],
 [9, 19]]

Splitting into x and y values¶

x1 = [x[0] for x in test3]
x2 = [x[1] for x in test3]
y = [i for i in range(100,110)]

print(x1)
print(x2)
print(y)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[100, 101, 102, 103, 104, 105, 106, 107, 108, 109]

# There are two x variables, so two W values are set up as well
tf.random.set_seed(0)
W1 = tf.Variable(tf.random.uniform((1,),1., 10.0 ))
W2 = tf.Variable(tf.random.uniform((1,),1., 10.0 ))
b = tf.Variable(tf.random.uniform((1,),1., 10.0 ))

learning_rate = tf.Variable(0.001)
for i in range(1001):
    with tf.GradientTape() as tape:
        hypothesis = W1 * x1 + W2* x2 + b
        cost = tf.reduce_mean(tf.square(hypothesis - y))
    W1_grad, W2_grad, b_grad = tape.gradient(cost, [W1,W2,b])
    W1.assign_sub(learning_rate * W1_grad)
    W2.assign_sub(learning_rate * W2_grad)
    b.assign_sub(learning_rate * b_grad)
    
    if i % 50 ==0:
        print("{:5} | {:10.6f} | {:10.4f} | {:10.4f} | {:10.6f}".format(
        i, cost.numpy(), W1.numpy()[0], W2.numpy()[0], b.numpy()[0]))

    0 | 616.380981 |     3.4714 |     5.8110 |   2.753797
   50 | 282.371643 |    -0.0978 |     6.7484 |   3.204454
  100 | 141.552185 |    -2.5705 |     7.5713 |   3.534010
  150 |  70.959732 |    -4.3213 |     8.1539 |   3.767345
  200 |  35.571896 |    -5.5608 |     8.5664 |   3.932551
  250 |  17.832087 |    -6.4385 |     8.8585 |   4.049521
  300 |   8.939185 |    -7.0599 |     9.0652 |   4.132339
  350 |   4.481170 |    -7.4998 |     9.2117 |   4.190976
  400 |   2.246401 |    -7.8113 |     9.3153 |   4.232491
  450 |   1.126115 |    -8.0319 |     9.3887 |   4.261886
  500 |   0.564514 |    -8.1880 |     9.4407 |   4.282698
  550 |   0.282984 |    -8.2986 |     9.4775 |   4.297431
  600 |   0.141857 |    -8.3769 |     9.5035 |   4.307865
  650 |   0.071112 |    -8.4323 |     9.5220 |   4.315252
  700 |   0.035649 |    -8.4715 |     9.5350 |   4.320483
  750 |   0.017869 |    -8.4993 |     9.5443 |   4.324186
  800 |   0.008958 |    -8.5190 |     9.5508 |   4.326808
  850 |   0.004490 |    -8.5329 |     9.5554 |   4.328665
  900 |   0.002251 |    -8.5428 |     9.5587 |   4.329979
  950 |   0.001129 |    -8.5498 |     9.5610 |   4.330909
 1000 |   0.000566 |    -8.5547 |     9.5627 |   4.331569
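The learned values look odd at first (W1 is about -8.55 even though y simply grows with x1), but they are consistent with the data. Since x2 = x1 + 10 in this example, the model W1*x1 + W2*x2 + b collapses to (W1 + W2)*x1 + (10*W2 + b), and the target is y = x1 + 100. The NumPy sketch below plugs in the rounded values from the last row of the log to check this.

```python
import numpy as np

# Rounded values copied from the final row (step 1000) of the log above
W1, W2, b = -8.5547, 9.5627, 4.331569

x1 = np.arange(0, 10)
x2 = np.arange(10, 20)      # x2 is always x1 + 10
y = np.arange(100, 110)     # y is always x1 + 100

slope = W1 + W2             # effective coefficient on x1
intercept = 10 * W2 + b     # effective constant term
pred = W1 * x1 + W2 * x2 + b
max_error = np.max(np.abs(pred - y))
print(slope, intercept, max_error)
```

The effective slope is close to 1 and the effective intercept close to 100. Because the two inputs are perfectly correlated, many different (W1, W2, b) combinations fit equally well, and gradient descent just lands on one of them.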

Three variables using a matrix¶

Creating the variables¶

x1 = [x for x in range(0,10)]
x2 = [x for x in range(10,20)]
x3 = [x for x in range(30,40)]
y = [x for x in range(100,110)]

print(x1)
print(x2)
print(x3)
print(y)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
[100, 101, 102, 103, 104, 105, 106, 107, 108, 109]

# Combine the variables into a single matrix
data =np.array([[i,v,z,h] for i,v,z,h in zip(x1,x2,x3,y)],dtype =np.float32)
data

array([[  0.,  10.,  30., 100.],
       [  1.,  11.,  31., 101.],
       [  2.,  12.,  32., 102.],
       [  3.,  13.,  33., 103.],
       [  4.,  14.,  34., 104.],
       [  5.,  15.,  35., 105.],
       [  6.,  16.,  36., 106.],
       [  7.,  17.,  37., 107.],
       [  8.,  18.,  38., 108.],
       [  9.,  19.,  39., 109.]], dtype=float32)
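The same matrix can also be built column by column. A minimal sketch, assuming NumPy's `np.column_stack` as an alternative to the list comprehension (the names `data_rows` and `data_cols` are only for this comparison):

```python
import numpy as np

x1 = list(range(0, 10))
x2 = list(range(10, 20))
x3 = list(range(30, 40))
y = list(range(100, 110))

# Row-by-row construction, as in the post.
data_rows = np.array([[i, v, z, h] for i, v, z, h in zip(x1, x2, x3, y)], dtype=np.float32)

# Column-wise construction: each list becomes one column of the matrix.
data_cols = np.column_stack([x1, x2, x3, y]).astype(np.float32)

print(data_cols.shape)
print(np.array_equal(data_rows, data_cols))
```

Both versions give the same (10, 4) float32 array, so the choice is mostly a matter of readability.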

# Split the data into X and Y
X = data[:,:-1]
Y = data[:,[-1]]
print(X)
print(Y)

[[ 0. 10. 30.]
 [ 1. 11. 31.]
 [ 2. 12. 32.]
 [ 3. 13. 33.]
 [ 4. 14. 34.]
 [ 5. 15. 35.]
 [ 6. 16. 36.]
 [ 7. 17. 37.]
 [ 8. 18. 38.]
 [ 9. 19. 39.]]
[[100.]
 [101.]
 [102.]
 [103.]
 [104.]
 [105.]
 [106.]
 [107.]
 [108.]
 [109.]]
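Indexing the last column with a list, `data[:, [-1]]`, keeps Y as a (10, 1) column. That shape matters when Y is subtracted from the model output. A small NumPy sketch with a stand-in matrix (the values are placeholders, only the shapes are of interest):

```python
import numpy as np

data = np.arange(40, dtype=np.float32).reshape(10, 4)  # stand-in for the data matrix

Y_column = data[:, [-1]]   # list index keeps the column axis -> shape (10, 1)
y_flat = data[:, -1]       # scalar index drops the axis      -> shape (10,)

# Hypothetical predictions with the same shape the model returns, (10, 1).
pred = np.zeros((10, 1), dtype=np.float32)

print((pred - Y_column).shape)  # (10, 1): element-wise difference
print((pred - y_flat).shape)    # (10, 10): broadcast over every pair of rows
```

A flat target of shape (10,) does not raise an error. It silently broadcasts, so a cost computed from it averages over 100 differences instead of 10.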

print(X.shape)
print(X.shape[1])

(10, 3)
3

# Initialize W and b
W = tf.Variable(tf.random.normal((X.shape[1],1)))
b = tf.Variable(tf.random.normal((1,)))
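W gets one row per feature so that the matrix product lines up with X. A shape-only sketch in NumPy, assuming a zero-filled stand-in for X and a fixed seed that the post itself does not use:

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, only for reproducibility here
X = np.zeros((10, 3), dtype=np.float32)   # stand-in with the same shape as X
W = rng.normal(size=(X.shape[1], 1)).astype(np.float32)  # one weight per feature
b = rng.normal(size=(1,)).astype(np.float32)             # single shared bias

hypothesis = X @ W + b   # (10, 3) @ (3, 1) -> (10, 1); b broadcasts over the rows
print(W.shape, b.shape, hypothesis.shape)
```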

# Prediction model and gradient descent
def predict(X):
    return tf.matmul(X, W) + b

learning_rate = 0.00001


for i in range(1001):
    with tf.GradientTape() as tape:
        # Use the (10, 1) column Y here: the Python list y has shape (10,)
        # and would broadcast the difference to (10, 10).
        cost = tf.reduce_mean(tf.square(predict(X) - Y))

    W_grad, b_grad = tape.gradient(cost, [W, b])

    W.assign_sub(learning_rate * W_grad)
    b.assign_sub(learning_rate * b_grad)

    if i % 500 == 0:
        print("{:5} | {:10.6f} | {:10.4f} | {:10.4f} | {:10.6f}".format(
            i, cost.numpy(), W.numpy()[0][0], W.numpy()[1][0], b.numpy()[0]))

    0 | 3581.468018 |     1.3541 |     0.4929 |  -2.126617
  500 | 186.300980 |     1.2123 |     0.8905 |  -2.072671
 1000 | 159.831528 |     0.9086 |     0.7253 |  -2.058817
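To see what `tape.gradient` computes for this cost, the loop can be mirrored in plain NumPy with the gradients written out by hand. This is a sketch under my own assumptions, not the post's code: float64 arrays, a fixed seed, and the target as a (10, 1) column.

```python
import numpy as np

x1 = np.arange(0, 10)
X = np.column_stack([x1, x1 + 10, x1 + 30]).astype(np.float64)  # same features as the post
Y = (x1 + 100).reshape(-1, 1).astype(np.float64)                 # targets as a (10, 1) column

rng = np.random.default_rng(0)   # assumed seed; the post uses unseeded tf.random.normal
W = rng.normal(size=(3, 1))
b = rng.normal(size=(1,))
learning_rate = 0.00001
costs = []

for i in range(1001):
    error = X @ W + b - Y                     # shape (10, 1)
    costs.append(float(np.mean(error ** 2)))  # mean squared error
    W_grad = 2.0 * X.T @ error / len(X)       # d(cost)/dW, shape (3, 1)
    b_grad = 2.0 * error.mean(axis=0)         # d(cost)/db, shape (1,)
    W -= learning_rate * W_grad
    b -= learning_rate * b_grad

print("first cost: {:.4f}, last cost: {:.4f}".format(costs[0], costs[-1]))
```

With a learning rate this small the cost keeps decreasing, but 1001 steps are not necessarily enough to reach a close fit.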

# Check the value of W
print(W)

<tf.Variable 'Variable:0' shape=(3, 1) dtype=float32, numpy=
array([[0.9085694 ],
       [0.72533375],
       [2.6267443 ]], dtype=float32)>

# Compute predictions for X
predict(X)

<tf.Tensor: shape=(10, 1), dtype=float32, numpy=
array([[ 89.7837  ],
       [ 92.84181 ],
       [ 95.899925],
       [ 98.958046],
       [102.01615 ],
       [105.07428 ],
       [108.13239 ],
       [111.190506],
       [114.24863 ],
       [117.30674 ]], dtype=float32)>
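As a reference point for these predictions, the least-squares fit can be computed directly. In this data x2 = x1 + 10 and x3 = x1 + 30, so together with the bias the columns are linearly dependent and many different (W, b) pairs produce identical predictions. A sketch assuming `np.linalg.lstsq` as the solver (names such as `A`, `W_ref` and `b_ref` are my own):

```python
import numpy as np

x1 = np.arange(0, 10)
X = np.column_stack([x1, x1 + 10, x1 + 30]).astype(np.float64)
Y = (x1 + 100).reshape(-1, 1).astype(np.float64)

# Append a column of ones so the bias is estimated together with the weights.
A = np.hstack([X, np.ones((len(X), 1))])

# lstsq returns the minimum-norm least-squares solution when A is rank-deficient.
solution, _, rank, _ = np.linalg.lstsq(A, Y, rcond=None)
W_ref, b_ref = solution[:3], solution[3]

print("rank of design matrix:", rank, "of", A.shape[1], "columns")
print("max abs error of reference fit:", np.abs(A @ solution - Y).max())
```

Because the weights are not unique here, the individual entries of a trained W say little on their own. Comparing predictions against Y is the more meaningful check.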

# Compute predictions for arbitrary inputs
predict([[1., 1., 4.], [145., 50., 50.]]).numpy()

array([[11.6773615],
       [87.66279  ]], dtype=float32)
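Prediction for new rows is just a matrix product plus the bias, so any (n, 3) input yields an (n, 1) output. A NumPy sketch with hypothetical parameters, chosen by hand for illustration and not taken from the trained model:

```python
import numpy as np

# Hypothetical parameters (not the trained values): y = 1*x1 + 0*x2 + 0*x3 + 100.
W_demo = np.array([[1.0], [0.0], [0.0]], dtype=np.float32)
b_demo = np.array([100.0], dtype=np.float32)

def predict_demo(inputs):
    """Return one prediction per input row, shape (n, 1)."""
    inputs = np.asarray(inputs, dtype=np.float32)
    return inputs @ W_demo + b_demo

new_points = [[1., 1., 4.], [145., 50., 50.]]
print(predict_demo(new_points))  # [[101.] [245.]]
```

These hand-picked values reproduce y = x1 + 100 exactly on the training rows, which makes them a convenient yardstick for the trained model's outputs.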

	전국	서울	부산	대구	인천	광주	대전	울산	세종	경기	강원	충북	충남	전북	전남	경북	경남	제주
2018.02	69679	17685	3718	3444	4094	2436	2071	916	337	18759	1593	1676	2382	2406	1983	2544	2929	706
2018.03	92795	24122	6096	4504	5328	3294	2576	1150	600	24694	2182	2255	3262	2581	2640	3085	3691	735
2018.04	71751	12347	4456	4275	5393	3072	2312	1024	341	19305	2091	1954	2780	2421	2642	3005	3497	836
2018.05	67789	11719	4209	4167	5102	2887	2072	958	391	18233	1977	2153	2511	2283	2663	2645	3116	703
2018.06	65027	10401	4357	4837	4681	2819	2108	946	332	16439	1713	1820	2539	3103	2255	2723	2954	1000

	행정구역별(1)	행정구역별(2)	2018. 01	2018. 02	2018. 03	2018. 04	2018. 05	2018. 06	2018. 07	2018. 08	...	2019. 09	2019. 10	2019. 11	2019. 12	2020. 01	2020. 02	2020. 03	2020. 04	2020. 05	2020. 06
0	행정구역별(1)	행정구역별(2)	동(호)수 (동(호)수)	동(호)수 (동(호)수)	동(호)수 (동(호)수)	동(호)수 (동(호)수)	동(호)수 (동(호)수)	동(호)수 (동(호)수)	동(호)수 (동(호)수)	동(호)수 (동(호)수)	...	동(호)수 (동(호)수)	동(호)수 (동(호)수)	동(호)수 (동(호)수)	동(호)수 (동(호)수)	동(호)수 (동(호)수)	동(호)수 (동(호)수)	동(호)수 (동(호)수)	동(호)수 (동(호)수)	동(호)수 (동(호)수)	동(호)수 (동(호)수)
1	전국	소계	70354	69679	92795	71751	67789	65027	63687	65945	...	64088	82393	92413	118415	101334	115264	108677	73531	83494	138578
2	서울특별시	소계	15107	17685	24122	12347	11719	10401	11753	13577	...	11779	14145	17313	22156	16834	16661	16315	9452	10255	19463
3	서울특별시	종로구	220	189	305	358	139	163	164	165	...	139	164	203	298	217	244	212	120	153	235
4	서울특별시	중구	246	293	404	185	177	167	153	181	...	114	178	240	317	237	203	156	81	102	206

	0	1
0	충청남도	충남
1	경상남도	경남
2	강원도	강원
3	전라남도	전남
4	인천광역시	인천
5	전라북도	전북
6	대구광역시	대구
7	전국	전국
8	광주광역시	광주
9	충청북도	충북
10	제주특별자치도	제주
11	부산광역시	부산
12	울산광역시	울산
13	세종특별자치시	세종
14	경상북도	경북
15	서울특별시	서울
16	경기도	경기
17	대전광역시	대전


