[Python] Visualizing Seattle Bicycle Counts

2022. 4. 29. 14:52 · Python

import pandas as pd

data = pd.read_csv('Fremont_Bridge_Bicycle_Counter.csv', index_col='Date', parse_dates=True)
data.head()
                     Fremont Bridge Total  Fremont Bridge East Sidewalk  Fremont Bridge West Sidewalk
Date
2019-11-01 00:00:00                  12.0                           7.0                           5.0
2019-11-01 01:00:00                   7.0                           0.0                           7.0
2019-11-01 02:00:00                   1.0                           0.0                           1.0
2019-11-01 03:00:00                   6.0                           6.0                           0.0
2019-11-01 04:00:00                   6.0                           5.0                           1.0
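As a minimal sketch of what `index_col` and `parse_dates` do here, the snippet below parses a hypothetical three-row excerpt (values copied from the output above) from an in-memory string instead of the real file. The ISO timestamp format is an assumption; the downloaded CSV may write dates differently.

```python
import io

import pandas as pd

# Hypothetical excerpt in the same shape as the Fremont Bridge CSV.
# ISO timestamps are assumed; the real file's date format may differ.
csv_text = """Date,Fremont Bridge Total,Fremont Bridge East Sidewalk,Fremont Bridge West Sidewalk
2019-11-01 00:00:00,12.0,7.0,5.0
2019-11-01 01:00:00,7.0,0.0,7.0
2019-11-01 02:00:00,1.0,0.0,1.0
"""

# index_col='Date' turns the Date column into the row index;
# parse_dates=True converts that index into a DatetimeIndex.
sample = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True)
print(type(sample.index).__name__)
print(sample)
```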
data.columns = ['Total', 'East', 'West']  # same order as the file: Total, East, West
# The file already includes a Total column, so data.eval('West + East') is not needed
data.head()
                     Total  East  West
Date
2019-11-01 00:00:00   12.0   7.0   5.0
2019-11-01 01:00:00    7.0   0.0   7.0
2019-11-01 02:00:00    1.0   0.0   1.0
2019-11-01 03:00:00    6.0   6.0   0.0
2019-11-01 04:00:00    6.0   5.0   1.0
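Assigning a list to `data.columns` renames by position, so a list in the wrong order silently swaps columns. A sketch of renaming by name instead, with a sanity check that East + West equals Total; the small frame `raw` is hypothetical, built from the first rows above.

```python
import pandas as pd

# Hypothetical frame with the original Fremont Bridge column names.
raw = pd.DataFrame(
    {
        'Fremont Bridge Total': [12.0, 7.0, 1.0],
        'Fremont Bridge East Sidewalk': [7.0, 0.0, 0.0],
        'Fremont Bridge West Sidewalk': [5.0, 7.0, 1.0],
    },
    index=pd.to_datetime(['2019-11-01 00:00', '2019-11-01 01:00', '2019-11-01 02:00']),
)

# A mapping ties each new name to an old name, so the result
# does not depend on the column order in the file.
renamed = raw.rename(columns={
    'Fremont Bridge Total': 'Total',
    'Fremont Bridge East Sidewalk': 'East',
    'Fremont Bridge West Sidewalk': 'West',
})

# Sanity check: the two sidewalks should add up to the total.
# (Missing values in the real data would need handling first.)
sums_match = ((renamed['East'] + renamed['West']) == renamed['Total']).all()
print(renamed)
print(sums_match)
```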
import matplotlib.pyplot as plt
import seaborn; seaborn.set_theme()  # apply seaborn's default plot style
%matplotlib inline
data.plot()
plt.ylabel('Hourly Bicycle Count');


  • The hourly samples (more than 149,000 rows in this file) are too densely packed to interpret. Let's resample to weekly totals.
weekly = data.resample('W').sum()
weekly.plot(style=[':', '--', '-'])
plt.ylabel('Weekly bicycle count');
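`resample('W')` groups rows by calendar week. A small sketch on made-up hourly data (one bike per hour for two weeks) shows how the bins are labeled: `'W'` is shorthand for `'W-SUN'`, so each week ends on Sunday and is labeled with that Sunday's date.

```python
import pandas as pd

# Made-up hourly series: one bike per hour for two full weeks,
# starting on Monday 2019-11-04.
idx = pd.date_range('2019-11-04', periods=14 * 24, freq=pd.Timedelta(hours=1))
hourly = pd.Series(1.0, index=idx)

# 'W' == 'W-SUN': every hour from Monday through Sunday lands in the
# bin labeled with that Sunday, so each week sums 7 * 24 = 168 hours.
weekly = hourly.resample('W').sum()
print(weekly)
```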


daily = data.resample('D').sum()
daily.rolling(30, center=True).sum().plot(style=[':', '--', '-'])
plt.ylabel('30-day rolling total');
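The 30-day window above uses `.sum()`, so each point is a total over 30 days rather than an average. A minimal sketch on a hypothetical constant series (10 bikes per day) shows the difference between `.sum()` and `.mean()`, and the NaN values that a full-size centered window leaves at both ends.

```python
import pandas as pd

# Hypothetical daily totals: a constant 10 bikes per day for 100 days.
days = pd.date_range('2020-01-01', periods=100, freq='D')
daily_toy = pd.Series(10.0, index=days)

# Centered 30-day window: .sum() adds 30 daily totals, .mean() averages them.
rolling_sum = daily_toy.rolling(30, center=True).sum()
rolling_mean = daily_toy.rolling(30, center=True).mean()

# By default a window needs all 30 values, so the 29 positions whose
# window runs off either end of the series come back as NaN.
print(rolling_sum.isna().sum())
print(rolling_sum.dropna().unique(), rolling_mean.dropna().unique())
```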

# Gaussian-weighted 50-day window (std = 10 days); win_type requires SciPy
daily.rolling(50, center=True, win_type='gaussian').sum(std=10).plot(style=[':', '--', '-']);
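`win_type='gaussian'` weights each day in the 50-day window by a Gaussian bell, and `.sum(std=10)` returns the weighted sum without dividing by the total weight (`.mean(std=10)` would divide). The NumPy-only sketch below reproduces that computation on a hypothetical constant series. The helper `gaussian_window` is written here for illustration; it follows the formula used by `scipy.signal.windows.gaussian`, which pandas relies on for this window type.

```python
import numpy as np


def gaussian_window(m, std):
    """Illustrative helper: symmetric Gaussian weights of length m.

    w[n] = exp(-0.5 * ((n - (m - 1) / 2) / std) ** 2)
    """
    n = np.arange(m) - (m - 1) / 2.0
    return np.exp(-0.5 * (n / std) ** 2)


w = gaussian_window(50, std=10)

# Hypothetical daily series: a constant 10 bikes per day for 200 days.
values = np.full(200, 10.0)

# Weighted sum over each full 50-day window. The window is symmetric,
# so convolution (which flips w) gives the same result as correlation.
# pandas would also emit NaN for the incomplete windows at the edges;
# mode='valid' simply keeps the complete ones.
weighted_sum = np.convolve(values, w, mode='valid')
weighted_mean = weighted_sum / w.sum()
print(w.sum(), weighted_sum[:3], weighted_mean[:3])
```

Because the weights add up to roughly 25 rather than 1, the `.sum(std=10)` curve sits on a larger scale than the raw daily totals; its shape, not its absolute level, is what the plot conveys.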


Digging into the data

  • Average traffic by time of day
import numpy as np
by_time = data.groupby(data.index.time).mean()
hourly_ticks = 4 * 60 * 60 * np.arange(6)
by_time.plot(xticks=hourly_ticks, style=[':', '--', '-']);
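Grouping by `data.index.time` averages each clock hour across all days. The sketch below uses a hypothetical two-day series whose count equals the hour, so each group mean is easy to check. It also spells out `hourly_ticks`, assuming (as pandas does when plotting a `datetime.time` index) that the x-axis is measured in seconds after midnight.

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical data: two days of hourly readings where the count equals the hour.
idx = pd.date_range('2019-11-01', periods=48, freq=pd.Timedelta(hours=1))
counts = pd.DataFrame({'Total': idx.hour.to_numpy().astype(float)}, index=idx)

# Grouping by the time-of-day part averages the same hour across all days.
by_time_toy = counts.groupby(counts.index.time).mean()

# Ticks every 4 hours, expressed in seconds after midnight.
hourly_ticks = 4 * 60 * 60 * np.arange(6)

print(len(by_time_toy), by_time_toy.loc[datetime.time(8, 0), 'Total'])
print(hourly_ticks)
```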


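  • Traffic by day of the week
    • There is a large gap between weekday and weekend traffic.
    • Average traffic from Monday through Friday is roughly twice the weekend average.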
by_weekday = data.groupby(data.index.dayofweek).mean()
by_weekday.index = ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun']
by_weekday.plot(style=[':', '--', '-']);
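To put a number on the weekday/weekend gap instead of reading it off the plot, one option is a ratio of means. `weekday_weekend_ratio` is a hypothetical helper and the toy data are made up (20 bikes per hour on weekdays, 10 on weekends); on the real frame it would be called as `weekday_weekend_ratio(data)`.

```python
import numpy as np
import pandas as pd


def weekday_weekend_ratio(df, column='Total'):
    """Hypothetical helper: Mon-Fri mean of `column` divided by its Sat-Sun mean."""
    # dayofweek runs from 0 (Monday) to 6 (Sunday), so < 5 selects weekdays.
    is_weekday = df.index.dayofweek < 5
    return df.loc[is_weekday, column].mean() / df.loc[~is_weekday, column].mean()


# Made-up week of hourly counts starting Monday 2019-11-04.
idx = pd.date_range('2019-11-04', periods=7 * 24, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({'Total': np.where(idx.dayofweek < 5, 20.0, 10.0)}, index=idx)

print(weekday_weekend_ratio(toy))
```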


  • Hourly patterns on weekdays vs. weekends

np.where

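  • With three arguments, np.where(cond, x, y) returns x where cond is True and y elsewhere. With only the condition, it answers a different question: at which indices is the condition satisfied?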
np.where(data.index.weekday < 5, 'Weekday', 'Weekend')[0:50]
array(['Weekday', 'Weekday', 'Weekday', 'Weekday', 'Weekday', 'Weekday',
       'Weekday', 'Weekday', 'Weekday', 'Weekday', 'Weekday', 'Weekday',
       'Weekday', 'Weekday', 'Weekday', 'Weekday', 'Weekday', 'Weekday',
       'Weekday', 'Weekday', 'Weekday', 'Weekday', 'Weekday', 'Weekday',
       'Weekend', 'Weekend', 'Weekend', 'Weekend', 'Weekend', 'Weekend',
       'Weekend', 'Weekend', 'Weekend', 'Weekend', 'Weekend', 'Weekend',
       'Weekend', 'Weekend', 'Weekend', 'Weekend', 'Weekend', 'Weekend',
       'Weekend', 'Weekend', 'Weekend', 'Weekend', 'Weekend', 'Weekend',
       'Weekend', 'Weekend'], dtype='<U7')
np.where(data.index.weekday < 5)
(array([     0,      1,      2, ..., 149435, 149436, 149437]),)
data.index.time[0:5]
array([datetime.time(0, 0), datetime.time(1, 0), datetime.time(2, 0),
       datetime.time(3, 0), datetime.time(4, 0)], dtype=object)
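The two forms of `np.where` are easiest to compare on a short array. The day-of-week codes below are made up, starting on a Friday like 2019-11-01; the one-argument form behaves like `np.nonzero`.

```python
import numpy as np

# Made-up day-of-week codes (0 = Monday ... 6 = Sunday) for nine days,
# starting on a Friday.
weekday = np.array([4, 5, 6, 0, 1, 2, 3, 4, 5])

# Three arguments: choose a value element-wise; same length as the input.
labels = np.where(weekday < 5, 'Weekday', 'Weekend')

# One argument: a tuple with one index array per dimension,
# holding the positions where the condition is True.
(positions,) = np.where(weekday < 5)

print(labels)
print(positions)
```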
weekend = np.where(data.index.weekday < 5, 'Weekday', 'Weekend')
by_time = data.groupby([weekend, data.index.time]).mean()
fig, ax = plt.subplots(1, 2, figsize=(14, 5))
by_time.loc['Weekday'].plot(ax=ax[0], title='Weekdays',
                           xticks=hourly_ticks, style=[':', '--', '-'])
by_time.loc['Weekend'].plot(ax=ax[1], title='Weekends',
                           xticks=hourly_ticks, style=[':', '--', '-']);

  • On weekdays, traffic concentrates around commuting hours, while weekends show a leisure-riding pattern.
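The two-key `groupby` produces a two-level index, and `.loc['Weekday']` selects one half of it. As a sketch on made-up data shaped like the commute/leisure pattern described above (weekday spikes at 08:00 and 17:00, a single weekend peak at 13:00), `idxmax()` picks out the peak time for each group.

```python
import numpy as np
import pandas as pd

# Made-up week of hourly counts starting Monday 2019-11-04
# (illustrative shape only, not real Fremont Bridge data).
idx = pd.date_range('2019-11-04', periods=7 * 24, freq=pd.Timedelta(hours=1))
hour = idx.hour.to_numpy()
is_weekday = idx.weekday < 5
total = np.where(is_weekday,
                 np.where(np.isin(hour, [8, 17]), 100.0, 10.0),
                 np.where(hour == 13, 50.0, 5.0))
toy = pd.DataFrame({'Total': total}, index=idx)

# Same two-key grouping as above: a label array plus the time of day.
label = np.where(is_weekday, 'Weekday', 'Weekend')
by_time_toy = toy.groupby([label, toy.index.time]).mean()

# .loc on the first level leaves one row per time of day;
# idxmax() returns the first time with the highest mean count.
weekday_peak = by_time_toy.loc['Weekday']['Total'].idxmax()
weekend_peak = by_time_toy.loc['Weekend']['Total'].idxmax()
print(weekday_peak, weekend_peak)
```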

Reference: Python Data Science Handbook (Jake VanderPlas)