[Python] JSON to DataFrame (+ nested keys)

Related errors

Error when converting a string to JSON

hh = "{'coord': {'lon': 127.3794, 'lat': 36.5346}, 'weather': [{'id': 800, 'main': 'Clear', 'description': 'clear sky', 'icon': '01d'}], 'base': 'stations', 'main': {'temp': 295.83, 'feels_like': 295.22, 'temp_min': 295.83, 'temp_max': 295.83, 'pressure': 1012, 'humidity': 41, 'sea_level': 1012, 'grnd_level': 1001}, 'visibility': 10000, 'wind': {'speed': 1.57, 'deg': 91, 'gust': 1.6}, 'clouds': {'all': 8}, 'dt': 1663655220, 'sys': {'type': 1, 'id': 8131, 'country': 'KR', 'sunrise': 1663622179, 'sunset': 1663666321}, 'timezone': 32400, 'id': 1845604, 'name': 'Cheongju-si', 'cod': 200}"

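Calling json.loads(hh) on this string fails:

-> Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

JSON requires double-quoted keys and strings, so a string whose keys use single quotes is not valid JSON.

hh = hh.replace("'", '"')

This replaces the single quotes with double quotes!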
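One caveat: a blanket replace can break when a value itself contains an apostrophe. Since the string looks like the repr of a Python dict, a possible alternative is ast.literal_eval from the standard library, which parses Python literals directly. This is only a sketch; the sample string raw is made up for illustration and is not part of the original response.

```python
import ast
import json

# Hypothetical sample: a Python dict repr whose value contains an apostrophe.
raw = "{'name': \"St. John's\", 'cod': 200}"

# Blind replacement also rewrites the apostrophe inside the value,
# which leaves the string as invalid JSON.
try:
    json.loads(raw.replace("'", '"'))
    replace_ok = True
except json.JSONDecodeError:
    replace_ok = False

# ast.literal_eval evaluates Python literals only (dict, list, str, numbers, ...),
# so no quote rewriting is needed.
parsed = ast.literal_eval(raw)

print(replace_ok)
print(parsed["name"])
```

If the data comes from requests, calling response.json() on the response object instead of converting the dict to a string avoids the problem altogether.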

Error when converting that data to a dict and then to a DataFrame

import json
import pandas as pd

response_current_dict = json.loads(hh)
response_current_dict

{'coord': {'lon': 127.3794, 'lat': 36.5346},
 'weather': [{'id': 800,
   'main': 'Clear',
   'description': 'clear sky',
   'icon': '01d'}],
 'base': 'stations',
 'main': {'temp': 295.83,
  'feels_like': 295.22,
  'temp_min': 295.83,
  'temp_max': 295.83,
  'pressure': 1012,
  'humidity': 41,
  'sea_level': 1012,
  'grnd_level': 1001},
 'visibility': 10000,
 'wind': {'speed': 1.57, 'deg': 91, 'gust': 1.6},
 'clouds': {'all': 8},
 'dt': 1663655220,
 'sys': {'type': 1,
  'id': 8131,
  'country': 'KR',
  'sunrise': 1663622179,
  'sunset': 1663666321},
 'timezone': 32400,
 'id': 1845604,
 'name': 'Cheongju-si',
 'cod': 200}

pd.DataFrame(response_current_dict)

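The dict mixes nested dict values (coord, main, wind, ...) with a list value (weather), so pandas cannot decide which index to use and raises

ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.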
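The error can be reproduced with a much smaller dict. The sketch below uses a hypothetical, trimmed-down version of the response (the name sample is only for illustration) and catches the exception so the message can be inspected.

```python
import pandas as pd

# Hypothetical, trimmed-down version of the response above.
sample = {
    "coord": {"lon": 127.3794, "lat": 36.5346},  # dict value
    "weather": [{"id": 800, "main": "Clear"}],   # list value
    "cod": 200,                                  # scalar value
}

# Mixing a dict column with a list column makes the index ambiguous,
# so the DataFrame constructor raises a ValueError.
try:
    pd.DataFrame(sample)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```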

Since pandas 1.0.0, json_normalize() is available as a top-level function (pd.json_normalize).

response_current_df = pd.json_normalize(response_current_dict)

This solves it!
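To make the flattening easier to see, here is a minimal sketch on a small nested dict. The dict nested is a hypothetical subset of the response; sep is a real json_normalize parameter that changes the separator used in the flattened column names.

```python
import pandas as pd

# Hypothetical subset of the weather response.
nested = {
    "name": "Cheongju-si",
    "coord": {"lon": 127.3794, "lat": 36.5346},
    "wind": {"speed": 1.57, "deg": 91},
}

# Nested keys become dotted column names such as "coord.lon".
flat_df = pd.json_normalize(nested)
print(sorted(flat_df.columns))

# sep changes the separator, e.g. "coord_lon" instead of "coord.lon".
flat_df_us = pd.json_normalize(nested, sep="_")
print(sorted(flat_df_us.columns))
```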

weather	base	visibility	dt	timezone	id	name	cod	coord.lon	coord.lat	...	main.grnd_level	wind.speed	wind.deg	wind.gust	clouds.all	sys.type	sys.id	sys.country	sys.sunrise	sys.sunset
0	[{'id': 800, 'main': 'Clear', 'description': '...	stations	10000	1663655220	32400	1845604	Cheongju-si	200	127.379

type(response_current_df['weather'].iloc[0])
# list

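When the converted DataFrame contains an array

Looking at the weather column, the array is not flattened: the whole list sits in a single cell. It prints like a string, but the type check above shows it is a list.

A plain json_normalize() call does not expand it, so it seems the type has to be checked and any arrays pulled out separately.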
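One option worth trying is json_normalize's record_path and meta parameters, which expand a list of records into rows and attach selected top-level fields to each row. The sketch below assumes a hypothetical, shortened response dict; record_prefix is used because keys such as id and main exist both at the top level and inside weather in the real data, and identical names would conflict.

```python
import pandas as pd

# Hypothetical, shortened version of the response.
response = {
    "name": "Cheongju-si",
    "cod": 200,
    "weather": [
        {"id": 800, "main": "Clear", "description": "clear sky", "icon": "01d"}
    ],
}

weather_df = pd.json_normalize(
    response,
    record_path="weather",     # the list to expand: one row per element
    meta=["name", "cod"],      # top-level fields repeated on every row
    record_prefix="weather.",  # prefix for the columns coming from the list
)

print(weather_df.columns.tolist())
```

This yields one row per element of weather, so a response with several weather entries would produce several rows.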

Comparison cases

Without an array

test = "{'weather': {'id': 800, 'main': 'Clear', 'description': 'clear sky', 'icon': '01d'}, 'base': 'stations'}"
test = test.replace("\'", "\"")
test_dict = json.loads(test)
pd.DataFrame(test_dict)

	weather	base
description	clear sky	stations
icon	01d	stations
id	800	stations
main	Clear	stations

With an array

test = "{'weather': [{'id': 800, 'main': 'Clear', 'description': 'clear sky', 'icon': '01d'}], 'base': 'stations'}"
test = test.replace("\'", "\"")
test_dict = json.loads(test)
test_df = pd.DataFrame(test_dict)

	weather	base
0	{'id': 800, 'main': 'Clear', 'description': 'c...	stations

type(test_df['weather'].iloc[0])
# dict (pandas treats the one-element list as the column's values, so the cell holds the dict itself)
