Visualize Covid-19 cases in Shanghai in 2022
# Visualize Covid-19 cases in Shanghai from 2022-03-01
# Acquire the reported Covid-19 cases in SH
- Scrape data from 上海本地宝 (Shanghai Bendibao)
- local_url="http://m.sh.bendibao.com/mip/233243.html"
- the data is provided in a table
- scrape all items from the table
```python
# acquire html
```
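The fetching step is not shown in full here, so the following is only a minimal sketch of one way to do it. It uses the standard library instead of BeautifulSoup (the parsing loop in this post works on BeautifulSoup-style row objects, so this is a swapped-in, dependency-free alternative). The names `fetch_html`, `TableRowParser`, and `extract_table_rows` are hypothetical, and the browser-like User-Agent header is an assumption.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.request import Request, urlopen

LOCAL_URL = "http://m.sh.bendibao.com/mip/233243.html"  # URL given in the post


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its text (hypothetical helper)."""
    # Assumption: a browser-like User-Agent avoids the site rejecting the request.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TableRowParser(HTMLParser):
    """Collect the stripped text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # strip() removes the stray '\n' and '\r' around the numbers
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_table_rows(html: str) -> list[list[str]]:
    """Return every table row in the page as a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Typical use would be `rows = extract_table_rows(fetch_html(LOCAL_URL))`. Each row comes back as a plain list of strings, so the indexing differs from the `find_all('td')` calls used in the main loop.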
- data structure
SH_COVID19_DATA structure (note: the source table, table_url, has no death data)
date | NewInfection | NewAsymptomatic | AllInfection | AllAsymptomatic | Death |
---|---|---|---|---|---|
2022-05-12 | 227 | 1869 | 56754 | 581422 | 0 |
2022-05-11 | 144 | 1305 | 56527 | 579553 | 0 |
… | … | … | … | … | … |
- find all these items in the table
main code below:
```python
for index, row_data in enumerate(table_row):
    # Each date spans two table rows. The even row holds
    # 日期 (date), 本土确诊 (NewInfection) and 累计确诊 (AllInfection).
    if index % 2 == 0:
        cells = row_data.find_all('td')
        cn_date = cells[0].text.strip('\n')
        # convert the Chinese date text to a datetime string
        date = deal_cn_date(cn_date, 2022)  # add later
        NewInfection = int(cells[-2].text.replace('\n', '').replace('\r', ''))
        AllInfection = int(cells[-1].text.replace('\n', '').replace('\r', ''))
        # The following odd row holds 本土无症状 (NewAsymptomatic)
        # and 累计无症状 (AllAsymptomatic).
        next_cells = table_row[index + 1].find_all('td')
        NewAsymptomatic = int(next_cells[-2].text.replace('\n', '').replace('\r', ''))
        AllAsymptomatic = int(next_cells[-1].text.replace('\n', '').replace('\r', ''))
        all_daily_data.append([date, NewInfection, AllInfection, NewAsymptomatic, AllAsymptomatic])
```
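The loop calls `deal_cn_date`, which is marked "add later" and not shown. A minimal sketch of what such a helper could look like follows. It assumes the date cells contain text such as `5月12日` (month, then day, with no year), and it returns an ISO string to match the `date` column of the table above. Both the input format and the return type are assumptions.

```python
import re
from datetime import date

# Assumed cell format: "<month>月<day>日", e.g. "5月12日"
_CN_DATE = re.compile(r"(\d{1,2})\s*月\s*(\d{1,2})\s*日")


def deal_cn_date(cn_date: str, year: int) -> str:
    """Convert a Chinese month/day string to 'YYYY-MM-DD' for the given year."""
    match = _CN_DATE.search(cn_date)
    if match is None:
        raise ValueError(f"unrecognized date text: {cn_date!r}")
    month, day = (int(part) for part in match.groups())
    # date() validates the values, so an impossible date such as 2月30日 raises ValueError
    return date(year, month, day).isoformat()
```

Passing the year explicitly mirrors the call in the loop (`deal_cn_date(cn_date, 2022)`), since the table cell itself is assumed not to carry one.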
# Transform and save data
- dict data form:
SH_COVID19_DATA:
{"Date": list[date], "NewInfection": list[int], "NewAsymptomatic": list[int],
"AllInfection": list[int], "AllAsymptomatic": list[int], "Death": list[int]}
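To get from the per-day rows collected by the scraping loop to this dict form, something like the sketch below could work. The function name `rows_to_dict` is hypothetical. Because the scraped table carries no death data, the `Death` column is filled with a placeholder value (`0` by default), which is an assumption rather than scraped data.

```python
def rows_to_dict(all_daily_data, death_default=0):
    """Turn rows of [date, NewInfection, AllInfection, NewAsymptomatic,
    AllAsymptomatic] into the column-oriented SH_COVID19_DATA dict."""
    data = {
        "Date": [],
        "NewInfection": [],
        "NewAsymptomatic": [],
        "AllInfection": [],
        "AllAsymptomatic": [],
        "Death": [],
    }
    # The row order follows the append call in the scraping loop.
    for day, new_inf, all_inf, new_asym, all_asym in all_daily_data:
        data["Date"].append(day)
        data["NewInfection"].append(new_inf)
        data["NewAsymptomatic"].append(new_asym)
        data["AllInfection"].append(all_inf)
        data["AllAsymptomatic"].append(all_asym)
        # Assumption: no death figures are scraped, so use a placeholder.
        data["Death"].append(death_default)
    return data
```

Note that the row order (`AllInfection` before `NewAsymptomatic`) differs from the key order in the dict, so unpacking by name is safer than relying on position.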
- transform to pandas dataframe
```python
def dict_to_DataFrame(data_dict: dict):
```
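The body of `dict_to_DataFrame` is not shown, so here is a minimal sketch of what it might do, assuming the dict has the keys listed above. Parsing `Date` into datetimes and sorting chronologically are my additions, not something the post confirms.

```python
import pandas as pd


def dict_to_DataFrame(data_dict: dict) -> pd.DataFrame:
    """Build a DataFrame from the column-oriented SH_COVID19_DATA dict."""
    df = pd.DataFrame(data_dict)
    # Assumption: dates arrive as 'YYYY-MM-DD' strings; parse them so that
    # sorting and plotting treat them as real dates.
    df["Date"] = pd.to_datetime(df["Date"])
    # The scraped table lists the newest day first; sort oldest first.
    return df.sort_values("Date").reset_index(drop=True)
```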
- save the pandas DataFrame to a SQLite table, CSV, XLSX, and JSON
main code:
```python
# sqlite table
```
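Only the first comment of the saving code survives here, so the following is a sketch under assumptions: the function name `save_all_formats`, the file names, and the table name `SH_COVID19_DATA` are all hypothetical. The XLSX export is optional because `DataFrame.to_excel` needs an Excel writer engine such as openpyxl to be installed.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

import pandas as pd


def save_all_formats(df: pd.DataFrame, out_dir, table_name="SH_COVID19_DATA",
                     include_xlsx=False) -> dict:
    """Write df to SQLite, CSV and JSON (and optionally XLSX); return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Store dates as plain 'YYYY-MM-DD' text so every format agrees.
    export = df.copy()
    export["Date"] = pd.to_datetime(export["Date"]).dt.strftime("%Y-%m-%d")

    paths = {
        "sqlite": out / "sh_covid19.db",
        "csv": out / "sh_covid19.csv",
        "json": out / "sh_covid19.json",
    }

    # sqlite table: replace any table left over from an earlier run
    with closing(sqlite3.connect(paths["sqlite"])) as conn:
        export.to_sql(table_name, conn, if_exists="replace", index=False)
        conn.commit()

    export.to_csv(paths["csv"], index=False)
    export.to_json(paths["json"], orient="records", force_ascii=False)

    if include_xlsx:
        # Requires an Excel engine (for example openpyxl) to be installed.
        paths["xlsx"] = out / "sh_covid19.xlsx"
        export.to_excel(paths["xlsx"], index=False)

    return paths
```

Using `if_exists="replace"` keeps repeated scraping runs idempotent. Switching to `"append"` would need a check for duplicate dates first.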
# Plot data
via plotnine
- line plot
```python
# import pandas as pd
```
- Calendar plot
```python
def calendar_map_Covid19data_SH(cal_data: pd.DataFrame):
```
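The body of `calendar_map_Covid19data_SH` is not shown. A calendar plot needs each date placed on a grid of weekday versus week, so the sketch below covers only that data-preparation step. The helper name `add_calendar_columns` and the added column names are hypothetical. The resulting columns could then be mapped to tiles, for example with plotnine's `geom_tile` and one facet per month, but that is one possible design, not necessarily the one used here.

```python
import pandas as pd


def add_calendar_columns(cal_data: pd.DataFrame) -> pd.DataFrame:
    """Add Month, Weekday (Monday=0) and WeekOfMonth (row in a month grid)."""
    out = cal_data.copy()
    dates = pd.to_datetime(out["Date"])
    out["Month"] = dates.dt.month
    out["Weekday"] = dates.dt.weekday
    # Weekday of the 1st of each date's month decides how far the first
    # calendar row is shifted.
    first_of_month = dates - pd.to_timedelta(dates.dt.day - 1, unit="D")
    first_weekday = first_of_month.dt.weekday
    # Row index inside the month, counting Monday-started weeks from 0.
    out["WeekOfMonth"] = (dates.dt.day - 1 + first_weekday) // 7
    return out
```

With these columns, `Weekday` serves as one axis, `WeekOfMonth` as the other, and a case count such as `NewInfection` as the fill color.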
Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit Cythnia when reposting!