almost 7 years ago
Make pickle Reliable with copyreg
This item is about copyreg, a built-in module that is used together with pickle.
pickle is simple to use. Suppose we have a class:
class GameState(object):
    def __init__(self):
        self.level = 0
        self.lives = 4
state = GameState()
state.level += 1  # Player beat a level
state.lives -= 1  # Player had to try again
We can save the object with pickle:
import pickle
state_path = '/tmp/game_state.bin'
with open(state_path, 'wb') as f:
    pickle.dump(state, f)
with open(state_path, 'rb') as f:
    state_after = pickle.load(f)
# {'lives': 3, 'level': 1}
print(state_after.__dict__)
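However, if a new field is added to the class, the object loaded back from game_state.bin will not have the new field (points), because unpickling restores the saved attributes without calling __init__. It is nevertheless still an instance of GameState, which is confusing.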
class GameState(object):
    def __init__(self):
        self.level = 0
        self.lives = 4
        self.points = 0
with open(state_path, 'rb') as f:
    state_after = pickle.load(f)
# {'lives': 3, 'level': 1}
print(state_after.__dict__)
assert isinstance(state_after, GameState)
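The reason the new field goes missing is that, by default, pickle rebuilds an instance from its saved attributes and never runs __init__. A minimal sketch of that behavior follows; the class name Tracked and its init_calls counter are hypothetical and exist only for this illustration.

```python
import pickle


class Tracked:
    """Hypothetical class that counts how often __init__ runs."""
    init_calls = 0

    def __init__(self):
        Tracked.init_calls += 1
        self.value = 1


obj = Tracked()                               # __init__ runs once here
restored = pickle.loads(pickle.dumps(obj))    # __init__ is not run again

print(Tracked.init_calls)
print(restored.__dict__)
```

Because __init__ is skipped, any attribute that only the newer __init__ sets will be absent from objects loaded from older data.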
copyreg can solve this problem: it lets you register the functions responsible for serializing Python objects.
Default Attribute Values
pickle_game_state() returns a tuple containing the function to use for unpickling and the arguments to pass to that function.
import copyreg
class GameState(object):
    def __init__(self, level=0, lives=4, points=0):
        self.level = level
        self.lives = lives
        self.points = points
def pickle_game_state(game_state):
    kwargs = game_state.__dict__
    return unpickle_game_state, (kwargs,)
def unpickle_game_state(kwargs):
    return GameState(**kwargs)
copyreg.pickle(GameState, pickle_game_state)
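To see the default values take effect, here is a small self-contained sketch. It assumes the data was pickled while the registration was already in place, and it simulates "adding a field later" by redefining the class in the same script; a real project would simply ship a newer version of the module.

```python
import copyreg
import pickle


class GameState:
    """Older version of the class, without the points field."""
    def __init__(self, level=0, lives=4):
        self.level = level
        self.lives = lives


def unpickle_game_state(kwargs):
    # GameState is looked up when this runs, so it resolves to the newest class.
    return GameState(**kwargs)


def pickle_game_state(game_state):
    return unpickle_game_state, (game_state.__dict__,)


copyreg.pickle(GameState, pickle_game_state)

state = GameState()
state.level += 1
serialized = pickle.dumps(state)


class GameState:  # noqa: F811 - simulates the class gaining a field later
    def __init__(self, level=0, lives=4, points=0):
        self.level = level
        self.lives = lives
        self.points = points


state_after = pickle.loads(serialized)
print(state_after.__dict__)
```

Since unpickling now goes through the constructor, the missing points attribute is filled in from the default argument.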
Versioning Classes
copyreg can also be used to record a version number, which makes backward compatibility possible.
Suppose the original class looks like this:
class GameState(object):
    def __init__(self, level=0, lives=4, points=0, magic=5):
        self.level = level
        self.lives = lives
        self.points = points
        self.magic = magic
state = GameState()
state.points += 1000
serialized = pickle.dumps(state)
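Later the class is changed and lives is removed. At that point the default-argument approach no longer works, because the old data still carries a lives value that the constructor does not accept.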
class GameState(object):
    def __init__(self, level=0, points=0, magic=5):
        self.level = level
        self.points = points
        self.magic = magic
# TypeError: __init__() got an unexpected keyword argument 'lives'
pickle.loads(serialized)
Add a version number when serializing, and check it when deserializing:
def pickle_game_state(game_state):
    # Copy the dict so the live object does not gain a 'version' attribute
    kwargs = dict(game_state.__dict__)
    kwargs['version'] = 2
    return unpickle_game_state, (kwargs,)
def unpickle_game_state(kwargs):
    # Data pickled before versioning was introduced has no 'version' key
    version = kwargs.pop('version', 1)
    if version == 1:
        kwargs.pop('lives')
    return GameState(**kwargs)
copyreg.pickle(GameState, pickle_game_state)
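The following sketch walks through the whole upgrade in one script: it produces version 1 data with the old class, then redefines the class and the two helper functions, and loads the old data. Redefining names in place is only a way to simulate two releases of the same module; the pop with a default for lives is a slightly more defensive variant chosen for this sketch.

```python
import copyreg
import pickle


# --- Release 1: class still has lives, no version is recorded ---
class GameState:
    def __init__(self, level=0, lives=4, points=0, magic=5):
        self.level = level
        self.lives = lives
        self.points = points
        self.magic = magic


def unpickle_game_state(kwargs):
    return GameState(**kwargs)


def pickle_game_state(game_state):
    return unpickle_game_state, (game_state.__dict__,)


copyreg.pickle(GameState, pickle_game_state)
state = GameState()
state.points += 1000
old_serialized = pickle.dumps(state)


# --- Release 2: lives is removed, a version number is written ---
class GameState:  # noqa: F811
    def __init__(self, level=0, points=0, magic=5):
        self.level = level
        self.points = points
        self.magic = magic


def unpickle_game_state(kwargs):  # noqa: F811
    version = kwargs.pop('version', 1)  # unversioned data counts as version 1
    if version == 1:
        kwargs.pop('lives', None)       # drop the field that no longer exists
    return GameState(**kwargs)


def pickle_game_state(game_state):  # noqa: F811
    kwargs = dict(game_state.__dict__)  # copy, so the object is not mutated
    kwargs['version'] = 2
    return unpickle_game_state, (kwargs,)


copyreg.pickle(GameState, pickle_game_state)

state_after = pickle.loads(old_serialized)        # version 1 data
fresh = GameState()
round_trip = pickle.loads(pickle.dumps(fresh))    # version 2 data

print(state_after.__dict__)
print(round_trip.__dict__)
```

Both the old and the new data load into the current class, and the version key never ends up as an attribute on the restored objects.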
Stable Import Paths
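When refactoring, if a class is renamed, loading previously serialized objects naturally fails, because the data refers to the class by its old name. copyreg can solve this as well.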
class BetterGameState(object):
    def __init__(self, level=0, points=0, magic=5):
        self.level = level
        self.points = points
        self.magic = magic
copyreg.pickle(BetterGameState, pickle_game_state)
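Notice that the import path of unpickle_game_state(), rather than that of the class, is what gets written into the dumped data. The downside, of course, is that the module containing unpickle_game_state() can no longer be moved to a different path.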
state = BetterGameState()
serialized = pickle.dumps(state)
print(serialized[:35])
>>>
b'\x80\x03c__main__\nunpickle_game_state\nq\x00}'
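As a rough end-to-end check, the sketch below compares data pickled without and with the copyreg registration, then simulates the rename by deleting GameState and pointing unpickle_game_state() at BetterGameState. Doing the rename inside one script, and the variable names plain and registered, are assumptions made for the sake of a runnable illustration.

```python
import copyreg
import pickle


class GameState:
    def __init__(self, level=0, points=0, magic=5):
        self.level = level
        self.points = points
        self.magic = magic


def unpickle_game_state(kwargs):
    kwargs.pop('version', None)
    return GameState(**kwargs)


def pickle_game_state(game_state):
    kwargs = dict(game_state.__dict__)
    kwargs['version'] = 2
    return unpickle_game_state, (kwargs,)


plain = pickle.dumps(GameState())        # no registration: stores the class name
copyreg.pickle(GameState, pickle_game_state)
registered = pickle.dumps(GameState())   # stores the unpickle function's path


# --- Refactor: the class is renamed ---
class BetterGameState:
    def __init__(self, level=0, points=0, magic=5):
        self.level = level
        self.points = points
        self.magic = magic


del GameState


def unpickle_game_state(kwargs):  # noqa: F811 - same name, new target class
    kwargs.pop('version', None)
    return BetterGameState(**kwargs)


copyreg.pickle(BetterGameState, pickle_game_state)

try:
    pickle.loads(plain)
    plain_error = None
except AttributeError as exc:            # the old class name cannot be found
    plain_error = exc

restored = pickle.loads(registered)      # goes through unpickle_game_state
print(type(restored).__name__)
print(plain_error)
```

Only the data written through the registered function survives the rename, which is why the function's own import path has to stay stable afterwards.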