about 4 years ago

Make pickle Reliable with copyreg

This item covers copyreg, a built-in module that is used together with pickle.

pickle is simple to use. Suppose we have a class:

class GameState(object):
    def __init__(self):
        self.level = 0
        self.lives = 4

state = GameState()
state.level += 1  # Player beat a level

state.lives -= 1  # Player had to try again

We can save the object with pickle:

import pickle
state_path = '/tmp/game_state.bin'
with open(state_path, 'wb') as f:
    pickle.dump(state, f)

with open(state_path, 'rb') as f:
    state_after = pickle.load(f)

print(state_after.__dict__)
# {'lives': 3, 'level': 1}

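However, if a new field (points) is added to the class, the object loaded back from game_state.bin will not have it, because pickle restores the saved attributes without calling __init__. The object is still an instance of GameState, which is confusing.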

class GameState(object):
    def __init__(self):
        self.level = 0
        self.lives = 4
        self.points = 0

with open(state_path, 'rb') as f:
    state_after = pickle.load(f)

print(state_after.__dict__)
# {'lives': 3, 'level': 1}
assert isinstance(state_after, GameState)
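To make the problem concrete without touching the file system, here is a minimal sketch that uses pickle.dumps() and pickle.loads() in place of the file, and redefines the class in the same script to simulate a later version of the program. The names old_bytes and has_points are only for illustration.

```python
import pickle

# First version of the class: no `points` field yet.
class GameState(object):
    def __init__(self):
        self.level = 0
        self.lives = 4

old_bytes = pickle.dumps(GameState())

# Simulate a later release by redefining the class with a new field.
class GameState(object):
    def __init__(self):
        self.level = 0
        self.lives = 4
        self.points = 0

# pickle looks the class up by name and restores the saved __dict__
# without calling __init__, so `points` is never set.
state_after = pickle.loads(old_bytes)
has_points = hasattr(state_after, 'points')

print(isinstance(state_after, GameState), has_points)
# True False
```

Any code that later reads state_after.points would fail with an AttributeError, even though the object passes the isinstance check.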

copyreg solves this problem: it lets you register the functions that are used to serialize Python objects.

Default Attribute Values

pickle_game_state() returns a tuple containing the function to use for unpickling and the arguments to pass to that function.

import copyreg

class GameState(object):
    def __init__(self, level=0, lives=4, points=0):
        self.level = level
        self.lives = lives
        self.points = points

def pickle_game_state(game_state):
    kwargs = game_state.__dict__
    return unpickle_game_state, (kwargs,)

def unpickle_game_state(kwargs):
    return GameState(**kwargs)

copyreg.pickle(GameState, pickle_game_state)
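The following sketch shows why this helps, under the assumption that the old data was also written through the registered function. It pickles a state from a version of the class that has no points parameter, then redefines the class in the same script (standing in for a later release) and loads the old bytes. The variable old_bytes is only for illustration.

```python
import copyreg
import pickle

# Older version of the class, before `points` existed.
class GameState(object):
    def __init__(self, level=0, lives=4):
        self.level = level
        self.lives = lives

def pickle_game_state(game_state):
    kwargs = game_state.__dict__
    return unpickle_game_state, (kwargs,)

def unpickle_game_state(kwargs):
    # GameState is looked up when this runs, so it uses the current class.
    return GameState(**kwargs)

copyreg.pickle(GameState, pickle_game_state)

state = GameState()
state.level += 1
state.lives -= 1
old_bytes = pickle.dumps(state)

# Newer version of the class adds `points` with a default value.
class GameState(object):
    def __init__(self, level=0, lives=4, points=0):
        self.level = level
        self.lives = lives
        self.points = points

copyreg.pickle(GameState, pickle_game_state)

# Unpickling now goes through the constructor, so the default is applied.
state_after = pickle.loads(old_bytes)
print(state_after.__dict__)
# {'level': 1, 'lives': 3, 'points': 0}
```

Because unpickling calls the constructor rather than restoring the raw __dict__, any attribute missing from the old data picks up its default value.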

Versioning Classes

copyreg can also be used to record a version number, which makes backward compatibility possible.

Suppose the original class looks like this:

class GameState(object):
    def __init__(self, level=0, lives=4, points=0, magic=5):
        self.level = level
        self.lives = lives
        self.points = points
        self.magic = magic

state = GameState()
state.points += 1000
serialized = pickle.dumps(state)

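Later the class is changed to remove lives. At that point the default-argument approach no longer works, because the old data still passes lives to the constructor.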

class GameState(object):
    def __init__(self, level=0, points=0, magic=5):
        self.level = level
        self.points = points
        self.magic = magic

pickle.loads(serialized)
# TypeError: __init__() got an unexpected keyword argument 'lives'

Add a version number when serializing, and check it when deserializing:

def pickle_game_state(game_state):
    kwargs = dict(game_state.__dict__)  # copy, so 'version' is not added to the instance itself
    kwargs['version'] = 2
    return unpickle_game_state, (kwargs,)

def unpickle_game_state(kwargs):
    version = kwargs.pop('version', 1)
    if version == 1:
        kwargs.pop('lives')
    return GameState(**kwargs)

copyreg.pickle(GameState, pickle_game_state)
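As an end-to-end check, the sketch below writes version 1 data (with lives and no version key), then switches to the version 2 class and functions and loads both old and new data. The helper name pickle_game_state_v1 is hypothetical and stands in for the earlier, unversioned pickle_game_state. The dict copies and the tolerant pop('lives', None) are small defensive choices of this sketch, not part of the original code.

```python
import copyreg
import pickle

# Version 1 of the class still has `lives`.
class GameState(object):
    def __init__(self, level=0, lives=4, points=0, magic=5):
        self.level = level
        self.lives = lives
        self.points = points
        self.magic = magic

def unpickle_game_state(kwargs):
    # Data written before versioning has no 'version' key, so treat it as 1.
    version = kwargs.pop('version', 1)
    if version == 1:
        kwargs.pop('lives', None)
    return GameState(**kwargs)

# Hypothetical name for the original, unversioned pickling function.
def pickle_game_state_v1(game_state):
    return unpickle_game_state, (dict(game_state.__dict__),)

copyreg.pickle(GameState, pickle_game_state_v1)
state = GameState()
state.points += 1000
v1_bytes = pickle.dumps(state)

# Version 2 of the class drops `lives`.
class GameState(object):
    def __init__(self, level=0, points=0, magic=5):
        self.level = level
        self.points = points
        self.magic = magic

def pickle_game_state(game_state):
    kwargs = dict(game_state.__dict__)
    kwargs['version'] = 2
    return unpickle_game_state, (kwargs,)

copyreg.pickle(GameState, pickle_game_state)

restored_old = pickle.loads(v1_bytes)
restored_new = pickle.loads(pickle.dumps(GameState(points=7)))
print(restored_old.__dict__)
# {'level': 0, 'points': 1000, 'magic': 5}
print(restored_new.__dict__)
# {'level': 0, 'points': 7, 'magic': 5}
```

Each future change to the class would add another branch on version in unpickle_game_state(), so all the migration logic stays in one place.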

Stable Import Paths

When refactoring, if a class is renamed, loading objects that were serialized under the old name would normally fail. copyreg can solve this as well.

class BetterGameState(object):
    def __init__(self, level=0, points=0, magic=5):
        self.level = level
        self.points = points
        self.magic = magic

copyreg.pickle(BetterGameState, pickle_game_state)

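Notice that the import path of unpickle_game_state(), rather than that of the class, is written into the dumped data. The downside is that the module containing unpickle_game_state() can no longer be moved or renamed.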

state = BetterGameState()
serialized = pickle.dumps(state)
print(serialized[:35])
>>>
b'\x80\x03c__main__\nunpickle_game_state\nq\x00}'
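The rename scenario can be sketched in a single script as follows. It assumes the old data was written through copyreg before the rename, and that unpickle_game_state() is updated after the rename to construct BetterGameState. That update is an assumption of this sketch and is not shown in the code above. The names old_bytes and stores_function_path are only for illustration.

```python
import copyreg
import pickle

# Class under its original name.
class GameState(object):
    def __init__(self, level=0, points=0, magic=5):
        self.level = level
        self.points = points
        self.magic = magic

def pickle_game_state(game_state):
    kwargs = dict(game_state.__dict__)
    kwargs['version'] = 2
    return unpickle_game_state, (kwargs,)

def unpickle_game_state(kwargs):
    kwargs.pop('version', None)
    return GameState(**kwargs)

copyreg.pickle(GameState, pickle_game_state)
old_bytes = pickle.dumps(GameState(level=3))

# Refactor: the old class disappears and a renamed one replaces it.
del GameState

class BetterGameState(object):
    def __init__(self, level=0, points=0, magic=5):
        self.level = level
        self.points = points
        self.magic = magic

# Same function name and module, now building the renamed class.
def unpickle_game_state(kwargs):
    kwargs.pop('version', None)
    return BetterGameState(**kwargs)

copyreg.pickle(BetterGameState, pickle_game_state)

# The old bytes refer to unpickle_game_state, so they still load.
restored = pickle.loads(old_bytes)
stores_function_path = b'unpickle_game_state' in old_bytes
print(type(restored).__name__, restored.level)
# BetterGameState 3
```

The serialized data only needs unpickle_game_state to stay importable from the same module, so the class itself is free to change names.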