
[Python] type hinting

by arirang_ 2024. 12. 9.

πŸ“Œ Purpose

Type hinting gives developers and other users of the code information about a function's input and output types at the point where it is written, making the code clearer and easier to maintain.

 

πŸ“Œ Examples

1. Return value: none

def print_world() -> None:
    print("world")

 

-> : the syntax for a return type hint, which specifies what type of value the function returns.

-> None : indicates that the function returns no value, i.e. it simply executes its body and finishes without a return statement.

result = print_world()

result is None (a function with no return statement implicitly returns None).

 

Note: the same function without a return type hint

def print_world():
    print("world")

 

2. Return value: int

def add(x: int, y: int) -> int:
    return x + y

result = add(3, 5)  # the return value is of type int
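Type hints are informational only: Python does not enforce them at runtime, and it is static checkers such as mypy that report violations. A minimal sketch of what that means for the add function above:

def add(x: int, y: int) -> int:  # same function as above
    return x + y

result = add(3, 5)     # 8, an int as hinted
wrong = add("a", "b")  # still runs and returns "ab",
                       # but a static checker such as mypy would flag the str arguments

So the hint documents the contract and lets tools verify it, while the runtime behaviour stays unchanged.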

 


3. Return value: an object

def __call__(
        self,
        text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
        text_pair: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
        text_target: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
        text_pair_target: Optional[
            Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]
        ] = None,
        add_special_tokens: bool = True,
        padding: Union[bool, str, PaddingStrategy] = False,
        truncation: Union[bool, str, TruncationStrategy] = None,
        max_length: Optional[int] = None,
        stride: int = 0,
        is_split_into_words: bool = False,
        pad_to_multiple_of: Optional[int] = None,
        padding_side: Optional[bool] = None,
        return_tensors: Optional[Union[str, TensorType]] = None,
        return_token_type_ids: Optional[bool] = None,
        return_attention_mask: Optional[bool] = None,
        return_overflowing_tokens: bool = False,
        return_special_tokens_mask: bool = False,
        return_offsets_mapping: bool = False,
        return_length: bool = False,
        verbose: bool = True,
        **kwargs,
    ) -> BatchEncoding:
      
        # To avoid duplicating
        all_kwargs = {
            "add_special_tokens": add_special_tokens,
            "padding": padding,
            "truncation": truncation,
            "max_length": max_length,
            "stride": stride,
            "is_split_into_words": is_split_into_words,
            "pad_to_multiple_of": pad_to_multiple_of,
            "padding_side": padding_side,
            "return_tensors": return_tensors,
            "return_token_type_ids": return_token_type_ids,
            "return_attention_mask": return_attention_mask,
            "return_overflowing_tokens": return_overflowing_tokens,
            "return_special_tokens_mask": return_special_tokens_mask,
            "return_offsets_mapping": return_offsets_mapping,
            "return_length": return_length,
            "split_special_tokens": kwargs.pop("split_special_tokens", self.split_special_tokens),
            "verbose": verbose,
        }
        all_kwargs.update(kwargs)
        if text is None and text_target is None:
            raise ValueError("You need to specify either `text` or `text_target`.")
        if text is not None:
            # The context manager will send the inputs as normal texts and not text_target, but we shouldn't change the
            # input mode in this case.
            if not self._in_target_context_manager:
                self._switch_to_input_mode()
            encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
        if text_target is not None:
            self._switch_to_target_mode()
            target_encodings = self._call_one(text=text_target, text_pair=text_pair_target, **all_kwargs)
        # Leave back tokenizer in input mode
        self._switch_to_input_mode()

        if text_target is None:
            return encodings
        elif text is None:
            return target_encodings
        else:
            encodings["labels"] = target_encodings["input_ids"]
            return encodings

The return type of the __call__ method is the BatchEncoding object provided by Hugging Face's transformers library.
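The signature above combines Union and Optional hints from the typing module with a class (BatchEncoding) as the return type. A minimal sketch of the same pattern, using a hypothetical Encoding class purely for illustration:

from typing import List, Optional, Union

class Encoding:
    # hypothetical stand-in for a result object such as BatchEncoding
    def __init__(self, ids: List[int]):
        self.ids = ids

def encode(
    text: Union[str, List[str]],       # accepts a single string or a list of strings
    max_length: Optional[int] = None,  # Optional[int] is shorthand for Union[int, None]
) -> Encoding:                         # the return hint can be any class, not just built-in types
    items = [text] if isinstance(text, str) else text
    ids = [len(t) for t in items]
    if max_length is not None:
        ids = ids[:max_length]
    return Encoding(ids)

print(encode("I love AI.").ids)  # [10]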

 

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# input text
text = "I love AI."
text_target = "AI is amazing."

# ν† ν¬λ‚˜μ΄μ € 호좜
output = tokenizer(text=text, text_target=text_target, padding="max_length", max_length=10, return_tensors="pt")

print(output)

 

Calling the tokenizer like this runs its internal __call__ method.
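This is standard Python behaviour: writing tokenizer(...) is equivalent to tokenizer.__call__(...), because calling an instance is routed to its class's __call__ method. A minimal, unrelated sketch of that mechanism:

class Greeter:
    def __call__(self, name: str) -> str:
        # runs when the instance itself is called like a function
        return f"hello, {name}"

greet = Greeter()
print(greet("world"))           # "hello, world"
print(greet.__call__("world"))  # the identical call, written out explicitly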

 

Example output

BatchEncoding(data={
    "input_ids": tensor([[  101,  1045,  2293,  9931,  1012,   102,     0,     0,     0,     0]]),
    "attention_mask": tensor([[1, 1, 1, 1, 1, 1, 0, 0, 0, 0]]),
    "labels": tensor([[  101,  9931,  2003,  6429,  1012,   102,     0,     0,     0,     0]])
})
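Since BatchEncoding is dictionary-like (and, as far as I know, also exposes its keys as attributes), the fields above can be read directly from output; a small sketch assuming the output variable from the example:

# keys depend on the tokenizer; here they include input_ids, attention_mask, labels
print(output.keys())
print(output["input_ids"].shape)  # torch.Size([1, 10]) because of max_length=10 and return_tensors="pt"
print(output.attention_mask)      # attribute-style access to the same tensor as output["attention_mask"]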

 

πŸ“Œ Source code

https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_base.py#L2788

 


 
