faknow.model.content_based
- faknow.model.content_based.multi_modal
- faknow.model.content_based.multi_modal.cafe
- faknow.model.content_based.multi_modal.eann
- faknow.model.content_based.multi_modal.hmcan
- faknow.model.content_based.multi_modal.mcan
- faknow.model.content_based.multi_modal.mfan
- faknow.model.content_based.multi_modal.safe
- faknow.model.content_based.multi_modal.spotfake
faknow.model.content_based.endef
- class faknow.model.content_based.endef.ENDEF(pre_trained_bert_name: str, base_model: AbstractModel, mlp_dims: List[int] | None = None, dropout_rate=0.2, entity_weight=0.1, loss_weight=0.2)[source]
Bases:
AbstractModel
Generalizing to the Future: Mitigating Entity Bias in Fake News Detection, SIGIR 2022 paper: https://dl.acm.org/doi/10.1145/3477495.3531816 code: https://github.com/ICTMCG/ENDEF-SIGIR2022
- __init__(pre_trained_bert_name: str, base_model: AbstractModel, mlp_dims: List[int] | None = None, dropout_rate=0.2, entity_weight=0.1, loss_weight=0.2)[source]
- Parameters:
pre_trained_bert_name (str) – the name or local path of the pre-trained BERT model
base_model (AbstractModel) – the content-based base model to combine with entity features
mlp_dims (List[int]) – a list of the dimensions of the MLP layers; if None, [384] is used as default
dropout_rate (float) – dropout rate. Default=0.2
entity_weight (float) – the weight of the entity-based prediction during training. Default=0.1
loss_weight (float) – the weight of the entity loss term in the total loss. Default=0.2
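A minimal construction sketch based on the signature above; the BERT name and the MDFEND base model are illustrative, any content-based AbstractModel works:

    from faknow.model.content_based.endef import ENDEF
    from faknow.model.content_based.mdfend import MDFEND

    # hypothetical base model; replace with whichever content-based model you train
    base_model = MDFEND('bert-base-uncased', domain_num=9)

    model = ENDEF(
        pre_trained_bert_name='bert-base-uncased',  # name or local path of pre-trained BERT
        base_model=base_model,
        mlp_dims=None,      # falls back to [384]
        dropout_rate=0.2,
        entity_weight=0.1,  # weight of the entity-based prediction during training
        loss_weight=0.2,    # weight of the entity loss term
    )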
- calculate_loss(data) Tensor [source]
calculate loss via BCELoss
- Parameters:
data (dict) – batch data dict
- Returns:
loss value
- Return type:
loss (Tensor)
- dict_to_dict(inputs: Dict)[source]
flatten inputs into a single-level dict if it is nested
- Parameters:
inputs (Dict) – the dict to be processed
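To illustrate the flattening, here is a hypothetical stand-alone equivalent (the library's actual implementation may differ):

    from typing import Any, Dict

    def flatten_one_level(inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Lift one level of nesting: {'text': {'token_id': t}} -> {'token_id': t}."""
        flat: Dict[str, Any] = {}
        for key, value in inputs.items():
            if isinstance(value, dict):
                flat.update(value)  # promote nested entries to the top level
            else:
                flat[key] = value
        return flat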
- forward(base_model_params: Dict, entity_token_id: Tensor, entity_mask: Tensor)[source]
- Parameters:
base_model_params (Dict) – a dict containing all the parameters that base_model.forward() needs
entity_token_id (Tensor) – entity token ids from the BERT tokenizer, shape=(batch_size, max_len)
entity_mask (Tensor) – mask from the BERT tokenizer, shape=(batch_size, max_len)
- Returns:
unbiased_prediction (Tensor): prediction of being fake, combining both biased_prediction and entity_prediction, shape=(batch_size,).
entity_prediction (Tensor): prediction based on entity features alone, shape=(batch_size,).
- Return type:
tuple
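A sketch of calling forward(); the keys inside base_model_params are hypothetical and depend entirely on the chosen base model:

    import torch

    batch_size, max_len = 8, 170
    entity_token_id = torch.randint(0, 30522, (batch_size, max_len))
    entity_mask = torch.ones(batch_size, max_len, dtype=torch.long)

    unbiased_pred, entity_pred = model(
        base_model_params={'token_id': entity_token_id,  # hypothetical keys:
                           'mask': entity_mask,          # whatever base_model.forward() expects
                           'domain': torch.zeros(batch_size, dtype=torch.long)},
        entity_token_id=entity_token_id,
        entity_mask=entity_mask,
    )
    # unbiased_pred and entity_pred both have shape=(batch_size,)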
- predict(data_without_label) Tensor [source]
predict the probability of being fake news
- Parameters:
data_without_label (Dict[str, Any]) – batch data dict
- Returns:
same shape as the output of base_model
- Return type:
Tensor
- training: bool
faknow.model.content_based.m3fend
- class faknow.model.content_based.m3fend.M3FEND(emb_dim, mlp_dims, dropout, semantic_num, emotion_num, style_num, LNN_dim, domain_num, dataset)[source]
Bases:
AbstractModel
M3FEND: Memory-Guided Multi-View Multi-Domain Fake News Detection paper: https://ieeexplore.ieee.org/document/9802916, TKDE 2022 code: https://github.com/ICTMCG/M3FEND
- __init__(emb_dim, mlp_dims, dropout, semantic_num, emotion_num, style_num, LNN_dim, domain_num, dataset)[source]
- Parameters:
emb_dim (int) – Dimensionality of the embeddings.
mlp_dims (List[int]) – List of dimensions for the MLP layers.
dropout (float) – Dropout probability.
semantic_num (int) – Number of semantic experts.
emotion_num (int) – Number of emotion experts.
style_num (int) – Number of style experts.
LNN_dim (int) – Dimensionality of the Latent Neural Network (LNN).
domain_num (int) – Number of domains.
dataset (str) – Dataset identifier (‘ch’ for Chinese, ‘en’ for English).
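A construction sketch with illustrative hyperparameter values (not a reference configuration):

    from faknow.model.content_based.m3fend import M3FEND

    model = M3FEND(
        emb_dim=768,     # e.g. BERT hidden size
        mlp_dims=[384],
        dropout=0.2,
        semantic_num=7,  # number of semantic experts
        emotion_num=7,   # number of emotion experts
        style_num=2,     # number of style experts
        LNN_dim=50,
        domain_num=3,
        dataset='en',    # 'ch' for Chinese, 'en' for English
    )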
- calculate_loss(batch_data) Tensor [source]
Calculate the loss for the M3FEND model.
- Parameters:
batch_data (Dict[str, Tensor]) – input data containing the 'content', 'content_masks', 'comments', 'comments_masks', 'content_emotion', 'comments_emotion', 'emotion_gap', 'style_feature', 'category' and 'label' tensors.
- Returns:
loss
- Return type:
Tensor
- forward(**kwargs)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- init_memory()[source]
Create a domain memory for each domain via K-Means clustering; each domain memory holds the cluster centers of the sample features within that domain. This helps the model learn representative features of each domain and improves its adaptability to data from different domains.
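A conceptual sketch of this clustering step using scikit-learn; this is not the library's exact code:

    import numpy as np
    from sklearn.cluster import KMeans

    def build_domain_memory(features_by_domain, memory_num=10):
        """Cluster each domain's sample features; the centers form that domain's memory."""
        memory = {}
        for domain, feats in features_by_domain.items():
            km = KMeans(n_clusters=memory_num, n_init=10).fit(np.stack(feats))
            memory[domain] = km.cluster_centers_  # shape=(memory_num, feature_dim)
        return memory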
- predict(data_without_label) Tensor [source]
predict the probability of being fake news
- Parameters:
data_without_label (Dict[str, Tensor]) – batch data dict
- Returns:
softmax probability, shape=(batch_size, 2)
- Return type:
Tensor
- save_feature(**kwargs)[source]
Save the normalized features of all samples into the self.all_feature dict, grouped by domain. Each key is an integer identifying a domain, and each value is a list of the features of all samples in that domain, with each feature stored as a NumPy array.
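The grouping could look like this sketch (variable names are hypothetical):

    from collections import defaultdict

    all_feature = defaultdict(list)
    for feature, domain in zip(normalized_features, domains):  # per-sample tensors
        all_feature[int(domain)].append(feature.detach().cpu().numpy())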
- training: bool
- class faknow.model.content_based.m3fend.MemoryNetwork(input_dim, emb_dim, domain_num, memory_num=10)[source]
Bases:
Module
- __init__(input_dim, emb_dim, domain_num, memory_num=10)[source]
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(feature, category)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
faknow.model.content_based.mdfend
- class faknow.model.content_based.mdfend.MDFEND(pre_trained_bert_name: str, domain_num: int, mlp_dims: List[int] | None = None, dropout_rate=0.2, expert_num=5)[source]
Bases:
AbstractModel
MDFEND: Multi-domain Fake News Detection, CIKM 2021 paper: https://dl.acm.org/doi/10.1145/3459637.3482139 code: https://github.com/kennqiang/MDFEND-Weibo21
- __init__(pre_trained_bert_name: str, domain_num: int, mlp_dims: List[int] | None = None, dropout_rate=0.2, expert_num=5)[source]
- Parameters:
pre_trained_bert_name (str) – the name or local path of the pre-trained BERT model
domain_num (int) – total number of domains
mlp_dims (List[int]) – a list of the dimensions of the MLP layers; if None, [384] is used as default
dropout_rate (float) – rate of the Dropout layer, default=0.2
expert_num (int) – number of experts, each implemented as a TextCNNLayer, default=5
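A minimal construction sketch; domain_num=9 matches the nine domains of Weibo21, the other values are the defaults:

    from faknow.model.content_based.mdfend import MDFEND

    model = MDFEND(
        pre_trained_bert_name='bert-base-uncased',  # or a local path
        domain_num=9,
        mlp_dims=None,  # falls back to [384]
        dropout_rate=0.2,
        expert_num=5,
    )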
- calculate_loss(data) Tensor [source]
calculate loss via BCELoss
- Parameters:
data (dict) – batch data dict
- Returns:
loss value
- Return type:
loss (Tensor)
- forward(token_id: Tensor, mask: Tensor, domain: Tensor) Tensor [source]
- Parameters:
token_id (Tensor) – token ids from the BERT tokenizer, shape=(batch_size, max_len)
mask (Tensor) – mask from the BERT tokenizer, shape=(batch_size, max_len)
domain (Tensor) – domain ids, shape=(batch_size,)
- Returns:
the prediction of being fake, shape=(batch_size,)
- Return type:
FloatTensor
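A sketch of preparing inputs with a Hugging Face tokenizer and calling forward(); max_length is illustrative:

    import torch
    from transformers import BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
    enc = tokenizer(['example news text'], padding='max_length',
                    truncation=True, max_length=170, return_tensors='pt')
    domain = torch.tensor([0])  # one domain id per sample

    prob = model(enc['input_ids'], enc['attention_mask'], domain)  # shape=(batch_size,)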
- predict(data_without_label) Tensor [source]
predict the probability of being fake news
- Parameters:
data_without_label (Dict[str, Any]) – batch data dict
- Returns:
two-class probability, shape=(batch_size, 2)
- Return type:
Tensor
- training: bool
faknow.model.content_based.textcnn
- class faknow.model.content_based.textcnn.TextCNN(word_vectors: torch.Tensor, filter_num=100, kernel_sizes: List[int] | None = None, activate_func: Callable | None = <function relu>, dropout=0.5, freeze=False)[source]
Bases:
AbstractModel
Convolutional Neural Networks for Sentence Classification, EMNLP 2014 paper: https://aclanthology.org/D14-1181/ code: https://github.com/yoonkim/CNN_sentence
- __init__(word_vectors: torch.Tensor, filter_num=100, kernel_sizes: List[int] | None = None, activate_func: Callable | None = <function relu>, dropout=0.5, freeze=False)[source]
- Parameters:
word_vectors (torch.Tensor) – weights of the word embedding layer, shape=(vocab_size, embedding_size)
filter_num (int) – number of filters in the conv layer. Default=100
kernel_sizes (List[int]) – list of different kernel sizes for the TextCNNLayer. Default=[3, 4, 5]
activate_func (Callable) – activation function for the TextCNNLayer. Default=relu
dropout (float) – dropout rate of the fully connected layer. Default=0.5
freeze (bool) – whether to freeze the word embedding weights during training. Default=False
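A construction sketch with randomly initialized embeddings standing in for pre-trained word vectors:

    import torch
    from faknow.model.content_based.textcnn import TextCNN

    vocab_size, embedding_size = 5000, 300
    word_vectors = torch.randn(vocab_size, embedding_size)  # stand-in for GloVe/word2vec weights

    model = TextCNN(word_vectors, filter_num=100,
                    kernel_sizes=[3, 4, 5],  # the default
                    dropout=0.5, freeze=False)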
- calculate_loss(data) Tensor [source]
calculate loss via CrossEntropyLoss
- Parameters:
data – batch data tuple
- Returns:
loss
- Return type:
torch.Tensor
- forward(text: Tensor) Tensor [source]
- Parameters:
text – batch data, shape=(batch_size, max_len)
- Returns:
output, shape=(batch_size, 2)
- Return type:
Tensor
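Continuing the sketch above, a forward call on a batch of token ids:

    text = torch.randint(0, vocab_size, (8, 256))  # shape=(batch_size, max_len)
    logits = model(text)                           # shape=(batch_size, 2)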
- predict(data_without_label)[source]
predict the probability of being fake news
- Parameters:
data_without_label – batch data
- Returns:
softmax probability, shape=(batch_size, 2)
- Return type:
Tensor
- training: bool