faknow.model.content_based

faknow.model.content_based.endef

class faknow.model.content_based.endef.ENDEF(pre_trained_bert_name: str, base_model: AbstractModel, mlp_dims: List[int] | None = None, dropout_rate=0.2, entity_weight=0.1, loss_weight=0.2)[source]

Bases: AbstractModel

Generalizing to the Future: Mitigating Entity Bias in Fake News Detection, SIGIR 2022 paper: https://dl.acm.org/doi/10.1145/3477495.3531816 code: https://github.com/ICTMCG/ENDEF-SIGIR2022

__init__(pre_trained_bert_name: str, base_model: AbstractModel, mlp_dims: List[int] | None = None, dropout_rate=0.2, entity_weight=0.1, loss_weight=0.2)[source]
Parameters:
  • pre_trained_bert_name (str) – the name or local path of pre-trained bert model

  • base_model (AbstractModel) – the content-based base model used together with entity features

  • mlp_dims (List[int]) – a list of the dimensions in MLP layer; if None, [384] will be taken as default

  • dropout_rate (float) – dropout rate. Default=0.2

  • entity_weight (float) – the weight of the entity prediction during training. Default=0.1

  • loss_weight (float) – the weight of the entity loss in the total loss. Default=0.2
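
A minimal construction sketch (not taken from the library's docs): it assumes MDFEND, documented later in this module, as the content-based base model; the BERT name and domain count are illustrative placeholders.

    from faknow.model.content_based.endef import ENDEF
    from faknow.model.content_based.mdfend import MDFEND

    # assumption: MDFEND acts as the content-based base model;
    # 'bert-base-uncased' and domain_num=9 are placeholders
    base_model = MDFEND('bert-base-uncased', domain_num=9)

    model = ENDEF(
        pre_trained_bert_name='bert-base-uncased',
        base_model=base_model,
        mlp_dims=None,       # falls back to [384]
        dropout_rate=0.2,
        entity_weight=0.1,
        loss_weight=0.2,
    )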

calculate_loss(data) Tensor[source]

calculate loss via BCELoss

Parameters:

data (dict) – batch data dict

Returns:

loss value

Return type:

loss (Tensor)

dict_to_dict(inputs: Dict)[source]

flatten inputs into a single-layer dict if it is nested

Parameters:

inputs (Dict) – dict to be processed

forward(base_model_params: Dict, entity_token_id: Tensor, entity_mask: Tensor)[source]
Parameters:
  • base_model_params (Dict) – a dictionary containing all params that base_model.forward() needs

  • entity_token_id (Tensor) – entity's token ids from bert tokenizer, shape=(batch_size, max_len)

  • entity_mask (Tensor) – mask from bert tokenizer, shape=(batch_size, max_len)

Returns:

  • unbiased_prediction (Tensor): prediction of being fake, combining both biased_prediction and entity_prediction, shape=(batch_size,).

  • entity_prediction (Tensor): prediction based on entity features, shape=(batch_size,).

Return type:

tuple
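
A hedged forward-pass sketch with dummy tensors, again assuming an MDFEND base model, so that base_model_params carries the token_id, mask and domain keys MDFEND.forward() expects; batch_size and max_len are placeholders.

    import torch

    from faknow.model.content_based.endef import ENDEF
    from faknow.model.content_based.mdfend import MDFEND

    model = ENDEF('bert-base-uncased', MDFEND('bert-base-uncased', domain_num=9))

    batch_size, max_len = 4, 170  # placeholder sizes
    base_model_params = {
        'token_id': torch.randint(0, 30000, (batch_size, max_len)),
        'mask': torch.ones(batch_size, max_len, dtype=torch.long),
        'domain': torch.randint(0, 9, (batch_size,)),
    }
    entity_token_id = torch.randint(0, 30000, (batch_size, max_len))
    entity_mask = torch.ones(batch_size, max_len, dtype=torch.long)

    # returns (unbiased_prediction, entity_prediction), each of shape (batch_size,)
    unbiased_prediction, entity_prediction = model(
        base_model_params, entity_token_id, entity_mask
    )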

predict(data_without_label) Tensor[source]

predict the probability of being fake news

Parameters:

data_without_label (Dict[str, Any]) – batch data dict

Returns:

shape is the same as that of base_model's prediction

Return type:

Tensor

training: bool

faknow.model.content_based.m3fend

class faknow.model.content_based.m3fend.M3FEND(emb_dim, mlp_dims, dropout, semantic_num, emotion_num, style_num, LNN_dim, domain_num, dataset)[source]

Bases: AbstractModel

Memory-Guided Multi-View Multi-Domain Fake News Detection, TKDE 2022 paper: https://ieeexplore.ieee.org/document/9802916 code: https://github.com/ICTMCG/M3FEND

__init__(emb_dim, mlp_dims, dropout, semantic_num, emotion_num, style_num, LNN_dim, domain_num, dataset)[source]
Parameters:
  • emb_dim (int) – Dimensionality of the embeddings.

  • mlp_dims (List[int]) – List of dimensions for the MLP layers.

  • dropout (float) – Dropout probability.

  • semantic_num (int) – Number of semantic experts.

  • emotion_num (int) – Number of emotion experts.

  • style_num (int) – Number of style experts.

  • LNN_dim (int) – Dimensionality of the Latent Neural Network (LNN).

  • domain_num (int) – Number of domains.

  • dataset (str) – Dataset identifier (‘ch’ for Chinese, ‘en’ for English).
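
A construction sketch with illustrative hyperparameters; the expert counts, LNN_dim and domain_num below are placeholders rather than prescribed values.

    from faknow.model.content_based.m3fend import M3FEND

    model = M3FEND(
        emb_dim=768,        # embedding dimensionality
        mlp_dims=[384],     # MLP layer sizes
        dropout=0.2,
        semantic_num=7,     # number of semantic experts
        emotion_num=7,      # number of emotion experts
        style_num=2,        # number of style experts
        LNN_dim=50,         # latent neural network dimensionality
        domain_num=3,
        dataset='en',       # 'ch' for Chinese, 'en' for English
    )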

calculate_loss(batch_data) Tensor[source]

Calculate the loss for the M3FEND model.

Parameters:

batch_data (Dict[str, Tensor]) – input data containing 'content', 'content_masks', 'comments', 'comments_masks', 'content_emotion', 'comments_emotion', 'emotion_gap', 'style_feature', 'category' and 'label' tensors

Returns:

loss

Return type:

Tensor

forward(**kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_memory()[source]

Creates a domain memory for each domain via K-Means clustering; each domain memory contains the cluster centers of the sample features within that domain. This helps the model learn representative features within each domain and improves its adaptability to data from different domains.
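
A conceptual sketch of this clustering step, assuming scikit-learn's KMeans; the function and names are illustrative, not the library's exact implementation.

    import numpy as np
    from sklearn.cluster import KMeans

    def build_domain_memory(features_by_domain, memory_num=10):
        """Cluster each domain's sample features; the cluster centers
        become that domain's memory slots. Illustrative only."""
        memory = {}
        for domain, feats in features_by_domain.items():
            feats = np.stack(feats)                   # (n_samples, emb_dim)
            kmeans = KMeans(n_clusters=memory_num, n_init=10).fit(feats)
            memory[domain] = kmeans.cluster_centers_  # (memory_num, emb_dim)
        return memory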

predict(data_without_label) Tensor[source]

predict the probability of being fake news

Parameters:

data_without_label (Dict[str, Tensor]) – batch data dict

Returns:

softmax probability, shape=(batch_size, 2)

Return type:

Tensor

save_feature(**kwargs)[source]

Saves the normalized features of all samples into the self.all_feature dictionary, grouped by their domain. Each key is an integer corresponding to a domain; the value is a list containing the features of all samples in that domain, each represented as a NumPy array.
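
A rough sketch of the grouping described above; the names are illustrative, not the library's.

    from collections import defaultdict

    import numpy as np

    all_feature = defaultdict(list)  # domain id -> list of per-sample features

    def save_features(features, domains):
        """Group normalized sample features by their domain id. Illustrative only."""
        for feat, domain in zip(features, domains):
            all_feature[int(domain)].append(np.asarray(feat))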

training: bool

write(**kwargs)[source]

class faknow.model.content_based.m3fend.MemoryNetwork(input_dim, emb_dim, domain_num, memory_num=10)[source]

Bases: Module

__init__(input_dim, emb_dim, domain_num, memory_num=10)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(feature, category)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool

write(all_feature, category)[source]

faknow.model.content_based.m3fend.cal_length(x)[source]

faknow.model.content_based.m3fend.convert_to_onehot(label, batch_size, num)[source]

faknow.model.content_based.m3fend.norm(x)[source]

faknow.model.content_based.mdfend

class faknow.model.content_based.mdfend.MDFEND(pre_trained_bert_name: str, domain_num: int, mlp_dims: List[int] | None = None, dropout_rate=0.2, expert_num=5)[source]

Bases: AbstractModel

MDFEND: Multi-domain Fake News Detection, CIKM 2021 paper: https://dl.acm.org/doi/10.1145/3459637.3482139 code: https://github.com/kennqiang/MDFEND-Weibo21

__init__(pre_trained_bert_name: str, domain_num: int, mlp_dims: List[int] | None = None, dropout_rate=0.2, expert_num=5)[source]
Parameters:
  • pre_trained_bert_name (str) – the name or local path of pre-trained bert model

  • domain_num (int) – total number of all domains

  • mlp_dims (List[int]) – a list of the dimensions in MLP layer; if None, [384] will be taken as default

  • dropout_rate (float) – rate of Dropout layer, default=0.2

  • expert_num (int) – number of experts (TextCNNLayer instances), default=5
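
A minimal construction sketch; 'bert-base-uncased' is a placeholder, and domain_num=9 is only illustrative (it should match the number of domains in your data).

    from faknow.model.content_based.mdfend import MDFEND

    model = MDFEND(
        pre_trained_bert_name='bert-base-uncased',
        domain_num=9,
        mlp_dims=None,     # falls back to [384]
        dropout_rate=0.2,
        expert_num=5,
    )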

calculate_loss(data) Tensor[source]

calculate loss via BCELoss

Parameters:

data (dict) – batch data dict

Returns:

loss value

Return type:

loss (Tensor)

forward(token_id: Tensor, mask: Tensor, domain: Tensor) Tensor[source]
Parameters:
  • token_id (Tensor) – token ids from bert tokenizer, shape=(batch_size, max_len)

  • mask (Tensor) – mask from bert tokenizer, shape=(batch_size, max_len)

  • domain (Tensor) – domain id, shape=(batch_size,)

Returns:

the prediction of being fake, shape=(batch_size,)

Return type:

FloatTensor
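
A forward-pass sketch with dummy tensors; vocabulary size, batch size and sequence length are placeholders.

    import torch

    from faknow.model.content_based.mdfend import MDFEND

    model = MDFEND('bert-base-uncased', domain_num=9)

    batch_size, max_len = 4, 170
    token_id = torch.randint(0, 30000, (batch_size, max_len))
    mask = torch.ones(batch_size, max_len, dtype=torch.long)
    domain = torch.randint(0, 9, (batch_size,))

    output = model(token_id, mask, domain)  # shape=(batch_size,)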

predict(data_without_label) Tensor[source]

predict the probability of being fake news

Parameters:

data_without_label (Dict[str, Any]) – batch data dict

Returns:

two-class probability, shape=(batch_size, 2)

Return type:

Tensor

training: bool

faknow.model.content_based.textcnn

class faknow.model.content_based.textcnn.TextCNN(word_vectors: ~torch.Tensor, filter_num=100, kernel_sizes: ~typing.List[int] | None = None, activate_func: ~typing.Callable | None = <function relu>, dropout=0.5, freeze=False)[source]

Bases: AbstractModel

Convolutional Neural Networks for Sentence Classification, EMNLP 2014 paper: https://aclanthology.org/D14-1181/ code: https://github.com/yoonkim/CNN_sentence

__init__(word_vectors: ~torch.Tensor, filter_num=100, kernel_sizes: ~typing.List[int] | None = None, activate_func: ~typing.Callable | None = <function relu>, dropout=0.5, freeze=False)[source]
Parameters:
  • word_vectors (torch.Tensor) – weights of word embedding layer, shape=(vocab_size, embedding_size)

  • filter_num (int) – number of filters in conv layer. Default=100

  • kernel_sizes (List[int]) – a list of kernel sizes for TextCNNLayer. Default=[3, 4, 5]

  • activate_func (Callable) – activate function for TextCNNLayer. Default=relu

  • dropout (float) – drop out rate of fully connected layer. Default=0.5

  • freeze (bool) – whether to freeze weights in word embedding layer while training. Default=False
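
A construction sketch using random embeddings as a stand-in for pre-trained word vectors; the vocabulary and embedding sizes are placeholders.

    import torch

    from faknow.model.content_based.textcnn import TextCNN

    vocab_size, embedding_size = 5000, 300
    word_vectors = torch.randn(vocab_size, embedding_size)  # stand-in embeddings

    model = TextCNN(
        word_vectors,
        filter_num=100,
        kernel_sizes=None,  # falls back to [3, 4, 5]
        dropout=0.5,
        freeze=False,       # allow the embedding layer to be fine-tuned
    )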

calculate_loss(data) Tensor[source]

calculate loss via CrossEntropyLoss

Parameters:

data – batch data tuple

Returns:

loss

Return type:

torch.Tensor

forward(text: Tensor) Tensor[source]
Parameters:

text – batch data, shape=(batch_size, max_len)

Returns:

output, shape=(batch_size, 2)

Return type:

Tensor
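
A forward-pass sketch with dummy token indices, reusing the random-embedding construction from above.

    import torch

    from faknow.model.content_based.textcnn import TextCNN

    vocab_size, embedding_size = 5000, 300
    model = TextCNN(torch.randn(vocab_size, embedding_size))

    text = torch.randint(0, vocab_size, (4, 50))  # (batch_size, max_len) token ids
    output = model(text)                          # shape=(batch_size, 2)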

predict(data_without_label)[source]

predict the probability of being fake news

Parameters:

data_without_label – batch data

Returns:

softmax probability, shape=(batch_size, 2)

Return type:

Tensor

training: bool