Package ai.djl.nn.transformer
Class BertBlock.Builder
java.lang.Object
ai.djl.nn.transformer.BertBlock.Builder
- Enclosing class: BertBlock
-
Method Summary
base() - Sets this builder's params to the BASE config of the original BERT paper.
build() - Returns a new BertBlock with the parameters of this builder.
large() - Sets this builder's params to the LARGE config of the original BERT paper.
micro() - Sets this builder's params to a minimal configuration that nevertheless performs quite well.
nano() - Tiny config for testing on laptops.
optAttentionHeadCount(int attentionHeadCount) - Sets the number of attention heads to use in each transformer block.
optEmbeddingSize(int embeddingSize) - Sets the embedding size to use for input tokens.
optHiddenDropoutProbability(float hiddenDropoutProbability) - Sets the dropout probability in the hidden fully connected networks.
optHiddenSize(int hiddenSize) - Sets the size of the hidden layers in the fully connected networks used.
optMaxSequenceLength(int maxSequenceLength) - Sets the maximum sequence length this model can process.
optTransformerBlockCount(int transformerBlockCount) - Sets the number of transformer blocks to use.
optTypeDictionarySize(int typeDictionarySize) - Sets the number of possible token types.
setTokenDictionarySize(int tokenDictionarySize) - Sets the number of tokens in the dictionary.
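A minimal usage sketch, assuming BertBlock exposes the usual static builder() factory found on other DJL blocks; the vocabulary size and all opt* values below are illustrative only:

import ai.djl.nn.transformer.BertBlock;

// Only setTokenDictionarySize uses the "set" prefix, so it is the one value that
// must always be supplied; the opt* methods override defaults.
BertBlock bert = BertBlock.builder()           // assumed static factory, as on other DJL blocks
        .setTokenDictionarySize(30522)         // e.g. the WordPiece vocabulary of the original BERT
        .optTypeDictionarySize(2)              // sentence A / sentence B
        .optEmbeddingSize(256)
        .optAttentionHeadCount(4)              // must divide the embedding size evenly
        .optTransformerBlockCount(4)
        .optHiddenSize(1024)
        .optHiddenDropoutProbability(0.1f)
        .optMaxSequenceLength(128)
        .build();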
-
Method Details
-
setTokenDictionarySize
Sets the number of tokens in the dictionary.
Parameters:
- tokenDictionarySize - the number of tokens in the dictionary
Returns:
- this builder
-
optTypeDictionarySize
Sets the number of possible token types. This should be a very small number (2-16).
Parameters:
- typeDictionarySize - the number of possible token types; should be a very small number (2-16)
Returns:
- this builder
-
optEmbeddingSize
Sets the embedding size to use for input tokens. This size must be divisible by the number of attention heads.
Parameters:
- embeddingSize - the embedding size to use for input tokens
Returns:
- this builder
-
optTransformerBlockCount
Sets the number of transformer blocks to use.
Parameters:
- transformerBlockCount - the number of transformer blocks to use
Returns:
- this builder
-
optAttentionHeadCount
Sets the number of attention heads to use in each transformer block. This number must divide the embedding size without remainder.
Parameters:
- attentionHeadCount - the number of attention heads to use in each transformer block
Returns:
- this builder
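A short sketch of the divisibility constraint between optEmbeddingSize and optAttentionHeadCount; the concrete numbers are just the BASE values from the BERT paper, used here for illustration:

// 768 / 12 = 64 dimensions per head, so this pair is consistent.
BertBlock.Builder builder = BertBlock.builder()
        .setTokenDictionarySize(30522)   // illustrative vocabulary size
        .optEmbeddingSize(768)
        .optAttentionHeadCount(12);      // 768 % 12 == 0

// .optAttentionHeadCount(10) would be inconsistent: 768 is not divisible by 10.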
-
optHiddenSize
Sets the size of the hidden layers in the fully connected networks used.
Parameters:
- hiddenSize - the size of the hidden layers in the fully connected networks used
Returns:
- this builder
-
optHiddenDropoutProbability
Sets the dropout probability in the hidden fully connected networks.
Parameters:
- hiddenDropoutProbability - the dropout probability in the hidden fully connected networks
Returns:
- this builder
-
optMaxSequenceLength
Sets the maximum sequence length this model can process. Memory and compute requirements of the attention mechanism are O(n²), so large values can easily exhaust your GPU memory!
Parameters:
- maxSequenceLength - the maximum sequence length this model can process
Returns:
- this builder
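A back-of-the-envelope sketch of why the O(n²) term matters; the shapes and numbers below are assumptions for illustration, not DJL's exact allocation behavior:

// Each transformer block materializes attention scores of roughly
// (batchSize, attentionHeadCount, n, n), so doubling n quadruples that tensor.
long n = 512;          // hypothetical maxSequenceLength
long batchSize = 16;
long heads = 12;
long elements = batchSize * heads * n * n;        // ~50 million score entries per block
double mib = elements * 4 / (1024.0 * 1024.0);    // ~192 MiB at float32, before activations and gradients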
-
nano
Tiny config for testing on laptops.
Returns:
- this builder
-
micro
Sets this builder's params to a minimal configuration that nevertheless performs quite well.
Returns:
- this builder
-
base
Sets this builder's params to the BASE config of the original BERT paper (except for the dictionary size).
Returns:
- this builder
-
large
Sets this builder's params to the LARGE config of the original BERT paper (except for the dictionary size).
Returns:
- this builder
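Because base() and large() leave the dictionary size untouched, it still has to be set explicitly. A sketch with an illustrative vocabulary size, again assuming the usual builder() factory:

BertBlock bert = BertBlock.builder()
        .setTokenDictionarySize(30522)   // not covered by the preset, so supply it yourself
        .base()                          // BASE config of the original paper
        .optMaxSequenceLength(256)       // individual options can still be overridden afterwards
        .build();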
-
build
Returns a new BertBlock with the parameters of this builder.
Returns:
- a new BertBlock with the parameters of this builder
-