CBFV
CBFV.FileName
CBFV.generatefeatures
CBFV.processelementdatabase
CBFV.processinputdata
CBFV.readdatabasefile
CBFV.generatefeatures
— Function
generatefeatures(data; elementdata, dropduplicate, combine, sumfeatures, returndataframe)
generatefeatures(dataname; kwargs...)
This is the primary function for generating CBFV features for a dataset of formulas, with or without existing features. The function processes the input data and loads the specified element database. Features are then assigned following the CBFV approach. If returndataframe=true, a DataFrame is returned with the added columns :target and :formula.
Note: OrderedDict is not used, so the column names are arranged according to the native Dict ordering.
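Because the column order follows Dict iteration order, it is safer not to rely on it. The following is a minimal sketch, using only Base Julia and hypothetical feature names (not actual CBFV column names), of imposing a deterministic order by sorting the names explicitly:

```julia
# Hypothetical feature names and values; real CBFV column names depend on
# the chosen element database.
features = Dict("range_mass" => 3.0, "avg_mass" => 12.5, "dev_mass" => 1.2)

# A Dict makes no promise about iteration order, so sort the keys explicitly.
ordered_names = sort(collect(keys(features)))

# Pull the values out in the same, now deterministic, order.
ordered_values = [features[name] for name in ordered_names]
```

For a returned DataFrame df, the analogous step with DataFrames.jl would be select(df, sort(names(df))). Note that this also places :formula and :target alphabetically among the feature columns.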
Arguments
data::DataFrame: The data set to be featurized.
elementdata::Union{String,FileName} or Union{String,DataFrame}: The name of the internal database, or the file path and name of an external database.
dropduplicate::Bool=true: Option to drop duplicate entries.
combine::Bool=false: Option to combine existing features in data with the generated feature set.
sumfeatures::Bool=false: Option to include the sum_ feature columns.
returndataframe::Bool=true: Option to return a DataFrame. Will include :target and :formula columns.
Returns
generatedataframe::DataFrame: returned when returndataframe=true
formulae::Vector{String}, features::Array{Number,2}, targets::Vector{Number}: returned when returndataframe=false
The following featurization schemes are included within CBFV.jl:
oliynyk (default)
magpie
mat2vec
jarvis
onehot
random_200
using DataFrames
using CBFV
d = DataFrame(:formula=>["Tc1V1","Cu1Dy1","Cd3N2"],:target=>[248.539,66.8444,91.5034])
generatefeatures(d)
CBFV.processelementdatabase
— Method
processelementdatabase(data)
Takes the element feature dataframe and processes it to return a dictionary with values of type Array{String,N} and an Array representation of the entire database.
Arguments
data::DataFrame: element feature dataframe from the database file
Returns
elementproperties::Dict{Symbol,Array{String,N}}: dictionary with keys :symbols, :index, and :missing, which return Array{String,N} values for the dataframe
arrayrepresentation::Array{Any,2}: representation of the dataframe
CBFV.processinputdata
— Method
processinputdata(datainput, elementdatabase)
Takes the data set that contains the formulas, target values, and any additional features, and extracts the elemental properties from the provided element database. Also returns the column (feature) names used in the element properties.
Arguments
datainput::DataFrame: data containing columns :formula and :target.
elementfeatures::Array{Number,2}: element feature set based on the database
Returns
elpropnames::Array{String,1}: The names of the properties in the elemental database
processeddata::Vector{Dict{Symbol,Any}}: The processed input data based on the elemental database.
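As a rough illustration of the kind of processing described here, the sketch below parses formulas into element counts, looks each element up in a toy property table, and forms a composition-weighted average of each property. The names parseformula, weightedaverage, and toy_elements, as well as the table values, are illustrative assumptions and not CBFV.jl internals. The parser handles only flat formulas such as Cd3N2, with no parentheses.

```julia
# Toy element table: symbol => (atomic number, approximate atomic mass).
# Illustrative values only; a real element database has many more columns.
toy_elements = Dict(
    "Cd" => (48.0, 112.41),
    "N"  => (7.0, 14.007),
    "Cu" => (29.0, 63.546),
)

# Hypothetical helper: parse a flat formula such as "Cd3N2" into
# element => amount. An element with no number counts as 1.
function parseformula(formula::AbstractString)
    counts = Dict{String,Float64}()
    for m in eachmatch(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula)
        symbol = String(m.captures[1])
        raw = m.captures[2]                      # `nothing` when no number follows
        amount = raw === nothing ? 1.0 : parse(Float64, raw)
        counts[symbol] = get(counts, symbol, 0.0) + amount
    end
    return counts
end

# Hypothetical helper: composition-weighted average of every property.
function weightedaverage(formula::AbstractString, table=toy_elements)
    counts = parseformula(formula)
    total = sum(values(counts))
    avg = zeros(length(first(values(table))))
    for (symbol, amount) in counts
        haskey(table, symbol) || error("element $symbol missing from table")
        avg .+= (amount / total) .* collect(table[symbol])
    end
    return avg
end

avg_cd3n2 = weightedaverage("Cd3N2")
```

Only the weighted average is shown; composition-based featurizers commonly derive several statistics per property in the same way, and the sum_ columns mentioned under sumfeatures would correspond to an unnormalized variant.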
CBFV.readdatabasefile
— Method
readdatabasefile(pathtofile)
Returns a DataFrame of an elemental database file in databases/.
Arguments
pathtofile::String: path to the CSV formatted file to read
stringtype::Type{Union{String,InlineString}}=String: CSV.jl string storage type
pool::Bool=false: CSV.File will pool String column values.
Returns
data::DataFrame: the dataframe representation of the CSV file.
Some behaviors of CSV.jl create data types that are inconsistent with several function argument types in CBFV. If you use this function to read the data files, the data frame constructed via CSV will work properly.
CBFV.FileName
— Type
Datatype used for multiple dispatch in generatefeatures. Allows passing an external database.