CBFV
CBFV.FileName
CBFV.generatefeatures
CBFV.processelementdatabase
CBFV.processinputdata
CBFV.readdatabasefile
CBFV.generatefeatures — Function
generatefeatures(data; elementdata,dropduplicate,combine,sumfeatures,returndataframe)
generatefeatures(dataname; kwargs...)
This is the primary function for generating CBFV features for a dataset of formulas, with or without existing features. It processes the input data, loads the requested element database, and then assigns features following the CBFV approach. If returndataframe=true, a DataFrame is returned with the added columns :target and :formula.
OrderedDict is not used, so the column names are arranged according to the native Dict ordering.
Arguments
data::DataFrame: The data set that you want to featurize.
elementdata::Union{String,FileName} or Union{String,DataFrame}: The name of the internal database, or the file path and name of an external database.
dropduplicate::Bool=true: Option to drop duplicate entries.
combine::Bool=false: Option to combine existing features in data with the generated feature set.
sumfeatures::Bool=false: Option to include the sum_ feature columns.
returndataframe::Bool=true: Option to return a DataFrame. Will include :target and :formula columns.
Returns
generatedataframe::DataFrame
formulae::Vector{String}, features::Array{Number,2}, targets::Vector{Number}
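The second return form can be requested with the returndataframe keyword. The following is a minimal sketch, assuming CBFV and DataFrames are installed and that the tuple comes back in the order listed under Returns; it reuses the three formulas from this page's example and the default element database.

```julia
using DataFrames
using CBFV

# Same toy data set as the example on this page.
d = DataFrame(:formula => ["Tc1V1", "Cu1Dy1", "Cd3N2"],
              :target => [248.539, 66.8444, 91.5034])

# Ask for the raw pieces instead of a DataFrame.
formulae, features, targets = generatefeatures(d; returndataframe=false)

# One row of features per formula, one target per formula.
rowsmatch = size(features, 1) == length(formulae) == length(targets)
```

This form is convenient when the feature matrix is passed straight to a model that expects plain arrays rather than a DataFrame.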
The following featurization schemes are included within CBFV.jl:
oliynyk (default)
magpie
mat2vec
jarvis
onehot
random_200
using DataFrames
using CBFV
d = DataFrame(:formula=>["Tc1V1","Cu1Dy1","Cd3N2"],:target=>[248.539,66.8444,91.5034])
generatefeatures(d)
CBFV.processelementdatabase — Method
processelementdatabase(data)
Takes the element feature dataframe and processes it to return a dictionary with values of type Array{String,N} and an Array representation of the entire database.
Arguments
data::DataFrame: element feature dataframe from database file
Returns
elementproperties::Dict{Symbol,Array{String,N}}: dictionary with keys :symbols, :index, and :missing which return Array{String,N} values for the dataframe
arrayrepresentation::Array{Any,2}: representation of the dataframe
CBFV.processinputdata — Method
processinputdata(datainput,elementdatabase)
Takes the data set that contains the formulas, target values, and additional features, and extracts the elemental properties from the provided element database. It also returns the names of the columns (features) used in the element properties.
Arguments
datainput::DataFrame: data containing columns :formula and :target.
elementfeatures::Array{Number,2}: element feature set based on database
Returns
elpropnames::Array{String,1}: The names of the properties in the elemental database
processeddata::Vector{Dict{Symbol,Any}}: The processed input data based on the elemental database.
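To make the idea of extracting elemental properties for a formula concrete, here is a toy sketch in plain Julia. It is not the package's implementation: parseformula and toyfeatures are hypothetical helpers, the element table holds only atomic numbers, and the two statistics shown (composition-weighted average and range) are an assumption about what a composition-based feature looks like.

```julia
# Toy element table: atomic numbers only. A real database has many properties.
atomicnumber = Dict("Tc" => 43.0, "V" => 23.0, "Cu" => 29.0,
                    "Dy" => 66.0, "Cd" => 48.0, "N" => 7.0)

# Hypothetical helper: parse a flat formula such as "Cd3N2" into
# element => amount pairs. Parentheses are not supported.
function parseformula(formula::AbstractString)
    amounts = Dict{String,Float64}()
    for m in eachmatch(r"([A-Z][a-z]?)(\d*\.?\d*)", formula)
        element = String(m.captures[1])
        # A missing number means an amount of one.
        amount = isempty(m.captures[2]) ? 1.0 : parse(Float64, m.captures[2])
        amounts[element] = get(amounts, element, 0.0) + amount
    end
    return amounts
end

# Hypothetical helper: composition-weighted average and range of one property.
function toyfeatures(formula::AbstractString, property::Dict{String,Float64})
    amounts = parseformula(formula)
    total = sum(values(amounts))
    avg = sum(property[el] * n / total for (el, n) in amounts)
    vals = [property[el] for el in keys(amounts)]
    return (avg = avg, range = maximum(vals) - minimum(vals))
end

f = toyfeatures("Cd3N2", atomicnumber)
```

For Cd3N2 the weighted average is (3·48 + 2·7)/5 and the range is 48 − 7. Repeating this for every property column in a database would yield one feature vector per formula.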
CBFV.readdatabasefile — Method
readdatabasefile(pathtofile)
Returns a DataFrame of an elemental database file in databases/
Arguments
pathtofile::String: path to the CSV formatted file to read
stringtype::Type{Union{String,InlineString}}=String: CSV.jl string storage type
pool::Bool=false: CSV.File will pool String column values.
Returns
data::DataFrame: the dataframe representation of the csv file.
Some of the default behaviors of CSV.jl create data types that are inconsistent with several function argument types in CBFV. If you use this function to read the data files, the data frame constructed via CSV will work properly.
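The effect of the two options can be seen by calling CSV.jl directly. This is a sketch assuming CSV.jl and DataFrames are installed; the two-row file and its column names are made up for illustration and do not reflect the layout of the bundled databases.

```julia
using CSV
using DataFrames

# Write a tiny, made-up CSV file to a temporary location.
path = tempname() * ".csv"
write(path, "element,Atomic_Number\nH,1\nHe,2\n")

# Default read: on recent CSV.jl versions short strings are typically
# stored as an InlineString type rather than String.
inline = CSV.read(path, DataFrame)

# Read with plain String storage and no pooling, matching the
# stringtype and pool defaults documented above.
plain = CSV.read(path, DataFrame; stringtype=String, pool=false)

stringcolumntype = eltype(plain.element)
```

With stringtype=String the column element type is String, which is what functions typed on String arguments expect.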
CBFV.FileName — Type
Datatype used by generatefeatures for multiple dispatch. Allows an external database to be passed.