Klib is a Python library for importing, cleaning, analyzing and preprocessing data. Explanations of key functionalities can be found on Medium / TowardsDataScience and in the examples section. Additionally, there are great introductions and overviews of the functionality on PythonBytes and on YouTube (Data Professor).

Use the package manager pip to install klib. Alternatively, the package can be installed with conda.

Usage

    import klib
    import pandas as pd

    df = pd.DataFrame(data)  # data: your own records

    # klib.describe - functions for visualizing datasets
    klib.cat_plot(df)  # returns a visualization of the number and frequency of categorical features
    klib.corr_mat(df)  # returns a color-encoded correlation matrix
    klib.corr_plot(df)  # returns a color-encoded heatmap, ideal for correlations
    klib.dist_plot(df)  # returns a distribution plot for every numeric feature
    klib.missingval_plot(df)  # returns a figure containing information about missing values

    # klib.clean - functions for cleaning datasets
    klib.data_cleaning(df)  # performs data cleaning (drops duplicates & empty rows/cols, adjusts dtypes)
    klib.clean_column_names(df)  # cleans and standardizes column names, also called inside data_cleaning()
    klib.convert_datatypes(df)  # converts existing to more efficient dtypes, also called inside data_cleaning()
    klib.drop_missing(df)  # drops missing values, also called in data_cleaning()
    klib.mv_col_handling(df)  # drops features with a high ratio of missing values based on informational content
    klib.pool_duplicate_subsets(df)  # pools a subset of cols based on duplicates with min. loss of information

Examples

All available examples, as well as applications of the functions in klib.clean() with detailed descriptions, are collected in the project's examples.

    klib.missingval_plot(df)  # default representation of missing values in a DataFrame, plenty of settings are available
    klib.corr_plot(df, split='pos')  # displaying only positive correlations, other settings include threshold, cmap
    klib.corr_plot(df, split='neg')  # displaying only negative correlations
    klib.corr_plot(df, target='wine')  # default representation of correlations with the feature column
    klib.dist_plot(df)  # default representation of a distribution plot, other settings include fill_range, histogram
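The cleaning steps that the usage listing attributes to klib.data_cleaning() (dropping duplicates and empty rows/columns, adjusting dtypes) and to klib.clean_column_names() can be approximated in plain pandas to see what they do to a small table. The sketch below is only an illustration under that reading: `rough_clean` and the sample columns are hypothetical names, and this is not klib's actual implementation, which may cover more cases and use different rules.

```python
import pandas as pd


def rough_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper approximating the cleaning steps described above.

    This is NOT klib's implementation; it only chains plain pandas calls.
    """
    out = df.copy()
    # Standardize column names: strip whitespace, lowercase, spaces -> underscores.
    out.columns = (
        out.columns.astype(str)
        .str.strip()
        .str.lower()
        .str.replace(" ", "_", regex=False)
    )
    # Drop rows and columns that contain only missing values.
    out = out.dropna(axis=0, how="all").dropna(axis=1, how="all")
    # Drop exact duplicate rows and rebuild a consecutive index.
    out = out.drop_duplicates().reset_index(drop=True)
    # Let pandas infer more specific (nullable) dtypes where possible.
    return out.convert_dtypes()


# Made-up sample data: one duplicate row, one empty row, one empty column.
raw = pd.DataFrame(
    {
        " Wine ": ["red", "red", None, "white"],
        "Alcohol Pct": [12.5, 12.5, None, 11.0],
        "Empty Col": [None, None, None, None],
    }
)

cleaned = rough_clean(raw)
print(cleaned)
print(cleaned.dtypes)
```

On this sample the empty row, the empty column and the duplicate row are removed, leaving two rows and two columns with standardized names. With klib installed, comparing this result against klib.data_cleaning(raw) is a quick way to see where the library's behavior differs from this simplified version.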