# Spreadsheet Helper

<figure><img src="https://291121471-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-McrRFZHYH27bqKzOVDd%2Fuploads%2Fgham42FOFJNJT4tDhiuv%2Fimage.png?alt=media&#x26;token=7a2e96f7-6539-4746-aa9b-d6919edc6bd9" alt="" width="183"><figcaption></figcaption></figure>

## Overview

Spreadsheet Helper is a tool for performing spreadsheet-style and SQL-like operations on tabular data, such as grouping and joining. You specify the data to work with by referencing it.

## Infer data types

To keep data types consistent, you can choose whether to enable data type inference. It is turned off by default.

![Infer data types option in all relevant actions](https://291121471-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-McrRFZHYH27bqKzOVDd%2Fuploads%2FAC1wmSNcmZUuSe3EoSdf%2Fimage.png?alt=media\&token=3d80cc01-4c22-4c93-b5d0-43c36d5bc2ea)

If this option is turned on, the Helper tries to infer the data type of each column automatically. This can produce unexpected results for zip codes and other values that can start with 0, because a value inferred as a number loses its leading zeros.

Otherwise, the data types will be left as provided by the source.
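The effect on leading zeros can be sketched with pandas, which the actions below link to. This is only an illustration under the assumption that the Helper's inference behaves like the pandas default; the CSV content and column names are made up.

```python
import io

import pandas as pd

# Hypothetical sample data: zip codes that start with 0.
csv_text = "city,zip\nBoston,02134\nNewark,07102\n"

# With inference (the pandas default), the zip column is parsed as integers.
inferred = pd.read_csv(io.StringIO(csv_text))

# Without inference, every value is kept as the text found in the source.
as_provided = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(inferred["zip"].tolist())     # leading zeros are gone
print(as_provided["zip"].tolist())  # leading zeros are preserved
```

If a column holds identifiers rather than quantities, leaving inference off is usually the safer choice.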

## Actions

### 1. Group columns by

Group a spreadsheet or CSV file and apply different aggregates and properties.

The available aggregation functions are:

* mean
* sum
* size
* count
* std
* var
* sem
* first
* last
* min
* max
* median

For example, to find the lowest price among devices, group the columns by `Product`.

Result columns must use the following syntax:

`<column name or index>;<aggregation function><new column name (optional)>`

In the example:

* `Price` - column name
* `min` - aggregation function
* `lowest_price` - new column name (optional)

<figure><img src="https://291121471-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-McrRFZHYH27bqKzOVDd%2Fuploads%2FUarag94naAhraaSf9ajz%2Fimage.png?alt=media&#x26;token=e4407def-9f18-44f2-8eec-c920a5f12532" alt=""><figcaption></figcaption></figure>
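For readers who think in pandas terms, the lowest-price example corresponds roughly to the sketch below. It assumes the action maps onto a pandas `groupby` with a named aggregation, which this page does not confirm; the device data is hypothetical.

```python
import pandas as pd

# Hypothetical device list with a Product and a Price column.
devices = pd.DataFrame(
    {
        "Product": ["Phone", "Phone", "Laptop", "Laptop"],
        "Price": [699, 499, 1299, 999],
    }
)

# Group by Product, take the minimum Price, and name the result lowest_price.
lowest = (
    devices.groupby("Product")
    .agg(lowest_price=("Price", "min"))
    .reset_index()
)

print(lowest)
```

Any of the aggregation functions listed above, such as `mean` or `median`, can be substituted for `min` in the same position.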

### 2. Insert column

Insert a column into a spreadsheet. See the [Pandas Docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.insert.html) for further details.
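As a rough illustration of the linked pandas call, the sketch below inserts a constant column at a given position. The column names and values are made up, and the action's own input fields may differ from the pandas parameters.

```python
import pandas as pd

# Hypothetical sheet with two columns.
df = pd.DataFrame({"Product": ["Phone", "Laptop"], "Price": [499, 999]})

# Insert a Currency column at position 1 (between Product and Price).
# DataFrame.insert modifies the frame in place and returns None.
df.insert(1, "Currency", "USD")

print(list(df.columns))
```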

### 3. Join

Join multiple spreadsheets or tables on a common column, similar to VLOOKUP. See the [Pandas Docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html) for further details.

This comes in handy if you want to map different tables based on common column data.
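A VLOOKUP-style mapping can be sketched with pandas `merge`. The tables below are hypothetical, and `how="left"` is an assumption chosen to mimic a lookup; the join type the action uses is not stated here.

```python
import pandas as pd

# Hypothetical tables sharing a Product column.
orders = pd.DataFrame({"Order": [1, 2, 3], "Product": ["Phone", "Laptop", "Tablet"]})
prices = pd.DataFrame({"Product": ["Phone", "Laptop"], "Price": [499, 999]})

# Keep every order and look up its price; products with no match get NaN.
merged = orders.merge(prices, on="Product", how="left")

print(merged)
```

Rows without a match keep their original values and receive a missing price, which is the same outcome a failed lookup would give in a spreadsheet.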

### 4. Query spreadsheet

Run SQL queries against the spreadsheet data. The database is held purely in memory and uses **SQLite**. See the list of [SQLite core functions](https://sqlite.org/lang_corefunc.html).

Special cases not covered by SQLite:

* Full outer join: can be emulated as described in this [SQLite full outer join tutorial](https://www.sqlitetutorial.net/sqlite-full-outer-join/)
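One common way to emulate a full outer join is a `LEFT JOIN` combined via `UNION ALL` with the unmatched rows from the other side. The sketch below runs that pattern against an in-memory SQLite database using Python's standard `sqlite3` module; the table and column names are hypothetical. SQLite 3.39.0 and later also accept `FULL OUTER JOIN` directly, but which version the Helper bundles is not stated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stock (product TEXT, quantity INTEGER);
    CREATE TABLE prices (product TEXT, price REAL);
    INSERT INTO stock VALUES ('Phone', 5), ('Laptop', 2);
    INSERT INTO prices VALUES ('Phone', 499.0), ('Tablet', 299.0);
    """
)

# Part 1: every stock row, with its price if one exists.
# Part 2: price rows that have no matching stock row.
query = """
    SELECT s.product, s.quantity, p.price
    FROM stock AS s
    LEFT JOIN prices AS p ON p.product = s.product
    UNION ALL
    SELECT p.product, s.quantity, p.price
    FROM prices AS p
    LEFT JOIN stock AS s ON s.product = p.product
    WHERE s.product IS NULL
    ORDER BY 1
"""
rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```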

### 5. Remove duplicates

Remove duplicates from a spreadsheet. See the [Pandas Docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html) for further details.
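The linked pandas call can compare whole rows or only selected columns. The following sketch shows both on made-up data; it assumes the action exposes comparable options, which is not confirmed on this page.

```python
import pandas as pd

# Hypothetical sheet with one fully duplicated row (index 2 repeats index 0).
df = pd.DataFrame(
    {
        "Product": ["Phone", "Laptop", "Phone", "Phone"],
        "Price": [499, 999, 499, 699],
    }
)

# Drop rows that are identical in every column.
unique_rows = df.drop_duplicates()

# Drop rows that repeat a Product, keeping the first occurrence of each.
unique_products = df.drop_duplicates(subset=["Product"], keep="first")

print(unique_rows)
print(unique_products)
```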

### 6. Remove specific labels

Remove a specified row or column. See the [Pandas Docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html) for further details.
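In pandas, a label is either a column name or a row index value. The sketch below removes one of each from a hypothetical sheet; how the action distinguishes rows from columns in its own inputs is not covered here.

```python
import pandas as pd

# Hypothetical sheet with a column that is no longer needed.
df = pd.DataFrame(
    {
        "Product": ["Phone", "Laptop", "Tablet"],
        "Price": [499, 999, 299],
        "Notes": ["", "demo unit", ""],
    }
)

# Remove a column by its name.
without_notes = df.drop(columns=["Notes"])

# Remove a row by its index label (0 is the first row here).
without_first_row = df.drop(index=[0])

print(without_notes)
print(without_first_row)
```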

### 7. Return first n rows

Return the first n rows of a spreadsheet. See the [Pandas Docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) for further details.

### 8. Sort by

Sort the spreadsheet by one or more columns. See the [Pandas Docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html) for further details.
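Sorting by several columns with a different direction per column looks roughly like this in pandas. The data is hypothetical, and whether the action accepts a per-column direction is an assumption.

```python
import pandas as pd

# Hypothetical unsorted sheet.
df = pd.DataFrame(
    {
        "Product": ["Phone", "Laptop", "Phone", "Laptop"],
        "Price": [499, 999, 699, 1299],
    }
)

# Sort by Product ascending, then by Price descending within each product.
sorted_df = df.sort_values(
    by=["Product", "Price"], ascending=[True, False]
).reset_index(drop=True)

print(sorted_df)
```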
