Skip to content

Markdown formatted cell value #1572

@cod3monk

Description

@cod3monk

google spreadsheets allow formatting within cells.

I needed to transform it into markdown for further processing. Here is a code snippet, which retrieves the formatting using gspread and then transforms the formatting to markdown.

def gspread_get_markdown(worksheet, ranges=None):
    """Load data from gspread.worksheet with formatting and transform to markdown"""
    cell_data = worksheet.client.spreadsheets_get(
        worksheet.spreadsheet_id,
        params={'ranges': ranges,
                'fields': 'sheets(data(rowData(values(formattedValue,hyperlink,textFormatRuns))))'})
    
    data = []
    for row_idx, row in enumerate(cell_data['sheets'][0]['data'][0]['rowData']):
        row_data = []
        for col_idx, cell in enumerate(row.get('values', [])):
            row_data.append(gspread_cell_to_markdown(cell))
        data.append(row_data)
    return data
    

def gspread_cell_to_markdown(cell_value):
    """render gspread formatted cell to markdown"""
    # with inspiration from https://stackoverflow.com/a/77413771/2754040
    # format documentation:
    # https://developers.google.com/workspace/sheets/api/reference/rest/v4/spreadsheets/other?hl=de#TextFormat
    c = cell_value
    if "hyperlink" in c:
        md = f"[{c['formattedValue']}]({c['hyperlink']})"
    elif "textFormatRuns" in c:
        md = c['formattedValue']
        last_idx = len(md)
        for f in c['textFormatRuns'][::-1]:  # reverse iterate so startIndex matches md-string index
            start = ''
            end = ''
            if 'startIndex' in f:
                start_index = f['startIndex']
            else:
                start_index = 0
            if 'link' in f['format']:
                start += '['
                end += '](' + f['format']['link']['uri'] + ')'
            if f['format'].get('bold', False):
                start += '**'
                end += '**'
            if f['format'].get('italic', False):
                start += '*'
                end += '*'
            if f['format'].get('strikethrough', False):
                start += '~~'
                end += '~~'
            md = (
                    md[:start_index] +
                    start +
                    md[start_index:last_idx] + 
                    end +
                    md[last_idx:]
                )
            last_idx = start_index
    else:
        md = c.get('formattedValue', '')

    return md

This could be a useful utility to be included.

Currently not supported is the cell-global formatting, but should be easy to add if interested. I may create a pull request, if this is something that others want to see included in gspread.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions