This guide is designed for non-coders and beginners who want to scrape URLs or extract data from web pages into Google Sheets by simply copying and pasting code into the app script.
Important Note:
We recommend not scraping URLs if it is not explicitly allowed. Always check the website's Terms of Service (TOS) before attempting to extract data. Proceed at your own risk. Violating a website's TOS may have legal consequences
What Is the GETURL
Function?
The GETURL
function allows you to extract content from web pages directly into your Google Sheets. It can fetch the page's title, headings, meta descriptions, paragraphs, and more. You can specify what you want to extract, or let the function retrieve a default set of elements like the page title and main headings.
Step-by-Step Setup
1. Open Your Google Sheet
- Go to Google Sheets.
- Open an existing spreadsheet or create a new one.
2. Open the Apps Script Editor
- In the menu bar of your spreadsheet, click on Extensions.
- Select Apps Script. This will open the Apps Script editor in a new tab.
3. Paste the Code
- In the Apps Script editor, remove any placeholder code that appears by default in appscript editor.
- Copy the entire code provided in this link and paste it into the app script editor.
4. Save and Authorize
- Click the disk icon (or press
Ctrl + S
) to save the script. - Give the script a name like
WebScraper
. - When you save or run the function for the first time, it will ask for authorization.
- Click Review Permissions.
- Select your Google account.
- Review the permissions and click Allow.
This allows the script to access external URLs and update your spreadsheet.
How to Use the GETURL
Function
Once the script is set up, you can use the GETURL
function directly in your Google Sheets.
Function Format:
Parameters:
url
(required):
The webpage URL you want to extract data from.
Example: "https://example.com"
component
(optional):
The specific part of the webpage to extract. If not provided, the function will return a default set of components (like the page title and headings).
Example: "title"
for the page title, "h1"
for the main heading, or "meta:description"
for the meta description.
Examples:
1. Extract the Title of a Page
In a cell, type:
This will fetch the <title>
tag from the webpage and display it in the cell.
2. Extract All <h1>
Headings
This will fetch all the <h1>
headings on the page.
3. Extract Meta Description
This will fetch the meta description content from the page.
4. Use the Function Without a Component
This will extract multiple elements (title, headings, and paragraphs) as defined in the script’s settings.
Available Settings in the Script
You can adjust the behavior of the function by modifying settings directly in the Apps Script editor. These settings are found at the top of the script.
max_character_returned
(default: 10000):
Limits the number of characters returned by the function to prevent excessively long outputs. Increase this value if you want more content in your results.
Example:
follow_redirect
(default: true):
Determines whether the function follows redirects (e.g., when a URL redirects to another page).
Example:
retry_max
(default: 3):
The maximum number of times the function retries fetching a URL if it encounters an error.
Example:
user_agent
(default: "Mozilla/5.0"):
The User-Agent string sent with the request. This can be changed if you encounter issues with websites blocking certain types of requests.
Example:
defaultExtractionList
:
Defines the components to extract when no component
parameter is provided. By default, it includes title
, h1
, h2
, p
, etc. You can add or remove components from this list.
Example:
Common Issues and Troubleshooting
1. Authorization Issues
If you encounter an error when trying to use the function, ensure that you have authorized the script.
- Go to Extensions > Apps Script.
- In the Apps Script editor, click Run > geturl to trigger the authorization process.
2. Malformed HTML
Some websites may have poorly formatted HTML, which can cause parsing errors. In such cases, the function uses fallback methods (like regex) to extract content. However, the results might not be perfect for heavily JavaScript-driven pages.
3.Blocking Websites
Some websites may block automated requests or requests that don’t appear to come from a regular browser. This could result in a timeout error or a response with no content.
Important Note:
We recommend not scraping URLs if it is not explicitly allowed. Always check the website's Terms of Service (TOS) before attempting to extract data. Proceed at your own risk. Violating a website's TOS may have legal consequences
4. JavaScript-Rendered Pages
If the webpage relies on JavaScript to load its content dynamically, the function may only fetch the initial HTML. This can lead to incomplete results. Unfortunately, Apps Script doesn’t support executing JavaScript on fetched pages.
4. Rate Limits
If you use the function frequently or scrape many URLs, some websites may block your IP temporarily. To avoid this:
- Add a delay between requests (manually or via script logic).
- Use a custom
user_agent
to mimic a real browser.
5. Blank Results
If the function returns blank:
- Ensure the URL is correct and accessible.
- Check if the requested component exists on the page.
- Increase the retry attempts (
retry_max
) in the script if the page is slow to load.
Summary of Features
What It Can Extract:
- Title:
"title"
- Headings:
"h1"
, "h2"
, ..., "h6"
- Text Content:
"p"
, "div"
, "ul"
, "li"
, "table"
- Images:
"img"
(all image URLs) - Meta Tags:
"meta:description"
, "meta:keywords"
, "meta:title"
When to Use a Parameter:
- If you know the specific element to extract (e.g.,
"h1"
), provide it as the second parameter. - If you want multiple elements (title, headings, etc.), omit the second parameter.
Why Use This Script?
- No Coding Needed: Once set up, use it like any other Google Sheets formula.
- Highly Customizable: Adjust settings to fit your needs.
- Reliable Multi-Step Parsing: Attempts XML parsing first, then uses fallback methods for problematic pages.
With this setup, you can easily scrape and analyze data from web pages without leaving your spreadsheet!
/**
* Fetches a webpage and extracts specific or default components (title, headings, paragraphs, etc.).
*
* Usage Examples:
* 1) Single Parameter:
* =GETURL("https://www.example.com")
* - Extracts the default set of components (title, h1-h6, p, etc.) as configured in the script.
*
* 2) Two Parameters:
* =GETURL("https://www.example.com", "title")
* - Extracts only the of the page.
*
* @param {string} url The URL of the webpage you want to scrape (e.g. "https://example.com").
* @param {string} [component] (Optional) The element or metadata to extract, e.g. "title", "h1", "meta:description".
* If omitted, the script returns multiple default components.
* @return {string} The extracted content from the webpage.
* @customfunction
*/
function GETURL(url, component) {
// ========== CUSTOMIZABLE SETTINGS ==========
/**
* Maximum number of characters returned in the final output.
* You can increase this if you want more text per cell.
*/
var max_character_returned = 10000;
/**
* Whether to follow HTTP redirects automatically.
* Valid values: true or false.
*/
var follow_redirect = true;
/**
* Maximum number of times to retry fetching the URL if it fails.
*/
var retry_max = 3;
/**
* The User-Agent string to send with the request.
* Some websites behave differently based on the User-Agent.
*/
var user_agent = "Mozilla/5.0";
/**
* A list of components to extract if 'component' parameter is not provided.
* You can add/remove items (e.g. "img", "table") as desired.
*/
var defaultExtractionList = ["title", "h1", "h2", "h3", "h4", "h5", "h6", "p"];
// ========== FUNCTION LOGIC ==========
// Basic validation
if (!url) {
Logger.log("Error: Missing URL parameter.");
return "Error: Please provide a URL.";
}
try {
Logger.log("GETURL called with URL: " + url +
(component ? (", component: " + component) : ", no component provided"));
// Fetch HTML with retries
var htmlContent = fetchWithRetries(url, follow_redirect, retry_max, user_agent);
if (!htmlContent) {
Logger.log("Error: Unable to fetch the URL after retries: " + url);
return "Error: Unable to fetch the provided URL.";
}
// Attempt to parse HTML normally
var document = parseHtmlRobust(htmlContent);
if (!document) {
Logger.log("First parse attempt failed. Trying to clean the HTML and re-parse...");
var cleaned = cleanHtmlContent(htmlContent);
document = parseHtmlRobust(cleaned);
}
// If no component is provided, use the defaultExtractionList
if (!component) {
// Gather the results from all default components
var results = [];
for (var i = 0; i < defaultExtractionList.length; i++) {
var comp = defaultExtractionList[i];
var extracted = "";
if (document) {
extracted = extractComponent(document, comp);
} else {
Logger.log("Unable to parse. Using regex fallback for comp: " + comp);
extracted = regexFallback(htmlContent, comp);
}
// If content is found, label it
if (extracted && extracted.trim() !== "") {
results.push("[" + comp + "]\n" + extracted.trim());
}
}
var finalResult = results.join("\n\n").trim();
if (!finalResult) {
Logger.log("No content found for default extraction list in URL: " + url);
finalResult = "No content found (default extraction).";
}
// Truncate if too long
if (finalResult.length > max_character_returned) {
Logger.log("Result exceeded max character limit. Truncating output.");
finalResult = finalResult.substring(0, max_character_returned) + "...";
}
return finalResult;
}
// If we do have a component provided
var result = "";
if (document) {
// Extract normally if the HTML was parsed
result = extractComponent(document, component);
} else {
Logger.log("Unable to parse even after cleaning. Using regex fallback for component: " + component);
result = regexFallback(htmlContent, component);
}
// If no result found
if (!result || result.trim() === "") {
Logger.log("No content found for component: " + component + " in URL: " + url);
result = "No content found for component: " + component;
}
// Truncate to max_character_returned
if (result.length > max_character_returned) {
Logger.log("Result exceeded max character limit. Truncating output.");
result = result.substring(0, max_character_returned) + "...";
}
return result;
} catch (e) {
Logger.log("Unexpected error in GETURL function: " + e.toString());
return "Error: An unexpected issue occurred. " + e.toString();
}
}
/**
* Attempts to fetch a URL with multiple retries.
*
* @param {string} url The URL to fetch.
* @param {boolean} follow_redirect Whether to follow redirects.
* @param {number} retry_max How many times to retry fetching on failure.
* @param {string} user_agent The User-Agent header to send.
* @return {string|null} The raw HTML text if successful, or null if not.
*/
function fetchWithRetries(url, follow_redirect, retry_max, user_agent) {
var attempts = 0;
var response = null;
var fetchOptions = {
followRedirects: follow_redirect,
muteHttpExceptions: true,
headers: {
"User-Agent": user_agent
}
};
while (attempts < retry_max) {
try {
Logger.log("Attempting fetch #" + (attempts + 1) + " for URL: " + url);
response = UrlFetchApp.fetch(url, fetchOptions);
var code = response.getResponseCode();
Logger.log("HTTP response code: " + code + " for URL: " + url);
if (code >= 200 && code < 300) {
return response.getContentText();
} else {
Logger.log("Non-2xx response code. Retrying...");
attempts++;
Utilities.sleep(500); // brief delay
}
} catch (e) {
Logger.log("Error fetching URL: " + url + " on attempt " + (attempts + 1) + ": " + e.toString());
attempts++;
Utilities.sleep(500); // brief delay
}
}
Logger.log("Failed to fetch URL after " + retry_max + " attempts: " + url);
return null;
}
/**
* Parses HTML robustly using XmlService. Returns the root element or null on failure.
*
* @param {string} htmlContent Raw HTML string.
* @return {XmlService.Element|null} Parsed document root element, or null if parsing fails.
*/
function parseHtmlRobust(htmlContent) {
try {
var htmlOutput = HtmlService.createHtmlOutput(htmlContent).getContent();
var document = XmlService.parse(htmlOutput);
return document.getRootElement();
} catch (e) {
Logger.log("Error parsing HTML using XmlService: " + e.toString());
return null;
}
}
/**
* Cleans HTML content to fix invalid entities that can break XmlService parsing.
*
* @param {string} htmlContent Raw HTML string.
* @return {string} Cleaned HTML string.
*/
function cleanHtmlContent(htmlContent) {
Logger.log("Cleaning HTML content to fix invalid entities.");
// Replace standalone '&' with '&' except known valid entities
return htmlContent.replace(/&(?!(amp;|lt;|gt;|quot;|apos;|#[0-9]+;|#[xX][0-9A-Fa-f]+;))/g, '&');
}
/**
* Extracts content from a parsed document based on the requested component.
*
* @param {XmlService.Element} document Parsed HTML document root.
* @param {string} component Which element or meta to extract.
* @return {string} Extracted content or an empty string if not found.
*/
function extractComponent(document, component) {
try {
var lc = component.toLowerCase();
if (lc.startsWith("meta:")) {
var metaName = lc.split(":")[1];
return getMetaContent(document, metaName);
}
switch (lc) {
case "title":
return getTagText(document, "title");
case "h1":
case "h2":
case "h3":
case "h4":
case "h5":
case "h6":
case "p":
case "div":
case "li":
case "ul":
case "table":
return getAllTagText(document, lc);
case "img":
return getAllImageSrc(document);
default:
Logger.log("Unknown component: " + component + ". Trying as a tag.");
return getAllTagText(document, lc);
}
} catch (e) {
Logger.log("Error extracting component (" + component + "): " + e.toString());
return "";
}
}
/**
* Extracts content of a meta tag by the 'name' attribute.
*
* @param {XmlService.Element} document Parsed HTML document root.
* @param {string} metaName The meta name to look for (e.g. "description").
* @return {string} The content attribute if found, otherwise an empty string.
*/
function getMetaContent(document, metaName) {
var metas = document.getDescendants().filter(function(d) {
return d.getType() === XmlService.Element && d.getName() === 'meta';
}).map(function(elem) {
return elem.asElement();
});
for (var i = 0; i < metas.length; i++) {
var nameAttr = metas[i].getAttribute('name');
if (nameAttr && nameAttr.getValue().toLowerCase() === metaName) {
var contentAttr = metas[i].getAttribute('content');
if (contentAttr) {
return contentAttr.getValue();
}
}
}
return "";
}
/**
* Returns the text content of the first occurrence of a specific tag.
*
* @param {XmlService.Element} document Parsed HTML document root.
* @param {string} tagName e.g. "title", "p".
* @return {string} The text of the tag or empty if not found.
*/
function getTagText(document, tagName) {
var elements = document.getDescendants().filter(function(d) {
return d.getType() === XmlService.Element && d.getName().toLowerCase() === tagName;
});
if (elements.length > 0) {
return elements[0].asElement().getText();
}
return "";
}
/**
* Returns the concatenated text content of all elements with a certain tag.
*
* @param {XmlService.Element} document Parsed HTML document root.
* @param {string} tagName e.g. "p", "h1", "div".
* @return {string} Joined text from all found tags.
*/
function getAllTagText(document, tagName) {
var elements = document.getDescendants().filter(function(d) {
return d.getType() === XmlService.Element && d.getName().toLowerCase() === tagName;
});
var texts = elements.map(function(e) {
return e.asElement().getText();
});
return texts.join("\n").trim();
}
/**
* Returns all image src attributes from the parsed document.
*
* @param {XmlService.Element} document Parsed HTML document root.
* @return {string} All <img src="..."> URLs, one per line.
*/
function getAllImageSrc(document) {
var elements = document.getDescendants().filter(function(d) {
return d.getType() === XmlService.Element && d.getName().toLowerCase() === "img";
});
var srcs = [];
for (var i = 0; i < elements.length; i++) {
var srcAttr = elements[i].asElement().getAttribute("src");
if (srcAttr) {
srcs.push(srcAttr.getValue());
}
}
return srcs.join("\n");
}
/**
* Regex fallback for when XmlService parsing fails.
* Useful for simpler elements like <title>, headings, <p>, etc.
*
* @param {string} html Raw HTML string.
* @param {string} component The component name to extract.
* @return {string} Extracted text or an empty string if not found.
*/
function regexFallback(html, component) {
Logger.log("Using regex fallback for component: " + component);
var lc = component.toLowerCase();
// Fallback for <title>
if (lc === "title") {
var titleMatch = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
return titleMatch && titleMatch[1] ? titleMatch[1].trim() : "";
}
// Fallback for meta:xxx
if (lc.startsWith("meta:")) {
var metaName = lc.split(":")[1];
var metaRegex = new RegExp('<meta[^>]*name=["\']' + metaName + '["\'][^>]*content=["\']([^"\']*)["\']', 'i');
var metaMatch = html.match(metaRegex);
return metaMatch && metaMatch[1] ? metaMatch[1].trim() : "";
}
// Headings, paragraphs, divs, lists, tables
if (["h1","h2","h3","h4","h5","h6","p","div","li","ul","table"].indexOf(lc) !== -1) {
var tagRegex = new RegExp("<" + lc + "[^>]*>([\\s\\S]*?)<\\/" + lc + ">", "gi");
var matches = [];
var m;
while ((m = tagRegex.exec(html)) !== null) {
var textContent = m[1].replace(/<[^>]+>/g, '').trim();
if (textContent) matches.push(textContent);
}
return matches.join("\n");
}
// Fallback for img
if (lc === "img") {
var imgRegex = /<img[^>]+src=["']([^"']+)["']/gi;
var imgMatches = [];
var im;
while ((im = imgRegex.exec(html)) !== null) {
imgMatches.push(im[1]);
}
return imgMatches.join("\n");
}
// If unknown component
return "";
}
</code>
</pre>
</div>
<script>
function toggleCode() {
const codeBlock = document.getElementById('codeBlock');
const toggleButton = document.querySelector('.toggle-button');
if (codeBlock.style.display === 'none') {
codeBlock.style.display = 'block';
toggleButton.textContent = 'Hide Code';
} else {
codeBlock.style.display = 'none';
toggleButton.textContent = 'Show Code';
}
}
</script><h3>4. Save and Authorize</h3><ol><li>Click the <strong>disk icon</strong> (or press <code>Ctrl + S</code>) to save the script.</li><li>Give the script a name like <code>WebScraper</code>.</li><li>When you save or run the function for the first time, it will ask for authorization.<ul><li>Click <strong>Review Permissions</strong>.</li><li>Select your Google account.</li><li>Review the permissions and click <strong>Allow</strong>.<br>This allows the script to access external URLs and update your spreadsheet.</li></ul></li></ol><hr><h2>How to Use the <code>GETURL</code> Function</h2><p>Once the script is set up, you can use the <code>GETURL</code> function directly in your Google Sheets.</p><h3><strong>Function Format:</strong></h3><pre class="!overflow-visible"><div class="contain-inline-size rounded-md border-[0.5px] border-token-border-medium relative bg-token-sidebar-surface-primary dark:bg-gray-950"><div class="sticky top-9 md:top-[5.75rem]"><div class="absolute bottom-0 right-2 flex h-9 items-center"><div class="flex items-center rounded bg-token-sidebar-surface-primary px-2 font-sans text-xs text-token-text-secondary dark:bg-token-main-surface-secondary"><span class="" data-state="closed"><button class="flex gap-1 items-center select-none py-1" aria-label="Copy"><svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg" class="icon-sm"><path fill-rule="evenodd" clip-rule="evenodd" d="M7 5C7 3.34315 8.34315 2 10 2H19C20.6569 2 22 3.34315 22 5V14C22 15.6569 20.6569 17 19 17H17V19C17 20.6569 15.6569 22 14 22H5C3.34315 22 2 20.6569 2 19V10C2 8.34315 3.34315 7 5 7H7V5ZM9 7H14C15.6569 7 17 8.34315 17 10V15H19C19.5523 15 20 14.5523 20 14V5C20 4.44772 19.5523 4 19 4H10C9.44772 4 9 4.44772 9 5V7ZM5 9C4.44772 9 4 9.44772 4 10V19C4 19.5523 4.44772 20 5 20H14C14.5523 20 15 19.5523 15 19V10C15 9.44772 14.5523 9 14 9H5Z" fill="currentColor"></path></svg>Copy code</button></span></div></div></div><div class="overflow-y-auto p-4" dir="ltr"><code class="!whitespace-pre hljs language-scss">=<span class="hljs-built_in">GETURL</span>(url, [component])
</code></div></div></pre><h3><strong>Parameters:</strong></h3><ol><li><p><strong><code>url</code> (required):</strong><br>The webpage URL you want to extract data from.<br>Example: <code>"https://example.com"</code></p></li><li><p><strong><code>component</code> (optional):</strong><br>The specific part of the webpage to extract. If not provided, the function will return a default set of components (like the page title and headings).<br>Example: <code>"title"</code> for the page title, <code>"h1"</code> for the main heading, or <code>"meta:description"</code> for the meta description.</p></li></ol><hr><h3><strong>Examples:</strong></h3><h4>1. Extract the Title of a Page</h4><p>In a cell, type:</p><pre class="!overflow-visible"><div class="contain-inline-size rounded-md border-[0.5px] border-token-border-medium relative bg-token-sidebar-surface-primary dark:bg-gray-950"><div class="sticky top-9 md:top-[5.75rem]"><div class="absolute bottom-0 right-2 flex h-9 items-center"><div class="flex items-center rounded bg-token-sidebar-surface-primary px-2 font-sans text-xs text-token-text-secondary dark:bg-token-main-surface-secondary"><span class="" data-state="closed"><button class="flex gap-1 items-center select-none py-1" aria-label="Copy"><svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg" class="icon-sm"><path fill-rule="evenodd" clip-rule="evenodd" d="M7 5C7 3.34315 8.34315 2 10 2H19C20.6569 2 22 3.34315 22 5V14C22 15.6569 20.6569 17 19 17H17V19C17 20.6569 15.6569 22 14 22H5C3.34315 22 2 20.6569 2 19V10C2 8.34315 3.34315 7 5 7H7V5ZM9 7H14C15.6569 7 17 8.34315 17 10V15H19C19.5523 15 20 14.5523 20 14V5C20 4.44772 19.5523 4 19 4H10C9.44772 4 9 4.44772 9 5V7ZM5 9C4.44772 9 4 9.44772 4 10V19C4 19.5523 4.44772 20 5 20H14C14.5523 20 15 19.5523 15 19V10C15 9.44772 14.5523 9 14 9H5Z" fill="currentColor"></path></svg>Copy code</button></span></div></div></div><div class="overflow-y-auto p-4" dir="ltr"><code class="!whitespace-pre hljs language-arduino">=<span class="hljs-built_in">GETURL</span>(<span class="hljs-string">"https://example.com"</span>, <span class="hljs-string">"title"</span>)
</code></div></div></pre><p>This will fetch the <code><title></code> tag from the webpage and display it in the cell.</p><h4>2. Extract All <code><h1></code> Headings</h4><pre class="!overflow-visible"><div class="contain-inline-size rounded-md border-[0.5px] border-token-border-medium relative bg-token-sidebar-surface-primary dark:bg-gray-950"><div class="sticky top-9 md:top-[5.75rem]"><div class="absolute bottom-0 right-2 flex h-9 items-center"><div class="flex items-center rounded bg-token-sidebar-surface-primary px-2 font-sans text-xs text-token-text-secondary dark:bg-token-main-surface-secondary"><span class="" data-state="closed"><button class="flex gap-1 items-center select-none py-1" aria-label="Copy"><svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg" class="icon-sm"><path fill-rule="evenodd" clip-rule="evenodd" d="M7 5C7 3.34315 8.34315 2 10 2H19C20.6569 2 22 3.34315 22 5V14C22 15.6569 20.6569 17 19 17H17V19C17 20.6569 15.6569 22 14 22H5C3.34315 22 2 20.6569 2 19V10C2 8.34315 3.34315 7 5 7H7V5ZM9 7H14C15.6569 7 17 8.34315 17 10V15H19C19.5523 15 20 14.5523 20 14V5C20 4.44772 19.5523 4 19 4H10C9.44772 4 9 4.44772 9 5V7ZM5 9C4.44772 9 4 9.44772 4 10V19C4 19.5523 4.44772 20 5 20H14C14.5523 20 15 19.5523 15 19V10C15 9.44772 14.5523 9 14 9H5Z" fill="currentColor"></path></svg>Copy code</button></span></div></div></div><div class="overflow-y-auto p-4" dir="ltr"><code class="!whitespace-pre hljs language-arduino">=<span class="hljs-built_in">GETURL</span>(<span class="hljs-string">"https://example.com"</span>, <span class="hljs-string">"h1"</span>)
</code></div></div></pre><p>This will fetch all the <code><h1></code> headings on the page.</p><h4>3. Extract Meta Description</h4><pre class="!overflow-visible"><div class="contain-inline-size rounded-md border-[0.5px] border-token-border-medium relative bg-token-sidebar-surface-primary dark:bg-gray-950"><div class="sticky top-9 md:top-[5.75rem]"><div class="absolute bottom-0 right-2 flex h-9 items-center"><div class="flex items-center rounded bg-token-sidebar-surface-primary px-2 font-sans text-xs text-token-text-secondary dark:bg-token-main-surface-secondary"><span class="" data-state="closed"><button class="flex gap-1 items-center select-none py-1" aria-label="Copy"><svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg" class="icon-sm"><path fill-rule="evenodd" clip-rule="evenodd" d="M7 5C7 3.34315 8.34315 2 10 2H19C20.6569 2 22 3.34315 22 5V14C22 15.6569 20.6569 17 19 17H17V19C17 20.6569 15.6569 22 14 22H5C3.34315 22 2 20.6569 2 19V10C2 8.34315 3.34315 7 5 7H7V5ZM9 7H14C15.6569 7 17 8.34315 17 10V15H19C19.5523 15 20 14.5523 20 14V5C20 4.44772 19.5523 4 19 4H10C9.44772 4 9 4.44772 9 5V7ZM5 9C4.44772 9 4 9.44772 4 10V19C4 19.5523 4.44772 20 5 20H14C14.5523 20 15 19.5523 15 19V10C15 9.44772 14.5523 9 14 9H5Z" fill="currentColor"></path></svg>Copy code</button></span></div></div></div><div class="overflow-y-auto p-4" dir="ltr"><code class="!whitespace-pre hljs language-arduino">=<span class="hljs-built_in">GETURL</span>(<span class="hljs-string">"https://example.com"</span>, <span class="hljs-string">"meta:description"</span>)
</code></div></div></pre><p>This will fetch the meta description content from the page.</p><h4>4. Use the Function Without a Component</h4><pre class="!overflow-visible"><div class="contain-inline-size rounded-md border-[0.5px] border-token-border-medium relative bg-token-sidebar-surface-primary dark:bg-gray-950"><div class="sticky top-9 md:top-[5.75rem]"><div class="absolute bottom-0 right-2 flex h-9 items-center"><div class="flex items-center rounded bg-token-sidebar-surface-primary px-2 font-sans text-xs text-token-text-secondary dark:bg-token-main-surface-secondary"><span class="" data-state="closed"><button class="flex gap-1 items-center select-none py-1" aria-label="Copy"><svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg" class="icon-sm"><path fill-rule="evenodd" clip-rule="evenodd" d="M7 5C7 3.34315 8.34315 2 10 2H19C20.6569 2 22 3.34315 22 5V14C22 15.6569 20.6569 17 19 17H17V19C17 20.6569 15.6569 22 14 22H5C3.34315 22 2 20.6569 2 19V10C2 8.34315 3.34315 7 5 7H7V5ZM9 7H14C15.6569 7 17 8.34315 17 10V15H19C19.5523 15 20 14.5523 20 14V5C20 4.44772 19.5523 4 19 4H10C9.44772 4 9 4.44772 9 5V7ZM5 9C4.44772 9 4 9.44772 4 10V19C4 19.5523 4.44772 20 5 20H14C14.5523 20 15 19.5523 15 19V10C15 9.44772 14.5523 9 14 9H5Z" fill="currentColor"></path></svg>Copy code</button></span></div></div></div><div class="overflow-y-auto p-4" dir="ltr"><code class="!whitespace-pre hljs language-scss">=<span class="hljs-built_in">GETURL</span>("https://example.com")
</code></div></div></pre><p>This will extract multiple elements (title, headings, and paragraphs) as defined in the script’s settings.</p><hr><h2>Available Settings in the Script</h2><p>You can adjust the behavior of the function by modifying settings directly in the Apps Script editor. These settings are found at the <strong>top of the script</strong>.</p><ol><li><p><strong><code>max_character_returned</code> (default: 10000):</strong><br>Limits the number of characters returned by the function to prevent excessively long outputs. Increase this value if you want more content in your results.</p><p>Example:</p><pre class="!overflow-visible"><div class="contain-inline-size rounded-md border-[0.5px] border-token-border-medium relative bg-token-sidebar-surface-primary dark:bg-gray-950"><div class="sticky top-9 md:top-[5.75rem]"><div class="absolute bottom-0 right-2 flex h-9 items-center"><div class="flex items-center rounded bg-token-sidebar-surface-primary px-2 font-sans text-xs text-token-text-secondary dark:bg-token-main-surface-secondary"><span class="" data-state="closed"><button class="flex gap-1 items-center select-none py-1" aria-label="Copy"><svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg" class="icon-sm"><path fill-rule="evenodd" clip-rule="evenodd" d="M7 5C7 3.34315 8.34315 2 10 2H19C20.6569 2 22 3.34315 22 5V14C22 15.6569 20.6569 17 19 17H17V19C17 20.6569 15.6569 22 14 22H5C3.34315 22 2 20.6569 2 19V10C2 8.34315 3.34315 7 5 7H7V5ZM9 7H14C15.6569 7 17 8.34315 17 10V15H19C19.5523 15 20 14.5523 20 14V5C20 4.44772 19.5523 4 19 4H10C9.44772 4 9 4.44772 9 5V7ZM5 9C4.44772 9 4 9.44772 4 10V19C4 19.5523 4.44772 20 5 20H14C14.5523 20 15 19.5523 15 19V10C15 9.44772 14.5523 9 14 9H5Z" fill="currentColor"></path></svg>Copy code</button></span></div></div></div><div class="overflow-y-auto p-4" dir="ltr"><code class="!whitespace-pre hljs language-javascript"><span class="hljs-keyword">var</span> max_character_returned = <span class="hljs-number">2000</span>; <span class="hljs-comment">// Allow up to 2000 characters</span>
</code></div></div></pre></li><li><p><strong><code>follow_redirect</code> (default: true):</strong><br>Determines whether the function follows redirects (e.g., when a URL redirects to another page).</p><p>Example:</p><pre class="!overflow-visible"><div class="contain-inline-size rounded-md border-[0.5px] border-token-border-medium relative bg-token-sidebar-surface-primary dark:bg-gray-950"><div class="sticky top-9 md:top-[5.75rem]"><div class="absolute bottom-0 right-2 flex h-9 items-center"><div class="flex items-center rounded bg-token-sidebar-surface-primary px-2 font-sans text-xs text-token-text-secondary dark:bg-token-main-surface-secondary"><span class="" data-state="closed"><button class="flex gap-1 items-center select-none py-1" aria-label="Copy"><svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg" class="icon-sm"><path fill-rule="evenodd" clip-rule="evenodd" d="M7 5C7 3.34315 8.34315 2 10 2H19C20.6569 2 22 3.34315 22 5V14C22 15.6569 20.6569 17 19 17H17V19C17 20.6569 15.6569 22 14 22H5C3.34315 22 2 20.6569 2 19V10C2 8.34315 3.34315 7 5 7H7V5ZM9 7H14C15.6569 7 17 8.34315 17 10V15H19C19.5523 15 20 14.5523 20 14V5C20 4.44772 19.5523 4 19 4H10C9.44772 4 9 4.44772 9 5V7ZM5 9C4.44772 9 4 9.44772 4 10V19C4 19.5523 4.44772 20 5 20H14C14.5523 20 15 19.5523 15 19V10C15 9.44772 14.5523 9 14 9H5Z" fill="currentColor"></path></svg>Copy code</button></span></div></div></div><div class="overflow-y-auto p-4" dir="ltr"><code class="!whitespace-pre hljs language-javascript"><span class="hljs-keyword">var</span> follow_redirect = <span class="hljs-literal">false</span>; <span class="hljs-comment">// Do not follow redirects</span>
</code></div></div></pre></li><li><p><strong><code>retry_max</code> (default: 3):</strong><br>The maximum number of times the function retries fetching a URL if it encounters an error.</p><p>Example:</p><pre class="!overflow-visible"><div class="contain-inline-size rounded-md border-[0.5px] border-token-border-medium relative bg-token-sidebar-surface-primary dark:bg-gray-950"><div class="sticky top-9 md:top-[5.75rem]"><div class="absolute bottom-0 right-2 flex h-9 items-center"><div class="flex items-center rounded bg-token-sidebar-surface-primary px-2 font-sans text-xs text-token-text-secondary dark:bg-token-main-surface-secondary"><span class="" data-state="closed"><button class="flex gap-1 items-center select-none py-1" aria-label="Copy"><svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg" class="icon-sm"><path fill-rule="evenodd" clip-rule="evenodd" d="M7 5C7 3.34315 8.34315 2 10 2H19C20.6569 2 22 3.34315 22 5V14C22 15.6569 20.6569 17 19 17H17V19C17 20.6569 15.6569 22 14 22H5C3.34315 22 2 20.6569 2 19V10C2 8.34315 3.34315 7 5 7H7V5ZM9 7H14C15.6569 7 17 8.34315 17 10V15H19C19.5523 15 20 14.5523 20 14V5C20 4.44772 19.5523 4 19 4H10C9.44772 4 9 4.44772 9 5V7ZM5 9C4.44772 9 4 9.44772 4 10V19C4 19.5523 4.44772 20 5 20H14C14.5523 20 15 19.5523 15 19V10C15 9.44772 14.5523 9 14 9H5Z" fill="currentColor"></path></svg>Copy code</button></span></div></div></div><div class="overflow-y-auto p-4" dir="ltr"><code class="!whitespace-pre hljs language-javascript"><span class="hljs-keyword">var</span> retry_max = <span class="hljs-number">5</span>; <span class="hljs-comment">// Retry up to 5 times</span>
</code></div></div></pre></li><li><p><strong><code>user_agent</code> (default: "Mozilla/5.0"):</strong><br>The User-Agent string sent with the request. This can be changed if you encounter issues with websites blocking certain types of requests.</p><p>Example:</p><pre class="!overflow-visible"><div class="contain-inline-size rounded-md border-[0.5px] border-token-border-medium relative bg-token-sidebar-surface-primary dark:bg-gray-950"><div class="sticky top-9 md:top-[5.75rem]"><div class="absolute bottom-0 right-2 flex h-9 items-center"><div class="flex items-center rounded bg-token-sidebar-surface-primary px-2 font-sans text-xs text-token-text-secondary dark:bg-token-main-surface-secondary"><span class="" data-state="closed"><button class="flex gap-1 items-center select-none py-1" aria-label="Copy"><svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg" class="icon-sm"><path fill-rule="evenodd" clip-rule="evenodd" d="M7 5C7 3.34315 8.34315 2 10 2H19C20.6569 2 22 3.34315 22 5V14C22 15.6569 20.6569 17 19 17H17V19C17 20.6569 15.6569 22 14 22H5C3.34315 22 2 20.6569 2 19V10C2 8.34315 3.34315 7 5 7H7V5ZM9 7H14C15.6569 7 17 8.34315 17 10V15H19C19.5523 15 20 14.5523 20 14V5C20 4.44772 19.5523 4 19 4H10C9.44772 4 9 4.44772 9 5V7ZM5 9C4.44772 9 4 9.44772 4 10V19C4 19.5523 4.44772 20 5 20H14C14.5523 20 15 19.5523 15 19V10C15 9.44772 14.5523 9 14 9H5Z" fill="currentColor"></path></svg>Copy code</button></span></div></div></div><div class="overflow-y-auto p-4" dir="ltr"><code class="!whitespace-pre hljs language-javascript"><span class="hljs-keyword">var</span> user_agent = <span class="hljs-string">"MyCustomAgent/1.0"</span>; <span class="hljs-comment">// Custom User-Agent</span>
</code></div></div></pre></li><li><p><strong><code>defaultExtractionList</code>:</strong><br>Defines the components to extract when no <code>component</code> parameter is provided. By default, it includes <code>title</code>, <code>h1</code>, <code>h2</code>, <code>p</code>, etc. You can add or remove components from this list.</p><p>Example:</p><pre class="!overflow-visible"><div class="contain-inline-size rounded-md border-[0.5px] border-token-border-medium relative bg-token-sidebar-surface-primary dark:bg-gray-950"><div class="sticky top-9 md:top-[5.75rem]"><div class="absolute bottom-0 right-2 flex h-9 items-center"><div class="flex items-center rounded bg-token-sidebar-surface-primary px-2 font-sans text-xs text-token-text-secondary dark:bg-token-main-surface-secondary"><span class="" data-state="closed"><button class="flex gap-1 items-center select-none py-1" aria-label="Copy"><svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg" class="icon-sm"><path fill-rule="evenodd" clip-rule="evenodd" d="M7 5C7 3.34315 8.34315 2 10 2H19C20.6569 2 22 3.34315 22 5V14C22 15.6569 20.6569 17 19 17H17V19C17 20.6569 15.6569 22 14 22H5C3.34315 22 2 20.6569 2 19V10C2 8.34315 3.34315 7 5 7H7V5ZM9 7H14C15.6569 7 17 8.34315 17 10V15H19C19.5523 15 20 14.5523 20 14V5C20 4.44772 19.5523 4 19 4H10C9.44772 4 9 4.44772 9 5V7ZM5 9C4.44772 9 4 9.44772 4 10V19C4 19.5523 4.44772 20 5 20H14C14.5523 20 15 19.5523 15 19V10C15 9.44772 14.5523 9 14 9H5Z" fill="currentColor"></path></svg>Copy code</button></span></div></div></div><div class="overflow-y-auto p-4" dir="ltr"><code class="!whitespace-pre hljs language-javascript"><span class="hljs-keyword">var</span> defaultExtractionList = [<span class="hljs-string">"title"</span>, <span class="hljs-string">"h1"</span>, <span class="hljs-string">"h2"</span>, <span class="hljs-string">"h3"</span>, <span class="hljs-string">"p"</span>];
</code></div></div></pre></li></ol><hr><h2>Common Issues and Troubleshooting</h2><h3>1. <strong>Authorization Issues</strong></h3><p>If you encounter an error when trying to use the function, ensure that you have authorized the script.</p><ul><li>Go to <strong>Extensions > Apps Script</strong>.</li><li>In the Apps Script editor, click <strong>Run > geturl</strong> to trigger the authorization process.</li></ul><h3>2. <strong>Malformed HTML</strong></h3><p>Some websites may have poorly formatted HTML, which can cause parsing errors. In such cases, the function uses fallback methods (like regex) to extract content. However, the results might not be perfect for heavily JavaScript-driven pages.</p><h3>3. <strong>JavaScript-Rendered Pages</strong></h3><p>If the webpage relies on JavaScript to load its content dynamically, the function may only fetch the initial HTML. This can lead to incomplete results. Unfortunately, Apps Script doesn’t support executing JavaScript on fetched pages.</p><h3>4. <strong>Rate Limits</strong></h3><p>If you use the function frequently or scrape many URLs, some websites may block your IP temporarily. To avoid this:</p><ul><li>Add a delay between requests (manually or via script logic).</li><li>Use a custom <code>user_agent</code> to mimic a real browser.</li></ul><h3>5. <strong>Blank Results</strong></h3><p>If the function returns blank:</p><ul><li>Ensure the URL is correct and accessible.</li><li>Check if the requested component exists on the page.</li><li>Increase the retry attempts (<code>retry_max</code>) in the script if the page is slow to load.</li></ul><hr><h2>Summary of Features</h2><h3>What It Can Extract:</h3><ul><li><strong>Title</strong>: <code>"title"</code></li><li><strong>Headings</strong>: <code>"h1"</code>, <code>"h2"</code>, ..., <code>"h6"</code></li><li><strong>Text Content</strong>: <code>"p"</code>, <code>"div"</code>, <code>"ul"</code>, <code>"li"</code>, <code>"table"</code></li><li><strong>Images</strong>: <code>"img"</code> (all image URLs)</li><li><strong>Meta Tags</strong>: <code>"meta:description"</code>, <code>"meta:keywords"</code>, <code>"meta:title"</code></li></ul><h3>When to Use a Parameter:</h3><ul><li>If you know the specific element to extract (e.g., <code>"h1"</code>), provide it as the second parameter.</li><li>If you want multiple elements (title, headings, etc.), omit the second parameter.</li></ul><hr><h2>Why Use This Script?</h2><ul><li><strong>No Coding Needed</strong>: Once set up, use it like any other Google Sheets formula.</li><li><strong>Highly Customizable</strong>: Adjust settings to fit your needs.</li><li><strong>Reliable Multi-Step Parsing</strong>: Attempts XML parsing first, then uses fallback methods for problematic pages.</li></ul><p>With this setup, you can easily scrape and analyze data from web pages without leaving your spreadsheet!</p>