Silly Assumptions

I’ve been doing a bunch of video work lately. Some quick and dirty editing of videos for NYCSS so they could do boat briefings remotely, a bunch more for a friend who is an instructor at NAIT and needed to do demos remotely, and finally I am just starting on some for L and her COVID-mutated instructional semester.

 

This has involved 30 or so videos to date and well over 250 gig of data.

The Need for Speed

My 2015 Macbook Pro has been working like a champ. I really only ran into issues when copying massive files off the internal drive (500 gig with barely 100 gig left as working space) back and forth to my externals, and when rendering a few of the files with lots of adjustments. And I really didn’t think I could do anything to speed any of that up without a huge investment of $$.

Turns out I was wrong.

Issue 1

I was using my external 2 terabyte USB/SATA drive as both a repository for my cache files and storage for completed work. Copying a 4 gig finished file took 5 or 6 minutes, and if I left it too long and had to transfer 15 or 20 gig, it was better if I just left and went and had a coffee.

I don’t remember what it was that got me looking at my USB hub, but at some point I noticed it was a cheapo one I had bought years ago and was strictly USB 2.0. That is to say 60 megabytes a second. I did a quick “About This Mac” and lo and behold the Macbook Pro’s physical USB ports were USB 3.0, that is to say rated for 500 megabytes a second. Um. But I still didn’t do anything about it because, well, math is hard. Then one of my 2 USB ports on the Macbook stopped working.

$37 later and now things are really screaming. Silly, silly boy.

Issue 2

My trusty externals have been chuggin’ away like champs but because they are mechanical hard drives they can really only do one thing at a time — imagine they are remotely like a turntable: the read/write arm has to move back and forth across a physical “platter” constantly every time it performs an operation.

What’s the alternative? An SSD (solid-state drive: essentially just a big flash drive), which is purely electronic and has no moving parts. But SSDs are really expensive, right?

Ummm. No.

At least not any more. I picked up a couple of high speed Samsung 500 gig SSD drives for ~$130/each. Scream-ing Fast. (See above screen shot.)

And so tiny!

So now I can fire stuff back and forth quickly and have an extra terabyte of space. And best of all, if I dump my working cache files onto one of them, the cache gets a fast drive of its own that can read and write without waiting on my other drives, which dramatically improves performance when I am working in Premiere and After Effects.

In summation

I made some pretty silly assumptions. The core one was that technology, and most especially technology pricing, would stand still, with the corollary that my 5-year-old machine couldn’t be made to work faster and harder. I am especially chagrined by the USB hub debacle. $37. 8X faster (I finally did the math). Duh.
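For the record, here is that math as a few lines of Python. The 60 and 500 MB/s figures are the theoretical ceilings quoted above, so these are best-case numbers; real copies (especially to or from a spinning drive) run slower, which is why a 4 gig file took minutes rather than seconds.

```python
# Theoretical ceilings quoted in the post; real-world copies run slower,
# and a spinning hard drive is often the bottleneck anyway.
USB2_MB_PER_S = 60    # USB 2.0: 480 Mbit/s divided by 8 bits per byte
USB3_MB_PER_S = 500   # USB 3.0: 5 Gbit/s, roughly 500 MB/s after 8b/10b encoding overhead


def transfer_minutes(size_gb: float, rate_mb_per_s: float) -> float:
    """Best-case minutes to copy size_gb gigabytes (1 GB = 1000 MB)."""
    return size_gb * 1000 / rate_mb_per_s / 60


speedup = USB3_MB_PER_S / USB2_MB_PER_S
print(f"Speed-up: {speedup:.1f}x")
for size_gb in (4, 20):
    print(f"{size_gb} GB: USB 2.0 ~{transfer_minutes(size_gb, USB2_MB_PER_S):.1f} min, "
          f"USB 3.0 ~{transfer_minutes(size_gb, USB3_MB_PER_S):.1f} min")
```

The ratio works out to a little over 8, which is where the “8X” comes from.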

As an Aside

All this work has really been an awesome learning experience. I have honed my After Effects skills some more, learned to deal with a new kind of workflow, and best of all got to try out Adobe’s cloud-sharing to work collaboratively with others. We must have moved 300+ gig of files back and forth over the cloud.

I should set up shop doing fancy video effects for all those remote teachers and professors 🙂

Theme Code notes

A few Macblaze.ca specific fixes

To get my Instagram posts to play nice with flex rather than floats, I added:

.category-instagram .post-content{
  display:flex;
  flex-wrap: wrap;
}

Fixed the Headings CSS

Heading 1

Heading 2

Heading 3

Heading 4

Heading 5
Heading 6

 

Ascending post order

MacblazeV v0.75.9

A while back I posted about changing the order of certain categories and/or all of them. But for some reason the feature seems to be missing from MacblazeIV, which reversed the order on all categories. So I needed to put it back in.

What I came up with is this. It goes in the category.php file after if ( have_posts() ) and allows me to specify which categories are sorted by ascending order and which are descending. This snippet also allows me to specify posts per page which I didn’t end up using but I left in here for posterity.

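<?php
// Changes display order for certain categories
// $query_string is a WordPress global, so it has to be pulled into template scope
global $query_string;

$thiscat   = get_query_var( 'cat' );
$catobject = get_category( $thiscat );
$parentcat = $catobject->category_parent;
// Top-level categories have no parent (category_parent is 0), so only look up a slug when there is one
$slugname  = $parentcat ? get_category( $parentcat )->slug : '';

if ( in_array( $slugname, array( 'cat-1', 'cat-2' ) ) || is_category( array( 'cat-3', 'cat-4' ) ) ) {
    query_posts( $query_string . "&orderby=date&order=ASC&posts_per_page=2" );
}
?>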

It’s based on https://stackoverflow.com/questions/19961130/wordpress-how-to-get-parent-category-id and essentially finds the current category, finds the ID of its parent category, then uses an if statement to check whether that parent is one of the categories in the array. Then it checks whether the current category itself is listed (without needing a parent) as one that should also have an ascending order.

Parents and individual categories are listed by their slugs.

Right now the parents are: array('Trips Etc.') — so that all trips under that category read front to back — and the individual categories I want ascending are array('its-novel','recipes-of-yore').

All-in-all it works, but I need a more elegant solution.

Useful links for the future

Announcing MacblazeV

So with all my work on the Hugo websites it occurred to me that this site could use some love. It went through a major redesign last year, but the back end was still in the last century.

Redesigned Theme

So while the look hasn’t changed much (except for some fixes/refinements I have been meaning to do for ages), the theme itself (now called MacblazeV) has been redone almost from scratch. My intent was to build it completely from scratch without borrowing any code. MacblazeIV was built on a blank framework from Underscores, but this time I was trying to avoid that. In the end I did use some of their functions for customization, but it was pretty minimal: 95% of this theme was written from a blank file, and I am actually pretty chuffed with myself about that.

WordPress has gotten more and more complicated over the years and all the bells and whistles are pretty complex to implement. Because this theme is for personal use I was able to ignore a large part of that and just add the features I knew I would use.

Design stuff

I am not 100% sure about all the design elements at this point. As I said, this was more about rebuilding the back end and getting rid of floats, etc., so I will likely still be tweaking it for a while. But I am super happy with its responsiveness and the overall structure. It will be much easier to modify now.

I still have to work on some technical issues regarding Instagram and some excessive resource use on the server, but that’s a separate subject.

ebook Update 2020

We’ve added a few more books this year to the list.

L and I worked on some childhood favourites of hers and I added a few plays, some Wodehouse and a Barsoom book. As usual they can all be found at Standard Ebooks’ website and the whole list is current on my portfolio site: astart.ca.

And so…

The Wodehouse shorts took a lot of work and research: it’s a massive canon, and many stories are only available in modern collections or the original serial publications. And there are still a lot more stories to add. It was fun to delve back into Restoration drama (The Way of the World), and I added a few Shakespeare plays (I had actually never read The Merry Wives of Windsor). As always, I hope you give some of them a try. I also try to keep a current list of books over at astart.ca/ebooks.

Hugo again

A quick update

Having (mostly) successfully updated my professional site using Hugo I decided to take a swing at L’s. It was more of a blog format so it gave me some insight into how that kind of functionality could be used.

I also learned I was a dinosaur who still used floats and ended up updating both it and my own site to use flexbox. Learn something new every day! That in itself is worth a post or two.

Netlify

Under the category of learning and things I need to go into greater detail about later, I also switched the hosting of readingwithapencil.com from wordpress.com to Netlify. As a result I have a more flexible site without actually having to pay for anything (WordPress charged for the use of a custom domain).

The workflow runs off GitHub, which I have been using more and more with the Standard Ebooks project, so it is pretty smooth. All in all it is really worthy of a post of its own, but there are so many videos out there it might just serve you (the reader) better to go watch a couple of them.

But I will eventually jot down my thoughts here…at least so I can figure out what I did when I inevitably break it—that being, originally, the whole purpose of this site.

Hugo!

While I was looking into Linux, I came across a vlog that recommended using Hugo and Netlify as a way to maintain a free web presence. I’ve found a lot of these sorts of things (“free”) and even went so far as to set up a small site using the free parts of Google Cloud to get my Python project up and running.

But what struck me about Hugo was that it generates a static website, which is faster and more secure than the typical WordPress install. And it was an intriguing concept that you could mimic the flexibility of a dynamic site using static pages. So I decided to give it a go.

I decided to leave Netlify as an experiment for a future project and set about rebuilding the site using Hugo. As a result my old, much ignored portfolio site astart.ca is now refreshed and way more speedy even though I didn’t change the content or the host.

So what is it?

I will get into it more in a later post. But basically it’s a framework that lets you build the website using templates and pseudo-dynamic techniques, and when you are ready to go, you just “publish” the project and it exports the whole website as static pages. It supports a ton of themes, like WordPress, although again I decided to build my own from scratch.

Pages are built using Markdown. It’s a versatile markup language and one I keep trying to use, so one of the side benefits of this is I have become much more facile with it. The gist is that now that the site is built, all you have to do is open a text file, add content using Markdown formatting, link to accompanying pictures and then just run a short “deploy” script to automatically rebuild the site and upload it to your host. Simple.
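The deploy script itself isn’t shown here, so as a rough illustration only, here is a Python sketch of what “rebuild and upload” can boil down to. It assumes Hugo is on the PATH, that the site builds into Hugo’s default public/ folder, and that the host accepts rsync over SSH; the host and paths are placeholders, and nothing is actually executed unless dry_run is switched off.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def deploy_commands(site_dir: Path, remote: str) -> list[list[str]]:
    """Return the commands to rebuild the site and sync it to the host."""
    public_dir = site_dir / "public"  # Hugo's default output folder
    return [
        # Regenerate every page from the Markdown content and templates
        ["hugo", "--minify", "--source", str(site_dir)],
        # Mirror the generated files to the host; --delete removes pages that no longer exist
        ["rsync", "-avz", "--delete", f"{public_dir}/", remote],
    ]


def deploy(site_dir: str, remote: str, dry_run: bool = True) -> None:
    for cmd in deploy_commands(Path(site_dir), remote):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    # Hypothetical host and path: replace with your own before setting dry_run=False
    deploy(".", "user@example.com:/var/www/site/", dry_run=True)
```

Keeping the command list separate from the code that runs it makes it easy to eyeball what will happen before pushing anything live.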

It’s perfect for a site like the portfolio that doesn’t change much, but now I am going to try it on a more blog-oriented site to see if it will stand up to more frequent posting. I will let you know.

As for astart.ca, well it’s up and running and has a fresh new design. Check it out. Now all I have to do is dig up more current material to post. And that’s the hard part 😉

Books Read & Calibre Web Update

Previously (Making A “Books Read” Page) I had posted how I added info to the default Calibre Web templates to add the Series and Pub date information so I could scrape it. Well, it’s gotten a bit more complex since then. Someone added a similar mod to the Github repository which has not yet been incorporated. They didn’t add the pub dates, but did add the series to a few more pages so I thought I would restate my changes here for future reference.

/templates/shelf.html

Add both series info and pub date for the python web scraping program to access:

{% if entry.series|length > 0 %}
    <p class="series">
        {{_('Book')}} {{entry.series_index}} {{_('of')}} <a href="{{url_for('web.books_list', data='series',sort='abc', book_id=entry.series[0].id)}}">{{entry.series[0].name}}</a>
    </p>
{% endif %}


{% if entry.pubdate[:10] != '0101-01-01' %}
    <p class="publishing-date">{{entry.pubdate|formatdate}} </p>
{% endif %}

Added around line 45 (just after the {% endif  %} for the author section). 
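On the scraping side, those two class names are what make the data easy to find. As a dependency-free illustration (not the actual Books Read scraper), here is a sketch using Python’s standard-library html.parser that collects the text of every p.series and p.publishing-date on a rendered shelf page. It assumes the markup produced by the snippets above.

```python
from __future__ import annotations

from html.parser import HTMLParser


class BookInfoParser(HTMLParser):
    """Collects the text of <p class="series"> and <p class="publishing-date">."""

    TARGET_CLASSES = {"series", "publishing-date"}

    def __init__(self) -> None:
        super().__init__()
        self.found: dict[str, list[str]] = {c: [] for c in self.TARGET_CLASSES}
        self._current: str | None = None  # class of the <p> we are inside, if any
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            classes = (dict(attrs).get("class") or "").split()
            match = self.TARGET_CLASSES.intersection(classes)
            if match:
                self._current = match.pop()
                self._buffer = []

    def handle_data(self, data):
        # Text from nested tags (like the series <a>) is collected too
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._current:
            # Collapse whitespace left over from the template's line breaks
            text = " ".join("".join(self._buffer).split())
            self.found[self._current].append(text)
            self._current = None
```

Feed it the page HTML with parser.feed(html) and read the results from parser.found.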

/templates/index.html

Add only series info just for aesthetics (note the code is from the proposed mod and is slightly different):

        {% if entry.series.__len__() > 0 %}
        <p class="series">
          <a href="{{url_for('web.books_list', data='series', sort='new', book_id=entry.series[0].id )}}">
            {{entry.series[0].name}}
          </a> 
          ({{entry.series_index}})
        </p>
        {% endif %}

Added around line 111 (just after the {% endif  %} for the author section). (This might be 36… there seems to have been a change…)

/templates/discover.html

        {% if entry.series.__len__() > 0 %}
        <p class="series">
          <a href="{{url_for('web.books_list', data='series', sort='new', book_id=entry.series[0].id )}}">
            {{entry.series[0].name}}
          </a> 
          ({{entry.series_index}})
        </p>
        {% endif %}

Added around line 36 (just after the {% endif  %} for the author section).


I am trying to figure out a way to automate the mods if the main repository doesn’t decide to incorporate the changes but so far an elegant solution eludes me.
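One possible approach, sketched here rather than settled on: store each mod as a snippet plus a unique anchor string, and have a small Python script re-insert any snippet that has gone missing after an update. The function names are hypothetical, and the anchor would need to be a line that occurs only once in the template; a bare {% endif %} appears many times, so it wouldn’t do.

```python
from pathlib import Path


def insert_after_anchor(text: str, anchor: str, snippet: str) -> str:
    """Insert snippet on the line after the first line containing anchor.

    Returns the text unchanged if the snippet is already there, so re-running
    after every update is safe. Raises ValueError if the anchor has
    disappeared, which usually means upstream changed the template.
    """
    if snippet.strip() in text:
        return text
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if anchor in line:
            if not line.endswith("\n"):
                lines[i] = line + "\n"
            lines.insert(i + 1, snippet if snippet.endswith("\n") else snippet + "\n")
            return "".join(lines)
    raise ValueError(f"anchor not found: {anchor!r}")


def patch_template(path: Path, anchor: str, snippet: str) -> bool:
    """Apply one mod to a template file; returns True if the file was changed."""
    original = path.read_text(encoding="utf-8")
    patched = insert_after_anchor(original, anchor, snippet)
    if patched != original:
        path.write_text(patched, encoding="utf-8")
    return patched != original
```

Failing loudly on a missing anchor is deliberate: silently skipping a mod would be harder to notice than an error after an update.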

Flask Part Deux

A continuation of The Great Flask Adventure.

The structure

When last we left our heroes we had posted a groovy Python script: Mark III. This was saved as yacht_app.py in a folder. The rest of the files were built and also stored there. The structure of the folders is thus:

searchyachtworld/
    yacht_app.py
    output/
        boatlist.json   (a file generated by the app)
    static/
        css/
            style.css
        images/
            artboard.png
    templates/
        index.html
        results.html
        template.html

Back to the app

The app (the Python file) consists of several parts, mostly mini scripts (Flask routes) that each render results to a specific template. The simplest is:

@app.route("/")
def home():
    return render_template("index.html")

This simply displays the “index.html” file, which is a basic form. The next is:

@app.route('/results')
def results():
    data = []
    with open("output/boatlist.json", "r") as jdata:
        data = json.load(jdata)
    return render_template("results.html", boatlist=data['boats'],predata=data['fileinfo'])

 

This defines the /results page, which renders “results.html” using the contents of the boatlist.json file as its data.
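One thing worth guarding against: on a fresh install, output/boatlist.json doesn’t exist until the first search runs, so visiting /results would raise FileNotFoundError. A small helper along these lines (the name load_boatlist is my own, not part of Flask) would let the route fall back to an empty list instead:

```python
import json


def load_boatlist(path: str = "output/boatlist.json") -> dict:
    """Return the saved search results, or an empty result set if none exist yet."""
    try:
        with open(path, "r", encoding="utf-8") as jdata:
            data = json.load(jdata)
    except (FileNotFoundError, json.JSONDecodeError):
        # No search has run yet, or the file was left half-written
        data = {}
    # Make sure both keys the results template reads are present
    data.setdefault("fileinfo", [])
    data.setdefault("boats", [])
    return data


# In the Flask route this would replace the open()/json.load() lines:
#     data = load_boatlist()
#     return render_template("results.html", boatlist=data["boats"], predata=data["fileinfo"])
```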

The next one handles “index.html” after the search button is clicked: it uses a form POST request to gather the input data and executes the rest of the Python script using that data. I am not going to get into that as it’s just a variation of the Book Page scraping.

I did add a bit at the end that reopens the output json file and uses the submitted search parameter to reorder it before moving on to the results page.

@app.route("/", methods=['POST'])
def echo():
    #get index form data
    if request.method == "POST":
        inputcurr=request.form["inputcurr"]
        minprice=request.form["minprice"]
        maxprice=request.form["maxprice"]
        minlength=request.form["minlength"]
...
    data = []
    with open("output/boatlist.json", "r") as jdata:
        data = json.load(jdata)
        data['boats'].sort(key=keyparam)
    return render_template('results.html', boatlist=data['boats'],predata=data['fileinfo'])
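Two things can trip up that sort: a listing whose price isn’t a plain number makes int() fail, and an unexpected sort value can leave keyparam undefined. Here is a sketch of a more forgiving approach, assuming the same Price/Size/Location strings the scraper writes; the helper names are hypothetical.

```python
import re


def price_value(price):
    """'80,000' -> 80000; returns None when no number can be found."""
    match = re.search(r"\d[\d,]*", price or "")
    return int(match.group().replace(",", "")) if match else None


def length_value(size):
    """'42 ft / 1980' -> 42; returns None when the size doesn't start with a number."""
    match = re.match(r"\s*(\d+)", size or "")
    return int(match.group(1)) if match else None


def missing_last(value):
    """Sort-key wrapper that pushes None values to the end of the list."""
    return (value is None, value if value is not None else 0)


SORT_KEYS = {
    "Location": lambda boat: boat.get("Location", ""),
    "Price": lambda boat: missing_last(price_value(boat.get("Price"))),
    "Size": lambda boat: missing_last(length_value(boat.get("Size"))),
}


def sort_boats(boats, sortparam):
    """Sort boats by the submitted field, defaulting to price for unknown values."""
    return sorted(boats, key=SORT_KEYS.get(sortparam, SORT_KEYS["Price"]))
```

Sorting Size numerically also avoids a quirk of the plain string sort, where a “9 ft” listing would land after “48 ft”.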

Back to the HTML

The other templates extend template.html, which sets all the default elements (header, navbar, style sheets, etc.).

I won’t bother with the code for the index page, but here is the results page, which is pretty simple: it extracts the header information from the “predata” section of the JSON and then loops through the “boatlist” to display each boat.

{% extends "template.html" %}
{# <!DOCTYPE>, <html>, <head> and <title> belong in template.html: a child
   template only fills in the blocks its parent defines. #}
{% set active_page = "results" %}
{% block content %}
<div class="page-header">
    <h2 class="orange">YachtWorld Results</h2>
    <div id="preface">
    {% for pb in predata %}
        <div>
            <p>{{pb.Text}}<br/>Updated: {{pb.Date}}
            <br/><a href="{{pb.Creator}}">created by {{pb.Creator}}</a></p>
            <p>Price range: <strong>${{pb.Low}}</strong> to <strong>${{pb.High}}</strong> ({{pb.Currency}})<br/>
            Boat length: <strong>{{pb.Short}}'</strong> – <strong>{{pb.Long}}'</strong></p>
        </div>
    {% endfor %}
    </div>
</div>
{% for boat in boatlist %}
<div class="col-xs-6" style="min-height:170px;">
    <div class="col-md-5 text-right">
        <img src="{{boat.Thumb}}" alt="{{boat.Name}}" width="150">
    </div>
    <div class="col-md-7">
        <h3><a href="{{boat.URL}}">{{boat.Name}}</a></h3>
        <p><strong>${{boat.Price}}</strong> / {{boat.Size}}<br/>
        {{boat.Location}}</p>
    </div>
</div>
{% endfor %}
{% endblock %}

Pretty simple really…lol.

In conclusion

Anyway, I don’t suspect anyone will actually understand/get much out of all this, and it’s here mostly for posterity. There are plenty of resources online to help dig into the code.

I am still playing with it and it will continue to evolve. I did post it on GitHub if anyone is interested in the latest version (I have already added in some bits to handle price errors). I am still searching for a host to make it publicly available, but anyone can download it from GitHub if they want to run it on their own server.

The Great Flask Adventure

I just published a blog post over on neverforever.ca about trying to build a web app to scrape YachtWorld. I thought I would record the details here so I can remember what I have done. The complete (and updated) repository is on github if anyone is interested.

Why?

Some time in the recent past YachtWorld decided to redo their website. One of the outcomes is that you can no longer search for boats in multiple places at the same time, so I now had to perform three separate searches with no way to “save” a previous search and compare. I figured I could adapt my newfound Python skills to scrape the site and deliver the output to a web page.

Mark I

I copied my previous efforts and wrote a Python script that produced a Markdown file to view in a web browser.

Mark II

I decided to output a JSON file instead and then build a PHP page to read it using jQuery and JavaScript. The JSON had two top-level keys, each holding a list: one for general info and one for boat listings:

{
    "fileinfo": [
        {
            "Date": "April 03, 2020 08:46",
            "Text": "Results are a Yachtworld search of sailboats in Washington, Oregon and B.C.",
            "Currency": "CAD",
            "Low": "30000",
            "High": "120000",
            "Short": "34",
            "Long": "48",
            "Creator": "http://neverforever.ca"
        }
    ],
    "boats": [
        {
            "URL": "https://www.yachtworld.com/boats/1980/cheoy-lee-clipper-42-ketch-3577567/",
            "Name": "Cheoy Lee Clipper 42 Ketch",
            "Price": "80,000",
            "Size": "42 ft / 1980",
            "Location": "Vancouver, British Columbia, Canada",
            "Thumb": "https://newimages.yachtworld.com/resize/1/16/77/7191677_20190822081237806_1_LARGE.jpg?f=/1/16/77/7191677_20190822081237806_1_LARGE.jpg&w=520&h=346&t=1566486758"
        }
    ]
}

Then I used javascript to retrieve the data and loop through “boats” to display the html code.


/* Retrieve listings */
var data;

jQuery.get("boatlist.json", function(d) {
    data = d;

    /* numeric (price) sort
    var datab = data.boats.sort(function(a, b) {return parseFloat(a.Price.replace(/,/g, '')) - parseFloat(b.Price.replace(/,/g, ''))});
    */

    /* text (length) sort */
    var datab = data.boats.sort(function(a, b) {
        var x = a.Size.toLowerCase();
        var y = b.Size.toLowerCase();
        if (x < y) {return -1;}
        if (x > y) {return 1;}
        return 0;
    });

    // loop through all boats
    datab.forEach(function(bb) {
        // now put each boat in a <div>
        $("#boats").append(`
            <div class="col-xs-6" style="min-height:170px;">
                <div class="col-md-5 text-right">
                    <img src="${bb.Thumb}" alt="" width="150">
                </div>
                <div class="col-md-7">
                    <h3><a href="${bb.URL}">${bb.Name}</a></h3>
                    <p><strong>\$${bb.Price} </strong> \/ ${bb.Size}<br/>
                    ${bb.Location}</p>
                </div>
            </div>
        `);
    });
});

It worked pretty well but relied on me running the Python script each time. After a bit of investigation I decided to turn to Flask to see if I could host it all on a website. Since the Calibre-Web site that I was scraping for my Books Read project ran on Flask, I knew it could be done.

Mark III

So here is the script I finally ended up with:


from flask import Flask, render_template, request, jsonify
import json
app = Flask(__name__)


@app.route("/")
def home():
    return render_template("index.html")


@app.route('/results')
def results():
    data = []
    with open("output/boatlist.json", "r") as jdata:
        data = json.load(jdata)
    return render_template("results.html", boatlist=data['boats'], predata=data['fileinfo'])


@app.route("/", methods=['POST'])
def echo():
    # get index form data
    if request.method == "POST":
        inputcurr = request.form["inputcurr"]
        minprice = request.form["minprice"]
        maxprice = request.form["maxprice"]
        minlength = request.form["minlength"]
        maxlength = request.form["maxlength"]
        texta = minlength + "–" + maxlength + "ft\n" + inputcurr + ": $" + minprice + "-" + maxprice
        textb = minlength + "–" + maxlength + "ft<br/>" + inputcurr + ": $" + minprice + "-" + maxprice
        # build sort param ie data['boats'].sort(key=lambda s: s['Location'])
        sortparam = request.form["inputsearch"]
        if sortparam == 'Location':
            keyparam = lambda s: s['Location']
        elif sortparam == 'Price':
            keyparam = lambda s: int(s['Price'].replace(',', ''))
        elif sortparam == 'Size':
            keyparam = lambda s: s['Size']
        else:
            # unexpected value: fall back to Location so keyparam is always defined
            keyparam = lambda s: s['Location']

    # import various libraries
    import requests
    from bs4 import BeautifulSoup
    import re
    # enable math.ceil
    import math
    # enable sys.exit()
    import sys
    import csv
    import json
    from datetime import datetime
    import os
    # set header to avoid being labeled a bot
    headers = {
        'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
    }
    # set base url
    baseurl = 'https://www.yachtworld.com/boats-for-sale/type-sail/region-northamerica/'
    # input low number
    if minprice == '':
        minpricenum = '30000'
    else:
        minpricenum = minprice
    print(minpricenum)
    # input high number
    if maxprice == '':
        maxpricenum = '120000'
    else:
        maxpricenum = maxprice
    print(maxpricenum)
    # input currency
    if inputcurr == '':
        curr = 'CAD'
    else:
        curr = inputcurr
    print(curr)
    # input low length
    if minlength == '':
        lowlen = '34'
    else:
        lowlen = minlength
    print(lowlen)
    # input high length
    if maxlength == '':
        highlen = '48'
    else:
        highlen = maxlength
    print(highlen)
    # set variables
    pricerange = '&price=' + minpricenum + '-' + maxpricenum
    wash = 'country-united-states/state-washington/'
    oreg = 'country-united-states/state-oregon/'
    bc = 'country-canada/province-british-columbia/'
    currlen = '?currency=' + curr + '&length=' + lowlen + '-' + highlen
    # create list of url variables
    urllist = [bc, wash, oreg]
    # check to see if external drive is mounted and mount it
    # if os.path.ismount("/Volumes/www/") == False:
    #     print ("False monkey")
    #     os.system("open smb://admin:<password>@Mini%20Media%20Server._smb._tcp.local/www")
    # set path to export as file
    path_folder = "output/"
    # set date and time
    now = datetime.now()
    dt_string = now.strftime("%B %d, %Y %H:%M")
    # create empty list
    arrayjson = []
    # loop through pages in urllist
    for page in urllist:
        # get url (pass the headers so the request isn't labeled a bot)
        urlpath = baseurl + page + currlen + pricerange
        resp = requests.get(urlpath, headers=headers, timeout=5)
        boatpg = BeautifulSoup(resp.content, "html.parser")
        # find boat listings section
        boatlist = boatpg.find('div', class_="search-right-col")
        # find single boat listing
        boatlisting = boatlist.find_all('a')
        # loop through listing and append to list
        for listname in boatlisting:
            nameurl = listname['href']
            thumb = listname.find("meta", property="image")
            # add https and find content of meta and substring url to remove first two characters
            thumburl = "https://" + thumb["content"][2:]
            name = listname.find('div', property="name")
            priceraw = listname.find('div', class_="price")
            # remove extra info from front and back (raw string so "\$" is a regex escape)
            price = re.search(r"\$.*? (?= *)", priceraw.text)
            cost = price.group()[1:-1]
            sizeyear = listname.find('div', class_="listing-card-length-year")
            location = listname.find('div', class_="listing-card-location")
            # write to json format
            writejson = {
                "URL": nameurl,
                "Name": name.text,
                "Price": cost,
                "Size": sizeyear.text,
                "Location": location.text,
                "Thumb": thumburl
            }
            # append to list
            arrayjson.append(writejson)
    # add Preface list (array)
    arraypreface = []
    preface = {
        'Date': dt_string,
        'Text': 'Results are a Yachtworld search of sailboats in Washington, Oregon and B.C.',
        'Currency': curr,
        'Low': minpricenum,
        'High': maxpricenum,
        'Short': lowlen,
        'Long': highlen,
        'Creator': 'http://neverforever.ca'
    }
    # append to list
    arraypreface.append(preface)
    # open json file with path
    with open(path_folder + 'boatlist.json', 'w') as outfile:
        # dump two lists with dict names and add formatting (default=str solves date issue)
        json.dump({'fileinfo': arraypreface, 'boats': arrayjson}, outfile, indent=4, default=str)
    data = []
    with open("output/boatlist.json", "r") as jdata:
        data = json.load(jdata)
        data['boats'].sort(key=keyparam)
    return render_template('results.html', boatlist=data['boats'], predata=data['fileinfo'])
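Looking back, the long run of if/else blocks for default values and the hand-assembled query strings could be condensed. Here is a sketch using the standard library’s urllib.parse.urlencode, reusing the base URL, region paths, form field names and defaults from the script above; search_urls is a hypothetical helper, and YachtWorld’s URL scheme may well have changed since this was written.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.yachtworld.com/boats-for-sale/type-sail/region-northamerica/"
REGIONS = [
    "country-canada/province-british-columbia/",
    "country-united-states/state-washington/",
    "country-united-states/state-oregon/",
]
# Form field name -> value used when the field is left blank (same defaults as the script)
DEFAULTS = {
    "inputcurr": "CAD",
    "minprice": "30000",
    "maxprice": "120000",
    "minlength": "34",
    "maxlength": "48",
}


def search_urls(form):
    """Build one YachtWorld search URL per region from the submitted form values."""
    # `or` swaps an empty string (or a missing field) for its default
    v = {name: form.get(name) or default for name, default in DEFAULTS.items()}
    query = urlencode({
        "currency": v["inputcurr"],
        "length": f"{v['minlength']}-{v['maxlength']}",
        "price": f"{v['minprice']}-{v['maxprice']}",
    })
    return [f"{BASE_URL}{region}?{query}" for region in REGIONS]
```

Since Flask’s request.form also has a .get() method, the scraping loop could then become for urlpath in search_urls(request.form): and produce the same URLs as before.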

Continued: Flask Part Deux…