#duraspace IRC Log


IRC Log for 2017-09-13

Timestamps are in GMT/BST.

[6:52] * DuraLogBot (~PircBot@webster.duraspace.org) has joined #duraspace
[6:52] * Topic is 'Welcome to DuraSpace IRC. This channel is used for formal meetings and is logged - http://irclogs.duraspace.org/'
[6:52] * Set by tdonohue on Thu Sep 15 17:49:38 UTC 2016
[12:50] * tdonohue (~tdonohue@dspace/tdonohue) has joined #duraspace
[12:53] * mhwood (~mhwood@mhw.ulib.iupui.edu) has joined #duraspace
[13:38] * misilot (~misilot@p-body.lib.fit.edu) has joined #duraspace
[18:30] * misilot (~misilot@p-body.lib.fit.edu) Quit (Remote host closed the connection)
[18:30] * misilot (~misilot@p-body.lib.fit.edu) has joined #duraspace
[19:45] <DSpaceSlackBot1> <tdonohue> <here>: Reminder that our weekly DSpace DevMtg starts at the top of the hour. Agenda at https://wiki.duraspace.org/display/DSPACE/DevMtg+2017-09-13 With DSpace 6.2 out the door, we should have more time for open discussion today / questions (as they come up).
[20:00] <DSpaceSlackBot1> <tdonohue> <here>: Hi all, and welcome. It's time for our weekly DSpace DevMtg. Agenda is linked above.
[20:00] <DSpaceSlackBot1> <tdonohue> I'd like to do a quick roll call (as I've been finding these late, 20UTC mtgs are less well attended...likely cause it's nighttime in Europe)
[20:01] <DSpaceSlackBot1> <terrywbrady> here!
[20:02] <DSpaceSlackBot1> <tdonohue> While we are roll-calling, I'll quickly say.... Thanks to everyone who helped with DSpace 6.2! It was good to get that release out the door finally (and thanks to @mwood for cutting/shipping the release)
[20:03] <DSpaceSlackBot1> <tdonohue> I see that @mwood mentioned he's a bit distracted in dev. So, sounds like he'll be here shortly as well
[20:03] <DSpaceSlackBot1> <mwood> I've been undistracted.
[20:05] <DSpaceSlackBot1> <tdonohue> I will admit that our agenda is on the lighter side today. Most of the High Priority Tickets are geared towards 7.0... we have "ongoing discussion topics" too. But, I noticed @terrywbrady also added a comment: https://wiki.duraspace.org/display/DSPACE/DevMtg+2017-09-13?focusedCommentId=90964712#comment-90964712
[20:05] <DSpaceSlackBot1> <tdonohue> So, I'd like to raise that topic (recap of DCAT discussions about statistics) to the beginning here, unless anyone objects or has higher priority topics?
[20:06] <DSpaceSlackBot1> <terrywbrady> sounds good to me
[20:06] <DSpaceSlackBot1> <tdonohue> Not hearing any (and we're a small group here anyhow). @terrywbrady were there parts of the DCAT discussion you wanted to highlight? https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+September+2017
[20:07] <DSpaceSlackBot1> <terrywbrady> At the DSpace user meeting, @tdonohue raised a question: should DSpace maintain a stats repository OR should DSpace rely on stats vendors to report statistics.
[20:08] <DSpaceSlackBot1> <terrywbrady> I recommended this as a conversation for DCAT. We had a great discussion for a full hour and several folks shared their approaches.
[20:09] <DSpaceSlackBot1> <terrywbrady> There was no specific consensus to answer the original question, but it did feel like a topic that generated a lot of good conversation.
[20:09] <DSpaceSlackBot1> <terrywbrady> piwik came up as an alternative to Google Analytics.
[20:09] <DSpaceSlackBot1> <tdonohue> yes, looks like a very good discussion
[20:10] <DSpaceSlackBot1> <mwood> I wish I'd been there to add a third possibility: should DSpace export cases to be munched by general statistical tools, and provide places to display the results?
[20:10] <DSpaceSlackBot1> <terrywbrady> I think that if we brought a specific implementation idea forward, we could get some good feedback from DCAT.
[20:10] <DSpaceSlackBot1> <tdonohue> And I like the note in here about > Ultimately the discussion .. is one that should happen on a larger scale within the Open Repositories community as a whole. There is no easy solution...
[20:12] <DSpaceSlackBot1> <terrywbrady> Within my own department, we have differing views: privacy of stats vs the convenience of looking at all stats in one place (Google Analytics)
[20:13] <DSpaceSlackBot1> <mwood> So which measures should be private, and which should be shared?
[20:13] <DSpaceSlackBot1> <tdonohue> Personally, I have doubts that DSpace can ever truly be "great at stats" in itself (which is why I asked this question). We just don't have the resources (and statistics are not easy, unless you only give very basic statistics, and even then, spider-filtering isn't so easy). But we can better integrate with tools/services that are "great at stats"
[20:13] <DSpaceSlackBot1> <tdonohue> And if the repository community starts to standardize on what tools/services are "great at stats" we can target those specifically.
[20:14] <DSpaceSlackBot1> <mwood> Which measures can you learn from out of a single facility, and which tell you useful things when aggregated across facilities?
[20:14] <DSpaceSlackBot1> <terrywbrady> On the call, we did not identify any services beyond Google Analytics and piwik
[20:15] <DSpaceSlackBot1> <mwood> Which measures are important to (a) end users, (b) depositors, (c) local administrators, (d) sysadmins and developers?
[20:15] <DSpaceSlackBot1> <terrywbrady> One challenge that we could encounter in deferring to external services is that it would be more difficult to provide enriched statistics (counts by author)
[20:16] <DSpaceSlackBot1> <mwood> Which measures does it make sense to outsource (GA etc) and which should be done in-house (counts by author)?
[20:16] <DSpaceSlackBot1> <tdonohue> I know the other tool that has gained interest as of late is RAMP (Repository Analytics and Metrics Portal): http://ramp.montana.edu/ But, it currently *requires* Google Analytics itself, it just displays GA stats with some nice aggregate reports.
[20:18] <DSpaceSlackBot1> <tdonohue> Yes, it could make sense to possibly only concentrate on very specific in-house statistics (author specific, community/collection/item specific). But even then, visualizations/reporting is hard even if you limit out-of-the-box stats.
[20:19] <DSpaceSlackBot1> <mwood> Not if you ship the cases to R and the reports back into DSpace.
[20:19] <DSpaceSlackBot1> <tdonohue> Not saying it's impossible, but none of this is a small problem, unfortunately, and everyone seems to want slightly unique reports
[20:19] <DSpaceSlackBot1> <terrywbrady> Or we make the REST api capable of pulling useful stats numbers
[20:19] <DSpaceSlackBot1> <terrywbrady> (useful sets of data)
[20:20] <DSpaceSlackBot1> <tdonohue> Though, maybe there's a way to export stats into a very generic format (think CSV) so that reports could be generated via Excel or similar
[20:20] <DSpaceSlackBot1> <tdonohue> Or, yes, REST API
[20:20] <DSpaceSlackBot1> <terrywbrady> I hope that having a UI built on a good REST api may make it much easier to integrate visualization tools
[20:21] <DSpaceSlackBot1> <tdonohue> @terrywbrady: yes, you may be right there. I think a good REST API makes some of this potentially easier in that all the data will be surfaced there (in order to feed the new UI), so you could also surface it *elsewhere*
[20:22] <DSpaceSlackBot1> <mwood> Spreadsheet and stat products typically know how to get data out of databases. We could provide some read-only views for that, if there were data in the database to view.
[20:22] <DSpaceSlackBot1> <terrywbrady> It doesn't solve the bot filtering issue, but it does make the rendering of stats much easier.
[20:23] <DSpaceSlackBot1> <mwood> The only way to even come close to solving the bot issue is eternal vigilance and frequent updates to the bot lists.
[20:24] <DSpaceSlackBot1> <tdonohue> In any case, I'm glad these discussions took place. I definitely don't see us dropping Solr Stats anytime soon. But, eventually, there may be a decision point (no idea when exactly). We have to admit that Solr Stats needs a lot of love (in the future) to make it better at bot filtering, easier to create custom reports/stats, etc. Or, we have to start looking elsewhere for these needs.
[20:24] <DSpaceSlackBot1> <terrywbrady> At our meeting in DC, I think there was also a suggestion that interpretation of bots/bot filtering might need to be subjective for an institution.
[20:25] <DSpaceSlackBot1> <tdonohue> Yes, there are often runaway "one-off" bots that do massive downloads that *won't* be on any centralized bot list. Each institution will need a way to manage their own bot lists even if a "good list" is found/maintained centrally
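A minimal sketch of the idea above: a centrally maintained bot list combined with an institution's own additions. The pattern values and the names `CENTRAL_BOT_PATTERNS`, `LOCAL_BOT_PATTERNS`, `build_matcher`, and `is_bot` are hypothetical, chosen for illustration; this is not DSpace's actual spider-detection code.

```python
import re

# Assumption: a shared, centrally maintained list of user-agent patterns.
CENTRAL_BOT_PATTERNS = [r"googlebot", r"bingbot"]

# Assumption: site-specific patterns for "one-off" bots seen only locally.
LOCAL_BOT_PATTERNS = [r"campus-harvester"]


def build_matcher(*pattern_lists):
    """Combine any number of pattern lists into one case-insensitive regex."""
    combined = "|".join(
        f"(?:{pattern})" for patterns in pattern_lists for pattern in patterns
    )
    return re.compile(combined, re.IGNORECASE)


def is_bot(user_agent, matcher):
    """Return True if the user-agent string matches any known bot pattern."""
    return bool(matcher.search(user_agent or ""))


# Build one matcher from the central list plus the local overrides.
matcher = build_matcher(CENTRAL_BOT_PATTERNS, LOCAL_BOT_PATTERNS)
```

Keeping the two lists separate means the central list can be refreshed without overwriting local entries.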
[20:25] <DSpaceSlackBot1> <mwood> Let me say something that I've been hinting at: we don't have to do *everything* inside DSpace; we don't have to do *everything* in-house outside of DSpace; we don't have to do *everything* via a third-party service.
[20:26] <DSpaceSlackBot1> <tdonohue> I agree completely, @mwood. In fact, the less we do, the better we can do it. (Obviously we don't want to do too little that we are no longer useful though)
[20:26] <DSpaceSlackBot1> <mwood> What we do need to do is to integrate statistical data products that make sense, and integrate well with external processes that want our cases.
[20:28] <DSpaceSlackBot1> <tdonohue> If we had a good plugin model, this would also be a great area where we can simply "surface the data" (via REST or wherever) and let a plugin deal with how to display visualizations, reports, etc.
[20:29] <DSpaceSlackBot1> <tdonohue> So, I'm not sure if there's anything else we want to discuss here (seems like this discussion is wrapping up, and there are just 3 of us here)
[20:29] <DSpaceSlackBot1> <tdonohue> There's no real next steps, but maybe this is something to analyze / keep in mind as DSpace 7 progresses... especially how we can surface this data in useful ways (via REST) not only for the new UI, but also for other tools to use
[20:30] <DSpaceSlackBot1> <terrywbrady> This was a helpful discussion to have. It does not sound like we are advocating for the elimination of a stats repo within DSpace... but perhaps recommending simplifying how we interact with it.
[20:30] <DSpaceSlackBot1> <mwood> Are we talking about surfacing cases, or statistical products?
[20:31] <DSpaceSlackBot1> <tdonohue> @terrywbrady: I think that's accurate, though we still do have the bots problem to figure out a solution to. And, I think this will always be an ongoing discussion (as stats are always changing, so we either need to keep up, or find other solutions)
[20:32] <DSpaceSlackBot1> <mwood> Stats are not just always changing; different people want different data analyzed different ways for different uses.
[20:32] <DSpaceSlackBot1> <mwood> Sometimes they don't even know what they want and how to process it until they explore a bit.
[20:32] <DSpaceSlackBot1> <tdonohue> @mwood: not sure I understand your question. If it refers to the DSpace 7 comment I made, I just meant surfacing the statistical data (from Solr) via REST in "useful ways" (perhaps even configurable or extendable ways)
[20:34] <DSpaceSlackBot1> <mwood> Well, cases are "someone somewhere fetched something." Statistical products are "277 downloads this month" or a global heatmap of interest in a given collection in July 2016.
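A small sketch of the distinction drawn above, assuming a hypothetical record layout: each "case" is one fetch event, and a "statistical product" is a summary computed from many cases. The field names and the `downloads_per_month` helper are invented for illustration and do not reflect the Solr statistics schema.

```python
from collections import Counter
from datetime import datetime

# Hypothetical cases: "someone somewhere fetched something."
cases = [
    {"time": "2016-07-03T10:15:00", "item": "item-1", "country": "US"},
    {"time": "2016-07-19T22:40:00", "item": "item-2", "country": "DE"},
    {"time": "2016-08-01T08:05:00", "item": "item-1", "country": "US"},
]


def downloads_per_month(cases):
    """Reduce raw cases to a statistical product: download counts per month."""
    counts = Counter()
    for case in cases:
        month = datetime.fromisoformat(case["time"]).strftime("%Y-%m")
        counts[month] += 1
    return dict(counts)


monthly = downloads_per_month(cases)
```

The product is far smaller than the cases it summarizes, which matters when deciding what to send over an API.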
[20:34] <DSpaceSlackBot1> <tdonohue> It's just worth us revisiting this idea when DSpace 7 work comes around to Stats... and making sure the REST API designed is configurable/extendable enough to allow for "custom Stats" (and I'm being purposefully vague there)
[20:36] <DSpaceSlackBot1> <tdonohue> I suspect what we'll end up surfacing (to some extent) may even be a Stats "search" REST endpoint... allowing for custom searches across the search stats (within reason and based on stats gathered), and depend on the UI to use tools to visualize the search results in useful ways
[20:37] <DSpaceSlackBot1> <tdonohue> But that's just a very off the top of my head idea here.
[20:38] <DSpaceSlackBot1> <tdonohue> In any case, I'm not sure we have any next steps. But, if anyone feels good ideas coming on (or feels something said above :point_up: is a good idea), maybe we could start tracking the idea more formally in the DSpace 7 tickets for REST API
[20:38] <DSpaceSlackBot1> <mwood> I'm not so sure we want to be pulling millions of cases out of REST.
[20:39] <DSpaceSlackBot1> <terrywbrady> I presume we would rely on DSpace (SOLR) to facet those records ... not to bulk export them.
[20:40] <DSpaceSlackBot1> <mwood> So we are talking processed data products, not cases.
[20:41] <DSpaceSlackBot1> <terrywbrady> We have a powerful engine for processing records, so I think that is what we would do...
[20:41] <DSpaceSlackBot1> <mwood> Heh, a real stat. package would make Solr's stat processing look like a discarded battery.
[20:41] <DSpaceSlackBot1> <tdonohue> Yes, the UI itself (especially angular) isn't meant for processing large amounts of data... data would have to be preprocessed.
[20:43] <DSpaceSlackBot1> <tdonohue> In any case, if anyone comes up with bright ideas, please do write them down (on wiki, in a placeholder ticket, etc). This is a topic we haven't yet discussed in great detail in the DSpace 7 team, and yet we'll obviously be encountering this in the near future
[20:43] <DSpaceSlackBot1> <tdonohue> So, brainstorms are more than welcome with regards to supporting stats in REST
[20:44] <DSpaceSlackBot1> <tdonohue> As discussion on this has slowed, I suggest we move along to other topics. We have 15 minutes left here
[20:45] <DSpaceSlackBot1> <mwood> OK
[20:45] <DSpaceSlackBot1> <terrywbrady> As a starting point, I could offer some suggested REST endpoints to support the stats reporting tools that I built in PHP: https://github.com/Georgetown-University-Libraries/batch-tools/wiki/Statistics-reporting
[20:45] <DSpaceSlackBot1> <tdonohue> Are there other pressing topics anyone has to bring up for today?
[20:45] <DSpaceSlackBot1> <terrywbrady> I agree with moving on with the agenda...
[20:47] <DSpaceSlackBot1> <tdonohue> Not hearing other pressing topics, so I'll suggest looking again at our "Ongoing Discussion Topics" on the agenda. Last week we talked through a few of these (and rewrote 3.a.). But, we never got to 3.b., which is worth talking on briefly
[20:47] <DSpaceSlackBot1> <tdonohue> namely DS-3372 / DS-3587 (both around more SQL dialect support)
[20:47] <DSpaceSlackBot1> <tdonohue> https://jira.duraspace.org/browse/DS-3372
[20:47] <DSpaceSlackBot1> <tdonohue> https://jira.duraspace.org/browse/DS-3587
[20:48] <DSpaceSlackBot1> <mwood> Well, that's one reason that drove us to Hibernate. But we'd need maintainers who have access to the various supported brands.
[20:48] <DSpaceSlackBot1> <tdonohue> I know we are not a big group here, but just curious what others think about this topic
[20:49] <DSpaceSlackBot1> <tdonohue> Like @mwood, I'm not against this concept...but I also worry about maintenance. I don't hear many people desiring MySQL or SQL Server support (if they are out there, hopefully they speak up)
[20:49] <DSpaceSlackBot1> <tdonohue> And I personally hesitate to add support that no one really plans to use
[20:49] <DSpaceSlackBot1> <terrywbrady> It would be helpful to hear the compelling argument for those database variants.
[20:50] <DSpaceSlackBot1> <terrywbrady> I know that in my past jobs, there was an enterprise desire to consolidate database vendors, but I am not sure that applies to our institutions.
[20:50] <DSpaceSlackBot1> <mwood> TBH I would have asked if we could move the Oracle sites to PostgreSQL and get down to *one*.
[20:51] <DSpaceSlackBot1> <terrywbrady> It seems like DSpace is chosen as a platform first rather than a specific database vendor.
[20:52] <DSpaceSlackBot1> <terrywbrady> Is hosting easier with one of these other database variants? Is the licensing more favorable? Is performance significantly better?
[20:52] <DSpaceSlackBot1> <terrywbrady> How widespread is the Oracle usage today?
[20:53] <DSpaceSlackBot1> <tdonohue> I think the good thing about Hibernate is this is a bit easier now. But, yes, I don't see it as particularly pressing. There is definitely also the opportunity to deprecate Oracle (if we found not many sites are using it) and either replace it with something else, or not replace it. I don't have good stats though (hard to track) on DB usage for DSpace. Based on questions on lists, there don't tend to be
[20:53] <DSpaceSlackBot1> that's 5%, 10% or 20% of overall sites.
[20:54] <DSpaceSlackBot1> <tdonohue> So, maybe the best we can do now is leave these tickets open for comment?
[20:54] <DSpaceSlackBot1> <mwood> Lately I've been doing a fair amount of testing patches for Oracle, even though I don't use it here. I have a VirtualBox I can spin up if needed.
[20:55] <DSpaceSlackBot1> <mwood> That's sort of a glimpse of how many Oracle sites we have.
[20:55] <DSpaceSlackBot1> <tdonohue> I think if we found that a larger number (even just 10-20 largish sites) wanted to move to SQL Server or MySQL or [whatever], that'd make this a higher priority. Right now, I haven't heard that though
[20:55] <DSpaceSlackBot1> <terrywbrady> My impression is that postgres has a strong reputation as an open source component. At one time, I remember hearing that MySQL licensing was becoming less open source friendly.
[20:56] <DSpaceSlackBot1> <mwood> MySQL is owned by Oracle these days.
[20:56] <DSpaceSlackBot1> <tdonohue> My best guess is we are around 90+% Postgres (maybe even more in the 95+% range), but I don't have strong stats to back that up
[20:57] <DSpaceSlackBot1> <mwood> Years ago it was kind of an oddball in the RDBMS market, but it's been coming closer to the mainstream.
[20:58] <DSpaceSlackBot1> <tdonohue> So, my suggestion on this topic is to take it off our agenda as an "ongoing topic" and let it sit in the tickets for now. If the tickets get supportive comments (or we hear more about new DB support elsewhere) we can revisit them
[20:58] <DSpaceSlackBot1> <mwood> Sounds well to me.
[20:59] <DSpaceSlackBot1> <tdonohue> I'll do that then
[21:00] <DSpaceSlackBot1> <tdonohue> And with that, we are out of time for today. Because we don't have a bug-fix-release pressing, I'd encourage anyone (here or reading this later) to send me DevMtg topics. If we don't have topics, we can always review tickets, but we have a good opportunity to discuss larger topics right now (at least until the next major bug fix comes in the door)
[21:01] <DSpaceSlackBot1> <tdonohue> And, finally, a reminder that our next DSpace 7 Mtg is tomorrow at 15UTC in Google Hangouts. See also https://wiki.duraspace.org/display/DSPACE/DSpace+7+Working+Group#DSpace7WorkingGroup-NextMeeting
[21:02] <DSpaceSlackBot1> <tdonohue> So, with that, we'll close up for today! I'll talk to you all next week, or tomorrow (if you plan to join the DSpace 7 mtg)
[21:02] <DSpaceSlackBot1> <terrywbrady> Have a great day!
[21:02] <DSpaceSlackBot1> <tdonohue> Or, you know, on Slack ;)
[21:02] <DSpaceSlackBot1> <mwood> Thanks all!
[21:02] * tdonohue (~tdonohue@dspace/tdonohue) has left #duraspace
[21:04] * mhwood (~mhwood@mhw.ulib.iupui.edu) Quit (Remote host closed the connection)

These logs were automatically created by DuraLogBot on irc.freenode.net using the Java IRC LogBot.